Arista Extensible API (EAPI) – Network Automation with JSON-RPC and Python Scripting

October 16, 2017
Arista EOS EAPI – Application Programmable Interface
by Pablo Narváez

NETWORK AUTOMATION

Network automation is usually associated with doing things more quickly. That's true, but speed is not the only reason to adopt it.

Network administrators usually make changes on the network by hand through the CLI. Things get messy when there is more than one administrator in a multi-vendor environment: the chance of human error increases when different admins try to make changes on the network at the same time, each using different CLIs and tools.

Replacing manual changes with standardized configuration management tools for network automation helps achieve more predictable behavior and minimizes the human error factor.

Network automation is the use of IT controls to supervise and carry out every-day network management functions. These functions can range from basic network mapping and device discovery to network configuration management and the provisioning of virtual network resources.

Network automation is a powerful and flexible enabler to:

●   Efficiently automate repetitive manual operational tasks
●   Answer questions and accomplish tasks that are not feasible to handle manually
●   Enable tailored solutions and architectures beyond standard features

Through a step-by-step approach and thanks to many open source examples made available, network automation is easy to adopt in your network today.

Just keep in mind:

“With network automation, the point is to start small, but think through what else you may need in the future.” – Network Programmability by Jason Edelman; Scott S. Lowe; Matt Oswalt

ARISTA EAPI

Introduction

Arista EOS offers multiple programmable interfaces for applications. These interfaces can be leveraged by applications running on the switch, or external to EOS.

The Extensible API (eAPI) allows applications and scripts to have complete programmatic control over EOS, with a stable and easy to use syntax. It also provides access to all switch state.

Once the API is enabled, the switch accepts commands using Arista’s CLI syntax, and responds with machine-readable output and errors serialized in JSON, served over HTTP.

Configuring the Extensible API Interface

One of the benefits of working with Arista EOS eAPI is the ability to script against it with JSON-RPC: a network administrator can get machine-friendly data from the switch using familiar CLI syntax.

In this post, I will show you the use of eAPI with a simple example using Python.

First, we need to activate eAPI on each switch. To enable it, we bring up the API virtual interface.

leaf01#conf ter
leaf01(config)#management api http-commands 
leaf01(config-mgmt-api-http-cmds)#no shutdown 
leaf01(config-mgmt-api-http-cmds)#

eAPI requires a username and password to be configured. This is a regular username created in global configuration mode:

leaf01#conf ter
leaf01(config)#username admineapi secret arista

Default configuration for eAPI uses HTTPS on port 443. Both the port and the protocol can be changed.

leaf01#conf ter
leaf01(config)#management api http-commands
leaf01(config-mgmt-api-http-cmds)#protocol ?
   http         Configure HTTP server options
   https        Configure HTTPS server options
   unix-socket  Configure Unix Domain Socket

leaf01(config-mgmt-api-http-cmds)#protocol http ?
   localhost  Server bound on localhost
   port       Specify the TCP port to serve on
   <cr>      

leaf01(config-mgmt-api-http-cmds)#protocol http port ?

  <1-65535>  TCP port

leaf01(config-mgmt-api-http-cmds)#protocol http port 8080
leaf01(config-mgmt-api-http-cmds)#

NOTE: When configuring a non-default HTTP/HTTPS port under “protocol”, that port needs to be manually added to an updated version of the switch's control-plane access list to permit remote access.

To verify that eAPI is running, use the following command:

leaf01#show management api http-commands
 Enabled:            Yes
 HTTPS server:       running, set to use port 443
 HTTP server:        shutdown, set to use port 80
 Local HTTP server:  shutdown, no authentication, set to use port 8080
 Unix Socket server: shutdown, no authentication
 VRF:                default
 Hits:               0
 Last hit:           never
 Bytes in:           0
 Bytes out:          0
 Requests:           0
 Commands:           0
 Duration:           0.000 seconds
 SSL Profile:        none
 QoS DSCP:           0

 URLs        
------------------------------------- 
Ethernet4   : https://172.16.0.2:443 
Ethernet5   : https://172.16.0.14:443  
Loopback0   : https://10.0.1.21:443    
Loopback1   : https://10.0.2.1:443     
Vlan11      : https://192.168.11.2:443 
Vlan4094    : https://172.16.254.1:443

In the output shown above, notice the URLs; we are going to need them to access the switch's eAPI over HTTP/HTTPS.

USING ARISTA EAPI

There are two methods of using the eAPI:

  • Web access
  • Programming

eAPI Web Access

The eAPI uses the lightweight, standardized protocol JSON-RPC 2.0 to communicate between your program (the client) and the switch (the server).

To explore the API, point your web browser to https://myswitch after enabling the API interface on the switch.

NOTE: “myswitch” refers to the IP address of the switch you want to configure. To select the appropriate IP address, choose one of the URLs displayed in the command output shown above.

This web-app lets you interactively explore the protocol, return values and model documentation.

[Screenshot: the eAPI web interface]

It works like this: the client sends a JSON-RPC request via an HTTP POST to https://myswitch/command-api. The request encapsulates a list of CLI commands to run, and the switch replies with a JSON-RPC response containing the result of each CLI command that was executed. The commands in the request are run in order on the switch, and after all of them have executed, the session exits back to unprivileged mode. If any command returns an error, no further commands from that request are executed, and the response from the switch contains an error object with the details of the failure.
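
Under the hood there is nothing vendor-specific about the transport. As a rough sketch (placeholder credentials and IP address from this lab, and assuming the third-party "requests" library is installed), the raw POST looks something like this:

#!/usr/bin/python
# Rough sketch of the JSON-RPC call behind eAPI (assumes the "requests"
# library is installed; IP address and credentials are lab placeholders).
import requests

payload = {
    "jsonrpc": "2.0",
    "method": "runCmds",
    "params": {"version": 1, "cmds": ["show version"], "format": "json"},
    "id": "1",
}

resp = requests.post(
    "https://192.168.11.2/command-api",
    auth=("admineapi", "arista"),
    json=payload,
    verify=False,   # lab only: the switch presents a self-signed certificate
)
print(resp.json()["result"])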

To test the eAPI via web browser, let’s try a common command like “show version”:

[Screenshot: running "show version" in the eAPI web interface]

See the command response in the Response Viewer window.

You can try any other command available in the CLI. The full list of supported CLI commands, along with the definition of each command's output fields, is available in the “Command Documentation” tab in the top-right corner.

Easy to use, right? While the web interface is useful for testing eAPI, it's not really designed for day-to-day use. For a more robust, scalable and complete eAPI experience, the programming interface is recommended.

eAPI Programming Interface

When using the programming interface to communicate with the switches, we need to parse the JSON-formatted output. To do so, we are going to add the JSON-RPC library to our environment. For this lab, we have a dedicated Ubuntu Linux server (the client) on which to install the Python libraries.

NOTE: You don't need an external PC to run these scripts; you can run them on the Arista switch itself, since all the required JSON libraries are part of the base EOS build.

To work with JSON-RPC from Python, we need to install the libraries on the Linux server.

superadmin@server00-eapi:~$ sudo apt-get install python-pip
superadmin@server00-eapi:~$ sudo pip install jsonrpclib

This is all we need to communicate with the eAPI.

Now we need to create and run a Python script to request some information from the switch. I will use a really simple example that retrieves the output of “show version”.

#!/usr/bin/python

from jsonrpclib import Server

# The URL carries the credentials and must end in /command-api
switch = Server("http://admineapi:arista@192.168.11.2/command-api")
response = switch.runCmds(1, ["show version"])

print response

To create and run your own Python scripts, the use of an IDE (Integrated Development Environment) is strongly recommended. An IDE is a software suite that consolidates the basic tools developers need to write and test software: typically a code editor, a compiler or interpreter (Python uses an interpreter), and a debugger, all accessed through a single graphical user interface (GUI). There are several IDEs available; the following link contains a review of the most popular ones:

Python Integrated Development Environments

Let’s take a closer look at the script.

The Server() line defines the target switch. Its argument is a URL with the following format:

<protocol>://<username>:<password>@<hostname or ip-address>/command-api

The “/command-api” must always be present when using eAPI.

Note that CLI commands cannot be abbreviated, and the number "1" passed to runCmds() is the eAPI version, which is currently always 1.
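
One practical note before running it: if any command in the list fails, jsonrpclib does not hand back a normal result; it raises an exception you can catch. A minimal, hedged sketch (Arista's published examples import ProtocolError from jsonrpclib for this):

#!/usr/bin/python
# Hedged sketch: catch a failed command instead of letting the script crash.
from jsonrpclib import Server, ProtocolError

switch = Server("http://admineapi:arista@192.168.11.2/command-api")

try:
    response = switch.runCmds(1, ["show bogus-command"])
except ProtocolError as err:
    # The exception carries the JSON-RPC error code and message
    print("Command failed: %s" % str(err))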

Now let’s run the script.

superadmin@server00-eapi:~/scripting$ python hello.py 
[{u'memTotal': 1893352, u'internalVersion': u'4.17.5M-4414219.4175M', u'serialNumber': u'', u'systemMacAddress': u'52:54:00:97:ea:40', u'bootupTimestamp': 1505842331.32, u'memFree': 583364, u'version': u'4.17.5M', u'modelName': u'vEOS', u'isIntlVersion': False, u'internalBuildId': u'd02143c6-e42b-4fc3-99b6-97063bddb6b8', u'hardwareRevision': u'', u'architecture': u'i386'}]

That may seem like gibberish at first glance, but it’s actually a JSON-formatted set of key-value pairs.

This is the same output, spaced out into a more human-readable format:

[{
 u'memTotal': 1893352, 
 u'internalVersion': u'4.17.5M-4414219.4175M', 
 u'serialNumber': u'', 
 u'systemMacAddress': u'52:54:00:97:ea:40', 
 u'bootupTimestamp': 1505842331.32, 
 u'memFree': 583364, 
 u'version': u'4.17.5M', 
 u'modelName': u'vEOS',
 u'isIntlVersion': False, 
 u'internalBuildId': u'd02143c6-e42b-4fc3-99b6-97063bddb6b8', 
 u'hardwareRevision': u'', 
 u'architecture': u'i386'
 }]

Now that we have the key-value pairs, we can reference them to pull out the desired information… this is where the magic happens.

Basically, we have bulk data, so we need an automated way to retrieve the information.

To do so, change the script to extract just the key-value pair that you need. The format is:

response[0]["key-name"]

In the next example, I will request the system MAC Address, the EOS version and the total physical memory; all other information will not be displayed.

superadmin@server00-eapi:~/scripting$ cat hello.py
#!/usr/bin/python

from jsonrpclib import Server

switch = Server("http://admineapi:arista@192.168.11.2/command-api")
response = switch.runCmds(1, ["show version"])

print "The system MAC address is:", response[0]["systemMacAddress"]
print "The system version is:", response[0]["version"]
print "The total physical memory is:", response[0]["memTotal"]

This is the result of running the script:

superadmin@server00-eapi:~/scripting$ python hello.py 
The system MAC address is: 52:54:00:97:ea:40
The system version is: 4.17.5M
The total physical memory is: 1893352

Just imagine how you could use this tool compared to closed, vendor-specific monitoring apps: eAPI gives you the information you want, in the way you want it, when you want it. With some more advanced scripting you can even create reports and verify compliance; this is the flexibility that a programmable operating system provides.
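
As a hedged sketch of that idea (the switch list and "expected" EOS version below are illustrative values, not part of this lab), a tiny compliance check could look like this:

#!/usr/bin/python
# Hedged sketch: a minimal version-compliance report across several switches.
# The management IPs and the expected EOS version are illustrative values.
from jsonrpclib import Server

EXPECTED = "4.17.5M"
SWITCHES = ["192.168.11.2", "192.168.11.3"]

for ip in SWITCHES:
    switch = Server("http://admineapi:arista@%s/command-api" % ip)
    version = switch.runCmds(1, ["show version"])[0]["version"]
    status = "OK" if version == EXPECTED else "MISMATCH"
    print("%-15s  EOS %-10s  %s" % (ip, version, status))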

Complex functions require a more sophisticated script. One such example is device provisioning. For deployment automation, you can send multiple commands at once to configure the switch; see the example below.

#!/usr/bin/python

from jsonrpclib import Server

switch = Server("http://admineapi:arista@192.168.11.2/command-api")

# Configure a description on interfaces Ethernet10 through Ethernet18
for x in range(10, 19):
    response = switch.runCmds(1, [
        "enable",
        "configure",
        "interface ethernet" + str(x),
        "description [GAD Eth-" + str(x) + "]"],
        "json")
print "Done."

Some commands require additional input. This can be accomplished by replacing the command string with a dictionary (curly braces) containing the "cmd" and "input" keys, using the following format:

#!/usr/bin/python

from jsonrpclib import Server

switch = Server("http://admineapi:arista@192.168.11.2/command-api")

# "enable" is sent together with its password through the "input" key
response = switch.runCmds(1, [
    {"cmd": "enable", "input": "arista"},
    "configure",
    "interface ethernet2",
    "description I can code!"],
    "json")
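
A quick way to confirm that the change took effect is to read the configuration back. The snippet below is a small follow-up sketch; note the optional third argument to runCmds() set to "text", which makes eAPI return the plain CLI output under an "output" key instead of structured JSON:

#!/usr/bin/python
# Hedged follow-up sketch: read the interface descriptions back as plain text.
from jsonrpclib import Server

switch = Server("http://admineapi:arista@192.168.11.2/command-api")
check = switch.runCmds(1, ["show interfaces description"], "text")
print(check[0]["output"])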

The Arista eAPI (and the API of any other programmable NOS for that matter) is a tremendously powerful tool that puts the very concept of Software Defined Networking within easy reach. The ability to issue CLI commands remotely through scripts is one of the major benefits of network automation and programmable infrastructure.

You can always check my github repository to download the configuration files.

 


BGP Interoperability between Free Range Routing (FRR) and Arista EOS

August 17, 2017
Free Range Routing
by Pablo Narváez

Today I will test BGP between the FRR routing stack and Arista EOS. The sample configuration that I will show later in this post is just a basic integration between the two devices, nothing at all complex. Basically, I just wanted to expand my virtual environment by adding a DC/routing perimeter while testing FRR.

For this lab, I will be using the same environment I already built in my previous post so I can easily integrate FRR into the existing topology.

FREE RANGE ROUTING OVERVIEW

FRR is a routing software package that provides TCP/IP-based routing services, with support for routing protocols such as RIPv1, RIPv2, RIPng, OSPFv2, OSPFv3, IS-IS, BGP-4, and BGP-4+.

In addition to traditional IPv4 routing protocols, FRR also supports IPv6. Since the beginning, the project has been backed by Cumulus Networks, Big Switch Networks, 6WIND, Volta Networks and LinkedIn, among others.

FRR has been forked from the Quagga open-source project. For those who are not familiar with Quagga, it’s an open-source implementation of a full routing stack for Linux; it’s mostly used for WRT custom firmware, some cloud implementations, and even for control plane functionality on some network operating systems (NOS) like Cumulus Linux.

NOTE: FRR replaces Quagga as the routing suite in Cumulus Linux 3.4.0.

Quagga still exists but has a completely different development process than FRR. You can learn more about Quagga here.

ROUTING STACK VS NETWORK OPERATING SYSTEM

To be clear about what FRR is and what it is not: a network operating system (NOS) is the totality from the Layer-1 hardware all the way up to the control plane. FRR is a full implementation of the routing control plane, so it needs a base operating system to run on top of.

In this regard, FRR is not a NOS that can run directly on bare metal. Instead, it's a modern implementation of the IPv4/IPv6 routing stack that provides control plane functionality as a set of Linux daemons.

FRR SYSTEM ARCHITECTURE

FRR is made from a collection of several daemons that work together to build the routing table.

Zebra is responsible for changing the kernel routing table and for redistribution of routes between different routing protocols. In this model, it’s easy to add a new routing protocol daemon to the entire routing system without affecting any other software.

There is no need for the FRR daemons to run on the same machine. You can actually run several instances of the same protocol daemon on the same machine and keep them apart from the rest of the daemons.

FRR Architecture

Currently FRR supports GNU/Linux and BSD. The officially supported platforms are listed below. Note that FRR may run correctly on other platforms, and may run with partial functionality on further platforms.

  • GNU/Linux
  • FreeBSD
  • NetBSD
  • OpenBSD

FRR DOWNLOAD

FRR is distributed under the GNU General Public License and is available for download from the official FRR website.

FRR INSTALLATION

There are three steps for installing the software: configuration, compilation, and installation.

I chose Ubuntu to deploy FRR, but several Linux distros are supported. If you want to install it on Ubuntu, follow these instructions. You can check the FRR webpage for any other Linux/BSD distro; it's pretty well documented.

When configuring FRR, there are several options to customize the build to include or exclude specific features and dependencies. You can check all the options here.

Once installed, check the FRR daemons to make sure they are running:

[Screenshot: ps -ef output showing the FRR daemons]

If you installed FRR from source (link above), the FRR daemon (and all the routing daemons you specify during the installation) will run as a system service after the Linux kernel is booted. As you can see in the screen capture above, the routing processes (bgpd, ospfd, ldpd, etc.) are running as part of the main FRR service.

As with any other Linux system service, you can manage the frr service with systemctl.

$ systemctl start|stop|restart frr

Each FRR daemon has its own configuration file and terminal interface, which can be annoying to work with. To resolve this problem, FRR provides an integrated user interface shell called vtysh.

vtysh connects to each daemon over a UNIX domain socket and then works as a proxy for user input, so there's no need to connect to each daemon separately.

To access vtysh from the host OS, just type in the following command:

superadmin@frr01:~$ vtysh

Hello, this is FRRouting (version 3.1-dev-MyOwnFRRVersion-g7e4f56d).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

This is a git build of frr-3.1-dev-320-g7e4f56d
Associated branch(es):
 local:master
 github/frrouting/frr.git/master

frr01#

INTERESTING FACTS ABOUT FRR

  • If you install FRR from source and follow the instructions provided above, there's no need to modify or even touch any of the daemon configuration files (.conf) located in /etc/frr. When you log into FRR with vtysh, a single configuration file is created for all the daemons and stored as frr.conf on the host
  • Don't expect to see Ethernet/WAN interfaces; FRR will show you the actual host network adapters: ens3, ens4, ens5, ens10 (adapter names may differ depending on the Linux distro and your setup).
frr01#
frr01# conf ter
frr01(config)# interface ?
 IFNAME Interface's name
 ens3 ens4 ens5 ens10 lo
frr01(config)# interface ens3
frr01(config-if)#
  • As you may have noticed by now, if you know Cisco IOS or Arista EOS you are good to go! The FRR CLI is basically the same. You can check the list of CLI commands here.

BGP INTEROPERABILITY TESTING

As shown in the diagram below, I will use my existing network setup to connect the FRR routers to the Arista Spine switches.

Free Range Routing Network Setup

Each network device has a loopback interface which is being announced into BGP (10.0.1.21-23, 10.0.1.31-32). When we finish the configuration, we should be able to ping all these interfaces from the FRR routers.

The interfaces between the FRR routers and the Arista switches are configured as point-to-point Layer-3 links.

frr01# show running-config
Building configuration...

Current configuration:
!
frr version 3.1-dev
frr defaults traditional
hostname frr01
username root nopassword
!
service integrated-vtysh-config
!
log syslog informational
!
interface ens4
 description link_to_spine01-eth4
 ip address 172.16.0.25/30
!
interface ens5
 description link_to_spine02-eth4
 ip address 172.16.0.29/30
!
interface ens10
 description link_to_frr02-ens10
 ip address 172.16.254.5/30
!
interface lo
 description router-id
 ip address 10.0.1.1/32
!
frr02# show running-config
Building configuration...

Current configuration:
!
frr version 3.1-dev
frr defaults traditional
hostname frr02
username root nopassword
!
service integrated-vtysh-config
!
log syslog informational
!
interface ens4
 description link_to_spine01-eth5
 ip address 172.16.0.33/30
!
interface ens5
 description link_to_spine02-eth5
 ip address 172.16.0.37/30
!
interface ens10
 description link_to_frr02-ens10
 ip address 172.16.254.6/30
!
interface lo
 description router-id
 ip address 10.0.1.2/32
!

I will configure ASN 65000 for the FRR routers; frr01 and frr02 will have iBGP peer sessions with each other and eBGP peer sessions with the Arista switches.

frr01#
router bgp 65000
 bgp router-id 10.0.1.1
 distance bgp 20 200 200 
 neighbor ebgp-to-spine-peers peer-group
 neighbor ebgp-to-spine-peers remote-as 65020
 neighbor 172.16.0.26 peer-group ebgp-to-spine-peers
 neighbor 172.16.0.30 peer-group ebgp-to-spine-peers
 neighbor 172.16.254.6 remote-as 65000
 !
 address-family ipv4 unicast
 network 10.0.1.1/32
 exit-address-family
 vnc defaults
 response-lifetime 3600
 exit-vnc
frr02#
router bgp 65000
 bgp router-id 10.0.1.2
 distance bgp 20 200 200
 neighbor ebgp-to-spine-peers peer-group
 neighbor ebgp-to-spine-peers remote-as 65020
 neighbor 172.16.0.34 peer-group ebgp-to-spine-peers
 neighbor 172.16.0.38 peer-group ebgp-to-spine-peers
 neighbor 172.16.254.5 remote-as 65000
 !
 address-family ipv4 unicast
 network 10.0.1.2/32
 exit-address-family
 vnc defaults
 response-lifetime 3600
 exit-vnc

Since BGP was already configured in the Arista switches as part of my previous labs, I just added the eBGP sessions towards FRR.

spine01#
router bgp 65020
 router-id 10.0.1.11
 distance bgp 20 200 200
 maximum-paths 2 ecmp 64
 neighbor ebgp-to-frr-peers peer-group
 neighbor ebgp-to-frr-peers remote-as 65000
 neighbor ebgp-to-frr-peers maximum-routes 12000
 neighbor 172.16.0.2 remote-as 65021
 neighbor 172.16.0.2 maximum-routes 12000
 neighbor 172.16.0.6 remote-as 65021
 neighbor 172.16.0.6 maximum-routes 12000
 neighbor 172.16.0.10 remote-as 65022
 neighbor 172.16.0.10 maximum-routes 12000
 neighbor 172.16.0.25 peer-group ebgp-to-frr-peers
 neighbor 172.16.0.33 peer-group ebgp-to-frr-peers
 network 10.0.1.11/32
 redistribute connected
spine02#
router bgp 65020
 router-id 10.0.1.12
 distance bgp 20 200 200
 maximum-paths 2 ecmp 64
 neighbor ebgp-to-frr-peers peer-group
 neighbor ebgp-to-frr-peers remote-as 65000
 neighbor ebgp-to-frr-peers maximum-routes 12000
 neighbor 172.16.0.14 remote-as 65021
 neighbor 172.16.0.14 maximum-routes 12000
 neighbor 172.16.0.18 remote-as 65021
 neighbor 172.16.0.18 maximum-routes 12000
 neighbor 172.16.0.22 remote-as 65022
 neighbor 172.16.0.22 maximum-routes 12000
 neighbor 172.16.0.29 peer-group ebgp-to-frr-peers
 neighbor 172.16.0.37 peer-group ebgp-to-frr-peers
 network 10.0.1.12/32
 redistribute connected

NOTE: The “redistribute connected” command will redistribute all the directly connected interfaces into BGP for connectivity testing purposes. In production, link addresses are not typically advertised. This is because:

  • Link addresses take up valuable FIB resources. In a large CLOS (Leaf-Spine) environment, the number of such addresses can be quite large
  • Link addresses expose an additional attack vector for intruders to use to either break in or engage in DDOS attacks

We can verify the interoperability between FRR and Arista by checking the BGP neighbor adjacencies. The output of the “show ip bgp summary” command shows the BGP state as Established, which indicates that the peer relationships have come up successfully.

[Screenshot: show ip bgp summary on the FRR routers]
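
You can run the same check from the Arista side over eAPI (see the eAPI post above). The sketch below assumes eAPI is enabled on spine01 and reachable on one of its addresses; the exact JSON field names may vary between EOS releases:

#!/usr/bin/python
# Hedged sketch: pull the BGP summary from spine01 via eAPI and print the
# state of each peer. Field names are from memory and may differ per release.
from jsonrpclib import Server

spine01 = Server("http://admineapi:arista@10.0.1.11/command-api")
summary = spine01.runCmds(1, ["show ip bgp summary"])[0]

for peer, info in summary["vrfs"]["default"]["peers"].items():
    print("%-15s  AS %-6s  %s" % (peer, info.get("asn"), info.get("peerState")))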

Finally, we check the routing table to make sure we can reach all the loopback interfaces from the FRR routers.

[Screenshot: show ip route on the FRR routers]

You can always check my github repository to download the configuration files.

 

Arista Layer-3 Leaf-Spine Fabric with VXLAN HER: Lab Part 4

August 2, 2017
Configuring and Testing VXLAN
by Pablo Narváez

This is the last article in the series; we will finish the lab by configuring VXLAN and testing connectivity between the servers.

VIRTUAL EXTENSIBLE LAN (VXLAN)

The VXLAN protocol is defined in RFC 7348. The standard defines a MAC-in-IP encapsulation protocol that allows the construction of Layer-2 domains across a Layer-3 IP infrastructure. The protocol is typically deployed as a data center technology to create overlay networks across a transparent Layer-3 infrastructure:

  • Providing Layer-2 connectivity between racks or PODs without requiring an underlying Layer-2 infrastructure
  • Logically connecting geographically dispersed data centers at Layer-2 as a Data Center Interconnect (DCI) technology
  • VXLAN supports up to 16 million virtual overlay networks over a physical Layer-2/3 underlay network, providing Layer-2 connectivity and multi-tenancy
Two-tier Layer-3 VXLAN Ethernet Fabric

VXLAN encapsulation/decapsulation is performed by a VXLAN Tunnel End Point (VTEP), which can be either:

  • A VXLAN-enabled hypervisor such as ESXi, KVM, or XEN (software VTEP)
  • A network switch (hardware VTEP)
Software & Hardware VTEPs

To create VXLAN overlay virtual networks, IP connectivity is required between VTEPs.

VXLAN CONTROL PLANE

When VXLAN was released, the IETF standard (RFC 7348) defined a multicast-based flood-and-learn mechanism instead of a true control plane. It soon became evident that the RFC was incomplete (to say the least), as multicast-based VXLAN flooding in the underlay introduced several challenges in the data center, including scalability and complexity.

To overcome these limitations, networking vendors started to introduce control plane technologies to replace the multicast-based flooding.

Depending on the vendor, you can have more than one option to deploy a VXLAN control plane solution. To simplify things, it’s a good idea to categorize these technologies:

  1. Network-centric vs Hypervisor-based
  2. Head End Replication vs Dynamic tunnel

Network-centric vs Hypervisor-based solutions

The VXLAN control plane builds VXLAN tables that contain the VNI/VLAN mappings, the remote MAC addresses reachable behind each VTEP, and the VTEP/hypervisor IP addresses used to establish the VXLAN tunnels.

Some networking vendors offer SDN-based control plane solutions that hand the control plane process to an external software layer called a controller. The SDN controller is responsible for replicating, synchronizing and maintaining the VXLAN tables on the hypervisors, among other tasks. In order for the hypervisors to talk to the controller, VXLAN agents are installed either as part of the host kernel or as a VM inside the hypervisor on each compute node; the agents (called VTEPs) receive the VXLAN information from the controller so they can encapsulate/decapsulate traffic based on the instructions contained in the tables.

The use of an SDN controller as the VXLAN control plane is just one option. An alternative is to deploy the VXLAN control plane directly on the Ethernet fabric. This network-centric solution requires the Ethernet fabric to be VXLAN-capable, meaning the data center switches have to support VXLAN. In the hypervisor-based solution, the underlay is not aware of the overlay network, so the switches do not need to support VXLAN.

NOTE: Since the VXLAN data/control planes are not standardized among vendors, you should expect to find some incompatibility in a multi-vendor network.

Head End Replication vs Dynamic Tunnels Setup

If you want to deploy the VXLAN control plane on the underlay, you need to decide how to set up the VXLAN tunnels.

VXLAN tunnels can be set up manually (Head End Replication) or dynamically (MP-BGP EVPN). Head End Replication (HER) is the static mapping of VTEPs for handling broadcast, unknown-unicast, and multicast (BUM) traffic. It requires configuring each switch with the VNI/VLAN mappings and the list of remote VTEPs with which to share MAC addresses and forward BUM traffic. This option works well for small and medium-sized networks; however, scalability and human error are the primary concerns in large networks.

To automate and simplify the tunnel setup, Multiprotocol BGP Ethernet VPN (MP-BGP EVPN) can be used as the routing protocol that coordinates the creation of dynamic tunnels. EVPN is an MP-BGP address family that allows VTEP and MAC reachability information to be carried in BGP routing updates.

VXLAN ROUTING AND BRIDGING

The deployment of VXLAN bridging provides hosts with Layer-2 connectivity across the Layer-3 Leaf-Spine underlay. To provide Layer-3 connectivity between the hosts, VXLAN routing is required.

VXLAN routing, sometimes referred to as inter-VXLAN routing, provides IP routing between VXLAN VNIs in the overlay network. VXLAN routing involves routing traffic based not on the destination IP address of the outer VXLAN header, but on the inner header, that is, the overlay tenant IP address.

VXLAN Routing Topologies

The introduction of VXLAN routing into the overlay network can be achieved by a direct or indirect routing model:

  • Direct Routing: The direct routing model provides routing at the first-hop Leaf node for all subnets within the overlay network.  This ensures optimal routing of the overlay traffic at the first hop Leaf switch
  • Indirect Routing: To reduce the amount of state (ARP /MAC entries and routes) each Leaf node holds, the Leaf nodes only route for a subset of the subnets

The Direct Routing model works by creating anycast IP addresses for the host subnets across each of the Leaf nodes, providing a logical distributed router. Each Leaf node acts as the default gateway for all the overlay subnets, allowing the VXLAN routing to always occur at the first-hop.

CONFIGURING VXLAN

For this lab, I'm going to use Direct Routing and Head End Replication (HER) to set up the VXLAN tunnels. In later posts, I will add a couple of SDN controllers to demonstrate the centralized VXLAN control plane option with VXLAN agents on the compute nodes.

NOTE: As of the writing of this article, EVPN is not supported on vEOS. In fact, Arista just announced EVPN support on the latest EOS release, so it’s still a work in progress.

To provide direct routing, the Leaf nodes of the MLAG domain were configured with an IP interface for every subnet. I already covered this part in my previous post: I configured VARP, with the “ip virtual-router” address representing the default gateway for each subnet.

VXLAN Routing with MLAG

On the other hand, Layer-2 connectivity between racks will be achieved by configuring a VXLAN VTEP on the Leaf switches. For the dual-homed Compute Leaf, a single logical VTEP is required for the MLAG domain. We need to configure the VTEP on both MLAG peers with the same Virtual Tunnel Interface (VTI) IP address; this ensures both peers can decapsulate traffic destined to the same IP address.

The logical VTEP in combination with MLAG provides an active-active VXLAN topology.

VXLAN Overlay Networks

The logical VTEP address is configured as a new loopback interface. This IP address will be used as the VXLAN tunnel source interface.

Let's configure the Loopback1 interface; note that the same IP address is configured on both MLAG peers.

hostname leaf01
 !
 interface loopback1
 ip address 10.0.2.1/32
 !
hostname leaf02
 !
 interface loopback1
 ip address 10.0.2.1/32
 !
hostname leaf03
 !
 interface loopback1
 ip address 10.0.2.2/32
 !

Next, we need to assign Loopback1 to the VXLAN tunnel interface (VTI).

hostname leaf01
 !
 interface vxlan1
 vxlan source-interface loopback1
 !
hostname leaf02
 !
 interface vxlan1
 vxlan source-interface loopback1
 !
hostname leaf03
 !
 interface vxlan1
 vxlan source-interface loopback1
 !

To map the host VLANs to the VNIs, I will use the following mapping:

vlan 11 -> vni 1011
vlan 12 -> vni 1012
vlan 13 -> vni 1013

hostname leaf01
 !
 interface vxlan1
 vxlan source-interface loopback1
 vxlan vlan 11 vni 1011
 vxlan vlan 12 vni 1012
 vxlan vlan 13 vni 1013
 !
hostname leaf02
 !
 interface vxlan1
 vxlan source-interface loopback1
 vxlan vlan 11 vni 1011
 vxlan vlan 12 vni 1012
 vxlan vlan 13 vni 1013
 !
hostname leaf03
 !
 interface vxlan1
 vxlan source-interface loopback1
 vxlan vlan 11 vni 1011
 vxlan vlan 12 vni 1012
 vxlan vlan 13 vni 1013
 !

Now we have to configure the flood list for the VNIs, so the VTEPs can send BUM traffic and learn MAC addresses from each other.

hostname leaf01
 !
 interface vxlan1
 vxlan source-interface loopback1
 vxlan vlan 11 vni 1011
 vxlan vlan 12 vni 1012
 vxlan vlan 13 vni 1013
 vxlan vlan 11 flood vtep 10.0.2.2
 vxlan vlan 12 flood vtep 10.0.2.2
 vxlan vlan 13 flood vtep 10.0.2.2
 !
hostname leaf02
 !
 interface vxlan1
 vxlan source-interface loopback1
 vxlan vlan 11 vni 1011
 vxlan vlan 12 vni 1012
 vxlan vlan 13 vni 1013
 vxlan vlan 11 flood vtep 10.0.2.2
 vxlan vlan 12 flood vtep 10.0.2.2
 vxlan vlan 13 flood vtep 10.0.2.2
 !
hostname leaf03
 !
 interface vxlan1
 vxlan source-interface loopback1
 vxlan vlan 11 vni 1011
 vxlan vlan 12 vni 1012
 vxlan vlan 13 vni 1013
 vxlan vlan 11 flood vtep 10.0.2.1
 vxlan vlan 12 flood vtep 10.0.2.1
 vxlan vlan 13 flood vtep 10.0.2.1
 !
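
This per-VLAN, per-VTEP repetition is exactly the kind of configuration that lends itself to scripting (see the eAPI post above). A hedged sketch that simply generates the CLI lines for one leaf, with illustrative values:

#!/usr/bin/python
# Hedged sketch: generate the repetitive VNI-mapping and HER flood-list lines.
# The VLAN/VNI pairs and the remote VTEP list are illustrative values.
VLAN_TO_VNI = {11: 1011, 12: 1012, 13: 1013}
REMOTE_VTEPS = ["10.0.2.2"]

lines = ["interface vxlan1", " vxlan source-interface loopback1"]
for vlan in sorted(VLAN_TO_VNI):
    lines.append(" vxlan vlan %d vni %d" % (vlan, VLAN_TO_VNI[vlan]))
for vlan in sorted(VLAN_TO_VNI):
    lines.append(" vxlan vlan %d flood vtep %s" % (vlan, " ".join(REMOTE_VTEPS)))

print("\n".join(lines))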

Finally, to provide IP connectivity between the VTEPs, the loopback IP addresses of the VTIs need to be advertised into BGP. Whenever a new VTEP is added to the topology, we just need to announce its logical VTEP IP address into BGP.

hostname leaf01
!
router bgp 65021
network 10.0.2.1/32
!
hostname leaf02
!
router bgp 65021
network 10.0.2.1/32
!
hostname leaf03
!
router bgp 65022
network 10.0.2.2/32
!

With the Leaf switches announcing their respective VTEP into the underlay BGP routing topology, each Leaf switch learns two equal cost paths (via the Spine switches) to the remote VTEP.

Leaf01 show ip route

Leaf02 show ip route

With the direct routing model, the host subnets exist only on the Leaf switches, so there is no need to announce them into BGP; the Spine switches are transparent to the overlay subnets and only learn the VTEP addresses.

Layer-2 and Layer-3 connectivity between the servers is now possible. Below is the resultant MAC and VXLAN address table for the Leaf switches and the ping results between servers.

show Leaf mac-address

show Leaf vxlan-address

ping server01
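
If you prefer to collect that state without logging in to each leaf, the same verification commands can be pulled over eAPI (assuming eAPI is enabled on the leaves as shown in the eAPI post above; the IP addresses below are placeholders):

#!/usr/bin/python
# Hedged sketch: gather the VXLAN verification output from each leaf via eAPI.
# "text" format returns the plain CLI output; the IPs are placeholder values.
from jsonrpclib import Server

LEAVES = ["192.168.11.2", "192.168.11.3"]

for ip in LEAVES:
    leaf = Server("http://admineapi:arista@%s/command-api" % ip)
    result = leaf.runCmds(1, ["show vxlan vtep", "show vxlan address-table"], "text")
    print("===== %s =====" % ip)
    for block in result:
        print(block["output"])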

You can always check my github repository to download the configuration files.

Articles in the series:

 

Arista Layer-3 Leaf-Spine Fabric with VXLAN HER: Lab Part 3

August 2, 2017
Configuring the Layer-3 Ethernet fabric

by Pablo Narváez

Hello there, welcome to the third article in the series. In this post, we will configure the Layer-3 Fabric (Underlay) so we are ready for VXLAN (Overlay).

During the configuration process, you will notice that the Layer-3 Leaf-Spine (L3LS) design has a number of elements that need to be considered before implementing it.

The diagram below shows the fabric we are building; the details will be explained in the following sections.

Layer-3 Ethernet Fabric

For ease of implementation, I’m going to split the configuration in three parts:

  • Management Network: Out-of-band management
  • Layer-2 Configuration: Servers and Leaf switches
  • Layer-3 Configuration: L3LS interconnects and Leaf-Spine routing

As in any network design, IP addressing and logical networks need to be allocated and assigned. For this setup, we will use the IP addressing shown below.

IP Address Allocation Table

You don't really need to collect all this information to configure the lab; I just added the MAC addresses and some descriptions to stay organized and to troubleshoot problems more easily if necessary. However, if you are interested in gathering all this information for your home lab, you can open the VM settings in virt-manager and check the details of every VM, as described in the previous post.

MANAGEMENT NETWORK

The out-of-band management network provides access and control of the devices outside of the production network. As the name implies, the primary use of the OOB network is access and control of the infrastructure when the production network is unavailable.

To configure this network, use the IP Address Allocation Table as a reference. As an example, the configuration for Leaf01 is shown below.

hostname leaf01
!
username admin role network-admin secret xxxxxx
!
vrf definition mgmt
 rd 0:65010
!
interface Management1
 description oob-mgmt
 vrf forwarding mgmt
 ip address 10.0.0.21/24
!
ip routing
!
no ip routing vrf mgmt
!
logging vrf mgmt host 10.0.0.1
!

Note that a VRF (Virtual Routing and Forwarding instance) is used for management; this isolates the management network and makes it inaccessible from outside its subnet (unless you explicitly allow it). From the host machine (base OS), you will be able to ssh/telnet into the switches and servers through this network. Please check the previous article for the details of the OOB network.

LAYER-2 CONFIGURATION

As shown in the network diagram above, we have two different Compute Leafs:

  • Dual-homed Server Leaf
  • Single-homed Server Leaf

For the dual-homed server Leaf, our design consists of a pair of Leaf switches presented to the servers as a single switch through the use of MLAG (Multi-Chassis Link Aggregation). One benefit of using MLAG in this particular design is that it eliminates the dependence on spanning tree for loop prevention, so all links between Leaf switches and servers are active. Servers don't require knowledge of MLAG and can simply be configured with dynamic or static LACP or NIC bonding.

Layer-2 Leaf-Compute Network

Regarding the server gateways, this design uses an anycast default gateway technique known as Virtual ARP (VARP). On an MLAG pair, both switches coordinate the advertisement of an identical MAC and IP address (the VARP address) for the default gateway on each segment. Either default gateway can receive and service server requests, making an intelligent first-hop routing decision without traversing the peer link.

MLAG works well with both the Virtual Router Redundancy Protocol (VRRP) and Virtual ARP (VARP). The main reason I chose VARP over VRRP here is its simple configuration.

Beyond that, if you were to deploy a virtual gateway technology in production, choosing VARP would also make sense for the following reasons:

  • Reduces the burden on switch CPUs
  • Switches process all traffic requests independently so there’s no unnecessary traffic traversing the peer-link
  • There is no control protocol or messaging bus as utilized in VRRP so switches don’t send control traffic over the peer-link to maintain gateway coordination or to move the control functions from the primary switch to the peer device in case of failure.

For single-homed servers, we don't need to enable MLAG or VARP; we just configure regular access ports and switched virtual interfaces (SVIs).

MLAG Configuration

Server level configuration always needs to be reviewed as well, particularly with dual-homed active/active configurations. If you need help configuring NIC-Bonding/LACP on the servers, please check the following links:

As a rule of thumb, the MLAG group (domain) configuration must be identical on both switches so look carefully at the very few differences between the switches.

MLAG Domain Diagram

The MLAG peer-VLAN 4094 is created and added to the mlagpeer trunk group. The MLAG peers also must have IP reachability with each other over the peer link (SVI for vlan 4094).

hostname leaf01
!
vlan 4094
 name mlag-vlan
 trunk group mlagpeer
!
interface Vlan4094
 description mlag-vlan
 ip address 172.16.254.1/30
!
hostname leaf02
!
vlan 4094
 name mlag-vlan
 trunk group mlagpeer
!
interface Vlan4094
 description mlag-vlan
 ip address 172.16.254.2/30
!

To ensure forwarding between the peers on the peer link, spanning-tree must also be disabled on this vlan. Once the port channel is created for the peer link and configured as a trunk port on Ethernet6 and Ethernet7, additional VLANs may be added if necessary to transit the peer link. In this example, I’m configuring vlan 11 for server01, adding this vlan to the port channel, and configuring the server-facing ports (Ethernet1 on both switches).

hostname leaf01
!
no spanning-tree vlan 4094
!
interface Port-Channel11
 description service01-portchannel
 switchport trunk allowed vlan 11
 switchport mode trunk
 mlag 11
!
interface Port-Channel54
 description mlag-portchannel
 switchport mode trunk
 switchport trunk group mlagpeer
!
interface Ethernet1
 description server01-ens2
 channel-group 11 mode active
!
interface Ethernet6
 description link_to_leaf02-eth6 (mlag-peerlink1)
 channel-group 54 mode active
!
interface Ethernet7
 description link_to_leaf02-eth7 (mlag-peerlink2)
 channel-group 54 mode active
!

Next, we need to configure the actual MLAG domain, which must be unique for each MLAG pair. As part of the domain configuration, we are using interface Vlan4094 for IP reachability and Port-Channel54 as the physical peer link.

hostname leaf01
!
mlag configuration
 domain-id mlagDomain
 local-interface Vlan 4094
 peer-address 172.16.254.2
 peer-link Port-Channel54
 reload-delay 500
!
hostname leaf02
!
mlag configuration
 domain-id mlagDomain
 local-interface Vlan 4094
 peer-address 172.16.254.1
 peer-link Port-Channel54
 reload-delay 500
!

MLAG Verification

To verify the MLAG operation, you can run the following commands:

  • show mlag
  • show mlag config-sanity

Make sure the peer configuration is consistent and the MLAG status is Active. If you see a different MLAG state or any other error in the output, check the MLAG troubleshooting guide posted here.
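
If you want to script this check across both MLAG peers, the MLAG state is also exposed over eAPI. A hedged sketch, assuming eAPI is enabled as in the eAPI post and that the JSON output of "show mlag" includes a "state" field (field names can vary by EOS release):

#!/usr/bin/python
# Hedged sketch: verify the MLAG state of both peers over eAPI.
# IPs and credentials are placeholders; field names may vary by release.
from jsonrpclib import Server

for ip in ["192.168.11.2", "192.168.11.3"]:
    peer = Server("http://admineapi:arista@%s/command-api" % ip)
    mlag = peer.runCmds(1, ["show mlag"])[0]
    print("%-15s  MLAG state: %s" % (ip, mlag.get("state", "unknown")))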

For single-homed servers, ports are configured as access ports and assigned a VLAN. A Switched Virtual Interface (SVI) is created for each VLAN, which acts as the default gateway for the host/workload.

VARP Configuration

Leaf01 and Leaf02 are MLAG peers and are configured to run VARP to provide an active/active redundant first hop gateway for server01 and server04. To provide routing within each rack, the Leaf nodes of an MLAG domain must be configured with an IP interface in every subnet.

hostname leaf01
!
int vlan 11
 description service01-gateway
 ip add 192.168.11.2/24
 ip virtual-router address 192.168.11.1
!
ip virtual-router mac-address 001c.aaaa.aaaa
!
hostname leaf02
!
int vlan 11
 description service01-gateway
 ip add 192.168.11.3/24
 ip virtual-router address 192.168.11.1
!
ip virtual-router mac-address 001c.aaaa.aaaa
!

The global common virtual MAC address is unique for each MLAG domain. In this example, the default gateway for vlan 11 is 192.168.11.1, which resolves to the virtual MAC address 001c.aaaa.aaaa.

NOTE: As stated above, there’s no need to configure MLAG/VARP on Leaf03.

Repeat the same steps to configure the remaining VLANs (vlans 12-13). When this is done, you should be able to reach the default gateway IP addresses from the servers, assuming everything is configured correctly on the server side.

LAYER-3 CONFIGURATION

Leaf-Spine Interconnects

All Leaf switches are directly connected to all Spine switches. In an L3LS topology, all of these interconnections are routed links. These routed interconnects can be designed as point-to-point links or as port channels. For production environments there are pros and cons to each design, and Leaf-Spine interconnects require careful consideration to ensure uplinks are not oversubscribed. Point-to-point routed links are the focus of this guide.

Point-to-Point Routed Interfaces

As you can see, each Leaf has a point-to-point network between itself and each Spine. In real-life environments, you need to strike the right balance between address conservation and leaving room for the unknown: a /31 mask will work, as will a /30; the decision depends on your circumstances.

Check the configuration for Leaf01, then you can configure the remaining switches as described in the IP Address Allocation Table.

# leaf01
...
interface Ethernet1
 description server01-ens2
 switchport access vlan 11
!
interface Ethernet2
 description server02-ens2
!
interface Ethernet3
 description server03-ens2
!
interface Ethernet4
 description link_to_spine01-eth1
 no switchport
 ip address 172.16.0.2/30
!
interface Ethernet5
 description link_to_spine02-eth1
 no switchport
 ip address 172.16.0.14/30
!
interface Ethernet6
 description link_to_leaf02-eth6 (mlag-peerlink1)
 channel-group 54 mode on
!
interface Ethernet7
 description link_to_leaf02-eth7 (mlag-peerlink2)
 channel-group 54 mode on
!
interface Loopback0
 description router-id
 ip address 10.0.1.21/32
!
....
!
ip routing
no ip routing vrf mgmt
!

Border Gateway Protocol (BGP) Design

Leaf and Spine switches are interconnected with Layer-3 point-to-point links, and every Leaf is connected to all Spines with at least one interface. Also, there’s no direct dependency or interconnection between Spine switches. All the Leaf nodes can send traffic evenly towards the Spine through the use of Equal Cost Multi Path (ECMP) which is inherent to the use of routing technologies in the design.

NOTE: We have just two Spine switches in our lab, but you can add additional nodes on demand. It's not required to have an even number of Spine switches; just make sure to have at least one link from each Leaf to every Spine.

Even though you can use OSPF, IS-IS or BGP as the fabric routing protocol, BGP has become the routing protocol of choice for large data centers. Some of the reasons to choose BGP over its alternatives are:

  • Extensive Multi-Vendor interoperability
  • Native Traffic Engineering (TE) capabilities
  • Minimized information flooding, when compared to link-state protocols
  • Reliance on TCP rather than adjacency forming
  • Reduced complexity and simplified troubleshooting
  • Mature and proven stability at scale

As you may know, we have two options to deploy BGP as the fabric routing protocol: eBGP vs iBGP. There are pros and cons for each of them…

eBGP vs. iBGP

There are a number of reasons to choose eBGP, but one of the more compelling is simplicity, particularly when configuring load sharing (via ECMP), which is one of the main design goals of the L3LS. Using eBGP ensures all routes/paths are utilized with the least complexity and the fewest configuration steps.

I've tested both options and my personal choice is eBGP, even in production environments. Although an iBGP implementation is technically feasible, eBGP allows for a simpler design that is easier to troubleshoot.

NOTE: When integrating an MLAG Leaf configuration into a Layer-3 Leaf-Spine fabric, iBGP peering is recommended between the MLAG peers. The peering addresses specific failure conditions that the design must take into consideration; this will be explained in detail in the next section.

BGP Autonomous System Number (ASN)

BGP supports several designs when assigning Autonomous System Numbers (ASN) in a L3LS topology. For this lab, the Common Spine ASN – Discrete Leaf ASN design will be used.

This design uses a single ASN for all Spine nodes and a discrete ASN for each Leaf node (or MLAG Leaf pair). Some benefits of this design are:

  • Each rack can now be identified by its ASN
  • Traceroute and BGP commands will show discrete ASNs, making troubleshooting easier
  • Uses inherent BGP loop prevention
  • Unique AS numbers help troubleshooting and don’t require flexing the EBGP path selection algorithm

As an alternative, you can use the Common Spine ASN – Common Leaf ASN design where a common (shared) ASN will be assigned to the Spine nodes and another ASN to the Leaf nodes. If you want to try this option, please check the configuration guide posted here.

BGP Configuration

For the Spine configuration the default BGP distance is altered to give preference to external BGP routes (this might not be necessary for the lab, but keep it in mind when deploying this configuration in production environments). Leaf neighbors are also defined and utilize a peer-group to simplify configuration.

Note that all spine switches share a common ASN while each Leaf-pair has a different ASN, see the BGP diagram below for details.

BGP ASN Scheme

NOTE: This guide uses private AS numbers in the 64512-65535 range.

Loopback interfaces will be used as the router ID on each switch, so we are configuring a Loopback0 interface with a /32 mask on every switch.

Follow this table below to configure the loopback interfaces.

Loopback IP Address Allocation Table

To start, let's look at the Spine switch configuration.

hostname spine01
!
router bgp 65020
 router-id 10.0.1.11
 bgp log-neighbor-changes
 distance bgp 20 200 200
 maximum-paths 2 ecmp 64
 neighbor 172.16.0.2 remote-as 65021
 neighbor 172.16.0.6 remote-as 65021
 neighbor 172.16.0.10 remote-as 65022
 network 10.0.1.11/32
!
hostname spine02
!
router bgp 65020
 router-id 10.0.1.12
 bgp log-neighbor-changes
 distance bgp 20 200 200
 maximum-paths 2 ecmp 64
 neighbor 172.16.0.14 remote-as 65021
 neighbor 172.16.0.18 remote-as 65021
 neighbor 172.16.0.22 remote-as 65022
 network 10.0.1.12/32
!

This example uses static BGP peer groups. When a static peer group is created, the group name can be used to apply the configuration to all members of the group.

The Leaf switch configuration is very similar to the Spine's; a single peer group is used to peer with the Spines with a standard configuration.

hostname leaf01
!
router bgp 65021
 router-id 10.0.1.21
 bgp log-neighbor-changes
 distance bgp 20 200 200
 maximum-paths 2 ecmp 2
 neighbor ebgp-to-spine-peers peer-group
 neighbor ebgp-to-spine-peers remote-as 65020
 neighbor ebgp-to-spine-peers maximum-routes 12000
 neighbor 172.16.0.1 peer-group ebgp-to-spine-peers
 neighbor 172.16.0.13 peer-group ebgp-to-spine-peers
 neighbor 172.16.254.2 remote-as 65021
 neighbor 172.16.254.2 next-hop-self
 neighbor 172.16.254.2 maximum-routes 12000
 network 10.0.1.21/32
 redistribute connected
!
hostname leaf02
!
router bgp 65021
 router-id 10.0.1.22
 bgp log-neighbor-changes
 distance bgp 20 200 200
 maximum-paths 2 ecmp 2
 neighbor ebgp-to-spine-peers peer-group
 neighbor ebgp-to-spine-peers remote-as 65020
 neighbor ebgp-to-spine-peers maximum-routes 12000 
 neighbor 172.16.0.5 peer-group ebgp-to-spine-peers
 neighbor 172.16.0.17 peer-group ebgp-to-spine-peers
 neighbor 172.16.254.1 remote-as 65021
 neighbor 172.16.254.1 next-hop-self
 neighbor 172.16.254.1 maximum-routes 12000
 network 10.0.1.22/32
 redistribute connected
!

NOTE: The “redistribute connected” command will redistribute all the directly connected interfaces into BGP for connectivity testing purposes. In production, link addresses are not typically advertised. This is because:

  • Link addresses take up valuable FIB resources. In a large CLOS (Leaf-Spine) environment, the number of such addresses can be quite large
  • Link addresses expose an additional attack vector for intruders to use to either break in or engage in DDOS attacks

When we have an MLAG domain as part of a Layer-3 fabric, iBGP peering is recommended between the MLAG peers. The peering addresses specific failure conditions that the design must take into consideration, such as the loss of the Leaf-Spine uplinks: routes learned via iBGP come into effect if all uplinks fail.

Let's say all Leaf01 uplinks fail: with an iBGP peering between Leaf01 and Leaf02, any server traffic forwarded to Leaf01 would follow the remaining route pointing to Leaf02 and then be ECMP-routed to the Spine.

NOTE: In normal operation paths learned via eBGP (Leaf to Spine uplinks) will always be preferred over paths learned via iBGP (MLAG peers).

The neighbor next-hop-self command configures the switch to list its address as the next hop in routes that it advertises to the specified BGP-speaking neighbor or neighbors in the specified peer group. This is used in networks where BGP neighbors do not directly access all other neighbors on the same subnet.

Route Advertising

In production environments, you need to ensure that only the proper routes are advertised from Leaf switches, so a route map must be applied to the Spine BGP-peers. The route map references the prefix-list which contains the routes that are intended to be advertised to the Spine.

Although not mandatory, using a route map or a prefix list provides a level of protection in the network. Without one, random networks created at the Leaf would automatically be added to the routing table.

BGP Verification

You can verify the BGP operation by running the following commands:

  • show ip bgp summary
  • show ip route

The state for all neighbors should be ESTABLISHED.

Since the server VLANs will be encapsulated in VXLAN between VTEPs, we don’t need to advertise them into BGP so I’m going to filter networks 192.168.11.0/24, 192.168.12.0/24, 192.168.13.0/24 out of the Leaf.

# leaf01 and leaf02
ip prefix-list filter-out-to-spine seq 10 deny 192.168.11.0/24
ip prefix-list filter-out-to-spine seq 20 deny 192.168.12.0/24
ip prefix-list filter-out-to-spine seq 30 deny 192.168.13.0/24
ip prefix-list filter-out-to-spine seq 40 permit 0.0.0.0/0 le 32
!
router bgp 65021
   neighbor ebgp-to-spine-peers prefix-list filter-out-to-spine out

# leaf03
ip prefix-list filter-out-to-spine seq 10 deny 192.168.11.0/24
ip prefix-list filter-out-to-spine seq 20 deny 192.168.12.0/24
ip prefix-list filter-out-to-spine seq 30 deny 192.168.13.0/24
ip prefix-list filter-out-to-spine seq 40 permit 0.0.0.0/0 le 32
!
router bgp 65022
   neighbor ebgp-to-spine-peers prefix-list filter-out-to-spine out

Once the prefix list is applied on the Leaf switches, the output of “show ip route” on the Spine should display the loopback interfaces and point-to-point links, but no server networks should appear.

spine01#show ip route
VRF name: default
Codes: C - connected, S - static, K - kernel,
 O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
 E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
 N2 - OSPF NSSA external type2, B I - iBGP, B E - eBGP,
 R - RIP, I L1 - ISIS level 1, I L2 - ISIS level 2,
 O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
 NG - Nexthop Group Static Route, V - VXLAN Control Service

Gateway of last resort is not set

B E 10.0.1.1/32 [20/0] via 172.16.0.25, Ethernet4 
                       via 172.16.0.33, Ethernet5
 B E 10.0.1.2/32 [20/0] via 172.16.0.25, Ethernet4 
                        via 172.16.0.33, Ethernet5
 C 10.0.1.11/32 is directly connected, Loopback0 
B E 10.0.1.21/32 [20/0] via 172.16.0.2, Ethernet1
                        via 172.16.0.6, Ethernet2 
B E 10.0.1.22/32 [20/0] via 172.16.0.2, Ethernet1
                        via 172.16.0.6, Ethernet2
 B E 10.0.1.23/32 [20/0] via 172.16.0.10, Ethernet3
 B E 10.0.2.1/32 [20/0] via 172.16.0.2, Ethernet1
                        via 172.16.0.6, Ethernet2
 B E 10.0.2.2/32 [20/0] via 172.16.0.10, Ethernet3
 C 172.16.0.0/30 is directly connected, Ethernet1
 C 172.16.0.4/30 is directly connected, Ethernet2
 C 172.16.0.8/30 is directly connected, Ethernet3
 B E 172.16.0.12/30 [20/0] via 172.16.0.2, Ethernet1
                           via 172.16.0.6, Ethernet2
 B E 172.16.0.16/30 [20/0] via 172.16.0.2, Ethernet1
                           via 172.16.0.6, Ethernet2
 B E 172.16.0.20/30 [20/0] via 172.16.0.10, Ethernet3
 C 172.16.0.24/30 is directly connected, Ethernet4
 C 172.16.0.32/30 is directly connected, Ethernet5
 B E 172.16.254.0/30 [20/0] via 172.16.0.2, Ethernet1
                            via 172.16.0.6, Ethernet2

There you go! The Underlay (the L3LS fabric) is up and running. In the next post, we will configure and test VXLAN.

You can always check my github repository to download the configuration files.

Articles in the series:

Arista Layer-3 Leaf-Spine Fabric with VXLAN HER: Lab Part 2

July 19, 2017
Virtual Environment Setup
by Pablo Narváez

Welcome to the second blog post in my multi-part series describing in detail how I will deploy an L3LS Ethernet fabric with VXLAN using Arista vEOS and Ubuntu/KVM. In this post, I'm going to dive into the first component of the deployment: the virtual environment and the VMs.

Virtual Machine Spreadsheet – Inventory List

As shown in the inventory list above (spreadsheet), I'm going to use a single server with KVM to create multiple VMs. I'll go with Ubuntu Desktop for the server OS, but you can choose the Server edition (since Ubuntu 12.04, there is no kernel difference between Ubuntu Desktop and Ubuntu Server).

This guide assumes you have a Linux graphical user interface, so the Desktop version is preferred.

To download and install Ubuntu, please follow these links:

NOTE: I chose a Type 2 hypervisor (one that runs on a host OS) over a bare-metal hypervisor to have a flexible environment: for this kind of setup I prefer to have a base OS so I can use traffic monitoring tools (like Wireshark) and keep a centralized repository of software images.

In addition to that, I chose KVM over VirtualBox because of the number of NIC cards (vNICs) supported: VirtualBox only supports 8 network adapters per VM. Since I will be using this lab to deploy some other unplanned functionalities, I just didn’t want to end up with such limitations in case I need to add additional vNICs.

If you want to give VirtualBox a try, you can follow these links:

KVM INSTALLATION

The procedure described below is a summary of the official guide posted here.

Pre-Installation Checklist

To run KVM, you need a processor that supports hardware virtualization. To see if your processor supports it, you need to install cpu-checker:

$ sudo apt-get install cpu-checker

Now, you can review the output from this command:

$ kvm-ok

which may provide an output like this:

INFO: /dev/kvm exists
KVM acceleration can be used

If this is your case, you are good to go.

If you see:

INFO: Your CPU does not support KVM extensions
KVM acceleration can NOT be used

You can still run virtual machines, but it’ll be much slower without the KVM extensions.

NOTE: Running a 64 bit kernel on the host operating system is recommended but not required. On a 32-bit kernel install, you’ll be limited to 2GB RAM at maximum for a given VM.
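
If you want to double-check the CPU flags and the kernel architecture yourself, two standard commands are enough (just a convenience check, not part of the official guide):

$ egrep -c '(vmx|svm)' /proc/cpuinfo    # greater than 0 means VT-x/AMD-V is present
$ uname -m                              # x86_64 means you are running a 64-bit kernel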

Installation of KVM

You need to install a few packages first:

$ sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils virt-viewer virt-manager
  • qemu-kvm (kvm in Karmic and earlier) is the backend
  • libvirt-bin provides libvirtd, which you need to administer qemu and kvm instances using libvirt
  • ubuntu-vm-builder is a powerful command-line tool for building virtual machines
  • bridge-utils provides a bridge from your network to the virtual machines. This package is optional, but highly recommended in case you have multiple network adapters on the host and want to map some VMs to an external network. Another option is Open vSwitch as a replacement for the Linux bridge.
  • virt-viewer is a tool for viewing instances. This package is optional, but strongly recommended to display a graphical console for VMs
  • virt-manager is a GUI tool to manage virtual machines. This module is optional, but strongly recommended to simplify VM life-cycle management. If it is not installed, you will have to manage VMs with the virsh command line

After the installation, you need to log out and log back in so that your user becomes an effective member of the kvm and libvirtd user groups. Only members of these groups can run virtual machines.
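
On my Ubuntu install those groups are called kvm and libvirtd; if your user is not already a member, something like the following should take care of it (group names can vary between releases, so treat this as a sketch):

$ sudo adduser "$(whoami)" kvm
$ sudo adduser "$(whoami)" libvirtd
$ groups    # verify the membership after you log back in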

Verify the Installation

After you relogin, test if your installation has been completed successfully with the following command:

$ virsh list --all
 Id Name                 State
----------------------------------
$

If you get something like this:

$ virsh list --all
libvir: Remote error : Permission denied
error: failed to connect to the hypervisor
$

Something is wrong (e.g. you did not relogin) and you probably want to fix this before you move on.

To troubleshoot any issues during or after the installation, please check the official KVM installation guide posted here.

VM VIRTUAL NETWORKING

This is what we are going to build.

vm_network_diagram2
Virtual Machine Network Diagram

The drawing shows where each network adapter (vNIC) is, what network it’s configured for, and how the VMs are interconnected. Every connection between two adapters represents an isolated segment which must be configured as a virtual network in KVM.

To ensure that every link is isolated, we need to give each virtual network an exclusive name (I named them "net-x"), disable IP routing, and use each virtual network only once, for a single link.

The links between each VM will act like physical cables, but the management interfaces of the Ubuntu Linux servers and the Arista switches sit on a common shared network ("net-oob"). The host will also have an adapter connected to this network so we can SSH into each device through its Out-of-Band management interface (OOB).

vm_oob-management_diagram
Virtual Machine Out-of-Band Management Network Diagram

The first network adapter will always end up being the Management1 interface in each switch. To simplify things, I dedicated the first network adapter (vNIC1) for management on each VM.

CREATING VMs

We will have two different types of VMs: Ubuntu servers and Arista switches. For Linux, we are going to install the same software image that we used for the host OS. For the Arista switches, two files are needed: the virtual hard drive (vmdk) and the Aboot ISO file.

You need to register at arista.com to download the software. Once you login, go to Support > Software Download to retrieve the following files:

vEOS-lab-4.17.5M.vmdk
Aboot-veos-8.0.0.iso

NOTE: There are several folders and more than one image format, so make sure to download the correct files from the vEOS-lab folder.

To build the VMs faster, we are going to create two base VMs (golden images, one for the servers and one for the switches), then clone them multiple times.

Creating the Base VM for Servers

The easiest way to create a virtual machine in KVM is to use the Virtual Machine Manager application. You can find it in your applications dashboard.

virt-manager_dashboard

Or you can use the command-line:

$ virt-manager

virt-manager_cli

The first thing to do is create the virtual networks for the network adapters. Look at the network drawing and the spreadsheet at the beginning of this section.

In the virt-manager main window, click the edit button on the toolbar, then click on Connection Details.

virt-manager_menu

Go into the Virtual Networks tab, click the add button (“+” icon, lower left corner).

Give the network a name. This is going to be our first virtual network, so we will start with “net-1”.

virt-manager_network

We will simulate physical network connections so we don’t need to assign IP addresses for now.

NOTE: When creating the out-of-band management network (“net-oob“) you might want to enable the IP address definition so KVM adds a virtual network adapter on the host to communicate directly with the VMs (for ssh/admin purposes).

Uncheck the Enable IPv4 network address space definition option; do the same for IPv6 in the next step.

virt-manager_ipv4

We need isolated segments to interconnect the VMs, so choose the isolated virtual network option and then click the finish button to continue.

virt-manager_isolated-net

Repeat the same steps to create the rest of the networks. Don’t forget to add the management network (“net-oob”).

NOTE: Enable the IP address space when creating “net-oob” for management. In this case, I will be using 10.0.0.0/24 for the network and the host will receive the IP address 10.0.0.1/24.

When this is done, there should be a total of 18 networks (net-1 through net-17, plus net-oob).

virt-manager_virt-nets
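
If you prefer to script this instead of clicking through the GUI, virsh can define the same networks from small XML files. This is only a sketch using the names from this lab; the bridge names are arbitrary:

net-1.xml (isolated segment, no IP address space, no routing):

<network>
  <name>net-1</name>
  <bridge name='virbr-net1' stp='on' delay='0'/>
</network>

net-oob.xml (management network, the host gets 10.0.0.1/24):

<network>
  <name>net-oob</name>
  <bridge name='virbr-oob' stp='on' delay='0'/>
  <ip address='10.0.0.1' netmask='255.255.255.0'/>
</network>

$ virsh net-define net-1.xml
$ virsh net-start net-1
$ virsh net-autostart net-1

Repeat the three virsh commands for every network you define.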

Now we need to create the actual VMs. Go back into the virt-manager main screen and click the Create New Virtual Machine icon on the toolbar to start the installation.

First, set the virtual machine's name ("server01") and choose the installation method: select Local install media (ISO image or CDROM).

Next, we need to find and select the Linux image (Ubuntu 16.04.2 ISO file). Make sure to check the Automatically detect the operating system option.

virt-manager_media

Now choose how much memory and how many CPUs to allocate to the VM: 2040MB of memory and 1 CPU.

Remember: To allocate more than 2GB of memory to a virtual machine, you need to have a 64-bit CPU and a 64-bit Linux kernel.

Check the Enable storage for this virtual machine option and allocate the disk space for the VM. In my case, I will leave the default size (20 GB).

By default, KVM configures NAT for the network adapters. We need to configure the network adapters on each VM as shown in the spreadsheet.

To do so, before clicking on the Finish button, make sure to check the Customize configuration before install option to edit the VM settings.

NOTE: You can always customize the VM configuration after the installation.

virt-manager_customize

Now we need to configure all of the internal networks within the VMs; I'll show some examples.

From the left-hand menu, click on the only NIC adapter available and open the Network Source drop-down menu. You will see all the virtual networks we created in the previous steps.

virt-manager_net-menu

From the drop-down menu, choose Virtual Network "net-oob" to assign the management network to the adapter, and configure "e1000" for Device model.

Remember: The first NIC on all VMs will always be the management interface.

virt-manager_nic1

Then, click on the Add Hardware button (lower left corner) and add two NIC adapters for “net-1” and “net-2” respectively. Don’t forget to choose “e1000” for the Device model option.

virt-manager_nic2

You can now click on the Begin Installation button in the upper-left corner to start the OS installation.

virt-manager_os-inst

Virt-manager will boot the guest operating system; you may now proceed to install the OS.
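
For reference, the whole server01 build can also be done from the shell with virt-install. The following is only a rough equivalent of the GUI steps above; the ISO path is a placeholder you would adjust to your own setup:

# server01: 2040MB RAM, 1 vCPU, 20GB disk, three e1000 NICs on net-oob, net-1 and net-2
$ virt-install \
    --name server01 \
    --ram 2040 \
    --vcpus 1 \
    --cdrom /var/lib/libvirt/images/ubuntu-16.04.2-desktop-amd64.iso \
    --disk size=20 \
    --network network=net-oob,model=e1000 \
    --network network=net-1,model=e1000 \
    --network network=net-2,model=e1000 \
    --graphics vnc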

Cloning the Base VM (Ubuntu Servers)  

Now we need to clone the base VM to build the rest of the servers. The main virt-manager window will show server01; right-click on it and click Clone.

NOTE: You need to power off the VM to clone it.

virt-manager_clone

In the clone window, change the server name ("server02", in this case), leave the default settings for Networking, and make sure to choose the Clone this disk option for the disk storage.

virt-manager_clone-conf

Click on the Clone button to finish. Repeat the same steps to create the rest of the servers.
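
The same cloning can be done from the shell with virt-clone, which copies the disk and generates new MAC addresses for the NICs (a sketch of the server02 clone):

$ virt-clone --original server01 --name server02 --auto-clone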

Finally, we need to configure the network settings for each adapter on every server (remember, every cloned VM will have server01 settings so we need to change that).

I will show you one example: In the virt-manager main window, right click on server02 and click Open.

virt-manager_clone-menu

Within the configuration menu, click on the second NIC adapter and choose "net-3" from the Network Source drop-down menu; then, assign "net-4" to the third NIC.

virt-manager_clone-nics

Configure the remaining network adapters on each VM as shown in the spreadsheet.
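
If you would rather do this from the shell, virsh edit opens the clone's XML definition; the piece to adjust is the source network of each additional interface. The stanzas inside the editor look roughly like this (shown for the server02 example above):

$ virsh edit server02

    <interface type='network'>
      <source network='net-3'/>      <!-- was net-1 on the original server01 -->
      <model type='e1000'/>
    </interface>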

Creating the Fabric Switches

Not quite done yet! We need to build the VMs that will run Arista vEOS. The process is quite different and we will have to tweak some settings to make vEOS boot, so stick with me.

In the virt-manager main window, click on the Create a new virtual machine icon and select the last option: Import existing disk image.

Browse and locate the vEOS-lab-4.17.5M.vmdk file on your server, and leave the default settings for OS Type and Version.

virt-manager_veos-os

Next, allocate 2048MB of memory and 1 CPU.

NOTE: With the latest vEOS release it is now required to allocate at least 2GB of memory.

Before clicking on the Finish button, name the VM (“spine01”) and make sure to check the Customize configuration before install option.

It's time to tweak some settings before installing the OS. First, take a look at the screen below; this is what you should have by now.

virt-manager_veos-conf

While in this window, modify the following:

  1. Remove the IDE Disk 1 – I know, I know, it’s the disk we just created a few steps back with the vmdk file, but it’s critical to build the disks from scratch
  2. Remove the sound controller (Sound: ich 6 in my case)
  3. Change the video settings from QXL to VGA
  4. Change the NIC configuration – Choose the “net-oob” virtual network for management and configure “e1000” for the Device model option.
  5. Add three additional NICs for "net-11", "net-13" and "net-15" respectively, choosing "e1000" for Device model.
  6. Add two disk storage devices, one IDE disk with the vmdk file and one IDE CD device with the Aboot.iso file (see below).

virt-manager_veos-disks

Arista vEOS is very particular about how its storage is configured. Both drives need to be IDE and the Aboot.iso (boot-loader) needs to be installed as a CD. If a SCSI controller gets created, it must be deleted or vEOS will not load.
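
For completeness, spine01 can also be sketched with virt-install. The flags below mirror the tweaks above (IDE disks, Aboot as an IDE CD, boot from CD first, VGA video, e1000 NICs); the paths are placeholders, and the vmdk should be a per-VM copy of the vEOS image:

$ virt-install \
    --name spine01 \
    --ram 2048 \
    --vcpus 1 \
    --disk path=/var/lib/libvirt/images/spine01.vmdk,format=vmdk,bus=ide \
    --disk path=/var/lib/libvirt/images/Aboot-veos-8.0.0.iso,device=cdrom,bus=ide \
    --boot cdrom,hd \
    --network network=net-oob,model=e1000 \
    --network network=net-11,model=e1000 \
    --network network=net-13,model=e1000 \
    --network network=net-15,model=e1000 \
    --video vga \
    --graphics vnc \
    --noautoconsole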

virt-manager_veos-cdrom

Next, we need to make the VM boot from the CD to load the Aboot.iso file. Change the Boot Options to boot from the IDE CDROM 1.

virt-manager_veos-bootseq

Click Apply to close the window. Go ahead and click on the Begin Installation button; you will see the boot-loader run.

virt-manager_veos-bootload

If you have ever installed vEOS on other hypervisors, you will notice that it takes noticeably longer to boot in KVM with the same resources allocated. Also, be aware that you will not see the boot sequence; be patient and wait for the command line to appear!

virt-manager_veos-cli

Done! The base VM is ready. Clone the VM so you can create spine02, leaf01, leaf02 and leaf03.

Don’t forget to customize the network configuration for each VM, configure the network adapters as shown in the spreadsheet.

VERIFY THE INSTALLATION

You should see all the VMs in the virt-manager main window. Start all the VMs and wait until they are operational.
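
If you prefer the shell, a quick loop over virsh starts every defined VM (the names are taken straight from libvirt, so there is nothing to type out by hand):

$ for vm in $(virsh list --all --name); do virsh start "$vm"; done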

virt-manager_all-vms

You can also test if your installation has been completed successfully with the following command:

$ virsh list --all

All VMs must be in the running state and you should be able to access the user interface on each VM.
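
You can also print a quick per-VM state summary from the shell:

$ for vm in $(virsh list --all --name); do echo "$vm: $(virsh domstate "$vm")"; done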

virsh_all-vms

When this is done, the lab should boot and every device should be connected according to the original network diagram.

In the next post, we will configure the L2/L3 protocols for the fabric, stay tuned!

You can always check my github repository to download the configuration files.

Articles in the series:

 

Arista Layer-3 Leaf-Spine Fabric with VXLAN HER: Lab Part 1

July 11, 2017
Lab, Introduction
by Pablo Narváez

This is the first post in a series where I'll go deep on how VXLAN is deployed on Arista switches and how it operates in a Layer 3 Ethernet fabric.

network_diagram-1
Two-tier Layer-3 VXLAN Ethernet Fabric

For this lab, I will create a self-contained virtual environment with Ubuntu Linux/KVM and Arista virtual EOS (vEOS), so physical appliances will not be used at this time. Please note, the IP Storage, Services and Border Leafs will not be deployed yet; once we are done with VXLAN, I will add new features and functionalities including the extra Leafs.

Hardware and versions to be used in my lab:

• 2x HPE DL-360 ProLiant Gen8 server
– 2x 64-bit 8-core Intel Xeon processors (E5-2650)
– 128GB RAM
– 4x 300GB SAS drives (sda, RAID 0)
– 2x 1TB SAS drives (sdb, RAID 0)
– 1x iLO dedicated GE network port
– 4x embedded GE network ports
– 1x 10GbE dual-port network module
• Ubuntu 16.04.2 LTS with KVM
• VMware ESXi 6.5
• Arista vEOS 4.17.5M

I will provide detailed instructions for what to do (and configure) on every device. Equipment and operating systems versions may change along the way, so I will make appropriate notes wherever needed.

As stated before, I am going to follow this up with a series of articles focusing on the different infrastructure layers; as those articles are released, the links will be updated here:

  • Lab Part 1: Introduction (this article)
  • Lab Part 2: Virtual environment setup (next up)
  • Lab Part 3: Production servers deployment
  • Lab Part 4: Configuring the Ethernet fabric
  • Lab Part 5: Running and testing VXLAN


You can always check my github repository to download the configuration files.