Network Automation 101

Enginyeria La Salle, May 18

Christian Adell @chadell0

Agenda

Time Topic
Day 1
15:30 - 16:00 Network Programmability & Automation
16:00 - 16:30 Ansible 101
16:30 - 17:00 Exercise 1
Day 2
15:30 - 16:00 Ansible 102
16:00 - 16:45 Exercise 2
16:45 - 17:00 StackStorm 101

Day 1

Network Programmability and Automation

What is Software Defined Networking?

RFC 1925, The Twelve Networking Truths

(6)  It is easier to move a problem around (for example, by moving
    the problem to a different part of the overall network
    architecture) than it is to solve it.

    (6a) (corollary). It is always possible to add another level of
            indirection.

(11) Every old idea will be proposed again with a different name and
    a different presentation, regardless of whether it works.

    (11a) (corollary). See rule 6a.

Openflow

  • It began as the PhD work of Martin Casado at Stanford University, supervised by Nick McKeown
  • OpenFlow can be considered the trigger of SDN: Casado wanted to program the network the way he was programming computers
    • for almost 20 years, network operations had not evolved like other IT silos
  • but, in the end, it's just a protocol that decouples a network device's control plane from its data plane
    • and it was not the first to attempt this, e.g. ForCES, RCP, PCE.
    • For more information, see the paper "The Road to SDN: An Intellectual History of Programmable Networks"

Network Functions Virtualization

  • It consists of taking functions that have traditionally been deployed as hardware and deploying them as software instead
  • It enables breaking down a monolithic piece of HW into N pieces of software
  • It offers a better way to scale out and minimize failure domains using a pay-as-you-grow model
  • But...
    • it requires rethinking how the network is architected (traffic no longer needs to pass through a specific device) and giving up the current single pane of management (CLI or GUI)
    • Also, some vendors are not actively selling their virtual appliances...
  • Agility is one of its major values: decreasing the time to provision new services and adopting a DevOps culture

Virtual switching

  • These are just software-based switches at the hypervisor level, providing local connectivity between VMs and containers
  • They are the new edge layer of the network, taking over from the Top of Rack (ToR) switches
  • Examples:
    • VMware standard switch (VSS)
    • VMware distributed switch (VDS)
    • Cisco Nexus 1000v
    • Cisco Application Virtual Switch (AVS)
    • Open vSwitch (OVS)

Network virtualization

  • It provides an overlay network (Layer 2) using protocols such as VXLAN or EVPN
  • So, the result is a virtual network decoupled from the physical network
  • Usually, these solutions offer extra services controlled from a single point of management
  • Examples:
    • VMware NSX
    • Nuage Virtual Service Platform (VSP)
    • Juniper Contrail

Device APIs

  • CLI, repeat with me, CLI, CLI...
  • CLIs, and GUIs, are not well suited for automation because they don't offer structured data
  • An API (Application Programming Interface) offers a clean interface for operating network devices
  • Examples:
    • RESTful APIs
    • NETCONF
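The point about structured data can be shown in a few lines of Python. This is a minimal sketch: the JSON payload below is hypothetical (the exact shape varies per vendor and API), but it illustrates why key access beats screen-scraping CLI output.

```python
import json

# Hypothetical JSON payload, shaped like what a RESTful device API might
# return; the exact structure varies per vendor.
response_body = """
{
  "interfaces": [
    {"name": "GigabitEthernet0/1", "status": "up", "mtu": 1500},
    {"name": "GigabitEthernet0/2", "status": "down", "mtu": 9000}
  ]
}
"""

data = json.loads(response_body)

# Structured data means plain key access, no regex screen-scraping.
up_interfaces = [i["name"] for i in data["interfaces"] if i["status"] == "up"]
print(up_interfaces)  # ['GigabitEthernet0/1']
```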

Network Automation

---

- hosts: all
  connection: local
  gather_facts: no

  tasks:
    - name: configure the login banner
      net_banner:
        banner: "{{ 'motd' if ansible_network_os == 'nxos' else 'login' }}"
        text: "{{ network_banner }}"
        state: present
      when: network_banner is defined

    - name: configure the MOTD
      net_banner:
        banner: motd
        text: "{{ network_motd }}"
        state: present
      when: network_motd is defined and ansible_network_os != 'nxos'

  • APIs facilitate Network Automation
  • They make it extremely easy to retrieve (structured) network information and deploy configuration at scale
  • They enable a more predictable and uniform network
  • Some examples:
    • Custom Python scripts
    • Ansible
    • Salt
    • Stackstorm

Bare-metal switching

  • Network devices were always bought as a bundle: hardware appliance, operating system and applications, all from the same vendor
  • Now, bare-metal switching is about disaggregation: being able to purchase each piece from a different vendor
  • Examples of hardware boxes:
    • HP
    • Dell
    • Edgecore
  • Examples of network operating systems:
    • Cumulus Networks
    • Big Switch
    • FBOSS

Data center network fabrics

  • Network architectures have evolved into standardized building blocks
  • We changed from managing individual boxes to managing a system that offers a single interface
  • These solutions offer distributed gateways, multi-pathing and some form of "logic"
  • Examples:
    • Cisco Application Centric Infrastructure (ACI)
    • Big Switch Big Cloud Fabric (BCF)

SD-WAN

  • It democratises WAN connections, making it possible to create private WAN services over multiple technologies and providers
  • It's conceptually similar to overlay networking, making provisioning quicker and more agile
  • Examples:
    • Viptela, now Cisco
    • CloudGenix
    • VeloCloud

Controller networking

  • Some of the previous solutions rely on a central point of control that orchestrates everything
  • These platforms offer:
    • Network Virtualization
    • Monitoring
    • ... or any other function related to the applications running on top
  • Example:
    • OpenDaylight

Network Automation

Why

  • Simplified Architectures
  • Deterministic Outcomes
  • Business Agility

Some examples

  • Device Provisioning
  • Data Collection
  • Migrations
  • Configuration Management
  • Compliance
  • Reporting
  • Troubleshooting

Data Formats, Data Models and Config Templates

XML

<rpc-reply xmlns:junos="http://xml.juniper.net/junos/13.3R5/junos">
    <software-information>
        <host-name>M320-TEST-re0</host-name>
        <product-model>m320</product-model>
        <product-name>m320</product-name>
        <junos-version>13.3R5.9</junos-version>
    </software-information>
    <cli>
        <banner>{master}</banner>
    </cli>
</rpc-reply>

YAML

---
parameter_defaults: 
  ControlPlaneDefaultRoute: "192.0.2.1"
  ControlPlaneSubnetCidr: 24
  DnsServers:
    - "192.168.23.1"
  EC2MetadataIp: "192.0.2.1" 
  ExternalAllocationPools:
    - end: "10.0.0.250"
      start: "10.0.0.4"
  ExternalNetCidr: "10.0.0.1/24"
  NeutronExternalNetworkBridge: ""

JSON

{
  "parameter_defaults": {
    "ControlPlaneDefaultRoute": "192.0.2.1", 
    "ControlPlaneSubnetCidr": "24", 
    "DnsServers": [
        "192.168.23.1"
    ], 
    "EC2MetadataIp": "192.0.2.1", 
    "ExternalAllocationPools": [
        {
            "end": "10.0.0.250", 
            "start": "10.0.0.4"
        }
    ], 
    "ExternalNetCidr": "10.0.0.1/24", 
    "NeutronExternalNetworkBridge": ""
  }
}
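The YAML and JSON documents above encode exactly the same data; once parsed, both yield the same nested structure. A short sketch using only the stdlib json module (parsing the YAML version would additionally need PyYAML):

```python
import json

# The JSON document above (abbreviated); parsing it yields the same nested
# structure the equivalent YAML would.
json_doc = """
{
  "parameter_defaults": {
    "ControlPlaneDefaultRoute": "192.0.2.1",
    "DnsServers": ["192.168.23.1"],
    "ExternalAllocationPools": [{"end": "10.0.0.250", "start": "10.0.0.4"}]
  }
}
"""

params = json.loads(json_doc)["parameter_defaults"]

# Nested access works the same regardless of the original serialization.
print(params["DnsServers"][0])                       # 192.168.23.1
print(params["ExternalAllocationPools"][0]["end"])   # 10.0.0.250
```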

YANG

 module configuration {
  namespace "http://xml.juniper.net/xnm/1.1/xnm";
  prefix junos;
  organization
    "Juniper Networks, Inc.";
  revision "2015-09-11" {
    description "Initial revision";
  }
  typedef ipv4addr {
    type string;
  }
  grouping juniper-config {
    container backup-router {
      description "IPv4 router to use while booting";
      leaf address {
        description "Address of router to use while booting";
        type ipv4addr;
        mandatory true;
      }
    ...

https://raw.githubusercontent.com/Juniper/yang/master/14.2/configuration.yang

JINJA

{% for key, value in vlanDict.items() -%}
vlan {{ key }}
    name {{ value }}
{% endfor %}
>>> vlanDict = {
>>>     123: 'TEST-VLAN-123',
>>>     234: 'TEST-VLAN-234',
>>>     345: 'TEST-VLAN-345'}
>>> from jinja2 import Environment, FileSystemLoader
>>> env = Environment(loader=FileSystemLoader('./Templates/'))
>>> template = env.get_template('ourtemplate')
>>> print(template.render(vlanDict=vlanDict))

vlan 123
    name TEST-VLAN-123
vlan 234
    name TEST-VLAN-234
vlan 345
    name TEST-VLAN-345

Ansible 101

Review of automation tools

  • Ansible
  • Chef
  • Puppet
  • Salt
  • StackStorm

Understanding how Ansible works

  • Automating servers
    • Distributed execution
    • Copies Python code to every device via SSH and runs it there
  • Automating network devices
    • Centralised execution
    • Runs Python code locally and reaches the network devices via SNMP, SSH or APIs

Basic files and defaults

  • Inventory file: contains the devices (IP or FQDN) that will be automated (and their associated variables)
    • /etc/ansible/hosts
    • ANSIBLE_INVENTORY
    • -i, --inventory-file
  • Variable files:
    • Group variables
      • group_vars/{name of group}.yml or
      • group_vars/{name of group}/{variables}.yml
    • Host variables
      • host_vars/{name of host}.yml or
      • host_vars/{name of host}/{variables}.yml

Inventory file

[barcelona-dc]
switch01
switch02

[madrid-dc]
172.31.200.1
switch03

[barcelona-cpe]
vmx1

[madrid-cpe]
172.22.3.1

[barcelona:children]
barcelona-dc
barcelona-cpe

Assigning variables

[all:vars]
ntp_server=10.20.30.4

[barcelona:vars]
ntp_server=192.168.0.1

[madrid:vars]
ntp_server=10.0.0.1

[barcelona-dc]
switch01 ntp_server=192.168.0.3
switch02

Variables' file

File: group_vars/barcelona-dc.yml

---
snmp:
    contact: Ausias March
    location: Barcelona Data Center, Passeig Colon
    communities:
        - community: public
          type: ro
        - community: private
          type: rw

How will we access this data?: snmp.communities[0].type
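The YAML variables map directly onto nested Python data, so the dotted access above is just dict/list indexing underneath. A small sketch:

```python
# The group_vars YAML above, expressed as the Python structure Ansible
# builds from it.
snmp = {
    "contact": "Ausias March",
    "location": "Barcelona Data Center, Passeig Colon",
    "communities": [
        {"community": "public", "type": "ro"},
        {"community": "private", "type": "rw"},
    ],
}

# snmp.communities[0].type in a playbook resolves the same way as:
print(snmp["communities"][0]["type"])  # ro
```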

Executing an Ansible Playbook

A playbook is the file that contains your automation instructions

---
    - name: PLAY 1 - Configure Interface Speed
      hosts: barcelona-dc
      connection: local
      gather_facts: no

      tasks:

        - name: TASK1 - Get interface information
          ios_command:
            commands:
                - show run | include interfaces
            provider:
                username: myusername
                password: mypassword
                host: "{{ inventory_hostname }}"

Exercise 1

Goal

Experiment with basic Ansible automation, learning by doing

All you need is here: https://github.com/chadell/ansible-cumulus-vyos

Scenario

TODO

From the mgmt server:

  1. Run the example playbook (exercise1.yml) against the targeted hosts (inventory.cfg)
    • Analyse output and what the playbook is doing
  2. Update the hostname of all the devices to match the fqdn: router01, router02, switch, server, oob-switch
    • Make a PR to the GitHub repository with your playbook (remember to identify yourself)
  3. Contribute to improving this workshop by fixing errors and typos or proposing improvements via PRs

Day 2

Agenda

Time Topic
Day 1
15:30 - 16:00 Network Programmability & Automation
16:00 - 16:30 Ansible 101
16:30 - 17:00 Exercise 1
Day 2
15:30 - 16:00 Ansible 102
16:00 - 16:45 Exercise 2
16:45 - 17:00 StackStorm 101

Ansible 102

Core modules

  • command: used to send exec-level commands
    • ios_command, vyos_command, junos_command, and so on
  • config: used to send configuration commands
    • ios_config, vyos_config, junos_config, and so on
  • facts: used to gather information from network devices
    • ios_facts, vyos_facts, junos_facts, and so on

Note: to find out the parameters of each module (plus some examples), you can use the ansible-doc utility: ansible-doc ios_config

Creating and using configuration templates

  1. Creating variable files
  2. Creating Jinja templates
  3. Generating network configuration files

Creating variable files (1)

group_vars/barcelona-dc.yml

---
snmp:
    contact: Ausias March
    location: Barcelona Data Center, Passeig Colon
    communities:
        - community: public
          type: ro
        - community: privat
          type: rw

Creating variable files (2)

group_vars/madrid-dc.yml

---
snmp:
    contact: Francisco de Quevedo
    location: Madrid Data Center, Paseo de la Castellana
    communities:
        - community: publico
          type: ro
        - community: privado
          type: rw

Creating variable files (3)

group_vars/all.yml

---
base_provider:
    username: vagrant
    password: vagrant
    host: "{{ inventory_hostname}}"

Creating Jinja templates (1)

templates/snmp/ios.j2

snmp-server location {{ snmp.location }}
snmp-server contact {{ snmp.contact }}
{% for community in snmp.communities %}
snmp-server community {{ community.community }} {{ community.type | upper }}
{% endfor %}

Creating Jinja templates (2)

templates/snmp/junos.j2

set snmp location {{ snmp.location }}
set snmp contact {{ snmp.contact }}
{% for community in snmp.communities %}
{% if community.type | lower == "rw" %}
set snmp community {{ community.community }} authorization read-write
{% elif community.type | lower == "ro" %}
set snmp community {{ community.community }} authorization read-only
{% endif %}
{% endfor %}

Generating network configuration files (1)

We will use the template module. It uses the src parameter to point to the template to render, and the dest parameter to point to the location where the rendered configuration will be stored (it assumes the folders already exist)

# play definition omitted
tasks:
  - name: GENERATE CONFIGS FOR EACH OS
    template:
      src: "./snmp/{{ os }}.j2"
      dest: "./configs/snmp/{{ inventory_hostname }}.cfg"

and run it!

$ ansible-playbook -i inventory.cfg snmp.yml
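What the template task does per host can be sketched in pure Python, without a template engine: pick the rendering logic for the host's OS, fill in its variables, and write the result to one file per hostname. Hostnames, variables and the output directory here are illustrative.

```python
import os
import tempfile

# Illustrative inventory hosts and group variables.
hosts = ["eos-spine1", "eos-spine2"]
snmp = {
    "location": "Barcelona Data Center",
    "contact": "Ausias March",
    "communities": [{"community": "public", "type": "ro"}],
}

def render_ios_snmp(snmp):
    # Mimics templates/snmp/ios.j2 without Jinja.
    lines = [
        f"snmp-server location {snmp['location']}",
        f"snmp-server contact {snmp['contact']}",
    ]
    for c in snmp["communities"]:
        lines.append(f"snmp-server community {c['community']} {c['type'].upper()}")
    return "\n".join(lines) + "\n"

outdir = tempfile.mkdtemp()  # stand-in for ./configs/snmp/
for hostname in hosts:
    # The real task picks the template per host via the os variable.
    with open(os.path.join(outdir, f"{hostname}.cfg"), "w") as f:
        f.write(render_ios_snmp(snmp))
```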

Generating network configuration files (2)

What are os and inventory_hostname?

  • os is a variable, so for each inventory element the task looks up the value of its os variable. It could be defined in specific files (as pointed out before), or in the inventory file with:
[eos]
eos-spine1
eos-spine2

[eos:vars]
os=eos
  • inventory_hostname is just the name of the network device from the inventory file

Ensuring a configuration exists

  1. Idempotency: make the change only when it's needed, so if you run the playbook twice without changes, it will only have an effect the first time
  2. Using the config module
  3. Understanding check mode, verbosity and limit

Using the config module (1)

Let's use the eos_config module to deploy the SNMP configuration from the previous example

  - name: PLAY 2 - ENSURE EOS SNMP CONFIGS ARE DEPLOYED
    hosts: eos
    connection: local
    gather_facts: no

    tasks:
      - name: DEPLOY CONFIGS FOR EOS
        eos_config:
          src: "./configs/snmp/{{ inventory_hostname }}.cfg"
          provider: "{{ base_provider }}"

Using the config module (2)

Notes:

  1. This could be the second task of the previous example (we are using the output file as src)
  2. We are running against a subset of hosts (eos), and using their specific module (eos_config)
  3. We are using as provider (access credentials) an object defined as a variable for all the devices, such as:
base_provider:
  username: vagrant
  password: vagrant
  host: "{{ inventory_hostname }}"

Other options/parameters for config module

  • commands: instead of using src (a file), we can embed a list of commands to be executed on the network device
  • parents: needed when we are working with nested configuration (for instance, interface mode), to reference these dependencies
  • other specific parameters, check them using ansible-doc

Understanding check mode, verbosity and limit

  • Check mode is the ability to run playbooks in "dry run" mode, knowing whether changes would occur without making them. Use it by adding the --check flag when executing the playbook
  • Verbosity: every module returns JSON data with metadata about the command and the response from the device. Use it by adding the -v flag when running the playbook
  • Limit: usually you define the hosts to run the playbook against in the hosts parameter, but you can be more specific by using the --limit option with a list of groups from the inventory

Gathering and viewing network data

Even though Ansible is most often used to deploy configurations, it also makes it possible to automate the collection of data from network devices.

In this part we will analyse two key methods for gathering data:

  • core facts modules
  • arbitrary show commands with the command module

Using the core facts modules

The core facts modules return the following data as JSON (so it can be used in the playbook!):

Fact Result
ansible_net_version The operating system version running on the remote device
ansible_net_hostname The configured hostname of the device
ansible_net_config The current active config from the device
ansible_net_interfaces A hash of all interfaces running on the system

Get facts from network devices

Even though by default gather_facts provides all this information, for network devices that don't allow remote Python code execution (non-Linux-based NOSs) we have to use specific facts modules (e.g. ios_facts):

---
  - name: PLAY 1 - COLLECT FACTS FOR IOS
    hosts: iosxe
    connection: local
    gather_facts: no

    tasks:
      - name: COLLECT FACTS FOR IOS
        ios_facts:
          provider: "{{ base_provider }}"

Using the debug module

In order to view the facts returned by the module, you can run the playbook in verbose mode, or simply use the debug module with the var parameter while referencing a valid facts key:

# play definition omitted
  tasks:
    - name: COLLECT FACTS FROM IOS
      ios_facts:
        provider: "{{ base_provider }}"
    
    - name: DEBUG OS VERSION
      debug:
        var: ansible_net_version
    
    - name: DEBUG HOSTNAME
      debug:
        var: ansible_net_hostname

Using data from responses

To get the return data (JSON) from a module you can use verbose mode, but there is also another way: the register task attribute, which allows you to save the JSON response data in a variable

  - name: ISSUE SHOW COMMAND
    ios_command:
      commands:
        - show run | include snmp-server community
      provider: "{{ base_provider }}"
    register: snmp_data

The value associated with register is the variable you want to save the data in

Access returned data

Since the snmp_data variable is now created (or registered), the debug module can be used to view the data. After viewing it, you need to understand the data structure in order to use it; you can also get it from the ansible-doc help.

  - name: DEBUG COMMAND STRING RESPONSE WITH JINJA SHORTHAND SYNTAX
    debug:
      var: snmp_data.stdout.0

  - name: DEBUG COMMAND STRING RESPONSE WITH STANDARD PYTHON SYNTAX
    debug:
      var: snmp_data['stdout'][0]

or just use it with templates as {{ snmp_data['stdout'][0] }}
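Both access syntaxes resolve to the same nested lookup. A sketch of the shape a registered command result has (abbreviated; real results carry more metadata):

```python
# Abbreviated sketch of a registered ios_command result.
snmp_data = {
    "changed": False,
    "stdout": ["snmp-server community public RO"],
    "stdout_lines": [["snmp-server community public RO"]],
}

# snmp_data.stdout.0 (Jinja shorthand) and snmp_data['stdout'][0]
# (standard Python syntax) both resolve to the first command's output:
print(snmp_data["stdout"][0])
```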

Compliance checks

  • set_fact: a module that creates a variable out of some other, more complex set of data
  • assert: a module to ensure that a given condition is true
  tasks:
    - name: RETRIEVE VLANS JSON RESPONSE
      eos_command:
        commands:
          - show vlan brief | json
        provider: "{{ base_provider }}"
      register: vlan_data

    - name: CREATE EXISTING_VLANS FACT TO SIMPLIFY ACCESSING VLANS
      set_fact:
        existing_vlans_ids: "{{ vlan_data.stdout.0.vlans.keys() }}"

    - name: PERFORM COMPLIANCE CHECKS
      assert:
        that:
          - "'20' in existing_vlans_ids"

Generating reports

  • assemble: a module that concatenates all the individual reports into a single master report
  • delimiter: useful to separate the partial outputs
- name: PLAY CREATE REPORTS
  hosts: "iosxe,eos,nxos"
  connection: local
  gather_facts: no

  tasks:
    - name: GENERATE DEVICE SPECIFIC REPORTS
      template:
        src: ./reports/facts.j2
        dest: ./reports/facts/{{ inventory_hostname }}.md

    - name: CREATE MASTER REPORT
      assemble:
        src: ./reports/facts/
        dest: ./reports/master-report.md
        delimiter: "---"
      run_once: true
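What the assemble task does can be sketched in pure Python: concatenate the per-device report files into one master report, separating the fragments with the delimiter. Filenames and contents here are illustrative.

```python
import os
import tempfile

reports_dir = tempfile.mkdtemp()  # stand-in for ./reports/facts/
for hostname, body in [("csr1", "# csr1\nIOS-XE 16.6\n"),
                       ("eos-spine1", "# eos-spine1\nEOS 4.20\n")]:
    with open(os.path.join(reports_dir, f"{hostname}.md"), "w") as f:
        f.write(body)

# Read every fragment in sorted filename order...
fragments = []
for name in sorted(os.listdir(reports_dir)):
    with open(os.path.join(reports_dir, name)) as f:
        fragments.append(f.read())

# ...and join them with the delimiter into a single master report.
master = "---\n".join(fragments)  # delimiter: "---"
print(master)
```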

Roles

Roles are ways of automatically loading certain vars_files, tasks, and handlers based on a known file structure. Grouping content by roles also allows easy sharing of roles with other users.

roles/
   common/
     tasks/
     templates/
     vars/
     defaults/
     meta/
   webservers/
     tasks/
     defaults/
---
- hosts: webservers
  roles:
     - common
     - webservers

Using 3rd-party Ansible modules

All of the examples we've reviewed in this chapter have used Ansible core modules (included in the Ansible installation). However, there is an active community of 3rd-party modules.

NAPALM

Network Automation and Programmability Abstraction Layer with Multi-vendor support: an open source community developing multi-vendor network automation integrations

NAPALM modules

  • Declarative configuration management (napalm_install_config): NAPALM focuses on the desired-state configuration. You deploy the new configuration (not commands) and NAPALM abstracts away how this works per vendor, so you don't have to micromanage device configurations.

  • Obtaining configuration and operational state from devices: the napalm_get_facts module is used to obtain a base set of facts and other information (such as route entries, the MAC table, etc.). The big benefit is that the data is pre-parsed and normalised across all supported vendors.

Bring your own module

Installing 3rd-party modules is quite straightforward:

  1. Choose a path on your Linux system where you want to store all your 3rd-party modules
  2. Navigate to that path and perform a git clone of each repository that you want to use
  3. Open your Ansible config file (/etc/ansible/ansible.cfg) and update your module path with the chosen directory
    • You can locate this file by running ansible --version
  4. Install any dependencies the modules have (these are usually documented on the project's GitHub)

Exercise 2

Goal

Extend Ansible features to provision a network

All you need is here: https://github.com/chadell/ansible-cumulus-vyos

Scenario

TODO

  1. Create a network design:
    • A VLAN between router01 and router02, and configure iBGP (create all the necessary config)
    • A VLAN to interconnect router01, router02 and the server
    • Everything using templating
    • A validation that the design is deployed
    • A report with all the configs applied to router01, router02 and switch
  2. Contribute to improving this workshop by fixing errors and typos or proposing improvements via PRs

StackStorm: Event-Driven Automation

Use-cases

Facilitated Troubleshooting

  • Triggering on system failures captured by Nagios, Sensu, New Relic and other monitoring systems
  • Running a series of diagnostic checks on physical nodes, OpenStack or Amazon instances, and application components
  • Posting results to a shared communication context, like HipChat or JIRA

Automated remediation

  • Identifying and verifying hardware failure on OpenStack compute node
  • Properly evacuating instances
  • Emailing admins about potential downtime, but if anything goes wrong...
  • Freezing the workflow and calling PagerDuty to wake up a human

Continuous deployment

  • Build and test with Jenkins, provision a new AWS cluster
  • Turn on some traffic with the load balancer
  • Roll-forward or roll-back, based on NewRelic app performance data.

Concepts (I)

  • Sensors are Python plugins for either inbound or outbound integration that receive or watch for events, respectively. When an event from an external system occurs and is processed by a sensor, a StackStorm trigger is emitted into the system.
  • Triggers are StackStorm representations of external events. There are generic triggers (e.g. timers, webhooks) and integration triggers (e.g. Sensu alert, JIRA issue updated). A new trigger type can be defined by writing a sensor plugin.

Concepts (II)

  • Actions are StackStorm outbound integrations. There are generic actions (ssh, REST call), integrations (OpenStack, Docker, Puppet), or custom actions. Actions are either Python plugins, or any scripts, consumed into StackStorm by adding a few lines of metadata. Actions can be invoked directly by user via CLI or API, or used and called as part of rules and workflows.

Concepts (III)

  • Rules map triggers to actions (or to workflows), applying matching criteria and mapping trigger payload to action inputs. Workflows stitch actions together into “uber-actions”, defining the order, transition conditions, and passing the data. Most automations are more than one-step and thus need more than one action. Workflows, just like “atomic” actions, are available in the Action library, and can be invoked manually or triggered by rules.

Concepts (IV)

  • Packs are the units of content deployment. They simplify the management and sharing of StackStorm pluggable content by grouping integrations (triggers and actions) and automations (rules and workflows). A growing number of packs are available on StackStorm Exchange. Users can create their own packs, share them on Github, or submit to the StackStorm Exchange.
  • Audit trail of action executions, manual or automated, is recorded and stored with full details of triggering context and execution results. It is also captured in audit logs for integrating with external logging and analytical tools: LogStash, Splunk, statsd, syslog.

Wrap-up