Implement Event-Driven Ansible

With all the pieces in place and all the information gathered, we can now start to realize the vision of having Ansible autonomously recover from misconfigurations.

IOS configuration

In your file explorer, under playbooks, take a look at the playbook configure_ios_telemetry.yml.

This playbook configures all our Cisco devices to send telemetry messages to Telegraf. There are several things to look at here.

  • The hosts line: We are working with two devices, rtr1 and rtr2.

  • The vars block: Our inputs for the playbook

    • telemetry_grpc_port - We know from Exercise 3.1 that Telegraf is listening on port 8089.

    • telemetry_nodes - We know from Exercise 3.2 that the YANG node we want to listen for has prefix interfaces-ios-xe-oper and xpath interfaces.

  • The only task

    • Creates push subscriptions to all the YANG nodes described in the telemetry_nodes variable.

    • The value in the line update-policy periodic 3000 is in centiseconds (hundredths of a second), so we expect a message every 30 seconds.
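
The centisecond arithmetic above can be sketched in a few lines of Python. This is only an illustration of the unit conversion; the function names are hypothetical and are not part of the playbook.

```python
# The "update-policy periodic" value is expressed in centiseconds.
CENTISECONDS_PER_SECOND = 100


def periodic_to_seconds(centiseconds):
    """Return the push interval in seconds for a periodic value."""
    return centiseconds / CENTISECONDS_PER_SECOND


def seconds_to_periodic(seconds):
    """Return the periodic value (centiseconds) for an interval in seconds."""
    return round(seconds * CENTISECONDS_PER_SECOND)


print(periodic_to_seconds(3000))  # interval used in this exercise
print(seconds_to_periodic(30))
```

If you wanted a different interval, the second helper gives the value you would put after update-policy periodic.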

Go ahead and run the playbook:

ansible-navigator run playbooks/configure_ios_telemetry.yml

In your terminal, ssh to rtr1:

ssh rtr1

First, we’ll verify that the telemetry configuration is looking good:

show running-config | section telemetry

You should see:

telemetry ietf subscription 1
 encoding encode-kvgpb
 filter xpath /interfaces-ios-xe-oper:interfaces
 source-address 172.16.137.114
 stream yang-push
 update-policy periodic 3000
 receiver ip address 172.16.88.75 8089 protocol grpc-tcp

Your IP addresses will differ from the above example.

Next, ensure that the router sees the new subscription as valid:

show telemetry ietf subscription 1

You should see:

ID         Type       State      State Description
1          Configured Valid      Subscription validated

Exit out of the router SSH session.

exit

Now let’s make sure that this is producing messages in Kafka.

Run the following Kafka listener command:

sudo docker exec -it broker kafka-console-consumer --bootstrap-server localhost:9092 --topic eda | jq 'select(.tags.name=="Tunnel0")' | tee interface.json

This is the same command from earlier, but with a couple of pipe additions. They filter the stream down to the messages that are relevant to what we’re doing and save them to a file so we can inspect them more easily.
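
If jq is unfamiliar, the filter select(.tags.name=="Tunnel0") can be read as the following Python sketch. It assumes one JSON message per input line, and the GigabitEthernet1 entry in the sample is a made-up interface used only to show a message being dropped. Unlike jq, this sketch silently skips lines that are not valid JSON.

```python
import json


def select_interface(lines, interface_name):
    """Yield parsed messages whose tags.name equals interface_name."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # design choice of this sketch: ignore non-JSON lines
        if not isinstance(message, dict):
            continue
        tags = message.get("tags") or {}
        if tags.get("name") == interface_name:
            yield message


# Illustrative input only; real messages carry many more fields.
sample_lines = [
    '{"fields": {"admin_status": "if-state-up"}, "tags": {"name": "Tunnel0", "source": "rtr1"}}',
    '{"fields": {"admin_status": "if-state-up"}, "tags": {"name": "GigabitEthernet1", "source": "rtr1"}}',
]

matches = list(select_interface(sample_lines, "Tunnel0"))
print(json.dumps(matches, indent=2))
```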

Wait for a message to appear in your terminal. It may take up to 30 seconds. Once one does, press CTRL+C a few times to stop the listener.

You should now have an interface.json file available in your file explorer. This is the same data from the console, but more readable. Open it so that we can see how the data is being formatted by Telegraf.

Your file should look like the following (some sections removed for brevity):

{
  "fields": {
    "admin_status": "if-state-up",
    "auto_downstream_bandwidth": 0,
    "auto_upstream_bandwidth": 0,
    "bia_address": "00:00:00:00:00:00",
[...]
  },
  "name": "Cisco-IOS-XE-interfaces-oper:interfaces/interface",
  "tags": {
    "host": "telegraf",
    "name": "Tunnel0",
    "path": "Cisco-IOS-XE-interfaces-oper:interfaces/interface",
    "source": "rtr1",
    "subscription": "1"
  },
  "timestamp": 1745445626
}

You might actually get two of these messages because of a sync condition with Telegraf (if you check the timestamps, they will be 30 seconds apart). You can ignore the second message.

This is information about Tunnel0, the interface we were looking at previously, arranged in JSON format. It looks like fields.admin_status will be a good data point to use, and its normal value is "if-state-up". It also looks like we can use tags.name to identify that the message is about Tunnel0 specifically, which will be useful since the subscription we set up is sending messages about every interface the router has.
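
Because the file can hold more than one JSON object back to back, a plain json.load may fail on it. The sketch below shows one way to walk such a file and print the two data points identified above. The sample text is a trimmed-down stand-in for interface.json, and the second timestamp is simply the first plus 30 seconds for illustration.

```python
import json


def iter_json_documents(text):
    """Yield each JSON document found back to back in text."""
    decoder = json.JSONDecoder()
    index = 0
    while index < len(text):
        # raw_decode rejects leading whitespace, so step over it first.
        if text[index].isspace():
            index += 1
            continue
        document, index = decoder.raw_decode(text, index)
        yield document


# Stand-in for the contents of interface.json (most fields omitted).
sample_text = """
{
  "fields": {"admin_status": "if-state-up"},
  "tags": {"name": "Tunnel0", "source": "rtr1"},
  "timestamp": 1745445626
}
{
  "fields": {"admin_status": "if-state-up"},
  "tags": {"name": "Tunnel0", "source": "rtr1"},
  "timestamp": 1745445656
}
"""

for message in iter_json_documents(sample_text):
    print(message["timestamp"], message["tags"]["source"],
          message["tags"]["name"], message["fields"]["admin_status"])
```

To try it against your own file, read interface.json into a string and pass that string to iter_json_documents instead of sample_text.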

If you recall the YANG tree Cisco-IOS-XE-interfaces-oper.tree from Exercise 3.2, you may notice that the data available here matches it.

Using ansible-rulebook

In your file explorer, expand the rulebooks directory. Then inside that directory, open the file interface_status_cli.yml.

This is a rulebook, the mechanism for controlling Event-Driven Ansible (EDA), in the same way that a playbook controls regular Ansible.

We can break this down into sections:

  • The sources block

    • We have a Kafka source available on port 9092, using a topic called "eda", which we know from Exercise 3.1.

    • We’re accessing that in EDA by using the ansible.eda.kafka event source plugin.

  • The rules block

    • We have a single rule.

    • The rule condition looks for event.body.fields.admin_status to be anything other than "if-state-up", which we identified earlier as the good/normal state.

    • The rule condition also looks for event.body.tags.name to be "Tunnel0" (so that we don’t trigger on other interfaces being down).

    • The response to this condition is to run the configuration playbook - the same one we ran in Exercise 2.2 to recover from a misconfiguration.
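
To make the rule's logic concrete, here is a rough Python equivalent of the two-part condition. It is not how ansible-rulebook evaluates rules internally. The function name is hypothetical, "if-state-down" is an assumed example of a non-normal value, and treating a message with no admin_status as a non-match is a choice made for this sketch.

```python
def rule_matches(event):
    """Return True when the event describes Tunnel0 in a non-normal state."""
    body = event.get("body", {})
    admin_status = body.get("fields", {}).get("admin_status")
    name = body.get("tags", {}).get("name")
    if admin_status is None:
        return False  # assumption of this sketch: no status means no match
    return name == "Tunnel0" and admin_status != "if-state-up"


def make_event(name, admin_status):
    """Build a minimal event shaped like the Kafka messages seen earlier."""
    return {"body": {"fields": {"admin_status": admin_status},
                     "tags": {"name": name, "source": "rtr1"}}}


print(rule_matches(make_event("Tunnel0", "if-state-up")))
print(rule_matches(make_event("Tunnel0", "if-state-down")))
```

Only when both parts hold would the rule's action, running the configuration playbook, be triggered.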

While you’re here, also take a look at interface_status_aap.yml. We aren’t going to use it now, but we will later. Note the difference between the two rulebooks at the very end under action.

Now open another terminal. In your student workbench, at the top of the terminal, you have a + button. Click it, and you should have a new bash terminal come up. You can swap between them on the right.

In your new terminal, run the following command:

ansible-rulebook --rulebook rulebooks/interface_status_cli.yml -i inventory

You should not initially see any output. This is normal, since no conditions are currently being met.

For now, leave this alone and switch back to your original terminal.

What we want to do now is bring the Tunnel0 interface down again and see if ansible-rulebook will react to it. Run the following commands from earlier:

ssh rtr1
configure terminal
interface Tunnel0
shutdown
end
show ip interface brief

Now switch back to your ansible-rulebook terminal and observe. Use the navigation on the right side of the terminal to find it; the correct one is labeled python3 whenever ansible-rulebook is running.

You will need to wait up to 30 seconds for the next Kafka message to come in, but once it does, you should see the playbook run. When it runs, you should observe a couple of things:

  • The task that applies interface configuration has reported changed.

  • The playbook has only run against rtr1. Recall from Exercise 2.1 that there was an ansible_eda variable in the hosts line we were not yet using; now we are. We could also access other event data from within the playbook this way if we needed to.
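
As a rough picture of how event data reaches the playbook, the sketch below resolves a dotted path inside a nested ansible_eda structure. The event is exposed to the playbook under ansible_eda.event, but the specific path event.body.tags.source is an assumption based on the message format we saw in Kafka; check the hosts line in your playbook for the expression it actually uses. The helper name is hypothetical, and "if-state-down" is an assumed example value.

```python
from functools import reduce


def resolve_path(data, dotted_path):
    """Follow a dotted path such as 'event.body.tags.source' through nested dicts."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), data)


# Minimal stand-in for the variable ansible-rulebook hands to the playbook.
ansible_eda = {
    "event": {
        "body": {
            "fields": {"admin_status": "if-state-down"},
            "tags": {"name": "Tunnel0", "source": "rtr1"},
        }
    }
}

# Assumed path: the router that sent the message becomes the play's target.
target_host = resolve_path(ansible_eda, "event.body.tags.source")
print(target_host)
```

The same lookup style would reach any other value in the event, for example event.body.tags.name for the interface.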

Press CTRL+C to stop ansible-rulebook.

Switch back to your original terminal, which should still have an SSH session open to rtr1. Verify that the interface is up, then exit the SSH session.

show ip interface brief
exit

If desired, try running the same commands on rtr2 and verify that everything works the same way there.

We now have a functioning self-healing network environment… as long as ansible-rulebook is running. We don’t want to have that up in a terminal all the time, so let’s move on and do something about that.