Implement Event-Driven Ansible
With the pieces all in place, and all the information gathered, we can now start to realize the vision of having Ansible autonomously recover from misconfigurations.
IOS configuration
In your file explorer, under playbooks, take a look at the playbook configure_ios_telemetry.yml.
This playbook will have the job of configuring all our Cisco devices to send telemetry messages to Telegraf. There are several things to look at here.
- The hosts line: We are working with two devices, rtr1 and rtr2.
- The vars block: Our inputs for the playbook.
  - telemetry_grpc_port: We know from Exercise 3.1 that Telegraf is listening on port 8089.
  - telemetry_nodes: We know from Exercise 3.2 that the YANG node we want to listen for has prefix interfaces-ios-xe-oper and xpath interfaces.
- The only task:
  - Creates push subscriptions to all the YANG nodes described in the telemetry_nodes variable.
  - The line update-policy periodic 3000 is counting in centiseconds, so we’ll expect a message every 30 seconds.
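To make the mapping from the playbook's inputs to the resulting device configuration concrete, here is a minimal Python sketch. It is not the playbook itself, which is not reproduced here. The function names telemetry_config and period_seconds, the parameter names, and the example IP addresses are illustrative assumptions; the command lines mirror the running configuration shown later in this exercise.

```python
def telemetry_config(sub_id, prefix, xpath, source_ip, receiver_ip, grpc_port,
                     period_cs=3000):
    """Return IOS XE config lines for one periodic yang-push subscription.

    Hypothetical helper for illustration; the real playbook may build
    these lines differently.
    """
    return [
        f"telemetry ietf subscription {sub_id}",
        " encoding encode-kvgpb",
        f" filter xpath /{prefix}:{xpath}",
        f" source-address {source_ip}",
        " stream yang-push",
        f" update-policy periodic {period_cs}",
        f" receiver ip address {receiver_ip} {grpc_port} protocol grpc-tcp",
    ]


def period_seconds(period_cs):
    """Convert an update-policy period from centiseconds to seconds."""
    return period_cs / 100


if __name__ == "__main__":
    # Example addresses only; yours will differ.
    for line in telemetry_config(1, "interfaces-ios-xe-oper", "interfaces",
                                 "172.16.137.114", "172.16.88.75", 8089):
        print(line)
    print(period_seconds(3000), "seconds between messages")
```

The conversion helper spells out why a period of 3000 corresponds to one message every 30 seconds.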
Go ahead and run the playbook:
ansible-navigator run playbooks/configure_ios_telemetry.yml
In your terminal, SSH to rtr1:
ssh rtr1
First, we’ll verify that the telemetry configuration is looking good:
show running-config | section telemetry
You should see:
telemetry ietf subscription 1
 encoding encode-kvgpb
 filter xpath /interfaces-ios-xe-oper:interfaces
 source-address 172.16.137.114
 stream yang-push
 update-policy periodic 3000
 receiver ip address 172.16.88.75 8089 protocol grpc-tcp
Your IP addresses will differ from the above example.
Next, ensure that the router sees the new subscription as valid:
show telemetry ietf subscription 1
You should see:
ID         Type        State      State Description
1          Configured  Valid      Subscription validated
Exit out of the router SSH session.
exit
Now let’s make sure that this is producing messages in Kafka.
Run the following Kafka listener command:
sudo docker exec -it broker kafka-console-consumer --bootstrap-server localhost:9092 --topic eda | jq 'select(.tags.name=="Tunnel0")' | tee interface.json
This is the same command from earlier, but with a couple of pipe additions: jq keeps only the messages that are relevant to what we’re doing, and tee persists them to a file so we can more easily inspect them.
Wait for a message to appear in your terminal. It may take up to 30 seconds. Once one does, press CTRL+C a few times to stop the listener.
You should now have an interface.json file available in your file explorer. This is the same data from the console, but more readable. Open it so that we can see how the data is being formatted by Telegraf.
Your file should look like the following (some sections removed for brevity):
{
  "fields": {
    "admin_status": "if-state-up",
    "auto_downstream_bandwidth": 0,
    "auto_upstream_bandwidth": 0,
    "bia_address": "00:00:00:00:00:00",
    [...]
  },
  "name": "Cisco-IOS-XE-interfaces-oper:interfaces/interface",
  "tags": {
    "host": "telegraf",
    "name": "Tunnel0",
    "path": "Cisco-IOS-XE-interfaces-oper:interfaces/interface",
    "source": "rtr1",
    "subscription": "1"
  },
  "timestamp": 1745445626
}
You might actually get two of these messages because of a sync condition with Telegraf (if you check the timestamps, they will be 30 seconds apart). You can ignore the second message.
This is information about Tunnel0, the interface we were looking at previously, arranged in JSON format. It looks like fields.admin_status will be a good data point to use, and its normal value is "if-state-up". It also looks like we can use tags.name to identify that the message is about Tunnel0 specifically, which will be useful since the subscription we set up is sending messages about every interface the router has.
If you recall looking at the YANG tree Cisco-IOS-XE-interfaces-oper.tree in Exercise 3.2, you may notice that all the data available here matches up with that.
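If you would rather inspect the captured messages programmatically, the following Python sketch shows one way to do it. Because jq writes one JSON document after another, the file as a whole is not a single JSON value, so the sketch decodes the documents one at a time. The function names and the trimmed sample messages are assumptions for illustration; the field names come from the output above.

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize(message):
    """Pick out the fields this exercise cares about from one Telegraf message."""
    return {
        "source": message["tags"]["source"],
        "interface": message["tags"]["name"],
        "admin_status": message["fields"]["admin_status"],
        "timestamp": message["timestamp"],
    }


# Trimmed, hypothetical stand-in for the contents of interface.json.
SAMPLE = """
{"fields": {"admin_status": "if-state-up"},
 "tags": {"name": "Tunnel0", "source": "rtr1"},
 "timestamp": 1745445626}
{"fields": {"admin_status": "if-state-up"},
 "tags": {"name": "Tunnel0", "source": "rtr1"},
 "timestamp": 1745445656}
"""

if __name__ == "__main__":
    for doc in iter_json_documents(SAMPLE):
        print(summarize(doc))
```

To run it against your own capture, replace SAMPLE with the text read from interface.json.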
Using ansible-rulebook
In your file explorer, expand the rulebooks directory. Then inside that directory, open the file interface_status_cli.yml.
This is a rulebook, the mechanism for controlling Event-Driven Ansible (EDA), in the same way that a playbook controls regular Ansible.
We can break this down into sections:
- The sources block
  - We have a Kafka source available on port 9092, using a topic called "eda", which we know from Exercise 3.1.
  - We’re accessing that in EDA by using the ansible.eda.kafka event source plugin.
- The rules block
  - We have a single rule.
  - The rule condition looks for event.body.fields.admin_status to be anything other than "if-state-up", which we identified in the previous exercise as the good/normal state.
  - The rule condition also looks for event.body.tags.name to be "Tunnel0" (so that we don’t trigger on other interfaces being down).
  - The response to this condition is to run the configuration playbook, the same one we ran in Exercise 2.2 to recover from a misconfiguration.
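The two conditions described above can be read as a single boolean test on each incoming event. The following Python sketch mirrors that logic for illustration only; ansible-rulebook evaluates conditions with its own rule engine, not with this code. The function name rule_matches, the sample events, the "if-state-down" value, and the choice to treat a missing field as "no match" are all assumptions of this sketch.

```python
def rule_matches(event):
    """Return True when an event should trigger the recovery playbook.

    Mirrors the rule as described: the message is about Tunnel0 and its
    admin_status is anything other than the normal "if-state-up".
    """
    body = event.get("body", {})
    name = body.get("tags", {}).get("name")
    status = body.get("fields", {}).get("admin_status")
    if status is None:
        # Assumption: a message without admin_status never triggers the rule.
        return False
    return name == "Tunnel0" and status != "if-state-up"


# Hypothetical events shaped like the Kafka messages seen earlier.
TUNNEL_UP = {"body": {"fields": {"admin_status": "if-state-up"},
                      "tags": {"name": "Tunnel0", "source": "rtr1"}}}
TUNNEL_DOWN = {"body": {"fields": {"admin_status": "if-state-down"},
                        "tags": {"name": "Tunnel0", "source": "rtr1"}}}
OTHER_DOWN = {"body": {"fields": {"admin_status": "if-state-down"},
                       "tags": {"name": "Loopback0", "source": "rtr1"}}}

if __name__ == "__main__":
    for label, event in [("Tunnel0 up", TUNNEL_UP),
                         ("Tunnel0 down", TUNNEL_DOWN),
                         ("other interface down", OTHER_DOWN)]:
        print(label, "->", rule_matches(event))
```

Only the second event satisfies both conditions, which is why the rulebook stays quiet while Tunnel0 is up and ignores other interfaces going down.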
While you’re here, also take a look at interface_status_aap.yml. We aren’t going to use it now, but we will later. Note the difference between the two rulebooks at the very end under action.
Now open another terminal. In your student workbench, at the top of the terminal, you have a + button. Click it, and you should have a new bash terminal come up. You can swap between them on the right.
In your new terminal, run the following command:
ansible-rulebook --rulebook rulebooks/interface_status_cli.yml -i inventory
You should not initially see any output. This is normal, since no conditions are currently being met.
For now, leave this alone and switch back to your original terminal.
What we want to do now is bring the Tunnel0 interface down again and see if ansible-rulebook will react to it. Run the following commands from earlier:
ssh rtr1
configure terminal
interface Tunnel0
shutdown
end
show ip interface brief
Now, switch back to your ansible-rulebook terminal (using the navigation on the right side of the terminal; the correct one will be labeled python3 any time ansible-rulebook is running) and observe.
You will need to wait up to 30 seconds for the next Kafka message to come in, but once it does, you should see the playbook run. When it runs, you should observe a couple of things:
- The task that applies interface configuration has reported changed.
- The playbook has only run against rtr1. Recall from Exercise 2.1 there was an ansible_eda variable in the hosts line we were not yet using; now we are. We could also access other event data from within the playbook this way if we needed to.
Press CTRL+C to stop ansible-rulebook.
Switch back to your original terminal, which should still have an SSH session open to rtr1. Verify that the interface is up, then exit the SSH session.
show ip interface brief
exit
If desired, try running the same commands on rtr2 and verify that everything works the same way there.
We now have a functioning self-healing network environment… as long as ansible-rulebook is running. We don’t want to have that up in a terminal all the time, so let’s move on and do something about that.