Debugging OVN-Kubernetes - The Setup
The Software Defined Network (SDN) used in the OpenShift Container Platform is based on the upstream OVN-Kubernetes project and serves as the backbone for all container and virtual machine networking needs.
OVN-Kubernetes (Open Virtual Networking - Kubernetes) is an open-source project that provides a robust networking solution for Kubernetes clusters, with OVN (Open Virtual Network) and Open vSwitch at its core. It is a Kubernetes-conformant network plugin written to the CNI (Container Network Interface) specification.
Everyone knows it’s there, but how do we debug it? When you run into an issue, how do you narrow down the cause?
In this section of the lab, we are going to walk you through three scenarios and show how to leverage tools like ovnkube-trace, ovn-trace, and ovn-nbctl to identify and resolve network communication issues between internal and external resources.
Lab Setup
NOTE: This lab does not run against a live OpenShift Cluster. The following scenarios were originally deployed into a running OpenShift Cluster, and a backup of the OVN and OVS databases, along with the flows and interfaces, was restored into this demo environment. All commands and outputs yield the same results as if this OVN-Kubernetes dump were running in an OpenShift Cluster.
The Demo Application
What does our example application look like?
For this lab, we will trace traffic flows between 2 frontend pods, 2 backend pods and 1 virtual machine across 3 different namespaces and an external source.
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sitea-frontend-8798d9cdc-77jj7 1/1 Running 0 14d 10.129.6.15 worker-5.ocpv.tamlab.rdu2.redhat.com <none> <none>
siteb-frontend-66f9f55ddc-9qcwr 1/1 Running 0 14d 10.129.6.17 worker-5.ocpv.tamlab.rdu2.redhat.com <none> <none>
sitea-backend-796cf44b87-l7mhb 1/1 Running 0 14d 10.129.6.13 worker-5.ocpv.tamlab.rdu2.redhat.com <none> <none>
siteb-backend-5b765679b5-l4gp6 1/1 Running 0 14d 10.129.6.14 worker-5.ocpv.tamlab.rdu2.redhat.com <none> <none>
virt-launcher-external-vm-x7gnx 1/1 Running 0 59m 10.129.6.209 worker-5.ocpv.tamlab.rdu2.redhat.com <none> 1/1
Now that we know what the application looks like, what other information do we need? The output above gives us our pod IPs, but we also need the MAC addresses, and we need to check whether any of our containers or VMs use secondary networks. From the output below, we can see all of the IP and MAC information, and that our external virtual machine is connected to a secondary network, default/vlan530.
NOTE: The oc commands below do not run in this environment; they are shown for reference and to demonstrate some examples of yq parsing to get the information we need from a running cluster.
oc get pods -n frontend sitea-frontend-8798d9cdc-77jj7 -o yaml | yq '.metadata.annotations.["k8s.ovn.org/pod-networks"]'
{"default":{"ip_addresses":["10.129.6.15/23"],"mac_address":"0a:58:0a:81:06:0f","gateway_ips":["10.129.6.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.129.6.1"},{"dest":"172.30.0.0/16","nextHop":"10.129.6.1"},{"dest":"169.254.0.5/32","nextHop":"10.129.6.1"},{"dest":"100.64.0.0/16","nextHop":"10.129.6.1"}],"ip_address":"10.129.6.15/23","gateway_ip":"10.129.6.1","role":"primary"}}
oc get pods -n frontend siteb-frontend-66f9f55ddc-9qcwr -o yaml | yq '.metadata.annotations.["k8s.ovn.org/pod-networks"]'
{"default":{"ip_addresses":["10.129.6.17/23"],"mac_address":"0a:58:0a:81:06:11","gateway_ips":["10.129.6.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.129.6.1"},{"dest":"172.30.0.0/16","nextHop":"10.129.6.1"},{"dest":"169.254.0.5/32","nextHop":"10.129.6.1"},{"dest":"100.64.0.0/16","nextHop":"10.129.6.1"}],"ip_address":"10.129.6.17/23","gateway_ip":"10.129.6.1","role":"primary"}}
oc get pods -n backend sitea-backend-796cf44b87-l7mhb -o yaml | yq '.metadata.annotations.["k8s.ovn.org/pod-networks"]'
{"default":{"ip_addresses":["10.129.6.13/23"],"mac_address":"0a:58:0a:81:06:0d","gateway_ips":["10.129.6.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.129.6.1"},{"dest":"172.30.0.0/16","nextHop":"10.129.6.1"},{"dest":"169.254.0.5/32","nextHop":"10.129.6.1"},{"dest":"100.64.0.0/16","nextHop":"10.129.6.1"}],"ip_address":"10.129.6.13/23","gateway_ip":"10.129.6.1","role":"primary"}}
oc get pods -n backend siteb-backend-5b765679b5-l4gp6 -o yaml | yq '.metadata.annotations.["k8s.ovn.org/pod-networks"]'
{"default":{"ip_addresses":["10.129.6.14/23"],"mac_address":"0a:58:0a:81:06:0e","gateway_ips":["10.129.6.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.129.6.1"},{"dest":"172.30.0.0/16","nextHop":"10.129.6.1"},{"dest":"169.254.0.5/32","nextHop":"10.129.6.1"},{"dest":"100.64.0.0/16","nextHop":"10.129.6.1"}],"ip_address":"10.129.6.14/23","gateway_ip":"10.129.6.1","role":"primary"}}
oc get pods -n external virt-launcher-external-vm-x7gnx -o yaml | yq '.metadata.annotations.["k8s.ovn.org/pod-networks"]'
{"default":{"ip_addresses":["10.129.6.209/23"],"mac_address":"0a:58:0a:81:06:d1","gateway_ips":["10.129.6.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.129.6.1"},{"dest":"172.30.0.0/16","nextHop":"10.129.6.1"},{"dest":"169.254.0.5/32","nextHop":"10.129.6.1"},{"dest":"100.64.0.0/16","nextHop":"10.129.6.1"}],"ip_address":"10.129.6.209/23","gateway_ip":"10.129.6.1","role":"primary"},"default/vlan530":{"ip_addresses":null,"mac_address":"02:a1:3b:00:00:15","role":"secondary"}}
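A useful detail hiding in the annotations above: OVN-Kubernetes derives each pod's default-network MAC address deterministically from its IP, using the locally administered prefix 0a:58 followed by the four IPv4 octets in hex. The small sketch below verifies this against the addresses shown above; the ip_to_mac helper is ours for illustration, not part of any OVN tooling.

```shell
# ip_to_mac is a hypothetical helper, not an OVN-Kubernetes command.
# OVN-Kubernetes builds the default pod MAC as 0a:58 plus the IPv4 octets
# in hex, which is why 10.129.6.15 appears above as 0a:58:0a:81:06:0f.
ip_to_mac() {
    # Word splitting of the octets is intentional here.
    printf '0a:58:%02x:%02x:%02x:%02x\n' $(echo "$1" | tr '.' ' ')
}

ip_to_mac 10.129.6.15     # sitea-frontend pod  -> 0a:58:0a:81:06:0f
ip_to_mac 10.129.6.209    # external-vm pod     -> 0a:58:0a:81:06:d1
```

This only holds for the primary (default) network; secondary-network MACs, like the VM's 02:a1:3b:00:00:15 on default/vlan530, are assigned separately.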
You can also pull the networking information of the VM directly off the Virtual Machine Instance (vmi) object.
oc get vmi -n external -o wide
NAME AGE PHASE IP NODENAME READY LIVE-MIGRATABLE PAUSED
external-vm 72m Running 10.6.153.247 worker-5.ocpv.tamlab.rdu2.redhat.com True True
oc get vmi -n external external-vm -o yaml | yq '.status.interfaces'
- infoSource: domain, guest-agent, multus-status
  interfaceName: eth0
  ipAddress: 10.6.153.247
  ipAddresses:
    - 10.6.153.247
    - 2620:52:9:1699:a1:3bff:fe00:15
  linkState: up
  mac: 02:a1:3b:00:00:15
  name: default
  podInterfaceName: pod37a8eec1ce1
  queueCount: 1
Network Policies
We also know that our cluster has some default NetworkPolicies and MultiNetworkPolicies that get applied to every new Namespace.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
  namespace: frontend
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
  namespace: frontend
spec:
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: ingress
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-backend-to-siteb-frontend
  namespace: frontend
spec:
  podSelector:
    matchLabels:
      deployment: siteb-frontend
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: backend
  policyTypes:
    - Ingress
---
apiVersion: k8s.cni.cncf.io/v1beta1
kind: MultiNetworkPolicy
metadata:
  annotations:
    k8s.v1.cni.cncf.io/policy-for: default/vlan530
  name: deny-by-default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: k8s.cni.cncf.io/v1beta1
kind: MultiNetworkPolicy
metadata:
  annotations:
    k8s.v1.cni.cncf.io/policy-for: default/vlan530
  name: egressallow-gh
spec:
  egress:
    - to:
        - ipBlock:
            cidr: 140.82.113.0/24
        - ipBlock:
            cidr: 8.8.8.8/32
  podSelector:
    matchLabels:
      internet: "true"
  policyTypes:
    - Egress
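When reasoning about what the egressallow-gh policy permits, it helps to be able to check a destination IP against the allowed ipBlock CIDRs. The ip_allowed function below is a hypothetical local helper (it uses python3's ipaddress module, not any oc or OVN command) that mirrors the policy's CIDR match:

```shell
# Hypothetical local check, not an oc command: does a destination IP fall
# inside the egress ipBlock CIDRs that egressallow-gh permits?
allowed_cidrs="140.82.113.0/24 8.8.8.8/32"

ip_allowed() {
    # Unquoted $allowed_cidrs is intentional: each CIDR becomes its own argv.
    python3 - "$1" $allowed_cidrs <<'EOF'
import ipaddress, sys
ip = ipaddress.ip_address(sys.argv[1])
ok = any(ip in ipaddress.ip_network(c) for c in sys.argv[2:])
print("allowed" if ok else "denied")
EOF
}

ip_allowed 140.82.113.3   # inside 140.82.113.0/24 -> prints "allowed"
ip_allowed 1.1.1.1        # matches no CIDR        -> prints "denied"
```

Keep in mind this covers only the CIDR portion of the policy: the policy also applies only to pods labeled internet: "true" on the default/vlan530 network.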
Set Up the OVN Environment
To begin, let's start our OVN environment by moving into the ModuleOVN directory and executing the start script.
cd ~/ModuleOVN/
/home/lab-user/ModuleOVN/ovs-dbg/bin/ovs-offline -w /home/lab-user/ModuleOVN/ovs-offline start
After executing the above command, you will see output showing four containers starting.
Starting container ovsdb-server-ovn_nb
7c3f080cf89e038255fc2b35825ce69dc85440dfd48864b8b4876bc14c4473ee
Starting container ovsdb-server-ovn_sb
270f9af1197081df32789c541eaf54c77e5ccadfee560ad9895fd19572432d16
Starting container ovsdb-server-ovs
066bd7f2e5d4c5023c7e552a273cd318add2489d06c4e46cfe9e2340bcac0bc6
Starting container ovs-vswitchd
b0bc4411308193f003731553655ec405fa04f6d9848a213e6b6c653ffbe80b5e
When that is finished, confirm that all four containers are running.
podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7c3f080cf89e localhost/ovs-offline:latest ovsdb-ovn_nb 5 minutes ago Up 5 minutes ovsdb-server-ovn_nb
270f9af11970 localhost/ovs-offline:latest ovsdb-ovn_sb 5 minutes ago Up 5 minutes ovsdb-server-ovn_sb
066bd7f2e5d4 localhost/ovs-offline:latest ovsdb-ovs 5 minutes ago Up 5 minutes ovsdb-server-ovs
b0bc44113081 localhost/ovs-offline:latest vswitchd-dummy 4 minutes ago Up 4 minutes ovs-vswitchd
The last step is to enter the pre-created execution environment by sourcing the following file. Sourcing it defines a number of command aliases that make it easier to run commands against the OVN environment.
source /tmp/ovs-offline/bin/activate
You will now see your terminal prompt change, with (ovs-offline) prepended to your existing prompt [lab-user@rhel9 ModuleOVN]$.
* You can now run the following offline commands directly:
ovs-vsctl [...]
ovsdb-client [...]
ovn-nbctl [...]
ovsdb-client [...] $OVN_NB_DB
ovn-sbctl [...]
ovsdb-client [...] $OVN_SB_DB
* You can restore your previous environment with:
ovs-offline-deactivate
(ovs-offline) [lab-user@rhel9 ModuleOVN]$
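If you are curious why the file must be sourced rather than executed, the sketch below illustrates the idea: a sourced file runs in your current shell, so its exported variables, prompt change, and deactivate function all persist after it finishes. This is a simplified illustration only, not the actual ovs-offline script; the socket paths and the underscore-named deactivate function are assumptions for the sketch (the real command is ovs-offline-deactivate).

```shell
# Simplified sketch of an "activate"-style file. The real ovs-offline
# script differs; the paths below are examples only.
export OVN_NB_DB="unix:/tmp/ovs-offline/var-run/ovn_nb/ovnnb_db.sock"
export OVN_SB_DB="unix:/tmp/ovs-offline/var-run/ovn_sb/ovnsb_db.sock"

# Save the prompt so it can be restored, then prefix it.
_OLD_PS1="${PS1-}"
PS1="(ovs-offline) ${PS1-}"

# Restore the previous environment (the real script calls this
# ovs-offline-deactivate).
ovs_offline_deactivate() {
    PS1="$_OLD_PS1"
    unset OVN_NB_DB OVN_SB_DB
}
```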
To confirm your environment is up, run the following command, which prints an overview of the northbound database contents.
ovn-nbctl show
(ovs-offline) [lab-user@rhel9 ModuleOVN]$ ovn-nbctl show
switch cc63f8ab-f7ff-4906-a15b-fb7b24c62eb2 (br.ex.localnet_ovn_localnet_switch)
port default.vlan530_faatam_virt-launcher-faatam-idm1-ghrg2
addresses: ["02:a1:3b:00:00:2f"]
port default.vlan530_external_virt-launcher-external-vm-x7gnx
addresses: ["02:a1:3b:00:00:15"]
port br.ex.localnet_ovn_localnet_port
type: localnet
addresses: ["unknown"]
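One handy trick when reading this output: grep for a MAC address with one line of leading context to find which logical port owns it. Against the live environment that is simply `ovn-nbctl show | grep -B1 '<mac>'`; the self-contained sketch below runs the same grep over a saved copy of the output above so you can see the shape of the result.

```shell
# Grep a saved copy of the "ovn-nbctl show" output for a MAC address;
# -B1 also prints the preceding line, which names the owning logical port.
# (Live equivalent: ovn-nbctl show | grep -B1 '02:a1:3b:00:00:15')
show_output='switch cc63f8ab-f7ff-4906-a15b-fb7b24c62eb2 (br.ex.localnet_ovn_localnet_switch)
    port default.vlan530_faatam_virt-launcher-faatam-idm1-ghrg2
        addresses: ["02:a1:3b:00:00:2f"]
    port default.vlan530_external_virt-launcher-external-vm-x7gnx
        addresses: ["02:a1:3b:00:00:15"]'

echo "$show_output" | grep -B1 '02:a1:3b:00:00:15'
```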
Now that we know what the demo environment looks like and you have started the OVN environment, let’s get into the first exercise! If you have any issues or questions, let us know.
Resetting the OVN Environment
If the showroom environment gets closed or reset, the OVS environment will shut down.
You can easily reset it by following these four steps.
Stop the existing environment to clean up all of the resources.
/home/lab-user/ModuleOVN/ovs-dbg/bin/ovs-offline -w /home/lab-user/ModuleOVN/ovs-offline stop
Offline OVS Debugging stopped
*****************************
Ensure you are back in the ModuleOVN directory.
cd ~/ModuleOVN/
[lab-user@rhel9 ModuleOVN]$
Re-extract the ovs-offline archive.
tar zxvf ovs-offline.tgz
ovs-offline/
ovs-offline/restore_flows/
ovs-offline/restore_flows/do_restore.sh
ovs-offline/restore_flows/br-ex.groups.dump
ovs-offline/restore_flows/br-ex.flows.dump
ovs-offline/restore_flows/br-int.groups.dump
ovs-offline/restore_flows/br-int.flows.dump
ovs-offline/restore_flows/restore.sh
ovs-offline/var-run/
ovs-offline/ovnkube-node-vv8p4_ovs.db
ovs-offline/db/
ovs-offline/db/ovs/
ovs-offline/db/ovs/ovnkube-node-vv8p4_ovs.db
ovs-offline/db/ovn_nb/
ovs-offline/db/ovn_nb/ovnnb_db.db
ovs-offline/db/ovn_sb/
ovs-offline/db/ovn_sb/ovnsb_db.db
ovs-offline/ovnnb_db.db
ovs-offline/ovnsb_db.db
Start the OVS environment.
/home/lab-user/ModuleOVN/ovs-dbg/bin/ovs-offline -w /home/lab-user/ModuleOVN/ovs-offline start
Starting container ovsdb-server-ovn_nb
4b5cc986ee085e84422417f3ac9f07aab296a099cf84d1e880500fcf7f028768
Starting container ovsdb-server-ovn_sb
1d07be31060f7ea64e4f2dd059ac1670e9933f5ec4c525dd45c9e37d0dd782ef
Starting container ovsdb-server-ovs
0804b5568c438c4dae2490aed4ead22b333cee73eba9fa991148f61a3e81c665
Starting container ovs-vswitchd
154738531b7e6ed856a8fe5538adaec54880339b17c84c0dca46fd946c5eb015
Offline OVS Debugging started
******************************