Using OpenShift Lightspeed (Insights) to Proactively Analyze a Cluster
OpenShift Lightspeed (Insights) is a free tool that runs on your OpenShift cluster. Every 2 hours it collects a specific data set, anonymizes it, and sends it to Red Hat for review. The archives are stored on SupportShell, where Support Engineers and Technical Account Managers can review the data.
In this module we will review how to use a tool that parses an OpenShift Lightspeed (Insights) archive so that you can proactively analyze any OpenShift cluster.
You can find more details on OpenShift Lightspeed (Insights) and the entire remote health monitoring package in our documentation: About remote health monitoring
You can find details on what OpenShift Lightspeed (Insights) collects here: Showing data collected by remote health monitoring
You can also review the source code to see additional information on what is collected here: GitHub: Insights Operator - gather_most_recent_metrics.go
The ocp_insights.py script
With this script we will be able to parse the insights archive, review customer namespace memory usage, look for namespaces with overlapping UIDs, review storage classes, and look at etcd metrics.
cd ~/Module3/
ocp_insights.py -h
ocp_insights.py --help
usage: ocp_insights.py [-h] [--id ID] [--file FILE] [--alerts] [--customer_memory] [--etcd_metrics] [--events] [--list] [--extract] [--cluster_info] [--node_info] [--cluster_operators] [--remote] [--server SERVER] [--node_logs NODE_NAME]
OpenShift Insights Cluster Report.
options:
  -h, --help            show this help message and exit
  --id ID               ClusterID of a connected cluster used to find all connected Clusters
  --file FILE           Use a specific Insights Archive File; must specify full path.
  --alerts              Prints out Alerts in valid JSON
  --customer_memory     Prints Customer Namespace memory usage.
  --etcd_metrics        Prints etcd Slow Apply metrics for all Insights Archives for the cluster.
  --events              Prints namespace events if they exist.
  --list                List available archives for a specific cluster. Must be used with --id option. Can be combined with --extract.
  --extract             Extract archive for a specific cluster to user's home directory. Must be used with --id option. Can be combined with --list to select which archive to extract.
  --cluster_info        Prints only cluster information (ID, name, version, platform, network, encryption, etc.).
  --node_info           Prints only node information (name, status, role, version, OS, CPU, memory).
  --cluster_operators   Prints only cluster operator information (name, version, status).
  --remote              Connect to remote server to perform analysis
  --server SERVER       Remote server to connect to (overrides default). Use with --remote option.
  --node_logs NODE_NAME
                        Print logs for a master node. Provide the full node name (FQDN). Master nodes only.
What is in the output?
- This script was written with the intention to parse Insights Archives to display the data the same way it is output by OpenShift’s CLI. All of the following information is output when running the script unless specifically stated below.
- We include the following information:
  - ClusterVersion
  - Channel
  - Previously Installed Versions
  - Platform
    - VMware, AWS, Nutanix, IBMCloud, etc.
  - NetworkType
  - Proxy Configuration
  - Ingress and API IP Addresses
  - etcd Encryption
  - Audit Profile
  - Node Information
    - Name, State, Role, Created Date, Version, OS, CPU, and Memory
  - Cluster Operator status with the same output as oc get clusteroperators
  - Installed Operators
  - Installed OLM Operators
    - Display Name, Version, and Namespace
  - MachineConfigPools
  - MachineSets
  - Failing Pods
  - Alerts
  - PodNetworkConnectivityChecks
  - Namespace Events^
  - Alerts in JSON^
  - Customer Namespace Memory Usage^
  - Control-Plane Node Error Logs^
- ^ Indicates an additional flag is needed to view the data.
Cluster Information
The --cluster_info option shows a detailed breakdown of the high-level information about your cluster, including ID, Name, Version, Channel, Cluster Status, Platform, Install Type, and more.
ocp_insights.py --cluster_info --file ./Cluster_1/insights.tar.gz
Cluster ID: DB08D743-5559-4259-8ABB-7C5B528439B7
Cluster Name: prod.example.com
Cluster Version: 4.15.57
Channel: stable-4.15
Previous Versions: 4.15.57, 4.15.44, 4.15.43, 4.15.37, 4.15.24, 4.15.20, 4.15.19, 4.15.18, 4.15.17, 4.15.15, 4.15.14, 4.15.13,
4.15.12, 4.15.11, 4.14.23, 4.14.21, 4.13.40, 4.12.55, 4.11.59, 4.11.57, 4.11.56, 4.11.55, 4.11.54, 4.11.53, 4.11.50, 4.11.49,
4.11.48, 4.11.30, 4.10.37
Cluster Status: Failing
Reason: ClusterOperatorDegraded
Message: Cluster operator authentication is degraded
Platform: None
Install Type: UPI
Network Type: OVNKubernetes
IPsec: Enabled
Proxy Settings:
HTTP: False
HTTPS: False
etcd Encryption: AES-CBC
Audit Profile: Default
Node Information
The --node_info option displays all of the node information, including Node Name, Node Status, Role(s), Creation Date, Kubelet Version, CoreOS Version and the CPU and Memory details of each server.
Note: The total memory returned is the memory available to the Kubelet, which is not necessarily equal to the physical amount of memory on the node.
ocp_insights.py --node_info --file ./Cluster_2/insights.tar.gz
Node Information:
NAME         READY  ROLE                  CREATED ON           VERSION           OS                                                     CPU  MEMORY
master0-dev  True   control-plane,master  2024-05-11 07:30:21  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  6    31 GB
master1-dev  True   control-plane,master  2024-05-11 07:30:23  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  6    31 GB
master2-dev  True   control-plane,master  2024-05-11 07:30:20  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  6    31 GB
infra0-dev   True   infra                 2024-05-11 07:48:46  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  4    16 GB
infra1-dev   True   infra                 2024-05-11 07:48:45  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  4    16 GB
infra2-dev   True   infra                 2024-05-11 07:48:43  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  4    16 GB
worker0-dev  True   worker                2024-05-11 07:48:40  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker1-dev  True   worker                2024-05-11 07:48:41  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker2-dev  True   worker                2024-05-12 11:15:08  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker3-dev  True   worker                2025-01-22 15:47:13  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker4-dev  True   worker                2025-01-22 16:19:06  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker5-dev  True   worker                2025-01-22 16:53:23  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
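If you save the --node_info output to a text file, a few lines of Python can total the capacity per role. This is a minimal sketch and not part of ocp_insights.py. It assumes the column layout shown above (ROLE is the third column, and each row ends with the CPU count followed by memory as "<number> GB"). The function name summarize_nodes is hypothetical.

```python
from collections import defaultdict


def summarize_nodes(report: str) -> dict:
    """Total node count, CPU, and memory (GB) per role from --node_info text.

    Assumption: rows look like 'NAME READY ROLE ... CPU <mem> GB'.
    """
    totals = defaultdict(lambda: {"nodes": 0, "cpu": 0, "memory_gb": 0.0})
    for line in report.splitlines():
        fields = line.split()
        # Skip the title, the header row, and anything that is not a node row.
        if len(fields) < 6 or fields[-1] != "GB" or fields[0] == "NAME":
            continue
        role = fields[2]
        totals[role]["nodes"] += 1
        totals[role]["cpu"] += int(fields[-3])
        totals[role]["memory_gb"] += float(fields[-2])
    return dict(totals)


sample = """\
Node Information:
NAME READY ROLE CREATED ON VERSION OS CPU MEMORY
master0-dev True control-plane,master 2024-05-11 07:30:21 v1.29.10+67d3387 Red Hat Enterprise Linux CoreOS 416.94.202412170927-0 6 31 GB
worker0-dev True worker 2024-05-11 07:48:40 v1.29.10+67d3387 Red Hat Enterprise Linux CoreOS 416.94.202412170927-0 12 79 GB
worker1-dev True worker 2024-05-11 07:48:41 v1.29.10+67d3387 Red Hat Enterprise Linux CoreOS 416.94.202412170927-0 12 79 GB
"""

for role, total in summarize_nodes(sample).items():
    print(f"{role}: {total['nodes']} nodes, {total['cpu']} CPU, {total['memory_gb']:.0f} GB")
```

Comparing the worker total against the Total Customer Namespace Memory Usage figure gives a rough idea of how much headroom the cluster has.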
Cluster Operators
The --cluster_operators option returns the same output as running oc get clusteroperators or oc get co. All Cluster Operators are returned with the Name, Version, Status, and, if the operator is having issues, the Reason.
ocp_insights.py --cluster_operators --file ./Cluster_1/insights.tar.gz
Cluster Operators:
NAME VERSION AVAILABLE PROGRESSING DEGRADED REASON
authentication 4.15.57 True False True OAuthServerConfigObservationDegraded: error validating configMap openshift-config/ca-config-map: certificate expired:...
baremetal 4.15.57 True False False
cloud-controller-manager 4.15.57 True False False
cloud-credential 4.15.57 True False False
cluster-autoscaler 4.15.57 True False False
config-operator 4.15.57 True False False
console 4.15.57 True False False
control-plane-machine-set 4.15.57 True False False
csi-snapshot-controller 4.15.57 True False False
dns 4.15.57 True True False DNS "default" reports Progressing=True: "Have 26 available DNS pods, want 27."
etcd 4.15.57 True False False
image-registry 4.15.57 True False False
ingress 4.15.57 True False False
insights 4.15.57 True False False
kube-apiserver 4.15.57 True False False
kube-controller-manager 4.15.57 True False False
kube-scheduler 4.15.57 True False False
kube-storage-version-migrator 4.15.57 True False False
machine-api 4.15.57 True False False
machine-approver 4.15.57 True False False
machine-config 4.15.57 True False False
marketplace 4.15.57 True False False
monitoring 4.15.57 True False False
network 4.15.57 True True False DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics..."
node-tuning 4.15.57 True False False
openshift-apiserver 4.15.57 True False False
openshift-controller-manager 4.15.57 True False False
openshift-samples 4.15.57 True False False
operator-lifecycle-manager 4.15.57 True False False
operator-lifecycle-manager-catalog 4.15.57 True False False
operator-lifecycle-manager-packageserver 4.15.57 True False False
service-ca 4.15.57 True False False
storage 4.15.57 True False False
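On a cluster with many operators, it can help to filter the list down to the ones that need attention. The following is a minimal sketch, not part of ocp_insights.py, that assumes rows in the form NAME VERSION AVAILABLE PROGRESSING DEGRADED followed by an optional REASON, as shown above. The function name unhealthy_operators is hypothetical.

```python
def unhealthy_operators(report: str) -> list:
    """Return (name, reason) for operators that are not in a steady state.

    An operator is reported when it is not Available, is Progressing,
    or is Degraded.
    """
    problems = []
    for line in report.splitlines():
        # Split at most 5 times so the free-text REASON stays in one piece.
        fields = line.split(maxsplit=5)
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, _version, available, progressing, degraded = fields[:5]
        if {available, progressing, degraded} - {"True", "False"}:
            continue  # not an operator row
        reason = fields[5] if len(fields) > 5 else ""
        if available != "True" or progressing == "True" or degraded == "True":
            problems.append((name, reason))
    return problems


sample = """\
Cluster Operators:
NAME VERSION AVAILABLE PROGRESSING DEGRADED REASON
authentication 4.15.57 True False True OAuthServerConfigObservationDegraded: error validating configMap openshift-config/ca-config-map: certificate expired:...
baremetal 4.15.57 True False False
dns 4.15.57 True True False DNS "default" reports Progressing=True: "Have 26 available DNS pods, want 27."
etcd 4.15.57 True False False
"""

for name, reason in unhealthy_operators(sample):
    print(f"{name}: {reason}")
```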
Namespace Events
The --events option returns any events collected by the Insights Operator when the archive was created.
ocp_insights.py --events --file ./Cluster_2/insights.tar.gz
Namespace Errors:
NAMESPACE TYPE REASON TIME
openshift-authentication-operator Warning OpenShiftAPICheckFailed 2025-11-17 22:23:11
openshift-authentication-operator Warning OpenShiftAPICheckFailed 2025-11-17 22:23:11
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 21:09:54
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 21:39:58
openshift-oauth-apiserver Warning ProbeError 2025-11-17 21:39:58
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 22:49:50
openshift-oauth-apiserver Warning FailedToUpdateEndpointSlices 2025-11-17 22:49:58
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 22:50:00
openshift-oauth-apiserver Warning ProbeError 2025-11-17 22:50:01
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 22:50:17
openshift-oauth-apiserver Warning ProbeError 2025-11-17 22:50:22
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 22:50:22
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 22:59:51
openshift-oauth-apiserver Warning ProbeError 2025-11-17 22:59:52
openshift-oauth-apiserver Warning ProbeError 2025-11-17 23:00:24
openshift-oauth-apiserver Warning ProbeError 2025-11-17 23:00:35
openshift-oauth-apiserver Warning ProbeError 2025-11-17 23:00:35
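Long event listings are easier to read once they are grouped. The sketch below is not part of ocp_insights.py. It assumes each event row has the form NAMESPACE TYPE REASON DATE TIME, as shown above, and counts how often each namespace and reason pair occurs. The function name count_events is hypothetical.

```python
from collections import Counter


def count_events(report: str) -> Counter:
    """Count events per (namespace, reason) from --events text output."""
    counts = Counter()
    for line in report.splitlines():
        fields = line.split()
        # Event rows have five fields; the header row has only four.
        if len(fields) == 5 and fields[1] in ("Warning", "Normal"):
            counts[(fields[0], fields[2])] += 1
    return counts


sample = """\
Namespace Errors:
NAMESPACE TYPE REASON TIME
openshift-authentication-operator Warning OpenShiftAPICheckFailed 2025-11-17 22:23:11
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 21:09:54
openshift-oauth-apiserver Warning Unhealthy 2025-11-17 21:39:58
openshift-oauth-apiserver Warning ProbeError 2025-11-17 21:39:58
"""

for (namespace, reason), total in count_events(sample).most_common():
    print(f"{total:>3}  {namespace}  {reason}")
```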
Customer Namespace Memory Usage
The --customer_memory option returns the memory usage of each Customer Namespace based on the metric container_memory_usage_bytes which is collected by the Insights Operator.
ocp_insights.py --customer_memory --file ./Cluster_2/insights.tar.gz | grep -m1 'Customer Namespace Memory Usage' -A73
Customer Namespace Memory Usage:
NAMESPACE MEMORY
app-dev 3.82 GB
app-test 4.55 GB
app-uat 2.67 GB
apps-dev 11.40 GB
apps-test 12.68 GB
apps-uat 13.92 GB
aseventas-uat 97.91 MB
asociados-test 635.70 MB
assistants-dev 338.67 MB
assistants-test 290.35 MB
assistants-uat 321.08 MB
cicd 7.17 GB
cloudintegration-test 359.88 MB
cloudintegration-uat 542.36 MB
community-dev 3.74 GB
community-test 3.61 GB
community-uat 2.40 GB
concepts-dev 514.27 MB
concepts-test 503.07 MB
dev-uat 0.00 MB
empresas-test 611.77 MB
groups-dev 601.91 MB
groups-test 591.89 MB
kafka-dev 2.63 GB
kafka-test 7.49 GB
kafka-uat 14.17 GB
liquid-test 691.91 MB
liquid-uat 604.99 MB
mediflex-test 2.34 GB
mediflex-uat 315.70 MB
minio-operator 903.01 MB
nexus 2.28 GB
nfs-provisioner 38.49 MB
notification-observability 292.96 MB
poc 3.08 GB
pos-system-dev 17.33 MB
pos-system-test 248.90 MB
prefab-dev 626.19 MB
prefab-test 647.61 MB
presto-test 672.07 MB
presto-uat 703.39 MB
providers-test 2.95 GB
providers-uat 3.57 GB
revproxy-test 578.97 MB
revproxy-uat 248.02 MB
rhdh-operator 1.00 GB
rhpam-73 2.85 GB
sandbox 569.27 MB
sigo-dev 7.29 GB
sigo-test 8.38 GB
sigo-uat 7.86 GB
sso 3.42 GB
sso-lab 1.82 GB
testafip-test 660.54 MB
testafip-uat 882.96 MB
testcomisiones-test 355.80 MB
testmobile-test 1.49 GB
testmobile-uat 1.30 GB
testproviders-test 4.80 GB
testweb-test 1.55 GB
testweb-uat 1.18 GB
threescale 7.36 GB
threescale-apicast-test 75.05 MB
userdevops-test 6.01 GB
users-test 264.38 MB
users-uat 308.11 MB
valor-dev 315.38 MB
valor-test 335.54 MB
zabbix 224.45 MB
Total Customer Namespace Memory Usage: 176.41 GB
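The listing is sorted by namespace name, so the largest consumers are not obvious at a glance. The following is a minimal sketch, not part of ocp_insights.py, that parses rows of the form "<namespace> <value> <MB|GB>" and ranks them by usage. It assumes 1 GB = 1024 MB, which may not match how the script converts units. The names parse_memory and top_consumers are hypothetical.

```python
# Assumption: binary units (1 GB = 1024 MB); the script may convert differently.
MB_PER_GB = 1024.0


def parse_memory(report: str) -> dict:
    """Map namespace -> memory usage in GB from --customer_memory text."""
    usage = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows have exactly three fields; titles and totals do not.
        if len(fields) != 3 or fields[2] not in ("MB", "GB"):
            continue
        try:
            value = float(fields[1])
        except ValueError:
            continue
        usage[fields[0]] = value if fields[2] == "GB" else value / MB_PER_GB
    return usage


def top_consumers(usage: dict, count: int = 3) -> list:
    """Return the `count` largest (namespace, GB) pairs, largest first."""
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:count]


sample = """\
Customer Namespace Memory Usage:
NAMESPACE MEMORY
apps-test 12.68 GB
apps-uat 13.92 GB
kafka-uat 14.17 GB
nfs-provisioner 38.49 MB
Total Customer Namespace Memory Usage: 176.41 GB
"""

for namespace, gigabytes in top_consumers(parse_memory(sample), 2):
    print(f"{namespace}: {gigabytes:.2f} GB")
```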
etcd Metrics
Along with the customer namespace metrics, we also collect several etcd metrics including etcd_server_slow_apply_total and etcd_server_slow_read_indexes_total.
These two metrics are good indicators of performance issues with the underlying disk that supports etcd. Tracking them over multiple Insights Archives is a good way to determine if the cluster is suffering from etcd performance problems.
The ocp_insights.py script currently returns etcd_server_slow_apply_total, which indicates how many "took too long" messages have occurred since the pod’s last restart.
ocp_insights.py --etcd_metrics --file ./Cluster_4/insights.tar.gz
etcd-ocpmstr1.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,7
etcd-ocpmstr2.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,1568
etcd-ocpmstr3.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,181506
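Because the counter only grows until the pod restarts, the change between two archives is more telling than a single value. The sketch below is not part of ocp_insights.py. It assumes each row is "<pod name>,<timestamp>,<count>" as shown above, and the "older" values are invented purely for illustration. The function names are hypothetical.

```python
def parse_slow_apply(report: str) -> dict:
    """Map etcd member -> slow apply count from --etcd_metrics text."""
    counts = {}
    for line in report.splitlines():
        parts = line.strip().split(",")
        if len(parts) == 3 and parts[2].isdigit():
            counts[parts[0]] = int(parts[2])
    return counts


def slow_apply_growth(older: dict, newer: dict) -> dict:
    """Slow applies added per member between two archives."""
    growth = {}
    for member, count in newer.items():
        previous = older.get(member, 0)
        # A lower value than before means the counter was reset by a pod
        # restart, so everything in the newer archive accumulated since then.
        growth[member] = count - previous if count >= previous else count
    return growth


newer = parse_slow_apply("""\
etcd-ocpmstr1.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,7
etcd-ocpmstr2.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,1568
etcd-ocpmstr3.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,181506
""")

# Hypothetical earlier archive, invented for illustration only.
older = {
    "etcd-ocpmstr1.openshift.example.com": 7,
    "etcd-ocpmstr2.openshift.example.com": 1500,
    "etcd-ocpmstr3.openshift.example.com": 150000,
}

growth = slow_apply_growth(older, newer)
for member, delta in sorted(growth.items(), key=lambda item: item[1], reverse=True):
    print(f"{member}: +{delta}")
```

A member whose count grows far faster than its peers, as ocpmstr3 does in this made-up comparison, is a reasonable place to start looking at disk latency.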
Control-Plane Node Logs
The Insights Operator collects control-plane node log lines that match a defined set of substrings, listed here: Control Plane Node Logs
Reviewing these logs helps you understand failures happening on the control-plane nodes, so you can quickly identify and narrow down any issues.
ocp_insights.py --file Cluster_4/insights.tar.gz --node_logs ocpmstr1.openshift.example.com
Node Logs for: ocpmstr1.openshift.example.com
================================================================================
Dec 09 05:45:42.865541 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:45:42.865519 2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"e4ddc0cccb0810f9151de5a685b32203aa4d2da65a293312637b4319b2d5bf6d\": container with ID starting with e4ddc0cccb0810f9151de5a685b32203aa4d2da65a293312637b4319b2d5bf6d not found: ID does not exist" containerID="e4ddc0cccb0810f9151de5a685b32203aa4d2da65a293312637b4319b2d5bf6d"
Dec 09 05:45:42.865865 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:45:42.865847 2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"3880f1b1f78d7dc0e92d05003f4c8b89306882b48435812afa595d390ecad271\": container with ID starting with 3880f1b1f78d7dc0e92d05003f4c8b89306882b48435812afa595d390ecad271 not found: ID does not exist" containerID="3880f1b1f78d7dc0e92d05003f4c8b89306882b48435812afa595d390ecad271"
Dec 09 05:46:35.557704 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:46:35.557659 2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:47:35.583757 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:35.583731 2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:47:36.196500 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:36.196468 2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"b2e93c0ddaf39f460c1ec556dd9cd08a6f860a69c0dc8c8dc03d914661a9880a\": container with ID starting with b2e93c0ddaf39f460c1ec556dd9cd08a6f860a69c0dc8c8dc03d914661a9880a not found: ID does not exist" containerID="b2e93c0ddaf39f460c1ec556dd9cd08a6f860a69c0dc8c8dc03d914661a9880a"
Dec 09 05:47:36.196881 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:36.196830 2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"6e3fc4a4935538bf5761bfdcc4e19f12d276381ca5a766617c67e828d1290cac\": container with ID starting with 6e3fc4a4935538bf5761bfdcc4e19f12d276381ca5a766617c67e828d1290cac not found: ID does not exist" containerID="6e3fc4a4935538bf5761bfdcc4e19f12d276381ca5a766617c67e828d1290cac"
Dec 09 05:47:36.197244 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:36.197218 2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"b03c2767e27a93a9f1f7c037fb5d667b20e12b457aebc7756ff9e10d4653d22a\": container with ID starting with b03c2767e27a93a9f1f7c037fb5d667b20e12b457aebc7756ff9e10d4653d22a not found: ID does not exist" containerID="b03c2767e27a93a9f1f7c037fb5d667b20e12b457aebc7756ff9e10d4653d22a"
Dec 09 05:48:35.538461 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:48:35.538417 2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:49:35.544008 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:49:35.543968 2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:50:35.544143 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:50:35.544108 2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
...
Additional Output
Looking at the full --file output, we can review additional information such as:
ocp_insights.py --file ./Cluster_4/insights.tar.gz
Install Plans
The Insights Operator captures all Install Plans on the cluster, which is extremely helpful for seeing current and previously installed operators. We return CSV and Namespace.
...
Install Plans:
CSV NAMESPACE
cephcsi-operator.v4.19.5-rhodf openshift-storage
cephcsi-operator.v4.19.7-rhodf openshift-storage
kubernetes-nmstate-operator.4.19.0-202510081435 openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202510142112 openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202510291015 openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202511111644 openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202511260712 openshift-nmstate
kubevirt-hyperconverged-operator.v4.19.1 openshift-cnv
kubevirt-hyperconverged-operator.v4.19.12 openshift-cnv
kubevirt-hyperconverged-operator.v4.19.15 openshift-cnv
kubevirt-hyperconverged-operator.v4.19.6 openshift-cnv
local-storage-operator.v4.19.0-202510071855 openshift-local-storage
local-storage-operator.v4.19.0-202510142112 openshift-local-storage
local-storage-operator.v4.19.0-202510291015 openshift-local-storage
local-storage-operator.v4.19.0-202511102034 openshift-local-storage
local-storage-operator.v4.19.0-202511260712 openshift-local-storage
mcg-operator.v4.19.6-rhodf openshift-storage
recipe.v4.19.8-rhodf openshift-storage
rook-ceph-operator.v4.19.4-rhodf openshift-storage
sriov-network-operator.v4.19.0-202510060211 openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202510142112 openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202510211212 openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202511102034 openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202511260712 openshift-sriov-network-operator
...
OLM Operators
The Insights Operator captures all OLM Operators on the cluster, which is extremely helpful for seeing current and previously installed operators. We return Name, Display Name, Version, and Namespace when applicable.
...
Installed OLM Operators:
NAME                         DISPLAY NAME                      VERSION               NAMESPACE
axyom                                                                                5gc
axyom                                                                                openshift-operators
casa-redis-operator          Casa Redis Operator               v0.2.33               5gc
casa-redis-operator                                                                  openshift-operators
cephcsi-operator             CephCSI operator                  v4.19.8-rhodf         openshift-storage
kubernetes-nmstate-operator  Kubernetes NMState Operator       4.19.0-202511260712   openshift-nmstate
kubevirt-hyperconverged      OpenShift Virtualization          v4.19.15              openshift-cnv
local-storage-operator       Local Storage                     v4.19.0-202511260712  openshift-local-storage
mcg-operator                 NooBaa Operator                   v4.19.8-rhodf         openshift-storage
ocs-client-operator          OpenShift Data Foundation Client  v4.19.8-rhodf         openshift-storage
ocs-operator                 OpenShift Container Storage       v4.19.8-rhodf         openshift-storage
odf-csi-addons-operator      CSI Addons                        v4.19.8-rhodf         openshift-storage
odf-dependencies             Data Foundation Dependencies      v4.19.8-rhodf         openshift-storage
odf-operator                 OpenShift Data Foundation         v4.19.8-rhodf         openshift-storage
odf-prometheus-operator      Prometheus Operator               v4.19.8-rhodf         openshift-storage
recipe                       Recipe                            v4.19.8-rhodf         openshift-storage
rook-ceph-operator           Rook-Ceph                         v4.19.8-rhodf         openshift-storage
sgwc-operator                SGWC Operator                     v0.6.5                5gc
sgwc-operator                                                                        openshift-operators
sriov-network-operator       SR-IOV Network Operator           v4.19.0-202511260712  openshift-sriov-network-operator
upf-operator                 UPF Operator                      v1.9.42               5gc
upf-operator                                                                         openshift-operators
...
MachineConfigPools and MachineSets
The Insights Operator also collects MachineConfigPool and MachineSet information and returns them when applicable.
...
MachineConfigPools:
NAME CONFIG PAUSED UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT
master rendered-master-8831ba6d556d1c6a582116beaa537dbb False True False False 3 3 3 0
worker rendered-worker-b33efe42325e084f9dcef59f47b93fc9 False True False False 5 5 5 0
MachineSets:
NAME DESIRED CURRENT READY AVAILABLE
prodshift-2nvq7-dmz 2 2 2 2
prodshift-2nvq7-worker 3 3 3 3
...
Storage Classes
For customers using persistent storage via OpenShift Data Foundation or through a third party such as Portworx, Infinidat, or VMware, we collect storage class information, which helps determine what storage the cluster is using.
...
StorageClasses:
NAME PROVISIONER RECLAIM POLICY BINDING MODE VOLUME EXPANSION
5gc-nfs-storage cluster.local/nfs-provisioner-nfs-subdir-external-provisioner Delete Immediate True
...
Cluster Memory Usage
As with the Customer Namespace Memory Usage option, the Insights Operator also collects the memory usage of the cluster namespaces. Comparing this across multiple Insights archives is a good way to spot large growth in memory usage and to see how memory usage compares to the size of the cluster.
...
Cluster Namespace Memory Usage:
NAMESPACE MEMORY
openshift-apiserver 1.21 GB
openshift-apiserver-operator 237.52 MB
openshift-authentication 204.72 MB
openshift-authentication-operator 192.42 MB
openshift-catalogd 400.57 MB
openshift-cloud-controller-manager-operator 186.32 MB
openshift-cloud-credential-operator 125.80 MB
openshift-cluster-machine-approver 114.22 MB
openshift-cluster-node-tuning-operator 1.10 GB
openshift-cluster-olm-operator 92.02 MB
openshift-cluster-samples-operator 75.02 MB
openshift-cluster-storage-operator 232.67 MB
openshift-cluster-version 116.95 MB
openshift-cnv 2.99 GB
openshift-config-operator 76.99 MB
...
openshift-route-controller-manager 236.91 MB
openshift-service-ca 185.80 MB
openshift-service-ca-operator 80.82 MB
openshift-sriov-network-operator 660.84 MB
openshift-storage 2.09 GB
Total Cluster Namespace Memory Usage: 72.78 GB
...
Restarting Containers
The Insights Operator collects information on container restarts which is extremely useful when trying to understand problems in the cluster. We return Namespace, Pod Name, Container Name, Restart Count, and the last Restart Time.
...
Containers with more than 3 restarts:
NAMESPACE POD NAME CONTAINER NAME RESTARTS RESTART TIME
openshift-apiserver apiserver-75d755b989-qplnj openshift-apiserver 35 2025-12-08 18:32:39
openshift-apiserver-operator openshift-apiserver-operator-c6fcd76c-w9dhr openshift-apiserver-operator 23 2025-12-08 17:08:43
openshift-authentication oauth-openshift-5f85d8547d-j5pgl oauth-openshift 14 2025-12-08 16:46:27
openshift-authentication oauth-openshift-5f85d8547d-j9vhg oauth-openshift 8 2025-12-08 18:33:11
openshift-authentication-operator authentication-operator-55c75748d8-hrmjj authentication-operator 28 2025-12-08 17:08:57
openshift-catalogd catalogd-controller-manager-644c7b6647-jwbvm manager 37 2025-12-08 18:35:47
openshift-cloud-controller-manager-operator cluster-cloud-controller-manager-operator-7594f66d79-kp5wk cluster-cloud-controller-manager 12 2025-12-08 16:39:36
openshift-cloud-controller-manager-operator cluster-cloud-controller-manager-operator-7594f66d79-kp5wk config-sync-controllers 12 2025-12-08 16:39:08
openshift-cluster-machine-approver machine-approver-66dd4cccf8-6h2cq machine-approver-controller 28 2025-12-08 18:35:36
openshift-cluster-version cluster-version-operator-76d4d9c9cb-2jjg9 cluster-version-operator 8 2025-12-08 18:32:45
openshift-cnv aaq-operator-69dbd4dbcc-f8kht aaq-operator 113 2025-12-09 02:16:23
openshift-cnv cdi-deployment-8d44554f7-dzhmt cdi-deployment 120 2025-12-09 04:21:18
openshift-cnv cdi-operator-655b595b9d-gm89z cdi-operator 120 2025-12-09 04:16:16
openshift-cnv hco-operator-67d948fc78-l7ff6 hyperconverged-cluster-operator 126 2025-12-09 04:21:30
openshift-cnv hco-webhook-8546b78db4-nh89x hyperconverged-cluster-webhook 140 2025-12-09 04:21:19
openshift-cnv hostpath-provisioner-operator-98d45dd45-v9xzw hostpath-provisioner-operator 36 2025-12-09 02:16:22
openshift-cnv kubevirt-ipam-controller-manager-6998cd6677-zhmrd manager 116 2025-12-09 04:16:24
...
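To see which namespaces are restarting the most, the restart counts can be summed per namespace. This is a minimal sketch, not part of ocp_insights.py, that assumes data rows of the form NAMESPACE POD CONTAINER RESTARTS DATE TIME, as shown above. The function name restarts_by_namespace is hypothetical.

```python
from collections import Counter


def restarts_by_namespace(report: str) -> Counter:
    """Sum container restart counts per namespace."""
    totals = Counter()
    for line in report.splitlines():
        fields = line.split()
        # Data rows have six fields with a numeric restart count in the fourth.
        if len(fields) == 6 and fields[3].isdigit():
            totals[fields[0]] += int(fields[3])
    return totals


sample = """\
Containers with more than 3 restarts:
NAMESPACE POD NAME CONTAINER NAME RESTARTS RESTART TIME
openshift-apiserver apiserver-75d755b989-qplnj openshift-apiserver 35 2025-12-08 18:32:39
openshift-authentication oauth-openshift-5f85d8547d-j5pgl oauth-openshift 14 2025-12-08 16:46:27
openshift-authentication oauth-openshift-5f85d8547d-j9vhg oauth-openshift 8 2025-12-08 18:33:11
openshift-cnv aaq-operator-69dbd4dbcc-f8kht aaq-operator 113 2025-12-09 02:16:23
openshift-cnv cdi-deployment-8d44554f7-dzhmt cdi-deployment 120 2025-12-09 04:21:18
"""

for namespace, total in restarts_by_namespace(sample).most_common():
    print(f"{namespace}: {total}")
```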
Alerts
The Insights Operator collects all alerts that are firing on the cluster. We return Alert Name, State, and Start Time.
...
ALERT NAME STATE START TIME
InsightsRecommendationActive ACTIVE 2025-12-08 20:31:14.889
CDINoDefaultStorageClass ACTIVE 2025-12-08 20:22:13.091
KubeDaemonSetRolloutStuck ACTIVE 2025-12-08 12:37:32.089
KubeJobFailed ACTIVE 2025-12-08 12:37:32.089
AlertmanagerReceiversNotConfigured ACTIVE 2025-12-08 12:37:25.872
UpdateAvailable ACTIVE 2025-12-08 18:35:39.959
KubeJobFailed ACTIVE 2025-12-08 12:37:32.089
ClusterOperatorDegraded ACTIVE 2025-12-08 19:05:31.735
CDIStorageProfilesIncomplete ACTIVE 2025-12-09 02:22:13.091
KubeJobFailed ACTIVE 2025-12-08 12:37:32.089
ClusterOperatorDegraded ACTIVE 2025-12-08 20:55:01.735
KubeJobFailed ACTIVE 2025-12-08 12:37:32.089
...
PodNetworkConnectivityChecks
The Insights Operator also collects PodNetworkConnectivityChecks. This is a service that runs on a Compute node in the cluster and connects to listener pods on each node. It checks various services and helps determine whether nodes or services have become unreachable.
...
PodNetworkConnectivityChecks:
ERROR TIMESTAMP
kubernetes-apiserver-endpoint-ocpmstr3 2025-12-08 18:27:40Z
kubernetes-apiserver-service-cluster 2025-12-08 16:36:38Z
kubernetes-apiserver-service-cluster 2025-12-08 18:30:38Z
kubernetes-default-service-cluster-0 2025-12-08 17:10:38Z
kubernetes-default-service-cluster-0 2025-12-08 18:30:38Z
openshift-apiserver-endpoint-ocpmstr3 2025-12-08 18:32:29Z
openshift-apiserver-service-cluster 2025-12-08 16:40:38Z
...
Conditional Update Risks
Red Hat provides Conditional Update Risks to customers so that they are aware of any potential risk of a failed upgrade or an issue post upgrade.
...
Conditional Update Risks:
RISK REFERENCE AFFECTED_VERSIONS
ConsoleCrashOnMissingPlugin https://issues.redhat.com/browse/CONSOLE-4762 4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.7, 4.19.9
HyperShiftClusterVersionOperatorMetrics https://issues.redhat.com/browse/OTA-1705 4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.14, 4.19.15, 4.19.16, 4.19.9
HyperShiftProxyScheme https://issues.redhat.com/browse/CNTRLPLANE-1407 4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.4, 4.19.5, 4.19.6, 4.19.7, 4.19.9
MachineConfigNodesV1AlphaControlPlaneLabels https://issues.redhat.com/browse/MCO-1890 4.19.12, 4.19.13
NMStateServiceFailure https://issues.redhat.com/browse/CORENET-6419 4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.9
NetworkManagerOVNBridgeMapping https://issues.redhat.com/browse/CORENET-6483 4.19.13, 4.19.14, 4.19.15, 4.19.16, 4.19.17, 4.19.18
OSUpdateFailureDueToImagePullPolicy https://issues.redhat.com/browse/MCO-1896 4.19.12, 4.19.13, 4.19.14, 4.19.15
RuncShareProcessNamespace https://issues.redhat.com/browse/RUN-3748 4.19.19
SCOSBootImage https://issues.redhat.com/browse/COS-3765 4.19.18
...
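When planning an upgrade, you usually only care about the risks that apply to one target version. The sketch below is not part of ocp_insights.py. It assumes each row is "<risk> <reference URL> <comma-separated versions>" as shown above, and that AFFECTED_VERSIONS lists the versions each risk applies to. The function name risks_for_version is hypothetical.

```python
def risks_for_version(report: str, version: str) -> list:
    """Return (risk, reference) pairs whose affected versions include `version`."""
    matches = []
    for line in report.splitlines():
        # Split into risk name, reference URL, and the rest (the version list).
        fields = line.split(maxsplit=2)
        if len(fields) != 3 or not fields[1].startswith("http"):
            continue
        affected = {item.strip() for item in fields[2].split(",")}
        if version in affected:
            matches.append((fields[0], fields[1]))
    return matches


sample = """\
Conditional Update Risks:
RISK REFERENCE AFFECTED_VERSIONS
MachineConfigNodesV1AlphaControlPlaneLabels https://issues.redhat.com/browse/MCO-1890 4.19.12, 4.19.13
OSUpdateFailureDueToImagePullPolicy https://issues.redhat.com/browse/MCO-1896 4.19.12, 4.19.13, 4.19.14, 4.19.15
RuncShareProcessNamespace https://issues.redhat.com/browse/RUN-3748 4.19.19
SCOSBootImage https://issues.redhat.com/browse/COS-3765 4.19.18
"""

for risk, reference in risks_for_version(sample, "4.19.13"):
    print(f"{risk}: {reference}")
```

Matching on the exact version string avoids false positives such as 4.19.1 matching 4.19.13.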
Lab Scenarios
Scenario 1
The customer reported issues with the cluster and stated that Cluster Operators are failing and that there are issues with Container Networking. Review the output of the script and look at the Cluster Operators to determine which operators are having issues. Then review the restarting pods to find the networking-related pod that is having issues and last restarted on 2025-11-25. Finally, check the alerts and look for OVN-related alerts.
To dig deeper, the matching Must-Gather is located in ~/Module4/Cluster_1
ocp_insights.py --file ./Cluster_1/insights.tar.gz
ocp_insights.py --file ./Cluster_1/insights.tar.gz --alerts | jq -r . | less
Scenario 2
The customer is having issues logging into the Cluster. They are currently falling back on the kubeadmin account. Review the output of the script to identify potential issues and potential resolutions.
ocp_insights.py --file ./Cluster_2/insights.tar.gz
Scenario 3
The customer reports that the cluster is very unstable, multiple cluster operators are down, and they need help immediately. Review the control-plane nodes and take note of the Creation Dates. Review the numerous errors in the cluster operators, paying specific attention to the reasons. Look at the failing pods, and at the namespace events and their timestamps. Finally, review the etcd_metrics and see if you can spot a major issue.
ocp_insights.py --file ./Cluster_3/insights.tar.gz --node_info
ocp_insights.py --file ./Cluster_3/insights.tar.gz --cluster_operators
ocp_insights.py --file ./Cluster_3/insights.tar.gz --etcd_metrics