Using OpenShift Lightspeed (Insights) to Proactively Analyze a Cluster

OpenShift Lightspeed (Insights) is a free tool that runs on your OpenShift cluster. It collects a specific data set every 2 hours, anonymizes it, and sends it to Red Hat for review. The archives are stored on SupportShell, where Support Engineers and Technical Account Managers can review the data.

In this module, we will review how to use a tool that parses an OpenShift Lightspeed (Insights) archive so you can proactively analyze any OpenShift cluster.

You can find more details on OpenShift Lightspeed (Insights) and the entire remote health monitoring package in our documentation: About remote health monitoring

You can find details on what OpenShift Lightspeed (Insights) collects here: Showing data collected by remote health monitoring

You can also review the source code to see additional information on what is collected here: GitHub: Insights Operator - gather_most_recent_metrics.go

The ocp_insights.py script

With this script, we can parse the Insights archive, review customer namespace memory usage, look for namespaces with overlapping UIDs, review storage classes, and look at etcd metrics.

cd ~/Module3/
ocp_insights.py -h
ocp_insights.py --help
usage: ocp_insights.py [-h] [--id ID] [--file FILE] [--alerts] [--customer_memory] [--etcd_metrics] [--events] [--list] [--extract] [--cluster_info] [--node_info] [--cluster_operators] [--remote] [--server SERVER] [--node_logs NODE_NAME]

OpenShift Insights Cluster Report.

options:
  -h, --help            show this help message and exit
  --id ID               ClusterID of a connected cluster used to find all connected Clusters
  --file FILE           Use a specific Insights Archive File; must specify full path.
  --alerts              Prints out Alerts in valid JSON
  --customer_memory     Prints Customer Namespace memory usage.
  --etcd_metrics        Prints etcd Slow Apply metrics for all Insights Archives for the cluster.
  --events              Prints namespace events if they exist.
  --list                List available archives for a specific cluster. Must be used with --id option. Can be combined with --extract.
  --extract             Extract archive for a specific cluster to user's home directory. Must be used with --id option. Can be combined with --list to select which archive to extract.
  --cluster_info        Prints only cluster information (ID, name, version, platform, network, encryption, etc.).
  --node_info           Prints only node information (name, status, role, version, OS, CPU, memory).
  --cluster_operators   Prints only cluster operator information (name, version, status).
  --remote              Connect to remote server to perform analysis
  --server SERVER       Remote server to connect to (overrides default). Use with --remote option.
  --node_logs NODE_NAME
                        Print logs for a master node. Provide the full node name (FQDN). Master nodes only.
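
Before running the script, it can help to see what an archive contains. The following is a minimal sketch, not part of ocp_insights.py, that lists the files in a gzip-compressed tar archive using Python's standard tarfile module. The member paths in the sample archive are hypothetical placeholders; a real Insights archive may be laid out differently.

```python
import io
import tarfile


def list_archive_members(fileobj, prefix=""):
    """Return the sorted names of regular files in a .tar.gz archive.

    `prefix` optionally narrows the listing to a single directory.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.startswith(prefix)
        )


def build_sample_archive():
    """Build a tiny in-memory archive so the sketch runs without a real file."""
    sample_files = {
        "config/version.json": b"{}",         # hypothetical path
        "config/node/master0.json": b"{}",    # hypothetical path
        "events/openshift-etcd.json": b"{}",  # hypothetical path
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, body in sample_files.items():
            info = tarfile.TarInfo(name)
            info.size = len(body)
            archive.addfile(info, io.BytesIO(body))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_archive_members(build_sample_archive(), prefix="config/"):
        print(name)
```

To inspect a real archive, open it in binary mode and pass the file object instead, for example open("./Cluster_1/insights.tar.gz", "rb").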

What is in the output?

  1. This script was written to parse Insights Archives and display the data the same way OpenShift’s CLI outputs it. All of the following information is output when running the script unless specifically stated below.

    • We include the following information:

      • ClusterVersion

      • Channel

      • Previously Installed Versions

      • Platform

        • VMware, AWS, Nutanix, IBMCloud, etc

      • NetworkType

      • Proxy Configuration

      • Ingress and API IP Addresses

      • etcd Encryption

      • Audit Profile

      • Node Information

        • Name, State, Role, Created Date, Version, OS, CPU, and Memory

      • Cluster Operator status with the same output as oc get clusteroperators

      • Installed Operators

      • Installed OLM Operators

        • Display Name, Version, and Namespace

      • MachineConfigPools

      • MachineSets

      • Failing Pods

      • Alerts

      • PodNetworkConnectivityChecks

      • Namespace Events^

      • Alerts in JSON^

      • Customer Namespace Memory Usage^

      • Control-Plane Node Error Logs^

    • ^ Indicates an additional flag is needed to view the data.

Cluster Information

The --cluster_info option shows a detailed breakdown of the high-level information about your cluster, including ID, Name, Version, Channel, Cluster Status, Platform, Install Type, and more.

Viewing Cluster Information
ocp_insights.py --cluster_info --file ./Cluster_1/insights.tar.gz
Cluster ID: DB08D743-5559-4259-8ABB-7C5B528439B7
Cluster Name: prod.example.com
Cluster Version: 4.15.57
Channel: stable-4.15
Previous Versions: 4.15.57, 4.15.44, 4.15.43, 4.15.37, 4.15.24, 4.15.20, 4.15.19, 4.15.18, 4.15.17, 4.15.15, 4.15.14, 4.15.13,
4.15.12, 4.15.11, 4.14.23, 4.14.21, 4.13.40, 4.12.55, 4.11.59, 4.11.57, 4.11.56, 4.11.55, 4.11.54, 4.11.53, 4.11.50, 4.11.49,
4.11.48, 4.11.30, 4.10.37
Cluster Status: Failing
Reason: ClusterOperatorDegraded
Message: Cluster operator authentication is degraded
Platform: None
Install Type: UPI
Network Type: OVNKubernetes
IPsec: Enabled
Proxy Settings:
   HTTP:  False
   HTTPS: False

etcd Encryption: AES-CBC
Audit Profile: Default

Node Information

The --node_info option displays all of the node information, including Node Name, Node Status, Role(s), Creation Date, Kubelet Version, CoreOS Version, and the CPU and Memory details of each server.

The total memory returned is the memory available to the Kubelet and is not necessarily equal to the physical amount of memory on the node.

Viewing Node Information
ocp_insights.py --node_info --file ./Cluster_2/insights.tar.gz
Node Information:

NAME         READY  ROLE                  CREATED ON           VERSION           OS                                                     CPU  MEMORY
master0-dev  True   control-plane,master  2024-05-11 07:30:21  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  6    31 GB
master1-dev  True   control-plane,master  2024-05-11 07:30:23  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  6    31 GB
master2-dev  True   control-plane,master  2024-05-11 07:30:20  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  6    31 GB
infra0-dev   True   infra                 2024-05-11 07:48:46  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  4    16 GB
infra1-dev   True   infra                 2024-05-11 07:48:45  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  4    16 GB
infra2-dev   True   infra                 2024-05-11 07:48:43  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  4    16 GB
worker0-dev  True   worker                2024-05-11 07:48:40  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker1-dev  True   worker                2024-05-11 07:48:41  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker2-dev  True   worker                2024-05-12 11:15:08  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker3-dev  True   worker                2025-01-22 15:47:13  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker4-dev  True   worker                2025-01-22 16:19:06  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
worker5-dev  True   worker                2025-01-22 16:53:23  v1.29.10+67d3387  Red Hat Enterprise Linux CoreOS 416.94.202412170927-0  12   79 GB
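
Kubernetes reports node memory as a quantity string with a binary suffix, and a node's allocatable memory is its capacity minus system reservations, which is one reason a reported figure can fall below the physical RAM. The sketch below shows how such a quantity could be converted to GiB. The value 32779116Ki is a hypothetical example, and this is an illustration only, not the conversion ocp_insights.py performs.

```python
# Binary suffixes used by Kubernetes resource quantities.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity):
    """Convert a quantity such as '32779116Ki' to a byte count."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain byte count


def to_gib(quantity):
    """Convert a quantity string to GiB as a float."""
    return quantity_to_bytes(quantity) / 1024**3


if __name__ == "__main__":
    # Hypothetical value for a node sold as "32 GB".
    print(f"{to_gib('32779116Ki'):.2f} GiB")
```

Only integer quantities with binary suffixes are handled here; decimal suffixes such as M or G would need their own entries.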

Cluster Operators

The --cluster_operators option returns the same output as running oc get clusteroperators or oc get co. All Cluster Operators are returned with the Name, Version, Status, and the Reason if the operator is having issues.

Viewing Cluster Operators
ocp_insights.py --cluster_operators --file ./Cluster_1/insights.tar.gz
Cluster Operators:

NAME                                      VERSION  AVAILABLE  PROGRESSING  DEGRADED  REASON
authentication                            4.15.57  True       False        True      OAuthServerConfigObservationDegraded: error validating configMap openshift-config/ca-config-map: certificate expired:...
baremetal                                 4.15.57  True       False        False
cloud-controller-manager                  4.15.57  True       False        False
cloud-credential                          4.15.57  True       False        False
cluster-autoscaler                        4.15.57  True       False        False
config-operator                           4.15.57  True       False        False
console                                   4.15.57  True       False        False
control-plane-machine-set                 4.15.57  True       False        False
csi-snapshot-controller                   4.15.57  True       False        False
dns                                       4.15.57  True       True         False     DNS "default" reports Progressing=True: "Have 26 available DNS pods, want 27."
etcd                                      4.15.57  True       False        False
image-registry                            4.15.57  True       False        False
ingress                                   4.15.57  True       False        False
insights                                  4.15.57  True       False        False
kube-apiserver                            4.15.57  True       False        False
kube-controller-manager                   4.15.57  True       False        False
kube-scheduler                            4.15.57  True       False        False
kube-storage-version-migrator             4.15.57  True       False        False
machine-api                               4.15.57  True       False        False
machine-approver                          4.15.57  True       False        False
machine-config                            4.15.57  True       False        False
marketplace                               4.15.57  True       False        False
monitoring                                4.15.57  True       False        False
network                                   4.15.57  True       True         False     DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics..."
node-tuning                               4.15.57  True       False        False
openshift-apiserver                       4.15.57  True       False        False
openshift-controller-manager              4.15.57  True       False        False
openshift-samples                         4.15.57  True       False        False
operator-lifecycle-manager                4.15.57  True       False        False
operator-lifecycle-manager-catalog        4.15.57  True       False        False
operator-lifecycle-manager-packageserver  4.15.57  True       False        False
service-ca                                4.15.57  True       False        False
storage                                   4.15.57  True       False        False
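
On a large cluster, most rows in this table are healthy, so it is often quicker to filter for the exceptions. The following sketch assumes the column layout shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, then an optional REASON) and keeps only operators that are unavailable, progressing, or degraded. The function name is illustrative and not part of ocp_insights.py.

```python
def unhealthy_operators(table_text):
    """Return (name, reason) pairs for operators needing attention."""
    results = []
    for line in table_text.splitlines():
        # Split into at most six fields so the free-text reason stays intact.
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[0] == "NAME":
            continue  # skip headings, blank lines, and the column header
        name, _version, available, progressing, degraded = fields[:5]
        reason = fields[5] if len(fields) > 5 else ""
        if available != "True" or progressing == "True" or degraded == "True":
            results.append((name, reason))
    return results


# Rows copied from the listing above.
SAMPLE = """\
Cluster Operators:

NAME            VERSION  AVAILABLE  PROGRESSING  DEGRADED  REASON
authentication  4.15.57  True       False        True      OAuthServerConfigObservationDegraded: error validating configMap openshift-config/ca-config-map: certificate expired:...
baremetal       4.15.57  True       False        False
dns             4.15.57  True       True         False     DNS "default" reports Progressing=True: "Have 26 available DNS pods, want 27."
"""

if __name__ == "__main__":
    for name, reason in unhealthy_operators(SAMPLE):
        print(f"{name}: {reason}")
```

The script's output could be piped into a helper like this by reading sys.stdin instead of the SAMPLE string.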

Namespace Events

The --events option returns any events collected by the Insights Operator when the archive was created.

Viewing Namespace Events
ocp_insights.py --events --file ./Cluster_2/insights.tar.gz
Namespace Errors:

NAMESPACE                          TYPE     REASON                        TIME
openshift-authentication-operator  Warning  OpenShiftAPICheckFailed       2025-11-17 22:23:11
openshift-authentication-operator  Warning  OpenShiftAPICheckFailed       2025-11-17 22:23:11
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 21:09:54
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 21:39:58
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 21:39:58
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 22:49:50
openshift-oauth-apiserver          Warning  FailedToUpdateEndpointSlices  2025-11-17 22:49:58
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 22:50:00
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 22:50:01
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 22:50:17
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 22:50:22
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 22:50:22
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 22:59:51
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 22:59:52
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 23:00:24
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 23:00:35
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 23:00:35
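
Repeated events are easier to read when collapsed into a count and a time window. The sketch below assumes the four-column layout shown above (NAMESPACE, TYPE, REASON, TIME) and reports, for each namespace and reason, how many times the event appears along with its first and last timestamps. It is an illustrative helper, not a feature of ocp_insights.py.

```python
from datetime import datetime


def summarize_events(table_text):
    """Return {(namespace, reason): (count, first_seen, last_seen)}."""
    summary = {}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # data rows have namespace, type, reason, date, time
        namespace, _type, reason, day, clock = fields
        try:
            seen = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not a data row
        key = (namespace, reason)
        count, first, last = summary.get(key, (0, seen, seen))
        summary[key] = (count + 1, min(first, seen), max(last, seen))
    return summary


# Rows copied from the listing above.
SAMPLE = """\
NAMESPACE                          TYPE     REASON                        TIME
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 21:09:54
openshift-oauth-apiserver          Warning  Unhealthy                     2025-11-17 21:39:58
openshift-oauth-apiserver          Warning  ProbeError                    2025-11-17 21:39:58
"""

if __name__ == "__main__":
    for (namespace, reason), (count, first, last) in summarize_events(SAMPLE).items():
        print(f"{namespace} {reason}: {count} event(s), {first} to {last}")
```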

Customer Namespace Memory Usage

The --customer_memory option returns the memory usage of each Customer Namespace based on the metric container_memory_usage_bytes which is collected by the Insights Operator.

Viewing Customer Namespace Memory Usage
ocp_insights.py --customer_memory --file ./Cluster_2/insights.tar.gz | grep -m1 'Customer Namespace Memory Usage' -A73
Customer Namespace Memory Usage:

NAMESPACE                   MEMORY
app-dev                     3.82 GB
app-test                    4.55 GB
app-uat                     2.67 GB
apps-dev                    11.40 GB
apps-test                   12.68 GB
apps-uat                    13.92 GB
aseventas-uat               97.91 MB
asociados-test              635.70 MB
assistants-dev              338.67 MB
assistants-test             290.35 MB
assistants-uat              321.08 MB
cicd                        7.17 GB
cloudintegration-test       359.88 MB
cloudintegration-uat        542.36 MB
community-dev               3.74 GB
community-test              3.61 GB
community-uat               2.40 GB
concepts-dev                514.27 MB
concepts-test               503.07 MB
dev-uat                     0.00 MB
empresas-test               611.77 MB
groups-dev                  601.91 MB
groups-test                 591.89 MB
kafka-dev                   2.63 GB
kafka-test                  7.49 GB
kafka-uat                   14.17 GB
liquid-test                 691.91 MB
liquid-uat                  604.99 MB
mediflex-test               2.34 GB
mediflex-uat                315.70 MB
minio-operator              903.01 MB
nexus                       2.28 GB
nfs-provisioner             38.49 MB
notification-observability  292.96 MB
poc                         3.08 GB
pos-system-dev              17.33 MB
pos-system-test             248.90 MB
prefab-dev                  626.19 MB
prefab-test                 647.61 MB
presto-test                 672.07 MB
presto-uat                  703.39 MB
providers-test              2.95 GB
providers-uat               3.57 GB
revproxy-test               578.97 MB
revproxy-uat                248.02 MB
rhdh-operator               1.00 GB
rhpam-73                    2.85 GB
sandbox                     569.27 MB
sigo-dev                    7.29 GB
sigo-test                   8.38 GB
sigo-uat                    7.86 GB
sso                         3.42 GB
sso-lab                     1.82 GB
testafip-test               660.54 MB
testafip-uat                882.96 MB
testcomisiones-test         355.80 MB
testmobile-test             1.49 GB
testmobile-uat              1.30 GB
testproviders-test          4.80 GB
testweb-test                1.55 GB
testweb-uat                 1.18 GB
threescale                  7.36 GB
threescale-apicast-test     75.05 MB
userdevops-test             6.01 GB
users-test                  264.38 MB
users-uat                   308.11 MB
valor-dev                   315.38 MB
valor-test                  335.54 MB
zabbix                      224.45 MB

Total Customer Namespace Memory Usage: 176.41 GB
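
With dozens of namespaces, the largest consumers are easy to miss in an alphabetical list. The following sketch assumes the NAMESPACE and MEMORY layout shown above and ranks namespaces by memory. The factor of 1024 MB per GB is an assumption made here for illustration; the document does not state which convention the script uses.

```python
# Assumption: 1 GB = 1024 MB. The script's own convention is not documented here.
UNIT_TO_MB = {"MB": 1, "GB": 1024}


def top_namespaces(table_text, limit=3):
    """Return the `limit` largest (namespace, megabytes) pairs."""
    rows = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows look like: <namespace> <number> <unit>
        if len(fields) != 3 or fields[2] not in UNIT_TO_MB:
            continue
        rows.append((fields[0], float(fields[1]) * UNIT_TO_MB[fields[2]]))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


# Rows copied from the listing above.
SAMPLE = """\
NAMESPACE      MEMORY
apps-test      12.68 GB
apps-uat       13.92 GB
aseventas-uat  97.91 MB
kafka-uat      14.17 GB
"""

if __name__ == "__main__":
    for namespace, megabytes in top_namespaces(SAMPLE, limit=2):
        print(f"{namespace}: {megabytes:.0f} MB")
```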

etcd Metrics

Along with the customer namespace metrics, we also collect several etcd metrics including etcd_server_slow_apply_total and etcd_server_slow_read_indexes_total.

These two metrics are good indicators of performance issues with the underlying disk that supports etcd. Tracking them over multiple Insights Archives is a good way to determine whether the cluster is suffering from etcd performance problems.

The ocp_insights.py script currently returns the etcd_server_slow_apply_total metric, which indicates how many Took Too Long messages have occurred since the pod’s last restart.

Viewing etcd metrics
ocp_insights.py --etcd_metrics --file ./Cluster_4/insights.tar.gz
etcd-ocpmstr1.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,7
etcd-ocpmstr2.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,1568
etcd-ocpmstr3.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,181506
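
Because the counter only grows until the pod restarts, the useful signal is how much it changes between archives. The sketch below assumes the comma-separated layout shown above (member, timestamp, count) and compares it with an earlier snapshot. The earlier values are hypothetical, invented only to make the example runnable.

```python
import csv
import io


def parse_slow_apply(output):
    """Map each etcd member name to its slow-apply counter."""
    reader = csv.reader(io.StringIO(output))
    return {row[0]: int(row[2]) for row in reader if len(row) == 3}


def slow_apply_growth(older, newer):
    """Return per-member counter growth between two snapshots.

    A counter lower than its previous value implies the pod restarted,
    so the newer value is treated as growth since that restart.
    """
    growth = {}
    for member, current in newer.items():
        previous = older.get(member, 0)
        growth[member] = current - previous if current >= previous else current
    return growth


# Lines copied from the listing above.
NEWER = """\
etcd-ocpmstr1.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,7
etcd-ocpmstr2.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,1568
etcd-ocpmstr3.openshift.example.com,Mon Dec 22 12:59:37 PM UTC 2025,181506
"""

# Hypothetical counters from an earlier archive.
OLDER = {
    "etcd-ocpmstr1.openshift.example.com": 40,
    "etcd-ocpmstr2.openshift.example.com": 1500,
    "etcd-ocpmstr3.openshift.example.com": 150000,
}

if __name__ == "__main__":
    for member, delta in slow_apply_growth(OLDER, parse_slow_apply(NEWER)).items():
        print(f"{member}: +{delta}")
```

A member whose counter grows far faster than its peers is a reasonable place to start looking at disk latency.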

Control-Plane Node Logs

The Insights Operator collects Control Plane node logs that match the following substrings: Control Plane Node Logs

Reviewing these logs helps you understand potential failures on the Control Plane nodes, so you can quickly identify and narrow down issues.

Viewing Control Plane Logs
ocp_insights.py --file Cluster_4/insights.tar.gz --node_logs ocpmstr1.openshift.example.com

Node Logs for: ocpmstr1.openshift.example.com
================================================================================
Dec 09 05:45:42.865541 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:45:42.865519    2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"e4ddc0cccb0810f9151de5a685b32203aa4d2da65a293312637b4319b2d5bf6d\": container with ID starting with e4ddc0cccb0810f9151de5a685b32203aa4d2da65a293312637b4319b2d5bf6d not found: ID does not exist" containerID="e4ddc0cccb0810f9151de5a685b32203aa4d2da65a293312637b4319b2d5bf6d"
Dec 09 05:45:42.865865 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:45:42.865847    2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"3880f1b1f78d7dc0e92d05003f4c8b89306882b48435812afa595d390ecad271\": container with ID starting with 3880f1b1f78d7dc0e92d05003f4c8b89306882b48435812afa595d390ecad271 not found: ID does not exist" containerID="3880f1b1f78d7dc0e92d05003f4c8b89306882b48435812afa595d390ecad271"
Dec 09 05:46:35.557704 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:46:35.557659    2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:47:35.583757 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:35.583731    2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:47:36.196500 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:36.196468    2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"b2e93c0ddaf39f460c1ec556dd9cd08a6f860a69c0dc8c8dc03d914661a9880a\": container with ID starting with b2e93c0ddaf39f460c1ec556dd9cd08a6f860a69c0dc8c8dc03d914661a9880a not found: ID does not exist" containerID="b2e93c0ddaf39f460c1ec556dd9cd08a6f860a69c0dc8c8dc03d914661a9880a"
Dec 09 05:47:36.196881 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:36.196830    2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"6e3fc4a4935538bf5761bfdcc4e19f12d276381ca5a766617c67e828d1290cac\": container with ID starting with 6e3fc4a4935538bf5761bfdcc4e19f12d276381ca5a766617c67e828d1290cac not found: ID does not exist" containerID="6e3fc4a4935538bf5761bfdcc4e19f12d276381ca5a766617c67e828d1290cac"
Dec 09 05:47:36.197244 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:47:36.197218    2743 log.go:32] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"b03c2767e27a93a9f1f7c037fb5d667b20e12b457aebc7756ff9e10d4653d22a\": container with ID starting with b03c2767e27a93a9f1f7c037fb5d667b20e12b457aebc7756ff9e10d4653d22a not found: ID does not exist" containerID="b03c2767e27a93a9f1f7c037fb5d667b20e12b457aebc7756ff9e10d4653d22a"
Dec 09 05:48:35.538461 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:48:35.538417    2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:49:35.544008 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:49:35.543968    2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
Dec 09 05:50:35.544143 ocpmstr1.openshift.example.com kubenswrapper[2743]: E1209 05:50:35.544108    2743 prober.go:240] "Unable to write all bytes from execInContainer" err="short write" expectedBytes=39168 actualBytes=10240
...

Additional Output

Looking at the full --file output, we can review additional information such as:

ocp_insights.py --file ./Cluster_4/insights.tar.gz

Install Plans

The Insights Operator captures all Install Plans on the cluster, which is extremely helpful for seeing currently and previously installed operators. We return CSV and Namespace.

Install Plans
...
Install Plans:

CSV                                              NAMESPACE
cephcsi-operator.v4.19.5-rhodf                   openshift-storage
cephcsi-operator.v4.19.7-rhodf                   openshift-storage
kubernetes-nmstate-operator.4.19.0-202510081435  openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202510142112  openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202510291015  openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202511111644  openshift-nmstate
kubernetes-nmstate-operator.4.19.0-202511260712  openshift-nmstate
kubevirt-hyperconverged-operator.v4.19.1         openshift-cnv
kubevirt-hyperconverged-operator.v4.19.12        openshift-cnv
kubevirt-hyperconverged-operator.v4.19.15        openshift-cnv
kubevirt-hyperconverged-operator.v4.19.6         openshift-cnv
local-storage-operator.v4.19.0-202510071855      openshift-local-storage
local-storage-operator.v4.19.0-202510142112      openshift-local-storage
local-storage-operator.v4.19.0-202510291015      openshift-local-storage
local-storage-operator.v4.19.0-202511102034      openshift-local-storage
local-storage-operator.v4.19.0-202511260712      openshift-local-storage
mcg-operator.v4.19.6-rhodf                       openshift-storage
recipe.v4.19.8-rhodf                             openshift-storage
rook-ceph-operator.v4.19.4-rhodf                 openshift-storage
sriov-network-operator.v4.19.0-202510060211      openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202510142112      openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202510211212      openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202511102034      openshift-sriov-network-operator
sriov-network-operator.v4.19.0-202511260712      openshift-sriov-network-operator
...
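
Since each Install Plan row is one operator version, grouping the rows by operator gives a quick upgrade history. The sketch below assumes the CSV and NAMESPACE layout shown above and that the operator name ends at the first dot in the CSV value. That holds for the rows in this listing but is an assumption, not a guaranteed naming rule.

```python
from collections import defaultdict


def group_install_plans(table_text):
    """Return {(namespace, operator): [versions...]} in listed order."""
    history = defaultdict(list)
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have exactly two fields and a dot in the CSV name.
        if len(fields) != 2 or fields[0] == "CSV" or "." not in fields[0]:
            continue
        operator, version = fields[0].split(".", 1)
        history[(fields[1], operator)].append(version)
    return dict(history)


# Rows copied from the listing above.
SAMPLE = """\
Install Plans:

CSV                                        NAMESPACE
cephcsi-operator.v4.19.5-rhodf             openshift-storage
cephcsi-operator.v4.19.7-rhodf             openshift-storage
kubevirt-hyperconverged-operator.v4.19.1   openshift-cnv
kubevirt-hyperconverged-operator.v4.19.12  openshift-cnv
kubevirt-hyperconverged-operator.v4.19.6   openshift-cnv
"""

if __name__ == "__main__":
    for (namespace, operator), versions in group_install_plans(SAMPLE).items():
        print(f"{namespace}/{operator}: {', '.join(versions)}")
```

The versions keep the order in which they are listed, which is alphabetical rather than chronological (v4.19.12 appears before v4.19.6).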

OLM Operators

The Insights Operator captures all OLM Operators on the cluster, which is extremely helpful for seeing currently and previously installed operators. We return Name, Display Name, Version, and Namespace when applicable.

Installed OLM Operators
...
Installed OLM Operators:

NAME                         DISPLAY NAME                      VERSION               NAMESPACE
axyom                                                                                5gc
axyom                                                                                openshift-operators
casa-redis-operator          Casa Redis Operator               v0.2.33               5gc
casa-redis-operator                                                                  openshift-operators
cephcsi-operator             CephCSI operator                  v4.19.8-rhodf         openshift-storage
kubernetes-nmstate-operator  Kubernetes NMState Operator       4.19.0-202511260712   openshift-nmstate
kubevirt-hyperconverged      OpenShift Virtualization          v4.19.15              openshift-cnv
local-storage-operator       Local Storage                     v4.19.0-202511260712  openshift-local-storage
mcg-operator                 NooBaa Operator                   v4.19.8-rhodf         openshift-storage
ocs-client-operator          OpenShift Data Foundation Client  v4.19.8-rhodf         openshift-storage
ocs-operator                 OpenShift Container Storage       v4.19.8-rhodf         openshift-storage
odf-csi-addons-operator      CSI Addons                        v4.19.8-rhodf         openshift-storage
odf-dependencies             Data Foundation Dependencies      v4.19.8-rhodf         openshift-storage
odf-operator                 OpenShift Data Foundation         v4.19.8-rhodf         openshift-storage
odf-prometheus-operator      Prometheus Operator               v4.19.8-rhodf         openshift-storage
recipe                       Recipe                            v4.19.8-rhodf         openshift-storage
rook-ceph-operator           Rook-Ceph                         v4.19.8-rhodf         openshift-storage
sgwc-operator                SGWC Operator                     v0.6.5                5gc
sgwc-operator                                                                        openshift-operators
sriov-network-operator       SR-IOV Network Operator           v4.19.0-202511260712  openshift-sriov-network-operator
upf-operator                 UPF Operator                      v1.9.42               5gc
upf-operator                                                                         openshift-operators
...

MachineConfigPools and MachineSets

The Insights Operator also collects MachineConfigPool and MachineSet information and returns it when applicable.

MachineConfigPool and MachineSets
...
MachineConfigPools:

NAME    CONFIG                                            PAUSED  UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT
master  rendered-master-8831ba6d556d1c6a582116beaa537dbb  False   True     False     False     3             3                  3                    0
worker  rendered-worker-b33efe42325e084f9dcef59f47b93fc9  False   True     False     False     5             5                  5                    0

MachineSets:

NAME                    DESIRED  CURRENT  READY  AVAILABLE
prodshift-2nvq7-dmz     2        2        2      2
prodshift-2nvq7-worker  3        3        3      3
...

Storage Classes

For customers using persistent storage via OpenShift Data Foundation or through a third party like Portworx, Infinidat, or VMware, we collect storage class information, which helps determine what storage the cluster is using.

Storage Classes
...
StorageClasses:

NAME             PROVISIONER                                                    RECLAIM POLICY  BINDING MODE  VOLUME EXPANSION
5gc-nfs-storage  cluster.local/nfs-provisioner-nfs-subdir-external-provisioner  Delete          Immediate     True
...

Cluster Memory Usage

Just like with the --customer_memory option, the Insights Operator also collects the memory usage of the cluster namespaces. This is a great way to see whether memory usage has grown significantly when comparing multiple Insights Archives, and to get an idea of how the memory usage compares to the size of the cluster.

Cluster Memory Usage
...
Cluster Namespace Memory Usage:

NAMESPACE                                         MEMORY
openshift-apiserver                               1.21 GB
openshift-apiserver-operator                      237.52 MB
openshift-authentication                          204.72 MB
openshift-authentication-operator                 192.42 MB
openshift-catalogd                                400.57 MB
openshift-cloud-controller-manager-operator       186.32 MB
openshift-cloud-credential-operator               125.80 MB
openshift-cluster-machine-approver                114.22 MB
openshift-cluster-node-tuning-operator            1.10 GB
openshift-cluster-olm-operator                    92.02 MB
openshift-cluster-samples-operator                75.02 MB
openshift-cluster-storage-operator                232.67 MB
openshift-cluster-version                         116.95 MB
openshift-cnv                                     2.99 GB
openshift-config-operator                         76.99 MB
...
openshift-route-controller-manager                236.91 MB
openshift-service-ca                              185.80 MB
openshift-service-ca-operator                     80.82 MB
openshift-sriov-network-operator                  660.84 MB
openshift-storage                                 2.09 GB

Total Cluster Namespace Memory Usage: 72.78 GB
...

Restarting Containers

The Insights Operator collects information on container restarts, which is extremely useful when trying to understand problems in the cluster. We return Namespace, Pod Name, Container Name, Restart Count, and the last Restart Time.

Restarting Containers
...
Containers with more than 3 restarts:

NAMESPACE                                    POD NAME                                                         CONTAINER NAME                               RESTARTS  RESTART TIME
openshift-apiserver                          apiserver-75d755b989-qplnj                                       openshift-apiserver                          35        2025-12-08 18:32:39
openshift-apiserver-operator                 openshift-apiserver-operator-c6fcd76c-w9dhr                      openshift-apiserver-operator                 23        2025-12-08 17:08:43
openshift-authentication                     oauth-openshift-5f85d8547d-j5pgl                                 oauth-openshift                              14        2025-12-08 16:46:27
openshift-authentication                     oauth-openshift-5f85d8547d-j9vhg                                 oauth-openshift                              8         2025-12-08 18:33:11
openshift-authentication-operator            authentication-operator-55c75748d8-hrmjj                         authentication-operator                      28        2025-12-08 17:08:57
openshift-catalogd                           catalogd-controller-manager-644c7b6647-jwbvm                     manager                                      37        2025-12-08 18:35:47
openshift-cloud-controller-manager-operator  cluster-cloud-controller-manager-operator-7594f66d79-kp5wk       cluster-cloud-controller-manager             12        2025-12-08 16:39:36
openshift-cloud-controller-manager-operator  cluster-cloud-controller-manager-operator-7594f66d79-kp5wk       config-sync-controllers                      12        2025-12-08 16:39:08
openshift-cluster-machine-approver           machine-approver-66dd4cccf8-6h2cq                                machine-approver-controller                  28        2025-12-08 18:35:36
openshift-cluster-version                    cluster-version-operator-76d4d9c9cb-2jjg9                        cluster-version-operator                     8         2025-12-08 18:32:45
openshift-cnv                                aaq-operator-69dbd4dbcc-f8kht                                    aaq-operator                                 113       2025-12-09 02:16:23
openshift-cnv                                cdi-deployment-8d44554f7-dzhmt                                   cdi-deployment                               120       2025-12-09 04:21:18
openshift-cnv                                cdi-operator-655b595b9d-gm89z                                    cdi-operator                                 120       2025-12-09 04:16:16
openshift-cnv                                hco-operator-67d948fc78-l7ff6                                    hyperconverged-cluster-operator              126       2025-12-09 04:21:30
openshift-cnv                                hco-webhook-8546b78db4-nh89x                                     hyperconverged-cluster-webhook               140       2025-12-09 04:21:19
openshift-cnv                                hostpath-provisioner-operator-98d45dd45-v9xzw                    hostpath-provisioner-operator                36        2025-12-09 02:16:22
openshift-cnv                                kubevirt-ipam-controller-manager-6998cd6677-zhmrd                manager                                      116       2025-12-09 04:16:24
...

Alerts

The Insights Operator collects all alerts that are firing on the cluster. The script reports each alert's name, state, and start time.

Alerts
...
ALERT NAME                          STATE   START TIME
InsightsRecommendationActive        ACTIVE  2025-12-08 20:31:14.889
CDINoDefaultStorageClass            ACTIVE  2025-12-08 20:22:13.091
KubeDaemonSetRolloutStuck           ACTIVE  2025-12-08 12:37:32.089
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
AlertmanagerReceiversNotConfigured  ACTIVE  2025-12-08 12:37:25.872
UpdateAvailable                     ACTIVE  2025-12-08 18:35:39.959
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
ClusterOperatorDegraded             ACTIVE  2025-12-08 19:05:31.735
CDIStorageProfilesIncomplete        ACTIVE  2025-12-09 02:22:13.091
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
ClusterOperatorDegraded             ACTIVE  2025-12-08 20:55:01.735
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
...
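Because the same alert name can appear several times in this list (KubeJobFailed, for example), a quick tally can help show which alerts dominate. The following is a minimal sketch, not part of ocp_insights.py. It assumes the whitespace-separated column layout shown above, and the helper name count_alerts is hypothetical.

```python
from collections import Counter

# Rows copied from the sample Alerts output above (header line removed).
ALERT_ROWS = """\
InsightsRecommendationActive        ACTIVE  2025-12-08 20:31:14.889
CDINoDefaultStorageClass            ACTIVE  2025-12-08 20:22:13.091
KubeDaemonSetRolloutStuck           ACTIVE  2025-12-08 12:37:32.089
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
AlertmanagerReceiversNotConfigured  ACTIVE  2025-12-08 12:37:25.872
UpdateAvailable                     ACTIVE  2025-12-08 18:35:39.959
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
ClusterOperatorDegraded             ACTIVE  2025-12-08 19:05:31.735
CDIStorageProfilesIncomplete        ACTIVE  2025-12-09 02:22:13.091
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
ClusterOperatorDegraded             ACTIVE  2025-12-08 20:55:01.735
KubeJobFailed                       ACTIVE  2025-12-08 12:37:32.089
"""


def count_alerts(text):
    """Return a Counter mapping alert name to number of occurrences.

    Assumes each row is: NAME STATE DATE TIME (whitespace separated).
    Rows with fewer than four fields are skipped.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent alerts first.
    for name, total in count_alerts(ALERT_ROWS).most_common():
        print(f"{total:>3}  {name}")
```

Alerts that repeat are often a reasonable place to start when deciding what to investigate first.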

PodNetworkConnectivityChecks

The Insights Operator also collects PodNetworkConnectivityChecks. This is a service that runs on a Compute node in the cluster and connects to listener pods on each node of the cluster. It checks various services and helps determine whether nodes or services have become unreachable.

PodNetworkConnectivityChecks
...
PodNetworkConnectivityChecks:

ERROR                                   TIMESTAMP
kubernetes-apiserver-endpoint-ocpmstr3  2025-12-08 18:27:40Z
kubernetes-apiserver-service-cluster    2025-12-08 16:36:38Z
kubernetes-apiserver-service-cluster    2025-12-08 18:30:38Z
kubernetes-default-service-cluster-0    2025-12-08 17:10:38Z
kubernetes-default-service-cluster-0    2025-12-08 18:30:38Z
openshift-apiserver-endpoint-ocpmstr3   2025-12-08 18:32:29Z
openshift-apiserver-service-cluster     2025-12-08 16:40:38Z
...
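To see whether a target failed once or repeatedly, it can help to group these errors by target and note the first and last failure. The sketch below is illustrative only and is not part of ocp_insights.py. It assumes the three-field row layout shown above (target, date, time with a trailing Z), and the function name summarize_checks is hypothetical.

```python
from datetime import datetime

# Rows copied from the sample PodNetworkConnectivityChecks output above.
CHECK_ROWS = """\
kubernetes-apiserver-endpoint-ocpmstr3  2025-12-08 18:27:40Z
kubernetes-apiserver-service-cluster    2025-12-08 16:36:38Z
kubernetes-apiserver-service-cluster    2025-12-08 18:30:38Z
kubernetes-default-service-cluster-0    2025-12-08 17:10:38Z
kubernetes-default-service-cluster-0    2025-12-08 18:30:38Z
openshift-apiserver-endpoint-ocpmstr3   2025-12-08 18:32:29Z
openshift-apiserver-service-cluster     2025-12-08 16:40:38Z
"""


def summarize_checks(text):
    """Map each target to a (first_seen, last_seen, count) tuple."""
    summary = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip headers, blank lines, and "..." markers
        target, date, time = fields
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%SZ")
        first, last, count = summary.get(target, (stamp, stamp, 0))
        summary[target] = (min(first, stamp), max(last, stamp), count + 1)
    return summary


if __name__ == "__main__":
    for target, (first, last, count) in sorted(summarize_checks(CHECK_ROWS).items()):
        print(f"{target:<40} {count}  {first}  {last}")
```

Targets with several failures spread over time may point at a recurring connectivity problem rather than a single blip.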

Conditional Update Risks

Red Hat provides Conditional Update Risks so that customers are aware of known risks that could cause an upgrade to fail or cause issues after the upgrade.

Conditional Update Risks
...
Conditional Update Risks:

RISK                                         REFERENCE                                         AFFECTED_VERSIONS
ConsoleCrashOnMissingPlugin                  https://issues.redhat.com/browse/CONSOLE-4762     4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.7, 4.19.9
HyperShiftClusterVersionOperatorMetrics      https://issues.redhat.com/browse/OTA-1705         4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.14, 4.19.15, 4.19.16, 4.19.9
HyperShiftProxyScheme                        https://issues.redhat.com/browse/CNTRLPLANE-1407  4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.4, 4.19.5, 4.19.6, 4.19.7, 4.19.9
MachineConfigNodesV1AlphaControlPlaneLabels  https://issues.redhat.com/browse/MCO-1890         4.19.12, 4.19.13
NMStateServiceFailure                        https://issues.redhat.com/browse/CORENET-6419     4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.9
NetworkManagerOVNBridgeMapping               https://issues.redhat.com/browse/CORENET-6483     4.19.13, 4.19.14, 4.19.15, 4.19.16, 4.19.17, 4.19.18
OSUpdateFailureDueToImagePullPolicy          https://issues.redhat.com/browse/MCO-1896         4.19.12, 4.19.13, 4.19.14, 4.19.15
RuncShareProcessNamespace                    https://issues.redhat.com/browse/RUN-3748         4.19.19
SCOSBootImage                                https://issues.redhat.com/browse/COS-3765         4.19.18
...
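When planning an upgrade, a common question is which of these risks list a particular release. The following is a rough sketch, not part of ocp_insights.py. It assumes columns are separated by two or more spaces as in the output above, treats AFFECTED_VERSIONS as the releases each risk applies to, and uses the hypothetical helper names parse_risks and risks_for.

```python
import re

# A few rows copied from the sample Conditional Update Risks output above.
RISK_ROWS = """\
ConsoleCrashOnMissingPlugin                  https://issues.redhat.com/browse/CONSOLE-4762     4.19.10, 4.19.11, 4.19.12, 4.19.13, 4.19.7, 4.19.9
MachineConfigNodesV1AlphaControlPlaneLabels  https://issues.redhat.com/browse/MCO-1890         4.19.12, 4.19.13
NetworkManagerOVNBridgeMapping               https://issues.redhat.com/browse/CORENET-6483     4.19.13, 4.19.14, 4.19.15, 4.19.16, 4.19.17, 4.19.18
RuncShareProcessNamespace                    https://issues.redhat.com/browse/RUN-3748         4.19.19
"""


def parse_risks(text):
    """Return a list of (risk, reference, set_of_versions) tuples."""
    risks = []
    for line in text.splitlines():
        # Split on runs of 2+ spaces so the ", "-separated versions stay together.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=2)
        if len(parts) != 3:
            continue
        name, reference, versions = parts
        risks.append((name, reference, {v.strip() for v in versions.split(",")}))
    return risks


def risks_for(version, risks):
    """Return the names of risks whose version list contains `version` exactly."""
    return [name for name, _reference, versions in risks if version in versions]


if __name__ == "__main__":
    parsed = parse_risks(RISK_ROWS)
    for name in risks_for("4.19.13", parsed):
        print(name)
```

Versions are compared as exact strings, so 4.19.1 does not match 4.19.10. Follow the REFERENCE link for each match to read the details of the risk.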

Lab Scenarios

Scenario 1

The customer reported issues with the cluster and stated that Cluster Operators are failing and that there are issues with Container Networking. Review the output of the script and look at the Cluster Operators to determine which operators are having issues. Next, review the restarting pods to see which networking-related pod is having issues and last restarted on 2025-11-25. Finally, check the alerts and look for OVN-related alerts.

To dig deeper, the matching Must-Gather is located in ~/Module4/Cluster_1

Scenario 1
ocp_insights.py --file ./Cluster_1/insights.tar.gz
ocp_insights.py --file ./Cluster_1/insights.tar.gz --alerts | jq -r . | less

Scenario 2

The customer is having issues logging in to the cluster and is currently falling back on the kubeadmin account. Review the output of the script to identify potential issues and potential resolutions.

Scenario 2
ocp_insights.py --file ./Cluster_2/insights.tar.gz

Scenario 3

The customer reports the cluster is very unstable, multiple cluster operators are down, and they need help immediately. Review the control-plane nodes and take note of the Creation Dates. Review the numerous errors in the cluster operators, paying specific attention to the reasons. Look at the failing pods and at the namespace events, noting the timestamps of the events. Finally, review the etcd_metrics and see if you can spot a major issue.

Scenario 3
ocp_insights.py --file ./Cluster_3/insights.tar.gz --node_info
ocp_insights.py --file ./Cluster_3/insights.tar.gz --cluster_operators
ocp_insights.py --file ./Cluster_3/insights.tar.gz --etcd_metrics