Installation & Verification
Duration: 25 minutes
Format: Hands-on verification
What You’re Checking
These are the same checks you’d run after an installation, after an upgrade, or when something feels off - ClusterOperators, node config, networking, storage, and etcd health.
Your cluster was provisioned using IPI (Installer-Provisioned Infrastructure), which automates compute, network, storage, and DNS. OpenShift also supports UPI, Agent-based/Assisted Installer for bare metal, and managed offerings (ROSA, ARO). See the Installation Documentation for details.
| Two things that cannot be changed after installation: the pod network CIDR and the service network CIDR. If they overlap with your corporate network, you’re rebuilding the cluster. |
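Both values are recorded in the cluster-scoped Network configuration object, so you can verify what your cluster was installed with via a read-only query (the jsonpath formatting here is just one way to print them):

oc get network.config cluster -o jsonpath='{.spec.clusterNetwork[*].cidr}{"\n"}{.spec.serviceNetwork[*]}{"\n"}'

The first line is the pod network CIDR(s), the second the service network.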
Cluster Health
Step 1: Verify Cluster Operators (Primary Health Check)
ClusterOperators are the single source of truth for cluster health. Each operator manages a specific cluster component and reports its status.
View all cluster operators:
oc get clusteroperators
You’ll see ~34 operators. Example output:
NAME                      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication            4.20.x    True        False         False      3h
cloud-credential          4.20.x    True        False         False      4h
console                   4.20.x    True        False         False      3h
dns                       4.20.x    True        False         False      4h
etcd                      4.20.x    True        False         False      4h
ingress                   4.20.x    True        False         False      3h
kube-apiserver            4.20.x    True        False         False      4h
kube-controller-manager   4.20.x    True        False         False      4h
kube-scheduler            4.20.x    True        False         False      4h
machine-api               4.20.x    True        False         False      4h
machine-config            4.20.x    True        False         False      4h
monitoring                4.20.x    True        False         False      3h
network                   4.20.x    True        False         False      4h
openshift-apiserver       4.20.x    True        False         False      3h
storage                   4.20.x    True        False         False      4h
What to check:
- AVAILABLE = True - Component is functional
- PROGRESSING = False - No upgrade/rollout in progress
- DEGRADED = False - Component is healthy
Healthy cluster: All operators show True False False
If an operator is degraded, investigate further:
oc describe clusteroperator <operator-name>
Check the Conditions section for details about why it’s degraded.
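If you aren't sure which operator to describe, a one-liner like this (one possible jsonpath approach) prints each operator's Degraded condition so you can spot offenders at a glance:

oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}' | grep -w True

No output means nothing is degraded.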
Common issues:
- machine-config progressing - Nodes are being updated (normal during updates)
- monitoring degraded - Storage or resource issues
- authentication degraded - OAuth or LDAP configuration issues
Console view: Switch to the OCP Console tab and navigate to Administration → Cluster Settings → ClusterOperators. The console shows each operator with green/red status icons - degraded operators are immediately visible without scanning a 34-row table. This is faster than CLI for a quick health check.
Step 2: Check Cluster Version and Update Status
View the current cluster version:
oc get clusterversion
Your output will look similar to:
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.x    True        False         3h      Cluster version is 4.20.x
Check which update channel you’re subscribed to:
oc get clusterversion version -o jsonpath='{.spec.channel}'
Common channels:
- stable-4.20 - Production-ready releases (recommended)
- fast-4.20 - Early access to stable releases
- eus-4.20 - Extended Update Support (longer support window on even-numbered releases)
- candidate-4.20 - Pre-release testing
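Switching channels is just a spec change on the ClusterVersion object. As a sketch, with stable-4.20 standing in for whatever channel you actually want:

oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.20"}}'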
View available updates:
oc adm upgrade
If updates are available, you’ll see a list of recommended versions. If your cluster is already up to date, you’ll see:
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.
In the OCP Console, navigate to Administration → Cluster Settings. The Details tab shows the current version, update channel, and - when updates are available - a visual upgrade path graph showing the route from your current version to available targets. This is far more informative than the CLI text output for planning upgrades.
Key takeaway: Operators manage cluster updates. When updates are available, you can upgrade with a single command: oc adm upgrade --to=<version>
Step 3: Verify Node Configuration
Beyond the node health you explored in the Platform Overview, there’s an important verification step specific to post-installation: MachineConfigPools.
Check node configuration status:
oc get machineconfigpool
Your output will look similar to:
NAME     CONFIG              UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT
master   rendered-master-…   True      False      False      3              3
worker   rendered-worker-…   True      False      False      0              0
| On this compact cluster the worker pool shows 0 machines because all nodes serve both control-plane and worker roles under the master pool. |
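You can confirm this topology from the node list before moving on:

oc get nodes

On a compact cluster, every node should list control-plane, master, and worker under ROLES.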
What this shows:
- UPDATED = True - All nodes have the latest configuration
- UPDATING = True - New configuration is rolling out to nodes (normal during changes)
- DEGRADED = True - Configuration failed on some nodes (investigate)
MachineConfigPools manage node-level configuration (kernel arguments, systemd units, etc.) and coordinate rolling updates.
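To make that concrete, here is a minimal MachineConfig sketch - the name and kernel argument are purely illustrative - that would add audit=1 to every node in the worker pool. Treat it as a reference, not something to apply here: creating any MachineConfig triggers a coordinated rolling reboot of the targeted pool.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-example-kargs   # hypothetical name; the 99- prefix sorts it after the rendered defaults
  labels:
    machineconfiguration.openshift.io/role: worker   # targets the worker MachineConfigPool
spec:
  kernelArguments:
    - audit=1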
You can also verify this in the console. Navigate to Compute → MachineConfigPools:
The green Up to date indicators and Degraded: False confirm that all node configurations have been applied successfully. During an upgrade or configuration change, you would see the Update status change to Updating - this view is a quick way to monitor node rollout progress without polling the CLI.
Step 4: Test Networking (Hands-on)
Verify networking works by creating a test application and route:
oc new-project install-test 2>/dev/null || oc project install-test
oc new-app --name=test-app --image=registry.access.redhat.com/ubi9/httpd-24 -n install-test
Wait for the pod to be running, then create a route:
oc rollout status deployment/test-app -n install-test --timeout=60s
oc create route edge test-app --service=test-app -n install-test
| This may take a minute to complete. Only press Ctrl+C if it has been running for more than 5 minutes. |
Test the route:
curl -sk https://$(oc get route test-app -n install-test -o jsonpath='{.spec.host}') | head -3
What this proves: DNS works, router is functional, TLS certificates auto-generated, and the application is reachable from outside the cluster.
| You can also verify the route in the console at Networking → Routes (project: install-test) - click the route URL to test it directly from the browser. |
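If the curl fails, start with two read-only checks: are the router pods healthy, and was the route admitted by the ingress controller?

oc get pods -n openshift-ingress
oc describe route test-app -n install-test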
Clean up:
oc delete project install-test &>/dev/null &
Storage & etcd
Step 5: Test Storage Provisioning
Verify dynamic storage works by creating a PVC and a pod that mounts it:
oc new-project storage-test 2>/dev/null || oc project storage-test
cat <<EOF | oc apply -n storage-test -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-storage
spec:
  containers:
    - name: test
      image: registry.access.redhat.com/ubi9/ubi-minimal
      command: ["/bin/sh", "-c", "echo 'Storage works!' > /data/test.txt && cat /data/test.txt && sleep 30"]
      volumeMounts:
        - name: data
          mountPath: /data
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
  restartPolicy: Never
EOF
Wait for the PVC to bind and the pod to start:
echo "Waiting for storage test to complete..."
ELAPSED=0
until oc get pod test-storage -n storage-test --no-headers 2>/dev/null | grep -qE 'Running|Completed'; do
sleep 5; ELAPSED=$((ELAPSED+5))
[ $ELAPSED -ge 120 ] && echo "ERROR: Timed out - check 'oc get pods -n storage-test'" && break
done
oc get pvc test-pvc -n storage-test && oc get pod test-storage -n storage-test
The PVC should show Bound and the pod should be Running or Completed:
NAME       STATUS   VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
test-pvc   Bound    pvc-xxxxxxx   1Gi        RWO            ocs-external-storagecluster-ceph-rbd   15s

NAME           READY   STATUS    RESTARTS   AGE
test-storage   1/1     Running   0          15s
| In the console, navigate to Storage → PersistentVolumeClaims (project: storage-test) to see the PVC status and its bound PersistentVolume. |
Check the pod wrote to the volume successfully:
oc logs test-storage -n storage-test
You should see Storage works! - confirming that dynamic provisioning, volume attachment, and read/write all function correctly.
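The PVC above omits storageClassName, so it was bound by the cluster's default StorageClass. To see which class that is - the default is flagged with "(default)" next to its name:

oc get storageclass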
Clean up:
oc delete project storage-test &>/dev/null &
Step 6: Verify etcd Health
etcd is the distributed key-value store that holds all cluster state. If etcd is unhealthy, everything is unhealthy. Two key metrics to check after installation:
- Leader elections - should be 0 or very low. Frequent leader elections indicate disk I/O problems or network latency between control plane nodes.
- Disk sync duration - if fsync times are consistently above 10ms, etcd performance will suffer and oc commands will start timing out.
Check both via the OCP Console at Observe → Metrics. Enter this query and click Run Queries:
increase(etcd_server_leader_changes_seen_total[24h])
You should see 0 elections across all etcd members - that’s healthy.
Now check disk sync times:
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
This shows the 99th percentile WAL fsync latency per etcd member. Values under 10ms are ideal. Workshop clusters may show higher values due to shared storage - that’s expected.
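If you prefer the CLI, etcd's own health checks can also be run from inside an etcd pod, where etcdctl comes preconfigured with the right certificates and endpoints. Pod names include the node name, so list them first and substitute your own for the placeholder:

oc get pods -n openshift-etcd | grep '^etcd-' | grep -v guard
oc rsh -n openshift-etcd <etcd-pod-name>
etcdctl endpoint health --cluster
etcdctl member list -w table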
Also verify the monitoring operator is healthy:
oc get clusteroperator monitoring
NAME         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.20.x    True        False         False      3h
Summary
Installation: You learned how OpenShift can be installed using full stack automation (IPI, Agent-based, Assisted Installer) or pre-existing infrastructure (UPI) across public cloud, private cloud, bare metal, and specialized platforms.
Verification: You verified this cluster is production-ready by checking ClusterOperators, update channels, MachineConfigPools, networking, storage provisioning, and etcd health.
Key takeaway: oc get clusteroperators is your primary health check. All operators showing True False False means a healthy cluster ready for workloads.
Additional Resources
- Installing OpenShift: Installation Documentation
- Verifying Cluster Status: Troubleshooting Guide
- ClusterOperators Reference: Operator Reference