💡 Solution: Scenario 3 — Node selector & resource issues

This page provides the detailed solutions for the issues presented in Scenario 3. Review these steps after you have attempted the diagnosis and resolution yourself.

🛑 Problem 1: Database pod scheduling failure

Diagnosis

The PostgreSQL database pods could not be scheduled because the Custom Resource (CR) assigned them a nodeSelector that no node in the cluster satisfies. With no matching node available, the scheduler left the pods in the Pending state indefinitely.

🛠️ Resolution: Correcting node scheduling

The fix involves modifying the database configuration within the main Ansible Automation Platform Custom Resource (CR).

1. Modify the database configuration in the CR

In the OpenShift Console:

Navigate to Ecosystem → Installed Operators and select your namespace.
Click on Ansible Automation Platform in the operator list.
Click the AnsibleAutomationPlatform tab and click on your AAP instance.
Switch to the YAML tab.

You have two valid options:

Option A: Specify a valid selector. Replace the incorrect nodeSelector with a label that is actually present on your nodes, for example:
```
spec:
  database:
    nodeSelector:
      kubernetes.io/os: linux
```
Option B: Remove the selector. Delete the nodeSelector field entirely from the database section of the CR so that the pods can be scheduled on any available node.
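
To see why both options work, it can help to model the matching rule: a node is a candidate only if it carries every key/value pair listed in the nodeSelector. The following is a minimal Python sketch of that rule, not the scheduler's actual implementation. The node names, their labels, and the `disktype: nonexistent` selector are hypothetical values chosen for illustration.

```python
def selector_matches(node_selector: dict, node_labels: dict) -> bool:
    """Return True when every key/value pair in the selector is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def schedulable_nodes(node_selector: dict, nodes: dict) -> list:
    """Return the names of nodes whose labels satisfy the selector."""
    return [name for name, labels in nodes.items() if selector_matches(node_selector, labels)]


# Hypothetical cluster: node names and labels are made up for illustration.
nodes = {
    "worker-0": {"kubernetes.io/os": "linux", "node-role.kubernetes.io/worker": ""},
    "worker-1": {"kubernetes.io/os": "linux", "node-role.kubernetes.io/worker": ""},
}

bad_selector = {"disktype": "nonexistent"}     # hypothetical invalid selector
good_selector = {"kubernetes.io/os": "linux"}  # the label used in Option A

print(schedulable_nodes(bad_selector, nodes))   # [] -> the pod stays Pending
print(schedulable_nodes(good_selector, nodes))  # ['worker-0', 'worker-1']
```

An empty selector matches every node in this model, which corresponds to Option B.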

🛑 Problem 2: API pod resource exhaustion

Diagnosis

The API pods were failing due to oversized resource requests and limits defined in the CR. The requests exceeded the available node or cluster resources, preventing pod startup.

🛠️ Resolution: Reducing resource requirements

The fix is to reduce the resource requirements defined in the API section of the CR to values that the cluster's nodes can actually satisfy.

1. Modify the API resource requirements in the CR

Still in the AAP CR YAML view, reduce the values in the Controller resource requirements:

```
# Example snippet of the modified AAP CR
spec:
  controller:
    resource_requirements:
      requests:
        cpu: 250m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
```
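
As a rough illustration of why the reduced values help, the sketch below converts Kubernetes-style quantities into plain numbers and checks whether a pod's requests fit into a node's unreserved capacity. It is a simplified model under stated assumptions: the free capacity and the oversized request are hypothetical figures, only a subset of quantity suffixes is handled, and the helper names are invented for this example. Scheduling decisions are based on requests; limits are enforced at runtime.

```python
from decimal import Decimal

# Subset of the binary and decimal suffixes used by Kubernetes memory quantities.
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '250m' or '2' into cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '1Gi' or '512Mi' into bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def fits(requests: dict, free: dict) -> bool:
    """True when both the CPU and memory requests fit in the unreserved capacity."""
    return (parse_cpu(requests["cpu"]) <= parse_cpu(free["cpu"])
            and parse_memory(requests["memory"]) <= parse_memory(free["memory"]))


# Hypothetical figures: a node with 4 cores and 16Gi still unreserved.
free = {"cpu": "4", "memory": "16Gi"}
oversized = {"cpu": "32", "memory": "128Gi"}  # hypothetical oversized request
reduced = {"cpu": "250m", "memory": "1Gi"}    # the reduced requests from the CR snippet

print(fits(oversized, free))  # False
print(fits(reduced, free))    # True
```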

♻️ Final reconciliation

After both CR modifications are saved, the Operator must reconcile the environment. Deleting the two pods below forces them to be recreated and prompts the Operator to reconcile again.

In the OpenShift Console, navigate to Workloads → Pods and select your namespace.

Find the database pod ({username}-aap-postgres-15-0), click the three-dot menu on the right, and select Delete Pod.

Then find the Gateway Operator manager pod (aap-gateway-operator-controller-manager-…), click the three-dot menu, and select Delete Pod.

Verification

Monitor Workloads → Pods until the database and API pods are successfully redeployed and running. Then log in to AAP to confirm full functionality.
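
If you prefer to check readiness from a terminal, one possible approach is to export the pod list as JSON (for example with `oc get pods -n <namespace> -o json`) and filter it. The sketch below assumes that JSON shape. The function name and the sample pod names are hypothetical, and the sample data is hand-written, not captured from a real cluster.

```python
import json


def pods_not_ready(pod_list: dict) -> list:
    """Return names of pods that are not Running with all containers ready."""
    not_ready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready


# Hypothetical sample shaped like `oc get pods -o json` output; names are made up.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "example-postgres-15-0"},
   "status": {"phase": "Running", "containerStatuses": [{"ready": true}]}},
  {"metadata": {"name": "example-api-abc"},
   "status": {"phase": "Pending"}}
]}
""")

print(pods_not_ready(sample))  # ['example-api-abc']
```

An empty result suggests that every listed pod is up. Note that this sketch also reports pods in the Succeeded phase, such as completed jobs, so those entries can be ignored.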