Migration Guide → Layers & Role Conversions

5. What We Put in Each Layer — MCP Lab

Each row links to the actual role or GitOps path:

| Layer | What | Role / Path | Repo |
| --- | --- | --- | --- |
| Infra (install operators; runs once via cluster-provision.yml) | RHBK (Keycloak) operator | ocp4_workload_authentication | agnosticd/core_workloads |
| | Gitea operator | ocp4_workload_gitea_operator | agnosticd/core_workloads |
| | OpenShift GitOps (ArgoCD) | ocp4_workload_openshift_gitops | agnosticd/core_workloads |
| | Tekton (Pipelines) | ocp4_workload_pipelines | agnosticd/core_workloads |
| | ToolHive (MCP proxy runner) | ocp4_workload_toolhive | agnosticd/ai_workloads |
| | CloudNativePG operator | Kubernetes Subscription CR (direct) | cluster-provision.yml |
| | AppProjects (infra, platform, tenants) + bootstrap-infra & bootstrap-platform Applications | ocp4_workload_gitops_bootstrap (applications list) | agnosticd/core_workloads — GitOps: infra/bootstrap/ |
| Platform (create instances & configure; runs once, same provisioning) | Gitea server instance (created from Gitea operator CR) | GitOps: platform/bootstrap/ | ocpsandbox-mcp-with-openshift-gitops |
| | User workload monitoring (enables Prometheus/Grafana for tenant namespaces) | GitOps: platform/monitoring.yaml | ocpsandbox-mcp-with-openshift-gitops |
| Tenant (the user; per order, via AgV workloads) | Create RHBK user (mcpuser-<guid>) | ocp4_workload_tenant_keycloak_user | agnosticd/namespaced_workloads |
| | Create 6 OCP namespaces (agent, librechat, mcp-gitea, mcp-openshift, gitea, showroom) | ocp4_workload_tenant_namespace | agnosticd/namespaced_workloads |
| | Deploy per-tenant Gitea instance + user + mirror GitOps repo | ocp4_workload_tenant_gitea | agnosticd/namespaced_workloads |
| | Create LiteMaaS virtual API key with rate limit | ocp4_workload_litellm_virtual_keys | rhpds/rhpds.litellm_virtual_keys |
| | Create bootstrap-tenant ArgoCD Application (app-of-apps) | ocp4_workload_gitops_bootstrap | agnosticd/core_workloads — GitOps: tenant/bootstrap/ |
| | OCP console iframe support for Showroom | ocp4_workload_ocp_console_embed | agnosticd/showroom |
| | Showroom tab UI (lab instructions) | ocp4_workload_showroom | agnosticd/showroom |
| | LibreChat + MongoDB + Meilisearch (AI chat UI) | GitOps: tenant/librechat/ | ocpsandbox-mcp-with-openshift-gitops |
| | MCP servers (Gitea MCP + OpenShift MCP via ToolHive) | GitOps: tenant/mcp-gitea/ & tenant/mcp-openshift/ | ocpsandbox-mcp-with-openshift-gitops |
| | AI agent (Python, built by Tekton pipeline) | GitOps: tenant/agent/ — Source: tenant/agent-src/ | ocpsandbox-mcp-with-openshift-gitops |
Infra and Platform layers: manual for now. Both layers are currently triggered by hand by running cluster-provision.yml. Once GC's cluster attachment integration lands, they will move into AgV as a separate cluster-provisioner catalog item with no manual steps.

6. Role-by-Role Conversion

components: openshift-base → Sandbox API scheduler-only

Before: Every order triggered a full CNV OCP cluster build via components: openshift-base. AgnosticD provisioned the cluster from scratch — typically 45–60 minutes — before any workload roles could run. The cluster was bespoke to that order and destroyed at the end.

After: There is no components: block. Instead, AgV declares one OcpSandbox entry under __meta__.sandboxes. The Sandbox API scheduler picks a pre-provisioned shared cluster from the pool in seconds, injects the cluster credentials (sandbox_openshift_api_url, cluster_admin_agnosticd_sa_token) into the AgnosticD run, and provisioning begins immediately.
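The text above tells us the declaration lives under __meta__.sandboxes with type OcpSandbox; a minimal sketch of what that entry might look like (the entry name and any other fields are illustrative, not copied from the actual catalog item):

```yaml
# Sketch only: the sandbox request AgV makes instead of a components: block.
# "name" is an arbitrary label; "type: OcpSandbox" is what routes the request
# to the Sandbox API scheduler, which assigns a shared cluster from the pool.
__meta__:
  sandboxes:
    - name: ocp
      type: OcpSandbox
```

The scheduler then injects sandbox_openshift_api_url and cluster_admin_agnosticd_sa_token into the AgnosticD run, as described above.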

The cluster itself is provisioned separately, once per cluster, by cluster-provision.yml. That playbook installs RHBK, the Gitea operator, Tekton, OpenShift GitOps, ToolHive, and CloudNativePG — everything that is cluster-wide infrastructure, not per-order resources. Cluster provisioning is a one-time cost, not a per-order cost.
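For the one operator installed via a direct Subscription CR (CloudNativePG), the manifest is a standard OLM Subscription. A sketch, with the channel and catalog source as assumptions rather than values taken from the playbook:

```yaml
# Sketch of a direct OLM Subscription for the CloudNativePG operator.
# channel and source are illustrative; check cluster-provision.yml for the
# values actually used.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cloudnative-pg
  namespace: openshift-operators
spec:
  channel: stable-v1            # assumed channel name
  name: cloudnative-pg
  source: community-operators   # assumed catalog source
  sourceNamespace: openshift-marketplace
```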

Impact: Order provisioning drops from 45–60 minutes to 2–3 minutes. The cluster is already running; we only provision per-tenant resources.

ocp4_workload_authentication_htpasswd → ocp4_workload_tenant_keycloak_user

Before: ocp4_workload_authentication_htpasswd created num_users HTPasswd users (user1, user2, ...) directly in the cluster's OAuth configuration on every order. HTPasswd is cluster-local: no SSO, no federation, and the sequential userN naming gave no way to trace which user belonged to which order on a shared cluster.

After: ocp4_workload_tenant_keycloak_user creates exactly one RHBK user per order in the existing realm. RHBK is already installed on the cluster (by the cluster provisioner). The role only creates the user — it does not install or configure RHBK itself.

  • Username changed from user1 (shared, sequential) to mcpuser-<guid> (unique per order, traceable)
  • Password derivation changed from md5(guid[:5]) to sha256(guid[:8]) — same deterministic approach, stronger hash, satisfies modern complexity requirements with the Mcp prefix and ! suffix
  • remove_workload.yml deletes the RHBK user on destroy — previously the cluster deletion handled this automatically
```yaml
# BEFORE — HTPasswd, N users per order
ocp4_workload_authentication_htpasswd_user_base: user
ocp4_workload_authentication_htpasswd_user_count: "{{ num_users }}"
ocp4_workload_authentication_htpasswd_user_password: "{{ common_user_password }}"

# AFTER — RHBK, one user per order
ocp4_workload_tenant_keycloak_username: "mcpuser-{{ guid }}"
common_password: "Mcp{{ (guid | hash('sha256'))[:8] }}!"
ocp4_workload_tenant_keycloak_user_password: "{{ common_password }}"
```

ocp4_workload_gitea_operator → ocp4_workload_tenant_gitea

Before: ocp4_workload_gitea_operator installed a full Gitea instance per order — operator, deployment, all of it — then created all num_users Gitea accounts and migrated the GitOps repository for each user. The result was one Gitea instance per cluster, shared by every user on that order.

After: Gitea is already on the cluster — the cluster provisioner installs the Gitea operator once. ocp4_workload_tenant_gitea only creates one Gitea organisation for the tenant and mirrors the GitOps source repository into it.

  • Gitea installation moved to the cluster provisioner — one install per cluster, not one per order
  • Old role: created N users and migrated N repos. New role: creates 1 org + mirrors 1 repo
  • The mirrored repo is the GitOps source that ArgoCD reads — each tenant has their own isolated copy of the repo, preventing cross-tenant interference
  • remove_workload.yml deletes the Gitea organisation (and all repos within it) on destroy
```yaml
# BEFORE — install Gitea, create all users, migrate all repos
ocp4_workload_gitea_operator_create_users: true
ocp4_workload_gitea_operator_user_number: "{{ num_users }}"
ocp4_workload_gitea_operator_migrate_repositories: true

# AFTER — Gitea already exists. Create one org + mirror one repo.
ocp4_workload_tenant_gitea_username: "{{ ocp4_workload_tenant_keycloak_username }}"
ocp4_workload_tenant_gitea_repositories:
  - name: mcp
    repo: https://github.com/rhpds/ocpsandbox-mcp-with-openshift-gitops
    private: false
```

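Under the hood, mirroring a repository into Gitea is a single call to Gitea's /repos/migrate API. A hypothetical Ansible task showing roughly what the role does (gitea_route and gitea_admin_token are placeholder variable names, not the role's actual variables):

```yaml
# Sketch: mirror the GitOps source repo into the tenant's Gitea.
- name: Mirror GitOps repo into tenant Gitea
  ansible.builtin.uri:
    url: "https://{{ gitea_route }}/api/v1/repos/migrate"
    method: POST
    headers:
      Authorization: "token {{ gitea_admin_token }}"
    body_format: json
    body:
      clone_addr: https://github.com/rhpds/ocpsandbox-mcp-with-openshift-gitops
      repo_name: mcp
      repo_owner: "{{ ocp4_workload_tenant_gitea_username }}"
      mirror: true
      private: false
    status_code: 201
```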
ocp4_workload_pipelines / ocp4_workload_openshift_gitops / ocp4_workload_toolhive → cluster-provision.yml (one-time)

Before: ocp4_workload_pipelines, ocp4_workload_openshift_gitops, and ocp4_workload_toolhive all appeared in the workloads: list and ran on every order. This meant every order waited for three operator installations to complete, even though they always produced identical cluster-wide results.

After: All three have been removed from the per-order workloads list entirely. They are installed once by cluster-provision.yml when a cluster is added to the pool. Every subsequent order on that cluster finds them already present.

There are no AgV variables for these in the new common.yaml — they are gone from the order configuration completely. Per-order provisioning is faster because these operators are never re-installed.
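In diff terms, the change is pure deletion. A sketch (the old workloads list is abbreviated to the three roles discussed here, without their collection prefixes):

```yaml
# BEFORE — per-order workloads list included cluster-wide operator installs
workloads:
  - ocp4_workload_openshift_gitops   # now in cluster-provision.yml
  - ocp4_workload_pipelines          # now in cluster-provision.yml
  - ocp4_workload_toolhive           # now in cluster-provision.yml

# AFTER — no replacement entries; these operators are simply absent from
# the order configuration
```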

Rule of thumb: If an operator or service is cluster-wide and would produce the same result on every order, it belongs in the cluster provisioner, not in the per-order workloads list.
ocp4_workload_mcp_user (monolithic) → ocp4_workload_tenant_namespace + ocp4_workload_gitops_bootstrap

Before: ocp4_workload_mcp_user was a single role that did everything for all users in one pass:

  • Created namespaces for every user
  • Deployed LibreChat per user
  • Deployed MCP servers per user (mcp-gitea, mcp-openshift)
  • Created ArgoCD AppProjects per user
  • Set up per-user ArgoCD Applications
  • Configured LibreChat with MCP endpoints and credentials

If any part of this failed for any user, debugging required reading through hundreds of tasks. If LibreChat changed its configuration schema, you had to update and retest the entire role for all users. There was no way to update just one component.

After: This is split into two distinct layers:

Layer 1 — Ansible (per order):

  • ocp4_workload_tenant_namespace — creates the namespaces. That is its entire job. One responsibility.

Layer 2 — ArgoCD / GitOps (per order):

  • ocp4_workload_gitops_bootstrap — creates one ArgoCD Application called bootstrap-tenant that acts as an app-of-apps. ArgoCD then syncs all child apps: agent, librechat, librechat-config, mcp-gitea, mcp-openshift
```yaml
# BEFORE — one monolithic role, all users, all resources
ocp4_workload_mcp_user_num_users: "{{ num_users }}"
ocp4_workload_mcp_user_librechat_password: "{{ common_user_password }}"
ocp4_workload_mcp_user_litemaas_url: "{{ lookup('agnosticd_user_data', 'litellm_api_base_url') }}"

# AFTER — namespace role creates namespaces; ArgoCD handles the rest
ocp4_workload_tenant_namespace_suffixes:
  - agent
  - librechat
  - mcp-gitea
  - mcp-openshift
  - gitea
  - showroom

ocp4_workload_gitops_bootstrap_application_name: "bootstrap-tenant"
ocp4_workload_gitops_bootstrap_repo_path: "tenant/bootstrap"
ocp4_workload_gitops_bootstrap_helm_values:
  tenant:
    username: "{{ ocp4_workload_tenant_keycloak_username }}"
    password: "{{ common_password }}"
  litemaas:
    url: "{{ litellm_api_endpoint | default('') }}/v1"
    key: "{{ litellm_virtual_key | default('') }}"
    models: "{{ litellm_available_models | default([]) | join(',') }}"
```

The practical benefit: if LibreChat needs a configuration change, you push a commit to the GitOps repo and ArgoCD reconciles only the librechat-config Application. You do not re-run Ansible. You do not touch namespaces or the agent or Gitea. The monolithic role made every change an all-or-nothing operation; ArgoCD makes changes surgical.
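For reference, the bootstrap-tenant Application the role creates is a standard Argo CD app-of-apps. A sketch with illustrative values (the repoURL, project name, and destination are assumptions; only the Application name, path, and app-of-apps pattern come from the text above):

```yaml
# Sketch of the app-of-apps Application created by ocp4_workload_gitops_bootstrap.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bootstrap-tenant
  namespace: openshift-gitops
spec:
  project: tenant                      # assumed AppProject name
  source:
    repoURL: https://gitea.example.com/mcpuser-abc12/mcp.git  # tenant's mirrored repo
    targetRevision: main
    path: tenant/bootstrap
    helm:
      values: |                        # the helm_values shown above land here
        tenant:
          username: mcpuser-abc12
  destination:
    server: https://kubernetes.default.svc
    namespace: openshift-gitops
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Syncing this one Application causes Argo CD to render tenant/bootstrap and create the child Applications (agent, librechat, librechat-config, mcp-gitea, mcp-openshift) itself.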

ocp4_workload_litellm_virtual_keys (one key per cluster/order) → ocp4_workload_litellm_virtual_keys (one key per tenant)

Before: rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys created one LiteMaaS virtual key for the entire order — shared by all num_users users on that cluster. All users' AI calls went through the same key, making per-user usage tracking impossible.

After: The same role from the same collection — nothing changed in the role itself. The scope changed: one key per tenant order (one user). Each order gets its own isolated LiteMaaS key with its own budget and rate limits. The key is injected into the gitops bootstrap Helm values so both LibreChat and the AI agent pick it up automatically.

The role is also still in remove_workloads: — it was the only role that had proper destroy logic in the original lab, and it retains that position in the new lab's remove sequence.
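The underlying operation is a call to LiteLLM's /key/generate key-management endpoint. A hypothetical task showing the per-tenant scope (variable names and the limit values are examples, not the role's actual defaults):

```yaml
# Sketch: one virtual key per tenant via LiteLLM's key management API.
- name: Create per-tenant LiteMaaS virtual key
  ansible.builtin.uri:
    url: "{{ litellm_api_endpoint }}/key/generate"
    method: POST
    headers:
      Authorization: "Bearer {{ litellm_master_key }}"
    body_format: json
    body:
      key_alias: "mcpuser-{{ guid }}"  # ties the key to the order for usage tracking
      rpm_limit: 60                    # example rate limit
      max_budget: 10.0                 # example budget
  register: r_litellm_key
```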

ocp4_workload_showroom_ocp_integration + ocp4_workload_showroom → ocp4_workload_ocp_console_embed + ocp4_workload_showroom

Before: ocp4_workload_showroom_ocp_integration was required because the cluster was freshly provisioned — Showroom needed to query and discover the OCP API URL, console URL, and other cluster facts before it could embed them into lab content. A freshly provisioned cluster has no pre-known URLs.

After: ocp4_workload_ocp_console_embed is run once by the cluster provisioner — it patches the IngressController CSP headers to allow Showroom iframe embedding. It does NOT run per order: each run triggers a router rollout, which would block provisioning every time. The Sandbox API provides sandbox_openshift_api_url, sandbox_openshift_console_url, and cluster_admin_agnosticd_sa_token directly, so no discovery step is needed.

ocp4_workload_showroom itself is unchanged — it is the same role in both patterns. Only the role that feeds it cluster metadata changed.

No remove_workload.yml (cluster deletion handled everything) → every role has remove_workload.yml (explicit ordered list)

Before: Cleanup was "delete the cluster." The only entry in remove_workloads: was the LiteMaaS virtual key, because that was the only resource that lived outside the cluster. Everything else — user namespaces, LibreChat, Gitea, all ArgoCD applications — disappeared when the cluster was deleted. No per-role cleanup logic was ever written.

After: The cluster persists. Every resource that provisioning creates must be explicitly removed by its own remove_workload.yml. The remove_workloads: list is ordered — resources are torn down in the reverse order of creation, with dependencies respected.

```yaml
# BEFORE — only the LiteMaaS key needed explicit cleanup
remove_workloads:
  - rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys

# AFTER — every resource must be explicitly cleaned up, in dependency order
workloads:
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_keycloak_user
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_gitea
  - rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
  - agnosticd.core_workloads.ocp4_workload_gitops_bootstrap
  - agnosticd.showroom.ocp4_workload_showroom

remove_workloads:
  - agnosticd.showroom.ocp4_workload_showroom
  - rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
  - agnosticd.core_workloads.ocp4_workload_gitops_bootstrap    # cascade-deletes all ArgoCD apps
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_gitea
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_keycloak_user
```

The ocp4_workload_gitops_bootstrap remove step deletes the bootstrap-tenant Application with the ArgoCD cascade finalizer, which ensures all child Applications and their synced Kubernetes resources are deleted before the namespace is removed. Without the finalizer, deleting the bootstrap Application would leave orphaned child Applications and all their resources behind.
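In task form, that remove step amounts to ensuring the finalizer is present and then deleting. A sketch (resources-finalizer.argocd.argoproj.io is Argo CD's real cascade finalizer; the task shape and names around it are illustrative):

```yaml
# Sketch: cascade-delete the app-of-apps so child Applications and their
# synced Kubernetes resources are removed too.
- name: Ensure the Argo CD cascade finalizer is set on bootstrap-tenant
  kubernetes.core.k8s:
    state: patched
    api_version: argoproj.io/v1alpha1
    kind: Application
    name: bootstrap-tenant
    namespace: openshift-gitops
    definition:
      metadata:
        finalizers:
          - resources-finalizer.argocd.argoproj.io

- name: Delete bootstrap-tenant (Argo CD prunes children before it is gone)
  kubernetes.core.k8s:
    state: absent
    api_version: argoproj.io/v1alpha1
    kind: Application
    name: bootstrap-tenant
    namespace: openshift-gitops
    wait: true
```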

