Migration Guide → Layers & Role Conversions

5. What We Put in Each Layer — MCP Lab

Each row links to the actual role or GitOps path:

| Layer | What | Role / Path | Repo |
| --- | --- | --- | --- |
| Infra (install operators; runs once via cluster-provision.yml) | RHBK (Keycloak) operator | ocp4_workload_authentication | agnosticd/core_workloads |
| | Gitea operator | ocp4_workload_gitea_operator | agnosticd/core_workloads |
| | OpenShift GitOps (ArgoCD) | ocp4_workload_openshift_gitops | agnosticd/core_workloads |
| | Tekton (Pipelines) | ocp4_workload_pipelines | agnosticd/core_workloads |
| | ToolHive (MCP proxy runner) | ocp4_workload_toolhive | agnosticd/ai_workloads |
| | CloudNativePG operator | Kubernetes Subscription CR (direct) | cluster-provision.yml |
| | AppProjects (infra, platform, tenants) + bootstrap-infra & bootstrap-platform Applications | ocp4_workload_gitops_bootstrap (applications list) | agnosticd/core_workloads — GitOps: infra/bootstrap/ |
| Platform (create instances & configure; runs once, same provisioning) | Gitea server instance (created from Gitea operator CR) | GitOps: platform/bootstrap/ | ocpsandbox-mcp-with-openshift-gitops |
| | User workload monitoring (enables Prometheus/Grafana for tenant namespaces) | GitOps: platform/monitoring.yaml | ocpsandbox-mcp-with-openshift-gitops |
| Tenant (the user; per order, via AgV workloads) | Create RHBK user (mcpuser-<guid>) | ocp4_workload_tenant_keycloak_user | agnosticd/namespaced_workloads |
| | Create 6 OCP namespaces (agent, librechat, mcp-gitea, mcp-openshift, gitea, showroom) | ocp4_workload_tenant_namespace | agnosticd/namespaced_workloads |
| | Deploy per-tenant Gitea instance + user + mirror GitOps repo | ocp4_workload_tenant_gitea | agnosticd/namespaced_workloads |
| | Create LiteMaaS virtual API key with rate limit | ocp4_workload_litellm_virtual_keys | rhpds/rhpds.litellm_virtual_keys |
| | Create bootstrap-tenant ArgoCD Application (app-of-apps) | ocp4_workload_gitops_bootstrap | agnosticd/core_workloads — GitOps: tenant/bootstrap/ |
| | OCP console iframe support for Showroom | ocp4_workload_ocp_console_embed | agnosticd/showroom |
| | Showroom tab UI (lab instructions) | ocp4_workload_showroom | agnosticd/showroom |
| | LibreChat + MongoDB + Meilisearch (AI chat UI) | GitOps: tenant/librechat/ | ocpsandbox-mcp-with-openshift-gitops |
| | MCP servers (Gitea MCP + OpenShift MCP via ToolHive) | GitOps: tenant/mcp-gitea/ & tenant/mcp-openshift/ | ocpsandbox-mcp-with-openshift-gitops |
| | AI agent (Python, built by Tekton pipeline) | GitOps: tenant/agent/ — Source: tenant/agent-src/ | ocpsandbox-mcp-with-openshift-gitops |
Infra and Platform layers: manual for now. Both layers are currently triggered by hand by running cluster-provision.yml. Once GC's cluster attachment integration lands, they will move into AgV as a separate cluster-provisioner catalog item with no manual steps.

6. Role-by-Role Conversion

components: openshift-base → Sandbox API scheduler-only

Before: Every order triggered a full CNV OCP cluster build via components: openshift-base. AgnosticD provisioned the cluster from scratch — typically 45–60 minutes — before any workload roles could run. The cluster was bespoke to that order and destroyed at the end.

After: There is no components: block. Instead, AgV declares one OcpSandbox entry under __meta__.sandboxes. The Sandbox API scheduler picks a pre-provisioned shared cluster from the pool in seconds, injects the cluster credentials (sandbox_openshift_api_url, cluster_admin_agnosticd_sa_token) into the AgnosticD run, and provisioning begins immediately.
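The text above tells us the declaration lives under __meta__.sandboxes with type OcpSandbox; a minimal sketch of what that entry might look like (the entry name and any other fields are illustrative, not copied from the actual catalog item):

```yaml
# Sketch only: the sandbox request AgV makes instead of a components: block.
# "name" is an arbitrary label; "type: OcpSandbox" is what routes the request
# to the Sandbox API scheduler, which assigns a shared cluster from the pool.
__meta__:
  sandboxes:
    - name: ocp
      type: OcpSandbox
```

The scheduler then injects sandbox_openshift_api_url and cluster_admin_agnosticd_sa_token into the AgnosticD run, as described above.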

The cluster itself is provisioned separately, once per cluster, by cluster-provision.yml. That playbook installs RHBK, the Gitea operator, Tekton, OpenShift GitOps, ToolHive, and CloudNativePG — everything that is cluster-wide infrastructure, not per-order resources. Cluster provisioning is a one-time cost, not a per-order cost.
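For the one operator installed via a direct Subscription CR (CloudNativePG), the manifest is a standard OLM Subscription. A sketch, with the channel and catalog source as assumptions rather than values taken from the playbook:

```yaml
# Sketch of a direct OLM Subscription for the CloudNativePG operator.
# channel and source are illustrative; check cluster-provision.yml for the
# values actually used.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cloudnative-pg
  namespace: openshift-operators
spec:
  channel: stable-v1            # assumed channel name
  name: cloudnative-pg
  source: community-operators   # assumed catalog source
  sourceNamespace: openshift-marketplace
```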

Impact: Order provisioning drops from 45–60 minutes to 2–3 minutes. The cluster is already running; we only provision per-tenant resources.

ocp4_workload_authentication_htpasswd → ocp4_workload_tenant_keycloak_user

Before: ocp4_workload_authentication_htpasswd created num_users HTPasswd users (user1, user2, ...) directly in the cluster's OAuth configuration on every order. HTPasswd is cluster-local: no SSO, no federation, and the sequential userN naming gave no way to trace which user belonged to which order on a shared cluster.

After: ocp4_workload_tenant_keycloak_user creates exactly one RHBK user per order in the existing realm. RHBK is already installed on the cluster (by the cluster provisioner). The role only creates the user — it does not install or configure RHBK itself.

  • Username changed from user1 (shared, sequential) to mcpuser-<guid> (unique per order, traceable)
  • Password derivation changed from md5(guid[:5]) to sha256(guid[:8]) — same deterministic approach, stronger hash, satisfies modern complexity requirements with the Mcp prefix and ! suffix
  • remove_workload.yml deletes the RHBK user on destroy — previously the cluster deletion handled this automatically
```yaml
# BEFORE — HTPasswd, N users per order
ocp4_workload_authentication_htpasswd_user_base: user
ocp4_workload_authentication_htpasswd_user_count: "{{ num_users }}"
ocp4_workload_authentication_htpasswd_user_password: "{{ common_user_password }}"

# AFTER — RHBK, one user per order
ocp4_workload_tenant_keycloak_username: "mcpuser-{{ guid }}"
common_password: "Mcp{{ (guid | hash('sha256'))[:8] }}!"
ocp4_workload_tenant_keycloak_user_password: "{{ common_password }}"
```

ocp4_workload_gitea_operator → ocp4_workload_tenant_gitea

Before: ocp4_workload_gitea_operator installed a full Gitea instance per order — operator, deployment, all of it — then created all num_users Gitea accounts and migrated the GitOps repository for each user. The result was one Gitea instance per cluster, shared by every user on that order.

After: Gitea is already on the cluster — the cluster provisioner installs the Gitea operator once. ocp4_workload_tenant_gitea only creates one Gitea organisation for the tenant and mirrors the GitOps source repository into it.

  • Gitea installation moved to the cluster provisioner — one install per cluster, not one per order
  • Old role: created N users and migrated N repos. New role: creates 1 org + mirrors 1 repo
  • The mirrored repo is the GitOps source that ArgoCD reads — each tenant has their own isolated copy of the repo, preventing cross-tenant interference
  • remove_workload.yml deletes the Gitea organisation (and all repos within it) on destroy
```yaml
# BEFORE — install Gitea, create all users, migrate all repos
ocp4_workload_gitea_operator_create_users: true
ocp4_workload_gitea_operator_user_number: "{{ num_users }}"
ocp4_workload_gitea_operator_migrate_repositories: true

# AFTER — Gitea already exists. Create one org + mirror one repo.
ocp4_workload_tenant_gitea_username: "{{ ocp4_workload_tenant_keycloak_username }}"
ocp4_workload_tenant_gitea_repositories:
  - name: mcp
    repo: https://github.com/rhpds/ocpsandbox-mcp-with-openshift-gitops
    private: false
```

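Under the hood, mirroring a repository into Gitea is a single call to Gitea's /repos/migrate API. A hypothetical Ansible task showing roughly what the role does (gitea_route and gitea_admin_token are placeholder variable names, not the role's actual variables):

```yaml
# Sketch: mirror the GitOps source repo into the tenant's Gitea.
- name: Mirror GitOps repo into tenant Gitea
  ansible.builtin.uri:
    url: "https://{{ gitea_route }}/api/v1/repos/migrate"
    method: POST
    headers:
      Authorization: "token {{ gitea_admin_token }}"
    body_format: json
    body:
      clone_addr: https://github.com/rhpds/ocpsandbox-mcp-with-openshift-gitops
      repo_name: mcp
      repo_owner: "{{ ocp4_workload_tenant_gitea_username }}"
      mirror: true
      private: false
    status_code: 201
```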
ocp4_workload_pipelines / ocp4_workload_openshift_gitops / ocp4_workload_toolhive → cluster-provision.yml (one-time)

Before: ocp4_workload_pipelines, ocp4_workload_openshift_gitops, and ocp4_workload_toolhive all appeared in the workloads: list and ran on every order. This meant every order waited for three operator installations to complete, even though they always produced identical cluster-wide results.

After: All three have been removed from the per-order workloads list entirely. They are installed once by cluster-provision.yml when a cluster is added to the pool. Every subsequent order on that cluster finds them already present.

There are no AgV variables for these in the new common.yaml — they are gone from the order configuration completely. Per-order provisioning is faster because these operators are never re-installed.
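In diff terms, the change is pure deletion. A sketch (the old workloads list is abbreviated to the three roles discussed here, without their collection prefixes):

```yaml
# BEFORE — per-order workloads list included cluster-wide operator installs
workloads:
  - ocp4_workload_openshift_gitops   # now in cluster-provision.yml
  - ocp4_workload_pipelines          # now in cluster-provision.yml
  - ocp4_workload_toolhive           # now in cluster-provision.yml

# AFTER — no replacement entries; these operators are simply absent from
# the order configuration
```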

Rule of thumb: If an operator or service is cluster-wide and would produce the same result on every order, it belongs in the cluster provisioner, not in the per-order workloads list.
ocp4_workload_mcp_user (monolithic) → ocp4_workload_tenant_namespace + ocp4_workload_gitops_bootstrap

Before: ocp4_workload_mcp_user was a single role that did everything for all users in one pass:

  • Created namespaces for every user
  • Deployed LibreChat per user
  • Deployed MCP servers per user (mcp-gitea, mcp-openshift)
  • Created ArgoCD AppProjects per user
  • Set up per-user ArgoCD Applications
  • Configured LibreChat with MCP endpoints and credentials

If any part of this failed for any user, debugging required reading through hundreds of tasks. If LibreChat changed its configuration schema, you had to update and retest the entire role for all users. There was no way to update just one component.

After: This is split into two distinct layers:

Layer 1 — Ansible (per order):

  • ocp4_workload_tenant_namespace — creates the namespaces. That is its entire job. One responsibility.

Layer 2 — ArgoCD / GitOps (per order):

  • ocp4_workload_gitops_bootstrap — creates one ArgoCD Application called bootstrap-tenant that acts as an app-of-apps. ArgoCD then syncs all child apps: agent, librechat, librechat-config, mcp-gitea, mcp-openshift
```yaml
# BEFORE — one monolithic role, all users, all resources
ocp4_workload_mcp_user_num_users: "{{ num_users }}"
ocp4_workload_mcp_user_librechat_password: "{{ common_user_password }}"
ocp4_workload_mcp_user_litemaas_url: "{{ lookup('agnosticd_user_data', 'litellm_api_base_url') }}"

# AFTER — namespace role creates namespaces; ArgoCD handles the rest
ocp4_workload_tenant_namespace_suffixes:
  - agent
  - librechat
  - mcp-gitea
  - mcp-openshift
  - gitea
  - showroom

ocp4_workload_gitops_bootstrap_application_name: "bootstrap-tenant"
ocp4_workload_gitops_bootstrap_repo_path: "tenant/bootstrap"
ocp4_workload_gitops_bootstrap_helm_values:
  tenant:
    username: "{{ ocp4_workload_tenant_keycloak_username }}"
    password: "{{ common_password }}"
  litemaas:
    url: "{{ litellm_api_endpoint | default('') }}/v1"
    key: "{{ litellm_virtual_key | default('') }}"
    models: "{{ litellm_available_models | default([]) | join(',') }}"
```

The practical benefit: if LibreChat needs a configuration change, you push a commit to the GitOps repo and ArgoCD reconciles only the librechat-config Application. You do not re-run Ansible. You do not touch namespaces or the agent or Gitea. The monolithic role made every change an all-or-nothing operation; ArgoCD makes changes surgical.
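For reference, the bootstrap-tenant Application the role creates is a standard Argo CD app-of-apps. A sketch with illustrative values (the repoURL, project name, and destination are assumptions; only the Application name, path, and app-of-apps pattern come from the text above):

```yaml
# Sketch of the app-of-apps Application created by ocp4_workload_gitops_bootstrap.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bootstrap-tenant
  namespace: openshift-gitops
spec:
  project: tenant                      # assumed AppProject name
  source:
    repoURL: https://gitea.example.com/mcpuser-abc12/mcp.git  # tenant's mirrored repo
    targetRevision: main
    path: tenant/bootstrap
    helm:
      values: |                        # the helm_values shown above land here
        tenant:
          username: mcpuser-abc12
  destination:
    server: https://kubernetes.default.svc
    namespace: openshift-gitops
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Syncing this one Application causes Argo CD to render tenant/bootstrap and create the child Applications (agent, librechat, librechat-config, mcp-gitea, mcp-openshift) itself.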

ocp4_workload_litellm_virtual_keys (one key per cluster/order) → ocp4_workload_litellm_virtual_keys (one key per tenant)

Before: rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys created one LiteMaaS virtual key for the entire order — shared by all num_users users on that cluster. All users' AI calls went through the same key, making per-user usage tracking impossible.

After: The same role from the same collection — nothing changed in the role itself. The scope changed: one key per tenant order (one user). Each order gets its own isolated LiteMaaS key with its own budget and rate limits. The key is injected into the gitops bootstrap Helm values so both LibreChat and the AI agent pick it up automatically.

The role is also still in remove_workloads: — it was the only role that had proper destroy logic in the original lab, and it retains that position in the new lab's remove sequence.
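The underlying operation is a call to LiteLLM's /key/generate key-management endpoint. A hypothetical task showing the per-tenant scope (variable names and the limit values are examples, not the role's actual defaults):

```yaml
# Sketch: one virtual key per tenant via LiteLLM's key management API.
- name: Create per-tenant LiteMaaS virtual key
  ansible.builtin.uri:
    url: "{{ litellm_api_endpoint }}/key/generate"
    method: POST
    headers:
      Authorization: "Bearer {{ litellm_master_key }}"
    body_format: json
    body:
      key_alias: "mcpuser-{{ guid }}"  # ties the key to the order for usage tracking
      rpm_limit: 60                    # example rate limit
      max_budget: 10.0                 # example budget
  register: r_litellm_key
```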

ocp4_workload_showroom_ocp_integration + ocp4_workload_showroom → ocp4_workload_ocp_console_embed + ocp4_workload_showroom

Before: ocp4_workload_showroom_ocp_integration was required because the cluster was freshly provisioned — Showroom needed to query and discover the OCP API URL, console URL, and other cluster facts before it could embed them into lab content. A freshly provisioned cluster has no pre-known URLs.

After: ocp4_workload_ocp_console_embed is run once by the cluster provisioner — it patches the IngressController CSP headers to allow Showroom iframe embedding. It does NOT run per order: each run triggers a router rollout, which would block provisioning every time. The Sandbox API provides sandbox_openshift_api_url, sandbox_openshift_console_url, and cluster_admin_agnosticd_sa_token directly, so no discovery step is needed.

ocp4_workload_showroom itself is unchanged — it is the same role in both patterns. Only the role that feeds it cluster metadata changed.

No remove_workload.yml (cluster deletion handled everything) → every role has remove_workload.yml (explicit ordered list)

Before: Cleanup was "delete the cluster." The only entry in remove_workloads: was the LiteMaaS virtual key, because that was the only resource that lived outside the cluster. Everything else — user namespaces, LibreChat, Gitea, all ArgoCD applications — disappeared when the cluster was deleted. No per-role cleanup logic was ever written.

After: The cluster persists. Every resource that provisioning creates must be explicitly removed by its own remove_workload.yml. The remove_workloads: list is ordered — resources are torn down in the reverse order of creation, with dependencies respected.

```yaml
# BEFORE — only the LiteMaaS key needed explicit cleanup
remove_workloads:
  - rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys

# AFTER — every resource must be explicitly cleaned up, in dependency order
workloads:
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_keycloak_user
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_gitea
  - rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
  - agnosticd.core_workloads.ocp4_workload_gitops_bootstrap
  - agnosticd.showroom.ocp4_workload_showroom

remove_workloads:
  - agnosticd.showroom.ocp4_workload_showroom
  - rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
  - agnosticd.core_workloads.ocp4_workload_gitops_bootstrap    # cascade-deletes all ArgoCD apps
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_gitea
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  - agnosticd.namespaced_workloads.ocp4_workload_tenant_keycloak_user
```

The ocp4_workload_gitops_bootstrap remove step deletes the bootstrap-tenant Application with the ArgoCD cascade finalizer, which ensures all child Applications and their synced Kubernetes resources are deleted before the namespace is removed. Without the finalizer, deleting the bootstrap Application would leave orphaned child Applications and all their resources behind.
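In task form, that remove step amounts to ensuring the finalizer is present and then deleting. A sketch (resources-finalizer.argocd.argoproj.io is Argo CD's real cascade finalizer; the task shape and names around it are illustrative):

```yaml
# Sketch: cascade-delete the app-of-apps so child Applications and their
# synced Kubernetes resources are removed too.
- name: Ensure the Argo CD cascade finalizer is set on bootstrap-tenant
  kubernetes.core.k8s:
    state: patched
    api_version: argoproj.io/v1alpha1
    kind: Application
    name: bootstrap-tenant
    namespace: openshift-gitops
    definition:
      metadata:
        finalizers:
          - resources-finalizer.argocd.argoproj.io

- name: Delete bootstrap-tenant (Argo CD prunes children before it is gone)
  kubernetes.core.k8s:
    state: absent
    api_version: argoproj.io/v1alpha1
    kind: Application
    name: bootstrap-tenant
    namespace: openshift-gitops
    wait: true
```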

