Converting an Existing Lab to the Sandbox API Shared Cluster Pattern
A real account of how the MCP with OpenShift lab moved from dedicated CNV clusters to shared clusters — what changed, role by role, and every problem we hit
Three Approaches, One Direction
The shared cluster + Sandbox API pattern is being developed across the team in parallel, with different labs taking different approaches. This guide covers one of them. The others are coming and each will have its own example you can learn from.
| Approach | Who | Status | What to learn from it |
|---|---|---|---|
| MCP with OpenShift — this guide | Prakhar Srivastava | Done | Scheduler-only pattern end-to-end: tenant roles, GitOps bootstrap, per-tenant Gitea and LibreChat instances, explicit destroy |
| Complex GitOps example | Judd Maltin | Coming | A more advanced GitOps-driven pattern showing how to go further with ApplicationSets, multi-app bootstrapping, and Keycloak integration from ArgoCD |
| OCP Sandbox API | Multiple | Reference | Simpler pattern where Sandbox API creates namespaces and RHSSO users — fewer roles, less setup, good starting point for basic labs |
People and feedback that shaped this
Every iteration of this migration involved feedback from the team. Some of it confirmed the approach, some of it changed it:
- Judd Maltin — proposed the three-layer architecture (infra / platform / tenant) and the GitOps bootstrap app-of-apps pattern that this lab is built on. The `ocp4_workload_gitops_bootstrap` role improvements (multi-app list, cascade finalizer, userinfo ConfigMap) came out of working through this together.
- Judd Maltin (open concerns — not yet resolved) — raised two important issues with the current destroy approach that need to be addressed before this pattern is used at scale. See the open questions section below for details.
- Wolfgang Kulhanek and Nate Stephany — both suggested that namespaces and RBAC should be pre-created by Ansible before any ArgoCD sync happens. Nate's position was clear: foundational resources (namespaces, RBAC, service accounts) should not be managed by GitOps — they should be applied deterministically by Ansible first, and ArgoCD should only manage workload resources on top of that foundation. That feedback directly shaped the operational rule we follow: Ansible owns the foundation, ArgoCD owns the workloads.
- GC — working on the Sandbox API integration that will allow clusters ordered from demo.redhat.com to be automatically attached to the pool. Until that lands, the cluster provisioner playbook is run manually. This is the last manual step remaining before the pattern is fully automated.
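The "Ansible owns the foundation, ArgoCD owns the workloads" rule can be sketched as an ordinary `kubernetes.core.k8s` task that runs before any ArgoCD Application is created. The namespace name and label below are illustrative, not taken from the lab's actual roles:

```yaml
# Foundation first: Ansible applies namespaces deterministically,
# before ArgoCD ever syncs workload resources on top of them.
- name: Pre-create tenant namespace (illustrative name)
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: "librechat-mcpuser-{{ guid }}"
        labels:
          # label is an assumption — marks the namespace as Ansible-owned,
          # so ArgoCD apps are expected to treat it as pre-existing
          agnosticd.redhat.com/owned-by: ansible
```

Because the namespace and RBAC already exist when ArgoCD syncs, a failed or slow sync never blocks the foundational resources the rest of the order depends on.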
1. The Original Workshop
The original mcp-with-openshift lab ran on a dedicated CNV OCP cluster per order. Every time someone ordered the lab, AgnosticD provisioned a full OpenShift cluster from scratch — which took 45–60 minutes — then installed all lab dependencies on top of it.
Key characteristics of the original lab
- `config: openshift-workloads` — the old config type used before Sandbox API
- `components: openshift-base` — each order provisioned a full dedicated CNV OCP cluster. Expensive, slow (45–60 min), and one cluster per order
- Multi-user: a `num_users` parameter allowed 2–40 users per order. All users shared the same cluster
- Authentication: HTPasswd (`ocp4_workload_authentication_htpasswd`) — users were created as `user1`, `user2`, ..., `userN` directly on the cluster
- Gitea: a full `ocp4_workload_gitea_operator` install per order — one Gitea instance per cluster, shared by all lab users
- Pipelines, ArgoCD, ToolHive: installed fresh per order as part of the workload list
- Monolithic user role: `ocp4_workload_mcp_user` did everything for all users in one pass — namespaces, LibreChat, MCP servers, ArgoCD AppProjects, per-user ArgoCD Applications
Original workloads list
```yaml
# original mcp-with-openshift common.yaml (config: openshift-workloads)
workloads:
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
- agnosticd.core_workloads.ocp4_workload_authentication_htpasswd
- agnosticd.core_workloads.ocp4_workload_gitea_operator
- agnosticd.core_workloads.ocp4_workload_pipelines
- agnosticd.core_workloads.ocp4_workload_openshift_gitops
- agnosticd.ai_workloads.ocp4_workload_toolhive
- rhpds.mcp_workloads.ocp4_workload_mcp_user # monolithic — did everything for all users
- agnosticd.showroom.ocp4_workload_showroom_ocp_integration
- agnosticd.showroom.ocp4_workload_showroom
- rhpds.mcp_workloads.ocp4_workload_mcp_with_openshift_validation
remove_workloads:
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
```
Original password derivation formula
```yaml
# Original — MD5 of the first 5 characters of the GUID, base64-encoded
common_user_password: "{{ (guid[:5] | hash('md5') | int(base=16) | b64encode)[:10] }}"
common_admin_password: "{{ (guid[:5] | hash('md5') | int(base=16) | b64encode)[:16] }}"
```
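For contrast, the new catalog item derives passwords from a SHA-256 hash of the full GUID (see the before/after table in section 4). The truncation lengths below are illustrative assumptions, not copied from the actual config:

```yaml
# New pattern (sketch) — SHA-256 of the full GUID, deterministic per order.
# Truncation lengths are assumptions; hash('sha256') is the standard Ansible filter.
common_user_password: "{{ (guid | hash('sha256'))[:16] }}"
common_admin_password: "{{ (guid | hash('sha256'))[:24] }}"
```

Hashing the full GUID rather than its first five characters removes the collision risk between orders whose GUIDs share a prefix.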
Only `litellm_virtual_keys` appeared in `remove_workloads`. Everything else — HTPasswd users, Gitea, all user namespaces, LibreChat deployments — was cleaned up by simply deleting the entire OCP cluster at the end of the order. There was no per-role cleanup needed because the cluster itself was destroyed. This is a critical difference from the shared cluster pattern, where the cluster persists and every role must be able to clean up after itself.
2. Why We Changed It
- Speed. A dedicated CNV cluster takes 45–60 minutes. A shared cluster order takes 2–3 minutes — the cluster is already running, only per-tenant resources are provisioned. For booth demos where attendees walk up and want to start immediately, waiting an hour is not an option.
- Scale across use cases — and Lightning Labs at Summit 2026. This pattern supports full workshop sessions, booth demos, and on-demand ordering from a single catalog item. At Summit 2026, the event team is introducing self-service labs called Lightning Labs — expected to be one of the biggest attractions of the event. Lightning Labs need 2–3 minute provisioning times; a dedicated CNV cluster per order (45–60 min) makes them impossible. The shared cluster + Sandbox API pattern makes them viable at scale, concurrently, across the event floor.
- Establishing a reusable pattern. This is the first RHDP lab to use the shared cluster + Sandbox API scheduler-only pattern end-to-end. The tenant roles (`ocp4_workload_tenant_keycloak_user`, `ocp4_workload_tenant_namespace`, `ocp4_workload_tenant_gitea`) are generic — any lab on a shared cluster can reuse them.
- Authentication. HTPasswd with sequential usernames (`user1`, `user2`) breaks down on a shared cluster with concurrent orders. RHBK is already on the cluster — each order adds one user to the existing realm and removes it on destroy.
3. Design Decisions — Why We Built It This Way
These decisions are the recommended approach for shared cluster labs — not arbitrary choices.
Why does each tenant get their own Gitea instance?
The Gitea operator is installed once; each order creates a new Gitea CR inside the tenant's namespace — a fully isolated server with its own PostgreSQL, route, admin user, and repos.
- Complete isolation. Each tenant's Gitea is a separate server. One tenant cannot see another tenant's repositories at all — they are not just in different orgs on the same server, they are on entirely different Gitea instances with different URLs.
- Lab exercises require it. Attendees push code and configuration changes as part of the exercises. With a per-tenant instance, there is no risk of one attendee's push affecting another's server.
- Clean destroy. Deleting the tenant's namespace deletes the entire Gitea instance — the CR, the pods, the PostgreSQL database, the persistent volumes, the route. No cleanup role needed beyond namespace deletion.
- Operator does the heavy lifting. The Gitea operator (already on the cluster) handles the full lifecycle of each instance — deploy, configure, expose via route. The `tenant_gitea` role just creates the CR and waits for it to be ready.
- Simple debugging. Every resource for one tenant lives in one namespace — `oc get all -n gitea-mcpuser-guid` shows you exactly what belongs to that tenant and nothing else.
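As a rough sketch, the per-tenant CR that the `tenant_gitea` role creates looks like the following. The `apiVersion`/`kind` match the RHPDS gitea-operator, but the spec fields shown are assumptions, not the lab's actual values:

```yaml
# Illustrative per-tenant Gitea CR — one fully isolated instance per order.
# The operator reconciles this into pods, a PostgreSQL database, and a route.
apiVersion: pfe.rhpds.com/v1
kind: Gitea
metadata:
  name: gitea
  namespace: gitea-mcpuser-guid   # tenant namespace — deleted on destroy
spec:
  giteaSsl: true                  # spec field names are assumptions
  giteaAdminUser: gitea-admin
```

Because the CR and all its child resources live in the tenant namespace, `oc delete project gitea-mcpuser-guid` is the entire cleanup path.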
Why does each tenant get their own LibreChat instance?
- Conversation data. Each attendee's conversations, uploaded files, and API history live in their LibreChat instance. A shared LibreChat means one attendee can see another's data. Not acceptable for a workshop.
- MCP endpoints are per-tenant. Each LibreChat is configured to point to that specific tenant's MCP servers — their `mcp-gitea-mcpuser-guid` and `mcp-openshift-mcpuser-guid`. A shared instance cannot route MCP calls per user.
- LiteMaaS key. Each tenant has their own virtual key with their own budget. A shared LibreChat would pool all API usage under one key with no per-user attribution or rate limiting.
Why separate namespaces per service?
Each service (LibreChat, MCP-Gitea, MCP-OpenShift, agent) gets its own namespace rather than sharing one namespace per tenant.
- Quota per service. Each namespace has its own ResourceQuota and LimitRange. LibreChat can have a higher memory limit without affecting the MCP server quota. A quota violation in one namespace does not block another.
- RBAC is cleaner. Service accounts are scoped to the namespace. A service account in `mcp-openshift-mcpuser-guid` cannot touch resources in `librechat-mcpuser-guid`.
- Easier to debug. The namespace name tells you immediately which service you are looking at. `kubectl get pods -n librechat-mcpuser-drw4x` shows only LibreChat — nothing else mixed in.
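The per-service quota argument can be sketched with a standard Kubernetes ResourceQuota. The numbers and quota name here are illustrative, not the lab's actual limits:

```yaml
# Sketch: per-namespace ResourceQuota. LibreChat gets a higher memory
# ceiling without touching the MCP server namespaces' quotas, and a
# violation here cannot block pods in any other namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota               # name is an assumption
  namespace: librechat-mcpuser-guid
spec:
  hard:
    requests.memory: 4Gi
    limits.memory: 8Gi
    pods: "10"
```

With one shared namespace per tenant, a single quota would have to cover the sum of all services, so one misbehaving service could starve the others.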
Why one RHBK user per order?
- Clean lifecycle. The user is created when the order is placed and deleted when it ends. No leftover identity from previous runs.
- Traceability. The username (`mcpuser-<guid>`) ties directly to the order GUID. You always know which order a user belongs to.
- RHBK is shared, users are not. The RHBK server is one per cluster. Users inside it are one per order. This is the right separation: shared infrastructure, isolated tenants.
Why one tenant per order instead of multi-user?
The original lab had a `num_users` parameter (2–40 users per order, all on the same cluster). The new pattern is one tenant per order.
- True isolation. One attendee's broken deployment cannot affect any other. With multi-user per order, resource exhaustion by one user hits everyone on that cluster.
- Destroy is simple. Deleting one tenant's resources is straightforward. Deleting one of N users from a shared cluster while leaving the others intact is significantly harder.
- Scales horizontally. More attendees = more clusters in the pool. Multi-user per order creates uneven resource usage and coordination problems across the pool.
4. What Changed in AgV — Before and After
The new catalog item is mcp-with-openshift-sandbox, replacing mcp-with-openshift for shared cluster deployments.
| | Original (`mcp-with-openshift`) | New (`mcp-with-openshift-sandbox`) |
|---|---|---|
| `config:` | `openshift-workloads` | `namespace` |
| `components:` | Full CNV OCP cluster per order (45–60 min) | None — uses Sandbox API pool |
| `__meta__.sandboxes` | None (cluster was provisioned by AgnosticD) | Single OcpSandbox entry, scheduler-only |
| Users per order | 2–40 (`num_users` parameter) | 1 — single tenant per order |
| Authentication | HTPasswd — `user1`, `user2`, ..., `userN` | RHBK — one user `mcpuser-<guid>` |
| Password formula | `md5(guid[:5])`, base64-encoded and truncated | `sha256(guid)` — stronger, deterministic |
| Gitea | Full install per order — one instance per cluster | Operator shared on cluster; per-tenant instance created per order |
| Pipelines / GitOps / ToolHive | Installed per order (cluster-wide operators) | Installed once by cluster provisioner |
| Main workload role | `ocp4_workload_mcp_user` — one monolithic role for all users | Split: individual tenant roles + ArgoCD app-of-apps |
| Destroy logic | Delete entire cluster — no per-role cleanup needed | Each role has `remove_workload.yml` |
| `remove_workloads:` | Only LiteMaaS virtual key | All 5 tenant roles listed in explicit order |
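Based only on the role names that appear in this guide, the explicit destroy list plausibly looks like the sketch below. The ordering and the collection prefixes are assumptions, and the list is deliberately incomplete — the guide says five tenant roles are listed, but only three are named here:

```yaml
# Illustrative destroy order (sketch) — tenant resources first, virtual key last.
# Collection prefixes and ordering are assumptions, not the actual config.
remove_workloads:
- rhpds.mcp_workloads.ocp4_workload_tenant_gitea
- rhpds.mcp_workloads.ocp4_workload_tenant_namespace
- rhpds.mcp_workloads.ocp4_workload_tenant_keycloak_user
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
```

Unlike the original lab, where cluster deletion was the cleanup, every role here must succeed at removing its own resources, because the shared cluster lives on after the order ends.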