Converting an Existing Lab to the Sandbox API Shared Cluster Pattern
A real account of how the MCP with OpenShift lab moved from dedicated CNV clusters to shared clusters — what changed, role by role, and every problem we hit
Three Approaches, One Direction
The shared cluster + Sandbox API pattern is being developed across the team in parallel, with different labs taking different approaches. This guide covers one of them. The others are coming and each will have its own example you can learn from.
| Approach | Who | Status | What to learn from it |
|---|---|---|---|
| MCP with OpenShift — this guide | Prakhar Srivastava | Done | Scheduler-only pattern end-to-end: tenant roles, GitOps bootstrap, per-tenant Gitea and LibreChat instances, explicit destroy |
| Complex GitOps example | Judd Maltin | Coming | A more advanced GitOps-driven pattern showing how to go further with ApplicationSets, multi-app bootstrapping, and Keycloak integration from ArgoCD |
| OCP Sandbox API | Multiple | Reference | Simpler pattern where Sandbox API creates namespaces and RHSSO users — fewer roles, less setup, good starting point for basic labs |
People and feedback that shaped this
Every iteration of this migration involved feedback from the team. Some of it confirmed the approach, some of it changed it:
- Judd Maltin — proposed the three-layer architecture (infra / platform / tenant) and the GitOps bootstrap app-of-apps pattern that this lab is built on. The `ocp4_workload_gitops_bootstrap` role improvements (multi-app list, cascade finalizer, userinfo ConfigMap) came out of working through this together.
- Judd Maltin (open concerns — not yet resolved) — raised two important issues with the current destroy approach that need to be addressed before this pattern is used at scale. See the open questions section below for details.
- Wolfgang Kulhanek and Nate Stephany — both suggested that namespaces and RBAC should be pre-created by Ansible before any ArgoCD sync happens. Nate's position was clear: foundational resources (namespaces, RBAC, service accounts) should not be managed by GitOps — they should be applied deterministically by Ansible first, and ArgoCD should only manage workload resources on top of that foundation. That feedback directly shaped the operational rule we follow: Ansible owns the foundation, ArgoCD owns the workloads.
- GC — working on the Sandbox API integration that will allow clusters ordered from demo.redhat.com to be automatically attached to the pool. Until that lands, the cluster provisioner playbook is run manually. This is the last manual step remaining before the pattern is fully automated.
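The "Ansible owns the foundation, ArgoCD owns the workloads" rule can be sketched as an ordinary `kubernetes.core.k8s` task that runs before any ArgoCD Application is created. The namespace name and label below are illustrative, not taken from the lab's actual roles:

```yaml
# Foundation first: Ansible applies namespaces deterministically,
# before ArgoCD ever syncs workload resources on top of them.
- name: Pre-create tenant namespace (illustrative name)
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: "librechat-mcpuser-{{ guid }}"
        labels:
          # label is an assumption — marks the namespace as Ansible-owned,
          # so ArgoCD apps are expected to treat it as pre-existing
          agnosticd.redhat.com/owned-by: ansible
```

Because the namespace and RBAC already exist when ArgoCD syncs, a failed or slow sync never blocks the foundational resources the rest of the order depends on.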
1. The Original Workshop
The original mcp-with-openshift lab ran on a dedicated CNV OCP cluster per order. Every time someone ordered the lab, AgnosticD provisioned a full OpenShift cluster from scratch — which took 45–60 minutes — then installed all lab dependencies on top of it.
Key characteristics of the original lab
- `config: openshift-workloads` — the old config type used before Sandbox API
- `components: openshift-base` — each order provisioned a full dedicated CNV OCP cluster. Expensive, slow (45–60 min), and one cluster per order
- Multi-user: a `num_users` parameter allowed 2–40 users per order. All users shared the same cluster
- Authentication: HTPasswd (`ocp4_workload_authentication_htpasswd`) — users were created as `user1`, `user2`, ..., `userN` directly on the cluster
- Gitea: a full `ocp4_workload_gitea_operator` install per order — one Gitea instance per cluster, shared by all lab users
- Pipelines, ArgoCD, ToolHive: installed fresh per order as part of the workload list
- Monolithic user role: `ocp4_workload_mcp_user` did everything for all users in one pass — namespaces, LibreChat, MCP servers, ArgoCD AppProjects, per-user ArgoCD Applications
Original workloads list
```yaml
# original mcp-with-openshift common.yaml (config: openshift-workloads)
workloads:
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
- agnosticd.core_workloads.ocp4_workload_authentication_htpasswd
- agnosticd.core_workloads.ocp4_workload_gitea_operator
- agnosticd.core_workloads.ocp4_workload_pipelines
- agnosticd.core_workloads.ocp4_workload_openshift_gitops
- agnosticd.ai_workloads.ocp4_workload_toolhive
- rhpds.mcp_workloads.ocp4_workload_mcp_user # monolithic — did everything for all users
- agnosticd.showroom.ocp4_workload_showroom_ocp_integration
- agnosticd.showroom.ocp4_workload_showroom
- rhpds.mcp_workloads.ocp4_workload_mcp_with_openshift_validation
remove_workloads:
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
```
Original password derivation formula
```yaml
# Original — MD5 of the first 5 characters of the GUID, base64-encoded
common_user_password: "{{ (guid[:5] | hash('md5') | int(base=16) | b64encode)[:10] }}"
common_admin_password: "{{ (guid[:5] | hash('md5') | int(base=16) | b64encode)[:16] }}"
```
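For contrast, the new catalog item derives passwords from a SHA-256 hash of the full GUID (see the before/after table in section 4). The truncation lengths below are illustrative assumptions, not copied from the actual config:

```yaml
# New pattern (sketch) — SHA-256 of the full GUID, deterministic per order.
# Truncation lengths are assumptions; hash('sha256') is the standard Ansible filter.
common_user_password: "{{ (guid | hash('sha256'))[:16] }}"
common_admin_password: "{{ (guid | hash('sha256'))[:24] }}"
```

Hashing the full GUID rather than its first five characters removes the collision risk between orders whose GUIDs share a prefix.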
Only `litellm_virtual_keys` appeared in `remove_workloads`. Everything else — HTPasswd users, Gitea, all user namespaces, LibreChat deployments — was cleaned up by simply deleting the entire OCP cluster at the end of the order. There was no per-role cleanup needed because the cluster itself was destroyed. This is a critical difference from the shared cluster pattern, where the cluster persists and every role must be able to clean up after itself.
2. Why We Changed It
- Speed. A dedicated CNV cluster takes 45–60 minutes. A shared cluster order takes 2–3 minutes — the cluster is already running, only per-tenant resources are provisioned. For booth demos where attendees walk up and want to start immediately, waiting an hour is not an option.
- Scale across use cases — and Lightning Labs at Summit 2026. This pattern supports full workshop sessions, booth demos, and on-demand ordering from a single catalog item. At Summit 2026, the event team is introducing self-service labs called Lightning Labs — expected to be one of the biggest attractions of the event. Lightning Labs need 2–3 minute provisioning times; a dedicated CNV cluster per order (45–60 min) makes them impossible. The shared cluster + Sandbox API pattern makes them viable at scale, concurrently, across the event floor.
- Establishing a reusable pattern. This is the first RHDP lab to use the shared cluster + Sandbox API scheduler-only pattern end-to-end. The tenant roles (`ocp4_workload_tenant_keycloak_user`, `ocp4_workload_tenant_namespace`, `ocp4_workload_tenant_gitea`) are generic — any lab on a shared cluster can reuse them.
- Authentication. HTPasswd with sequential usernames (`user1`, `user2`) breaks down on a shared cluster with concurrent orders. RHBK is already on the cluster — each order adds one user to the existing realm and removes it on destroy.
3. Design Decisions — Why We Built It This Way
These decisions are the recommended approach for shared cluster labs — not arbitrary choices.
Why does each tenant get their own Gitea instance?
The Gitea operator is installed once; each order creates a new Gitea CR inside the tenant's namespace — a fully isolated server with its own PostgreSQL, route, admin user, and repos.
- Complete isolation. Each tenant's Gitea is a separate server. One tenant cannot see another tenant's repositories at all — they are not just in different orgs on the same server, they are on entirely different Gitea instances with different URLs.
- Lab exercises require it. Attendees push code and configuration changes as part of the exercises. With a per-tenant instance, there is no risk of one attendee's push affecting another's server.
- Clean destroy. Deleting the tenant's namespace deletes the entire Gitea instance — the CR, the pods, the PostgreSQL database, the persistent volumes, the route. No cleanup role needed beyond namespace deletion.
- Operator does the heavy lifting. The Gitea operator (already on the cluster) handles the full lifecycle of each instance — deploy, configure, expose via route. The `tenant_gitea` role just creates the CR and waits for it to be ready.
- Simple debugging. Every resource for one tenant lives in one namespace — `oc get all -n gitea-mcpuser-guid` shows you exactly what belongs to that tenant and nothing else.
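As a rough sketch, the per-tenant CR that the `tenant_gitea` role creates looks like the following. The `apiVersion`/`kind` match the RHPDS gitea-operator, but the spec fields shown are assumptions, not the lab's actual values:

```yaml
# Illustrative per-tenant Gitea CR — one fully isolated instance per order.
# The operator reconciles this into pods, a PostgreSQL database, and a route.
apiVersion: pfe.rhpds.com/v1
kind: Gitea
metadata:
  name: gitea
  namespace: gitea-mcpuser-guid   # tenant namespace — deleted on destroy
spec:
  giteaSsl: true                  # spec field names are assumptions
  giteaAdminUser: gitea-admin
```

Because the CR and all its child resources live in the tenant namespace, `oc delete project gitea-mcpuser-guid` is the entire cleanup path.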
Why does each tenant get their own LibreChat instance?
- Conversation data. Each attendee's conversations, uploaded files, and API history live in their LibreChat instance. A shared LibreChat means one attendee can see another's data. Not acceptable for a workshop.
- MCP endpoints are per-tenant. Each LibreChat is configured to point to that specific tenant's MCP servers — their `mcp-gitea-mcpuser-guid` and `mcp-openshift-mcpuser-guid`. A shared instance cannot route MCP calls per user.
- LiteMaaS key. Each tenant has their own virtual key with their own budget. A shared LibreChat would pool all API usage under one key with no per-user attribution or rate limiting.
Why separate namespaces per service?
Each service (LibreChat, MCP-Gitea, MCP-OpenShift, agent) gets its own namespace rather than sharing one namespace per tenant.
- Quota per service. Each namespace has its own ResourceQuota and LimitRange. LibreChat can have a higher memory limit without affecting the MCP server quota. A quota violation in one namespace does not block another.
- RBAC is cleaner. Service accounts are scoped to the namespace. A service account in `mcp-openshift-mcpuser-guid` cannot touch resources in `librechat-mcpuser-guid`.
- Easier to debug. The namespace name tells you immediately which service you are looking at. `kubectl get pods -n librechat-mcpuser-drw4x` shows only LibreChat — nothing else mixed in.
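The per-service quota argument can be sketched with a standard Kubernetes ResourceQuota. The numbers and quota name here are illustrative, not the lab's actual limits:

```yaml
# Sketch: per-namespace ResourceQuota. LibreChat gets a higher memory
# ceiling without touching the MCP server namespaces' quotas, and a
# violation here cannot block pods in any other namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota               # name is an assumption
  namespace: librechat-mcpuser-guid
spec:
  hard:
    requests.memory: 4Gi
    limits.memory: 8Gi
    pods: "10"
```

With one shared namespace per tenant, a single quota would have to cover the sum of all services, so one misbehaving service could starve the others.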
Why one RHBK user per order?
- Clean lifecycle. The user is created when the order is placed and deleted when it ends. No leftover identity from previous runs.
- Traceability. The username (`mcpuser-<guid>`) ties directly to the order GUID. You always know which order a user belongs to.
- RHBK is shared, users are not. The RHBK server is one per cluster. Users inside it are one per order. This is the right separation: shared infrastructure, isolated tenants.
Why one tenant per order instead of multi-user?
The original lab had a `num_users` parameter (2–40 users per order, all on the same cluster). The new pattern is one tenant per order.
- True isolation. One attendee's broken deployment cannot affect any other. With multi-user per order, resource exhaustion by one user hits everyone on that cluster.
- Destroy is simple. Deleting one tenant's resources is straightforward. Deleting one of N users from a shared cluster while leaving the others intact is significantly harder.
- Scales horizontally. More attendees = more clusters in the pool. Multi-user per order creates uneven resource usage and coordination problems across the pool.
4. What Changed in AgV — Before and After
The new catalog item is mcp-with-openshift-sandbox, replacing mcp-with-openshift for shared cluster deployments.
| | Original (`mcp-with-openshift`) | New (`mcp-with-openshift-sandbox`) |
|---|---|---|
| `config:` | `openshift-workloads` | `namespace` |
| `components:` | Full CNV OCP cluster per order (45–60 min) | None — uses Sandbox API pool |
| `__meta__.sandboxes` | None (cluster was provisioned by AgnosticD) | Single OcpSandbox entry, scheduler-only |
| Users per order | 2–40 (`num_users` parameter) | 1 — single tenant per order |
| Authentication | HTPasswd — `user1`, `user2`, ..., `userN` | RHBK — one user `mcpuser-<guid>` |
| Password formula | `md5(guid[:5])`, base64-encoded and truncated | `sha256(guid)` — stronger, deterministic |
| Gitea | Full install per order — one instance per cluster | Operator shared on cluster; per-tenant instance created per order |
| Pipelines / GitOps / ToolHive | Installed per order (cluster-wide operators) | Installed once by cluster provisioner |
| Main workload role | `ocp4_workload_mcp_user` — one monolithic role for all users | Split: individual tenant roles + ArgoCD app-of-apps |
| Destroy logic | Delete entire cluster — no per-role cleanup needed | Each role has `remove_workload.yml` |
| `remove_workloads:` | Only LiteMaaS virtual key | All 5 tenant roles listed in explicit order |
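Based only on the role names that appear in this guide, the explicit destroy list plausibly looks like the sketch below. The ordering and the collection prefixes are assumptions, and the list is deliberately incomplete — the guide says five tenant roles are listed, but only three are named here:

```yaml
# Illustrative destroy order (sketch) — tenant resources first, virtual key last.
# Collection prefixes and ordering are assumptions, not the actual config.
remove_workloads:
- rhpds.mcp_workloads.ocp4_workload_tenant_gitea
- rhpds.mcp_workloads.ocp4_workload_tenant_namespace
- rhpds.mcp_workloads.ocp4_workload_tenant_keycloak_user
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
```

Unlike the original lab, where cluster deletion was the cleanup, every role here must succeed at removing its own resources, because the shared cluster lives on after the order ends.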