Converting an Existing Lab to the Sandbox API Shared Cluster Pattern

A real account of how the MCP with OpenShift lab moved from dedicated CNV clusters to shared clusters — what changed, role by role, and every problem we hit

About this guide: This is a personal account of how I did this migration, not an official step-by-step playbook. I used Claude Code extensively throughout to help build roles, templates, and debug issues. Every lab is different, and you will hit edge cases that are not covered here. The challenges listed are the ones I ran into with this specific lab; your lab will have its own. Use this as a reference for the approach and the thinking, not as a guaranteed recipe.

Three Approaches, One Direction

The shared cluster + Sandbox API pattern is being developed across the team in parallel, with different labs taking different approaches. This guide covers one of them. The others are coming and each will have its own example you can learn from.

| Approach | Who | Status | What to learn from it |
| --- | --- | --- | --- |
| MCP with OpenShift (this guide) | Prakhar Srivastava | Done | Scheduler-only pattern end-to-end: tenant roles, GitOps bootstrap, per-tenant Gitea and LibreChat instances, explicit destroy |
| Complex GitOps example | Judd Maltin | Coming | A more advanced GitOps-driven pattern showing how to go further with ApplicationSets, multi-app bootstrapping, and Keycloak integration from ArgoCD |
| OCP Sandbox API | Multiple | Reference | Simpler pattern where the Sandbox API creates namespaces and RHSSO users: fewer roles, less setup, a good starting point for basic labs |

People and feedback that shaped this

Every iteration of this migration involved feedback from the team. Some of it confirmed the approach, some of it changed it:


1. The Original Workshop

The original mcp-with-openshift lab ran on a dedicated CNV OCP cluster per order. Every time someone ordered the lab, AgnosticD provisioned a full OpenShift cluster from scratch — which took 45–60 minutes — then installed all lab dependencies on top of it.

Key characteristics of the original lab

Original workloads list

# original mcp-with-openshift common.yaml (config: openshift-workloads)
workloads:
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys
- agnosticd.core_workloads.ocp4_workload_authentication_htpasswd
- agnosticd.core_workloads.ocp4_workload_gitea_operator
- agnosticd.core_workloads.ocp4_workload_pipelines
- agnosticd.core_workloads.ocp4_workload_openshift_gitops
- agnosticd.ai_workloads.ocp4_workload_toolhive
- rhpds.mcp_workloads.ocp4_workload_mcp_user          # monolithic — did everything for all users
- agnosticd.showroom.ocp4_workload_showroom_ocp_integration
- agnosticd.showroom.ocp4_workload_showroom
- rhpds.mcp_workloads.ocp4_workload_mcp_with_openshift_validation

remove_workloads:
- rhpds.litellm_virtual_keys.ocp4_workload_litellm_virtual_keys

Original password derivation formula

# Original — MD5 of the first 5 characters of the GUID, base64-encoded
common_user_password: "{{ (guid[:5] | hash('md5') | int(base=16) | b64encode)[:10] }}"
common_admin_password: "{{ (guid[:5] | hash('md5') | int(base=16) | b64encode)[:16] }}"
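
For comparison, the shared-cluster version derives passwords from a SHA-256 hash of the full GUID (see the before/after table in section 4). A sketch of what that looks like, assuming the same encode-and-truncate shape as the original; the exact truncation lengths here are illustrative, not the lab's actual values:

```yaml
# New (sketch) — SHA-256 of the full GUID instead of MD5 of guid[:5].
# Truncation lengths below are illustrative, not the lab's actual values.
common_user_password: "{{ (guid | hash('sha256') | b64encode)[:16] }}"
common_admin_password: "{{ (guid | hash('sha256') | b64encode)[:24] }}"
```

The result is still deterministic per GUID, so re-runs of the config produce the same credentials, but the input space is no longer just five characters.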
Notice: no destroy logic for most resources. Only litellm_virtual_keys appeared in remove_workloads. Everything else (HTPasswd users, Gitea, all user namespaces, LibreChat deployments) was cleaned up by simply deleting the entire OCP cluster at the end of the order. There was no per-role cleanup needed because the cluster itself was destroyed. This is a critical difference from the shared cluster pattern, where the cluster persists and every role must be able to clean up after itself.

2. Why We Changed It

Zero Touch OpenShift: The goal is for a customer or workshop attendee to click "Order" and have a fully working OpenShift environment in minutes, not an hour. This is only possible with pre-provisioned shared clusters. The Sandbox API scheduler-only pattern is the foundation that makes this possible.

3. Design Decisions — Why We Built It This Way

These decisions are the recommended approach for shared cluster labs — not arbitrary choices.

What is shared vs. what is per-tenant: On the cluster, RHBK, ArgoCD, Tekton, ToolHive, and the Gitea operator are all shared, installed once by the cluster provisioner. Per tenant (per order): a full Gitea instance (via an operator CR), a full LibreChat instance, dedicated namespaces per service, one RHBK user, their own MCP servers, and their own LiteMaaS key.

Our recommendation: namespace-scoped instances per tenant. We recommend deploying a full instance of each service per tenant, inside that tenant's own namespace. The operator is shared and installed once, but every order creates its own CR (instance) within an isolated namespace. This gives you clean destroy (delete the namespace and everything goes with it), complete isolation between tenants, and simple debugging. This is the pattern we used for Gitea and LibreChat in this lab.

Why does each tenant get their own Gitea instance?

The Gitea operator is installed once; each order creates a new Gitea CR inside the tenant's namespace — a fully isolated server with its own PostgreSQL, route, admin user, and repos.

The pattern: shared operator, per-tenant instance. Install the operator once at cluster provisioning time, then create a CR per order inside the tenant's namespace. This gives you clean destroy, complete isolation between tenants, and simple debugging. It is the recommended pattern for any stateful service (Gitea, LibreChat, databases) in a shared cluster lab.
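
Concretely, "one CR per order" means the tenant role applies a manifest along these lines. This is a sketch: the apiVersion, kind, and spec fields approximate the RHPDS Gitea operator's CRD and should be verified against the operator's actual schema, and the namespace naming is hypothetical:

```yaml
# Sketch of a per-tenant Gitea CR. Field names approximate the RHPDS
# gitea-operator CRD; verify against the operator's actual schema.
apiVersion: pfe.rhpds.com/v1
kind: Gitea
metadata:
  name: gitea
  namespace: gitea-{{ guid }}   # hypothetical tenant-scoped namespace
spec:
  giteaSsl: true
  giteaAdminUser: gitea-admin   # per-tenant admin, as described above
```

Because everything the operator creates for this CR (PostgreSQL, route, secrets) lands in the tenant's namespace, destroy reduces to deleting that one namespace.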

Why does each tenant get their own LibreChat instance?

Why separate namespaces per service?

Each service (LibreChat, MCP-Gitea, MCP-OpenShift, agent) gets its own namespace rather than sharing one namespace per tenant.
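
Under a hypothetical naming scheme (the real prefixes are whatever the tenant roles use), a tenant with GUID abc12 would own something like:

```yaml
# Hypothetical per-tenant namespace layout (guid: abc12)
- librechat-abc12      # LibreChat instance
- mcp-gitea-abc12      # Gitea MCP server
- mcp-openshift-abc12  # OpenShift MCP server
- agent-abc12          # agent workloads
```

One namespace per service keeps quotas, network policies, and teardown scoped to exactly one component at a time.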

Why one RHBK user per order?

Why one tenant per order instead of multi-user?

The original lab had a num_users parameter (2–40 users per order, all on the same cluster). The new pattern is one tenant per order.


4. What Changed in AgV — Before and After

The new catalog item is mcp-with-openshift-sandbox, replacing mcp-with-openshift for shared cluster deployments.

| | Original (mcp-with-openshift) | New (mcp-with-openshift-sandbox) |
| --- | --- | --- |
| config | openshift-workloads | namespace |
| components | Full CNV OCP cluster per order (45–60 min) | None; uses Sandbox API pool |
| __meta__.sandboxes | None (cluster was provisioned by AgnosticD) | Single OcpSandbox entry, scheduler-only |
| Users per order | 2–40 (num_users parameter) | 1, single tenant per order |
| Authentication | HTPasswd: user1, user2, ..., userN | RHBK: one user mcpuser-<guid> |
| Password formula | md5(guid[:5]), base64-truncated | sha256(guid), stronger and still deterministic |
| Gitea | Full install per order, one instance per cluster | Operator shared on cluster; per-tenant instance (CR) created per order |
| Pipelines / GitOps / ToolHive | Installed per order (cluster-wide operators) | Installed once by cluster provisioner |
| Main workload role | ocp4_workload_mcp_user, one monolithic role for all users | Split: individual tenant roles + ArgoCD app-of-apps |
| Destroy logic | Delete entire cluster, no per-role cleanup needed | Each role has remove_workload.yml |
| remove_workloads | Only LiteMaaS virtual key | All 5 tenant roles listed in explicit order |
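
The __meta__.sandboxes row refers to the AgV entry that asks the Sandbox API for a cluster from the shared pool instead of provisioning one. A sketch, assuming typical OcpSandbox fields; the name and selector values are hypothetical:

```yaml
# Sketch of a scheduler-only OcpSandbox request in the AgV catalog item.
# Field values are hypothetical; only the kind comes from the table above.
__meta__:
  sandboxes:
    - kind: OcpSandbox
      name: mcp-cluster          # hypothetical
      cloud_selector:
        purpose: mcp-workshops   # hypothetical selector matching the shared pool
```

In scheduler-only mode the entry just picks a pre-provisioned cluster; all per-tenant setup is left to the workload roles.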

Next: Layers + Role Conversions →