Three-Layer GitOps Architecture for Shared Cluster Labs

Infra / Platform / Tenant — a general pattern for any lab running on a shared OpenShift cluster

Three-Layer Architecture: Infra / Platform / Tenant

Each layer has a clear owner, different run frequency, and different lifecycle:

Layer 1 — Infra (generic, reusable). Runs ONCE per cluster, never per order. Installs all cluster-wide operators via cluster-provision.yml: RHBK (Keycloak / SSO), Gitea Operator (shared instance), OpenShift GitOps (ArgoCD), Tekton Pipelines, and ToolHive (MCP proxy). gitops_bootstrap then creates the infra AppProjects and the platform bootstrap, after which the cluster is ready.

Layer 2 — Platform (lab owner, cluster-wide). Runs ONCE per cluster. Creates operator instances and configures shared cluster services via ArgoCD (bootstrap-platform): platform-monitoring (user workload monitoring enabled) and future shared services.

Layer 3 — Tenant (lab owner, per-user). Runs PER ORDER: one isolated environment per user, created by AgnosticD roles + gitops_bootstrap and destroyed when the order ends. The Sandbox API schedules the cluster, then the AgnosticD roles run (tenant_keycloak_user, tenant_namespace, tenant_gitea, litellm_virtual_keys), followed by gitops_bootstrap (bootstrap-tenant). ArgoCD then deploys the GitOps apps: agent, librechat, librechat-config, mcp-gitea, mcp-openshift, and showroom.

Destroy (per order): litellm key → gitops_bootstrap (ArgoCD cascade) → tenant_gitea → tenant_namespace → tenant_keycloak_user.
| Layer | Responsibility | Runs | How triggered | MCP Lab example |
|---|---|---|---|---|
| Infra (reusable across labs) | Base cluster capabilities — the foundational layer that everything else depends on. Installed once when the cluster is onboarded. | Once per cluster | cluster-provision.yml playbook, run by the developer when the cluster is onboarded. See: cluster-provision.yml | RHBK, Gitea operator, OpenShift GitOps (ArgoCD), Tekton, ToolHive, CloudNativePG, ArgoCD AppProjects. These are generic — any lab on this cluster can rely on them. |
| Platform (lab-specific) | Cluster-wide resources specific to what this lab needs on top of infra. Not per-user — shared across all tenants on the cluster. The lab developer decides what goes here. | Once per cluster — same provisioning run as infra | ArgoCD syncs the bootstrap-platform Application | MCP lab: user workload monitoring ConfigMap. Service mesh lab: a shared service mesh gateway. AI lab: a shared LiteLLM instance. Anything cluster-wide but specific to this lab's needs. |
| Tenant (lab-specific) | Per-user isolated environment — created per order, destroyed when done. Same owner as platform, just a different scope (per-user vs cluster-wide). | Every order | AgnosticD runs the AgV workloads list; the cluster is selected via the Sandbox API scheduler | RHBK user, 6 namespaces, per-tenant Gitea instance, LiteMaaS virtual key. ArgoCD deploys: LibreChat, MCP servers, AI agent, Showroom. |
All three layers are developer-owned — the distinction is scope and timing

In the MCP lab, the cluster is dedicated to this lab, and all three layers are set up by developers. The distinction is:

Infra = the base cluster capabilities (installed once, cluster-wide, foundational).
Platform = cluster-wide resources specific to what this lab needs on top of infra (MCP: user workload monitoring; a service mesh lab would add a service mesh gateway here).
Tenant = per-user, per-order.

How OCP Sandbox API and ArgoCD Pick Up Namespaces

One of the most common questions: the GitOps repo has no namespace creation in it, yet everything deploys into the right namespaces. Here is exactly how it works — no magic.

Two ways namespaces are created

Option A — OCP Sandbox API creates them

Each entry in __meta__.sandboxes causes OCP Sandbox API to create one namespace before AgnosticD runs. Each entry gets its own quota: and limit_range:.

Use this when you want OCP Sandbox API to own the namespace lifecycle and you need per-namespace quota control.

Example: mcp-with-openshift-sandbox/common.yaml

Option B — Ansible role creates them

The ocp4_workload_tenant_namespace role creates all namespaces listed in ocp4_workload_tenant_namespace_suffixes. All get the same quota and limit_range. For different limits per namespace, call the role multiple times.

Use this when you need full control over naming and per-namespace RBAC before ArgoCD runs.

Role: ocp4_workload_tenant_namespace
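As a rough illustration of Option B, the objects the role presumably renders for a single entry would look something like the sketch below. This is a hedged reconstruction, not output copied from the role: the object names (quota, limit-range) are assumptions, and the values are taken from the librechat entry in the real example further down.

```yaml
# Sketch: Kubernetes objects for one entry (suffix: librechat,
# prefix: mcpuser-drw4x). Names and exact shapes are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: librechat-mcpuser-drw4x        # {suffix}-{username} naming contract
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota                          # assumed name
  namespace: librechat-mcpuser-drw4x
spec:
  hard:
    limits.cpu: "4"
    limits.memory: 6Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range                    # assumed name
  namespace: librechat-mcpuser-drw4x
spec:
  limits:
  - type: Container
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }
```

Whatever the role's exact output, the important part is the namespace name: it must follow the {suffix}-{username} convention described in the next section so ArgoCD can find it.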

How ArgoCD finds the right namespace — the naming contract

Both options follow the same convention: {suffix}-{username}. The bootstrap Helm chart constructs namespace names using this formula. As long as Ansible creates namespaces with the same formula, ArgoCD will find them.

| Layer | Code | Result (username = mcpuser-drw4x) |
|---|---|---|
| Ansible | creates namespace: suffix: librechat + prefix: mcpuser-drw4x | librechat-mcpuser-drw4x |
| ArgoCD | Helm targets namespace: printf "librechat-%s" $username in applications.yaml | librechat-mcpuser-drw4x |

CreateNamespace=false in every ArgoCD Application (see applications.yaml line 160) tells ArgoCD to deploy into the existing namespace. If the namespace does not exist, ArgoCD fails — enforcing that Ansible runs first.
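Putting the naming contract and CreateNamespace=false together, an Application in the Helm-templated applications.yaml could look roughly like this. The field names come from the ArgoCD Application CRD, but the project name, repo URL value, chart path, and .Values keys are illustrative assumptions, not copied from the lab's applications.yaml:

```yaml
# Sketch of a Helm-templated ArgoCD Application (illustrative values).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: librechat-{{ .Values.username }}
  namespace: openshift-gitops
spec:
  project: tenant                          # assumed AppProject name
  destination:
    server: https://kubernetes.default.svc
    # Must match the namespace Ansible created: {suffix}-{username}
    namespace: {{ printf "librechat-%s" .Values.username }}
  source:
    repoURL: {{ .Values.gitopsRepoUrl }}   # assumed values key
    path: charts/librechat                 # assumed chart path
  syncPolicy:
    automated: {}
    syncOptions:
    - CreateNamespace=false                # fail fast if Ansible has not run
```

Because CreateNamespace=false is set, a typo in either side of the {suffix}-{username} formula surfaces immediately as a failed sync rather than a silently created, unquota'd namespace.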

Real example — Summit 2026 / Scheduler-Only (primary pattern): Ansible role creates namespaces

Ansible creates all namespaces before ArgoCD runs. Role source →

# Role: agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
# Runs BEFORE gitops_bootstrap — namespaces must exist before ArgoCD syncs.
ocp4_workload_tenant_namespace_prefix: "{{ ocp4_workload_tenant_keycloak_username }}"
ocp4_workload_tenant_namespace_namespaces:
- suffix: agent
  quota: { limits.cpu: "2", limits.memory: 4Gi }
  limit_range:
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }
- suffix: librechat
  quota: { limits.cpu: "4", limits.memory: 6Gi }   # LibreChat + MongoDB + Meilisearch
  limit_range:
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }
- suffix: mcp-gitea
  quota: { limits.cpu: "1", limits.memory: 2Gi }   # lightweight MCP proxy
  limit_range:
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }

Real example — Post-Summit / OCP Sandbox API: platform creates namespaces with per-entry quota

Each sandbox entry creates one namespace with its own quota — no Ansible role needed for namespace creation. Full file →

# Each sandboxes: entry creates one namespace with its own quota.
# OCP Sandbox API creates these BEFORE AgnosticD runs.
__meta__:
  sandboxes:
  # Primary — also schedules the cluster
  - kind: OcpSandbox
    namespace_suffix: user
    cloud_selector: { cloud: cnv-dedicated-shared, demo: mcp-with-openshift, purpose: prod }
    quota:
      limits.cpu: "2"
      limits.memory: 4Gi
  # LibreChat needs more memory
  - kind: OcpSandbox
    namespace_suffix: librechat
    cluster_condition: same('primary')
    quota:
      limits.cpu: "4"
      limits.memory: 8Gi
  # MCP servers need less
  - kind: OcpSandbox
    namespace_suffix: mcp-gitea
    cluster_condition: same('primary')
    quota:
      limits.cpu: "1"
      limits.memory: 2Gi

Different limits per namespace — Scheduler-Only pattern

The ocp4_workload_tenant_namespace role applies one quota/limit to all namespaces in the list. To give different namespaces different limits, split them into separate calls using ocp4_workload_tenant_namespace_suffixes with different quota values. Each role call creates a group of namespaces with shared limits.

# First call — heavy namespaces (LibreChat needs more)
workloads:
- agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace

# In AgV, set vars for the first group:
ocp4_workload_tenant_namespace_suffixes:
- librechat
ocp4_workload_tenant_namespace_quota:
  limits.cpu: "4"
  limits.memory: 8Gi

# NOTE: The role must support a loop or be called via include_role with vars
# for multiple groups. Alternatively, use the OCP Sandbox API pattern (Option A)
# which natively supports per-namespace quota via separate sandboxes: entries.
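If the role does read its variables per invocation, the "separate calls" approach could be sketched as two include_role tasks, each with its own quota vars. This is a hedged sketch of that idea, not code from the lab: the task names are invented, and it assumes include_role-scoped vars are honored (per the NOTE above); the quota values mirror the Sandbox API example earlier.

```yaml
# Sketch: one include_role call per quota group (assumes per-call vars work).
- name: Create heavy namespaces (LibreChat)
  ansible.builtin.include_role:
    name: agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  vars:
    ocp4_workload_tenant_namespace_suffixes:
    - librechat
    ocp4_workload_tenant_namespace_quota:
      limits.cpu: "4"
      limits.memory: 8Gi

- name: Create light namespaces (MCP servers)
  ansible.builtin.include_role:
    name: agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  vars:
    ocp4_workload_tenant_namespace_suffixes:
    - mcp-gitea
    ocp4_workload_tenant_namespace_quota:
      limits.cpu: "1"
      limits.memory: 2Gi
```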
Recommended: use OCP Sandbox API entries for per-namespace quota control

If you need different resource limits in different namespaces, the cleanest approach is to use separate sandboxes: entries — each entry gets its own quota: block. This is what the MCP lab does, and it is shown above.
This is the reference implementation

The MCP with OpenShift lab is the reference for this pattern. All files referenced above are live code — not examples. Start from the GitOps repo and the AgV common.yaml.

How This Connects to the Sandbox API

The three-layer pattern maps directly onto the two main Sandbox API integration patterns: Scheduler-Only (the Ansible role creates namespaces) and OCP Sandbox API (the platform creates them), as shown in the examples above.

Real example: what went in each layer for the MCP lab

See the Migration Guide → What we put in each layer (MCP lab) for a concrete breakdown of which roles and ArgoCD apps landed in each layer during the MCP lab migration.

Red Hat Demo Platform (RHDP) — Internal developer reference — GitHub

← Previous: Cluster Provisioning Next: Why + What Changed →