Three-Layer GitOps Architecture for Shared Cluster Labs

Infra / Platform / Tenant — a general pattern for any lab running on a shared OpenShift cluster

Three-Layer Architecture: Infra / Platform / Tenant

Each layer has a clear owner, different run frequency, and different lifecycle:

Layer 1 — Infra (generic, reusable). Runs ONCE per cluster, never per order. Installs all cluster-wide operators via cluster-provision.yml: RHBK (Keycloak / SSO), Gitea Operator (shared instance), OpenShift GitOps (ArgoCD), Tekton Pipelines, and ToolHive (MCP proxy). gitops_bootstrap then creates the infra AppProjects and the platform bootstrap, after which the cluster is ready.

Layer 2 — Platform (lab owner, cluster-wide). Runs ONCE per cluster. Creates operator instances and configures shared cluster services via ArgoCD (bootstrap-platform): platform-monitoring (user workload monitoring enabled) and future shared services.

Layer 3 — Tenant (lab owner, per-user). Runs PER ORDER: one isolated environment per user, created by AgnosticD roles + gitops_bootstrap and destroyed when the order ends. The Sandbox API schedules the cluster, then the AgnosticD roles run (tenant_keycloak_user, tenant_namespace, tenant_gitea, litellm_virtual_keys), followed by gitops_bootstrap (bootstrap-tenant). ArgoCD then deploys the GitOps apps: agent, librechat, librechat-config, mcp-gitea, mcp-openshift, and showroom.

Destroy (per order): litellm key → gitops_bootstrap (ArgoCD cascade) → tenant_gitea → tenant_namespace → tenant_keycloak_user.
| Layer | Responsibility | Runs | How triggered | MCP Lab example |
|---|---|---|---|---|
| Infra (reusable across labs) | Base cluster capabilities — the foundational layer that everything else depends on. Installed once when the cluster is onboarded. | Once per cluster | cluster-provision.yml playbook, run by the developer when the cluster is onboarded. See: cluster-provision.yml | RHBK, Gitea operator, OpenShift GitOps (ArgoCD), Tekton, ToolHive, CloudNativePG, ArgoCD AppProjects. These are generic — any lab on this cluster can rely on them. |
| Platform (lab-specific) | Cluster-wide resources specific to what this lab needs on top of infra. Not per-user — shared across all tenants on the cluster. The lab developer decides what goes here. | Once per cluster — same provisioning run as infra | ArgoCD syncs the bootstrap-platform Application | MCP lab: user workload monitoring ConfigMap. Service mesh lab: a shared service mesh gateway. AI lab: a shared LiteLLM instance. Anything cluster-wide but specific to this lab's needs. |
| Tenant (lab-specific) | Per-user isolated environment — created per order, destroyed when done. Same owner as platform, just a different scope (per-user vs cluster-wide). | Every order | AgnosticD runs the AgV workloads list; the cluster is selected via the Sandbox API scheduler | RHBK user, 6 namespaces, per-tenant Gitea instance, LiteMaaS virtual key. ArgoCD deploys: LibreChat, MCP servers, AI agent, Showroom. |
All three layers are developer-owned — the distinction is scope and timing

In the MCP lab, the cluster is dedicated to this lab, and all three layers are set up by developers. The distinction is:

Infra = the base cluster capabilities (installed once, cluster-wide, foundational).
Platform = cluster-wide resources specific to what this lab needs on top of infra (MCP: user workload monitoring; a service mesh lab would add a service mesh gateway here).
Tenant = per-user, per-order.

How OCP Sandbox API and ArgoCD Pick Up Namespaces

One of the most common questions: the GitOps repo has no namespace creation in it, yet everything deploys into the right namespaces. Here is exactly how it works — no magic.

Two ways namespaces are created

Option A — OCP Sandbox API creates them

Each entry in __meta__.sandboxes causes OCP Sandbox API to create one namespace before AgnosticD runs. Each entry gets its own quota: and limit_range:.

Use this when you want OCP Sandbox API to own the namespace lifecycle and you need per-namespace quota control.

Example: mcp-with-openshift-sandbox/common.yaml

Option B — Ansible role creates them

The ocp4_workload_tenant_namespace role creates all namespaces listed in ocp4_workload_tenant_namespace_suffixes. All get the same quota and limit_range. For different limits per namespace, call the role multiple times.

Use this when you need full control over naming and per-namespace RBAC before ArgoCD runs.

Role: ocp4_workload_tenant_namespace
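As a rough illustration of Option B, the objects the role presumably renders for a single entry would look something like the sketch below. This is a hedged reconstruction, not output copied from the role: the object names (quota, limit-range) are assumptions, and the values are taken from the librechat entry in the real example further down.

```yaml
# Sketch: Kubernetes objects for one entry (suffix: librechat,
# prefix: mcpuser-drw4x). Names and exact shapes are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: librechat-mcpuser-drw4x        # {suffix}-{username} naming contract
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota                          # assumed name
  namespace: librechat-mcpuser-drw4x
spec:
  hard:
    limits.cpu: "4"
    limits.memory: 6Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range                    # assumed name
  namespace: librechat-mcpuser-drw4x
spec:
  limits:
  - type: Container
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }
```

Whatever the role's exact output, the important part is the namespace name: it must follow the {suffix}-{username} convention described in the next section so ArgoCD can find it.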

How ArgoCD finds the right namespace — the naming contract

Both options follow the same convention: {suffix}-{username}. The bootstrap Helm chart constructs namespace names using this formula. As long as Ansible creates namespaces with the same formula, ArgoCD will find them.

| Layer | Code | Result (username = mcpuser-drw4x) |
|---|---|---|
| Ansible | creates namespace: suffix: librechat + prefix: mcpuser-drw4x | librechat-mcpuser-drw4x |
| ArgoCD | Helm targets namespace: printf "librechat-%s" $username in applications.yaml | librechat-mcpuser-drw4x |

CreateNamespace=false in every ArgoCD Application (see applications.yaml line 160) tells ArgoCD to deploy into the existing namespace. If the namespace does not exist, ArgoCD fails — enforcing that Ansible runs first.
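Putting the naming contract and CreateNamespace=false together, an Application in the Helm-templated applications.yaml could look roughly like this. The field names come from the ArgoCD Application CRD, but the project name, repo URL value, chart path, and .Values keys are illustrative assumptions, not copied from the lab's applications.yaml:

```yaml
# Sketch of a Helm-templated ArgoCD Application (illustrative values).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: librechat-{{ .Values.username }}
  namespace: openshift-gitops
spec:
  project: tenant                          # assumed AppProject name
  destination:
    server: https://kubernetes.default.svc
    # Must match the namespace Ansible created: {suffix}-{username}
    namespace: {{ printf "librechat-%s" .Values.username }}
  source:
    repoURL: {{ .Values.gitopsRepoUrl }}   # assumed values key
    path: charts/librechat                 # assumed chart path
  syncPolicy:
    automated: {}
    syncOptions:
    - CreateNamespace=false                # fail fast if Ansible has not run
```

Because CreateNamespace=false is set, a typo in either side of the {suffix}-{username} formula surfaces immediately as a failed sync rather than a silently created, unquota'd namespace.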

Real example — Summit 2026 / Scheduler-Only (primary pattern): Ansible role creates namespaces

Ansible creates all namespaces before ArgoCD runs. Role source →

# Role: agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
# Runs BEFORE gitops_bootstrap — namespaces must exist before ArgoCD syncs.
ocp4_workload_tenant_namespace_prefix: "{{ ocp4_workload_tenant_keycloak_username }}"
ocp4_workload_tenant_namespace_namespaces:
- suffix: agent
  quota: { limits.cpu: "2", limits.memory: 4Gi }
  limit_range:
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }
- suffix: librechat
  quota: { limits.cpu: "4", limits.memory: 6Gi }   # LibreChat + MongoDB + Meilisearch
  limit_range:
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }
- suffix: mcp-gitea
  quota: { limits.cpu: "1", limits.memory: 2Gi }   # lightweight MCP proxy
  limit_range:
    default: { cpu: 500m, memory: 512Mi }
    defaultRequest: { cpu: 50m, memory: 128Mi }

Real example — Post-Summit / OCP Sandbox API: platform creates namespaces with per-entry quota

Each sandbox entry creates one namespace with its own quota — no Ansible role needed for namespace creation. Full file →

# Each sandboxes: entry creates one namespace with its own quota.
# OCP Sandbox API creates these BEFORE AgnosticD runs.
__meta__:
  sandboxes:
  # Primary — also schedules the cluster
  - kind: OcpSandbox
    namespace_suffix: user
    cloud_selector: { cloud: cnv-dedicated-shared, demo: mcp-with-openshift, purpose: prod }
    quota:
      limits.cpu: "2"
      limits.memory: 4Gi
  # LibreChat needs more memory
  - kind: OcpSandbox
    namespace_suffix: librechat
    cluster_condition: same('primary')
    quota:
      limits.cpu: "4"
      limits.memory: 8Gi
  # MCP servers need less
  - kind: OcpSandbox
    namespace_suffix: mcp-gitea
    cluster_condition: same('primary')
    quota:
      limits.cpu: "1"
      limits.memory: 2Gi

Different limits per namespace — Scheduler-Only pattern

The ocp4_workload_tenant_namespace role applies one quota/limit to all namespaces in the list. To give different namespaces different limits, split them into separate calls using ocp4_workload_tenant_namespace_suffixes with different quota values. Each role call creates a group of namespaces with shared limits.

# First call — heavy namespaces (LibreChat needs more)
workloads:
- agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace

# In AgV, set vars for the first group:
ocp4_workload_tenant_namespace_suffixes:
- librechat
ocp4_workload_tenant_namespace_quota:
  limits.cpu: "4"
  limits.memory: 8Gi

# NOTE: The role must support a loop or be called via include_role with vars
# for multiple groups. Alternatively, use the OCP Sandbox API pattern (Option A)
# which natively supports per-namespace quota via separate sandboxes: entries.
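If the role does read its variables per invocation, the "separate calls" approach could be sketched as two include_role tasks, each with its own quota vars. This is a hedged sketch of that idea, not code from the lab: the task names are invented, and it assumes include_role-scoped vars are honored (per the NOTE above); the quota values mirror the Sandbox API example earlier.

```yaml
# Sketch: one include_role call per quota group (assumes per-call vars work).
- name: Create heavy namespaces (LibreChat)
  ansible.builtin.include_role:
    name: agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  vars:
    ocp4_workload_tenant_namespace_suffixes:
    - librechat
    ocp4_workload_tenant_namespace_quota:
      limits.cpu: "4"
      limits.memory: 8Gi

- name: Create light namespaces (MCP servers)
  ansible.builtin.include_role:
    name: agnosticd.namespaced_workloads.ocp4_workload_tenant_namespace
  vars:
    ocp4_workload_tenant_namespace_suffixes:
    - mcp-gitea
    ocp4_workload_tenant_namespace_quota:
      limits.cpu: "1"
      limits.memory: 2Gi
```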
Recommended: use OCP Sandbox API entries for per-namespace quota control

If you need different resource limits in different namespaces, the cleanest approach is to use separate sandboxes: entries — each entry gets its own quota: block. This is what the MCP lab does, and it is shown above.
This is the reference implementation

The MCP with OpenShift lab is the reference for this pattern. All files referenced above are live code — not examples. Start from the GitOps repo and the AgV common.yaml.

How This Connects to the Sandbox API

The three-layer pattern maps directly onto the two main Sandbox API integration patterns: Scheduler-Only (the Ansible role creates namespaces) and OCP Sandbox API (the platform creates them), as shown in the examples above.

Real example: what went in each layer for the MCP lab

See the Migration Guide → What we put in each layer (MCP lab) for a concrete breakdown of which roles and ArgoCD apps landed in each layer during the MCP lab migration.

Red Hat Demo Platform (RHDP) — Internal developer reference — GitHub

← Previous: Cluster Provisioning Next: Why + What Changed →