RHDP LiteMaaS – Architecture

Production deployment on maas.redhatworkshops.io

1 – What LiteMaaS Does

LiteMaaS gives every user a single API endpoint to access AI models – no infrastructure to manage, no credentials to share. Behind the scenes it handles access control, rate limiting, and spend tracking across all models.

LiteMaaS vs LiteLLM

LiteLLM is the open-source AI proxy engine that sits between users and model servers. It handles routing, quota enforcement, and spend logging. LiteMaaS is RHDP's platform built on top of LiteLLM – it adds a web portal, subscription management, per-user API key provisioning, branding, and OIDC authentication. Think of LiteLLM as the engine and LiteMaaS as the car.

Two Clusters – Same Access Pattern

RHDP MaaS now spans two clusters, but users access all models the same way โ€” through the single LiteMaaS portal and LiteLLM proxy. The Gaudi cluster models are registered in LiteLLM exactly like AWS models. No different endpoint, no different key.

|          | AWS Cluster (Primary)           | Intel Gaudi Cluster (Rackspace DFW3)  |
|----------|---------------------------------|---------------------------------------|
| Cluster  | maas.redhatworkshops.io         | maas00.rs-dfw3.infra.demo.redhat.com  |
| Access   | LiteMaaS portal + LiteLLM proxy | Same LiteMaaS portal + LiteLLM proxy  |
| Auth     | sk-... virtual key              | Same sk-... virtual key               |
| Hardware | NVIDIA L40S / L4 / T4           | 8× Intel Gaudi 3 (Dell XE9680)        |
```mermaid
graph LR
    U["👤 Users"]
    subgraph LM["LiteMaaS Platform"]
        UI["🖥 Web UI\nmodel catalog · API keys\nusage · settings"]
        API["⚙️ Backend API\nsubscriptions · auth\nanalytics · admin"]
        PROXY["🔀 LiteLLM Proxy\nroutes requests\nenforces quotas\nlogs spend"]
    end
    subgraph OAI["OpenShift AI – KServe (AWS)"]
        M1["💬 Chat models"]
        M2["🔢 Embedding model"]
        M3["📄 Document Conversion"]
    end
    subgraph GAUDI["OpenShift AI – KServe (Intel Gaudi · Rackspace)"]
        G1["💬 deepseek-r1-distill-qwen-14b"]
        G2["💬 qwen3-14b"]
    end
    U -->|browse & manage| UI
    U -->|"inference calls\nsk-... key"| PROXY
    UI <-->|API| API
    API <--> PROXY
    PROXY --> M1 & M2 & M3
    PROXY --> G1 & G2
    style LM fill:#fff8f8,stroke:#cc0000,stroke-width:2px
    style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
    style GAUDI fill:#e6f2ff,stroke:#0071c5,stroke-width:2px
    style U fill:#fce4ec,stroke:#cc0000,stroke-width:1px
```
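Because LiteLLM exposes an OpenAI-compatible API, the same sk-... key works against every model in the diagram. Below is a minimal sketch using only the Python standard library; the proxy route path, key, and model name are assumptions for illustration – substitute the endpoint and key from your lab credentials.

```python
import json
import urllib.request

# Assumed values for illustration -- use the endpoint and sk-... key
# delivered with your lab credentials.
BASE_URL = "https://litellm-prod.apps.maas.redhatworkshops.io"
API_KEY = "sk-example-virtual-key"

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the proxy."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # the sk-... virtual key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("qwen3-14b", "Say hello in one sentence.")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.full_url)
```

Any OpenAI SDK works the same way: point its base URL at the proxy route and pass the virtual key as the API key.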

2 – Production Deployment

All three LiteMaaS components run at 3 replicas. Both databases (LiteMaaS and LiteLLM) share one PostgreSQL server. Redis is used by the proxy for caching and rate limiting.

```mermaid
graph TB
    subgraph ROUTES["OpenShift Routes – maas.redhatworkshops.io"]
        direction LR
        R1["litellm-prod-frontend · Portal"]
        R2["litellm-prod-admin · API"]
        R3["litellm-prod · AI Proxy"]
    end
    subgraph APP["litellm-rhpds namespace – ×3 replicas each"]
        direction LR
        FE["🖥 Frontend\nReact + nginx · v0.4.0"]
        BE["⚙️ Backend\nFastify · v0.4.0"]
        LLM["🔀 LiteLLM Proxy\ncustom fork · v1.81.0"]
    end
    subgraph DATA["Data"]
        direction LR
        PG[("PostgreSQL 16\n10 Gi · litemaas_db + litellm_db\nbranding stored here")]
        RD[("Redis\ncache + rate limits")]
    end
    R1 --> FE
    R2 --> BE
    R3 --> LLM
    FE <--> BE
    BE <--> LLM
    BE & LLM --> PG
    LLM --> RD
    style ROUTES fill:#e8eaf6,stroke:#3949ab,stroke-width:1.5px
    style APP fill:#fff8f8,stroke:#cc0000,stroke-width:2px
    style DATA fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```
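The data-layer wiring above corresponds to standard LiteLLM proxy settings. The fragment below is an illustrative sketch of a LiteLLM `config.yaml`, not the actual RHDP configuration – hostnames, database credentials, and the upstream model URL are all assumptions.

```yaml
# Illustrative LiteLLM proxy config -- hosts, credentials, and the
# api_base are assumptions, not the production values.
model_list:
  - model_name: qwen3-14b
    litellm_params:
      model: hosted_vllm/qwen3-14b
      api_base: https://qwen3-14b.example.internal/v1

general_settings:
  # Virtual keys, budgets, and spend logs live in litellm_db
  # on the shared PostgreSQL 16 server.
  database_url: "postgresql://litellm:<password>@postgresql:5432/litellm_db"

litellm_settings:
  # Redis backs response caching and keeps rate-limit counters
  # consistent across the 3 proxy replicas.
  cache: true
  cache_params:
    type: redis
    host: redis
    port: 6379
```

Without the shared Redis, each replica would enforce TPM/RPM limits independently and users could exceed their quotas by roughly the replica count.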

3 – Model Servers – GitOps

Model servers on OpenShift AI are managed entirely via GitOps. Committing a new InferenceService manifest to the repo automatically deploys it to the cluster. This is separate from the LiteMaaS deployment. Models can then be registered into LiteMaaS via the UI or using the ocp4_workload_litemaas_models role.

```mermaid
graph LR
    GIT["📁 Git\nInferenceService manifests"]
    GO["🔄 OpenShift GitOps\nArgoCD"]
    subgraph OAI["OpenShift AI – llm-hosting namespace"]
        direction LR
        IS1["💬 Chat model\nKServe · vLLM"]
        IS2["🔢 Embedding model\nKServe · OpenVINO"]
        IS3["📄 Document Conversion\nKServe · Docling Serve"]
    end
    LM["🔀 LiteMaaS\nregister via UI\nor Ansible role"]
    GIT -->|commit| GO
    GO --> IS1 & IS2 & IS3
    IS1 & IS2 & IS3 -.->|"endpoint registered"| LM
    style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
    style GIT fill:#f5f5f5,stroke:#555,stroke-width:1px
    style GO fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px
```
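A manifest of the kind committed to the GitOps repo might look like the sketch below. This is a hypothetical example of the standard KServe `InferenceService` shape, not an actual RHDP manifest – the runtime name, storage URI, and resource values are assumptions.

```yaml
# Hypothetical InferenceService -- runtime, storageUri, and resources
# are placeholders for illustration.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen3-14b
  namespace: llm-hosting
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: vllm-runtime                      # assumed ServingRuntime name
      storageUri: oci://quay.io/example/qwen3-14b  # assumed model location
      resources:
        limits:
          nvidia.com/gpu: "1"
```

Once ArgoCD syncs the manifest and the model serves traffic, its endpoint is registered in LiteMaaS via the UI or the `ocp4_workload_litemaas_models` role.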

4 – Platform Installation

Done once by the infra team. The rhpds.litemaas collection (v0.3.2) deploys LiteMaaS in HA mode, registers model endpoints, and sets up Grafana dashboards.

```mermaid
graph LR
    INFRA["👷 Infra Team\none-time setup"]
    subgraph C1["rhpds.litemaas v0.3.2"]
        R1["ocp4_workload_litemaas\n– deploys Frontend · Backend\n– LiteLLM Proxy · PostgreSQL · Redis\n– 3 replicas HA"]
        R2["ocp4_workload_litemaas_models\n– registers model endpoints\n– syncs to LiteMaaS DB"]
        R3["ocp4_workload_rhoai_metrics\n– Grafana Operator v5\n– ServiceMonitors per model\n– GPU + vLLM dashboards"]
    end
    OCP["☁️ OpenShift\nnamespace: litellm-rhpds"]
    INFRA --> R1 & R2 & R3
    R1 & R2 & R3 --> OCP
    style C1 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style INFRA fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
```
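Pinning the collection at the version above is an ordinary `ansible-galaxy` requirements entry. A minimal sketch, assuming the collection is installable from the configured Galaxy server (where it is actually published is not stated here):

```yaml
# requirements.yml -- pins the collection version from this section.
collections:
  - name: rhpds.litemaas
    version: 0.3.2
```

Installed with `ansible-galaxy collection install -r requirements.yml`, after which the three `ocp4_workload_*` roles become available to the deployment playbooks.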

5 – Per-Workshop Key Creation

Runs automatically for every workshop participant. The rhpds.litellm_virtual_keys collection (v1.3.1) creates a scoped API key with models, budget, and rate limits defined in the AgnosticV catalog.

```mermaid
graph LR
    USER["👤 Workshop Participant\norders lab from catalog"]
    AV["📋 AgnosticV\ndefines: models · budget\nduration · rate limits"]
    BAB["⚙️ Babylon"]
    subgraph C2["rhpds.litellm_virtual_keys v1.3.1"]
        K1["ocp4_workload_litellm_bastion_profile\n– configures bastion access"]
        K2["ocp4_workload_litellm_virtual_keys\n– validates models exist\n– creates sk-... key per user\n– sets budget + TPM/RPM limits\n– delivers key in lab credentials"]
    end
    LM["🔀 LiteMaaS\nPOST /key/generate"]
    KEY["🔑 sk-... virtual key"]
    USER --> BAB
    AV -->|defines what to run| BAB
    BAB --> K1 --> K2
    K2 --> LM --> KEY
    style C2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style BAB fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
    style USER fill:#fce4ec,stroke:#cc0000,stroke-width:1px
    style AV fill:#fff3e0,stroke:#e65100,stroke-width:1px
    style KEY fill:#fff9c4,stroke:#f9a825,stroke-width:2px
```
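Under the hood, the `POST /key/generate` call shown in the diagram is plain JSON over HTTP against LiteLLM's key-management API. A hedged sketch of roughly what the role sends – the admin URL, key alias, and limit values are assumptions, and the request is built but not sent so the sketch runs offline:

```python
import json
import urllib.request

# Assumed admin endpoint and master key -- never exposed to participants.
ADMIN_URL = "https://litellm-prod.apps.maas.redhatworkshops.io"
ADMIN_KEY = "sk-admin-example"

# Values like these come from the AgnosticV catalog definition.
payload = {
    "models": ["qwen3-14b"],        # restrict the key to specific models
    "duration": "5d",               # key expires with the lab
    "max_budget": 10.0,             # spend cap for this participant
    "tpm_limit": 20000,             # tokens per minute
    "rpm_limit": 60,                # requests per minute
    "key_alias": "workshop-user1",  # hypothetical per-user alias
}

req = urllib.request.Request(
    url=f"{ADMIN_URL}/key/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {ADMIN_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the new sk-... key; omitted here.
print(req.full_url)
```

The returned sk-... key is what lands in the participant's lab credentials; the proxy then enforces the model list, budget, and TPM/RPM limits on every call made with it.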