RHDP LiteMaaS – Architecture

Production deployment on maas.redhatworkshops.io

1 – What LiteMaaS Does

LiteMaaS gives every user a single API endpoint to access AI models – no infrastructure to manage, no credentials to share. Behind the scenes it handles access control, rate limiting, and spend tracking across all models.

LiteMaaS vs LiteLLM

LiteLLM is the open-source AI proxy engine that sits between users and model servers. It handles routing, quota enforcement, and spend logging. LiteMaaS is RHDP's platform built on top of LiteLLM – it adds a web portal, subscription management, per-user API key provisioning, branding, and OIDC authentication. Think of LiteLLM as the engine and LiteMaaS as the car.

Two Clusters – Same Access Pattern

RHDP MaaS now spans two clusters, but users access all models the same way โ€” through the single LiteMaaS portal and LiteLLM proxy. The Gaudi cluster models are registered in LiteLLM exactly like AWS models. No different endpoint, no different key.

|          | AWS Cluster (Primary)           | Intel Gaudi Cluster (Rackspace DFW3)  |
|----------|---------------------------------|---------------------------------------|
| Cluster  | maas.redhatworkshops.io         | maas00.rs-dfw3.infra.demo.redhat.com  |
| Access   | LiteMaaS portal + LiteLLM proxy | Same LiteMaaS portal + LiteLLM proxy  |
| Auth     | sk-... virtual key              | Same sk-... virtual key               |
| Hardware | NVIDIA L40S / L4 / T4           | 8× Intel Gaudi 3 (Dell XE9680)        |
```mermaid
graph LR
    U["👤 Users"]
    subgraph LM["LiteMaaS Platform"]
        UI["🖥 Web UI\nmodel catalog · API keys\nusage · settings"]
        API["⚙️ Backend API\nsubscriptions · auth\nanalytics · admin"]
        PROXY["🔀 LiteLLM Proxy\nroutes requests\nenforces quotas\nlogs spend"]
    end
    subgraph OAI["OpenShift AI – KServe (AWS)"]
        M1["💬 Chat models"]
        M2["🔢 Embedding model"]
        M3["📄 Document Conversion"]
    end
    subgraph GAUDI["OpenShift AI – KServe (Intel Gaudi · Rackspace)"]
        G1["💬 deepseek-r1-distill-qwen-14b"]
        G2["💬 qwen3-14b"]
    end
    U -->|browse & manage| UI
    U -->|"inference calls\nsk-... key"| PROXY
    UI <-->|API| API
    API <--> PROXY
    PROXY --> M1 & M2 & M3
    PROXY --> G1 & G2
    style LM fill:#fff8f8,stroke:#cc0000,stroke-width:2px
    style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
    style GAUDI fill:#e6f2ff,stroke:#0071c5,stroke-width:2px
    style U fill:#fce4ec,stroke:#cc0000,stroke-width:1px
```
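Because LiteLLM exposes an OpenAI-compatible API, the same sk-... key works against every model in the diagram. Below is a minimal sketch using only the Python standard library; the proxy route path, key, and model name are assumptions for illustration – substitute the endpoint and key from your lab credentials.

```python
import json
import urllib.request

# Assumed values for illustration -- use the endpoint and sk-... key
# delivered with your lab credentials.
BASE_URL = "https://litellm-prod.apps.maas.redhatworkshops.io"
API_KEY = "sk-example-virtual-key"

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the proxy."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # the sk-... virtual key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("qwen3-14b", "Say hello in one sentence.")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.full_url)
```

Any OpenAI SDK works the same way: point its base URL at the proxy route and pass the virtual key as the API key.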

2 – Production Deployment

All three LiteMaaS components run at 3 replicas. Both databases (LiteMaaS and LiteLLM) share one PostgreSQL server. Redis is used by the proxy for caching and rate limiting.

```mermaid
graph TB
    subgraph ROUTES["OpenShift Routes – maas.redhatworkshops.io"]
        direction LR
        R1["litellm-prod-frontend · Portal"]
        R2["litellm-prod-admin · API"]
        R3["litellm-prod · AI Proxy"]
    end
    subgraph APP["litellm-rhpds namespace – ×3 replicas each"]
        direction LR
        FE["🖥 Frontend\nReact + nginx · v0.4.0"]
        BE["⚙️ Backend\nFastify · v0.4.0"]
        LLM["🔀 LiteLLM Proxy\ncustom fork · v1.81.0"]
    end
    subgraph DATA["Data"]
        direction LR
        PG[("PostgreSQL 16\n10 Gi · litemaas_db + litellm_db\nbranding stored here")]
        RD[("Redis\ncache + rate limits")]
    end
    R1 --> FE
    R2 --> BE
    R3 --> LLM
    FE <--> BE
    BE <--> LLM
    BE & LLM --> PG
    LLM --> RD
    style ROUTES fill:#e8eaf6,stroke:#3949ab,stroke-width:1.5px
    style APP fill:#fff8f8,stroke:#cc0000,stroke-width:2px
    style DATA fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```
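The data-layer wiring above corresponds to standard LiteLLM proxy settings. The fragment below is an illustrative sketch of a LiteLLM `config.yaml`, not the actual RHDP configuration – hostnames, database credentials, and the upstream model URL are all assumptions.

```yaml
# Illustrative LiteLLM proxy config -- hosts, credentials, and the
# api_base are assumptions, not the production values.
model_list:
  - model_name: qwen3-14b
    litellm_params:
      model: hosted_vllm/qwen3-14b
      api_base: https://qwen3-14b.example.internal/v1

general_settings:
  # Virtual keys, budgets, and spend logs live in litellm_db
  # on the shared PostgreSQL 16 server.
  database_url: "postgresql://litellm:<password>@postgresql:5432/litellm_db"

litellm_settings:
  # Redis backs response caching and keeps rate-limit counters
  # consistent across the 3 proxy replicas.
  cache: true
  cache_params:
    type: redis
    host: redis
    port: 6379
```

Without the shared Redis, each replica would enforce TPM/RPM limits independently and users could exceed their quotas by roughly the replica count.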

3 – Model Servers – GitOps

Model servers on OpenShift AI are managed entirely via GitOps. Committing a new InferenceService manifest to the repo automatically deploys it to the cluster. This is separate from the LiteMaaS deployment. Models can then be registered into LiteMaaS via the UI or using the ocp4_workload_litemaas_models role.

```mermaid
graph LR
    GIT["📁 Git\nInferenceService manifests"]
    GO["🔄 OpenShift GitOps\nArgoCD"]
    subgraph OAI["OpenShift AI – llm-hosting namespace"]
        direction LR
        IS1["💬 Chat model\nKServe · vLLM"]
        IS2["🔢 Embedding model\nKServe · OpenVINO"]
        IS3["📄 Document Conversion\nKServe · Docling Serve"]
    end
    LM["🔀 LiteMaaS\nregister via UI\nor Ansible role"]
    GIT -->|commit| GO
    GO --> IS1 & IS2 & IS3
    IS1 & IS2 & IS3 -.->|"endpoint registered"| LM
    style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
    style GIT fill:#f5f5f5,stroke:#555,stroke-width:1px
    style GO fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px
```
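A manifest of the kind committed to the GitOps repo might look like the sketch below. This is a hypothetical example of the standard KServe `InferenceService` shape, not an actual RHDP manifest – the runtime name, storage URI, and resource values are assumptions.

```yaml
# Hypothetical InferenceService -- runtime, storageUri, and resources
# are placeholders for illustration.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen3-14b
  namespace: llm-hosting
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: vllm-runtime                      # assumed ServingRuntime name
      storageUri: oci://quay.io/example/qwen3-14b  # assumed model location
      resources:
        limits:
          nvidia.com/gpu: "1"
```

Once ArgoCD syncs the manifest and the model serves traffic, its endpoint is registered in LiteMaaS via the UI or the `ocp4_workload_litemaas_models` role.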

4 – Platform Installation

Done once by the infra team. The rhpds.litemaas collection (v0.3.2) deploys LiteMaaS in HA mode, registers model endpoints, and sets up Grafana dashboards.

```mermaid
graph LR
    INFRA["👷 Infra Team\none-time setup"]
    subgraph C1["rhpds.litemaas v0.3.2"]
        R1["ocp4_workload_litemaas\n– deploys Frontend · Backend\n– LiteLLM Proxy · PostgreSQL · Redis\n– 3 replicas HA"]
        R2["ocp4_workload_litemaas_models\n– registers model endpoints\n– syncs to LiteMaaS DB"]
        R3["ocp4_workload_rhoai_metrics\n– Grafana Operator v5\n– ServiceMonitors per model\n– GPU + vLLM dashboards"]
    end
    OCP["☁️ OpenShift\nnamespace: litellm-rhpds"]
    INFRA --> R1 & R2 & R3
    R1 & R2 & R3 --> OCP
    style C1 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style INFRA fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
```
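Pinning the collection at the version above is an ordinary `ansible-galaxy` requirements entry. A minimal sketch, assuming the collection is installable from the configured Galaxy server (where it is actually published is not stated here):

```yaml
# requirements.yml -- pins the collection version from this section.
collections:
  - name: rhpds.litemaas
    version: 0.3.2
```

Installed with `ansible-galaxy collection install -r requirements.yml`, after which the three `ocp4_workload_*` roles become available to the deployment playbooks.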

5 – Per-Workshop Key Creation

Runs automatically for every workshop participant. The rhpds.litellm_virtual_keys collection (v1.3.1) creates a scoped API key with models, budget, and rate limits defined in the AgnosticV catalog.

```mermaid
graph LR
    USER["👤 Workshop Participant\norders lab from catalog"]
    AV["📋 AgnosticV\ndefines: models · budget\nduration · rate limits"]
    BAB["⚙️ Babylon"]
    subgraph C2["rhpds.litellm_virtual_keys v1.3.1"]
        K1["ocp4_workload_litellm_bastion_profile\n– configures bastion access"]
        K2["ocp4_workload_litellm_virtual_keys\n– validates models exist\n– creates sk-... key per user\n– sets budget + TPM/RPM limits\n– delivers key in lab credentials"]
    end
    LM["🔀 LiteMaaS\nPOST /key/generate"]
    KEY["🔑 sk-... virtual key"]
    USER --> BAB
    AV -->|defines what to run| BAB
    BAB --> K1 --> K2
    K2 --> LM --> KEY
    style C2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style BAB fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
    style USER fill:#fce4ec,stroke:#cc0000,stroke-width:1px
    style AV fill:#fff3e0,stroke:#e65100,stroke-width:1px
    style KEY fill:#fff9c4,stroke:#f9a825,stroke-width:2px
```
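Under the hood, the `POST /key/generate` call shown in the diagram is plain JSON over HTTP against LiteLLM's key-management API. A hedged sketch of roughly what the role sends – the admin URL, key alias, and limit values are assumptions, and the request is built but not sent so the sketch runs offline:

```python
import json
import urllib.request

# Assumed admin endpoint and master key -- never exposed to participants.
ADMIN_URL = "https://litellm-prod.apps.maas.redhatworkshops.io"
ADMIN_KEY = "sk-admin-example"

# Values like these come from the AgnosticV catalog definition.
payload = {
    "models": ["qwen3-14b"],        # restrict the key to specific models
    "duration": "5d",               # key expires with the lab
    "max_budget": 10.0,             # spend cap for this participant
    "tpm_limit": 20000,             # tokens per minute
    "rpm_limit": 60,                # requests per minute
    "key_alias": "workshop-user1",  # hypothetical per-user alias
}

req = urllib.request.Request(
    url=f"{ADMIN_URL}/key/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {ADMIN_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the new sk-... key; omitted here.
print(req.full_url)
```

The returned sk-... key is what lands in the participant's lab credentials; the proxy then enforces the model list, budget, and TPM/RPM limits on every call made with it.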