RHDP LiteMaaS — Architecture

Production deployment on maas.redhatworkshops.io

1 — What LiteMaaS Does

LiteMaaS gives every user a single API endpoint to access AI models — no infrastructure to manage, no credentials to share. Behind the scenes it handles access control, rate limiting, and spend tracking across all models.

```mermaid
graph LR
    U["👤 Users"]
    subgraph LM["LiteMaaS Platform"]
        UI["🖥 Web UI\nmodel catalog · API keys\nusage · settings"]
        API["⚙️ Backend API\nsubscriptions · auth\nanalytics · admin"]
        PROXY["🔀 LiteLLM Proxy\nroutes requests\nenforces quotas\nlogs spend"]
    end
    subgraph OAI["OpenShift AI — KServe"]
        M1["💬 Chat models"]
        M2["🔢 Embedding model"]
        M3["📄 Document Conversion"]
    end
    U -->|browse & manage| UI
    U -->|"inference calls\nsk-... key"| PROXY
    UI <-->|API| API
    API <--> PROXY
    PROXY --> M1 & M2 & M3
    style LM fill:#fff8f8,stroke:#cc0000,stroke-width:2px
    style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
    style U fill:#fce4ec,stroke:#cc0000,stroke-width:1px
```
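From a user's point of view, the proxy speaks the OpenAI-compatible chat-completions API that LiteLLM exposes. A minimal sketch of such a call, assuming the `litellm-prod` route host shown above — the model name `granite-chat` and the key value are placeholders, not production values:

```python
# Sketch of an inference call through the LiteMaaS proxy.
# PROXY_URL, the model name, and the key below are placeholder assumptions.
import json
import urllib.request

PROXY_URL = "https://litellm-prod.maas.redhatworkshops.io/v1/chat/completions"  # assumed route
API_KEY = "sk-..."  # per-user virtual key issued by LiteMaaS


def build_request(prompt: str, model: str = "granite-chat") -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request aimed at the proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


req = build_request("Summarize KServe in one sentence.")
# To actually send it (requires a valid key; the proxy checks quota and logs spend):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the proxy is OpenAI-compatible, any OpenAI SDK pointed at the route with the `sk-...` key works the same way.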

2 — Production Deployment

All three LiteMaaS components (Frontend, Backend, and LiteLLM Proxy) run at three replicas. Both databases (litemaas_db and litellm_db) share one PostgreSQL server. The proxy uses Redis for caching and rate limiting.

```mermaid
graph TB
    subgraph ROUTES["OpenShift Routes — maas.redhatworkshops.io"]
        direction LR
        R1["litellm-prod-frontend · Portal"]
        R2["litellm-prod-admin · API"]
        R3["litellm-prod · AI Proxy"]
    end
    subgraph APP["litellm-rhpds namespace — ×3 replicas each"]
        direction LR
        FE["🖥 Frontend\nReact + nginx · v0.4.0"]
        BE["⚙️ Backend\nFastify · v0.4.0"]
        LLM["🔀 LiteLLM Proxy\ncustom fork · v1.81.0"]
    end
    subgraph DATA["Data"]
        direction LR
        PG[("PostgreSQL 16\n10 Gi · litemaas_db + litellm_db\nbranding stored here")]
        RD[("Redis\ncache + rate limits")]
    end
    R1 --> FE
    R2 --> BE
    R3 --> LLM
    FE <--> BE
    BE <--> LLM
    BE & LLM --> PG
    LLM --> RD
    style ROUTES fill:#e8eaf6,stroke:#3949ab,stroke-width:1.5px
    style APP fill:#fff8f8,stroke:#cc0000,stroke-width:2px
    style DATA fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```

3 — Model Servers — GitOps

Model servers on OpenShift AI are managed entirely via GitOps. Committing a new InferenceService manifest to the repo automatically deploys it to the cluster. This is separate from the LiteMaaS deployment. Models can then be registered into LiteMaaS via the UI or with the ocp4_workload_litemaas_models role.

```mermaid
graph LR
    GIT["📁 Git\nInferenceService manifests"]
    GO["🔄 OpenShift GitOps\nArgoCD"]
    subgraph OAI["OpenShift AI — llm-hosting namespace"]
        direction LR
        IS1["💬 Chat model\nKServe · vLLM"]
        IS2["🔢 Embedding model\nKServe · OpenVino"]
        IS3["📄 Document Conversion\nKServe · Docling Serve"]
    end
    LM["🔀 LiteMaaS\nregister via UI\nor Ansible role"]
    GIT -->|commit| GO
    GO --> IS1 & IS2 & IS3
    IS1 & IS2 & IS3 -.->|"endpoint registered"| LM
    style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
    style GIT fill:#f5f5f5,stroke:#555,stroke-width:1px
    style GO fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px
```
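A committed manifest might look like the following minimal KServe InferenceService sketch. The model name, runtime, and storageUri are placeholders, not the production values; only the namespace and the resource kind come from the diagram above.

```yaml
# Hypothetical example — name, runtime, and storage location are illustrative.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: granite-chat            # placeholder model name
  namespace: llm-hosting
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: vllm-runtime     # placeholder ServingRuntime name
      storageUri: oci://quay.io/example/granite-chat:latest  # placeholder
```

Once ArgoCD syncs a manifest like this, the served endpoint is what gets registered into LiteMaaS.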

4 — Platform Installation

Run once by the infra team. The rhpds.litemaas collection (v0.3.2) deploys LiteMaaS in HA mode, registers model endpoints, and sets up Grafana dashboards.

```mermaid
graph LR
    INFRA["👷 Infra Team\none-time setup"]
    subgraph C1["rhpds.litemaas v0.3.2"]
        R1["ocp4_workload_litemaas\n— deploys Frontend · Backend\n— LiteLLM Proxy · PostgreSQL · Redis\n— 3 replicas HA"]
        R2["ocp4_workload_litemaas_models\n— registers model endpoints\n— syncs to LiteMaaS DB"]
        R3["ocp4_workload_rhoai_metrics\n— Grafana Operator v5\n— ServiceMonitors per model\n— GPU + vLLM dashboards"]
    end
    OCP["☁️ OpenShift\nnamespace: litellm-rhpds"]
    INFRA --> R1 & R2 & R3
    R1 & R2 & R3 --> OCP
    style C1 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style INFRA fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
```

5 — Per-Workshop Key Creation

Runs automatically for every workshop participant. The rhpds.litellm_virtual_keys collection (v0.2.0) creates a scoped API key with models, budget, and rate limits defined in the AgnosticV catalog.

```mermaid
graph LR
    USER["👤 Workshop Participant\norders lab from catalog"]
    AV["📋 AgnosticV\ndefines: models · budget\nduration · rate limits"]
    BAB["⚙️ Babylon"]
    subgraph C2["rhpds.litellm_virtual_keys v0.2.0"]
        K1["ocp4_workload_litellm_bastion_profile\n— configures bastion access"]
        K2["ocp4_workload_litellm_virtual_keys\n— validates models exist\n— creates sk-... key per user\n— sets budget + TPM/RPM limits\n— delivers key in lab credentials"]
    end
    LM["🔀 LiteMaaS\nPOST /key/generate"]
    KEY["🔑 sk-... virtual key"]
    USER --> BAB
    AV -->|defines what to run| BAB
    BAB --> K1 --> K2
    K2 --> LM --> KEY
    style C2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style BAB fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
    style USER fill:#fce4ec,stroke:#cc0000,stroke-width:1px
    style AV fill:#fff3e0,stroke:#e65100,stroke-width:1px
    style KEY fill:#fff9c4,stroke:#f9a825,stroke-width:2px
```
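The POST /key/generate step can be sketched as follows. Field names follow LiteLLM's key-generation API; every value (model names, budget, limits, the admin key, the route host) is an illustrative assumption, not a production setting.

```python
# Sketch of the per-user key-provisioning call made during lab deployment.
# All values below are placeholders; field names follow LiteLLM's /key/generate API.
import json
import urllib.request

LITELLM_URL = "https://litellm-prod.maas.redhatworkshops.io"  # assumed route
ADMIN_KEY = "sk-admin-placeholder"  # LiteLLM master key (placeholder)


def key_generate_request(user: str) -> urllib.request.Request:
    """Build a POST /key/generate request scoped to one workshop participant."""
    payload = {
        "models": ["granite-chat", "nomic-embed"],  # placeholder model names
        "max_budget": 5.0,       # USD spend cap for the lab
        "tpm_limit": 10_000,     # tokens per minute
        "rpm_limit": 60,         # requests per minute
        "duration": "8h",        # key expires when the workshop ends
        "metadata": {"user": user},
    }
    return urllib.request.Request(
        f"{LITELLM_URL}/key/generate",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {ADMIN_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = key_generate_request("user1")
# The response body would carry the new sk-... key, which the role then
# delivers in the participant's lab credentials.
```

The budget and TPM/RPM values come from the AgnosticV catalog item, so each workshop can dial its own limits without touching the platform.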