## 1 – What LiteMaaS Does
LiteMaaS gives every user a single API endpoint to access AI models – no infrastructure to manage, no credentials to share. Behind the scenes it handles access control, rate limiting, and spend tracking across all models.
```mermaid
graph LR
U["👤 Users"]
subgraph LM["LiteMaaS Platform"]
UI["🖥 Web UI\nmodel catalog · API keys\nusage · settings"]
API["⚙️ Backend API\nsubscriptions · auth\nanalytics · admin"]
PROXY["🔀 LiteLLM Proxy\nroutes requests\nenforces quotas\nlogs spend"]
end
subgraph OAI["OpenShift AI – KServe"]
M1["💬 Chat models"]
M2["🔢 Embedding model"]
M3["📄 Document Conversion"]
end
end
U -->|browse & manage| UI
U -->|"inference calls\nsk-... key"| PROXY
UI <-->|API| API
API <--> PROXY
PROXY --> M1 & M2 & M3
style LM fill:#fff8f8,stroke:#cc0000,stroke-width:2px
style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
style U fill:#fce4ec,stroke:#cc0000,stroke-width:1px
```
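For participants, the whole platform boils down to one OpenAI-compatible endpoint and one key. Here is a minimal sketch of an inference call through the proxy, assuming a hostname derived from the `litellm-prod` route above and a placeholder model name:

```python
# Minimal sketch: calling a model through the LiteMaaS proxy.
# The proxy speaks the OpenAI API, so the standard openai client works.
# base_url and model name are illustrative placeholders, not the real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm-prod.maas.redhatworkshops.io/v1",  # AI Proxy route (hypothetical)
    api_key="sk-...",  # virtual key issued by LiteMaaS
)

response = client.chat.completions.create(
    model="granite-chat",  # whichever name the model catalog exposes (placeholder)
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Because the endpoint is OpenAI-compatible, any existing OpenAI client or SDK works unchanged; only the base URL and key differ.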
## 2 – Production Deployment
All three LiteMaaS components (frontend, backend, and LiteLLM proxy) run at three replicas for high availability. Both databases (litemaas_db and litellm_db) share one PostgreSQL server, and the proxy uses Redis for caching and rate-limit counters.
```mermaid
graph TB
subgraph ROUTES["OpenShift Routes – maas.redhatworkshops.io"]
direction LR
R1["litellm-prod-frontend · Portal"]
R2["litellm-prod-admin · API"]
R3["litellm-prod · AI Proxy"]
end
subgraph APP["litellm-rhpds namespace – ×3 replicas each"]
direction LR
FE["🖥 Frontend\nReact + nginx · v0.4.0"]
BE["⚙️ Backend\nFastify · v0.4.0"]
LLM["🔀 LiteLLM Proxy\ncustom fork · v1.81.0"]
end
subgraph DATA["Data"]
direction LR
PG[("PostgreSQL 16\n10 Gi · litemaas_db + litellm_db\nbranding stored here")]
RD[("Redis\ncache + rate limits")]
end
R1 --> FE
R2 --> BE
R3 --> LLM
FE <--> BE
BE <--> LLM
BE & LLM --> PG
LLM --> RD
style ROUTES fill:#e8eaf6,stroke:#3949ab,stroke-width:1.5px
style APP fill:#fff8f8,stroke:#cc0000,stroke-width:2px
style DATA fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```
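A quick way to sanity-check the topology is to probe the three routes. A sketch using `requests`; the full hostnames are assumptions based on the route names in the diagram, and `/health/liveliness` is LiteLLM's liveness endpoint:

```python
# Liveness sweep across the three routes from the diagram.
# Hostnames are assumptions; replace with the actual Route hosts.
import requests

routes = {
    "Portal":   "https://litellm-prod-frontend.maas.redhatworkshops.io/",
    "API":      "https://litellm-prod-admin.maas.redhatworkshops.io/",
    "AI Proxy": "https://litellm-prod.maas.redhatworkshops.io/health/liveliness",
}

for name, url in routes.items():
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")  # expect 200 from each route
```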
## 3 – Model Servers – GitOps
Model servers on OpenShift AI are managed entirely via GitOps: committing a new InferenceService manifest to the repository triggers Argo CD to deploy it to the cluster. This pipeline is separate from the LiteMaaS deployment. Once a model is serving, it can be registered in LiteMaaS via the UI or with the ocp4_workload_litemaas_models role.
```mermaid
graph LR
GIT["📁 Git\nInferenceService manifests"]
GO["🔄 OpenShift GitOps\nArgoCD"]
subgraph OAI["OpenShift AI – llm-hosting namespace"]
direction LR
IS1["💬 Chat model\nKServe · vLLM"]
IS2["🔢 Embedding model\nKServe · OpenVINO"]
IS3["📄 Document Conversion\nKServe · Docling Serve"]
end
LM["🌐 LiteMaaS\nregister via UI\nor Ansible role"]
GIT -->|commit| GO
GO --> IS1 & IS2 & IS3
IS1 & IS2 & IS3 -.->|"endpoint registered"| LM
style OAI fill:#e3f2fd,stroke:#0066cc,stroke-width:2px
style GIT fill:#f5f5f5,stroke:#555,stroke-width:1px
style GO fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px
```
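For illustration, registering a freshly deployed endpoint by hand, instead of through the UI or the Ansible role, is a single call to LiteLLM's `/model/new` admin endpoint. The route, model names, api_base, and master key below are all placeholders:

```python
# Hedged sketch: registering a KServe endpoint with the proxy by hand.
# In practice this is done via the LiteMaaS UI or the
# ocp4_workload_litemaas_models role; values here are illustrative.
import requests

resp = requests.post(
    "https://litellm-prod.maas.redhatworkshops.io/model/new",  # proxy route (hypothetical)
    headers={"Authorization": "Bearer sk-master-key"},          # LiteLLM master key (placeholder)
    json={
        "model_name": "granite-chat",  # the name users will see in the catalog
        "litellm_params": {
            # vLLM serves an OpenAI-compatible API, so the openai/ provider prefix works
            "model": "openai/granite-chat",
            "api_base": "https://granite-chat-llm-hosting.example.com/v1",  # KServe endpoint (placeholder)
        },
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```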
## 4 – Platform Installation
Done once by the infra team. The rhpds.litemaas collection (v0.3.2) deploys LiteMaaS in HA mode, registers model endpoints, and sets up Grafana dashboards.
```mermaid
graph LR
INFRA["👷 Infra Team\none-time setup"]
subgraph C1["rhpds.litemaas v0.3.2"]
R1["ocp4_workload_litemaas\n– deploys Frontend · Backend\n– LiteLLM Proxy · PostgreSQL · Redis\n– 3 replicas HA"]
R2["ocp4_workload_litemaas_models\n– registers model endpoints\n– syncs to LiteMaaS DB"]
R3["ocp4_workload_rhoai_metrics\n– Grafana Operator v5\n– ServiceMonitors per model\n– GPU + vLLM dashboards"]
end
OCP["⚙️ OpenShift\nnamespace: litellm-rhpds"]
INFRA --> R1 & R2 & R3
R1 & R2 & R3 --> OCP
style C1 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
style INFRA fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
```
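A hedged post-install check, assuming the three deployments land in `litellm-rhpds` as shown above: list them with the official `kubernetes` Python client and confirm each reports three ready replicas.

```python
# Post-install sanity check: every deployment in litellm-rhpds should
# report 3/3 ready replicas. Deployment names vary, so we just list them.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run in-cluster
apps = client.AppsV1Api()

for dep in apps.list_namespaced_deployment("litellm-rhpds").items:
    ready = dep.status.ready_replicas or 0
    print(f"{dep.metadata.name}: {ready}/{dep.spec.replicas} ready")
```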
## 5 – Per-Workshop Key Creation
Runs automatically for every workshop participant. The rhpds.litellm_virtual_keys collection (v0.2.0) creates a scoped API key with models, budget, and rate limits defined in the AgnosticV catalog.
```mermaid
graph LR
USER["👤 Workshop Participant\norders lab from catalog"]
AV["📋 AgnosticV\ndefines: models · budget\nduration · rate limits"]
BAB["⚙️ Babylon"]
subgraph C2["rhpds.litellm_virtual_keys v0.2.0"]
K1["ocp4_workload_litellm_bastion_profile\n– configures bastion access"]
K2["ocp4_workload_litellm_virtual_keys\n– validates models exist\n– creates sk-... key per user\n– sets budget + TPM/RPM limits\n– delivers key in lab credentials"]
end
LM["🌐 LiteMaaS\nPOST /key/generate"]
KEY["🔑 sk-... virtual key"]
USER --> BAB
AV -->|defines what to run| BAB
BAB --> K1 --> K2
K2 --> LM --> KEY
style C2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style BAB fill:#e8eaf6,stroke:#3949ab,stroke-width:1px
style USER fill:#fce4ec,stroke:#cc0000,stroke-width:1px
style AV fill:#fff3e0,stroke:#e65100,stroke-width:1px
style KEY fill:#fff9c4,stroke:#f9a825,stroke-width:2px
```
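Conceptually, the ocp4_workload_litellm_virtual_keys role reduces to one `POST /key/generate` per participant. A sketch against LiteLLM's key-management API; the route, master key, model names, and limit values are placeholders standing in for what AgnosticV defines:

```python
# Sketch of what the role does per participant: one /key/generate call
# that scopes the key to specific models, a budget, a lifetime, and
# TPM/RPM limits. All concrete values here are illustrative.
import requests

resp = requests.post(
    "https://litellm-prod.maas.redhatworkshops.io/key/generate",  # proxy route (hypothetical)
    headers={"Authorization": "Bearer sk-master-key"},             # admin credential (placeholder)
    json={
        "key_alias": "workshop-user1",
        "models": ["granite-chat", "granite-embedding"],  # only these models are callable
        "max_budget": 5.0,    # spend cap (USD) for the lab
        "duration": "8h",     # key expires when the workshop ends
        "tpm_limit": 10000,   # tokens per minute
        "rpm_limit": 60,      # requests per minute
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["key"])  # the sk-... virtual key delivered in lab credentials
```

Because the key itself carries the model list, budget, and limits, nothing else needs to be provisioned per user: the proxy enforces everything at request time.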