LiteMaaS — Product Overview

What is RHDP MaaS?

RHDP MaaS (Model as a Service) is the Red Hat Demo Platform's AI model access platform. It provides users with a unified, governed interface to production-grade AI models hosted on OpenShift AI — without requiring each user to manage credentials, endpoints, or rate limits themselves.

RHDP MaaS is built on top of LiteMaaS, an open-source platform developed by the Red Hat AI Services BU team. LiteMaaS wraps the open-source LiteLLM proxy with a subscription and key management layer — adding per-user API key lifecycle management, subscription approval workflows, usage analytics, and branding. RHDP uses and extends LiteMaaS to run its own MaaS infrastructure.

Production deployment: The RHDP production instance runs at litellm-prod.apps.maas.redhatworkshops.io (API), litellm-prod-frontend.apps.maas.redhatworkshops.io (user portal), and litellm-prod-admin.apps.maas.redhatworkshops.io/api (admin API). It currently serves 8 running model predictors across chat, embedding, safety, and document conversion workloads.

The Problem It Solves

RHDP delivers hundreds of workshops and demos per year, many of which require participants to call AI model APIs. Without a managed service, every workshop had to solve these problems independently:

Credential sprawl: Sharing a single master API key across all participants is a security anti-pattern. Revoking access for one person requires rotating the key for everyone.
No rate limiting: A single participant could saturate a model endpoint and degrade the experience for everyone else in the workshop.
No budget controls: Without per-user spend limits, a misconfigured script could exhaust the budget for the entire event.
No audit trail: It was impossible to know which participant called which model, when, and how many tokens they used.
Model endpoint churn: As models are updated or replaced, every workshop's curl examples and SDK configurations had to be updated manually.
Restricted model access: Some models (e.g., large frontier models with high inference cost) should require admin approval before a user can subscribe.

LiteMaaS addresses all of these at the platform level, so individual workshops don't have to.

Key Features

Virtual API Key Management

Each user receives a personal virtual API key scoped to their subscription. Keys can have per-key TPM, RPM, and budget ceilings independently of other users.

Model Subscriptions

Users subscribe to models they need. Admins can mark models as "restricted" requiring approval before access is granted. Full audit trail on all state changes.

Usage Tracking & Analytics

Day-by-day incremental usage caching with multi-dimensional filtering (user, model, provider, API key). Export to CSV/JSON. Admin-only system-wide view.

Per-Key Rate & Budget Limits

Per-user configurable TPM (tokens per minute), RPM (requests per minute), max budget, budget duration, and soft budget thresholds enforced at the LiteLLM proxy layer.

Chat Playground

Built-in browser-based chat UI in the frontend. Users can interactively test models they are subscribed to without writing code. Only chat-capable models are shown.

Model Capability Labels

Color-coded labels on model cards: Chat (blue), Embeddings (green), Tokenize (red-orange), Document Conversion (orange). Curl examples adapt to capability type.

OpenShift OAuth Login

Three-tier RBAC (admin / adminReadonly / user) backed by OpenShift OAuth. Users log in with their OpenShift credentials — no separate account needed.

HA Architecture

All components run with multiple replicas. LiteLLM runs 3 replicas behind Redis for session and key caching. PostgreSQL 16 on a dedicated PVC. Stateless frontend and backend scale horizontally.

Branding Customization

Admin-controlled login page branding: custom logo, title, subtitle, header logos (light/dark), and footer text. Stored in database, served via public API endpoint.

Automated Key Cleanup

Daily cron job on the bastion host purges expired or stale keys (older than 30 days) from LiteLLM and syncs revocation status back to the LiteMaaS database.

Backup & Restore

Monthly automated pg_dump to S3 with VolumeSnapshot fallback. Admin-initiated backup and test restore from the Settings UI. 12-month retention for SQL dumps.

Docling Integration

Granite Docling 258M model exposed via a dedicated document conversion endpoint. Frontend hides irrelevant fields (TPM costs, max tokens) for this model type.

LiteLLM, LiteLLM Proxy, and LiteMaaS — what's what

These three terms are often confused. Here is the precise distinction:

LiteLLM (the library)

An open-source Python library that provides a single interface to 100+ LLM providers. It knows how to translate requests to the format each provider expects — OpenAI, Anthropic, Vertex AI, Bedrock, etc. — and converts responses back to a standard shape. This is the translator/engine.

LiteLLM Proxy (the gateway)

A FastAPI HTTP server built on top of the LiteLLM library. It exposes an OpenAI-compatible endpoint so any OpenAI client can connect to it without modification. It adds virtual keys, rate limiting (TPM/RPM), spend tracking, load balancing, and model routing. This is the gateway — raw infrastructure, no user-facing UI.

LiteMaaS (the platform)

Built by Red Hat AI Services BU on top of the LiteLLM Proxy. It adds the self-service portal layer: users log in with Red Hat SSO, browse a model catalog, subscribe to models, create and manage API keys, and view their usage — without talking to an admin. LiteMaaS is what turns the raw proxy into a product.

Simple analogy: LiteLLM library = engine | LiteLLM Proxy = car | LiteMaaS = dealership with showroom, customer accounts, and self-service.

When something breaks at the routing or key level → LiteLLM Proxy. When something breaks at the user, subscription, or catalog level → LiteMaaS. When a new AI provider needs support → LiteLLM library.

Three-Layer Architecture

LiteMaaS is composed of three distinct layers. Each layer has a clear responsibility and communicates with the others over well-defined internal interfaces.

Layer 1 — User Portal (LiteMaaS Frontend + Backend)

The LiteMaaS custom application: React + PatternFly 6 frontend, Fastify backend API. Handles user authentication (OpenShift OAuth), subscription management, API key lifecycle, usage analytics, admin workflows, and branding. Runs as litellm-frontend and litellm-backend deployments. Does not handle LLM inference directly — all model calls go through Layer 2.

Layer 2 — LiteLLM Proxy

The open-source LiteLLM proxy (quay.io/rh-aiservices-bu/litellm-non-root:main-v1.81.0-stable-custom) running as 3 HA replicas. Handles all actual model routing, virtual key enforcement (TPM/RPM/budget), request rate limiting, caching via Redis, and spend tracking. Stores its state in PostgreSQL. Exposes the OpenAI-compatible API on the litellm-prod route.

Layer 3 — Model Serving (OpenShift AI / KServe)

The actual model inference layer — Red Hat OpenShift AI (RHOAI) running KServe predictors in the llm-hosting namespace. Models are deployed as InferenceServices on GPU or CPU nodes. LiteLLM calls these internal ClusterIP services directly over the cluster network (no external routing required for model traffic). Currently hosting Granite, Llama Scout, CodeLlama, Nomic Embed, Llama Guard, and Granite Docling models.

User Role Hierarchy

LiteMaaS implements a three-tier RBAC system enforced at both the backend API level and the frontend UI level.

Role	How Assigned	Capabilities
admin	Manually via `promote-admin.sh` or psql UPDATE	Full platform control: manage users, models, budgets, subscriptions, approve restricted access, view audit logs, configure branding, backup/restore
adminReadonly	Manually assigned	Read-only admin views: view all users, analytics, audit logs, and system state — cannot modify
user	Automatically on first OAuth login	Browse and subscribe to models, manage own API keys (create/revoke), view own usage, use Chat Playground

Important: Users must log in via OAuth at least once before being promoted to admin. Direct database inserts will break OAuth login because the OpenShift OAuth ID (a UUID) will not be set. See Admin Setup for the correct procedure.

Model Capability System

LiteMaaS supports more than just chat models. Each model is tagged with one or more capability types that control how it appears in the UI, which endpoints are available, and what curl examples are shown to users.

Capability	Badge	API Endpoint	Example Models
Chat	Chat	`/v1/chat/completions`	granite-3-2-8b-instruct, llama-scout-17b, granite-4-0-h-tiny, codellama-7b-instruct
Embeddings	Embeddings	`/v1/embeddings`	nomic-embed-text-v1-5
Tokenize	Tokenize	`/v1/tokenize`	Additive flag on chat models
Document Conversion	Docling	`/docling`	granite-docling-258m
Safety / Guardrails	Safety	`/v1/chat/completions`	llama-guard-3-1b

End-to-End User Flow

Click the diagram to expand.

sequenceDiagram participant U as User participant FE as LiteMaaS Frontend participant BE as LiteMaaS Backend participant LL as LiteLLM Proxy participant M as Model (KServe) U->>FE: Visit portal, click Login FE->>BE: GET /api/auth/login BE->>BE: Redirect to OpenShift OAuth U->>BE: Complete OAuth, callback BE->>BE: Create/update user in DB, issue JWT BE->>FE: Set session cookie U->>FE: Browse model catalog FE->>BE: GET /api/models BE->>FE: Return model list with capability badges U->>FE: Click "Subscribe" on a model FE->>BE: POST /api/subscriptions BE->>LL: POST /key/generate (scoped to model) LL->>BE: Return virtual key (sk-...) BE->>BE: Store key in api_keys table BE->>FE: Return key to user U->>LL: POST /v1/chat/completions with sk-... LL->>LL: Validate key, check TPM/RPM/budget LL->>M: Forward request to KServe predictor M->>LL: Return inference response LL->>U: Stream response back

Production Endpoints

Route Name	URL	Purpose	Audience
`litellm-prod`	`https://litellm-prod.apps.maas.redhatworkshops.io`	OpenAI-compatible API endpoint (LiteLLM proxy)	All API users, SDK clients
`litellm-prod-frontend`	`https://litellm-prod-frontend.apps.maas.redhatworkshops.io`	LiteMaaS user portal (React UI)	Workshop participants, end users
`litellm-prod-admin`	`https://litellm-prod-admin.apps.maas.redhatworkshops.io/api`	LiteMaaS admin API (backend)	Admin scripts, automation

All routes use edge TLS termination with automatic redirect from HTTP. HAProxy timeout is set to 600 seconds to support long-running inference with large context windows.