RHDP LiteMaaS

Model as a Service for Red Hat Demo Platform

Model Management

Managing the model registry — adding, syncing, and keeping LiteLLM and LiteMaaS databases in sync.

Overview

Models must be registered in two places: the LiteLLM proxy (routing and inference) and the LiteMaaS backend database (user-facing catalog and subscriptions). This page covers how to register new models after deploying an InferenceService, how to enable automatic DB synchronization, and how to fix divergence between the two databases when it occurs.

graph TD
    A["LiteLLM Proxy<br/>LiteLLM_ProxyModelTable"] -->|LITELLM_AUTO_SYNC| B["LiteMaaS Backend<br/>models table"]
    B --> C[User Portal - Model Catalog]
    D[Add via UI or API] --> A
    subgraph sync [When out of sync]
        E[Model in LiteLLM only] -->|subscription foreign key error| F[User cannot subscribe]
    end
    style A fill:#e8f4fd,stroke:#0066cc
    style B fill:#d4edda,stroke:#28a745
    style C fill:#fff3cd,stroke:#f0a500
    style E fill:#f8d7da,stroke:#dc3545
    style F fill:#f8d7da,stroke:#dc3545

When to Use This

Register a new model after deploying a new InferenceService — add it to LiteLLM first, then sync to LiteMaaS so users can see and subscribe to it.

Sync models after a DB mismatch when you see subscription foreign key errors — a model exists in LiteLLM but is missing from the LiteMaaS models table. Use the manual sync endpoint or run the sync playbook.

Enable AUTO_SYNC to avoid manual sync steps when models are frequently added or removed via the LiteLLM Admin UI — the backend will pull updates automatically on startup and at regular intervals.

Adding Models

Option 1: Admin UI (recommended)

  1. Go to LiteMaaS Frontend → Admin → Models → Add Model
  2. Fill in:
    • Provider: OpenAI-Compatible
    • Model Name: e.g. my-new-model
    • API Base: KServe internal URL — e.g. http://my-model-predictor.llm-hosting.svc.cluster.local/v1
    • API Key: Service account token, or leave blank if no auth required
  3. Click Test Connect, then Add Model

Option 2: API

curl -sk -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name":"my-new-model","litellm_params":{"model":"openai/my-new-model","api_base":"http://my-model-predictor.llm-hosting.svc.cluster.local/v1","custom_llm_provider":"openai"}}'

Verify the model is visible to users

curl -sk "$LITELLM_URL/v1/models" -H "Authorization: Bearer $ADMIN_KEY" | python3 -c "import sys,json; [print(m['id']) for m in json.load(sys.stdin)['data']]"

LITELLM_AUTO_SYNC

The LITELLM_AUTO_SYNC environment variable controls whether the LiteMaaS backend automatically synchronizes its model database from LiteLLM on startup and at periodic intervals. When enabled, the backend pulls the current model list from LiteLLM and updates its own models table accordingly.

sequenceDiagram
    participant L as LiteLLM Proxy
    participant B as LiteMaaS Backend
    participant D as PostgreSQL
    Note over L,D: LITELLM_AUTO_SYNC=true
    L->>D: Query LiteLLM_ProxyModelTable
    L->>B: POST /api/v1/models (sync)
    B->>D: Upsert models table
    Note over B,D: Models now visible in user portal

Check Current Setting

oc exec -n litellm-rhpds deployment/litellm-backend -- \
  sh -c 'echo LITELLM_AUTO_SYNC=$LITELLM_AUTO_SYNC'

Enable Auto-Sync

# Set env var on the backend deployment
oc set env deployment/litellm-backend \
  LITELLM_AUTO_SYNC=true \
  -n litellm-rhpds

# Restart to apply
oc rollout restart deployment/litellm-backend -n litellm-rhpds

When to Use Auto-Sync

| Scenario | Recommendation |
| --- | --- |
| Models added only via Ansible playbook | Auto-sync unnecessary — playbook handles both LiteLLM and backend DB |
| Models frequently added/removed via LiteLLM admin UI | Enable auto-sync to avoid running the sync playbook after each change |
| Production with tight change control | Disable auto-sync; use explicit sync playbook so changes are deliberate |

Auto-sync and model deletion: If a model is deleted from LiteLLM and auto-sync is enabled, the backend will remove it from its database too — which will cascade-delete user subscriptions to that model. Be cautious about deleting models from LiteLLM when users have active subscriptions.
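Before deleting a model from LiteLLM, it is worth checking whether anyone is subscribed to it. A hedged sketch using the pod and database names from this page; the subscriptions column names (model_id, status) are assumptions and may differ in your schema — verify with \d subscriptions first:

# Count active subscriptions per model before deleting anything
# (column names model_id/status are assumed, not confirmed)
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c "
SELECT model_id, COUNT(*) AS active_subs
FROM subscriptions
WHERE status = 'active'
GROUP BY model_id
ORDER BY active_subs DESC;"

Any model that appears in this output with a non-zero count will lose those subscriptions if it is deleted from LiteLLM while auto-sync is on.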

DB Sync Between LiteMaaS and LiteLLM

LiteMaaS maintains its own database tables (users, subscriptions, api_keys, models) that must stay in sync with LiteLLM's LiteLLM_VerificationToken and LiteLLM_ModelTable tables. Divergence between the two databases is the root cause of most operational issues.

Understanding the Two Databases

Both LiteMaaS backend and LiteLLM proxy connect to the same PostgreSQL instance (litellm-postgres-0), but use different schemas/tables:

| Table | Owner | Purpose |
| --- | --- | --- |
| users | LiteMaaS backend | User accounts, roles, OAuth IDs |
| models | LiteMaaS backend | Model catalog with capability metadata |
| subscriptions | LiteMaaS backend | User-to-model subscription records |
| api_keys | LiteMaaS backend | Virtual key tracking with LiteLLM alias references |
| LiteLLM_VerificationToken | LiteLLM proxy | The actual virtual key records with spend/limits |
| LiteLLM_ModelTable | LiteLLM proxy | LiteLLM's internal model registry |
| LiteLLM_SpendLogs | LiteLLM proxy | Per-request spend audit trail |
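Because both sets of tables live in the same database, the split can be inspected directly: a single \dt against the shared instance lists the lowercase LiteMaaS tables and the "LiteLLM_*" tables side by side (pod and credentials as used elsewhere on this page):

# List all tables in the shared database - both LiteMaaS and
# LiteLLM-owned tables appear in one listing
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c '\dt'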

Check for Orphaned Keys (LiteMaaS keys with no LiteLLM record)

oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c "
SELECT ak.id, ak.litellm_key_alias, ak.is_active, ak.created_at
FROM api_keys ak
WHERE ak.is_active = true
  AND ak.litellm_key_alias IS NOT NULL
  AND NOT EXISTS (
    SELECT 1 FROM \"LiteLLM_VerificationToken\" lv
    WHERE lv.key_alias = ak.litellm_key_alias
  )
LIMIT 20;"

Fix Orphaned Keys

# Mark orphaned LiteMaaS keys as inactive
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c "
UPDATE api_keys
SET is_active = false,
    revoked_at = NOW(),
    sync_status = 'error',
    sync_error = 'Key not found in LiteLLM - manual cleanup',
    updated_at = NOW()
WHERE is_active = true
  AND litellm_key_alias IS NOT NULL
  AND NOT EXISTS (
    SELECT 1 FROM \"LiteLLM_VerificationToken\" lv
    WHERE lv.key_alias = api_keys.litellm_key_alias
  );"

Check for Models in LiteLLM but Not in LiteMaaS

# List LiteLLM models missing from LiteMaaS models table
# (jq -r strips the JSON quotes so the two lists compare cleanly)
curl -sk "${LITELLM_URL}/model/info" \
  -H "Authorization: Bearer ${LITELLM_KEY}" | \
  jq -r '.data[].model_name' | sort > /tmp/litellm-models.txt

# -t drops headers, -A drops the padding whitespace psql adds
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -t -A -c "SELECT id FROM models;" | \
  sort > /tmp/litemaas-models.txt

# Lines prefixed with "<" exist in LiteLLM but not in LiteMaaS
diff /tmp/litellm-models.txt /tmp/litemaas-models.txt
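If you prefer a one-sided listing over diff output, comm -23 prints only the lines unique to the first sorted file. A self-contained sketch with made-up model names standing in for the two exported lists:

# Stand-in sample data for the two exported, sorted lists
printf 'model-a\nmodel-b\nmodel-c\n' > /tmp/litellm-models.txt
printf 'model-a\nmodel-c\n' > /tmp/litemaas-models.txt

# comm -23 suppresses lines unique to file 2 and lines common to both,
# leaving only models present in LiteLLM but missing from LiteMaaS
comm -23 /tmp/litellm-models.txt /tmp/litemaas-models.txt
# → model-b

With the real exports in place, each line printed here is a model users will hit the subscription foreign key error on.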

Trigger Manual Sync

# Backend exposes a sync endpoint (admin auth required)
ADMIN_KEY=$(oc get secret backend-secret -n litellm-rhpds \
  -o jsonpath='{.data.ADMIN_API_KEY}' | base64 -d)
BACKEND_URL=$(oc get route litellm-prod-admin -n litellm-rhpds \
  -o jsonpath='https://{.spec.host}')

curl -X POST "${BACKEND_URL}/api/admin/sync-models" \
  -H "Authorization: Bearer ${ADMIN_KEY}"

External Provider Models (Vertex AI)

In addition to on-cluster KServe models, RHDP MaaS supports models hosted on external providers such as Google Vertex AI. These are accessed through the same OpenAI-compatible API endpoint and virtual key — users see no difference in how they call them.

External models are pay-per-token (no GPU to manage), and are suited for use cases that require larger models or specific capabilities not available on-cluster.

How Vertex AI Models Were Added

The following steps were taken to onboard Google Vertex AI models into RHDP MaaS. All models use the global region and the RHDP GCP project.

  1. Identified models in Google Vertex AI Model Garden — browsed the RHDP org GCP project for models available as fully managed MaaS APIs (pay-per-token, no GPU allocation required). Selected models that complement the on-cluster fleet: MiniMax M2, Qwen3 235B, GPT OSS 120B, and GPT OSS 20B.
  2. Created a Service Account (litemaas-vertex-sa) in the RHDP GCP project and granted it the roles/aiplatform.user role — the minimum permission needed to call Vertex AI prediction endpoints.
  3. Stored the SA key as an OCP secret — the JSON key was stored as a Kubernetes secret (vertex-ai-sa-key) in the litellm-rhpds namespace. No credentials are stored in code or config files.
  4. Mounted the secret in the LiteLLM deployment — the secret is volume-mounted at /etc/vertex-credentials/ inside the LiteLLM proxy pod so the proxy can authenticate to Vertex AI at request time.
  5. Registered each model via the LiteLLM API — each model was added using the POST /model/new endpoint with provider: vertex_ai, vertex_credentials pointing to the mounted key path, plus pricing (input/output cost per token) and per-key rate limits (TPM/RPM).

Adding a Service Account key to LiteLLM

Vertex AI models authenticate via a GCP Service Account JSON key. The key is stored as an OCP secret and volume-mounted into the LiteLLM pod — LiteMaaS backend has no knowledge of it.

Step 1 — Store the key as an OCP secret

oc create secret generic vertex-ai-sa-key \
  --from-file=key.json=/path/to/service-account.json \
  -n litellm-rhpds

Step 2 — Mount the secret into the LiteLLM pod

oc set volume deployment/litellm -n litellm-rhpds \
  --add --name=vertex-sa-key \
  --type=secret \
  --secret-name=vertex-ai-sa-key \
  --mount-path=/etc/vertex-credentials \
  --read-only

Step 3 — Reference the path when registering the model

Pass vertex_credentials pointing to the mounted file when calling /model/new:

curl -sk -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name":"my-vertex-model",
       "litellm_params":{
         "model":"vertex_ai/publisher/model-maas",
         "vertex_project":"my-gcp-project",
         "vertex_location":"global",
         "vertex_credentials":"/etc/vertex-credentials/key.json"
       }}'

LiteLLM reads the key file at request time, generates a short-lived OAuth token, and calls Vertex AI. The credentials path is encrypted in LiteLLM_ProxyModelTable.

Note: If you have multiple SA keys (e.g. one for Model Garden and one for Claude/Gemini), mount each as a separate secret under a different path and reference accordingly.
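That layout can be sketched with a second oc set volume call mirroring Step 2; the secret name and mount path below are illustrative examples, not the real ones:

# Mount a second SA key (e.g. for Claude/Gemini) at its own path;
# secret name and mount path here are hypothetical
oc set volume deployment/litellm -n litellm-rhpds \
  --add --name=vertex-claude-key \
  --type=secret \
  --secret-name=vertex-ai-claude-sa-key \
  --mount-path=/etc/vertex-credentials-claude \
  --read-only

When registering those models, point vertex_credentials at the corresponding file, e.g. /etc/vertex-credentials-claude/key.json.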

Currently Available — Google Vertex AI

The following models are registered and available in RHDP MaaS via Google Vertex AI Model Garden:

| Model | Type | Best for | Input /1M | Output /1M |
| --- | --- | --- | --- | --- |
| minimax-m2 | Chat, Agentic | Multi-step tool use, coding, office workflows | $0.30 | $1.20 |
| qwen3-235b | Chat, Reasoning | Multilingual, complex reasoning, large context tasks | $0.22 | $0.88 |
| gpt-oss-120b | Chat, Reasoning | Large-scale generation, function calling, agentic workflows | $0.09 | $0.36 |
| gpt-oss-20b | Chat | Cost-effective general tasks, lightweight agentic use | $0.07 | $0.25 |
| claude-sonnet-4-6 | Chat | General-purpose reasoning, coding, analysis | $3.00 | $15.00 |
| claude-opus-4-6 | Chat | Complex multi-step reasoning, long-document analysis | $5.00 | $25.00 |
| claude-sonnet-4-5 | Chat | Coding and structured output, previous-generation Sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | Chat | High-throughput, classification, light Q&A — lowest Anthropic price | $1.00 | $5.00 |
| gemini-2.5-pro | Chat | Long-document analysis, 1M context, native Google model | $1.25 | $10.00 |

All models support standard chat completions. The MiniMax, Qwen3, and GPT OSS models also support function calling and are suitable for Agentic and MCP workflows. Rate limits (TPM/RPM) and budget caps are enforced per key to keep usage sustainable.

Calling an external model

No change in how you call the API — use the same endpoint and your virtual key:

# Works exactly the same as on-cluster models
curl https://litellm-prod.apps.maas.redhatworkshops.io/v1/chat/completions \
  -H "Authorization: Bearer $YOUR_VIRTUAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax-m2", "messages": [{"role": "user", "content": "your prompt"}]}'

Choose the right model for your use case — each has a different cost. Use on-cluster models for casual testing; reserve external models for specific use cases that require them.