Model as a Service for Red Hat Demo Platform
Managing the model registry — adding, syncing, and keeping LiteLLM and LiteMaaS databases in sync.
Models must be registered in two places: the LiteLLM proxy (routing and inference) and the LiteMaaS backend database (user-facing catalog and subscriptions). This page covers how to register new models after deploying an InferenceService, how to enable automatic DB synchronization, and how to fix divergence between the two databases when it occurs.
- Register a new model after deploying a new InferenceService — add it to LiteLLM first, then sync to LiteMaaS so users can see and subscribe to it.
- Sync models after a DB mismatch when you see subscription foreign-key errors — a model exists in LiteLLM but is missing from the LiteMaaS models table. Use the manual sync endpoint or run the sync playbook.
- Enable AUTO_SYNC to avoid manual sync steps when models are frequently added or removed via the LiteLLM Admin UI — the backend will pull updates automatically on startup and at regular intervals.
Register the model with LiteLLM, giving it a name (here my-new-model) and the in-cluster endpoint of the InferenceService (http://my-model-predictor.llm-hosting.svc.cluster.local/v1):

curl -sk -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name":"my-new-model","litellm_params":{"model":"openai/my-new-model","api_base":"http://my-model-predictor.llm-hosting.svc.cluster.local/v1","custom_llm_provider":"openai"}}'
# Verify the new model appears in LiteLLM's model list
curl -sk "$LITELLM_URL/v1/models" -H "Authorization: Bearer $ADMIN_KEY" | python3 -c "import sys,json; [print(m['id']) for m in json.load(sys.stdin)['data']]"
The LITELLM_AUTO_SYNC environment variable controls whether the LiteMaaS backend automatically
synchronizes its model database from LiteLLM on startup and at periodic intervals. When enabled, the backend
pulls the current model list from LiteLLM and updates its own models table accordingly.
# Check the current value on the backend deployment
oc exec -n litellm-rhpds deployment/litellm-backend -- \
  sh -c 'echo LITELLM_AUTO_SYNC=$LITELLM_AUTO_SYNC'
# Set env var on the backend deployment
oc set env deployment/litellm-backend \
  LITELLM_AUTO_SYNC=true \
  -n litellm-rhpds

# Restart to apply
oc rollout restart deployment/litellm-backend -n litellm-rhpds
| Scenario | Recommendation |
|---|---|
| Models added only via Ansible playbook | Auto-sync unnecessary — playbook handles both LiteLLM and backend DB |
| Models frequently added/removed via LiteLLM admin UI | Enable auto-sync to avoid running the sync playbook after each change |
| Production with tight change control | Disable auto-sync, use explicit sync playbook so changes are deliberate |
Auto-sync and model deletion: If a model is deleted from LiteLLM and auto-sync is enabled, the backend will remove it from its database too — which will cascade-delete user subscriptions to that model. Be cautious about deleting models from LiteLLM when users have active subscriptions.
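Before deleting a model from LiteLLM, you can check how many subscriptions would be affected. A minimal sketch; the column name model_id is an assumption about the LiteMaaS subscriptions schema and should be verified against the actual table definition:

```shell
# Count subscriptions that would be cascade-deleted with the model.
# The model_id column name is an assumption -- verify against your schema.
MODEL_ID="my-new-model"
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -t -A -c \
  "SELECT COUNT(*) FROM subscriptions WHERE model_id = '${MODEL_ID}';"
```

If the count is non-zero, notify the affected users or migrate their subscriptions before removing the model.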
LiteMaaS maintains its own database tables (users, subscriptions, api_keys, models) that must stay in sync
with LiteLLM's LiteLLM_VerificationToken and LiteLLM_ModelTable tables.
Divergence between the two databases is the root cause of most operational issues.
Both LiteMaaS backend and LiteLLM proxy connect to the same PostgreSQL instance (litellm-postgres-0), but use different schemas/tables:
| Table | Owner | Purpose |
|---|---|---|
| users | LiteMaaS backend | User accounts, roles, OAuth IDs |
| models | LiteMaaS backend | Model catalog with capability metadata |
| subscriptions | LiteMaaS backend | User-to-model subscription records |
| api_keys | LiteMaaS backend | Virtual key tracking with LiteLLM alias references |
| LiteLLM_VerificationToken | LiteLLM proxy | The actual virtual key records with spend/limits |
| LiteLLM_ModelTable | LiteLLM proxy | LiteLLM's internal model registry |
| LiteLLM_SpendLogs | LiteLLM proxy | Per-request spend audit trail |
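A quick way to see both table sets side by side is to list all tables in the shared database; the LiteMaaS tables are lowercase, the LiteLLM proxy tables carry the LiteLLM_ prefix:

```shell
# List every table in the shared litellm database
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c '\dt'
```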
# Find active LiteMaaS keys whose alias has no matching LiteLLM token
oc exec -n litellm-rhpds litellm-postgres-0 -- \
psql -U litellm -d litellm -c "
SELECT ak.id, ak.litellm_key_alias, ak.is_active, ak.created_at
FROM api_keys ak
WHERE ak.is_active = true
AND ak.litellm_key_alias IS NOT NULL
AND NOT EXISTS (
SELECT 1 FROM \"LiteLLM_VerificationToken\" lv
WHERE lv.key_alias = ak.litellm_key_alias
)
LIMIT 20;"
# Mark orphaned LiteMaaS keys as inactive
oc exec -n litellm-rhpds litellm-postgres-0 -- \
psql -U litellm -d litellm -c "
UPDATE api_keys
SET is_active = false,
revoked_at = NOW(),
sync_status = 'error',
sync_error = 'Key not found in LiteLLM - manual cleanup',
updated_at = NOW()
WHERE is_active = true
AND litellm_key_alias IS NOT NULL
AND NOT EXISTS (
SELECT 1 FROM \"LiteLLM_VerificationToken\" lv
WHERE lv.key_alias = api_keys.litellm_key_alias
);"
# List LiteLLM models missing from the LiteMaaS models table
curl -sk -X GET "${LITELLM_URL}/model/info" \
  -H "Authorization: Bearer ${LITELLM_KEY}" | \
  jq -r '.data[].model_name' | sort > /tmp/litellm-models.txt

# -t -A: tuples only, unaligned, so the output diffs cleanly
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -t -A -c "SELECT id FROM models;" | \
  sort > /tmp/litemaas-models.txt

# Show models in LiteLLM but not in LiteMaaS
diff /tmp/litellm-models.txt /tmp/litemaas-models.txt
# Backend exposes a sync endpoint (admin auth required)
ADMIN_KEY=$(oc get secret backend-secret -n litellm-rhpds \
-o jsonpath='{.data.ADMIN_API_KEY}' | base64 -d)
BACKEND_URL=$(oc get route litellm-prod-admin -n litellm-rhpds \
-o jsonpath='https://{.spec.host}')
curl -X POST "${BACKEND_URL}/api/admin/sync-models" \
-H "Authorization: Bearer ${ADMIN_KEY}"
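After the sync returns, a quick sanity check is to compare model counts on both sides (a sketch; LITELLM_URL and LITELLM_KEY are assumed to be set as in the mismatch check above):

```shell
# Both counts should match once the sync has completed
curl -sk "${LITELLM_URL}/v1/models" \
  -H "Authorization: Bearer ${LITELLM_KEY}" | \
  python3 -c "import sys,json; print(len(json.load(sys.stdin)['data']))"

oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -t -A -c "SELECT COUNT(*) FROM models;"
```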
In addition to on-cluster KServe models, RHDP MaaS supports models hosted on external providers such as Google Vertex AI. These are accessed through the same OpenAI-compatible API endpoint and virtual key — users see no difference in how they call them.
External models are pay-per-token (no GPU to manage), and are suited for use cases that require larger models or specific capabilities not available on-cluster.
The following steps were taken to onboard Google Vertex AI models into RHDP MaaS.
All models use the global region and the RHDP GCP project.
1. Created a dedicated GCP service account (litemaas-vertex-sa) in the RHDP GCP project
and granted it the roles/aiplatform.user role — the minimum permission needed
to call Vertex AI prediction endpoints.
2. Downloaded a JSON key for the service account and stored it as an OpenShift
secret (vertex-ai-sa-key) in the litellm-rhpds namespace.
No credentials are stored in code or config files.
3. Mounted the secret at /etc/vertex-credentials/ inside the LiteLLM proxy pod so the proxy can
authenticate to Vertex AI at request time.
4. Registered each model via LiteLLM's POST /model/new endpoint with provider vertex_ai,
vertex_credentials pointing to the mounted key path, plus pricing
(input/output cost per token) and per-key rate limits (TPM/RPM).
Vertex AI models authenticate via a GCP Service Account JSON key. The key is stored as an OCP secret and volume-mounted into the LiteLLM pod — LiteMaaS backend has no knowledge of it.
oc create secret generic vertex-ai-sa-key \
  --from-file=key.json=/path/to/service-account.json \
  -n litellm-rhpds
oc set volume deployment/litellm -n litellm-rhpds \
  --add --name=vertex-sa-key \
  --type=secret \
  --secret-name=vertex-ai-sa-key \
  --mount-path=/etc/vertex-credentials \
  --read-only
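Once the rollout finishes, it is worth confirming the key file actually landed at the expected path (a quick check, nothing more):

```shell
# key.json should be listed; if not, re-check the secret name and mount path
oc exec -n litellm-rhpds deployment/litellm -- ls -l /etc/vertex-credentials/
```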
Pass vertex_credentials pointing to the mounted file when calling /model/new:
curl -sk -X POST "$LITELLM_URL/model/new" \
-H "Authorization: Bearer $ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{"model_name":"my-vertex-model",
"litellm_params":{
"model":"vertex_ai/publisher/model-maas",
"vertex_project":"my-gcp-project",
"vertex_location":"global",
"vertex_credentials":"/etc/vertex-credentials/key.json"
}}'
LiteLLM reads the key file at request time, generates a short-lived OAuth token, and calls Vertex AI. The credentials path is encrypted in LiteLLM_ProxyModelTable.
Note: If you have multiple SA keys (e.g. one for Model Garden and one for Claude/Gemini), mount each as a separate secret under a different path and reference accordingly.
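For example, a second key could be mounted under its own path. The secret name vertex-ai-claude-sa-key and the mount path below are illustrative assumptions, not fixed conventions:

```shell
# Hypothetical second service-account key for Claude/Gemini models
oc create secret generic vertex-ai-claude-sa-key \
  --from-file=key.json=/path/to/claude-service-account.json \
  -n litellm-rhpds

# Mount it at a separate path so both keys coexist in the pod
oc set volume deployment/litellm -n litellm-rhpds \
  --add --name=vertex-claude-sa-key \
  --type=secret \
  --secret-name=vertex-ai-claude-sa-key \
  --mount-path=/etc/vertex-claude-credentials \
  --read-only
```

Models using this key would then set "vertex_credentials":"/etc/vertex-claude-credentials/key.json" in their /model/new payload.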
The following models are registered and available in RHDP MaaS via Google Vertex AI Model Garden:
| Model | Type | Best for | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|
| minimax-m2 | Chat, Agentic | Multi-step tool use, coding, office workflows | $0.30 | $1.20 |
| qwen3-235b | Chat, Reasoning | Multilingual, complex reasoning, large context tasks | $0.22 | $0.88 |
| gpt-oss-120b | Chat, Reasoning | Large-scale generation, function calling, agentic workflows | $0.09 | $0.36 |
| gpt-oss-20b | Chat | Cost-effective general tasks, lightweight agentic use | $0.07 | $0.25 |
| claude-sonnet-4-6 | Chat | General-purpose reasoning, coding, analysis | $3.00 | $15.00 |
| claude-opus-4-6 | Chat | Complex multi-step reasoning, long-document analysis | $5.00 | $25.00 |
| claude-sonnet-4-5 | Chat | Coding and structured output, previous-generation Sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | Chat | High-throughput, classification, light Q&A — lowest Anthropic price | $1.00 | $5.00 |
| gemini-2.5-pro | Chat | Long-document analysis, 1M context, native Google model | $1.25 | $10.00 |
All models support standard chat completions. The MiniMax, Qwen3, and GPT OSS models also support function calling and are suitable for Agentic and MCP workflows. Rate limits (TPM/RPM) and budget caps are enforced per key to keep usage sustainable.
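Those per-key limits are attached when the virtual key is created. A sketch using LiteLLM's /key/generate endpoint; the limit values are illustrative, and in normal operation LiteMaaS creates keys on the user's behalf rather than an admin calling this directly:

```shell
# Create a virtual key capped at 50k tokens/min, 60 requests/min, $10 budget
curl -sk -X POST "${LITELLM_URL}/key/generate" \
  -H "Authorization: Bearer ${ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"models":["minimax-m2"],"tpm_limit":50000,"rpm_limit":60,"max_budget":10.0}'
```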
No change in how you call the API — use the same endpoint and your virtual key:
# Works exactly the same as on-cluster models
curl https://litellm-prod.apps.maas.redhatworkshops.io/v1/chat/completions \
-H "Authorization: Bearer $YOUR_VIRTUAL_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "minimax-m2", "messages": [{"role": "user", "content": "your prompt"}]}'
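To pull just the assistant's reply out of the JSON response, pipe it through a small filter (shown here against a sample response so the shape is visible):

```shell
# Extract choices[0].message.content from an OpenAI-style response;
# the sample JSON stands in for the curl output above.
echo '{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}' | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['choices'][0]['message']['content'])"
# prints: Hello!
```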
Choose the right model for your use case — each has a different cost. Use on-cluster models for casual testing; reserve external models for specific use cases that require them.