Model as a Service for Red Hat Demo Platform
Upgrading, scaling, model management, key cleanup, and database synchronization for a running LiteMaaS deployment.
In production, upgrades are handled by updating the deployment image tags. LiteMaaS uses rolling deployments — pods are replaced one at a time, so the service remains available during upgrades.
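Whether pods really roll one at a time depends on the deployment's `rollingUpdate` strategy. A minimal sketch of inspecting it from captured JSON — the inline sample and the `maxSurge=1/maxUnavailable=0` values are illustrative; in practice capture the live spec with `oc get deployment/litellm -n litellm-rhpds -o json`:

```bash
# Sketch: read a deployment's rolling-update strategy from captured JSON.
# maxSurge=1 / maxUnavailable=0 yields strict one-at-a-time replacement
# with no capacity dip; the sample below is illustrative.
cat > /tmp/deploy.json <<'EOF'
{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}
EOF
jq -r '.spec.strategy | "\(.type): maxSurge=\(.rollingUpdate.maxSurge) maxUnavailable=\(.rollingUpdate.maxUnavailable)"' \
  /tmp/deploy.json
```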
| Deployment | Current Image | Version |
|---|---|---|
| litellm | quay.io/rh-aiservices-bu/litellm-non-root | main-v1.81.0-stable-custom |
| litellm-backend | quay.io/rh-aiservices-bu/litemaas-backend | 0.4.0 |
| litellm-frontend | quay.io/rh-aiservices-bu/litemaas-frontend | 0.4.0 |
| litellm-redis | registry.redhat.io/rhel9/redis-7 | latest |
```bash
# Set namespace variable
NS=litellm-rhpds
NEW_VERSION=0.5.0

# Update backend image
oc set image deployment/litellm-backend \
  backend=quay.io/rh-aiservices-bu/litemaas-backend:${NEW_VERSION} \
  -n ${NS}

# Update frontend image
oc set image deployment/litellm-frontend \
  frontend=quay.io/rh-aiservices-bu/litemaas-frontend:${NEW_VERSION} \
  -n ${NS}

# Monitor rollout
oc rollout status deployment/litellm-backend -n ${NS}
oc rollout status deployment/litellm-frontend -n ${NS}
```
Frontend version mismatch: If the frontend shows a stale version after upgrade, check that the init container image (if used for static asset injection) matches the new frontend tag. A mismatch between the main container and init container is a common cause of version confusion. See Troubleshooting for the fix.
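A quick way to spot such a mismatch is to compare the two image fields in the deployment spec. A sketch with `jq` on an inline sample — the container names (`frontend`, `assets`) and image tags here are assumptions for illustration; in practice capture the spec with `oc get deployment/litellm-frontend -n litellm-rhpds -o json`:

```bash
# Sketch: flag a main-vs-init container image mismatch from captured JSON.
cat > /tmp/frontend.json <<'EOF'
{"spec":{"template":{"spec":{
  "containers":[{"name":"frontend","image":"quay.io/rh-aiservices-bu/litemaas-frontend:0.5.0"}],
  "initContainers":[{"name":"assets","image":"quay.io/rh-aiservices-bu/litemaas-frontend:0.4.0"}]}}}}
EOF
MAIN=$(jq -r '.spec.template.spec.containers[0].image' /tmp/frontend.json)
INIT=$(jq -r '.spec.template.spec.initContainers[0].image // empty' /tmp/frontend.json)
# No init container, or matching images: nothing to report
[ -z "$INIT" ] || [ "$MAIN" = "$INIT" ] || echo "MISMATCH: main=$MAIN init=$INIT"
```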
```bash
# Before upgrading LiteLLM, take a database backup
oc exec -n ${NS} litellm-postgres-0 -- \
  pg_dump -U litellm litellm | gzip > /tmp/litellm-pre-upgrade.sql.gz

# Update LiteLLM image
NEW_LITELLM_TAG=main-v1.85.0-stable-custom
oc set image deployment/litellm \
  litellm=quay.io/rh-aiservices-bu/litellm-non-root:${NEW_LITELLM_TAG} \
  -n ${NS}

# Watch the rollout -- LiteLLM runs DB migrations on startup
oc rollout status deployment/litellm -n ${NS}

# Check logs for migration output
oc logs -n ${NS} deployment/litellm --tail=50 | grep -i migrat
```
```bash
# Override image tags in your vars file, then re-run the playbook
ansible-playbook playbooks/deploy_litemaas_ha.yml \
  -e ocp4_workload_litemaas_namespace=litellm-rhpds \
  -e ocp4_workload_litemaas_litellm_tag=main-v1.85.0-stable-custom \
  -e ocp4_workload_litemaas_version=0.5.0
```
All three main deployments (LiteLLM, Backend, Frontend) support horizontal scaling. Scaling is stateless for Backend and Frontend; LiteLLM uses Redis for shared session state across replicas.
```bash
# Scale LiteLLM to 5 replicas
oc scale deployment/litellm --replicas=5 -n litellm-rhpds

# Or via patch
oc patch deployment/litellm -n litellm-rhpds \
  --type=json -p='[{"op":"replace","path":"/spec/replicas","value":5}]'

# Verify
oc get deployment/litellm -n litellm-rhpds
```
```bash
oc scale deployment/litellm-backend --replicas=3 -n litellm-rhpds
oc scale deployment/litellm-frontend --replicas=3 -n litellm-rhpds
```
```bash
oc get deployments -n litellm-rhpds -o wide
```
Redis is required for multi-replica LiteLLM. Without it, each LiteLLM pod keeps its own
in-memory session cache, and key validation becomes inconsistent across replicas.
Verify Redis is healthy before scaling LiteLLM beyond 1 replica:
```bash
oc get pods -n litellm-rhpds -l app=litellm-redis
```
In a production deployment, virtual API keys accumulate over time. Workshop participants receive 30-day keys, and after the workshop ends, those keys remain in LiteLLM's database consuming storage and complicating admin views. The key cleanup cronjob handles this automatically.
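The core test the cleanup relies on is simple: a key is expired when its `expires` timestamp is earlier than now. A sketch of that comparison (GNU `date` assumed, as on a Linux bastion; the sample timestamp is illustrative):

```bash
# Sketch: decide whether a key's ISO 8601 expiry is in the past.
EXPIRES="2024-01-01T00:00:00Z"   # sample value; real keys carry their own
if [ "$(date -d "$EXPIRES" +%s)" -lt "$(date +%s)" ]; then
  echo "expired: $EXPIRES"
else
  echo "still valid: $EXPIRES"
fi
```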
What the cronjob does:

- Finds keys whose `expires` timestamp is in the past (expired keys, derived from the key's `expires - duration` metadata)
- Deletes each expired key from LiteLLM via `POST /key/delete`
- Updates the `api_keys` table in LiteMaaS PostgreSQL to mark the key inactive, with a `revoked_at` timestamp and `sync_status='error'`
- Marks `api_keys` records inactive if their corresponding `LiteLLM_VerificationToken` no longer exists
- Logs to `/var/log/litemaas-key-cleanup.log` with logrotate (30-day retention)

```bash
# Run from the rhpds.litemaas repository root
# Detects whether running on bastion (direct write) or workstation (SSH)
./setup-key-cleanup-cronjob.sh litellm-rhpds
```
The script creates:
- `/usr/local/bin/cleanup-litemaas-keys-litellm-rhpds.sh` — the cleanup script
- A root crontab entry (daily at 02:00): `0 2 * * * /usr/local/bin/cleanup-litemaas-keys-litellm-rhpds.sh`
- `/etc/logrotate.d/litemaas-key-cleanup-litellm-rhpds` — logrotate config

```bash
# Run manually (dry-run not supported — this will actually delete keys)
sudo /usr/local/bin/cleanup-litemaas-keys-litellm-rhpds.sh

# Tail the log
sudo tail -f /var/log/litemaas-key-cleanup.log

# Verify cronjob is installed
sudo crontab -l | grep cleanup-litemaas
```
```bash
# Remove the cronjob entry and the cleanup script
sudo crontab -l | grep -v cleanup-litemaas-keys-litellm-rhpds.sh | sudo crontab -
sudo rm /usr/local/bin/cleanup-litemaas-keys-litellm-rhpds.sh
```
Models must be registered in two places: the LiteLLM proxy (which handles actual inference routing) and the LiteMaaS backend database (which powers the user-facing model catalog). Failing to sync both causes the "subscription foreign key violation" error.
To add a model via the LiteLLM admin UI:

- Open the admin UI (e.g. `https://litellm-prod.apps.maas.redhatworkshops.io`)
- Model name: `llama-3-1-70b-instruct`
- API base: `http://llama-3-1-70b-predictor.llm-hosting.svc.cluster.local/v1`
- API key: `sk-placeholder` if no auth required

```bash
# After adding via UI, sync to LiteMaaS backend database
LITELLM_URL=$(oc get route litellm-prod -n litellm-rhpds \
  -o jsonpath='https://{.spec.host}')
LITELLM_KEY=$(oc get secret litellm-secret -n litellm-rhpds \
  -o jsonpath='{.data.LITELLM_MASTER_KEY}' | base64 -d)

ansible-playbook playbooks/manage_models.yml \
  -e litellm_url="${LITELLM_URL}" \
  -e litellm_master_key="${LITELLM_KEY}" \
  -e ocp4_workload_litemaas_models_namespace=litellm-rhpds \
  -e ocp4_workload_litemaas_models_sync_from_litellm=true \
  -e '{"ocp4_workload_litemaas_models_list": []}'
```
```bash
# Create a model config file
cat > new-model.yml <<'EOF'
litellm_url: "https://litellm-prod.apps.maas.redhatworkshops.io"
litellm_master_key: "sk-xxxxx"
ocp4_workload_litemaas_models_namespace: "litellm-rhpds"
ocp4_workload_litemaas_models_backend_enabled: true
ocp4_workload_litemaas_models_list:
  - model_name: "llama-3-1-70b-instruct"
    litellm_model: "openai/llama-3-1-70b-instruct"
    api_base: "http://llama-3-1-70b-predictor.llm-hosting.svc.cluster.local/v1"
    api_key: "sk-placeholder"
    display_name: "Llama 3.1 70B Instruct"
    description: "Meta Llama 3.1 70B instruction-tuned model"
    provider: "openshift-ai"
    category: "chat"
    context_length: 131072
    rpm: 30
    tpm: 500000
EOF

ansible-playbook playbooks/manage_models.yml -e @new-model.yml
```
```bash
# Check LiteLLM has the model
curl -X GET "${LITELLM_URL}/model/info" \
  -H "Authorization: Bearer ${LITELLM_KEY}" | \
  jq '.data[] | select(.model_name == "llama-3-1-70b-instruct") | .model_name'

# Check LiteMaaS backend DB has the model
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c \
  "SELECT id, name, provider, availability FROM models WHERE id = 'llama-3-1-70b-instruct';"
```
| Parameter | Required | Default | Description |
|---|---|---|---|
| model_name | Yes | — | Unique identifier passed to LiteLLM and used as model ID |
| litellm_model | Yes | — | LiteLLM model format: openai/model-id for OpenAI-compatible endpoints |
| api_base | Yes | — | Inference endpoint URL. Use internal ClusterIP for KServe models. |
| api_key | Yes | — | Auth token. Use sk-placeholder if the endpoint requires no auth. |
| display_name | No | model_name | Human-readable name shown in LiteMaaS UI |
| provider | No | openshift-ai | Provider label (informational) |
| category | No | general | Type: chat, code, general, embeddings |
| context_length | No | null | Context window in tokens (display only) |
| rpm | No | null | Requests per minute limit enforced by LiteLLM |
| tpm | No | null | Tokens per minute limit enforced by LiteLLM |
| supports_streaming | No | true | Enable streaming responses |
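Given the defaults above, a minimal entry needs only the four required parameters; everything else falls back to the table's defaults. A sketch — the model name and endpoint below are placeholders, not real services:

```yaml
ocp4_workload_litemaas_models_list:
  - model_name: "my-model"                 # placeholder id
    litellm_model: "openai/my-model"
    api_base: "http://my-model-predictor.llm-hosting.svc.cluster.local/v1"
    api_key: "sk-placeholder"
```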
The LITELLM_AUTO_SYNC environment variable controls whether the LiteMaaS backend automatically
synchronizes its model database from LiteLLM on startup and at periodic intervals. When enabled, the backend
pulls the current model list from LiteLLM and updates its own models table accordingly.
```bash
# Check the current setting on a running backend pod
oc exec -n litellm-rhpds deployment/litellm-backend -- \
  sh -c 'echo LITELLM_AUTO_SYNC=$LITELLM_AUTO_SYNC'
```
```bash
# Set env var on the backend deployment
oc set env deployment/litellm-backend \
  LITELLM_AUTO_SYNC=true \
  -n litellm-rhpds

# Restart to apply
oc rollout restart deployment/litellm-backend -n litellm-rhpds
```
| Scenario | Recommendation |
|---|---|
| Models added only via Ansible playbook | Auto-sync unnecessary — playbook handles both LiteLLM and backend DB |
| Models frequently added/removed via LiteLLM admin UI | Enable auto-sync to avoid running the sync playbook after each change |
| Production with tight change control | Disable auto-sync, use explicit sync playbook so changes are deliberate |
Auto-sync and model deletion: If a model is deleted from LiteLLM and auto-sync is enabled, the backend will remove it from its database too — which will cascade-delete user subscriptions to that model. Be cautious about deleting models from LiteLLM when users have active subscriptions.
LiteMaaS maintains its own database tables (users, subscriptions, api_keys, models) that must stay in sync
with LiteLLM's LiteLLM_VerificationToken and LiteLLM_ModelTable tables.
Divergence between the two databases is the root cause of most operational issues.
Both LiteMaaS backend and LiteLLM proxy connect to the same PostgreSQL instance (litellm-postgres-0), but use different schemas/tables:
| Table | Owner | Purpose |
|---|---|---|
| users | LiteMaaS backend | User accounts, roles, OAuth IDs |
| models | LiteMaaS backend | Model catalog with capability metadata |
| subscriptions | LiteMaaS backend | User-to-model subscription records |
| api_keys | LiteMaaS backend | Virtual key tracking with LiteLLM alias references |
| LiteLLM_VerificationToken | LiteLLM proxy | The actual virtual key records with spend/limits |
| LiteLLM_ModelTable | LiteLLM proxy | LiteLLM's internal model registry |
| LiteLLM_SpendLogs | LiteLLM proxy | Per-request spend audit trail |
```bash
# List active LiteMaaS keys whose LiteLLM token no longer exists
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c "
    SELECT ak.id, ak.litellm_key_alias, ak.is_active, ak.created_at
    FROM api_keys ak
    WHERE ak.is_active = true
      AND ak.litellm_key_alias IS NOT NULL
      AND NOT EXISTS (
        SELECT 1 FROM \"LiteLLM_VerificationToken\" lv
        WHERE lv.key_alias = ak.litellm_key_alias
      )
    LIMIT 20;"
```
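The same orphan check can be run outside SQL by diffing sorted alias lists, which is handy for eyeballing the damage before marking anything inactive. A sketch with `comm` on sample data — in practice you would populate the two files from `psql` output and the LiteLLM `/key/list` API:

```bash
# Sketch: aliases present in LiteMaaS but absent from LiteLLM are orphans.
# Sample data for illustration; comm requires sorted input.
printf 'alias-a\nalias-b\nalias-c\n' | sort > /tmp/litemaas-aliases.txt
printf 'alias-a\nalias-c\n'          | sort > /tmp/litellm-aliases.txt
# -23 suppresses lines unique to file2 and lines in both,
# leaving only the orphaned LiteMaaS aliases
comm -23 /tmp/litemaas-aliases.txt /tmp/litellm-aliases.txt
```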
```bash
# Mark orphaned LiteMaaS keys as inactive
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c "
    UPDATE api_keys
    SET is_active = false,
        revoked_at = NOW(),
        sync_status = 'error',
        sync_error = 'Key not found in LiteLLM - manual cleanup',
        updated_at = NOW()
    WHERE is_active = true
      AND litellm_key_alias IS NOT NULL
      AND NOT EXISTS (
        SELECT 1 FROM \"LiteLLM_VerificationToken\" lv
        WHERE lv.key_alias = api_keys.litellm_key_alias
      );"
```
```bash
# List LiteLLM models missing from LiteMaaS models table
curl -X GET "${LITELLM_URL}/model/info" \
  -H "Authorization: Bearer ${LITELLM_KEY}" | \
  jq '.data[].model_name' | sort > /tmp/litellm-models.txt

oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -t -c "SELECT id FROM models;" | \
  sort > /tmp/litemaas-models.txt

# Show models in LiteLLM but not in LiteMaaS
diff /tmp/litellm-models.txt /tmp/litemaas-models.txt
```
```bash
# Backend exposes a sync endpoint (admin auth required)
ADMIN_KEY=$(oc get secret backend-secret -n litellm-rhpds \
  -o jsonpath='{.data.ADMIN_API_KEY}' | base64 -d)
BACKEND_URL=$(oc get route litellm-prod-admin -n litellm-rhpds \
  -o jsonpath='https://{.spec.host}')

curl -X POST "${BACKEND_URL}/api/admin/sync-models" \
  -H "Authorization: Bearer ${ADMIN_KEY}"
```
The user must have logged in via OAuth at least once before being promoted.
```bash
# Using the helper script (recommended)
./promote-admin.sh litellm-rhpds user@redhat.com

# Or directly via psql
oc exec -n litellm-rhpds \
  $(oc get pods -n litellm-rhpds -l app=litellm-postgres -o name | head -1) -- \
  psql -U litellm -d litellm -c \
  "UPDATE users SET roles = ARRAY['admin', 'user'] WHERE email = 'user@redhat.com';"

# Verify
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c \
  "SELECT username, email, roles FROM users WHERE 'admin' = ANY(roles);"
```
```bash
# LiteMaaS admin API key
oc get secret backend-secret -n litellm-rhpds \
  -o jsonpath='{.data.ADMIN_API_KEY}' | base64 -d

# LiteLLM master key
oc get secret litellm-secret -n litellm-rhpds \
  -o jsonpath='{.data.LITELLM_MASTER_KEY}' | base64 -d

# JWT secret (for session debugging)
oc get secret backend-secret -n litellm-rhpds \
  -o jsonpath='{.data.JWT_SECRET}' | base64 -d
```
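Secret values come back base64-encoded, which is why each command above pipes through `base64 -d`. A round-trip sketch with a made-up key value:

```bash
# Sketch: encode then decode a sample value, mirroring how Secret data is stored.
# sk-example-master-key is a made-up value for illustration.
ENCODED=$(printf 'sk-example-master-key' | base64)   # form stored in the Secret
printf '%s' "$ENCODED" | base64 -d                   # recovers the raw key
echo
```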
```bash
# Requires S3 bucket with IAM role attached to bastion EC2 instance
./setup-litemaas-backup-cronjob.sh litellm-rhpds maas-db-backup
```
```bash
# Backup LiteMaaS + LiteLLM shared database
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  pg_dump -U litellm litellm | gzip > litellm-backup-$(date +%Y%m%d).sql.gz

# Upload to S3
aws s3 cp litellm-backup-$(date +%Y%m%d).sql.gz \
  s3://maas-db-backup/litemaas-backups/
```
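Before uploading (or deleting local copies), it is worth confirming the archive is readable; `gunzip -t` checks integrity without extracting. A sketch of the same dump-and-compress pipeline on sample data instead of a live `pg_dump`:

```bash
# Sketch: round-trip a sample file through the gzip stage and verify it.
printf 'SELECT 1;\n' > /tmp/sample.sql
gzip -c /tmp/sample.sql > /tmp/sample.sql.gz
# -t tests archive integrity without writing the decompressed output
gunzip -t /tmp/sample.sql.gz && echo "archive OK"
```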
```bash
aws s3 ls s3://maas-db-backup/litemaas-backups/ --human-readable | sort -r
```
```bash
# Namespace shortcut
NS=litellm-rhpds

# Get all pod status
oc get pods -n $NS

# Get all routes
oc get routes -n $NS

# Tail LiteLLM logs (model routing)
oc logs -n $NS deployment/litellm -f --tail=100

# Tail LiteMaaS backend logs (subscription/key ops)
oc logs -n $NS deployment/litellm-backend -f --tail=100

# Tail frontend logs
oc logs -n $NS deployment/litellm-frontend -f --tail=100

# Health check
ROUTE=$(oc get route litellm-prod -n $NS -o jsonpath='{.spec.host}')
curl -sk https://$ROUTE/health/livenessz

# List all virtual keys (paginated)
LITELLM_KEY=$(oc get secret litellm-secret -n $NS \
  -o jsonpath='{.data.LITELLM_MASTER_KEY}' | base64 -d)
curl "https://$ROUTE/key/list?return_full_object=true&size=100&page=1" \
  -H "Authorization: Bearer $LITELLM_KEY" | jq '.keys | length'

# Count active subscriptions in LiteMaaS DB
oc exec -n $NS litellm-postgres-0 -- \
  psql -U litellm -d litellm -c \
  "SELECT COUNT(*) FROM subscriptions WHERE status = 'active';"

# Force Redis cache flush (after model changes)
REDIS_POD=$(oc get pods -n $NS -l app=litellm-redis -o name | head -1)
oc exec -n $NS $REDIS_POD -- redis-cli FLUSHALL

# Restart all LiteMaaS components (in order)
oc rollout restart deployment/litellm-backend -n $NS
oc rollout restart deployment/litellm -n $NS
oc rollout restart deployment/litellm-frontend -n $NS
```