Model as a Service for Red Hat Demo Platform
Full installation of LiteMaaS on OpenShift using the rhpds.litemaas Ansible collection.
Before running the deployment playbook, verify these requirements are met.
- Image pull access to quay.io and registry.redhat.io
- kubernetes.core collection
- oc CLI authenticated to the target cluster
- python3 with kubernetes and openshift pip packages

```bash
# Install Python dependencies
pip install kubernetes openshift

# Install required Ansible collections
ansible-galaxy collection install kubernetes.core community.general
```
The deployment pulls images from two registries. Verify cluster access before starting:
| Registry | Images | Auth Required |
|---|---|---|
| quay.io/rh-aiservices-bu | litellm-non-root, litemaas-backend, litemaas-frontend | Public pull |
| registry.redhat.io/rhel9 | redis-7 (primary; falls back to quay.io/sclorg) | Red Hat credentials |
The HA deployment task automatically tests image accessibility and falls back to an alternative image if the primary cannot be pulled. The fallback for Redis is quay.io/sclorg/redis-7-c9s:latest.
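The fallback behavior can be sketched as a small shell helper. This is an illustrative sketch, not the role's actual task code: the pull check is injected as a function so the logic stays testable (in practice it would be something like `skopeo inspect docker://<image>`).

```shell
#!/bin/sh
# Return the first pullable image: try the primary, fall back if the
# injected pull check fails.
pick_image() {
    check="$1"; primary="$2"; fallback="$3"
    if "$check" "$primary"; then
        echo "$primary"
    else
        echo "$fallback"
    fi
}

# Stub check simulating a cluster without registry.redhat.io credentials.
no_redhat_creds() {
    case "$1" in
        registry.redhat.io/*) return 1 ;;
        *) return 0 ;;
    esac
}

pick_image no_redhat_creds \
    registry.redhat.io/rhel9/redis-7:latest \
    quay.io/sclorg/redis-7-c9s:latest
# -> quay.io/sclorg/redis-7-c9s:latest
```

The same helper returns the primary image unchanged when the check succeeds, which matches the documented behavior: the fallback is only used when the primary cannot be pulled.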
LiteMaaS is distributed as an Ansible collection. Install it from the repository or via Ansible Galaxy.
```bash
# Clone the collection repository
git clone https://github.com/rhpds/rhpds.litemaas.git ~/work/code/rhpds.litemaas
cd ~/work/code/rhpds.litemaas

# Install the collection locally
ansible-galaxy collection install . --force
```
For catalog-deployed workloads, add the collection to your AgnosticV common.yaml:
```yaml
# In agnosticv/config/catalog-item/common.yaml
requirements_collections:
  - name: rhpds.litemaas
    source: https://github.com/rhpds/rhpds.litemaas.git
    type: git
    version: main

workloads:
  - rhpds.litemaas.ocp4_workload_litemaas
```
Alternatively, deploy with the Helm chart:

```bash
# Add the chart repository (if published) or install from local path
helm install litemaas ~/work/code/rh-litemaas/deployment/helm/litemaas/ \
  --namespace litemaas \
  --create-namespace \
  -f my-values.yaml
```
Create an extra vars file. At minimum you need to set the namespace and decide whether to enable OAuth.
```yaml
# litemaas-vars.yml
---
ocp4_workload_litemaas_namespace: "litemaas"

# OAuth login (recommended for production)
ocp4_workload_litemaas_oauth_enabled: true

# Scale LiteLLM replicas (default 3)
ocp4_workload_litemaas_ha_litellm_replicas: 3

# Enable RHDP branding
ocp4_workload_litemaas_branding_enabled: true

# Route prefix customization (optional)
ocp4_workload_litemaas_api_route_prefix: "litellm-prod"
ocp4_workload_litemaas_frontend_route_prefix: "litellm-prod-frontend"
ocp4_workload_litemaas_admin_route_prefix: "litellm-prod-admin"

# Pre-configure models (optional — can also be done via UI after deploy)
ocp4_workload_litemaas_litellm_models:
  - model_name: "granite-3-2-8b-instruct"
    litellm_model: "openai/granite-3-2-8b-instruct"
    api_base: "http://granite-3-2-8b-instruct-predictor.llm-hosting.svc.cluster.local/v1"
    api_key: "sk-placeholder"
    rpm: 120
    tpm: 200000
```
```bash
# Deploy LiteMaaS in HA mode
ansible-playbook playbooks/deploy_litemaas_ha.yml \
  -e @litemaas-vars.yml

# Or pass variables inline
ansible-playbook playbooks/deploy_litemaas_ha.yml \
  -e ocp4_workload_litemaas_namespace=litemaas \
  -e ocp4_workload_litemaas_oauth_enabled=true \
  -e ocp4_workload_litemaas_branding_enabled=true
```
The HA task deploys components in sequence: each component waits for its predecessor to be ready before the next one starts.
After successful deployment, the playbook prints connection details via agnosticd_user_info:
```text
# Example output
LiteLLM Admin Portal:    https://litellm-prod.apps.cluster.example.com
LiteLLM Admin Login:     admin / <auto-generated-password>
LiteLLM Master API Key:  sk-<auto-generated-key>
LiteMaaS User Interface: https://litellm-prod-frontend.apps.cluster.example.com
```
The tables below list all variables with their defaults from roles/ocp4_workload_litemaas/defaults/main.yml.
| Variable | Default | Required | Description |
|---|---|---|---|
| ocp4_workload_litemaas_namespace | litemaas | optional | OpenShift namespace to deploy into |
| ocp4_workload_litemaas_version | 0.2.2 | optional | Collection version tag (metadata only) |
| ocp4_workload_litemaas_remove | false | optional | Set to true to uninstall (deletes namespace) |
| ocp4_workload_litemaas_cluster_domain | "" | optional | Cluster apps domain (auto-detected from cluster if empty) |
| Variable | Default | Description |
|---|---|---|
| ocp4_workload_litemaas_litellm_image | quay.io/rh-aiservices-bu/litellm-non-root | LiteLLM proxy image repository |
| ocp4_workload_litemaas_litellm_tag | main-v1.81.0-stable | LiteLLM image tag |
| ocp4_workload_litemaas_backend_image | quay.io/rhpds/litemaas | LiteMaaS backend image repository |
| ocp4_workload_litemaas_backend_tag | backend-0.2.2 | Backend image tag |
| ocp4_workload_litemaas_frontend_image | quay.io/rhpds/litemaas | LiteMaaS frontend image repository |
| ocp4_workload_litemaas_frontend_tag | frontend-0.2.2 | Frontend image tag |
Production images: The running production deployment uses
quay.io/rh-aiservices-bu/litellm-non-root:main-v1.81.0-stable-custom for LiteLLM and
quay.io/rh-aiservices-bu/litemaas-backend:0.4.0 /
quay.io/rh-aiservices-bu/litemaas-frontend:0.4.0 for LiteMaaS.
The production image registry is quay.io/rh-aiservices-bu, not the default quay.io/rhpds.
| Variable | Default | Description |
|---|---|---|
| ocp4_workload_litemaas_api_route_prefix | litellm | Subdomain prefix for the LiteLLM API route |
| ocp4_workload_litemaas_admin_route_prefix | litellm-admin | Subdomain prefix for the admin/backend route |
| ocp4_workload_litemaas_frontend_route_prefix | litellm-frontend | Subdomain prefix for the frontend UI route |
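Each route hostname is the prefix joined to the cluster apps domain. A minimal sketch of that composition, assuming the default prefixes above and an example domain:

```shell
#!/bin/sh
# Compose a route URL from a prefix and the cluster apps domain
# (ocp4_workload_litemaas_cluster_domain).
route_url() {
    printf 'https://%s.%s\n' "$1" "$2"
}

DOMAIN="apps.cluster.example.com"   # illustrative value

route_url litellm          "$DOMAIN"   # LiteLLM API route
route_url litellm-admin    "$DOMAIN"   # admin/backend route
route_url litellm-frontend "$DOMAIN"   # frontend UI route
# -> https://litellm.apps.cluster.example.com
#    https://litellm-admin.apps.cluster.example.com
#    https://litellm-frontend.apps.cluster.example.com
```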
| Variable | Default | Description |
|---|---|---|
| ocp4_workload_litemaas_postgres_password | random 32-char | PostgreSQL database password |
| ocp4_workload_litemaas_jwt_secret | random 32-char | JWT signing secret for LiteMaaS backend sessions |
| ocp4_workload_litemaas_admin_api_key | random 32-char | LiteMaaS admin API key (stored in backend-secret) |
| ocp4_workload_litemaas_litellm_api_key | sk- + random 32-char | LiteLLM master key (stored in litellm-secret) |
| ocp4_workload_litemaas_litellm_ui_username | admin | LiteLLM admin UI username |
| ocp4_workload_litemaas_litellm_ui_password | random 16-char | LiteLLM admin UI password |
Security: Never commit actual credentials to version control. The Ansible role generates secure random values at deploy time. Retrieve them after deployment from the OpenShift secrets, not from code.
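A sketch of retrieving a generated credential after deployment. The secret names (litellm-secret, backend-secret) come from the table above; the data keys shown in the comments are illustrative — inspect the secret with `-o yaml` to see the real keys:

```shell
#!/bin/sh
# Decode a base64-encoded secret value read from stdin.
decode_key() {
    base64 -d
}

# Usage against a live cluster (not run here; data keys are assumed):
#   oc get secret litellm-secret -n litemaas \
#       -o jsonpath='{.data.master-key}' | decode_key
#   oc get secret backend-secret -n litemaas \
#       -o jsonpath='{.data.admin-api-key}' | decode_key

# Local demonstration of the decode step:
printf 'c2stZXhhbXBsZQ==' | decode_key   # -> sk-example
```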
| Variable | Default | Description |
|---|---|---|
| ocp4_workload_litemaas_postgres_version | 16 | PostgreSQL major version |
| ocp4_workload_litemaas_postgres_storage_size | 10Gi | PVC size for PostgreSQL data |
| ocp4_workload_litemaas_postgres_storage_class | "" | StorageClass (empty = auto-detect per cloud) |
| ocp4_workload_litemaas_postgres_storage_access_mode | ReadWriteOnce | PVC access mode |
| Variable | Default | Description |
|---|---|---|
| ocp4_workload_litemaas_ha_litellm_replicas | 3 | Number of LiteLLM proxy replicas |
| ocp4_workload_litemaas_ha_enable_redis | true | Deploy Redis for session/key caching |
| ocp4_workload_litemaas_ha_enable_postgres | true | Deploy PostgreSQL (set false to use external DB) |
| ocp4_workload_litemaas_ha_redis_image | registry.redhat.io/rhel9/redis-7:latest | Primary Redis image |
| ocp4_workload_litemaas_ha_redis_image_fallback | quay.io/sclorg/redis-7-c9s:latest | Fallback Redis image |
| ocp4_workload_litemaas_ha_postgres_pvc_size | 10Gi | PVC size for HA PostgreSQL |
| ocp4_workload_litemaas_ha_litellm_memory_limit | 2Gi | Per-pod memory limit for LiteLLM |
| ocp4_workload_litemaas_ha_litellm_cpu_limit | 2000m | Per-pod CPU limit for LiteLLM |
| Variable | Default | Description |
|---|---|---|
| ocp4_workload_litemaas_oauth_enabled | false | Enable OpenShift OAuth login |
| ocp4_workload_litemaas_oauth_provider | openshift | OAuth provider type (openshift or oidc) |
| ocp4_workload_litemaas_oauth_client_id | namespace name | OAuthClient resource name |
| ocp4_workload_litemaas_oauth_client_secret | random 32-char | OAuthClient secret |
| ocp4_workload_litemaas_admin_emails | [admin] | Initial admin user list |
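When OAuth is enabled, the role registers a cluster-scoped OAuthClient resource. A sketch of what that resource looks like — the redirect URI path and the placeholder secret are illustrative assumptions; the actual resource is generated by the role:

```yaml
apiVersion: oauth.openshift.io/v1
kind: OAuthClient
metadata:
  name: litemaas                      # defaults to the namespace name
secret: <generated-32-char-secret>    # ocp4_workload_litemaas_oauth_client_secret
redirectURIs:
  # Illustrative — the real callback path depends on the backend's OAuth handler
  - https://litellm-admin.apps.cluster.example.com/api/auth/callback
grantMethod: auto
```

Because OAuthClient is cluster-scoped, it is not removed when the namespace is deleted; see the removal section at the end of this page.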
| Variable | Default | Description |
|---|---|---|
| ocp4_workload_litemaas_litellm_memory_limit | 1Gi | LiteLLM memory limit (single-replica mode) |
| ocp4_workload_litemaas_litellm_cpu_limit | 1000m | LiteLLM CPU limit |
| ocp4_workload_litemaas_backend_memory_limit | 512Mi | Backend memory limit |
| ocp4_workload_litemaas_backend_cpu_limit | 500m | Backend CPU limit |
| ocp4_workload_litemaas_frontend_memory_limit | 256Mi | Frontend memory limit |
| ocp4_workload_litemaas_frontend_cpu_limit | 250m | Frontend CPU limit |
When ocp4_workload_litemaas_postgres_storage_class is empty, the role auto-detects the cloud provider and selects the appropriate StorageClass:
| Cloud Provider | Storage Class |
|---|---|
| AWS | gp3-csi |
| Azure | managed-premium |
| GCP | standard-rwo |
| vSphere | thin-csi |
| OpenStack | standard |
| Bare Metal | ocs-external-storagecluster-ceph-rbd |
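The mapping above can be sketched as a shell case statement. This mirrors the table, not the role's actual task code; on a live cluster the platform name would come from something like `oc get infrastructure cluster -o jsonpath='{.status.platformStatus.type}'` (an assumption about the detection source):

```shell
#!/bin/sh
# Map a detected platform to the StorageClass from the table above.
storage_class_for() {
    case "$1" in
        AWS)       echo gp3-csi ;;
        Azure)     echo managed-premium ;;
        GCP)       echo standard-rwo ;;
        VSphere)   echo thin-csi ;;
        OpenStack) echo standard ;;
        # Bare metal / unrecognized platforms fall through to ODF
        *)         echo ocs-external-storagecluster-ceph-rbd ;;
    esac
}

storage_class_for AWS    # -> gp3-csi
storage_class_for None   # -> ocs-external-storagecluster-ceph-rbd
```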
LiteLLM uses Prisma ORM and runs database migrations automatically on startup. However, when upgrading between LiteLLM versions, you may need to trigger a migration manually if automatic migration fails or is skipped.
```bash
# Check LiteLLM logs for migration output
oc logs -n litellm-rhpds deployment/litellm --tail=100 | grep -i migrat

# Check if database tables exist
oc exec -n litellm-rhpds litellm-postgres-0 -- \
  psql -U litellm -d litellm -c "\dt"
```
```bash
# Get the DATABASE_URL from the running LiteLLM deployment
DB_URL=$(oc exec -n litellm-rhpds deployment/litellm -- \
  sh -c 'echo $DATABASE_URL')

# Exec into a LiteLLM pod and run migration
LITELLM_POD=$(oc get pods -n litellm-rhpds -l app=litellm \
  -o jsonpath='{.items[0].metadata.name}')
oc exec -n litellm-rhpds $LITELLM_POD -- \
  python -c "from litellm.proxy.proxy_server import prisma_client; prisma_client.db.migrate.deploy()"

# Alternative: restart the deployment (migration runs on startup)
oc rollout restart deployment/litellm -n litellm-rhpds
oc rollout status deployment/litellm -n litellm-rhpds
```
Caution: Always take a database backup before running migrations between major LiteLLM versions. See the Operations page for upgrade procedures.
After deployment completes, verify the following before announcing the service to users.
- oc get pods -n <namespace> — expect litellm (3), litellm-backend (3), litellm-frontend (3), litellm-postgres-0, litellm-redis-*
- curl https://<api-route>/health/livenessz
- oc get secret litellm-secret -n <ns> -o yaml
- ./setup-key-cleanup-cronjob.sh <namespace>
- ./setup-litemaas-backup-cronjob.sh <namespace> <s3-bucket>

| Component | Replicas | CPU Request | Memory Request | Storage |
|---|---|---|---|---|
| LiteLLM proxy | 3 | 3 × 500m = 1500m | 3 × 512Mi = 1.5Gi | — |
| LiteMaaS backend | 3 | 3 × 100m = 300m | 3 × 256Mi = 768Mi | — |
| LiteMaaS frontend | 3 | 3 × 50m = 150m | 3 × 128Mi = 384Mi | — |
| PostgreSQL 16 | 1 (StatefulSet) | 500m | 512Mi | 10Gi RWO |
| Redis 7 | 1 | 200m | 256Mi | — |
| Total | — | ~2650m | ~3.4Gi | 10Gi |
```bash
# Remove via playbook (cleanly removes OAuthClient too)
ansible-playbook playbooks/deploy_litemaas_ha.yml \
  -e ocp4_workload_litemaas_namespace=litemaas \
  -e ocp4_workload_litemaas_remove=true

# Or delete the namespace directly (OAuthClient persists — clean it up separately)
oc delete namespace litemaas
oc delete oauthclient litemaas
```
Deleting the namespace also deletes the PostgreSQL PVC and all data. Take a backup first if you need to preserve data.