Model as a Service for Red Hat Demo Platform
How LiteMaaS connects to Google Cloud Vertex AI — service accounts, model registration, credential mounting, and operational configuration.
Google Cloud Vertex AI is the primary provider for the majority of models in LiteMaaS. This includes all Claude models (via Anthropic on Vertex), Gemini 2.5 Pro, and all Model Garden OSS models such as GPT-OSS, Qwen3, and MiniMax.
LiteMaaS routes requests to Vertex AI through the LiteLLM proxy, using two distinct GCP service accounts — one per GCP billing project. All credentials are stored as OpenShift secrets and mounted as read-only volumes inside the litellm pod.
**Traffic stays on Google's network.** All Vertex AI calls use Private Service Connect endpoints, keeping inference traffic off the public internet.
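As a quick sanity check that traffic is actually taking the private path, you can confirm how the Vertex endpoint resolves from inside the pod. This is a sketch: it assumes PSC is wired up with private DNS for `aiplatform.googleapis.com` and that `getent` exists in the litellm image.

```bash
# If PSC private DNS is in place, this should resolve to a private (RFC 1918)
# address rather than a public Google IP; the exact address is environment-specific.
oc exec -n litellm-rhpds deploy/litellm -- \
  getent hosts us-central1-aiplatform.googleapis.com
```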
Vertex AI is preferred over direct API access (e.g., api.anthropic.com) for several operational reasons: billing consolidates into the two GCP projects, authentication runs through IAM-governed service accounts rather than long-lived provider API keys, and inference traffic stays on Google's network via Private Service Connect.
LiteMaaS uses two separate GCP service accounts, each associated with a different GCP project. The SA selection determines both the billing project and which models are accessible.
| Service Account | GCP Project | Models Served | OCP Secret | Mount Path |
|---|---|---|---|---|
| `litemaas-vertex-sa@rhdp-infra.iam.gserviceaccount.com` | `rhdp-infra` | OSS models: GPT-OSS 120B, GPT-OSS 20B, Qwen3-235B, MiniMax M2 | `vertex-ai-sa-key` | `/etc/vertex-credentials/key.json` |
| `rhdp-vertex-app-sa@itpc-gcp-product-all-claude.iam.gserviceaccount.com` | `itpc-gcp-product-all-claude` | Claude (all versions), Gemini 2.5 Pro | `vertex-claude-sa-key` | `/etc/vertex-claude-credentials/key.json` |
**Required IAM role:** both service accounts must have `roles/aiplatform.user` granted on their respective GCP projects. Without this role, all inference calls return HTTP 403.
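If the role is missing, it can be granted with a standard IAM binding (shown here for the OSS SA; repeat with the other SA and project as needed):

```bash
# Grant the Vertex AI user role to the OSS SA on its project
gcloud projects add-iam-policy-binding rhdp-infra \
  --member="serviceAccount:litemaas-vertex-sa@rhdp-infra.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```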
**Do not mix SA paths.** Registering a Claude model with `/etc/vertex-credentials/key.json` (the OSS SA) will fail — that SA is not authorized on the `itpc-gcp-product-all-claude` project. Always match the `vertex_credentials` path to the correct GCP project.
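A quick way to audit for mismatches is to list each registered model's project and credential path side by side. This is a sketch: the jq path follows the `/model/info` response shape used later in this doc and may differ by LiteLLM version.

```bash
# Each row should pair itpc-gcp-product-all-claude with
# /etc/vertex-claude-credentials/key.json, and rhdp-infra with
# /etc/vertex-credentials/key.json
curl -s "$LITELLM_URL/model/info" \
  -H "Authorization: Bearer $MASTER_KEY" \
  | jq -r '.[] | [.model_name, .litellm_params.vertex_project, .litellm_params.vertex_credentials] | @tsv'
```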
The following models are currently registered in LiteMaaS via Vertex AI backends. Costs are listed per 1M tokens (input / output).
| LiteMaaS Name | Vertex Backend | GCP Project | Cost (in / out per 1M) |
|---|---|---|---|
| `claude-sonnet-4-6` | `vertex_ai/claude-sonnet-4-6` | `itpc-gcp-product-all-claude` | $3.00 / $15.00 |
| `claude-opus-4-6` | `vertex_ai/claude-opus-4-6-v1` | `itpc-gcp-product-all-claude` | $5.00 / $25.00 |
| `claude-sonnet-4-5` | `vertex_ai/claude-sonnet-4-5` | `itpc-gcp-product-all-claude` | $3.00 / $15.00 |
| `claude-3-5-haiku` | `vertex_ai/claude-3-5-haiku` | `itpc-gcp-product-all-claude` | $1.00 / $5.00 |
| `gemini-2.5-pro` | `vertex_ai/gemini-2.5-pro` | `itpc-gcp-product-all-claude` (Gemini AI Studio key) | $1.25 / $10.00 |
| `gpt-oss-120b` | `vertex_ai/openai/gpt-oss-120b-maas` | `rhdp-infra` | $0.09 / $0.36 |
| `gpt-oss-20b` | `vertex_ai/openai/gpt-oss-20b-maas` | `rhdp-infra` | $0.07 / $0.25 |
| `minimax-m2` | `vertex_ai/minimaxai/minimax-m2-maas` | `rhdp-infra` | $0.30 / $1.20 |
| `qwen3-235b` | `vertex_ai/qwen/qwen3-235b-a22b-instruct-2507-maas` | `rhdp-infra` | $0.22 / $0.88 |
**Model Garden path format:** OSS models served through Google's Model Garden use an extended path format — `vertex_ai/<provider>/<model-id>-maas`. The `-maas` suffix is required by the Model Garden endpoint and is not a LiteMaaS convention.
All Vertex AI-backed models share the following rate limits, which are set on the admin key `rhdp-automation-admin` in LiteLLM:
| Limit Type | Value | Scope |
|---|---|---|
| Tokens per minute (TPM) | 400,000 | Per key |
| Requests per minute (RPM) | 500 | Per key |
These limits are enforced by LiteLLM at the virtual key level, not at the Vertex API level. Vertex-side quotas are provisioned separately per GCP project and are generally higher than the LiteMaaS-enforced limits.
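For reference, the limits are applied through LiteLLM's key management API. A minimal sketch, assuming the standard `/key/update` endpoint and its `tpm_limit`/`rpm_limit` fields; the key value below is a placeholder:

```bash
# Apply the shared per-key limits to the admin key
curl -X POST "$LITELLM_URL/key/update" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"key": "<rhdp-automation-admin-key-value>", "tpm_limit": 400000, "rpm_limit": 500}'
```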
If users report 429 errors, check whether the issue is a LiteLLM-enforced per-key limit or an actual Vertex quota. Run `oc logs -n litellm-rhpds -l app=litellm` and look for `RateLimitError` vs `QuotaExceeded` to distinguish the source.
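For example, grepping the same log for each marker separates the two cases:

```bash
# Triage a 429: LiteLLM-enforced per-key limit vs Vertex-side quota
oc logs -n litellm-rhpds -l app=litellm --tail=200 | grep -i "RateLimitError"   # LiteLLM per-key limit
oc logs -n litellm-rhpds -l app=litellm --tail=200 | grep -i "QuotaExceeded"    # Vertex project quota
```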
GCP service account JSON key files are stored as OpenShift secrets and mounted as read-only volumes into the litellm deployment. LiteLLM reads the file path at inference time when the model's `vertex_credentials` parameter is set.
Create (or update) the secrets in the `litellm-rhpds` namespace. Each secret contains a single key named `key.json` holding the full GCP SA JSON key file content.
```bash
# Create the OSS model SA secret (rhdp-infra project)
oc create secret generic vertex-ai-sa-key \
  --from-file=key.json=/path/to/litemaas-vertex-sa-key.json \
  -n litellm-rhpds

# Create the Claude / Gemini SA secret (itpc-gcp-product-all-claude project)
oc create secret generic vertex-claude-sa-key \
  --from-file=key.json=/path/to/rhdp-vertex-app-sa-key.json \
  -n litellm-rhpds
```
**Never commit SA key files to git.** Treat them as passwords. Use `oc create secret generic` directly from the downloaded JSON file, then delete the local copy.
The secrets are projected into the pod via the following volume and volumeMount configuration in the litellm Deployment spec:
```yaml
# Volumes (add to .spec.template.spec.volumes)
volumes:
  - name: vertex-sa-key
    secret:
      secretName: vertex-ai-sa-key
  - name: vertex-claude-sa-key
    secret:
      secretName: vertex-claude-sa-key

# Volume mounts (add to .spec.template.spec.containers[0].volumeMounts)
volumeMounts:
  - name: vertex-sa-key
    mountPath: /etc/vertex-credentials
    readOnly: true
  - name: vertex-claude-sa-key
    mountPath: /etc/vertex-claude-credentials
    readOnly: true
```
After patching the deployment, verify that both files are visible inside the running pod:
```bash
# Verify credential files are present in the pod
oc exec -n litellm-rhpds deploy/litellm -- ls -la /etc/vertex-credentials/
oc exec -n litellm-rhpds deploy/litellm -- ls -la /etc/vertex-claude-credentials/

# Expected output for each directory
total 8
drwxr-xr-x 2 root root   60 Apr 21 00:00 .
drwxr-xr-x 1 root root   60 Apr 21 00:00 ..
-rw-r--r-- 1 root root 2400 Apr 21 00:00 key.json
```
When registering a model in LiteLLM, the `vertex_credentials` field must point to the correct mounted key file path. LiteLLM reads this path at request time — it does not cache the credentials in memory on startup.
**Model params are encrypted at rest.** All fields in `litellm_params` — including `vertex_credentials` paths and `vertex_project` — are encrypted in PostgreSQL using the LiteLLM master key. Rotating the master key requires re-registering all models.
Claude models use the `itpc-gcp-product-all-claude` GCP project and the SA key mounted at `/etc/vertex-claude-credentials/key.json`.
```bash
# Register claude-sonnet-4-6 via Vertex AI
curl -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "claude-sonnet-4-6",
    "litellm_params": {
      "model": "vertex_ai/claude-sonnet-4-6",
      "vertex_project": "itpc-gcp-product-all-claude",
      "vertex_location": "us-east5",
      "vertex_credentials": "/etc/vertex-claude-credentials/key.json"
    },
    "model_info": {
      "input_cost_per_token": 0.000003,
      "output_cost_per_token": 0.000015
    }
  }'

# Register claude-opus-4-6
curl -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "claude-opus-4-6",
    "litellm_params": {
      "model": "vertex_ai/claude-opus-4-6-v1",
      "vertex_project": "itpc-gcp-product-all-claude",
      "vertex_location": "us-east5",
      "vertex_credentials": "/etc/vertex-claude-credentials/key.json"
    },
    "model_info": {
      "input_cost_per_token": 0.000005,
      "output_cost_per_token": 0.000025
    }
  }'

# Register gemini-2.5-pro (also uses the claude SA / GCP project)
curl -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemini-2.5-pro",
    "litellm_params": {
      "model": "vertex_ai/gemini-2.5-pro",
      "vertex_project": "itpc-gcp-product-all-claude",
      "vertex_location": "us-central1",
      "vertex_credentials": "/etc/vertex-claude-credentials/key.json"
    },
    "model_info": {
      "input_cost_per_token": 0.00000125,
      "output_cost_per_token": 0.00001
    }
  }'
```
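After registration, a quick smoke test through the proxy's OpenAI-compatible chat completions endpoint confirms the SA path and project pairing end to end (a sketch; any registered model name works):

```bash
# Minimal end-to-end inference check for a newly registered model
curl -s "$LITELLM_URL/v1/chat/completions" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 16}' \
  | jq '.choices[0].message.content'
```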
Model Garden OSS models use the `rhdp-infra` project and the SA key mounted at `/etc/vertex-credentials/key.json`. The backend model path uses the extended `vertex_ai/<provider>/<model-id>-maas` format.
```bash
# Register gpt-oss-120b (Model Garden)
curl -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gpt-oss-120b",
    "litellm_params": {
      "model": "vertex_ai/openai/gpt-oss-120b-maas",
      "vertex_project": "rhdp-infra",
      "vertex_location": "us-central1",
      "vertex_credentials": "/etc/vertex-credentials/key.json"
    },
    "model_info": {
      "input_cost_per_token": 0.00000009,
      "output_cost_per_token": 0.00000036
    }
  }'

# Register qwen3-235b (Model Garden)
curl -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "qwen3-235b",
    "litellm_params": {
      "model": "vertex_ai/qwen/qwen3-235b-a22b-instruct-2507-maas",
      "vertex_project": "rhdp-infra",
      "vertex_location": "us-central1",
      "vertex_credentials": "/etc/vertex-credentials/key.json"
    },
    "model_info": {
      "input_cost_per_token": 0.00000022,
      "output_cost_per_token": 0.00000088
    }
  }'

# Register minimax-m2 (Model Garden)
curl -X POST "$LITELLM_URL/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "minimax-m2",
    "litellm_params": {
      "model": "vertex_ai/minimaxai/minimax-m2-maas",
      "vertex_project": "rhdp-infra",
      "vertex_location": "us-central1",
      "vertex_credentials": "/etc/vertex-credentials/key.json"
    },
    "model_info": {
      "input_cost_per_token": 0.0000003,
      "output_cost_per_token": 0.0000012
    }
  }'
```
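The same smoke test can be looped over the Model Garden registrations (a sketch using the proxy's OpenAI-compatible endpoint):

```bash
# Expect HTTP 200 for each; a 403 points at the rhdp-infra SA's IAM role
for m in gpt-oss-120b gpt-oss-20b qwen3-235b minimax-m2; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "$LITELLM_URL/v1/chat/completions" \
    -H "Authorization: Bearer $MASTER_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$m\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}], \"max_tokens\": 8}")
  echo "$m -> HTTP $code"
done
```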
The diagram below shows how the two service accounts map to GCP projects and which models each SA serves. OCP secrets are projected into the pod as read-only volume mounts.
When a user sends a request to the LiteMaaS API, the following sequence occurs:
1. The request arrives at the LiteLLM proxy, which validates the user's virtual key and enforces the per-key TPM/RPM limits.
2. LiteLLM loads the model's `litellm_params` from PostgreSQL (decrypting with the master key).
3. LiteLLM reads the SA key file from the mounted path in `vertex_credentials` and obtains credentials for the model's GCP project.
4. The request is sent to the Vertex AI endpoint for that project over Private Service Connect.

Several environment variables on the litellm deployment govern how LiteMaaS interacts with the database and with the LiteMaaS backend. Understanding these is important for safe day-2 operations.
| Environment Variable | Value | Effect |
|---|---|---|
| `DISABLE_SCHEMA_UPDATE` | `true` | Prevents Prisma from running schema migrations on pod restart. Without this, a LiteLLM version bump could drop LiteMaaS-specific tables. |
| `STORE_MODEL_IN_DB` | `true` | Model configs (including `vertex_credentials` paths and `vertex_project`) are persisted in PostgreSQL, not in a config file. Models survive pod restarts. |
| `LITELLM_AUTO_SYNC` | `false` | Models registered in LiteLLM are not automatically synced to the LiteMaaS backend model list. Models must be registered through the LiteMaaS API or UI to appear to users. |
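To confirm the running deployment carries these values:

```bash
# List the relevant env vars on the litellm deployment
oc set env deploy/litellm --list -n litellm-rhpds \
  | grep -E "DISABLE_SCHEMA_UPDATE|STORE_MODEL_IN_DB|LITELLM_AUTO_SYNC"
```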
**Master key rotation requires model re-registration.** All `litellm_params` (including SA key file paths and GCP project IDs) are encrypted with the LiteLLM master key. If the master key changes, every model must be deleted and re-registered so that params are re-encrypted under the new key.
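Before rotating, capture the current registrations so each model can be re-created afterwards. A sketch, assuming `/model/info` returns the stored `litellm_params` when called with the master key:

```bash
# Snapshot all model registrations prior to a master key rotation
curl -s "$LITELLM_URL/model/info" \
  -H "Authorization: Bearer $MASTER_KEY" > litellm-models-backup.json
```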
Because `LITELLM_AUTO_SYNC=false`, a model can be registered in LiteLLM but not yet visible to users in the LiteMaaS UI or `/v1/models` endpoint. After registering a model in LiteLLM, explicitly add it to the LiteMaaS backend via the LiteMaaS admin API or UI to make it available.
```bash
# Check which models are registered in LiteLLM
curl -s "$LITELLM_URL/model/info" \
  -H "Authorization: Bearer $MASTER_KEY" | jq '.[].model_name'

# Check which models are visible in LiteMaaS (user-facing endpoint)
curl -s "$LITEMAAS_URL/v1/models" \
  -H "Authorization: Bearer $USER_KEY" | jq '.data[].id'
```
This IBM-origin error code surfaces when per-model SA credentials fail during multi-turn conversations (i.e., the second or later message in a thread). The root cause is that LiteLLM re-reads the credential file on each request, and a timing or file-permission issue causes a transient failure.
**Fix:** set the Vertex credentials as deployment-level environment variables instead of relying solely on the per-model `vertex_credentials` path. Add `GOOGLE_APPLICATION_CREDENTIALS` as an env var pointing to the primary SA key path, and LiteLLM will use it as a fallback.
```bash
# Add as env var on the litellm deployment
oc set env deploy/litellm \
  GOOGLE_APPLICATION_CREDENTIALS=/etc/vertex-claude-credentials/key.json \
  -n litellm-rhpds
```
This is expected when `LITELLM_AUTO_SYNC=false`. Register the model through the LiteMaaS admin interface or API to make it visible to users. See Model Management for the full workflow.
A 403 from Vertex means the SA does not have the required IAM role on the GCP project. Verify that the SA email has `roles/aiplatform.user` by checking the GCP IAM console or running:
```bash
# Verify IAM binding for the OSS SA
gcloud projects get-iam-policy rhdp-infra \
  --flatten="bindings[].members" \
  --filter="bindings.members:litemaas-vertex-sa@rhdp-infra.iam.gserviceaccount.com" \
  --format="table(bindings.role)"

# Verify IAM binding for the Claude SA
gcloud projects get-iam-policy itpc-gcp-product-all-claude \
  --flatten="bindings[].members" \
  --filter="bindings.members:rhdp-vertex-app-sa@itpc-gcp-product-all-claude.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
```
A model with LEGACY status in the LiteMaaS UI is still fully functional — it continues to serve inference requests. LEGACY status indicates that a newer version of the same model is available. Consider upgrading the registered backend model string to the latest version at your next maintenance window.
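A typical upgrade is delete-then-re-register, reusing the registration calls above with the newer backend string. A sketch; the `/model/delete` payload shape should be verified against your LiteLLM version, and the id below is a placeholder:

```bash
# Remove the old registration, then POST /model/new with the updated
# "model" backend path (see the registration examples above)
curl -X POST "$LITELLM_URL/model/delete" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"id": "<model-id-from-/model/info>"}'
```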
```bash
# Test that the OSS SA can reach Vertex.
# The raw private key is not a valid bearer token, so mint an OAuth access
# token from the mounted SA key first. This assumes google-auth is available
# in the litellm image (it is a dependency of LiteLLM's Vertex support).
oc exec -n litellm-rhpds deploy/litellm -- sh -c '
  TOKEN=$(python3 -c "from google.oauth2 import service_account; import google.auth.transport.requests as t; c = service_account.Credentials.from_service_account_file(\"/etc/vertex-credentials/key.json\", scopes=[\"https://www.googleapis.com/auth/cloud-platform\"]); c.refresh(t.Request()); print(c.token)")
  curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer $TOKEN" \
    "https://us-central1-aiplatform.googleapis.com/v1/projects/rhdp-infra/locations/us-central1/publishers/openai/models"'

# View recent Vertex-related errors in the LiteLLM log
oc logs -n litellm-rhpds -l app=litellm --tail=100 | grep -i "vertex\|403\|quota\|rate"
```
When in doubt, check the pod logs first. LiteLLM logs the full upstream error message from Vertex including status code, error type, and GCP project ID. This is the fastest way to distinguish between an IAM issue, a quota issue, and a model path typo.