Model as a Service for Red Hat Demo Platform
Add AWS Bedrock as a redundant provider alongside Google Vertex AI — same model names, transparent cost-based failover, zero user-facing change.
AWS Bedrock was added as a second provider for four existing LiteMaaS models, all of which already ran on Google Vertex AI. The integration is completely transparent to users — they continue to call the same model names (gpt-oss-120b, gpt-oss-20b, minimax-m2, qwen3-235b) and LiteLLM routes between Vertex and Bedrock automatically using cost-based routing.
| Property | Value |
|---|---|
| Cluster | maas.redhatworkshops.io |
| Namespace | litellm-rhpds |
| AWS Region | us-west-2 |
| IAM User | litemaas-bedrock-user |
| IAM Policy | litemaas-bedrock-policy |
| OCP Secret | litellm-aws-bedrock |
| Routing strategy | Cost-based routing (LiteLLM built-in) |
User-facing impact: none. Because Bedrock models are registered under the same model_name as their Vertex counterparts, they do not appear as separate entries in the LiteMaaS UI or API. LiteLLM groups them automatically and routes between them.
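Because the model name is the grouping key, a user request looks exactly the same regardless of which provider ends up serving it. The sketch below assumes the standard OpenAI-compatible /v1/chat/completions path on the same litellm-prod route and an ordinary LiteMaaS API key ($LITEMAAS_API_KEY is a placeholder):

```bash
# A user call is unchanged by the new backend; LiteLLM picks Vertex or Bedrock internally.
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/v1/chat/completions" \
  -H "Authorization: Bearer $LITEMAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```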
Four models are now backed by both Vertex AI and AWS Bedrock. The Bedrock model IDs are the native identifiers used in the litellm_params.model field (prefixed with bedrock/).
| LiteMaaS model_name | Bedrock model ID | Vertex pricing (in / out per 1M) | Bedrock pricing (in / out per 1M) | Router preference |
|---|---|---|---|---|
| gpt-oss-120b | openai.gpt-oss-120b-1:0 | $0.09 / $0.36 | $0.15 / $0.60 | Vertex (cheaper) |
| gpt-oss-20b | openai.gpt-oss-20b-1:0 | $0.07 / $0.25 | $0.07 / $0.30 | Vertex (cheaper) |
| minimax-m2 | minimax.minimax-m2 | $0.30 / $1.20 | $0.30 / $1.20 | Either (same price) |
| qwen3-235b | qwen.qwen3-235b-a22b-2507-v1:0 | $0.22 / $0.88 | $0.22 / $0.88 | Either (same price) |
For gpt-oss-120b and gpt-oss-20b, Vertex AI is consistently cheaper, so the cost-based router will always prefer it and use Bedrock only as a fallback. For minimax-m2 and qwen3-235b the prices are identical, and the router may use either provider freely.
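As a worked example of what the router computes: for a hypothetical gpt-oss-120b request with 1,000 input tokens and 500 output tokens, the expected Vertex AI cost is 1,000 × $0.09/1M + 500 × $0.36/1M = $0.00027, while the expected Bedrock cost is 1,000 × $0.15/1M + 500 × $0.60/1M = $0.00045, so the router dispatches to Vertex AI.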
The claude-sonnet-4-6 and claude-opus-4-6 models are awaiting AWS Bedrock model access approval. Once granted, they will be added using the following ARN patterns:
| ARN type | Pattern | Notes |
|---|---|---|
| System inference profile | `arn:aws:bedrock:*:809721187735:inference-profile/us.anthropic.claude-*` | Account ID required; `*` for region enables cross-region routing |
| Foundation model (cross-region) | `arn:aws:bedrock:*::foundation-model/anthropic.claude-*` | Double colon, no account ID; required alongside the inference profile ARN |
Both ARN types must be in the IAM policy for Claude cross-region inference to work. Adding only the inference profile ARN will cause AccessDeniedException when Bedrock attempts cross-region routing to foundation model endpoints. See the ARN lesson section below.
The diagram below shows how a request for gpt-oss-120b is handled end-to-end. The user sees a single model; LiteLLM transparently selects the cheaper backend for each request.
Failover is automatic. When a backend accumulates allowed_fails: 2 failures, it enters a cooldown period of 60 seconds and all traffic shifts to the other provider. No manual intervention is required.
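LiteLLM's proxy also exposes a /health endpoint that probes every registered deployment, which is useful for checking backend status manually during a suspected provider outage. The sketch below assumes the endpoint is enabled on this instance and that the response fields match recent LiteLLM releases:

```bash
# Probe all registered deployments (both Vertex and Bedrock backends).
# healthy_endpoints / unhealthy_endpoints in the full response show which providers are serving.
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/health" \
  -H "Authorization: Bearer $MASTER_KEY" | jq '{healthy_count, unhealthy_count}'
```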
A dedicated IAM user (litemaas-bedrock-user) was created with a minimal policy (litemaas-bedrock-policy) scoped to exactly the actions and resources LiteLLM requires.
The policy grants InvokeModel and InvokeModelWithResponseStream only on the specific Bedrock model ARNs in use, plus list permissions on * (required by LiteLLM's health-check path).
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/us.anthropic.claude-*",
        "arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/global.anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:us-west-2::foundation-model/openai.gpt-oss-120b-1:0",
        "arn:aws:bedrock:us-west-2::foundation-model/openai.gpt-oss-20b-1:0",
        "arn:aws:bedrock:us-west-2::foundation-model/minimax.minimax-m2",
        "arn:aws:bedrock:us-west-2::foundation-model/qwen.qwen3-235b-a22b-2507-v1:0"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:ListInferenceProfiles"
      ],
      "Resource": "*"
    }
  ]
}
```
Replace ACCOUNT_ID with the actual AWS account number (809721187735 for the RHDP prod account). The first three resource entries are the Claude inference profiles and Claude foundation models (pending access approval); the remaining four are the currently active non-Claude models. The non-Claude models use foundation model ARNs with a double colon and no account ID — this is expected for public Bedrock foundation models.
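For reference, a minimal AWS CLI sketch of how the user and policy could be created. It assumes the policy JSON above is saved locally as litemaas-bedrock-policy.json and attached as an inline policy; whether the production policy is inline or managed is not recorded here.

```bash
# Create the dedicated IAM user for LiteLLM's Bedrock calls
aws iam create-user --user-name litemaas-bedrock-user

# Attach the policy document above as an inline policy
aws iam put-user-policy \
  --user-name litemaas-bedrock-user \
  --policy-name litemaas-bedrock-policy \
  --policy-document file://litemaas-bedrock-policy.json

# Generate the access key pair that goes into the OpenShift Secret (next section)
aws iam create-access-key --user-name litemaas-bedrock-user
```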
Bedrock ARN formats differ significantly depending on whether you are using a system inference profile or a foundation model directly. Getting this wrong produces silent AccessDeniedException errors that are hard to trace.
| ARN type | Has account ID? | Region field | Example |
|---|---|---|---|
| System inference profile (Claude `us.*` / `global.*`) | Yes | Use `*` for cross-region | `arn:aws:bedrock:*:809721187735:inference-profile/us.anthropic.claude-*` |
| Foundation model (public, e.g. Claude base, OpenAI-compat, Qwen) | No (double colon) | Specific region or `*` | `arn:aws:bedrock:us-west-2::foundation-model/openai.gpt-oss-120b-1:0` |
For Claude cross-region inference you need both ARN types in the policy. The inference profile ARN authorises the initial call. Bedrock then internally routes to a foundation model endpoint in another region — and that second hop requires the foundation model ARN (arn:aws:bedrock:*::foundation-model/anthropic.claude-*) to also be allowed. Omitting either one causes access errors.
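Once Claude access is approved, a single Converse call through a us.* inference profile exercises both grants at once, so it makes a useful smoke test. In the sketch below the profile ID is a placeholder; the exact ID has to come from list-inference-profiles, and the commands assume the same credentials and region as the rest of this setup.

```bash
# List the Claude inference profiles visible in this account/region
aws bedrock list-inference-profiles --region us-west-2 \
  --query 'inferenceProfileSummaries[].inferenceProfileId'

# Hypothetical smoke test: one Converse call through a us.* profile hits both the
# inference-profile ARN and the cross-region foundation-model ARN.
# Replace the placeholder with a real profile ID once access is granted.
aws bedrock-runtime converse \
  --region us-west-2 \
  --model-id "us.anthropic.claude-PLACEHOLDER" \
  --messages '[{"role": "user", "content": [{"text": "ping"}]}]'
```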
AWS credentials are stored as an OpenShift Secret in the litellm-rhpds namespace and mounted as environment variables on the litellm Deployment. This approach keeps credentials out of the ConfigMap and out of the LiteLLM database entirely.
```bash
# Create the secret from literals (never commit actual values)
oc create secret generic litellm-aws-bedrock \
  --from-literal=AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
  --from-literal=AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
  --from-literal=AWS_REGION=us-west-2 \
  -n litellm-rhpds

# Mount all keys from the secret as environment variables
oc set env deployment/litellm \
  --from=secret/litellm-aws-bedrock \
  -n litellm-rhpds

# Verify the env vars are present on the new pod
oc exec -n litellm-rhpds deployment/litellm -- env | grep AWS
```
LiteLLM's Bedrock provider uses the standard boto3 credential chain. When AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set as environment variables, boto3 picks them up automatically — no additional LiteLLM configuration is needed for authentication.
Do not store AWS credentials in the LiteLLM ConfigMap or in model litellm_params. Anything in the ConfigMap is readable by anyone with oc get configmap access. The OCP Secret is the correct place.
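An optional sanity check of the access key itself can be run from any workstation with the AWS CLI; it exercises only the list permission granted to the IAM user and never touches the cluster. The key values below are placeholders.

```bash
# Confirm the litemaas-bedrock-user credentials work and can see the target models.
# The --query filter is only an illustration.
AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_REGION=us-west-2 \
  aws bedrock list-foundation-models \
  --query 'modelSummaries[?contains(modelId, `gpt-oss`)].modelId'
```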
LiteLLM groups all deployments that share the same model_name (e.g., the Vertex gpt-oss-120b and the Bedrock gpt-oss-120b) and routes between them based on the registered cost per token. The router configuration lives in the litellm-router-config ConfigMap.
```yaml
# litellm-router-config ConfigMap — relevant router_settings block
router_settings:
  routing_strategy: cost-based-routing
  num_retries: 3
  retry_after: 5
  allowed_fails: 2
  cooldown_time: 60
```
How the router behaves with this configuration:

- For each request, the router estimates cost = input_tokens × input_cost_per_token + output_tokens × output_cost_per_token for every deployment in the group and picks the cheapest.
- For gpt-oss-120b, Vertex AI wins because $0.09/1M < $0.15/1M (input).
- After allowed_fails: 2 failures, the backend enters a cooldown for cooldown_time: 60 seconds. During cooldown, all traffic for that model automatically shifts to the other provider.

| Setting | Value | Effect |
|---|---|---|
| routing_strategy | cost-based-routing | Picks cheapest backend per request |
| num_retries | 3 | Up to 3 retry attempts per request across backends |
| retry_after | 5 | Wait 5 seconds between retry attempts |
| allowed_fails | 2 | Backend enters cooldown after 2 consecutive failures |
| cooldown_time | 60 | Failed backend excluded for 60 seconds |
The routing is entirely automatic. No operator action is needed when a provider has an outage — traffic shifts within seconds of hitting the allowed_fails threshold.
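To confirm which provider actually served a given request, recent LiteLLM releases attach debug headers such as x-litellm-model-id and x-litellm-model-api-base to each response; the exact header names are an assumption here and may vary by version.

```bash
# Send one request and inspect the x-litellm-* response headers to see
# which deployment (Vertex or Bedrock) handled the call.
curl -si "https://litellm-prod.apps.maas.redhatworkshops.io/v1/chat/completions" \
  -H "Authorization: Bearer $LITEMAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "ping"}]}' \
  | grep -i '^x-litellm'
```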
Each Bedrock backend is registered via the LiteLLM admin API using the same model_name as the existing Vertex AI deployment. This is the key to transparent dual-provider routing — LiteLLM uses model_name as the grouping key for the router.
Why the same model_name matters. Users and workshops send requests to gpt-oss-120b. LiteLLM internally resolves this to all deployments registered under that name — both Vertex and Bedrock. Registering under a different name (e.g., gpt-oss-120b-bedrock) would expose it as a separate, user-visible model, which is not the goal.
```bash
# Register the Bedrock backend for gpt-oss-120b
# Note: model_name matches the existing Vertex AI deployment exactly
curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gpt-oss-120b",
    "litellm_params": {
      "model": "bedrock/openai.gpt-oss-120b-1:0",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 1.5e-07,
      "output_cost_per_token": 6e-07
    }
  }'
```
```bash
curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gpt-oss-20b",
    "litellm_params": {
      "model": "bedrock/openai.gpt-oss-20b-1:0",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 7e-08,
      "output_cost_per_token": 3e-07
    }
  }'
```
```bash
curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "minimax-m2",
    "litellm_params": {
      "model": "bedrock/minimax.minimax-m2",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 3e-07,
      "output_cost_per_token": 1.2e-06
    }
  }'
```
```bash
curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "qwen3-235b",
    "litellm_params": {
      "model": "bedrock/qwen.qwen3-235b-a22b-2507-v1:0",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 2.2e-07,
      "output_cost_per_token": 8.8e-07
    }
  }'
```
```bash
# List all registered model deployments — both Vertex and Bedrock should appear
# under the same model_name entries
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/model/info" \
  -H "Authorization: Bearer $MASTER_KEY" | jq '.data[] | {model_name, provider: .litellm_params.model}'
```
Cost values in the model_info block use per-token pricing (not per-1M). To convert: $0.15 per 1M tokens = 1.5e-07 per token. LiteLLM uses these values at runtime to compute the expected cost of each request before dispatching it.
LiteLLM encrypts all sensitive model parameters — api_key, api_base, vertex_credentials, and similar fields — in the database using the master key as the encryption key. This has an important operational implication.
Rotating the master key without re-encrypting DB entries will silently break all models. LiteLLM will fail to decrypt the stored credentials, log no obvious error, and simply not load the model backends. The UI may show models as available but all calls will fail.
```bash
# Export all model configs to a local file while old key is active
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/model/info" \
  -H "Authorization: Bearer $OLD_MASTER_KEY" \
  | jq '.data' > model-backup.json
```
Next, set the new LITELLM_MASTER_KEY on the litellm Deployment and restart it. Then re-register the models from the backup:

```bash
# After rotating the key, re-register models from backup
# (loop over the exported JSON and POST each entry)
jq -c '.[]' model-backup.json | while read -r model; do
  curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
    -H "Authorization: Bearer $NEW_MASTER_KEY" \
    -H "Content-Type: application/json" \
    -d "$model"
done
```
The AWS credentials used for Bedrock are stored as OCP Secret environment variables and are therefore not encrypted by the LiteLLM master key. They are unaffected by key rotation. Only the model entries registered via the /model/new API (which may include Vertex AI credentials) need to be re-registered.
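As a final sanity check after re-registering (assuming the model-backup.json export is still on hand), the deployment count reported by the proxy should match the backup:

```bash
# Compare deployment counts: live proxy vs. the pre-rotation backup
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/model/info" \
  -H "Authorization: Bearer $NEW_MASTER_KEY" | jq '.data | length'
jq 'length' model-backup.json
```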