RHDP LiteMaaS

Model as a Service for Red Hat Demo Platform

AWS Bedrock Integration

Add AWS Bedrock as a redundant provider alongside Google Vertex AI — same model names, transparent cost-based failover, zero user-facing change.

Overview & Goals

AWS Bedrock was added as a second provider for four existing LiteMaaS models, all of which already ran on Google Vertex AI. The integration is completely transparent to users — they continue to call the same model names (gpt-oss-120b, gpt-oss-20b, minimax-m2, qwen3-235b) and LiteLLM routes between Vertex and Bedrock automatically using cost-based routing.

Property | Value
Cluster | maas.redhatworkshops.io
Namespace | litellm-rhpds
AWS Region | us-west-2
IAM User | litemaas-bedrock-user
IAM Policy | litemaas-bedrock-policy
OCP Secret | litellm-aws-bedrock
Routing strategy | Cost-based routing (LiteLLM built-in)

User-facing impact: none. Because Bedrock models are registered under the same model_name as their Vertex counterparts, they do not appear as separate entries in the LiteMaaS UI or API. LiteLLM groups them automatically and routes between them.

Models on Bedrock

Four models are now backed by both Vertex AI and AWS Bedrock. The Bedrock model IDs are the native identifiers used in the litellm_params.model field (prefixed with bedrock/).

Pricing Comparison

LiteMaaS model_name | Bedrock model ID | Vertex pricing (in / out per 1M) | Bedrock pricing (in / out per 1M) | Router preference
gpt-oss-120b | openai.gpt-oss-120b-1:0 | $0.09 / $0.36 | $0.15 / $0.60 | Vertex (cheaper)
gpt-oss-20b | openai.gpt-oss-20b-1:0 | $0.07 / $0.25 | $0.07 / $0.30 | Vertex (cheaper)
minimax-m2 | minimax.minimax-m2 | $0.30 / $1.20 | $0.30 / $1.20 | Either (same price)
qwen3-235b | qwen.qwen3-235b-a22b-2507-v1:0 | $0.22 / $0.88 | $0.22 / $0.88 | Either (same price)

For gpt-oss-120b and gpt-oss-20b, Vertex AI is the cheaper option (for gpt-oss-20b the input price is the same but the output price is lower), so the cost-based router prefers it and uses Bedrock only as a fallback. For minimax-m2 and qwen3-235b the prices are identical, so the router may use either provider freely.

Claude Models (Pending Access Approval)

The claude-sonnet-4-6 and claude-opus-4-6 models are awaiting AWS Bedrock model access approval. Once granted, they will be added using the following ARN patterns:

ARN type | Pattern | Notes
System inference profile | arn:aws:bedrock:*:809721187735:inference-profile/us.anthropic.claude-* | Account ID required; * for region enables cross-region routing
Foundation model (cross-region) | arn:aws:bedrock:*::foundation-model/anthropic.claude-* | Double colon, no account ID; required alongside the inference profile ARN

Both ARN types must be present in the IAM policy for Claude cross-region inference to work. Adding only the inference profile ARN causes an AccessDeniedException when Bedrock routes the request to a foundation model endpoint in another region. See the ARN lesson section below.

Request Flow

The diagram below shows how a request for gpt-oss-120b is handled end-to-end. The user sees a single model; LiteLLM transparently selects the cheaper backend for each request.

flowchart LR
    User([User / Workshop]) -->|model: gpt-oss-120b| LiteMaaS[LiteMaaS Frontend]
    LiteMaaS --> LiteLLM[LiteLLM Proxy]
    LiteLLM --> Router{Cost-Based Router}
    Router -->|"preferred: $0.09/1M in"| VertexAI["Google Vertex AI\ngpt-oss-120b-maas"]
    Router -->|"fallback: $0.15/1M in"| Bedrock["AWS Bedrock\nopenai.gpt-oss-120b-1:0"]
    VertexAI -->|response| Router
    Bedrock -->|response| Router
    Router --> LiteLLM

Failover is automatic. When a backend accumulates allowed_fails: 2 failures, it enters a cooldown period of 60 seconds and all traffic shifts to the other provider. No manual intervention is required.
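
For reference, a user-side request against the shared model name looks the same no matter which backend serves it. This is an illustrative sketch; $LITELLM_API_KEY stands in for any LiteLLM virtual key, and the proxy exposes the standard OpenAI-compatible chat completions path:

# Sketch: a workshop request against the shared model name.
# $LITELLM_API_KEY is a placeholder for a LiteLLM virtual key; the response
# does not reveal whether Vertex AI or Bedrock served the call.
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/v1/chat/completions" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello from a workshop"}]
  }'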

IAM Setup

A dedicated IAM user (litemaas-bedrock-user) was created with a minimal policy (litemaas-bedrock-policy) scoped to exactly the actions and resources LiteLLM requires.

IAM Policy (Final Correct Version)

The policy grants InvokeModel and InvokeModelWithResponseStream only on the specific Bedrock model ARNs in use, plus list permissions on * (required by LiteLLM's health-check path).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        // Claude inference profiles (account-scoped, cross-region) — pending access
        "arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/us.anthropic.claude-*",
        "arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/global.anthropic.claude-*",
        // Claude foundation models (no account ID) — required for cross-region routing
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        // Currently active non-Claude models
        "arn:aws:bedrock:us-west-2::foundation-model/openai.gpt-oss-120b-1:0",
        "arn:aws:bedrock:us-west-2::foundation-model/openai.gpt-oss-20b-1:0",
        "arn:aws:bedrock:us-west-2::foundation-model/minimax.minimax-m2",
        "arn:aws:bedrock:us-west-2::foundation-model/qwen.qwen3-235b-a22b-2507-v1:0"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:ListInferenceProfiles"
      ],
      "Resource": "*"
    }
  ]
}

Replace ACCOUNT_ID with the actual AWS account number (809721187735 for the RHDP prod account), and strip the // comments before pasting, since IAM policy documents are strict JSON and do not allow comments. The non-Claude models use foundation model ARNs with a double colon and no account ID; this is expected for public Bedrock foundation models.
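
For reference, the user and policy can be created with the AWS CLI along these lines (sketch; assumes policy.json contains the document above with ACCOUNT_ID substituted and comments stripped, and that the policy lands in the RHDP prod account):

# Sketch: create the dedicated IAM user and attach the minimal policy.
# policy.json = the policy document above, with ACCOUNT_ID filled in and comments removed.
aws iam create-user --user-name litemaas-bedrock-user

aws iam create-policy \
  --policy-name litemaas-bedrock-policy \
  --policy-document file://policy.json

aws iam attach-user-policy \
  --user-name litemaas-bedrock-user \
  --policy-arn arn:aws:iam::809721187735:policy/litemaas-bedrock-policy

# Generate the access key pair that goes into the OCP Secret below
aws iam create-access-key --user-name litemaas-bedrock-user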

ARN Lesson: Inference Profiles vs Foundation Models

Bedrock ARN formats differ significantly depending on whether you are using a system inference profile or a foundation model directly. Getting this wrong produces AccessDeniedException errors whose cause is hard to trace.

ARN type | Has account ID? | Region field | Example
System inference profile (Claude us.* / global.*) | Yes | Use * for cross-region | arn:aws:bedrock:*:809721187735:inference-profile/us.anthropic.claude-*
Foundation model (public, e.g. Claude base, OpenAI-compat, Qwen) | No (double colon) | Specific region or * | arn:aws:bedrock:us-west-2::foundation-model/openai.gpt-oss-120b-1:0

For Claude cross-region inference you need both ARN types in the policy. The inference profile ARN authorizes the initial call; Bedrock then routes internally to a foundation model endpoint in another region, and that second hop requires the foundation model ARN (arn:aws:bedrock:*::foundation-model/anthropic.claude-*) to be allowed as well. Omitting either one causes access errors.
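
A quick way to sanity-check the new credentials before sending any invoke traffic is to exercise the wildcard list permissions granted above (sketch; output field names follow the Bedrock CLI and are not guaranteed here):

# Sketch: verify the litemaas-bedrock-user credentials and list permissions.
# Assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY for that user are exported.
aws bedrock list-foundation-models \
  --region us-west-2 \
  --query 'modelSummaries[].modelId' | grep -E 'gpt-oss|minimax|qwen'

# Inference profiles (relevant once Claude access is granted)
aws bedrock list-inference-profiles \
  --region us-west-2 \
  --query 'inferenceProfileSummaries[].inferenceProfileId'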

OCP Credentials

AWS credentials are stored as an OpenShift Secret in the litellm-rhpds namespace and mounted as environment variables on the litellm Deployment. This approach keeps credentials out of the ConfigMap and out of the LiteLLM database entirely.

flowchart TD
    IAM["AWS IAM\nlitemaas-bedrock-user\nAccess Key + Secret"] -->|stored in| Secret["OCP Secret\nlitellm-aws-bedrock\n(litellm-rhpds namespace)"]
    Secret -->|oc set env --from| Deployment["litellm Deployment\nenv: AWS_ACCESS_KEY_ID\nenv: AWS_SECRET_ACCESS_KEY\nenv: AWS_REGION"]
    Deployment -->|boto3 picks up env vars| LiteLLM["LiteLLM Proxy\nbedrock/openai.gpt-oss-120b-1:0"]
    LiteLLM -->|SigV4 signed request| Bedrock["AWS Bedrock\nus-west-2"]

Create and Mount the Secret

# Create the secret from literals (never commit actual values)
oc create secret generic litellm-aws-bedrock \
  --from-literal=AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
  --from-literal=AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
  --from-literal=AWS_REGION=us-west-2 \
  -n litellm-rhpds

# Mount all keys from the secret as environment variables
oc set env deployment/litellm \
  --from=secret/litellm-aws-bedrock \
  -n litellm-rhpds

# Verify the env vars are present on the new pod
oc exec -n litellm-rhpds deployment/litellm -- env | grep AWS

LiteLLM's Bedrock provider uses the standard boto3 credential chain. When AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set as environment variables, boto3 picks them up automatically — no additional LiteLLM configuration is needed for authentication.

Do not store AWS credentials in the LiteLLM ConfigMap or in model litellm_params. Anything in the ConfigMap is readable by anyone with oc get configmap access. The OCP Secret is the correct place.

Cost-Based Router Configuration

LiteLLM groups all deployments that share the same model_name (e.g., the Vertex gpt-oss-120b and the Bedrock gpt-oss-120b) and routes between them based on the registered cost per token. The router configuration lives in the litellm-router-config ConfigMap.

# litellm-router-config ConfigMap — relevant router_settings block
router_settings:
  routing_strategy: cost-based-routing
  num_retries: 3
  retry_after: 5
  allowed_fails: 2
  cooldown_time: 60

How Cost-Based Routing Works

  1. For each incoming request, LiteLLM estimates the cost of sending that request to every available backend: cost = input_tokens × input_cost_per_token + output_tokens × output_cost_per_token (a worked example follows this list).
  2. It selects the backend with the lowest estimated cost. For gpt-oss-120b, Vertex AI wins because $0.09/1M < $0.15/1M (input).
  3. If the chosen backend returns an error or times out, LiteLLM retries on the next cheapest backend (Bedrock in this case).
  4. After allowed_fails: 2 failures, the backend enters a cooldown for cooldown_time: 60 seconds. During cooldown, all traffic for that model automatically shifts to the other provider.
  5. After the cooldown expires, the backend is re-evaluated on the next request.
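
A minimal worked example of that comparison for gpt-oss-120b (illustrative token counts; prices are the per-token values registered later on this page):

# Illustrative router cost comparison: 1,000 input and 500 output tokens on gpt-oss-120b,
# using the per-token prices registered for each backend on this page.
awk 'BEGIN {
  vertex  = 1000 * 0.00000009 + 500 * 0.00000036;  # $0.09 / $0.36 per 1M
  bedrock = 1000 * 0.00000015 + 500 * 0.00000060;  # $0.15 / $0.60 per 1M
  printf "vertex  = $%.5f\n", vertex;   # $0.00027  <- cheapest, router picks Vertex
  printf "bedrock = $%.5f\n", bedrock;  # $0.00045
}'
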
Setting | Value | Effect
routing_strategy | cost-based-routing | Picks cheapest backend per request
num_retries | 3 | Up to 3 retry attempts per request across backends
retry_after | 5 | Wait 5 seconds between retry attempts
allowed_fails | 2 | Backend enters cooldown after 2 consecutive failures
cooldown_time | 60 | Failed backend excluded for 60 seconds

The routing is entirely automatic. No operator action is needed when a provider has an outage — traffic shifts within seconds of hitting the allowed_fails threshold.
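
To see which backends the proxy currently considers healthy, LiteLLM's /health endpoint can be queried with the master key (sketch; it probes every registered deployment, so the call can take a few seconds):

# Check backend health across all registered deployments (Vertex and Bedrock)
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/health" \
  -H "Authorization: Bearer $MASTER_KEY" | jq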

Registering Bedrock Models

Each Bedrock backend is registered via the LiteLLM admin API using the same model_name as the existing Vertex AI deployment. This is the key to transparent dual-provider routing — LiteLLM uses model_name as the grouping key for the router.

Why the same model_name matters. Users and workshops send requests to gpt-oss-120b. LiteLLM internally resolves this to all deployments registered under that name — both Vertex and Bedrock. Registering under a different name (e.g., gpt-oss-120b-bedrock) would expose it as a separate, user-visible model, which is not the goal.

Register gpt-oss-120b on Bedrock

# Register the Bedrock backend for gpt-oss-120b
# Note: model_name matches the existing Vertex AI deployment exactly
curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gpt-oss-120b",
    "litellm_params": {
      "model": "bedrock/openai.gpt-oss-120b-1:0",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 1.5e-07,
      "output_cost_per_token": 6e-07
    }
  }'

Register gpt-oss-20b on Bedrock

curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gpt-oss-20b",
    "litellm_params": {
      "model": "bedrock/openai.gpt-oss-20b-1:0",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 7e-08,
      "output_cost_per_token": 3e-07
    }
  }'

Register minimax-m2 on Bedrock

curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "minimax-m2",
    "litellm_params": {
      "model": "bedrock/minimax.minimax-m2",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 3e-07,
      "output_cost_per_token": 1.2e-06
    }
  }'

Register qwen3-235b on Bedrock

curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "qwen3-235b",
    "litellm_params": {
      "model": "bedrock/qwen.qwen3-235b-a22b-2507-v1:0",
      "aws_region_name": "us-west-2"
    },
    "model_info": {
      "input_cost_per_token": 2.2e-07,
      "output_cost_per_token": 8.8e-07
    }
  }'

Verify Registration

# List all registered model deployments — both Vertex and Bedrock should appear
# under the same model_name entries
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/model/info" \
  -H "Authorization: Bearer $MASTER_KEY" | jq '.data[] | {model_name, provider: .litellm_params.model}'

Cost values in the model_info block use per-token pricing (not per-1M). To convert: $0.15 per 1M tokens = 1.5e-07 per token. LiteLLM uses these values at runtime to compute the expected cost of each request before dispatching it.
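
For future registrations, the conversion is a one-liner (illustrative):

# Convert dollars-per-1M-token pricing to the per-token value LiteLLM expects
# Example: $0.15 per 1M tokens
python3 -c 'print(0.15 / 1_000_000)'   # 1.5e-07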

Master Key Rotation Gotcha

LiteLLM encrypts all sensitive model parameters — api_key, api_base, vertex_credentials, and similar fields — in the database using the master key as the encryption key. This has an important operational implication.

Rotating the master key without re-encrypting DB entries will silently break all models. LiteLLM will fail to decrypt the stored credentials, log no obvious error, and simply not load the model backends. The UI may show models as available but all calls will fail.

Safe Master Key Rotation Procedure

  1. Export all model configurations while the old master key is still active:
# Export all model configs to a local file while old key is active
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/model/info" \
  -H "Authorization: Bearer $OLD_MASTER_KEY" \
  | jq '.data' > model-backup.json
  2. Delete all model entries from the database (still using the old key).
  3. Rotate the master key: update the OCP Secret or environment variable that holds LITELLM_MASTER_KEY and restart the deployment.
  4. Re-register all models from the exported backup using the new key. This re-encrypts every entry with the new master key.
# After rotating the key, re-register models from backup
# (loop over the exported JSON and POST each entry)
jq -c '.[]' model-backup.json | while read -r model; do
  curl -X POST "https://litellm-prod.apps.maas.redhatworkshops.io/model/new" \
    -H "Authorization: Bearer $NEW_MASTER_KEY" \
    -H "Content-Type: application/json" \
    -d "$model"
done

The AWS credentials used for Bedrock are stored as OCP Secret environment variables and are therefore not encrypted by the LiteLLM master key. They are unaffected by key rotation. Only the model entries registered via the /model/new API (which may include Vertex AI credentials) need to be re-registered.
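
A quick sanity check after re-registration, reusing the endpoints shown earlier on this page (sketch):

# Confirm all deployments decrypt and load under the new master key
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/model/info" \
  -H "Authorization: Bearer $NEW_MASTER_KEY" | jq '.data | length'

# End-to-end test through the router (should succeed on Vertex or Bedrock)
curl -s "https://litellm-prod.apps.maas.redhatworkshops.io/v1/chat/completions" \
  -H "Authorization: Bearer $NEW_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "ping"}]}'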