Pizza Shop Demo

Putting it all together — deploy the voice agent app, connect the speech models to an LLM agent, and have a spoken conversation about pizza.

The app follows the voice sandwich pattern:

Browser Mic → [Whisper STT] → Text → [LLM Agent Graph] → Text → [Higgs-Audio TTS] → Speaker
Component | Description | Port
Backend | Python WebSocket server (LangGraph agent) | 8765
Frontend | Next.js UI served by Nginx | 8080

Run the Notebook

The fastest way to deploy and test the pizza shop app is the notebook. It gathers credentials, runs the Helm install, and tests the WebSocket connection — all from your workbench.

Prerequisites

Open a Terminal in JupyterLab, log in to OpenShift, and clone the repository:

git clone https://github.com/rhai-code/voice-agents.git

Open and run

In the File Explorer, navigate to voice-agents/content/notebooks/ and open the notebook for this exercise.

Run all cells (Run > Run All Cells). The notebook will:

  1. Verify speech models are running

  2. Gather MaaS LLM and STT/TTS credentials

  3. Deploy the app via Helm chart

  4. Wait for backend and frontend pods

  5. Test the agent via WebSocket (text input, no mic needed)

  6. Provide a link to the Web UI
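Step 5's WebSocket smoke test can also be reproduced by hand. A minimal sketch, assuming the backend accepts JSON text messages on port 8765 — the `{"type": "text", "content": ...}` schema is an assumption; check the notebook for the real message format:

```python
import json

def build_text_message(text: str) -> str:
    """Build a JSON text-input message for the agent backend.
    The {"type": "text", "content": ...} schema is an assumption."""
    return json.dumps({"type": "text", "content": text})

async def ask_agent(text: str, url: str = "ws://localhost:8765"):
    """Send one text turn to the backend and return its reply."""
    # Requires `pip install websockets`; imported here so the payload
    # helper above stays usable without the dependency.
    import websockets
    async with websockets.connect(url) as ws:
        await ws.send(build_text_message(text))
        return await ws.recv()

if __name__ == "__main__":
    import asyncio
    print(asyncio.run(ask_agent("I'd like a large pepperoni pizza")))
```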

If the notebook completes successfully, open the Web UI link and try the voice experience. The rest of this page covers the agent architecture and manual deployment steps.


The Web UI

Pizza Shop UI

The UI connects to the backend WebSocket automatically. Use the TALK button to record with your microphone, or type a message in the text box.

Pizza Shop Conversation

The conversation history shows agent routing — the supervisor decides which specialist agent handles each request.


Agent Architecture

Agent Graph

The pizza agent graph shows how different agents collaborate:

Agent Graph

Creating Agents

Agents are created with specific roles and responsibilities:

Agent Creation

Each agent has:

  • A defined role (supervisor, pizza type selector, order calculator, delivery handler)

  • Specific system prompts

  • Access to tools and functions

  • Connection to the LLM
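The bullet list above maps naturally onto a small agent record. A pure-Python sketch — the real app builds these with LangGraph, and the roles, prompts, and menu prices here are illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Agent:
    role: str           # supervisor, pizza_type, order_total, delivery
    system_prompt: str  # role-specific instructions for the LLM
    tools: List[Callable] = field(default_factory=list)  # functions the agent may call

def menu_lookup(name: str) -> float:
    """Example tool: price lookup (menu values are illustrative)."""
    return {"margherita": 9.0, "pepperoni": 11.0}.get(name.lower(), 10.0)

AGENTS = {
    "supervisor": Agent("supervisor", "Route each request to a specialist agent."),
    "pizza_type": Agent("pizza_type", "Help the user pick a pizza.", [menu_lookup]),
    "order_total": Agent("order_total", "Calculate order costs.", [menu_lookup]),
    "delivery": Agent("delivery", "Collect delivery details."),
}
```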

Supervisor Model Configuration

The supervisor agent coordinates all other agents:

Supervisor Model Configuration
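The supervisor talks to the MaaS LLM through an OpenAI-compatible endpoint; the Helm install below injects the same fields as environment variables. A hedged sketch of assembling that configuration — the variable names match the chart's backend.env/backend.secret keys, while the temperature setting is an assumption:

```python
import os

def supervisor_model_config() -> dict:
    """Assemble LLM connection settings from the env vars the chart injects."""
    return {
        "model": os.environ.get("MODEL_NAME", "llama-4-scout-17b-16e-w4a16"),
        "base_url": os.environ.get("BASE_URL", ""),
        "api_key": os.environ.get("API_KEY", ""),  # MaaS token from backend.secret
        "temperature": 0.0,  # deterministic routing is an assumption
    }
```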

Invoking Agents

Agents are invoked through the LangGraph framework:

Invoking Agents
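In LangGraph, a compiled graph is invoked with the current state and returns the updated state. A stubbed, dependency-free sketch of that shape — the lambda stands in for the real model call:

```python
def invoke_agent(agent_role: str, state: dict, llm=None) -> dict:
    """Run one agent over the conversation state and append its reply."""
    llm = llm or (lambda role, history: f"[{role}] ok")  # stub model
    reply = llm(agent_role, state["messages"])
    state["messages"].append(
        {"role": "assistant", "agent": agent_role, "content": reply}
    )
    return state

state = {"messages": [{"role": "user", "content": "One margherita, please"}]}
state = invoke_agent("pizza_type", state)
```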

Conversation State

The system maintains conversation state across interactions:

Conversation State

State includes: current pizza type, order details, delivery information, conversation history, and interrupt flags.
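The state fields listed above can be captured in a typed dictionary. A sketch with illustrative field names — the app's actual schema may differ:

```python
from typing import List, Optional, TypedDict

class ConversationState(TypedDict):
    pizza_type: Optional[str]  # current pizza selection
    order_details: dict        # sizes, quantities, toppings
    delivery_info: dict        # address, time window
    messages: List[dict]       # full conversation history
    interrupted: bool          # set when the user cuts in

def new_state() -> ConversationState:
    """Start a fresh conversation with empty order and history."""
    return ConversationState(
        pizza_type=None, order_details={}, delivery_info={},
        messages=[], interrupted=False,
    )
```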

Agent Prompts

Supervisor

The supervisor manages the flow between agents:

Supervisor Agent Prompt

Pizza Type

Handles pizza selection:

Pizza Agent Prompt

Order Total

Calculates order costs:

Order Agent Prompt

Delivery

Manages delivery details:

Delivery Agent Prompt

Agent Interrupts

The system handles interruptions gracefully:

Agent Interrupt

Interrupts allow canceling operations, switching agents, handling user corrections, and managing error states.
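One way to realize this is to check an interrupt flag before each agent step and hand control back to the supervisor. A minimal sketch, not the app's actual mechanism:

```python
def next_step(state: dict) -> str:
    """Decide the next node, honoring a pending interrupt first."""
    if state.get("interrupted"):
        state["interrupted"] = False  # clear the flag after handling
        return "supervisor"           # hand control back for re-routing
    return state.get("active_agent", "supervisor")
```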

LangGraph and Agent Handoffs

Supervisor Routing

The supervisor routes requests to the appropriate specialist agent:

Supervisor Routing

Routing decisions are based on user intent, current conversation state, and agent capabilities.
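Intent-based routing can be sketched with a keyword classifier standing in for the LLM's decision — the real supervisor asks the model; the keywords and agent names here are illustrative:

```python
def route(user_text: str, state: dict) -> str:
    """Pick a specialist agent from user intent and conversation state."""
    text = user_text.lower()
    if any(w in text for w in ("deliver", "address", "when")):
        return "delivery"
    if any(w in text for w in ("total", "cost", "price", "pay")):
        return "order_total"
    if state.get("pizza_type") is None or "pizza" in text:
        return "pizza_type"
    return "supervisor"  # unclear intent: ask a clarifying question
```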

Wait for User Speech

The system pauses to wait for user input:

Wait for User Speech

Handoff Flow

  1. User speaks → Speech converted to text (STT)

  2. Supervisor receives text and analyzes intent

  3. Supervisor routes to appropriate agent

  4. Agent processes request and generates response

  5. Response converted to speech (TTS)

  6. System waits for next user input
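The six steps above can be sketched as one turn of the loop, with stub functions standing in for Whisper, the agent graph, and Higgs-Audio:

```python
def stt(audio: bytes) -> str:
    return audio.decode()      # stub: the real app calls Whisper

def tts(text: str) -> bytes:
    return text.encode()       # stub: the real app calls Higgs-Audio

def agent_reply(text: str) -> str:
    return f"Got it: {text}"   # stub: supervisor routes to a specialist agent

def voice_turn(audio_in: bytes) -> bytes:
    text = stt(audio_in)         # 1. speech -> text
    reply = agent_reply(text)    # 2-4. supervisor analyzes, routes, agent responds
    return tts(reply)            # 5. text -> speech; 6. loop awaits next input
```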


Step-by-step Deployment

Gather Credentials

The backend needs tokens for the LLM (MaaS), STT (Whisper), and TTS (Higgs-Audio).

Get your MaaS LLM token (replace <your-maas-sa> with your SA name):

oc get sa -n maas-default-gateway-tier-enterprise
LLM_TOKEN=$(oc create token <your-maas-sa> -n maas-default-gateway-tier-enterprise \
  --audience=maas-default-gateway-sa --duration=24h)

Get the Whisper STT token:

STT_TOKEN=$(oc get secret whisper-sa-whisper-sa -o jsonpath='{.data.token}' | base64 -d)
WHISPER_URL=$(oc get llmisvc whisper -o jsonpath='{.status.addresses[?(@.type=="gateway-external")].url}' \
  | tr ' ' '\n' | grep https)

Helm Install

Deploy the app with the collected credentials:

helm upgrade --install ai-voice-agent ai-voice-agent/deploy/chart \
  --namespace ai-roadshow \
  --set backend.env.MODEL_NAME=llama-4-scout-17b-16e-w4a16 \
  --set backend.env.BASE_URL=https://maas.apps.ocp.cloud.rhai-tmm.dev/prelude-maas/llama-4-scout-17b-16e-w4a16/v1 \
  --set backend.env.TTS_URL=http://higgs-audio-predictor:8080/v1 \
  --set backend.env.TTS_MODEL=higgs-audio-v2-generation-3B-base \
  --set backend.env.STT_URL="${WHISPER_URL}/v1/audio/transcriptions" \
  --set backend.env.STT_MODEL=whisper \
  --set backend.secret.API_KEY="${LLM_TOKEN}" \
  --set backend.secret.STT_TOKEN="${STT_TOKEN}" \
  --set mlflow.enabled=false \
  --set guardrails.enabled=false \
  --set nemoGuardrails.enabled=false

Wait for Pods

oc rollout status deployment/ai-voice-agent-backend --timeout=120s
oc rollout status deployment/ai-voice-agent-frontend --timeout=120s
oc get pods -l 'app.kubernetes.io/part-of=ai-voice-agent'

Get the App URL

echo "https://$(oc get route ai-voice-agent -o jsonpath='{.spec.host}')"

Open the URL in your browser. The UI auto-connects to the WebSocket backend.

Cleanup

To remove the app (keeps the speech models):

helm uninstall ai-voice-agent -n ai-roadshow