Voice Agents: Build an AI Voice Agent Workshop
Welcome to the Voice Agents workshop!
The live demo is a voice-enabled AI agent that can take pizza orders via spoken conversation.
What you’ll learn
In this workshop, you will:
- Deploy speech models — set up Whisper (STT) and Higgs-Audio (TTS) on GPU MIG slices using KServe and vLLM
- Build a voice agent — deploy a multi-agent LangGraph application that takes pizza orders via spoken conversation
- Measure TTS performance — understand gen-x (generation speed vs real-time) and why GPU selection matters for voice quality
- Add observability — monitor agent interactions, latency, and model behaviour with MLflow tracing
- Apply guardrails — configure TrustyAI FMS and NeMo for prompt injection detection and content safety
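The gen-x metric mentioned above compares how much audio a TTS model produces against how long it took to produce it. As a minimal sketch (assuming gen-x is defined as synthesized audio duration divided by wall-clock generation time, which is how real-time factors are usually measured):

```python
def generation_speed_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Ratio of synthesized audio length to the wall-clock time spent generating it.

    A factor above 1.0 means the model speaks faster than real time, so it can
    keep up with a live conversation; below 1.0 the listener hears gaps.
    """
    return audio_seconds / wall_seconds

# 10 s of speech synthesized in 4 s of GPU time -> factor 2.5 (comfortable headroom)
print(generation_speed_factor(10.0, 4.0))   # 2.5

# 10 s of speech synthesized in 20 s -> factor 0.5 (cannot sustain live playback)
print(generation_speed_factor(10.0, 20.0))  # 0.5
```

This is why GPU selection matters for voice quality: a slower GPU (or a smaller MIG slice) can push the factor below 1.0, which surfaces as stuttering or dead air rather than as an obvious error.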
The app
The pizza shop voice agent follows the voice sandwich pattern — STT and TTS layers wrap an LLM agent graph. You speak into the microphone, a supervisor agent routes your request to specialist agents (pizza, order, delivery), and the response is spoken back to you in real time.
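The flow above can be sketched as a pipeline. This is a hypothetical simulation, not the workshop's code: `transcribe`, `supervisor`, and `synthesize` stand in for Whisper, the LangGraph supervisor graph, and Higgs-Audio, and the keyword routing is a placeholder for real agent orchestration.

```python
def transcribe(audio: str) -> str:
    # Stand-in for the Whisper STT call; audio is already text in this sketch.
    return audio

# Specialist agents keyed by topic (real app: LangGraph agent nodes).
AGENTS = {
    "pizza": lambda text: "We have margherita and pepperoni.",
    "order": lambda text: "Your order is confirmed.",
    "delivery": lambda text: "Delivery takes about 30 minutes.",
}

def supervisor(text: str) -> str:
    # Route the request to the first specialist whose topic appears in it.
    for topic, agent in AGENTS.items():
        if topic in text.lower():
            return agent(text)
    return "Could you rephrase that?"

def synthesize(text: str) -> str:
    # Stand-in for the Higgs-Audio TTS call.
    return f"<spoken>{text}</spoken>"

def voice_sandwich(audio: str) -> str:
    # The full sandwich: speech in -> agent graph -> speech out.
    return synthesize(supervisor(transcribe(audio)))

print(voice_sandwich("What pizzas do you have?"))
# <spoken>We have margherita and pepperoni.</spoken>
```

The point of the pattern is that the middle layer is an ordinary text-in, text-out agent graph, so the voice layers can be swapped or tuned without touching the agents.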
Who this is for
This workshop is designed for AI/ML engineers, platform engineers, and developers who want hands-on experience with:
- Voice-enabled AI applications (STT + LLM + TTS)
- Multi-agent orchestration with LangGraph
- Model serving on OpenShift AI with vLLM and KServe
- AI safety guardrails (TrustyAI, NeMo)
Experience level: Beginner to Intermediate
Prerequisites
- Access to an OpenShift AI cluster with GPU nodes
- A Hugging Face account and API token
- A web browser with microphone access (for the voice demo)