Workshop details

Timing and schedule

Full workshop (1 hour 30 minutes)

  • Module 1: The Agentic App and Why Observability Matters (15 minutes)

  • Module 2: Observability Pillars, Concepts, and Personas (10 minutes)

  • Module 3: Metrics and Logs for Agentic Applications (15 minutes)

  • Module 4: Tracing and MLflow (20 minutes)

  • Module 5: LLM Evaluations (15 minutes)

  • Module 6: From Development to Production (15 minutes)

Abbreviated workshop (45 minutes)

  • Module 1: The Agentic App and Why Observability Matters (10 minutes)

  • Module 3: Metrics and Logs for Agentic Applications (15 minutes)

  • Module 4: Tracing and MLflow (20 minutes)

Technical requirements

Software versions

  • Red Hat OpenShift Container Platform 4.20

  • Red Hat OpenShift AI 3.4

  • MLflow 3.10.1

  • LangGraph/LangChain (latest stable)

  • Grafana/Perses for dashboards

  • Web browser (Chrome, Firefox, Safari, Edge)

Environment access

Participants need access to:

  • Red Hat OpenShift AI cluster (provided during workshop)

  • MLflow tracking server

  • Grafana dashboards

  • Terminal/command line environment

  • Code editor (VS Code recommended)

Network requirements

  • Internet connectivity for accessing documentation and external resources

  • Access to the OpenShift cluster and its services

  • Red Hat Demo Platform network access (specific URLs provided)

Environment setup

Pre-workshop checklist

  • OpenShift access confirmed - Test login credentials at https://console-openshift-console.apps.cluster.example.com (Module 1 includes a guided walkthrough)

  • CLI tools installed - Verify the oc client installation

  • Python environment ready - Python 3.11+ with pip

  • Workshop repository cloned - Clone the multi-agent loan app repository

  • Network connectivity verified - Test access to the required URLs
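
The CLI portion of this checklist can be automated with a short script. This is a minimal sketch: the `missing_tools` helper is hypothetical (not part of the workshop repository), and it only checks that each binary is on the PATH, not that it works against the cluster:

```python
import shutil

def missing_tools(tools: list[str]) -> list[str]:
    """Return the required CLI tools that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# Checklist items that can be verified automatically
required = ["oc", "python3", "git", "curl"]
gaps = missing_tools(required)
if gaps:
    print("Missing tools:", ", ".join(gaps))
else:
    print("All required CLI tools found.")
```

Running this before the workshop surfaces a missing oc client early, instead of during the first hands-on module.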

Setup validation

Participants should run these commands to verify setup:

# Verify OpenShift CLI
oc version

# Log in to the OpenShift cluster (use the API URL provided by your facilitator)
oc login --insecure-skip-tls-verify https://api.cluster.example.com:6443 -u user1 -p openshift

# Verify project access
oc project

# Test connectivity to MLflow
curl -s https://mlflow-redhat-ods-applications.apps.cluster.example.com/health | head -5

Troubleshooting guide

Common setup issues

Problem: "error: You must be logged in to the server (Unauthorized)" → Solution: Re-run the oc login command with correct credentials. Verify username and password.

Problem: "oc: command not found" → Solution: Download and install the OpenShift CLI from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz. Add to your PATH.

Problem: "Unable to connect to the server: dial tcp: lookup api.cluster…" → Solution: Verify network connectivity. Check if VPN is required. Contact the workshop facilitator.

Problem: "Permission denied when accessing MLflow UI" → Solution: Ensure you’re logged into OpenShift. MLflow uses OpenShift OAuth for authentication.

During workshop support

  • Encourage participants to help each other

  • Use the bastion host for CLI access if local setup fails: ssh lab-user@bastion.example.com

  • Have backup environments ready for technical difficulties

  • Use screen sharing for complex troubleshooting

Follow-up resources

Next steps for participants

Additional learning paths

  • Intermediate: Advanced MLflow features, custom scorers, trace analysis

  • Advanced: Building custom observability pipelines, OpenTelemetry integration

  • Certification: Red Hat Certified Specialist in AI/ML

Glossary

Key terms used in this workshop:

  • AgentOps: Agent Operations, the discipline of monitoring, tracing, evaluating, and maintaining AI agent systems in production

  • AI Agent: A system that uses an LLM to reason about tasks, decide which tools to call, and take autonomous actions

  • LLM: Large Language Model, an AI model trained on large text datasets that can generate and understand natural language

  • MCP: Model Context Protocol, a standard for connecting AI agents to external tools and data sources

  • LangGraph: A framework for building stateful, multi-agent AI workflows, built on top of LangChain

  • RAG: Retrieval-Augmented Generation, a pattern that enhances LLM responses by retrieving relevant documents before generating answers

  • Trace: A complete record of a request’s journey through a distributed system, composed of spans

  • Span: A single operation within a trace (e.g., one LLM call or one tool invocation)

  • Scorer: A function that evaluates the quality of an agent’s response (deterministic or LLM-powered)

  • Inner Loop: Manual, developer-driven evaluation workflow (e.g., running evaluations from a Jupyter notebook)

  • Outer Loop: Automated, platform-driven evaluation workflow (e.g., scheduled AI Pipelines)

  • RBAC: Role-Based Access Control, restricting system access based on user roles

  • pgvector: A PostgreSQL extension that enables vector similarity search for embeddings

  • PromQL: Prometheus Query Language, used to query metrics in Grafana dashboards

  • LogQL: Log Query Language, used to query logs in LokiStack/Grafana Loki
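
The trace and span terms above can be sketched as a simple data structure. This is an illustrative model only (plain dataclasses with hypothetical field names), not MLflow's actual trace schema:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One operation within a trace, e.g., an LLM call or a tool invocation."""
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)

@dataclass
class Trace:
    """A complete record of one request, composed of a tree of spans."""
    root: Span

    def span_count(self) -> int:
        def count(span: Span) -> int:
            return 1 + sum(count(child) for child in span.children)
        return count(self.root)

# A loan-application request: one agent span wrapping an LLM call and a tool call
trace = Trace(
    root=Span("agent", 1200.0, children=[
        Span("llm_call", 900.0),
        Span("tool:credit_check", 250.0),
    ])
)
print(trace.span_count())  # → 3
```

The nesting is the key idea: a trace is not a flat log of events but a tree, which is why Module 4 can attribute latency to a specific LLM call or tool invocation.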

Authors and contributors

Primary Author: Red Hat AI Business Unit
Authors: Roberto Carratalá, Taylor Smith
Last Updated: April 2026
Workshop Version: 1.0