Module 4: Tracing and MLflow

Metrics tell you what is happening, but traces tell you why. In this module, you’ll explore MLflow tracing to gain end-to-end visibility into the Fed Aura Capital multi-agent mortgage system. You’ll trace the requests you made in Module 1 across LangGraph agents and tool calls, closing the observability gap you experienced firsthand.

In this module, you’re wearing both hats: SREs use traces to diagnose latency and failures, while AI Developers use them to understand agent reasoning and tool behavior.

Learning objectives

By the end of this module, you’ll be able to:

  • Navigate the MLflow UI to find and analyze request traces

  • Correlate traces back to user interactions from Module 1

  • Analyze multi-agent execution timelines, tool calls, and LLM invocations

  • Understand how MLflow captures user identity, inputs, outputs, and latency

Introduction to MLflow tracing

MLflow Tracing is an OpenTelemetry-compatible LLM observability solution that captures inputs, outputs, and metadata for each step of a request. For Fed Aura Capital, this means visibility into:

  • The complete path of a loan application across all 5 agents

  • Every LLM inference call with prompt and response

  • All Model Context Protocol (MCP) tool invocations and their results

  • Agent decision points and handoffs

Key concepts

  • Trace: A complete record of a request’s journey through the system

  • Span: A single operation within a trace (e.g., one LLM call, one tool invocation)

  • Parent-child relationships: Spans form a tree showing the call hierarchy

  • Attributes: Metadata attached to spans (tokens used, model name, etc.)
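
To make the trace, span, and parent-child vocabulary concrete, here is a minimal sketch of a span tree in plain Python. The Span class, the render helper, and every span name and duration in it are illustrative assumptions; this is not MLflow's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One operation inside a trace, e.g. an LLM call or a tool invocation."""
    name: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)        # metadata such as model name
    children: list["Span"] = field(default_factory=list)  # parent-child links

    def add(self, child: "Span") -> "Span":
        """Attach a child span and return it so nesting can continue."""
        self.children.append(child)
        return child


def render(span: Span, depth: int = 0) -> list[str]:
    """Flatten the span tree into indented lines, parents before children."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# A trace is the root span plus everything beneath it (made-up values).
root = Span("request", 100.0)
agent = root.add(Span("agent", 90.0))
agent.add(Span("llm_call", 80.0, {"model": "example-model"}))
root.add(Span("tool_call", 5.0, {"tool": "example_tool"}))

print("\n".join(render(root)))
```

The indentation in the printed output mirrors the call hierarchy, which is the same idea the MLflow trace tree presents graphically later in this module.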

Exercise 1: Access the MLflow tracking server

Red Hat OpenShift AI includes MLflow as part of the platform. Let’s access the MLflow UI.

  1. Access the MLflow UI directly from the MLflow Console tab at the top of this page:

    Showroom tabs showing MLflow Console tab

    Authentication is handled via OpenShift OAuth. Log in with your OpenShift credentials: username user1, password openshift.

    Alternatively, you can access MLflow from the OpenShift Console. Click the application launcher (waffle icon) in the top-right corner. Under OpenShift Self Managed Services, click MLflow:

    OpenShift Console application launcher showing MLflow under OpenShift Self Managed Services
  2. Select your workspace from the Select workspace dropdown in the top-left corner. Choose wksp-user1. Your workspace contains the traces and experiments for your mortgage-ai app.

    MLflow UI welcome page showing workspace selection and core features
    Red Hat OpenShift AI (RHOAI) deploys a single shared MLflow instance that provides namespace-based isolation through workspaces. Each data science project (OpenShift namespace) maps to its own MLflow workspace, giving teams logically separated experiments, traces, registered models, and prompts, all while sharing a single tracking server.
  3. Once in your workspace, explore the MLflow interface:

    • Tracing: Capture and debug LLM interactions and agent workflows (our focus)

    • Evaluation: Measure and compare LLM quality with built-in and custom scorers

    • Prompts: Version control and manage prompts with aliases across teams

RHOAI 3.4 ships with MLflow 3.10.1; the bundled MLflow version may change in future releases.

How does tracing work?

When you deployed the mortgage-ai application in Module 1, tracing was already enabled. You never had to configure it manually. Every LLM call, tool invocation, and agent decision was recorded automatically. Here’s how.

MLflow autolog: one-line tracing

MLflow Tracing integrates with 40+ popular LLM and AI agent frameworks, offering a one-line automatic tracing experience. No manual span creation, no decorator boilerplate, no SDK wiring.

The mortgage-ai application uses LangChain autologging. In observability.py, the entire tracing setup is:

import mlflow
import mlflow.langchain

mlflow.set_tracking_uri(settings.MLFLOW_TRACKING_URI)
mlflow.set_experiment(settings.MLFLOW_EXPERIMENT_NAME)
mlflow.langchain.autolog()  # (1)

(1) This single line enables automatic tracing for all LangChain and LangGraph operations: every LLM call, tool invocation, and agent decision is captured as spans without any code changes to the agents themselves.

MLflow’s automatic tracing provides:

  • Zero code changes for basic observability: autolog hooks into the framework at the library level

  • Unified traces across multi-framework apps: LangChain, LangGraph, and custom tools all appear in a single trace

  • Rich metadata including inputs, outputs, token counts, model names, and latency

  • Production-ready scaling: traces are sent asynchronously to avoid blocking the application
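
One way to picture what "hooking into the framework at the library level" could mean is method patching: a framework method is swapped for a wrapper that records a span around every call, so application code stays untouched. The sketch below is conceptual only. FakeLLM, autolog_sketch, and RECORDED_SPANS are hypothetical names, and MLflow's real integration is more involved and is not reproduced here.

```python
import functools
import time

RECORDED_SPANS = []  # hypothetical stand-in for a tracing backend


class FakeLLM:
    """Hypothetical framework class standing in for a real LLM client."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


def autolog_sketch(cls, method_name: str) -> None:
    """Replace cls.method_name with a wrapper that records one span per call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        RECORDED_SPANS.append({
            "name": f"{cls.__name__}.{method_name}",
            "inputs": {"args": args, "kwargs": kwargs},
            "outputs": result,
            "duration_s": time.perf_counter() - start,
        })
        return result

    setattr(cls, method_name, wrapper)


autolog_sketch(FakeLLM, "invoke")   # the single call that switches recording on
answer = FakeLLM().invoke("hello")  # application code is unchanged
print(answer, RECORDED_SPANS[0]["name"])
```

Because the patch is applied to the class rather than to each call site, every existing and future caller of invoke is recorded without being edited.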

OpenTelemetry compatibility

MLflow Tracing is built on OpenTelemetry, the industry standard for distributed tracing. This means you can also export traces to other backends (Jaeger, Zipkin, Grafana Tempo) alongside MLflow.

Exercise 2: Find your first trace

In Module 1, you clicked Explore Products and chatted with the Prospect Agent. That conversation looked like a simple exchange, but MLflow captured every step. Let’s find that trace.

  1. In MLflow, click Experiments in the left sidebar, then select the mortgage-ai experiment. This is the experiment configured by the mortgage-ai application where all traces are recorded.

  2. Click Traces in the left sidebar. You’ll see a list of all traced requests, sorted by most recent:

    MLflow Traces list showing captured requests including Explore Products and portfolio health queries

    Each row shows the Trace ID, the Request (what the user asked), the Response (what the agent returned), Execution time, and State. Notice the trace at the bottom, "What mortgage products do you offer? I’d like…". This is the Explore Products conversation from Module 1.

  3. Click on that trace to open its Summary view:

    MLflow trace summary showing inputs with user identity and the span sequence of ChatOpenAI and tool calls

    The Summary tab reveals what was invisible in Module 1:

    • Inputs: The user context passed to the agent

    • Span sequence: The operations that executed (an initial ChatOpenAI call, tool invocations, and follow-up LLM calls)

    • Outputs: The complete response returned to the user, including metadata

Exercise 3: Close the observability gap

Remember Module 1, Exercise 3? You signed in as CEO David Park, asked "What is my portfolio health?", and got a detailed response. We pointed out the observability gap: you had no way of knowing which agents processed your question, which tools were invoked, or how long each step took.

Now let’s close that gap.

CEO Executive Dashboard showing the portfolio health question and agent response from Module 1

Find the CEO trace

  1. Return to the MLflow Traces list and locate the trace for "what is my portfolio health":

    MLflow Traces list with the portfolio health trace highlighted

    Notice the execution time: 8.07s. That’s the total end-to-end latency for a question that seemed to be answered instantly in the chat UI.

Examine user identity tracking

  1. Click on the trace to open the Summary tab:

    MLflow trace summary showing David Park user identity and the span sequence with ChatOpenAI and ceo_pipeline_summary calls

    MLflow captures the full user context with every trace:

    • user_id: d1a2b3c4-e5f6-7890-abcd-ef1234567804, the authenticated user’s UUID

    • user_email: david.park@example.com, traceable identity

    • user_name: David Park, the CEO persona

    Below the inputs, the span sequence shows the agent’s execution plan: ChatOpenAI was called (the LLM decides what to do), then ceo_pipeline_summary was called (the tool fetches data), then ChatOpenAI was called again (the LLM formulates the response).

    If you don’t see a tool call (e.g., ceo_pipeline_summary) in the span sequence, go back to the Mortgage AI App and ask the same question again. LLMs are non-deterministic: even with an optimized system prompt, the model may occasionally answer from its training data without invoking the tool. Repeat until you see a trace that includes the tool call, then continue with the next steps.

Explore the execution timeline

  1. Click the Details & Timeline tab, then click Show execution timeline to see the full trace breakdown:

    MLflow Details and Timeline view showing the full execution tree from LangGraph through input shield agent ChatOpenAI tool calls and output shield

    The trace tree shows the complete execution path: LangGraph → input_shield → agent → ChatOpenAI (1.26s, decides to call a tool) → tool_auth → ceo_pipeline_summary (17ms, fetches pipeline data) → agent → ChatOpenAI (6.44s, composes the response) → output_shield. Now you can answer every question from Module 1’s observability challenge.

    Full span breakdown

    Span                          Duration   What it does
    LangGraph                     8.08s      Root orchestrator: manages the full agent workflow
    input_shield                  26.45ms    Safety filter: checks the input for harmful content
    agent (first)                 1.33s      LangGraph agent node: routes to the LLM
    ChatOpenAI (first)            1.26s      LLM call: decides to call the ceo_pipeline_summary tool
    should_continue               0.75ms     Decision point: should the agent keep processing?
    tool_auth                     21.50ms    Authorization check: does the CEO role have access to this tool?
    tools → ceo_pipeline_summary  17.11ms    Tool execution: queries the database for pipeline data
    agent (second)                6.52s      LangGraph agent node: routes the tool result back to the LLM
    ChatOpenAI (second)           6.44s      LLM call: formulates a natural-language response from the raw pipeline data
    output_shield                            Safety filter: checks the output before returning to the user
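
As a quick worked example of reading the breakdown, the arithmetic below estimates where the 8.08s went. It is a minimal sketch that copies the durations listed above into a dictionary by hand; the variable and function names are assumptions, and output_shield is left out because no duration is listed for it.

```python
# Durations in seconds, copied by hand from the span breakdown.
# output_shield is omitted: the breakdown lists no duration for it.
TOTAL_S = 8.08  # LangGraph root span

leaf_spans = {
    "input_shield": 0.02645,
    "ChatOpenAI (first)": 1.26,
    "should_continue": 0.00075,
    "tool_auth": 0.02150,
    "ceo_pipeline_summary": 0.01711,
    "ChatOpenAI (second)": 6.44,
}


def share(names):
    """Return the fraction of total latency spent in the named spans."""
    return sum(leaf_spans[name] for name in names) / TOTAL_S


llm_share = share(["ChatOpenAI (first)", "ChatOpenAI (second)"])
tool_share = share(["tool_auth", "ceo_pipeline_summary"])
slowest = max(leaf_spans, key=leaf_spans.get)

print(f"LLM calls: {llm_share:.1%}")
print(f"Tool auth + execution: {tool_share:.1%}")
print(f"Slowest span: {slowest}")
```

With these figures the two LLM calls account for roughly 95% of the end-to-end latency, while tool authorization and execution together stay under 1%. For this trace, an SRE chasing latency would look at model serving first, not at the database query.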

Inspect the tool call

  1. In the trace tree, click on the ceo_pipeline_summary span to see the tool’s inputs and outputs:

    MLflow span detail for ceo_pipeline_summary showing input days=90 and output with pipeline summary data
    • Inputs: days: 90, the tool queries a 90-day window of pipeline data

    • Outputs: "Pipeline Summary (90-day window): Total active applications: 38…​", the raw data the LLM used to compose its response

    This is where the agent got its numbers. If the CEO’s response ever contains incorrect data, you can trace it directly to this tool’s output and verify whether the issue is in the tool logic or the LLM’s interpretation.
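
The tool-versus-LLM check described above can be roughly automated by comparing the numbers in the two texts. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the sample strings are made up for illustration (loosely modeled on the output shown, not real trace data), and a plain regex will not handle formatted values such as "1,200" or "38%".

```python
import re


def numbers_in(text: str) -> set[str]:
    """Extract integer and decimal literals from text as strings."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def unsupported_numbers(tool_output: str, llm_response: str) -> set[str]:
    """Numbers stated by the LLM that do not appear in the tool output."""
    return numbers_in(llm_response) - numbers_in(tool_output)


# Hypothetical sample strings for illustration only.
tool_output = "Pipeline Summary (90-day window): Total active applications: 38"
good = "Over the last 90 days you have 38 active applications."
bad = "Over the last 90 days you have 83 active applications."

print(unsupported_numbers(tool_output, good))
print(unsupported_numbers(tool_output, bad))
```

An empty result means every figure in the response is backed by the tool output, so any wrong data would point at the tool logic. A non-empty result points at the LLM's interpretation.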

Inspect the LLM call

  1. Click on the final ChatOpenAI span to see the actual LLM invocation:

    MLflow span detail for ChatOpenAI showing model gpt-oss-120b with system prompt and completion
    • Model: gpt-oss-120b, the exact model serving this request

    • Inputs → messages: The system prompt ("You are the Fed Aura Capital executive assistant") and the conversation context including the tool result

    • Outputs → choices: The complete LLM response before any post-processing

    This level of detail is essential for debugging: you can see the exact prompt the model received, which model version handled it, and the raw completion before safety filters and formatting.

Keep exploring

Try signing in as other personas (Borrower, Loan Officer, Underwriter) and chatting with the agent for each one. Then return to MLflow: every conversation appears automatically as a new trace. Compare how different personas use different tools and models.

Exercise 4: Explore user sessions

Many real-world AI applications use sessions to maintain multi-turn user interactions. MLflow Tracing provides built-in support for associating traces with users and grouping them into sessions. Tracking users and sessions in your LLM application or AI agent provides essential context for understanding user behavior, analyzing conversation flows, and improving personalization.

  1. In MLflow, click Sessions in the left sidebar under Observability:

    MLflow left sidebar showing Sessions under Observability
  2. In the mortgage-ai application, each chat conversation is tracked as a session. MLflow groups all turns within a session together, letting you see the full conversation flow:

MLflow Chat Sessions view showing a session with 7 turns grouped by user and agent

Clicking into a session reveals each turn with its inputs and outputs. You can click View full trace on any turn to jump directly to the detailed trace view:

MLflow session detail showing Turn 1 with inputs outputs and View full trace link

This is especially valuable for debugging multi-turn conversations where agent behavior degrades over time. You can see the full context the agent had at each step.
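
Conceptually, a session view is a group-by over traces that share a session identifier. The sketch below shows that grouping in plain Python. The trace records are hypothetical and far simpler than real traces. The metadata key mlflow.trace.session is the reserved key named in MLflow's user and session tracking documentation; whether the mortgage-ai application sets exactly this key is an assumption.

```python
from collections import defaultdict

# Reserved metadata key from MLflow's user/session tracking docs (assumed in use here).
SESSION_KEY = "mlflow.trace.session"

# Hypothetical trace records; real traces carry many more fields.
traces = [
    {"trace_id": "t1", "timestamp": 3, "metadata": {SESSION_KEY: "s-a"}},
    {"trace_id": "t2", "timestamp": 1, "metadata": {SESSION_KEY: "s-a"}},
    {"trace_id": "t3", "timestamp": 2, "metadata": {SESSION_KEY: "s-b"}},
    {"trace_id": "t4", "timestamp": 4, "metadata": {}},  # no session attached
]


def group_by_session(records):
    """Map session ID to its trace IDs in timestamp order (one per turn)."""
    sessions = defaultdict(list)
    for record in sorted(records, key=lambda r: r["timestamp"]):
        session_id = record["metadata"].get(SESSION_KEY)
        if session_id is not None:
            sessions[session_id].append(record["trace_id"])
    return dict(sessions)


print(group_by_session(traces))
```

Ordering the turns by timestamp inside each group is what lets you replay a conversation and see the context the agent had at each step.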

Module summary

What you accomplished:

  • Accessed MLflow and found traces from your Module 1 interactions

  • Analyzed the CEO trace end-to-end: input shields, LLM calls, tool authorization, tool execution, and output shields

  • Inspected individual spans to see tool inputs/outputs and LLM prompts/completions

  • Explored sessions to follow multi-turn conversations turn by turn

Key takeaways:

  • MLflow autolog provides end-to-end tracing with no changes to agent code: a single mlflow.langchain.autolog() call enables it

  • Traces capture user identity, execution timelines, and tool/LLM details, closing the observability gap for multi-agent systems

  • Individual span inspection lets you pinpoint issues to specific components: tool logic, LLM prompts, or safety filters

Next steps:

Module 5 will introduce LLM Evaluations (Evals), combining tracing with quality assessment to ensure your agents maintain high-quality outputs alongside system reliability.