Module 3: Metrics and logs for agentic applications

Metrics tell you what is happening in your system. Logs tell you exactly what happened, and when. In this module, you’ll explore a pre-deployed Grafana dashboard that monitors the mortgage-ai multi-agent system, then investigate application logs using LokiStack in the OpenShift Console.

In this module, you’re wearing your SRE / Platform Engineering hat, focusing on system health, request rates, error rates, and operational logs.

Learning objectives

By the end of this module, you’ll be able to:

  • Identify the key metrics that matter for multi-agent AI systems

  • Navigate the pre-deployed Mortgage-AI Grafana dashboard

  • Explore and filter application logs using LokiStack and Log Query Language (LogQL)

  • Explain how metrics and logs work together for agentic observability

What makes agentic metrics different?

Traditional applications need request rate, error rate, and latency. Agentic applications need all of that plus metrics that capture LLM behavior, agent routing, and tool execution. Here are the 6 metrics that matter most:

| Metric | Prometheus Name | What It Tells You |
|---|---|---|
| Request Rate & Errors | http_requests_total | How much traffic are we getting? How many requests are failing? |
| Latency Percentiles | http_request_duration_seconds | How fast are we responding? Are there outliers? |
| LLM Token Usage | llm_tokens_total | How many tokens are we burning? What’s the cost per conversation? |
| LLM Inference Latency | llm_inference_duration_seconds | How long does the model take to respond? Is it the bottleneck? |
| Tool Call Success Rate | tool_calls_total | Are agent tools working? Which ones are failing? |
| Agent Routing | agent_routing_total | Are queries routing to the fast or capable model? Is the classifier working? |

Agentic Metrics - The 6 That Matter showing traditional metrics plus agentic additions with model routing and Prometheus scraping by OpenShift UWM

The mortgage-ai API exposes all these at GET /metrics in Prometheus format. OpenShift User Workload Monitoring scrapes them automatically.
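The exposition format is plain text, so you can inspect it with curl or a few lines of Python. Below is a minimal parsing sketch; the sample payload, model name, and label values are illustrative, not actual mortgage-ai output:

```python
# Sketch: parse Prometheus exposition-format text, such as a GET /metrics
# response. SAMPLE is a hypothetical payload; real label sets come from
# packages/api/src/core/metrics.py. Note: this naive parser would break on
# label values containing commas; it is for illustration only.
SAMPLE = """\
# HELP llm_tokens_total Total LLM tokens consumed
# TYPE llm_tokens_total counter
llm_tokens_total{model="granite-8b",direction="input",persona="prospect"} 1200.0
llm_tokens_total{model="granite-8b",direction="output",persona="prospect"} 450.0
"""

def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        name_labels, value = line.rsplit(" ", 1)
        name, _, labels_part = name_labels.partition("{")
        labels = {}
        if labels_part:
            for pair in labels_part.rstrip("}").split(","):
                key, _, val = pair.partition("=")
                labels[key] = val.strip('"')
        yield name, labels, float(value)

total = sum(v for n, lab, v in parse_samples(SAMPLE) if n == "llm_tokens_total")
print(total)  # 1650.0
```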

Complete metrics reference (click to expand)

The mortgage-ai backend exposes 10 custom metrics defined in packages/api/src/core/metrics.py:

| Prometheus Name | Type | Labels |
|---|---|---|
| llm_tokens_total | Counter | model, direction (input/output), persona |
| llm_inference_duration_seconds | Histogram | model, persona |
| agent_routing_total | Counter | persona, complexity (simple/complex) |
| agent_escalation_total | Counter | persona, reason |
| tool_calls_total | Counter | tool_name, persona, status (success/error) |
| tool_call_duration_seconds | Histogram | tool_name, persona |
| active_chat_sessions | Gauge | persona |
| chat_messages_total | Counter | persona, direction (inbound/outbound) |
| loan_applications_total | Counter | status, loan_type (not yet instrumented) |
| compliance_checks_total | Counter | check_type, result (not yet instrumented) |

Additionally, prometheus-fastapi-instrumentator provides automatic HTTP metrics (http_requests_total, http_request_duration_seconds_bucket).
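The three metric types in the table have distinct semantics. Here is a minimal Python sketch of those semantics, for illustration only; it is not the prometheus_client implementation that metrics.py actually uses:

```python
from collections import defaultdict

class Counter:
    """Monotonically increasing, one value per label combination
    (e.g. tool_calls_total)."""
    def __init__(self):
        self.values = defaultdict(float)
    def inc(self, amount=1.0, **labels):
        self.values[tuple(sorted(labels.items()))] += amount

class Gauge:
    """Can go up and down (e.g. active_chat_sessions)."""
    def __init__(self):
        self.value = 0.0
    def inc(self): self.value += 1
    def dec(self): self.value -= 1

class Histogram:
    """Counts observations into cumulative buckets, Prometheus-style:
    each observation increments every bucket whose bound it fits under,
    plus the implicit +Inf bucket (e.g. *_duration_seconds metrics)."""
    def __init__(self, buckets=(0.5, 1.0, 2.5, 5.0)):
        self.buckets = buckets
        self.counts = [0] * (len(buckets) + 1)  # last slot is +Inf
    def observe(self, value):
        for i, le in enumerate(self.buckets):
            if value <= le:
                self.counts[i] += 1
        self.counts[-1] += 1

tool_calls_total = Counter()
tool_calls_total.inc(tool_name="rate_lookup", persona="prospect", status="success")
active_chat_sessions = Gauge()
active_chat_sessions.inc()
```

The label-combination bookkeeping in Counter is why high-cardinality labels (like user IDs) are avoided: every distinct combination becomes its own time series.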

Exercise 1: Explore the Mortgage-AI Grafana dashboard

You can access the Grafana dashboard directly from the Grafana tab at the top of this page, without navigating through the OpenShift Console.

A Grafana instance is pre-deployed in your workspace namespace with a Mortgage-AI Backend Metrics dashboard already configured. It connects to 2 Prometheus datasources:

  • User Workload Monitoring: scrapes the mortgage-ai application metrics (default)

  • Red Hat OpenShift AI (RHOAI) Prometheus: provides model serving metrics from OpenShift AI

Let’s open it and see what’s there.

  1. Open Grafana using the application launcher (waffle icon) in the top-right corner of the OpenShift Console and click Grafana (wksp-user1) under Third Party Services:

    OpenShift Console application launcher showing Grafana under Third Party Services

    You can also find it in Networking > Routes in the wksp-user1 project and click the grafana-route Location link:

    OpenShift Console Routes page showing grafana-route in the wksp-user1 project

    Alternatively, from the terminal:

    oc login --insecure-skip-tls-verify $(oc whoami --show-server) -u user1 -p openshift
    GRAFANA_ROUTE=$(oc get route -n wksp-user1 grafana-route -o jsonpath='{.spec.host}')
    echo "Grafana: https://${GRAFANA_ROUTE}"
  2. Grafana uses OpenShift OAuth for authentication. When prompted, log in with your OpenShift credentials: username user1, password openshift:

    OpenShift SSO login page for Grafana authentication
  3. Navigate to the Mortgage-AI Backend Metrics dashboard.

Overview and HTTP panels

The top of the dashboard gives you an at-a-glance view of system health:

Grafana Overview and HTTP panels showing request rate and latency

What you’re seeing:

  • Total Requests (Last Hour): how many requests hit the API. In the screenshot, ~983 requests.

  • Request Rate: current throughput in requests per second.

  • Error Rate (%): percentage of 5xx responses. Green means healthy.

  • P95 Latency: 95% of requests complete faster than this value. Under 100ms is excellent.

Below the overview, the HTTP panels break down traffic by endpoint and status code. Look for:

  • Which endpoint gets the most traffic (likely /api/prospect/chat)

  • Any yellow (4xx) or red (5xx) bars in the HTTP Status Distribution

  • Which endpoints have the highest P95 latency

Prometheus Query Language (PromQL) details for Overview panels (click to expand)
  • Request Rate: sum(rate(http_requests_total{namespace=~"$namespace"}[5m]))

  • Error Rate: (sum(rate(http_requests_total{namespace=~"$namespace", status=~"5.."}[5m])) / sum(rate(http_requests_total{namespace=~"$namespace"}[5m])) * 100) or vector(0)

  • P95 Latency: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{namespace=~"$namespace"}[5m])) by (le))

rate() calculates per-second averages over a time window. histogram_quantile() computes percentiles from histogram buckets.
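To see what histogram_quantile() does under the hood, here is a simplified Python sketch of the bucket interpolation (the bucket counts are illustrative; Prometheus likewise interpolates linearly within the target bucket):

```python
def histogram_quantile(q, buckets):
    """buckets: list of (upper_bound, cumulative_count) pairs, sorted
    ascending and ending with float('inf'), mirroring Prometheus's
    cumulative _bucket series. Returns the q-quantile estimate."""
    total = buckets[-1][1]
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                return prev_le  # cannot interpolate into the +Inf bucket
            # linear interpolation inside the bucket containing the rank
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count

# Example: 100 requests; 90 finished under 0.1s, 99 under 0.25s
p95 = histogram_quantile(0.95, [(0.1, 90), (0.25, 99), (float("inf"), 100)])
print(round(p95, 4))  # 0.1833
```

This is also why bucket boundaries matter: the P95 here is an estimate inside the 0.1s-0.25s bucket, not an exact measurement.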

AI agent metrics

Scroll down to the AI Agent Metrics row. This is where it gets interesting. These metrics are unique to agentic applications:

Grafana AI Agent Metrics showing token usage and tool call success rate

The 4 stat panels at the top show:

  • Active Chat Sessions: live WebSocket connections. How many users are chatting right now?

  • Total LLM Tokens: cumulative tokens consumed since the API started. This is your cost proxy; in the screenshot, 18.5K tokens.

  • Tool Call Success Rate: are agent tools working? 100% means everything is healthy.

  • Avg LLM Latency: the median (P50) time for an LLM inference call. 1.75s in the screenshot.

Below the stats, the detail panels show:

  • LLM Token Usage by Model: which model burns the most tokens? Is input or output dominant?

  • LLM Inference Latency by Persona: which agent persona is slowest? A slow underwriter agent might indicate complex compliance tool calls.

  • Agent Routing Distribution: pie chart of simple vs complex routing. A high ratio of complex queries means the capable model is doing most of the work.

  • Tool Calls by Status: green for success, red for failures. A spike in red means tool integrations are breaking.
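Because llm_tokens_total is your cost proxy, you can turn panel values into a dollar estimate. A sketch with hypothetical per-token prices and token counts; substitute your provider's real rates and the values from the LLM Token Usage panel:

```python
# Hypothetical USD prices per 1K tokens; input and output are usually
# priced differently, which is why llm_tokens_total carries a
# direction label.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}

def conversation_cost(tokens_by_direction):
    """Estimate cost from token counts keyed by direction."""
    return sum(tokens_by_direction[d] / 1000 * PRICE_PER_1K[d]
               for d in ("input", "output"))

# Illustrative counts for one busy conversation
cost = conversation_cost({"input": 10000, "output": 4000})
print(f"estimated cost: ${cost:.4f}")
```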

PromQL details for AI Agent Metrics (click to expand)
  • LLM Tokens: sum(increase(llm_tokens_total{namespace=~"$namespace"}[1h]))

  • Tool Success Rate: (sum(tool_calls_total{namespace=~"$namespace", status="success"}) / sum(tool_calls_total{namespace=~"$namespace"}) * 100) or vector(100)

  • LLM Latency: histogram_quantile(0.50, sum(rate(llm_inference_duration_seconds_bucket{namespace=~"$namespace"}[5m])) by (le))

  • Routing Distribution: sum(agent_routing_total{namespace=~"$namespace"}) by (complexity)

The dashboard is managed as a GrafanaDashboard custom resource and version-controlled via GitOps. Run oc get grafanadashboard -n wksp-user1 to see it.

The traffic you generated in Module 1 (Explore Products, CEO portfolio health) is already reflected in these panels. Set the time range to Last 1 hour to see it.

Exercise 2: Explore application logs with LokiStack

Metrics show aggregated health, but when something goes wrong you need the actual log events. OpenShift’s LokiStack automatically collects everything your application writes to STDOUT or STDERR, no code changes needed. Let’s explore the mortgage-ai logs.

Navigate to the log viewer

  1. Return to the OpenShift Console (use the OCP Console tab at the top of this page). Navigate to Observe > Logs.

    You’ll see a Forbidden — Missing permissions to get logs error:

    OpenShift Console Observe Logs showing Forbidden error for regular user

    This is expected. Because you’re logged in as a regular user (user1), OpenShift restricts log access to only your namespace for security. Cluster-wide log access is reserved for cluster administrators.

Filter to mortgage-ai

  1. Click Show Query to open the query editor:

    OpenShift Console Logs Show Query button

    Enter this LogQL query to scope the logs to your namespace:

    { log_type="application", kubernetes_namespace_name="wksp-user1" }

    Click Run Query. You should now see only logs from the mortgage-ai pods.

    log_type="application" filters out infrastructure logs (kubelet, CRI-O). The kubernetes_namespace_name label is injected automatically by the log collector.

Explore the UI logs

  1. Narrow the view to the UI container and filter for /api/ calls. This strips out health probes and static assets, showing only real user interactions:

    { log_type="application", kubernetes_namespace_name="wksp-user1", kubernetes_container_name="ui" } |= "/api/"

    You’ll see nginx access logs like:

    10.131.0.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/chat HTTP/1.1" 101 2342
    10.128.2.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/applications/2590/conditions HTTP/1.1" 200 1097
    10.131.0.2  - - [08/Apr/2026:00:26:50 +0000] "GET / HTTP/1.1" 200 771 "-" "kube-probe/1.33"

    Notice the mix of real user traffic (/api/chat, /api/applications/…) and Kubernetes health probes (kube-probe). The UI proxies API calls, so you see the complete request flow from the user’s browser.
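If you export these lines for offline analysis, a small parser can separate user traffic from probes. A sketch using the access-log lines shown above (the regex assumes nginx's default request-line and status layout):

```python
import re

# Pull method, path, and status out of nginx access-log lines, and flag
# Kubernetes health probes by their user agent.
LOG_RE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]+" (?P<status>\d{3})')

lines = [
    '10.131.0.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/chat HTTP/1.1" 101 2342',
    '10.131.0.2  - - [08/Apr/2026:00:26:50 +0000] "GET / HTTP/1.1" 200 771 "-" "kube-probe/1.33"',
]

for line in lines:
    m = LOG_RE.search(line)
    kind = "probe" if "kube-probe" in line else "user"
    print(m["method"], m["path"], m["status"], kind)
```

Status 101 on /api/chat is the WebSocket upgrade for a chat session, not an error.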

Filter for specific traffic

  1. Add a pipeline filter to find only chat-related entries:

    { log_type="application", kubernetes_namespace_name="wksp-user1", kubernetes_container_name="ui" } |= "/api/chat"
    LogQL query filtering mortgage-ai UI logs for chat activity

    The |= operator is a case-sensitive substring filter. Try also |= "404" to find failed requests, or |= "500" to find server errors.

    After running the query, you’ll see the matching log entries with full metadata:

    OpenShift Console Logs showing filtered chat activity results

The traffic you generated in Module 1 should already appear in these logs. WebSocket connections (chat sessions) show as HTTP 101 upgrades.
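Conceptually, a LogQL query is a stream selector (label matchers) followed by line filters. A Python sketch of those semantics; the log entries below are illustrative, not real Loki data:

```python
def logql(entries, selector, contains=None):
    """entries: list of (labels_dict, log_line) pairs.
    selector: label values that must all match (stream selector).
    contains: substring that kept lines must include, mirroring the
    case-sensitive |= line filter."""
    out = [line for labels, line in entries
           if all(labels.get(k) == v for k, v in selector.items())]
    if contains is not None:
        out = [line for line in out if contains in line]
    return out

entries = [
    ({"kubernetes_container_name": "ui"},  '"GET /api/chat HTTP/1.1" 101'),
    ({"kubernetes_container_name": "ui"},  '"GET / HTTP/1.1" 200'),
    ({"kubernetes_container_name": "api"}, 'INFO: "GET /api/chat" 200 OK'),
]

hits = logql(entries, {"kubernetes_container_name": "ui"}, contains="/api/chat")
print(hits)  # only the UI container's /api/chat line
```

The key performance point this models: label selectors narrow which streams Loki reads at all, while |= scans line content, so always select labels first and filter text second.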

Compare with API container logs (click to expand)

The API container (api) shows the server-side processing: FastAPI access logs, MLflow warnings, and agent responses:

{ log_type="application", kubernetes_namespace_name="wksp-user1", kubernetes_container_name="api" }

Sample output:

INFO:  10.131.1.212:45042 - "GET /api/applications/2590/risk-assessment HTTP/1.1" 200 OK
WARNING mlflow.utils.autologging_utils: Encountered unexpected error during autologging...

You can see the same /api/ requests in both the UI and API logs. The UI shows the incoming request, and the API shows how it was processed.

In production, containers are ephemeral. When a pod restarts or gets redeployed, its local logs vanish. LokiStack persists logs beyond pod lifetime, making post-incident investigation possible. For multi-agent systems, correlating logs across API, UI, and database containers is essential for understanding end-to-end request flow.

Module summary

What you accomplished:

  • Explored the Grafana dashboard with HTTP and AI-specific agent metrics

  • Filtered application logs with LokiStack and LogQL queries

  • Correlated log entries across API and UI containers

Key takeaways:

  • Agentic apps need metrics beyond HTTP: token usage, inference latency, routing decisions, and tool success rates

  • Logs provide the detail that metrics aggregate away. This detail is essential for debugging ephemeral containers.

  • Metrics answer "how much?" and "how fast?". Logs answer "what exactly happened?"

Next steps:

Module 4 will introduce MLflow tracing, the key to understanding why things happen, not just what is happening.