Module 3: Metrics and logs for agentic applications
Metrics tell you what is happening in your system. Logs tell you what exactly happened and when. In this module, you’ll explore a pre-deployed Grafana dashboard that monitors the mortgage-ai multi-agent system, then investigate application logs using LokiStack in the OpenShift Console.
| In this module, you’re wearing your SRE / Platform Engineering hat, focusing on system health, request rates, error rates, and operational logs. |
Learning objectives
By the end of this module, you’ll be able to:
- Identify the key metrics that matter for multi-agent AI systems
- Navigate the pre-deployed Mortgage-AI Grafana dashboard
- Explore and filter application logs using LokiStack and Log Query Language (LogQL)
- Explain how metrics and logs work together for agentic observability
What makes agentic metrics different?
Traditional applications need request rate, error rate, and latency. Agentic applications need all of that plus metrics that capture LLM behavior, agent routing, and tool execution. Here are the 6 metrics that matter most:
| Metric | Prometheus Name | What It Tells You |
|---|---|---|
| Request Rate & Errors | `http_requests_total` | How much traffic are we getting? How many requests are failing? |
| Latency Percentiles | `http_request_duration_seconds` | How fast are we responding? Are there outliers? |
| LLM Token Usage | `llm_tokens_total` | How many tokens are we burning? What’s the cost per conversation? |
| LLM Inference Latency | `llm_inference_duration_seconds` | How long does the model take to respond? Is it the bottleneck? |
| Tool Call Success Rate | `tool_calls_total` | Are agent tools working? Which ones are failing? |
| Agent Routing | `agent_routing_total` | Are queries routing to the fast or capable model? Is the classifier working? |
The mortgage-ai API exposes all these at GET /metrics in Prometheus format. OpenShift User Workload Monitoring scrapes them automatically.
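To make the exposition format concrete, here is a hand-rolled sketch of the plain-text payload a Prometheus `GET /metrics` endpoint returns (in practice a client library such as prometheus-fastapi-instrumentator generates this for you). The metric name comes from the dashboard queries in this module; the label names and sample values are invented for illustration:

```python
# Minimal sketch of the Prometheus text exposition format served at
# GET /metrics. Label names and values below are illustrative only.

def render_counter(name, help_text, samples):
    """samples: list of (label_dict, value) pairs for one counter."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)

exposition = render_counter(
    "llm_tokens_total", "LLM tokens consumed",
    [({"model": "granite", "direction": "input"}, 120),
     ({"model": "granite", "direction": "output"}, 85)],
)
print(exposition)
```

Each scrape, Prometheus reads this text, records the current counter values, and later computes rates and sums over the stored time series.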
Complete metrics reference (click to expand)
The mortgage-ai backend exposes 10 custom metrics defined in packages/api/src/core/metrics.py: seven Counters, two Histograms, and one Gauge. The ones queried by the dashboard panels in this module include:

| Prometheus Name | Type | Labels (as used in the dashboard) |
|---|---|---|
| `llm_tokens_total` | Counter | `model` |
| `llm_inference_duration_seconds` | Histogram | `persona` |
| `tool_calls_total` | Counter | `status` |
| `agent_routing_total` | Counter | `complexity` |
Additionally, prometheus-fastapi-instrumentator provides automatic HTTP metrics (http_requests_total, http_request_duration_seconds_bucket).
Exercise 1: Explore the Mortgage-AI Grafana dashboard
| You can access the Grafana dashboard directly from the Grafana tab at the top of this page, without navigating through the OpenShift Console. |
A Grafana instance is pre-deployed in your workspace namespace with a Mortgage-AI Backend Metrics dashboard already configured. It connects to 2 Prometheus datasources:
- User Workload Monitoring: scrapes the mortgage-ai application metrics (default)
- Red Hat OpenShift AI (RHOAI) Prometheus: provides model serving metrics from OpenShift AI
Let’s open it and see what’s there.
- Open Grafana using the application launcher (waffle icon) in the top-right corner of the OpenShift Console and click Grafana (wksp-user1) under Third Party Services.

  You can also find it in Networking > Routes in the `wksp-user1` project and click the `grafana-route` Location link. Alternatively, from the terminal:

  ```shell
  oc login --insecure-skip-tls-verify $(oc whoami --show-server) -u user1 -p openshift
  GRAFANA_ROUTE=$(oc get route -n wksp-user1 grafana-route -o jsonpath='{.spec.host}')
  echo "Grafana: https://${GRAFANA_ROUTE}"
  ```

- Grafana uses OpenShift OAuth for authentication. When prompted, log in with your OpenShift credentials: username `user1`, password `openshift`.

- Navigate to the Mortgage-AI Backend Metrics dashboard.
Overview and HTTP panels
The top of the dashboard gives you an at-a-glance view of system health:
What you’re seeing:
- Total Requests (Last Hour): how many requests hit the API. In the screenshot, ~983 requests.
- Request Rate: current throughput in requests per second.
- Error Rate (%): percentage of 5xx responses. Green means healthy.
- P95 Latency: 95% of requests complete faster than this value. Under 100ms is excellent.
Below the overview, the HTTP panels break down traffic by endpoint and status code. Look for:
- Which endpoint gets the most traffic (likely `/api/prospect/chat`)
- Any yellow (4xx) or red (5xx) bars in the HTTP Status Distribution
- Which endpoints have the highest P95 latency
Prometheus Query Language (PromQL) details for Overview panels (click to expand)
- Request Rate: `sum(rate(http_requests_total{namespace=~"$namespace"}[5m]))`
- Error Rate: `(sum(rate(http_requests_total{namespace=~"$namespace", status=~"5.."}[5m])) / sum(rate(http_requests_total{namespace=~"$namespace"}[5m])) * 100) or vector(0)`
- P95 Latency: `histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{namespace=~"$namespace"}[5m])) by (le))`
rate() calculates per-second averages over a time window. histogram_quantile() computes percentiles from histogram buckets.
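To see what histogram_quantile() actually does with those buckets, here is a simplified Python sketch of the core algorithm: find the first cumulative bucket containing the target rank, then linearly interpolate within it. The bucket data is invented for illustration, and edge cases Prometheus handles (empty histograms, negative bounds) are omitted:

```python
# Simplified sketch of Prometheus's histogram_quantile(): cumulative
# buckets + linear interpolation inside the bucket holding the rank.
import math

def histogram_quantile(q, buckets):
    """buckets: sorted (upper_bound, cumulative_count) pairs; last bound is +Inf."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into the +Inf bucket
            # Linear interpolation between this bucket's bounds
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# 100 requests: 60 finished under 50ms, 90 under 100ms, all under 250ms
buckets = [(0.05, 60), (0.1, 90), (0.25, 100), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))  # P95 falls in the 100-250ms bucket
```

Because only bucket boundaries are stored, the result is an estimate: its accuracy depends on how finely the histogram's buckets are spaced around the percentile you ask for.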
AI agent metrics
Scroll down to the AI Agent Metrics row. This is where it gets interesting. These metrics are unique to agentic applications:
The 4 stat panels at the top show:
- Active Chat Sessions: live WebSocket connections. How many users are chatting right now?
- Total LLM Tokens: cumulative tokens consumed since the API started. This is your cost proxy; in the screenshot, 18.5K tokens.
- Tool Call Success Rate: are agent tools working? 100% means everything is healthy.
- Avg LLM Latency: the median (P50) time for an LLM inference call; 1.75s in the screenshot.
Below the stats, the detail panels show:
- LLM Token Usage by Model: which model burns the most tokens? Is input or output dominant?
- LLM Inference Latency by Persona: which agent persona is slowest? A slow underwriter agent might indicate complex compliance tool calls.
- Agent Routing Distribution: pie chart of simple vs complex routing. A high ratio of complex queries means the capable model is doing most of the work.
- Tool Calls by Status: green for success, red for failures. A spike in red means tool integrations are breaking.
PromQL details for AI Agent Metrics (click to expand)
- LLM Tokens: `sum(increase(llm_tokens_total{namespace=~"$namespace"}[1h]))`
- Tool Success Rate: `(sum(tool_calls_total{namespace=~"$namespace", status="success"}) / sum(tool_calls_total{namespace=~"$namespace"}) * 100) or vector(100)`
- LLM Latency: `histogram_quantile(0.50, sum(rate(llm_inference_duration_seconds_bucket{namespace=~"$namespace"}[5m])) by (le))`
- Routing Distribution: `sum(agent_routing_total{namespace=~"$namespace"}) by (complexity)`
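The `sum(...) by (complexity)` aggregation above groups counter samples by one label and sums away the rest. A minimal Python sketch of the same operation, with invented sample data (the `pod` label and the values are illustrative):

```python
# Sketch of PromQL's `sum(agent_routing_total) by (complexity)`:
# group scraped counter samples by one label, summing the others away.
from collections import defaultdict

samples = [  # (labels, value) pairs as scraped from /metrics
    ({"complexity": "simple", "pod": "api-1"}, 42),
    ({"complexity": "simple", "pod": "api-2"}, 13),
    ({"complexity": "complex", "pod": "api-1"}, 7),
]

by_complexity = defaultdict(float)
for labels, value in samples:
    by_complexity[labels["complexity"]] += value

print(dict(by_complexity))
```

The pie chart in the dashboard is just these per-complexity sums rendered as slices.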
The dashboard is managed as a GrafanaDashboard custom resource and version-controlled via GitOps. Run oc get grafanadashboard -n wksp-user1 to see it.
| The traffic you generated in Module 1 (Explore Products, CEO portfolio health) is already reflected in these panels. Set the time range to Last 1 hour to see it. |
Exercise 2: Explore application logs with LokiStack
Metrics show aggregated health, but when something goes wrong you need the actual log events. OpenShift’s LokiStack automatically collects everything your application writes to STDOUT or STDERR, no code changes needed. Let’s explore the mortgage-ai logs.
Navigate to the log viewer
- Return to the OpenShift Console (use the OCP Console tab at the top of this page). Navigate to Observe → Logs.

  You’ll see a Forbidden — Missing permissions to get logs error:

  This is expected. Because you’re logged in as a regular user (`user1`), OpenShift restricts log access to your own namespace for security. Cluster-wide log access is reserved for cluster administrators.
Filter to mortgage-ai
- Click Show Query to open the query editor:

  Enter this LogQL query to scope the logs to your namespace:

  ```logql
  { log_type="application", kubernetes_namespace_name="wksp-user1" }
  ```

  Click Run Query. You should now see only logs from the mortgage-ai pods.

  `log_type="application"` filters out infrastructure logs (kubelet, CRI-O). The `kubernetes_namespace_name` label is injected automatically by the log collector.
Explore the UI logs
- Narrow the view to the UI container and filter for `/api/` calls. This strips out health probes and static assets, showing only real user interactions:

  ```logql
  { log_type="application", kubernetes_namespace_name="wksp-user1", kubernetes_container_name="ui" } |= "/api/"
  ```

  You’ll see nginx access logs like:

  ```
  10.131.0.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/chat HTTP/1.1" 101 2342
  10.128.2.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/applications/2590/conditions HTTP/1.1" 200 1097
  10.131.0.2 - - [08/Apr/2026:00:26:50 +0000] "GET / HTTP/1.1" 200 771 "-" "kube-probe/1.33"
  ```

  Notice the mix of real user traffic (`/api/chat`, `/api/applications/…`) and Kubernetes health probes (kube-probe). The UI proxies API calls, so you see the complete request flow from the user’s browser.
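If you wanted to post-process these access logs outside of Loki, a few lines of Python can separate real user traffic from probe noise. A sketch, reusing the example lines above (the regex covers only the fields we need, not the full nginx log format):

```python
# Sketch: drop kube-probe health checks from nginx access logs and
# extract method, path, and status from the remaining lines.
import re

LOG_LINE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

lines = [
    '10.131.0.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/chat HTTP/1.1" 101 2342',
    '10.128.2.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/applications/2590/conditions HTTP/1.1" 200 1097',
    '10.131.0.2 - - [08/Apr/2026:00:26:50 +0000] "GET / HTTP/1.1" 200 771 "-" "kube-probe/1.33"',
]

user_requests = []
for line in lines:
    if "kube-probe" in line:
        continue  # drop Kubernetes health-check traffic
    m = LOG_LINE.search(line)
    if m:
        user_requests.append((m["method"], m["path"], int(m["status"])))

print(user_requests)
```

Note the HTTP 101 on `/api/chat`: that is the WebSocket upgrade for a chat session, not an error.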
Filter for specific traffic
- Add a pipeline filter to find only chat-related entries:

  ```logql
  { log_type="application", kubernetes_namespace_name="wksp-user1", kubernetes_container_name="ui" } |= "/api/chat"
  ```

  The `|=` operator is a case-sensitive substring filter. Try also `|= "404"` to find failed requests, or `|= "500"` to find server errors.

  After running the query, you’ll see the matching log entries with full metadata:

  The traffic you generated in Module 1 should already appear in these logs. WebSocket connections (chat sessions) show as HTTP 101 upgrades.
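Conceptually, `|=` is nothing more than a substring test applied to each log line after the label selector has picked the streams. A Python sketch with invented log lines:

```python
# Sketch of LogQL line filters: `|= "needle"` keeps lines containing
# the needle (case-sensitive); `!= "needle"` would keep the rest.
lines = [
    '"GET /api/chat HTTP/1.1" 101',
    '"GET /api/applications/2590 HTTP/1.1" 200',
    '"GET /healthz HTTP/1.1" 200',
]

def line_filter(needle):  # behaves like LogQL's |= operator
    return [line for line in lines if needle in line]

print(line_filter("/api/chat"))  # only the chat request survives
```

Because label selectors narrow the streams first and `|=` scans only the surviving lines, putting precise labels in the selector is much cheaper than filtering everything with `|=`.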
Compare with API container logs (click to expand)
The API container (api) shows the server-side processing: FastAPI access logs, MLflow warnings, and agent responses:

```logql
{ log_type="application", kubernetes_namespace_name="wksp-user1", kubernetes_container_name="api" }
```

Sample output:

```
INFO:     10.131.1.212:45042 - "GET /api/applications/2590/risk-assessment HTTP/1.1" 200 OK
WARNING mlflow.utils.autologging_utils: Encountered unexpected error during autologging...
```
You can see the same /api/ requests in both the UI and API logs. The UI shows the incoming request, and the API shows how it was processed.
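One simple way to correlate the two views is by request path: the same path appears in the UI's nginx log and the API's uvicorn-style log. A sketch using invented log lines in the two formats shown above:

```python
# Sketch: match a request across UI (nginx) and API (FastAPI/uvicorn)
# container logs by extracting the request path from each line.
import re

ui_log = '10.128.2.19 - - [08/Apr/2026:00:26:48 +0000] "GET /api/applications/2590/risk-assessment HTTP/1.1" 200 1097'
api_log = 'INFO:     10.131.1.212:45042 - "GET /api/applications/2590/risk-assessment HTTP/1.1" 200 OK'

path_re = re.compile(r'"(?:GET|POST|PUT|DELETE) (\S+) HTTP')

ui_path = path_re.search(ui_log).group(1)
api_path = path_re.search(api_log).group(1)

# Same path in both containers: the UI received the request from the
# browser and proxied it to the API, which processed it.
print(ui_path == api_path, ui_path)
```

In practice, path plus timestamp is a rough join key; Module 4's tracing gives you a proper per-request identifier instead.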
| In production, containers are ephemeral. When a pod restarts or gets redeployed, its local logs vanish. LokiStack persists logs beyond pod lifetime, making post-incident investigation possible. For multi-agent systems, correlating logs across API, UI, and database containers is essential for understanding end-to-end request flow. |
Module summary
What you accomplished:
- Explored the Grafana dashboard with HTTP and AI-specific agent metrics
- Filtered application logs with LokiStack and LogQL queries
- Correlated log entries across API and UI containers
Key takeaways:
- Agentic apps need metrics beyond HTTP: token usage, inference latency, routing decisions, and tool success rates
- Logs provide the detail that metrics aggregate away. This detail is essential for debugging ephemeral containers.
- Metrics answer "how much?" and "how fast?". Logs answer "what exactly happened?"
Next steps:
Module 4 will introduce MLflow tracing, the key to understanding why things happen, not just what is happening.









