Module 1: The Problem Domain
Who you are
You are a senior SRE at Meridian Financial, a financial services firm running hundreds of Ansible Automation Platform jobs per week across hybrid VM and OpenShift infrastructure.
Overnight job failures are piling up, compliance deadlines are looming, and the on-call team is stretched thin. The gap between a failure occurring and a human investigating it is where MTTR accumulates — and where an AI-powered assistant can add immediate value.
Your mission in this workshop: build and evolve an AI-driven incident response assistant that can triage, investigate, and help resolve infrastructure failures — so your team can focus on what matters.
Explore AAP2 — Ansible Automation Platform
In the AAP2 tab (log in with user-12345 / deeper-agents if prompted)
This is where Meridian Financial’s automation runs — and where failures originate. Your AI assistant will receive webhooks from AAP2 whenever a job fails.
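Here is a minimal sketch of what such a receiver could look like, assuming a FastAPI endpoint. The path and payload field names are illustrative, not the workshop's actual implementation:

```python
# A minimal sketch of a webhook receiver for AAP2 failure events,
# assuming a FastAPI app. The endpoint path and payload field names
# are illustrative; AAP2's notification webhook POSTs JSON describing
# the job that triggered it, including its id, name, and status.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook/aap")
async def aap_job_event(request: Request):
    event = await request.json()
    if event.get("status") == "failed":
        # Hand off to the triage pipeline: fetch the full job output
        # from the AAP2 API, classify, delegate to a specialist agent.
        print(f"Job {event.get('id')} ({event.get('name')}) failed; starting triage")
    return {"ok": True}
```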
Step 1: Open the sidebar
After logging in, you’ll see the AAP2 dashboard. If the left sidebar is collapsed, click the hamburger menu (three horizontal lines) in the top left to expand it. Then click Automation Execution to reveal the navigation options.
Step 2: Find Jobs and Templates
With the Automation Execution section expanded, you’ll see two key menu items:
- Jobs — the execution history of every job that has run
- Templates — the job templates that define what automation can be launched
These are the two screens you’ll use most in this workshop.
Step 3: Explore Job Templates
Click Templates in the sidebar.
You’ll see 10 numbered job templates — 01 Ping RHEL VM through 10 Install Python 3.14. These represent the day-to-day automation that Meridian’s SRE team runs against their infrastructure: health checks, package updates, application deployments, and compliance scans.
Each template is wired to a playbook in Gitea. Some will succeed. Some will fail in specific, diagnosable ways — and those failures are what your AI assistant needs to handle.
Notice the ordering — they progress from simple (a basic ping) to complex (a health check using a custom execution environment). You’ll work through these failures throughout the workshop.
Step 4: Check the Jobs history
Click Jobs in the sidebar.
This is the execution history — every time a job template runs, it appears here. Notice that one job has already failed (shown with a red "Failed" status). This is not an accident — the environment automatically ran a job during provisioning so you’d have something to investigate.
That failed job triggered a chain of events that you’re about to trace through Kira and Rocket.Chat.
Explore Kira — Trouble Ticketing
In the Kira tab (log in with user-12345 / deeper-agents if prompted)
Kira is the ticketing system where your AI assistant creates structured incident tickets. Think of it as a purpose-built Jira for infrastructure operations.
| The AI agent uses a conservative "temperature" setting optimized for the deterministic behavior desired in AIOps. Your ticket content may differ slightly from the screenshots — this is expected. |
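For the curious: in an OpenAI-compatible API call, that determinism is a single parameter. The endpoint and model name here are placeholders:

```python
# Illustrative only: a near-deterministic call through an
# OpenAI-compatible client. Endpoint and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://maas.example.com/v1", api_key="not-real")
resp = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Classify this job failure: ..."}],
    temperature=0,  # low temperature = repeatable, conservative output
)
```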
Step 1: Review the Dashboard
The dashboard gives you the operational overview at a glance:
- Open / In Progress / Resolved — ticket lifecycle counts
- Avg Confidence — how confident the agent is across all analyses
- Tickets by Area — categorized by domain (linux, kubernetes, networking, application)
- Risk Distribution — how severe the AI agent assessed each failure to be
- Recent Tickets — the latest tickets with title, area, skills, risk, confidence, and status
You should see at least one open ticket from the seed job that ran during provisioning.
Step 2: Open a ticket
Click into an open ticket in the Recent Tickets table.
This ticket was created automatically by the Athena AI agent running in your namespace. Examine the fields:
- Title — a human-readable summary of the failure
- Risk / Confidence / Stage — the agent’s assessment of severity, certainty, and environment
- Skills — the expertise areas the agent identified as relevant
- Affected systems — specific hosts or services involved
- Recommended Action — numbered steps to fix the issue
- Analysis — the agent’s root cause analysis with evidence from the job output
- Issues — sub-issues broken down by severity (high, medium, low)
This is what an AI-generated incident ticket looks like. Your job in this workshop is to understand how it got here — and how to make it better.
Step 3: Try the AI Chatbot
Notice the chat icon in the bottom right corner of the ticket screen. Click it to open the AI Chatbot.
This chatbot is connected to Red Hat OpenShift AI MaaS (Model as a Service) and can answer questions about the ticket in context. Try asking it something like "Explain how AAP2 machine credentials work" — it will respond using the LLM with the ticket context available.
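Under the hood, answering "with the ticket context available" usually means the ticket fields are injected into the prompt before the model is called. A minimal sketch, with an assumed OpenAI-compatible endpoint and an invented ticket:

```python
# A sketch of context-grounded chat, assuming an OpenAI-compatible
# MaaS endpoint. URL, model name, and ticket shape are illustrative.
from openai import OpenAI

client = OpenAI(base_url="https://maas.example.com/v1", api_key="not-real")

ticket = {
    "title": "Package install failed on rhel-vm-01",  # invented example
    "risk": "medium",
    "analysis": "dnf could not resolve the requested package version",
}

resp = client.chat.completions.create(
    model="placeholder-model",
    messages=[
        # The ticket is injected into the system prompt so the model
        # answers questions in the context of this specific incident.
        {"role": "system", "content": f"You are an SRE assistant. Ticket: {ticket}"},
        {"role": "user", "content": "Explain how AAP2 machine credentials work"},
    ],
)
print(resp.choices[0].message.content)
```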
Explore Rocket.Chat — ChatOps
In the Rocket.Chat tab (log in with user-12345 / deeper-agents if prompted)
Rocket.Chat is the ChatOps notification sink — where the AI assistant posts summaries of failures it has analyzed. In Meridian’s environment, this is how the on-call team stays informed without watching dashboards.
| In your environment, this is Rocket.Chat. In production, this could just as easily be Slack, Microsoft Teams, or any webhook-capable messaging platform. The pattern is the same — the AI agent posts structured notifications to a channel that humans monitor. |
Step 1: Find the #support channel
After logging in, you’ll see the Rocket.Chat home screen. In the left sidebar, click the #support channel.
Step 2: Review the AI notification
In the #support channel, you should see a notification from the aiops bot. This matches the ticket you just saw in Kira — the same failure, summarized for ChatOps:
- The job name and failure classification
- Risk level and confidence score
- A brief recommended action
- A direct link to the full ticket in Kira
This is the end of the automated pipeline — the AI agent analyzed the failure, created a ticket, and notified the team, all without human intervention.
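For reference, that final notification step is a single HTTP call. A sketch against Rocket.Chat's incoming-webhook integration, with a hypothetical webhook URL and ticket link:

```python
# A sketch of posting a failure summary to a Rocket.Chat incoming
# webhook. The webhook URL and ticket link are hypothetical;
# Rocket.Chat accepts a JSON body with a "text" field.
import requests

WEBHOOK_URL = "https://chat.example.com/hooks/TOKEN"  # hypothetical

summary = (
    "*Job failed:* 10 Install Python 3.14\n"
    "*Risk:* medium | *Confidence:* 0.86\n"
    "*Recommended action:* see ticket for numbered steps\n"
    "Full ticket: https://kira.example.com/tickets/42"  # hypothetical link
)

requests.post(WEBHOOK_URL, json={"text": summary}, timeout=10)
```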
How it all connects
You’ve now seen three pieces of the puzzle: a failed job in AAP2, a ticket in Kira, and a notification in Rocket.Chat. But how did that happen? Here’s the flow, the Deep Agents pattern in action:
1. AAP2 job fails — a playbook encounters an error during execution
2. Webhook fires — AAP2 sends a failure notification to Athena’s webhook endpoint
3. Athena catches the webhook — retrieves the full job output and error details from the AAP2 API
4. Classifies the failure — determines whether this is an Ansible, Linux, OpenShift, or Networking issue
5. Delegates to a specialist — spins up the appropriate `sre_subagent` (e.g., `sre_linux` for a package issue)
6. Root cause analysis — the specialist agent analyzes the error, reads the playbook, and identifies the cause
7. Creates a ticket — structured ticket in Kira with title, area, confidence, risk, and recommended action
8. Notifies the team — posts a summary to the Rocket.Chat #support channel
Everything you saw in AAP2, Kira, and Rocket.Chat was produced by this automated pipeline — no human intervention required.
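Stripped of the machinery, the flow reduces to a few composable steps. The sketch below is purely illustrative, built from stub functions invented for this page, but its comments map to the numbered steps above:

```python
# Purely illustrative: the eight-step flow above reduced to stubs.
# Every function here is invented for this sketch; the real agent
# code appears in later modules.

def classify_failure(job: dict) -> str:
    # Step 4: ansible, linux, openshift, or networking?
    return "linux" if "dnf" in job.get("stdout", "") else "ansible"

def delegate_to_specialist(area: str, job: dict) -> dict:
    # Steps 5-6: the matching sre_subagent performs root cause analysis.
    return {"area": area, "root_cause": "(stub)", "confidence": 0.9}

def handle_failed_job(job: dict) -> dict:
    # Steps 1-3 happen before this function: the webhook fires and
    # Athena fetches the full job output from the AAP2 API.
    analysis = delegate_to_specialist(classify_failure(job), job)
    # Steps 7-8 would create the Kira ticket and post to #support here.
    return analysis

print(handle_failed_job({"id": 42, "stdout": "dnf install failed"}))
```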
The Compliance Angle
At Meridian Financial, every action taken on a production failure must be traceable. Auditors from the Financial Conduct Authority review change management and incident response records regularly. Missing an SLA window doesn’t just mean a late fix — it triggers regulatory penalties and mandatory remediation reports.
Today, those reports are written manually — 2 to 3 hours per major incident, often reconstructed after the fact from memory and chat logs. An AI agent that acts without logging is a compliance liability. An agent that logs everything it does — the classification rationale, the specialist analysis, the confidence score, the reviewer’s verdict — becomes a compliance asset.
This is a theme that runs throughout the workshop:
- Module 2: You’ll see the structured `TicketPayload` that every agent produces — title, evidence, root cause, risk, confidence, recommended action (sketched below). This is audit-ready output by design.
- Module 4: You’ll explore data sovereignty — why sending job output to external LLM providers creates regulatory risk, and how open source models on Red Hat OpenShift AI keep your data within your controlled infrastructure.
- Module 5: You’ll configure human-in-the-loop gates and evaluate when the agent can act autonomously vs. when it needs human approval — the governance layer that regulators want to see.
You will build an agent that is not just fast, but auditable.
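To make "audit-ready output by design" concrete, here is a guess at what such a payload could look like as a Pydantic model, reconstructed from the ticket fields you examined in Kira. Field names and types are assumptions; the authoritative schema arrives in Module 2:

```python
# A guess at the TicketPayload you'll meet in Module 2, reconstructed
# from the ticket fields listed earlier. Field names and types are
# assumptions; the authoritative schema lives in the workshop code.
from pydantic import BaseModel

class TicketPayload(BaseModel):
    title: str                     # human-readable failure summary
    area: str                      # linux / kubernetes / networking / application
    risk: str                      # severity assessment
    confidence: float              # the agent's certainty, e.g. 0.0 to 1.0
    skills: list[str]              # relevant expertise areas
    affected_systems: list[str]    # hosts or services involved
    recommended_action: list[str]  # numbered remediation steps
    analysis: str                  # root cause analysis with evidence

# Because every field is typed and required, each ticket is a complete,
# machine-checkable record: audit-ready output by construction.
```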
Data Sovereignty
There is another dimension to compliance that goes beyond audit trails: where does your data go when an AI agent processes it?
Every API call to a frontier model sends your job output — error messages, hostnames, IP addresses, credential names, system topology — to an external provider. In a regulated environment like Meridian Financial, this creates real data residency risk under frameworks like GDPR and DORA.
This is not a niche concern. 63% of large enterprises across EMEA cite sovereignty concerns as the greatest barrier to cloud adoption, and over two-thirds have identified it as a top IT priority for the next 18 months.
Red Hat’s position is that open source is the foundation of sovereign AI. Open source models running on your own infrastructure — via Red Hat OpenShift AI — give you the control that regulated industries require: you choose where inference runs, your data never leaves the platform, and you maintain full auditability of the AI supply chain.
In Module 4, you’ll switch the entire agent team from a frontier model to an open source model running on MaaS — same pipeline, same skills, same prompts, but with your data staying entirely within your controlled environment.
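In code terms, that switch is plausibly just configuration: the same OpenAI-compatible client pointed at a different base URL. The endpoints and model names in this sketch are placeholders:

```python
# The Module 4 switch, sketched: one OpenAI-compatible client,
# pointed at a different endpoint. URLs and model names below
# are placeholders, not the workshop's actual configuration.
from openai import OpenAI

# Frontier model via an external provider: job output leaves your environment.
frontier = OpenAI(base_url="https://api.provider.example/v1", api_key="not-real")

# Open source model on OpenShift AI MaaS: inference stays inside your platform.
sovereign = OpenAI(base_url="https://maas.apps.cluster.example/v1", api_key="not-real")

def triage(client: OpenAI, model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```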
Takeaways
- You are an SRE at Meridian Financial, responsible for investigating and resolving infrastructure failures
- AAP2 job templates represent your automation — Jobs shows the execution history, including failures
- Kira tickets are created automatically by the Athena AI agent with structured root cause analysis
- Rocket.Chat (or any ChatOps platform) is the notification sink for the on-call team
- The Deep Agents pipeline connects all three: failure → classification → specialist analysis → ticket + notification
- Everything your agent does must be traceable for compliance — this is a constraint, not a nice-to-have
- Data sovereignty matters — open source models on your own infrastructure keep sensitive operational data out of external APIs