Module 5: Can Open Source Models Take Full Control?

In Module 4, you switched the specialist agents to qwen3-235b. But the ops_manager — the agent that orchestrates everything — was still running on claude-sonnet-4-6. In this module, you’ll switch the manager to OSS too, making the entire pipeline run on open source models.

Then we’ll step back and ask a bigger question: how much autonomy should you give an AI operations team?

Part 1: OSS all the way down

Exercise 1: Switch the ops_manager model

The ops_manager model is configured via an environment variable on the Athena deployment. Changing it is a single command.

  1. Click the Terminal tab and log in if needed:

    oc login --insecure-skip-tls-verify \
      -u user-12345 \
      -p deeper-agents \
      https://openshift.example.com:6443 \
      --namespace user-12345-agentic
  2. Check the current ops_manager model:

    oc get deployment athena -o jsonpath='{.spec.template.spec.containers[0].env}' | python3 -c "
    import sys, json
    # jsonpath prints nothing when the container defines no env vars,
    # so guard against empty input before parsing
    raw = sys.stdin.read()
    envs = json.loads(raw) if raw.strip() else []
    for e in envs:
        if e['name'] == 'OPS_MANAGER_MODEL':
            print(f\"OPS_MANAGER_MODEL = {e['value']}\")
            break
    else:
        print('OPS_MANAGER_MODEL not set (defaults to claude-sonnet-4-6)')
    "

    Expected output:

    OPS_MANAGER_MODEL = openai/claude-sonnet-4-6
  3. Switch the ops_manager to qwen3-235b:

    oc set env deployment/athena OPS_MANAGER_MODEL=qwen3-235b

    Expected output:

    deployment.apps/athena updated

    This single command updates the environment variable and automatically triggers a rolling restart — the new pod comes up with the OSS model. No Helm, no ConfigMap editing, no image rebuild.

    This is another reason why AI agents belong on OpenShift and Kubernetes. The platform gives you operational primitives — env vars, ConfigMaps, PVCs, rolling restarts — that make AI systems configurable, observable, and manageable using the same tools your team already knows. oc set env is the same command you’d use to reconfigure any application.
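
    The same primitive is also scriptable. As a minimal sketch, the equivalent change via the official Kubernetes Python client could look like this (assumptions: the kubernetes package is installed, a kubeconfig exists from the oc login above, and the container is named athena, which the commands above don't confirm):

    from kubernetes import client, config

    config.load_kube_config()  # reuses the credentials written by oc login
    apps = client.AppsV1Api()

    # A strategic-merge patch merges env entries by name, so only
    # OPS_MANAGER_MODEL changes. Because this touches the pod template,
    # it triggers the same rolling restart that oc set env does.
    apps.patch_namespaced_deployment(
        name="athena",
        namespace="user-12345-agentic",
        body={"spec": {"template": {"spec": {"containers": [{
            "name": "athena",  # assumed container name
            "env": [{"name": "OPS_MANAGER_MODEL", "value": "qwen3-235b"}],
        }]}}}},
    )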

  4. Confirm the model was set:

    oc get deployment athena -o jsonpath='{.spec.template.spec.containers[0].env}' | python3 -c "
    import sys, json
    # same empty-input guard as before
    raw = sys.stdin.read()
    envs = json.loads(raw) if raw.strip() else []
    for e in envs:
        if e['name'] == 'OPS_MANAGER_MODEL':
            print(f\"OPS_MANAGER_MODEL = {e['value']}\")
            break
    "

    Expected output:

    OPS_MANAGER_MODEL = qwen3-235b
  5. Wait for the rollout:

    oc rollout status deployment/athena --timeout=120s

    Expected output:

    deployment "athena" successfully rolled out

Your entire pipeline — manager, specialists, and reviewer — is now running on open source models.

Exercise 2: Test the fully OSS pipeline

  1. In the AAP2 tab (log in with user-12345 / deeper-agents if prompted), navigate to Automation Execution → Templates

  2. Launch 04 Start Web Server by clicking the rocket icon

    This is a systemd/service failure — the sre_linux agent will handle it.

  3. Launch 06 Verify Service DNS by clicking the rocket icon

    This is a DNS resolution failure — the sre_networking agent will handle it.

  4. Wait 1-3 minutes for the pipeline to process both failures

  5. In the Kira tab (log in with user-12345 / deeper-agents if prompted), examine the new tickets:

    • The entire analysis — from classification by the ops_manager through specialist investigation to reviewer validation — was performed by qwen3-235b

    • Compare the quality against tickets from earlier modules. Is the ops_manager routing accurately? Are the specialist analyses still coherent?

    • Use the AI Chatbot to discuss the results with qwen3-235b
  6. In the Rocket.Chat tab (log in with user-12345 / deeper-agents if prompted), check #support for the notifications

Part 2: Human in the loop — how much autonomy is right?

You’ve now seen the full pipeline handle multiple failure types across multiple domains — with both frontier and open source models. Every ticket was created, analyzed, and posted automatically.

But notice: every ticket has a status of open. No ticket was automatically closed. No remediation was automatically applied. A human — you — has to review and act on each one.

This is Human in the Loop (HITL) — the most conservative autonomy model. Let’s examine the spectrum.

The autonomy spectrum

Pattern | What the AI does | What the human does
Human in the Loop (current) | Triage, analyze, create ticket, notify | Review every ticket, decide on action, close
Human on the Loop | Triage, analyze, act on low-risk issues, notify | Monitor activity, intervene on exceptions
Human out of the Loop | Full autonomous response for known patterns | Review audit logs periodically, set policy

Exercise 3: Review your tickets

  1. In the Kira tab, browse through all the tickets created during this workshop

  2. For each ticket, note the risk and confidence scores

  3. Ask yourself these questions for each ticket:

    • If this ticket has low risk and high confidence (> 90%), would you be comfortable with the AI closing it automatically?

    • What about medium risk with high confidence?

    • Where do you draw the line?

The case for stepping back

Consider the tickets you’ve reviewed:

  • Low risk, high confidence — the agent correctly identified a package naming issue and recommended a one-line fix. Does this really need a human to click "close"? In a team handling 40+ tickets per week, these manual reviews cost hours of SRE time

  • High risk, any confidence — a credential-related failure affecting production authentication. Here, you absolutely want a human reviewing the recommended action before anything changes

The answer isn’t binary. A well-designed system lets you configure the autonomy level per risk category (sketched in code after this list):

  • Low risk + high confidence → auto-close with audit trail (Human out of the Loop)

  • Medium risk + high confidence → auto-triage, human approves action (Human on the Loop)

  • High risk or low confidence → full human review required (Human in the Loop)
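
In code, that policy is just a small decision function. Here is a minimal sketch, assuming each ticket carries the risk and confidence scores you saw in Kira; the Ticket shape and the 90% threshold are illustrative, not part of the Athena pipeline:

    from dataclasses import dataclass

    @dataclass
    class Ticket:
        risk: str          # "low" | "medium" | "high"
        confidence: float  # 0.0 to 1.0

    def autonomy_level(t: Ticket) -> str:
        if t.risk == "low" and t.confidence > 0.90:
            return "auto-close"    # Human out of the Loop
        if t.risk == "medium" and t.confidence > 0.90:
            return "auto-triage"   # Human on the Loop: a human approves the action
        return "human-review"      # Human in the Loop

    print(autonomy_level(Ticket(risk="low", confidence=0.95)))  # auto-close

The point of keeping this logic explicit and small is that the thresholds become policy you can review, test, and tighten, rather than behavior buried inside a prompt.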

The compliance requirement

At Meridian Financial, the key constraint isn’t whether the AI can act autonomously — it’s whether every action is traceable. The Financial Conduct Authority doesn’t require a human to click every button. It requires:

  • A complete audit trail of what was done and why

  • Evidence that the decision-making process is sound

  • Clear escalation paths when confidence is low

  • Human oversight proportional to risk

An AI agent that logs everything it does — including its reasoning, confidence, and the evidence it used — can satisfy these requirements while operating with more autonomy than a purely HITL model.
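
Concretely, the structured record such an agent might append to its audit log could look like this. This is a sketch only: the field names are illustrative, not Athena’s actual log schema:

    import json
    from datetime import datetime, timezone

    # One audit entry per agent action: what was done, why, and on what evidence.
    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": "ops_manager",
        "model": "qwen3-235b",
        "action": "auto-close",
        "risk": "low",
        "confidence": 0.96,
        "reasoning": "Package naming issue; one-line fix validated by reviewer.",
        "evidence": ["job stdout", "reviewer verdict"],
    }
    print(json.dumps(audit_entry))  # ship to an append-only store for auditors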

Takeaways

  • The entire pipeline — manager, specialists, reviewer — runs effectively on open source models

  • Switching the ops_manager model is a single oc set env command — Kubernetes makes AI systems operationally manageable

  • HITL is a starting point, not the end state — autonomy should be proportional to risk and confidence

  • The three autonomy patterns (in/on/out of the loop) are not mutually exclusive — you can apply different patterns to different risk levels

  • Compliance requires traceability, not manual intervention — well-logged AI decisions can satisfy regulators

  • The combination of OSS models + configurable autonomy + full audit trail is the path to production-ready AIOps