Module 5: Can Open Source Models Take Full Control?

In Module 4, you switched the specialist agents to qwen3-235b. But the ops_manager — the agent that orchestrates everything — was still running on claude-sonnet-4-6. In this module, you’ll switch the manager to OSS too, making the entire pipeline run on open source models.

Then we’ll step back and ask a bigger question: how much autonomy should you give an AI operations team?

Part 1: OSS all the way down

Exercise 1: Switch the ops_manager model

The ops_manager model is configured via an environment variable on the Athena deployment. Changing it is a single command.

  1. Click the Terminal tab and log in if needed:

    oc login --insecure-skip-tls-verify \
      -u user-12345 \
      -p deeper-agents \
      https://openshift.example.com:6443 \
      --namespace user-12345-agentic
  2. Check the current ops_manager model:

    oc get deployment athena -o jsonpath='{.spec.template.spec.containers[0].env}' | python3 -c "
    import sys, json
    # jsonpath prints nothing when the container defines no env vars,
    # so guard against empty input before parsing
    raw = sys.stdin.read()
    envs = json.loads(raw) if raw.strip() else []
    for e in envs:
        if e['name'] == 'OPS_MANAGER_MODEL':
            print(f\"OPS_MANAGER_MODEL = {e['value']}\")
            break
    else:
        print('OPS_MANAGER_MODEL not set (defaults to claude-sonnet-4-6)')
    "

    Expected output:

    OPS_MANAGER_MODEL = openai/claude-sonnet-4-6
  3. Switch the ops_manager to qwen3-235b:

    oc set env deployment/athena OPS_MANAGER_MODEL=qwen3-235b

    Expected output:

    deployment.apps/athena updated

    This single command updates the environment variable and automatically triggers a rolling restart — the new pod comes up with the OSS model. No Helm, no ConfigMap editing, no image rebuild.

    This is another reason why AI agents belong on OpenShift and Kubernetes. The platform gives you operational primitives — env vars, ConfigMaps, PVCs, rolling restarts — that make AI systems configurable, observable, and manageable using the same tools your team already knows. oc set env is the same command you’d use to reconfigure any application.
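
    The same primitive is also scriptable. As a minimal sketch, the equivalent change via the official Kubernetes Python client could look like this (assumptions: the kubernetes package is installed, a kubeconfig exists from the oc login above, and the container is named athena, which the commands above don't confirm):

    from kubernetes import client, config

    config.load_kube_config()  # reuses the credentials written by oc login
    apps = client.AppsV1Api()

    # A strategic-merge patch merges env entries by name, so only
    # OPS_MANAGER_MODEL changes. Because this touches the pod template,
    # it triggers the same rolling restart that oc set env does.
    apps.patch_namespaced_deployment(
        name="athena",
        namespace="user-12345-agentic",
        body={"spec": {"template": {"spec": {"containers": [{
            "name": "athena",  # assumed container name
            "env": [{"name": "OPS_MANAGER_MODEL", "value": "qwen3-235b"}],
        }]}}}},
    )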

  4. Confirm the model was set:

    oc get deployment athena -o jsonpath='{.spec.template.spec.containers[0].env}' | python3 -c "
    import sys, json
    # same empty-input guard as before
    raw = sys.stdin.read()
    envs = json.loads(raw) if raw.strip() else []
    for e in envs:
        if e['name'] == 'OPS_MANAGER_MODEL':
            print(f\"OPS_MANAGER_MODEL = {e['value']}\")
            break
    "

    Expected output:

    OPS_MANAGER_MODEL = qwen3-235b
  5. Wait for the rollout:

    oc rollout status deployment/athena --timeout=120s

    Expected output:

    deployment "athena" successfully rolled out

Your entire pipeline — manager, specialists, and reviewer — is now running on open source models.

Exercise 2: Test the fully OSS pipeline

  1. In the AAP2 tab (log in with user-12345 / deeper-agents if prompted), navigate to Automation Execution → Templates

  2. Launch 04 Start Web Server by clicking the rocket icon

    This is a systemd/service failure — the sre_linux agent will handle it.

  3. Launch 06 Verify Service DNS by clicking the rocket icon

    This is a DNS resolution failure — the sre_networking agent will handle it.

  4. Wait 1-3 minutes for the pipeline to process both failures

  5. In the Kira tab (log in with user-12345 / deeper-agents if prompted), examine the new tickets:

    • The entire analysis — from classification by the ops_manager through specialist investigation to reviewer validation — was performed by qwen3-235b

    • Compare the quality against tickets from earlier modules. Is the ops_manager routing accurately? Are the specialist analyses still coherent?

    • Use the AI Chatbot to discuss the results with qwen3-235b
  6. In the Rocket.Chat tab (log in with user-12345 / deeper-agents if prompted), check #support for the notifications

Part 2: Human in the loop — how much autonomy is right?

You’ve now seen the full pipeline handle multiple failure types across multiple domains — with both frontier and open source models. Every ticket was created, analyzed, and posted automatically.

But notice: every ticket has a status of open. No ticket was automatically closed. No remediation was automatically applied. A human — you — has to review and act on each one.

This is Human in the Loop (HITL) — the most conservative autonomy model. Let’s examine the spectrum.

The autonomy spectrum

Pattern | What the AI does | What the human does
Human in the Loop (current) | Triage, analyze, create ticket, notify | Review every ticket, decide on action, close
Human on the Loop | Triage, analyze, act on low-risk issues, notify | Monitor activity, intervene on exceptions
Human out of the Loop | Full autonomous response for known patterns | Review audit logs periodically, set policy

Exercise 3: Review your tickets

  1. In the Kira tab, browse through all the tickets created during this workshop

  2. For each ticket, note the risk and confidence scores

  3. Ask yourself these questions for each ticket:

    • If this ticket has low risk and high confidence (> 90%), would you be comfortable with the AI closing it automatically?

    • What about medium risk with high confidence?

    • Where do you draw the line?

The case for stepping back

Consider the tickets you’ve reviewed:

  • Low risk, high confidence — the agent correctly identified a package naming issue and recommended a one-line fix. Does this really need a human to click "close"? In a team handling 40+ tickets per week, these manual reviews cost hours of SRE time

  • High risk, any confidence — a credential-related failure affecting production authentication. Here, you absolutely want a human reviewing the recommended action before anything changes

The answer isn’t binary. A well-designed system lets you configure the autonomy level per risk category (sketched in code after this list):

  • Low risk + high confidence → auto-close with audit trail (Human out of the Loop)

  • Medium risk + high confidence → auto-triage, human approves action (Human on the Loop)

  • High risk or low confidence → full human review required (Human in the Loop)
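
In code, that policy is just a small decision function. Here is a minimal sketch, assuming each ticket carries the risk and confidence scores you saw in Kira; the Ticket shape and the 90% threshold are illustrative, not part of the Athena pipeline:

    from dataclasses import dataclass

    @dataclass
    class Ticket:
        risk: str          # "low" | "medium" | "high"
        confidence: float  # 0.0 to 1.0

    def autonomy_level(t: Ticket) -> str:
        if t.risk == "low" and t.confidence > 0.90:
            return "auto-close"    # Human out of the Loop
        if t.risk == "medium" and t.confidence > 0.90:
            return "auto-triage"   # Human on the Loop: a human approves the action
        return "human-review"      # Human in the Loop

    print(autonomy_level(Ticket(risk="low", confidence=0.95)))  # auto-close

The point of keeping this logic explicit and small is that the thresholds become policy you can review, test, and tighten, rather than behavior buried inside a prompt.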

The compliance requirement

At Meridian Financial, the key constraint isn’t whether the AI can act autonomously — it’s whether every action is traceable. The Financial Conduct Authority doesn’t require a human to click every button. It requires:

  • A complete audit trail of what was done and why

  • Evidence that the decision-making process is sound

  • Clear escalation paths when confidence is low

  • Human oversight proportional to risk

An AI agent that logs everything it does — including its reasoning, confidence, and the evidence it used — can satisfy these requirements while operating with more autonomy than a purely HITL model.
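
Concretely, the structured record such an agent might append to its audit log could look like this. This is a sketch only: the field names are illustrative, not Athena’s actual log schema:

    import json
    from datetime import datetime, timezone

    # One audit entry per agent action: what was done, why, and on what evidence.
    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": "ops_manager",
        "model": "qwen3-235b",
        "action": "auto-close",
        "risk": "low",
        "confidence": 0.96,
        "reasoning": "Package naming issue; one-line fix validated by reviewer.",
        "evidence": ["job stdout", "reviewer verdict"],
    }
    print(json.dumps(audit_entry))  # ship to an append-only store for auditors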

Takeaways

  • The entire pipeline — manager, specialists, reviewer — runs effectively on open source models

  • Switching the ops_manager model is a single oc set env command — Kubernetes makes AI systems operationally manageable

  • HITL is a starting point, not the end state — autonomy should be proportional to risk and confidence

  • The three autonomy patterns (in/on/out of the loop) are not mutually exclusive — you can apply different patterns to different risk levels

  • Compliance requires traceability, not manual intervention — well-logged AI decisions can satisfy regulators

  • The combination of OSS models + configurable autonomy + full audit trail is the path to production-ready AIOps