Module 4: Open Source Models — Do You Really Need Frontier?

In the previous modules, your AI agents used claude-sonnet-4-6 — a frontier model from Anthropic. It’s capable, but it’s also expensive, and every API call sends your job output data to an external provider.

In this module, you’ll switch every agent to qwen3-235b — a 235-billion-parameter open source model running on Red Hat OpenShift AI via MaaS. Same pipeline, same skills, same prompts — different model. Then you’ll evaluate whether the results are good enough.

Exercise 1: Test an Open Source model in the chatbot

Before changing the pipeline, let’s get a feel for how qwen3-235b handles SRE reasoning.

  1. In the Kira tab (log in with user-12345 / deeper-agents if prompted), open any existing ticket

  2. Click the AI Chatbot icon in the bottom right corner

  3. In the model selector dropdown at the top of the chat, select qwen3-235b

  4. Ask the chatbot a question about the ticket — for example:

    What is the root cause of this failure and what would you recommend?
  5. Evaluate the response:

    • Is the reasoning coherent?

    • Does it reference the actual error from the ticket?

    • Are the recommendations specific and actionable?

    You’re comparing this against the frontier model responses you’ve seen in previous modules. Keep this mental benchmark as we switch the pipeline.

Exercise 2: Switch all agents to Open Source

Open source models have improved dramatically — today’s 235B-parameter models come very close to matching frontier quality on structured tasks, and even smaller models deliver strong results when paired with a well-designed harness and focused skills. That means you can get near-frontier analysis at a fraction of the cost, running entirely on your own infrastructure.

Now let’s swap the entire agent team to qwen3-235b. The process is the same as Module 3 — download a pre-built configuration and patch the ConfigMap.

  1. Click the Terminal tab and log in if needed:

    oc login --insecure-skip-tls-verify \
      -u user-12345 \
      -p deeper-agents \
      https://openshift.example.com:6443 \
      --namespace user-12345-agentic

    Sample output:

    Login successful.
Using project "user-12345-agentic".
  2. Download the open source model configuration:

    curl -sL https://gitea.apps.cluster-GUID.opentlc.com/user-12345/agentic-devops-plays/raw/branch/main/configs/subagents-oss-models.yaml \
      -o /tmp/subagents-oss.yaml
  3. Verify the model change — every agent should now use qwen3-235b:

    grep "model:" /tmp/subagents-oss.yaml

    Expected output:

      model: qwen3-235b
      model: qwen3-235b
      model: qwen3-235b
      model: qwen3-235b
      model: qwen3-235b
      model: qwen3-235b

    All six agents (including the reviewer) are now using the open source model.

  4. Patch the ConfigMap:

    oc create configmap athena-agent-config \
      --from-file=subagents.yaml=/tmp/subagents-oss.yaml \
      --from-file=AGENTS.md=<(oc get configmap athena-agent-config -o jsonpath='{.data.AGENTS\.md}') \
      --dry-run=client -o yaml | oc apply -f -

    Expected output:

    configmap/athena-agent-config configured
  5. Roll out the change:

    oc rollout restart deployment/athena && \
      oc rollout status deployment/athena --timeout=120s

    Expected output:

    deployment.apps/athena restarted
    ...
    deployment "athena" successfully rolled out

Exercise 3: Launch two failures

Now let’s give the open source-powered pipeline two different failures to analyze.

  1. In the AAP2 tab (log in with user-12345 / deeper-agents if prompted), navigate to Automation Execution → Templates

Launch 05 Update System Packages by clicking the rocket icon

    This is a Linux domain failure — a package repository or Satellite content issue.

Launch 07 Deploy Payment Service by clicking the rocket icon

    This is an OpenShift domain failure — a namespace or manifest issue.

Both jobs will fail and the pipeline will process them using qwen3-235b instead of claude-sonnet-4-6. While we wait, let’s talk about why this matters.

Why Open Source models are more than good enough

What is an agentic "harness"?

A harness is the framework that wraps and drives an LLM — managing context, tools, subagent delegation, retries, and output parsing. Deep Agents is the harness in this workshop. The harness is responsible for giving the model exactly the right context at the right time. A well-designed harness makes smaller models perform like larger ones.
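The control loop a harness runs can be sketched in a few lines of Python. This is a conceptual sketch, not Deep Agents code; `call_model`, the retry policy, and the JSON schema are hypothetical stand-ins:

```python
import json

def call_model(system_prompt: str, context: str) -> str:
    """Hypothetical stand-in for an LLM API call (e.g. qwen3-235b on MaaS)."""
    return json.dumps({"root_cause": "repo sync failure", "confidence": 0.8})

def run_agent(skill_doc: str, incident: str, retries: int = 2) -> dict:
    """Minimal harness loop: assemble focused context, call the model,
    parse the structured output, and retry if the response is malformed."""
    system_prompt = f"You are an SRE specialist.\n\n## Skill\n{skill_doc}"
    for _attempt in range(retries + 1):
        raw = call_model(system_prompt, incident)
        try:
            result = json.loads(raw)       # output parsing
        except json.JSONDecodeError:
            continue                       # malformed response, retry
        if "root_cause" in result:         # minimal schema check
            return result
    raise RuntimeError("model never produced valid structured output")

print(run_agent("Check Satellite content views first.",
                "yum update failed: repository not found")["root_cause"])
```

The point is that the model only ever sees the focused system prompt, skill document, and incident context the harness assembles for it — which is exactly what lets smaller models punch above their weight.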

Know your models

MaaS gives you a portfolio of Open Source models. Choosing the right one for each agent is a cost/quality/speed tradeoff:

| Model | Parameters | Best for | Trade-off |
| --- | --- | --- | --- |
| qwen3-235b | 235B | Complex reasoning, orchestration, root cause analysis | Highest quality, slower, most compute |
| gpt-oss-120b | 120B | Strong analysis, good structured output | Good balance of quality and speed |
| gpt-oss-20b | 20B | Simple classification, fast triage | Fast and cheap, weaker reasoning |
| llama-scout-17b | 17B | Lightweight tool use, structured output | Smallest, fastest, limited reasoning |
| minimax-m2 | 230B | General purpose, multilingual | Different architecture, worth evaluating |

In this lab, we use qwen3-235b for all agents — it’s the most capable open source model on MaaS and makes the strongest case that open source can come close to frontier quality. In production, you’d mix models: a larger model for the ops_manager (which needs to classify and delegate accurately) and smaller, cheaper models for routine specialist analysis.
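In configuration terms, mixing models is just different `model:` values per agent. The snippet below is illustrative only; the agent names and exact field layout should match the subagents.yaml you downloaded in Exercise 2:

```yaml
# Hypothetical per-agent model mix (not the lab's actual config)
subagents:
  - name: ops_manager
    model: qwen3-235b      # accurate classification and delegation
  - name: linux_specialist
    model: gpt-oss-120b    # strong analysis at lower cost
  - name: openshift_specialist
    model: gpt-oss-120b
  - name: triage
    model: gpt-oss-20b     # fast, cheap first-pass classification
```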

The Agentic Revolution changed the equation

The rise of "vibe coding" and agentic AI development has led to an increasing focus on Open Source models:

  • Agentic harnesses are better at driving LLMs — frameworks like Deep Agents don’t just send a prompt and hope. They manage context windows, delegate to specialists, provide structured skills, and parse outputs. This structured approach compensates for the capability gap between frontier and open source models

  • Excellent context helps smaller models — when you give a 235B parameter model a focused system prompt, a specific skill document, and a structured incident context, it performs remarkably well. The model doesn’t need to "know everything" — it just needs to reason about the specific data in front of it

  • Skills do the heavy lifting — your sre_package_management skill from Module 3 encoded institutional knowledge about Meridian’s Satellite content views and CRB requirements. The model doesn’t need to have learned this from training data — you gave it the answer as context. A 7B model could use that skill effectively

Cost

Frontier model pricing adds up fast in an operations environment:

  • A single incident analysis involves multiple LLM calls: classification, specialist reasoning, review

  • At 40+ failures per week, token costs become a significant line item

  • Open source models running on your own infrastructure have a fixed compute cost — no per-token billing, no usage surprises
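To make the per-token argument concrete, here is a back-of-the-envelope estimate in Python. Every figure is an assumption for illustration — the call counts, token volumes, and per-million-token prices are hypothetical, not actual Anthropic or MaaS rates:

```python
# All figures below are illustrative assumptions, not real pricing.
FAILURES_PER_WEEK = 40
CALLS_PER_INCIDENT = 8            # classification + specialists + review
INPUT_TOKENS_PER_CALL = 20_000    # job logs, skill docs, system prompt
OUTPUT_TOKENS_PER_CALL = 2_000    # structured analysis

PRICE_IN = 3.00 / 1_000_000       # assumed $ per input token
PRICE_OUT = 15.00 / 1_000_000     # assumed $ per output token

calls_per_year = FAILURES_PER_WEEK * CALLS_PER_INCIDENT * 52
cost_per_call = (INPUT_TOKENS_PER_CALL * PRICE_IN
                 + OUTPUT_TOKENS_PER_CALL * PRICE_OUT)
yearly_cost = calls_per_year * cost_per_call

print(f"{calls_per_year} LLM calls/year at ${cost_per_call:.2f} each, "
      f"~${yearly_cost:,.0f}/year in per-token billing")
```

Per-token spend scales linearly with incident volume and log size, while self-hosted inference is a fixed compute cost — so where the crossover lies depends entirely on your own numbers.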

Data sovereignty

Data sovereignty — keeping control over where your data is processed and stored — has moved from an abstract compliance concept to a critical strategic mandate. 63% of large enterprises across EMEA cite sovereignty concerns as the greatest barrier to cloud adoption, and over two-thirds have identified it as a top IT priority.

For AI workloads, the stakes are higher. Every API call to a frontier model sends your data to an external provider:

  • Job output logs may contain hostnames, IP addresses, and system topology

  • Credential names and configuration details are visible in error messages

  • In regulated environments like Meridian Financial, this creates data residency risk under frameworks like GDPR and DORA

Red Hat’s position is clear: open source is the foundation of sovereign AI. Open models running on your own infrastructure — via Red Hat OpenShift AI — give you the control that regulated industries require: you choose where inference runs, your data never leaves the platform, and you maintain full auditability of the AI supply chain.

This isn’t theoretical. The agents you’re running right now on MaaS process Meridian’s job output locally — no external API calls, no data leaving your environment.

Security

  • No API keys to external providers to manage, rotate, or risk leaking

  • Network traffic stays internal — no egress to external LLM APIs

  • Audit trail is complete — all inference happens within your controlled environment

Exercise 4: Evaluate the results

The pipeline should have processed both failures by now.

  1. In the Kira tab (log in with user-12345 / deeper-agents if prompted), find the two new tickets

  2. For each ticket, evaluate:

    • Root cause accuracy — did the agent correctly identify what failed and why?

    • Recommended action quality — are the steps specific and actionable?

    • Confidence score — does it seem well-calibrated?

    • Overall coherence — does the analysis read like it was written by a competent SRE?

  3. Compare these open source-generated tickets against the frontier-generated tickets from earlier modules

    • Can you tell which model produced which ticket?

    • Is the quality difference meaningful enough to justify the cost and data exposure of frontier models?

    • Where, if anywhere, would you still want a frontier model?

Use the AI Chatbot with qwen3-235b selected to ask follow-up questions about the tickets. Feel free to experiment with the other models too and see how they handle the ticket or more general technical questions.
  4. In the Rocket.Chat tab (log in with user-12345 / deeper-agents if prompted), check #support for the notifications

Takeaways

  • Switching from frontier to open source models is a configuration change — same pipeline, same skills, same prompts

  • qwen3-235b running on MaaS provides analysis quality comparable to frontier models for structured SRE tasks

  • Agentic harnesses like Deep Agents compensate for the capability gap by providing focused context and structured workflows

  • Skills and system prompts do most of the work — the model provides the reasoning engine

  • Open source models on MaaS offer three strategic advantages:

    • Cost — fixed infrastructure cost vs per-token billing

    • Data sovereignty — your data stays within your infrastructure

    • Security — no external API keys, no egress traffic, complete audit trail

  • The choice between frontier and open source isn’t binary — you can mix models per agent based on task complexity