Module 4: Open Source Models — Do You Really Need Frontier?
In the previous modules, your AI agents used claude-sonnet-4-6 — a frontier model from Anthropic. It’s capable, but it’s also expensive, and every API call sends your job output data to an external provider.
In this module, you’ll switch every agent to qwen3-235b — a 235-billion parameter open source model running on Red Hat OpenShift AI via MaaS. Same pipeline, same skills, same prompts — different model. Then you’ll evaluate whether the results are good enough.
Exercise 1: Test an Open Source model in the chatbot
Before changing the pipeline, let’s get a feel for how qwen3-235b handles SRE reasoning.
- In the Kira tab (log in with `user-12345/deeper-agents` if prompted), open any existing ticket
- Click the AI Chatbot icon in the bottom right corner
- In the model selector dropdown at the top of the chat, select `qwen3-235b`
- Ask the chatbot a question about the ticket — for example:

  ```
  What is the root cause of this failure and what would you recommend?
  ```

- Evaluate the response:
  - Is the reasoning coherent?
  - Does it reference the actual error from the ticket?
  - Are the recommendations specific and actionable?
You’re comparing this against the frontier model responses you’ve seen in previous modules. Keep this mental benchmark as we switch the pipeline.
Exercise 2: Switch all agents to Open Source
Open source models have improved dramatically — today’s 235B-parameter models come very close to matching frontier quality on structured tasks, and even smaller models deliver strong results when paired with a well-designed harness and focused skills. That means you can get near-frontier analysis at a fraction of the cost, running entirely on your own infrastructure.
Now let’s swap the entire agent team to qwen3-235b. The process is the same as Module 3 — download a pre-built configuration and patch the ConfigMap.
- Click the Terminal tab and log in if needed:

  ```shell
  oc login --insecure-skip-tls-verify \
    -u user-12345 \
    -p deeper-agents \
    https://openshift.example.com:6443 \
    --namespace user-12345-agentic
  ```

  Sample output:

  ```
  Login successful. Using project "user-XXXXX-agentic".
  ```

- Download the open source model configuration:

  ```shell
  curl -sL https://gitea.apps.cluster-GUID.opentlc.com/user-12345/agentic-devops-plays/raw/branch/main/configs/subagents-oss-models.yaml \
    -o /tmp/subagents-oss.yaml
  ```

- Verify the model change — every agent should now use `qwen3-235b`:

  ```shell
  grep "model:" /tmp/subagents-oss.yaml
  ```

  Expected output:

  ```
  model: qwen3-235b
  model: qwen3-235b
  model: qwen3-235b
  model: qwen3-235b
  model: qwen3-235b
  model: qwen3-235b
  ```

  All six agents (including the reviewer) are now using the open source model.

- Patch the ConfigMap:

  ```shell
  oc create configmap athena-agent-config \
    --from-file=subagents.yaml=/tmp/subagents-oss.yaml \
    --from-file=AGENTS.md=<(oc get configmap athena-agent-config -o jsonpath='{.data.AGENTS\.md}') \
    --dry-run=client -o yaml | oc apply -f -
  ```

  Expected output:

  ```
  configmap/athena-agent-config configured
  ```

- Roll out the change:

  ```shell
  oc rollout restart deployment/athena && \
  oc rollout status deployment/athena --timeout=120s
  ```

  Expected output:

  ```
  deployment.apps/athena restarted
  ...
  deployment "athena" successfully rolled out
  ```
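If you want a scripted sanity check that exactly six agents were switched, you can count the `model:` lines instead of eyeballing the grep output. The sketch below simulates that check against an inline copy of the expected file so it is self-contained; on the cluster you would point `grep` at `/tmp/subagents-oss.yaml` (or the live ConfigMap) instead.

```shell
# Sanity-check sketch: count how many agents reference the open source model.
# The file content below simulates /tmp/subagents-oss.yaml for illustration.
cat > /tmp/subagents-check.yaml <<'EOF'
model: qwen3-235b
model: qwen3-235b
model: qwen3-235b
model: qwen3-235b
model: qwen3-235b
model: qwen3-235b
EOF
count=$(grep -c 'model: qwen3-235b' /tmp/subagents-check.yaml)
echo "agents on qwen3-235b: ${count}"   # all six agents should be switched
```

A count of anything other than six means the download or patch step did not land cleanly.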
Exercise 3: Launch two failures
Now let’s give the open source-powered pipeline two different failures to analyze.
- In the AAP2 tab (log in with `user-12345/deeper-agents` if prompted), navigate to Automation Execution → Templates
- Launch 05 Update System Packages by clicking the rocket icon

  This is a Linux domain failure — a package repository or Satellite content issue.

- Launch 07 Deploy Payment Service by clicking the rocket icon

  This is an OpenShift domain failure — a namespace or manifest issue.
Both jobs will fail and the pipeline will process them using qwen3-235b instead of claude-sonnet-4-6. While we wait, let’s talk about why this matters.
Why Open Source models are more than good enough
> What is an agentic "harness"?
>
> A harness is the framework that wraps and drives an LLM — managing context, tools, subagent delegation, retries, and output parsing. Deep Agents is the harness in this workshop. The harness is responsible for giving the model exactly the right context at the right time. A well-designed harness makes smaller models perform like larger ones.
Know your models
MaaS gives you a portfolio of Open Source models. Choosing the right one for each agent is a cost/quality/speed tradeoff:
| Model | Parameters | Best for | Trade-off |
|---|---|---|---|
| qwen3-235b | 235B | Complex reasoning, orchestration, root cause analysis | Highest quality, slower, most compute |
|  | 120B | Strong analysis, good structured output | Good balance of quality and speed |
|  | 20B | Simple classification, fast triage | Fast and cheap, weaker reasoning |
|  | 17B | Lightweight tool use, structured output | Smallest, fastest, limited reasoning |
|  | 230B | General purpose, multilingual | Different architecture, worth evaluating |
In this lab, we use qwen3-235b for all agents — it’s the most capable open source model on MaaS and makes the strongest case that open source can come close to frontier quality. In production, you’d mix models: a larger model for the ops_manager (which needs to classify and delegate accurately) and smaller, cheaper models for routine specialist analysis.
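A mixed per-agent assignment could look something like the fragment below. This is a hypothetical sketch: it reuses the `model:` keys you grepped for earlier, but the exact schema of `subagents.yaml` and every agent name except `ops_manager` are assumptions made for illustration.

```yaml
# Hypothetical subagents.yaml fragment — agent names other than ops_manager
# and the overall schema are illustrative assumptions, not the lab's real file
ops_manager:
  model: qwen3-235b        # classification and delegation need the strongest reasoning
linux_specialist:
  model: a-mid-size-model  # placeholder: a ~120B model for routine domain analysis
triage_agent:
  model: a-small-model     # placeholder: a ~20B model for fast, cheap first-pass triage
```

The principle is the one from the table: pay for large-model reasoning only where a wrong answer is expensive, and route everything else to smaller models.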
The Agentic Revolution changed the equation
The rise of "vibe coding" and agentic AI development has led to an increasing focus on Open Source models:
- Agentic harnesses are better at driving LLMs — frameworks like Deep Agents don't just send a prompt and hope. They manage context windows, delegate to specialists, provide structured skills, and parse outputs. This structured approach compensates for the capability gap between frontier and open source models
- Excellent context helps smaller models — when you give a 235B parameter model a focused system prompt, a specific skill document, and a structured incident context, it performs remarkably well. The model doesn't need to "know everything" — it just needs to reason about the specific data in front of it
- Skills do the heavy lifting — your `sre_package_management` skill from Module 3 encoded institutional knowledge about Meridian's Satellite content views and CRB requirements. The model doesn't need to have learned this from training data — you gave it the answer as context. A 7B model could use that skill effectively
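To make that last point concrete, here is a rough sketch of what such a skill document might contain. Everything below is invented for illustration (it does not reproduce the actual Module 3 skill); it simply shows that the institutional knowledge lives in the context handed to the model, not in the model weights.

```markdown
# Skill: sre_package_management (illustrative sketch, not the real skill file)

When a package update fails with "repository not found":
1. Check the host's assigned Satellite content view — the repo may not be
   published to that content view yet.
2. Check CRB requirements: jobs that need devel packages require the
   CodeReady Builder repository to be enabled on the host.
3. Recommend `subscription-manager repos --list-enabled` as the first
   diagnostic step in the ticket's remediation section.
```

Any model that can follow instructions and read the job output can apply a checklist like this, which is why the skill, not the parameter count, carries most of the diagnostic quality.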
Cost
Frontier model pricing adds up fast in an operations environment:
- A single incident analysis involves multiple LLM calls: classification, specialist reasoning, review
- At 40+ failures per week, token costs become a significant line item
- Open source models running on your own infrastructure have a fixed compute cost — no per-token billing, no usage surprises
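The arithmetic is easy to run for your own environment. The sketch below multiplies failures per week by calls per incident and tokens per call; every figure in it is an illustrative assumption, not real pricing or measured token counts.

```shell
# Back-of-envelope per-token billing estimate.
# All figures are illustrative assumptions — substitute your own.
failures_per_week=40
calls_per_incident=8       # classification + specialist passes + review
tokens_per_call=6000       # prompt + completion, combined
usd_per_mtok=10            # assumed blended frontier price per million tokens
weekly_usd=$(awk -v f="$failures_per_week" -v c="$calls_per_incident" \
                 -v t="$tokens_per_call" -v p="$usd_per_mtok" \
                 'BEGIN { printf "%.2f", f * c * t / 1000000 * p }')
echo "Assumed weekly frontier spend: \$${weekly_usd}"
```

Whatever numbers you plug in, the shape of the result is the same: the per-token bill scales linearly with failure volume and calls per incident, while self-hosted inference cost stays flat.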
Data sovereignty
Data sovereignty — keeping control over where your data is processed and stored — has moved from an abstract compliance concept to a critical strategic mandate. 63% of large enterprises across EMEA cite sovereignty concerns as the greatest barrier to cloud adoption, and over two-thirds have identified it as a top IT priority.
For AI workloads, the stakes are higher. Every API call to a frontier model sends your data to an external provider:
- Job output logs may contain hostnames, IP addresses, and system topology
- Credential names and configuration details are visible in error messages
- In regulated environments like Meridian Financial, this creates data residency risk under frameworks like GDPR and DORA
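You can see the exposure for yourself by scanning a job log for the kinds of details listed above before they would leave your environment. The log content below is invented for illustration; the greps are ordinary pattern matches, not a complete data-loss-prevention check.

```shell
# Illustrative check: what would leave your environment if this job output
# were sent to an external API? (Log content is invented for this example.)
cat > /tmp/job-output.log <<'EOF'
TASK [Update packages] fatal: [web01.meridian.internal]: FAILED!
msg: Cannot reach repository at 10.0.12.34: connection refused
EOF
# Count lines exposing IPv4 addresses and internal hostnames
ip_lines=$(grep -Ec '([0-9]{1,3}\.){3}[0-9]{1,3}' /tmp/job-output.log)
host_lines=$(grep -c '\.internal' /tmp/job-output.log)
echo "lines with IPs: ${ip_lines}, lines with internal hostnames: ${host_lines}"
```

Even this two-line failure message reveals a hostname, an internal IP, and a hint of network topology — exactly the data residency concern the bullets above describe.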
Red Hat’s position is clear: open source is the foundation of sovereign AI. Open models running on your own infrastructure — via Red Hat OpenShift AI — give you the control that regulated industries require: you choose where inference runs, your data never leaves the platform, and you maintain full auditability of the AI supply chain.
This isn’t theoretical. The agents you’re running right now on MaaS process Meridian’s job output locally — no external API calls, no data leaving your environment.
Exercise 4: Evaluate the results
The pipeline should have processed both failures by now.
- In the Kira tab (log in with `user-12345/deeper-agents` if prompted), find the two new tickets
- For each ticket, evaluate:
  - Root cause accuracy — did the agent correctly identify what failed and why?
  - Recommended action quality — are the steps specific and actionable?
  - Confidence score — does it seem well-calibrated?
  - Overall coherence — does the analysis read like it was written by a competent SRE?
- Compare these open source-generated tickets against the frontier-generated tickets from earlier modules:
  - Can you tell which model produced which ticket?
  - Is the quality difference meaningful enough to justify the cost and data exposure of frontier models?
  - Where, if anywhere, would you still want a frontier model?

  Use the AI Chatbot with `qwen3-235b` selected to ask follow-up questions about the tickets. Feel free to experiment with the other models too and see their takes on the tickets or more general technical queries.

- In the Rocket.Chat tab (log in with `user-12345/deeper-agents` if prompted), check `#support` for the notifications
Takeaways
- Switching from frontier to open source models is a configuration change — same pipeline, same skills, same prompts
- `qwen3-235b` running on MaaS provides analysis quality comparable to frontier models for structured SRE tasks
- Agentic harnesses like Deep Agents compensate for the capability gap by providing focused context and structured workflows
- Skills and system prompts do most of the work — the model provides the reasoning engine
- Open source models on MaaS offer three strategic advantages:
  - Cost — fixed infrastructure cost vs per-token billing
  - Data sovereignty — your data stays within your infrastructure
  - Security — no external API keys, no egress traffic, complete audit trail
- The choice between frontier and open source isn't binary — you can mix models per agent based on task complexity