Guide

What You’re Seeing

This demo is a split-screen dashboard where an AI agent plays a honeycomb word game in real time.

Left panel — The Game. A live WordSwarm puzzle. Letters sit on 17 hexagonal cells and words are formed by dragging across adjacent cells.
Right panel — The Agent. Controls, live stats, and a scrolling log of the AI agent’s actions and reasoning as it plays autonomously.

When you press START AGENT, a Python process launches on the server. It observes the board, searches for words, and submits them — all without ever seeing the answer key.

How It Works

Agent observes: reads the board state (letters, hints, visible cells) via the Blind API
Solver enumerates: DFS through the 17-node adjacency graph finds all valid letter paths (3-6 letters)
Dictionary match: each path is checked against a 1,677-word dictionary in ~4ms
Hint matching: found words are matched against the hint list (first letter + length + revealed letters)
LLM resolves ambiguity: when multiple dictionary words match the same hint, the LLM reasons about which one fits
Agent submits: the chosen word is queued as a simulated drag sequence and the game client replays it
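
The enumeration and dictionary steps above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual solver: the names `enumerate_paths` and `find_words`, the cell-id-to-letter mapping, and the adjacency format are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Set, Tuple


def enumerate_paths(
    letters: Dict[int, str],
    adjacency: Dict[int, List[int]],
    min_len: int = 3,
    max_len: int = 6,
) -> Iterator[Tuple[str, Tuple[int, ...]]]:
    """Yield (word, path) for every simple path of min_len..max_len cells."""

    def dfs(path: List[int], word: str) -> Iterator[Tuple[str, Tuple[int, ...]]]:
        if len(path) >= min_len:
            yield word, tuple(path)
        if len(path) == max_len:
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:  # each cell may be used once per word
                path.append(nxt)
                yield from dfs(path, word + letters[nxt])
                path.pop()

    for start in letters:
        yield from dfs([start], letters[start])


def find_words(
    letters: Dict[int, str],
    adjacency: Dict[int, List[int]],
    dictionary: Set[str],
) -> Dict[str, Tuple[int, ...]]:
    """Keep the first path found for each string that is in the dictionary."""
    found: Dict[str, Tuple[int, ...]] = {}
    for word, path in enumerate_paths(letters, adjacency):
        if word in dictionary and word not in found:
            found[word] = path
    return found


# Toy 4-cell board laid out in a line (hypothetical, not the 17-cell honeycomb).
toy_letters = {0: "C", 1: "A", 2: "T", 3: "S"}
toy_adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(find_words(toy_letters, toy_adjacency, {"CAT", "CATS", "ACT"}))
```

Holding the dictionary in a set makes each lookup a constant-time hash check, so the cost is dominated by the number of paths the DFS visits.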

Game Mechanics

Objective

Find hidden words on the honeycomb before time runs out. Each correct word raises the honey level, and each incorrect guess drains it. If the honey drops to zero, the round ends.

How Words Work

Words are 3-6 letters long, laid across adjacent hexagonal cells
Each cell can only be used once per word
The word list shows hints: the first letter is always visible; remaining letters are revealed as words are solved
A correct word lights up green; an incorrect guess shows a red X and drains honey
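
One way to picture how a hint narrows the candidates is the sketch below. It assumes a hint is written as a pattern such as `H____`, with `_` standing for a letter not yet revealed; the function names are hypothetical and the game's real hint format may differ.

```python
from typing import Iterable, List


def matches_hint(word: str, hint: str, blank: str = "_") -> bool:
    """True if word has the hint's length and agrees on every revealed letter."""
    if len(word) != len(hint):
        return False
    return all(h == blank or h == w for w, h in zip(word.upper(), hint.upper()))


def candidates_for_hint(words: Iterable[str], hint: str) -> List[str]:
    """Return all words consistent with the hint, sorted for stable output."""
    return sorted(w for w in words if matches_hint(w, hint))


found = {"HASTE", "HOIST", "PASTE", "HAT"}
print(candidates_for_hint(found, "H____"))  # two candidates remain: ambiguous
print(candidates_for_hint(found, "H_I__"))  # a revealed letter leaves one
```

When more than one word survives the filter, the hint alone cannot decide, which is the case the agent hands to the LLM.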

Scoring & Levels

Solve enough words to advance to the next level
Each level generates a new puzzle with a fresh set of words
Your score accumulates across levels
The timer and honey meter add pressure — speed matters

What to Observe

As the agent plays, pay attention to the stats panel and log output. Here is what each metric tells you:

Speed Metrics

Metric	What It Means
TTFT	Time to First Token — how long before the model starts responding. Bounded by network latency and model load.
Latency	Total wall-clock time for each LLM call, including all token generation.
Tokens/sec	Output throughput. Higher means the model generates text faster.
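
As a worked example, the three speed metrics can be derived from three timestamps and a token count. This sketch assumes throughput is measured over the generation window after the first token arrives; the dashboard may compute it differently, and the function and field names are illustrative only.

```python
from typing import Dict


def speed_metrics(
    sent_at: float,
    first_token_at: float,
    finished_at: float,
    output_tokens: int,
) -> Dict[str, float]:
    """Compute TTFT, total latency, and output throughput (times in seconds)."""
    ttft = first_token_at - sent_at
    latency = finished_at - sent_at
    generation_time = finished_at - first_token_at
    # Guard against a zero-length generation window.
    tps = output_tokens / generation_time if generation_time > 0 else 0.0
    return {"ttft_s": ttft, "latency_s": latency, "tokens_per_sec": tps}


# Request sent at t=0, first token at 0.25 s, done at 2.25 s, 100 output tokens.
print(speed_metrics(0.0, 0.25, 2.25, 100))
```

With these made-up numbers, TTFT is 0.25 s, latency is 2.25 s, and throughput is 100 tokens over 2.0 s, or 50 tokens per second.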

Cost Metrics

Metric	What It Means
Tokens In	Prompt tokens sent to the model (context, instructions, game state).
Tokens Out	Completion tokens generated by the model (reasoning, tool calls).
Total Tokens	Combined usage. In production, this drives cost.

Effectiveness Metrics

Metric	What It Means
Words Found	How many words the agent successfully submitted.
Solver Runs	How many times the path-enumeration solver scanned the board.
LLM Calls	Total invocations. More calls = more reasoning steps. Watch the ratio of calls to words found.
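
The ratio mentioned above, along with tokens spent per word, can be computed from the panel's counters. The sketch below is illustrative: the function name, its arguments, and the sample numbers are made up for the example.

```python
from typing import Dict, Optional


def efficiency(
    words_found: int, llm_calls: int, tokens_in: int, tokens_out: int
) -> Dict[str, Optional[float]]:
    """Summarize how much LLM work each found word cost."""
    total = tokens_in + tokens_out
    if words_found == 0:
        # Ratios are undefined until the agent has found at least one word.
        return {"total_tokens": total, "calls_per_word": None, "tokens_per_word": None}
    return {
        "total_tokens": total,
        "calls_per_word": llm_calls / words_found,
        "tokens_per_word": total / words_found,
    }


# Hypothetical run: 8 words found using 4 LLM calls and 6,000 tokens in total.
print(efficiency(words_found=8, llm_calls=4, tokens_in=5000, tokens_out=1000))
```

A calls-per-word value below 1 suggests the solver is resolving most words without help from the LLM.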

Reasoning Transparency

Every agent action is logged with full reasoning:

[Agent] Calling tool: find_words
[Agent] Found 12 candidates, 8 certain, 4 ambiguous
[Agent] Submitting 8 safe words...
[Agent] Calling tool: submit_word_by_name("HASTE")
  → LLM resolved ambiguity: HASTE vs PASTE — chose HASTE based on
    hint H____ (first letter matches)

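This transparency matters: because every tool call and its rationale appear in the log, you can check why the agent submitted a given word instead of taking its behavior on trust.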

Key Concepts

Agentic AI

Traditional AI answers questions. Agentic AI takes actions in a loop: observe, reason, act, repeat.

Traditional AI	Agentic AI
Responds to individual prompts	Pursues goals autonomously
Generates text or answers	Makes decisions and takes actions
Stateless interactions	Maintains context across steps
Requires human direction each step	Plans and executes multi-step tasks
Static behavior	Adapts to changing circumstances

This agent uses the ReAct (Reason + Act) pattern powered by LangGraph. At each step it decides which tool to call — observe the board, run the solver, or submit a word — based on the current game state and its own prior reasoning.

Unlike a script, the agent adapts. If the solver returns ambiguous matches, the LLM reasons about which word fits best. If the board changes mid-turn, it re-observes and re-plans.
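
The observe, reason, act loop can be sketched in plain Python as below. This is not LangGraph code and not the demo's agent: `ToyGame`, `choose_action`, and `run_agent` are hypothetical names, and a fixed rule stands in for the LLM's decision at each step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class ToyGame:
    """Stand-in for the real game: a fixed set of hidden words."""
    hidden: Set[str]
    solved: List[str] = field(default_factory=list)

    def observe(self) -> dict:
        return {"remaining": len(self.hidden) - len(self.solved)}

    def submit(self, word: str) -> bool:
        ok = word in self.hidden and word not in self.solved
        if ok:
            self.solved.append(word)
        return ok


def choose_action(state: dict) -> Tuple[str, Optional[str]]:
    """Placeholder for the LLM: pick the next tool from the current state."""
    if state["observation"] is None:
        return ("observe", None)
    if state["observation"]["remaining"] == 0:
        return ("stop", None)
    if not state["candidates"]:
        return ("find_words", None)
    return ("submit_word", state["candidates"][0])


def run_agent(
    game: ToyGame, solver: Callable[[ToyGame], List[str]], max_steps: int = 20
) -> List[str]:
    """Run the loop and return the sequence of actions taken."""
    state = {"observation": None, "candidates": [], "log": []}
    for _ in range(max_steps):
        action, arg = choose_action(state)
        state["log"].append(action)
        if action == "stop":
            break
        if action == "observe":
            state["observation"] = game.observe()
        elif action == "find_words":
            state["candidates"] = solver(game)
        elif action == "submit_word":
            game.submit(arg)
            state["candidates"].remove(arg)
            state["observation"] = game.observe()  # re-observe after acting
    return state["log"]


game = ToyGame(hidden={"CAT", "HAT"})
# Toy solver that simply lists the unsolved words in sorted order.
print(run_agent(game, lambda g: sorted(g.hidden - set(g.solved))))
```

The `max_steps` cap is a design choice for the sketch: it guarantees the loop ends even if the policy keeps choosing tools that make no progress.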

Reasoning Models

Some models (like kimi-k2-5) use internal "chain of thought" before producing a visible answer. This shows up as reasoning tokens — tokens the model generates for itself, not shown to the user.

Reasoning models often produce better decisions (fewer wrong guesses) but consume more tokens and take longer per call. Watch the Tokens Out counter — a reasoning model may show high output even when the visible response is short.

The trade-off: accuracy vs. speed. A reasoning model may solve the puzzle in fewer attempts but take longer per move. A faster model may guess more but act quickly.

Use the /think and /no-think prompting patterns to control reasoning behavior at runtime. Module 1 covers this in detail.

Performance & Efficiency

The demo measures what matters in production AI systems:

Latency — Can the agent act fast enough to keep the honey level from dropping? This mirrors real-time requirements in production (chat, automation, monitoring).
Token efficiency — How many tokens does it take to find each word? Fewer tokens per result = lower cost at scale.
Tool use — The agent has specialized tools (solver, observer). Good agents call the right tool at the right time instead of reasoning from scratch every step.
Blind mode — The agent never sees the answer key. It must discover words through graph traversal and dictionary lookup, just like a human player. This demonstrates real-world constraints where AI operates with incomplete information.

Real-World Applications

The concepts demonstrated in this demo apply to production AI systems:

DevOps & SRE — Autonomous incident response, self-healing infrastructure, intelligent resource scaling
Business process automation — Intelligent workflow routing, dynamic resource allocation, adaptive scheduling
Data engineering — Autonomous pipeline management, intelligent quality monitoring, self-tuning operations
Customer service — Multi-turn problem solving, context-aware support agents, proactive issue detection

Powered by Red Hat OpenShift AI

The models are served through Red Hat OpenShift AI Model as a Service (MaaS):

Scalable model serving for multiple concurrent requests via vLLM
Low-latency inference enabling real-time agent decision-making
Enterprise security with API authentication and network policies
Optimized serving with quantized models for cost efficiency