Backend and MCP Architecture

In previous modules, you explored Llama Stack’s APIs in isolation. Now you’ll see how Llama Stack agents interact with real business functions through the Model Context Protocol (MCP). This module demonstrates how agents can access enterprise capabilities - customer data, financial transactions, order management - through standardized tool interfaces.

From APIs to agent tools

Traditional enterprise applications expose business functions through REST APIs. While powerful, these APIs require developers to write custom integration code, manage authentication, handle errors, and understand domain-specific schemas. Every new application means more integration work.

Llama Stack agents change this paradigm. Instead of hardcoding API integrations, agents can:

Discover tools dynamically at runtime through MCP servers
Reason about which tools to use based on user intent
Invoke business functions naturally using tool calling
Compose multi-step workflows across different systems

The bridge between agents and existing business systems is the Model Context Protocol (MCP).

What is MCP and why does it matter?

The Model Context Protocol is an open standard that allows AI agents to interact with external tools and data sources through a consistent interface. Think of MCP as an "API for AI": REST APIs are designed for developers, whereas MCP is designed for agents.

MCP solves the agent integration challenge

Without MCP, integrating agents with business systems requires:

Writing custom tool wrappers for every API endpoint
Hardcoding business logic into agent prompts
Managing credentials and authentication separately
Rebuilding tools when APIs change
No standardization across different systems

With MCP, you get:

Standardized tool discovery: Agents ask MCP servers "what can you do?" and receive structured tool descriptions
Unified invocation protocol: All tools are called the same way, regardless of underlying implementation
Security boundary: MCP servers control access, validate requests, and enforce policies
Backend abstraction: Business systems can evolve without breaking agents
Ecosystem compatibility: Any MCP-compliant agent can use any MCP-compliant server
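To make the discovery and invocation points concrete, here is a minimal sketch of the JSON-RPC 2.0 messages that MCP uses for them. The method names tools/list and tools/call come from the MCP specification; the helper function names, the request IDs, and the email argument are assumptions made for illustration.

```python
import json
from itertools import count

# Monotonic request IDs; JSON-RPC only needs them to be unique per session.
_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope (illustrative helper, not a library API)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request():
    """Discovery: ask the server which tools it offers."""
    return jsonrpc_request("tools/list")


def call_tool_request(name, arguments):
    """Invocation: every tool is called with the same shape, whatever it wraps."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


discover = list_tools_request()
invoke = call_tool_request("search_customers", {"email": "thomashardy@example.com"})
print(json.dumps(invoke, indent=2))
```

Only the name and arguments fields change from one tool to the next, which is what lets an agent call a tool it has never seen before.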

How Llama Stack uses MCP

Llama Stack provides first-class MCP support through its tool runtime:

MCP server registration: Register MCP endpoints as tool groups with Llama Stack
Tool discovery: Llama Stack queries MCP servers for available tools
Agent reasoning: When an agent needs capabilities, Llama Stack makes tools available for invocation
Tool execution: Llama Stack invokes MCP tools and returns results to the agent
Response synthesis: The agent uses tool results to formulate natural language responses
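The registration and discovery steps can be pictured with a small in-memory model. This is a sketch of the idea only, not Llama Stack's implementation: the ToolRegistry class, the mcp::customer and mcp::finance group IDs, and the service URLs are hypothetical, while the ports and tool names follow this lab.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGroup:
    """One registered MCP endpoint and the tools it advertised."""
    endpoint: str
    tools: list = field(default_factory=list)


class ToolRegistry:
    """Toy stand-in for a tool runtime: register endpoints, then resolve tools."""

    def __init__(self):
        self._groups = {}

    def register(self, group_id, endpoint, advertised_tools):
        # In a real deployment the tool list would come from querying the
        # MCP server; here it is passed in so the sketch runs offline.
        self._groups[group_id] = ToolGroup(endpoint, list(advertised_tools))

    def discover(self):
        """Return every known tool name across all registered groups."""
        return sorted(t for g in self._groups.values() for t in g.tools)

    def resolve(self, tool_name):
        """Find which group (and endpoint) serves a given tool."""
        for group_id, group in self._groups.items():
            if tool_name in group.tools:
                return group_id, group.endpoint
        raise KeyError(f"no registered MCP server offers {tool_name!r}")


registry = ToolRegistry()
# Hypothetical group IDs and hostnames; ports match the architecture diagram.
registry.register("mcp::customer", "http://mcp-customer:8001",
                  ["search_customers", "get_customer"])
registry.register("mcp::finance", "http://mcp-finance:8002",
                  ["fetch_order_history", "fetch_invoice_history"])

print(registry.discover())
print(registry.resolve("fetch_order_history"))
```

Keeping the tool-to-endpoint mapping in one place is what lets the agent ask for a tool by name without knowing which server hosts it.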

This module demonstrates this pattern with business functions for customer management and financial operations.

Agent-to-business-function architecture

This lab demonstrates how Llama Stack agents access business capabilities through MCP servers. The example uses FantaCo, a fictional company with customer and finance business functions:

graph TB
    subgraph Users["User Layer"]
        SUPPORT[Support Agent<br/>Chat Interface]
    end
    subgraph AI["AI Platform - OpenShift"]
        subgraph LLS["Llama Stack Distribution"]
            AGENT[AI Agent<br/>Multi-turn reasoning]
            API[Llama Stack APIs<br/>Responses, Tools, Agents]
            VLLM[vLLM Inference<br/>Qwen3-14B]
        end
    end
    subgraph MCP["MCP Layer - Tool Abstraction"]
        MCP_CUST[Customer MCP Server<br/>Port: 8001]
        MCP_FIN[Finance MCP Server<br/>Port: 8002]
    end
    subgraph Backend["Backend Microservices"]
        subgraph CustSvc["Customer Domain"]
            CUST_API[Customer REST API<br/>FastAPI]
            CUST_DB[(PostgreSQL<br/>Customer Data)]
        end
        subgraph FinSvc["Finance Domain"]
            FIN_API[Finance REST API<br/>FastAPI]
            FIN_DB[(PostgreSQL<br/>Orders, Invoices)]
        end
    end
    SUPPORT --> AGENT
    AGENT --> API
    API --> VLLM
    API --> MCP_CUST
    API --> MCP_FIN
    MCP_CUST --> CUST_API
    MCP_FIN --> FIN_API
    CUST_API --> CUST_DB
    FIN_API --> FIN_DB
    style Users fill:#e1f5ff
    style AI fill:#fff4e1
    style LLS fill:#ffe4e1
    style MCP fill:#e8f5e9
    style Backend fill:#f3e5f5
    style CustSvc fill:#e8eaf6
    style FinSvc fill:#e8eaf6

Figure 1. Llama Stack agent with MCP business function access

How agents access business functions

Step 1: User makes a request

A user asks a natural language question: "Find the customer with email thomashardy@example.com and show their order history"

Step 2: The LLM reasons about the request

The agent sends the request to the LLM, which analyzes it and determines that 2 capabilities are needed:

Search for a customer by email
Retrieve order history for that customer

Step 3: Tool discovery

Llama Stack queries the registered MCP servers to find available tools:

Customer MCP: search_customers, get_customer
Finance MCP: fetch_order_history, fetch_invoice_history, etc.

Step 4: Agent invokes tools

The agent makes tool calls through Llama Stack’s tool runtime:

Call search_customers(email="thomashardy@example.com") via Customer MCP
Extract customer ID from result
Call fetch_order_history(customer_id="AROUT") via Finance MCP

Step 5: MCP servers execute business logic

Each MCP server:

Validates the request
Calls the appropriate backend API
Formats the response as MCP tool result
Returns data to Llama Stack
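The four responsibilities above can be sketched as one small adapter function. This is a hedged illustration, not the lab's MCP server code: the function names and the validation rule are assumptions, the backend call is injected as a plain callable so the example runs without a network, and the content/isError result shape follows the tool result format in the MCP specification.

```python
import json
from urllib.parse import urlencode


def build_search_path(email):
    """Translate the tool argument into the backend query used in this lab."""
    return "/api/customers?" + urlencode({"contactEmail": email})


def tool_result(payload, is_error=False):
    """Wrap a payload as an MCP-style tool result (one text content block)."""
    return {"content": [{"type": "text", "text": json.dumps(payload)}],
            "isError": is_error}


def search_customers_tool(arguments, http_get):
    """Validate, call the backend, format, and return.

    `http_get` is any callable that takes a path and returns parsed JSON;
    a real adapter would use an HTTP client here.
    """
    email = arguments.get("email", "")
    if "@" not in email:  # assumed, deliberately simple validation rule
        return tool_result({"error": "email argument is required"}, is_error=True)
    records = http_get(build_search_path(email))
    return tool_result(records)


def fake_backend(path):
    """Offline stand-in for the Customer REST API."""
    assert path == "/api/customers?contactEmail=thomashardy%40example.com"
    return [{"customerId": "AROUT", "contactName": "Thomas Hardy"}]


ok = search_customers_tool({"email": "thomashardy@example.com"}, fake_backend)
bad = search_customers_tool({}, fake_backend)
print(ok["isError"], bad["isError"])
```

Injecting the HTTP call keeps the adapter thin and easy to test: the only logic it owns is argument checking, URL construction, and result formatting.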

Step 6: Agent synthesizes response

The agent receives the tool results and generates a natural language response that combines the customer information with the order history.
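Steps 4 through 6 amount to chaining two tool calls, with the first result feeding the second. The sketch below replays that chain against in-memory stubs seeded with this lab's sample data. The stub functions, the call_tool dispatcher, and the final summary string are hypothetical, and in the real flow the LLM, not hardcoded Python, decides which tool to call next.

```python
# Stub "MCP tools" backed by sample records from this lab (offline stand-ins).
CUSTOMERS = [{"customerId": "AROUT", "companyName": "Around the Horn",
              "contactName": "Thomas Hardy",
              "contactEmail": "thomashardy@example.com"}]
ORDERS = [{"orderNumber": "ORD-008", "customerId": "AROUT", "totalAmount": 59.99},
          {"orderNumber": "ORD-003", "customerId": "AROUT", "totalAmount": 89.99},
          {"orderNumber": "ORD-004", "customerId": "AROUT", "totalAmount": 199.99}]


def search_customers(email):
    """Partial, case-insensitive match on the contact email."""
    return [c for c in CUSTOMERS if email.lower() in c["contactEmail"].lower()]


def fetch_order_history(customer_id):
    """Return all stub orders that belong to the given customer."""
    return [o for o in ORDERS if o["customerId"] == customer_id]


TOOLS = {"search_customers": search_customers,
         "fetch_order_history": fetch_order_history}


def call_tool(name, **arguments):
    """Single entry point for every tool, mirroring a uniform invocation path."""
    return TOOLS[name](**arguments)


def answer(email):
    """Chain the two tool calls and build a plain-text summary."""
    matches = call_tool("search_customers", email=email)
    if not matches:
        return f"No customer found for {email}."
    customer = matches[0]
    # The ID extracted from the first result becomes the second call's argument.
    orders = call_tool("fetch_order_history", customer_id=customer["customerId"])
    numbers = ", ".join(o["orderNumber"] for o in orders)
    return (f"{customer['contactName']} at {customer['companyName']} "
            f"has {len(orders)} orders: {numbers}.")


print(answer("thomashardy@example.com"))
```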

The agent-MCP-business function flow

MCP creates a clean separation between agent reasoning and business logic execution. Agents don’t need to know about REST APIs, database schemas, or authentication mechanisms - they simply invoke tools and receive structured results.

sequenceDiagram
    participant Agent as Llama Stack Agent
    participant LLS as Llama Stack Server
    participant MCP as MCP Server
    participant API as Backend API
    participant DB as Database
    Agent->>LLS: "Find customer with email<br/>thomashardy@example.com"
    LLS->>LLS: Analyze request<br/>Determine tool needed
    LLS->>MCP: GET /v1/tools<br/>Discover available tools
    MCP-->>LLS: [search_customers,<br/>get_customer]
    LLS->>LLS: Generate tool call<br/>search_customers()
    LLS->>MCP: POST /v1/tool-runtime/invoke<br/>search_customers(email=...)
    MCP->>MCP: Validate request<br/>Check permissions
    MCP->>API: GET /api/customers?<br/>contactEmail=...
    API->>DB: SELECT * FROM customers<br/>WHERE contact_email LIKE...
    DB-->>API: Customer record(s)
    API-->>MCP: JSON response
    MCP->>MCP: Format as MCP result
    MCP-->>LLS: ToolInvocationResult
    LLS->>Agent: Synthesize natural<br/>language response
    Agent-->>LLS: "Found customer:<br/>Thomas Hardy at<br/>Around the Horn..."
    Note over Agent,DB: MCP servers translate between agent tool calls<br/>and backend business functions

Figure 2. MCP request flow sequence

Business functions exposed as agent tools

In this lab, 2 MCP servers expose business capabilities to Llama Stack agents:

Customer tools (via Customer MCP Server)

search_customers: Find customers by various criteria with partial matching
get_customer: Retrieve complete customer details by ID

Finance tools (via Finance MCP Server)

fetch_order_history: Get order history with date filtering and pagination
fetch_invoice_history: Get invoice history with date filtering and pagination

Agents invoke these tools naturally through Llama Stack’s tool runtime without knowing about the underlying APIs, databases, or implementation details. The MCP servers handle all the complexity of translating tool calls into business operations.
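One piece of that translation is renaming arguments: the tool call shown earlier in this module uses customer_id, while the Finance REST API expects customerId. A minimal sketch of such a mapping follows. The helper names are hypothetical, and start_date is an assumed parameter used only to show how an unset argument is dropped, not a documented tool argument.

```python
def snake_to_camel(name):
    """Convert snake_case to camelCase, e.g. customer_id -> customerId."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_rest_body(tool_arguments):
    """Rename keys for the REST API and drop arguments the caller left unset."""
    return {snake_to_camel(key): value
            for key, value in tool_arguments.items()
            if value is not None}


# start_date is a hypothetical argument, left unset here on purpose.
body = to_rest_body({"customer_id": "AROUT", "limit": 10, "start_date": None})
print(body)
```

A generic rename like this keeps the adapter free of per-field code, at the cost of assuming the backend follows camelCase consistently.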

FantaCo Backend

Deploy Microservices and Database Backend

The backend system consists of two REST APIs, one for Customer and one for Finance. Each has its own Postgres database and the two services are fully independent of each other.

Make sure you are in the correct directory

cd $HOME/fantaco-redhat-one-2026/
pwd

/home/lab-user/fantaco-redhat-one-2026

Install the enterprise backend databases and REST APIs

What is Helm? Helm is the package manager for Kubernetes. A Helm chart bundles all the YAML manifests (Deployments, Services, ConfigMaps, etc.) needed to deploy an application. Running helm install deploys the entire FantaCo backend — two PostgreSQL databases and two FastAPI services — in a single command.

helm install fantaco-app ./helm/fantaco-app

NAME: fantaco-app
LAST DEPLOYED: Fri Dec 12 23:01:21 2025
NAMESPACE: agentic-user1
STATUS: deployed
REVISION: 1
TEST SUITE: None

oc get pods

NAME                                            READY   STATUS    RESTARTS   AGE
fantaco-customer-main-7fd4ddb666-5cngz          1/1     Running   0          3m14s
fantaco-finance-main-75ffddb44b-knj6x           1/1     Running   0          3m14s
llamastack-distribution-vllm-77897d9f8f-xl6gp   1/1     Running   0          51m
postgresql-customer-ff78dffdf-tpj9c             1/1     Running   0          12m
postgresql-finance-689d97894f-2fq89             1/1     Running   0          12m

Wait until every pod shows 1/1 in the READY column and a STATUS of Running before continuing.

What’s running now? You have 5 pods: the Llama Stack server from the earlier module, two PostgreSQL databases (one for customer data, one for financial data), and two FastAPI microservices that expose REST APIs over those databases. This mirrors a real enterprise setup where business data lives in separate domain-specific services.
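If you would rather check readiness programmatically than by eye, the READY column can be parsed. The sketch below works on captured oc get pods text like the output above. The function name is hypothetical, it assumes the default column layout shown in this lab, and one sample row has been edited to 0/1 to demonstrate a pod that is not ready yet.

```python
def not_ready_pods(oc_output):
    """Return names of pods whose READY count is incomplete or STATUS is not Running."""
    pending = []
    lines = [line for line in oc_output.strip().splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_now, ready_total = ready.split("/")
        if ready_now != ready_total or status != "Running":
            pending.append(name)
    return pending


# Captured-style output; the finance row is edited to 0/1 for illustration.
sample = """
NAME                                            READY   STATUS    RESTARTS   AGE
fantaco-customer-main-7fd4ddb666-5cngz          1/1     Running   0          3m14s
fantaco-finance-main-75ffddb44b-knj6x           0/1     Running   0          3m14s
postgresql-customer-ff78dffdf-tpj9c             1/1     Running   0          12m
"""

print(not_ready_pods(sample))
```

An empty list means every pod in the captured output is ready.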

Test backend

Verify that you have connectivity to the backend REST endpoints and their databases with some simple curl commands.

CUST_URL=https://$(oc get routes -l app=fantaco-customer-main -o jsonpath="{range .items[*]}{.status.ingress[0].host}{end}")
echo $CUST_URL

https://fantaco-customer-service-default.apps.cluster-frcqw.dynamic.redhatworkshops.io

curl -sS -L "$CUST_URL/api/customers?contactEmail=thomashardy%40example.com" | jq

[
  {
    "customerId": "AROUT",
    "companyName": "Around the Horn",
    "contactName": "Thomas Hardy",
    "contactTitle": "Sales Representative",
    "address": "120 Hanover Sq.",
    "city": "London",
    "region": null,
    "postalCode": "WA1 1DP",
    "country": "UK",
    "phone": "(171) 555-7788",
    "fax": "(171) 555-6750",
    "contactEmail": "thomashardy@example.com",
    "createdAt": "2025-12-13T22:19:49.433401",
    "updatedAt": "2025-12-13T22:19:49.433401"
  }
]

Why test with curl first? Verifying the REST APIs directly confirms the backend is healthy before adding the MCP layer. If something fails later, you’ll know the issue is in the MCP translation — not the underlying data.

FIN_URL=https://$(oc get routes -l app=fantaco-finance-main -o jsonpath="{range .items[*]}{.status.ingress[0].host}{end}")
echo $FIN_URL

https://fantaco-finance-service-default.apps.cluster-frcqw.dynamic.redhatworkshops.io

curl -sS -X POST "$FIN_URL/api/finance/orders/history" \
   -H "Content-Type: application/json" \
   -d '{
    "customerId": "AROUT",
    "limit": 10
   }' | jq

{
  "data": [
    {
      "id": 8,
      "orderNumber": "ORD-008",
      "customerId": "AROUT",
      "totalAmount": 59.99,
      "status": "PENDING",
      "orderDate": "2024-01-30T15:20:00",
      "createdAt": "2024-01-30T15:20:00",
      "updatedAt": null
    },
    {
      "id": 3,
      "orderNumber": "ORD-003",
      "customerId": "AROUT",
      "totalAmount": 89.99,
      "status": "PENDING",
      "orderDate": "2024-01-25T09:45:00",
      "createdAt": "2024-01-25T09:45:00",
      "updatedAt": null
    },
    {
      "id": 4,
      "orderNumber": "ORD-004",
      "customerId": "AROUT",
      "totalAmount": 199.99,
      "status": "DELIVERED",
      "orderDate": "2024-01-10T16:20:00",
      "createdAt": "2024-01-10T16:20:00",
      "updatedAt": null
    }
  ],
  "success": true,
  "count": 3,
  "message": "Order history retrieved successfully"
}

You’ve now verified that the FantaCo backend microservices are operational and respond to direct REST API calls. The Customer API provides customer master data (company, contact, and address details), while the Finance API delivers transactional information. Both services are backed by PostgreSQL databases containing realistic sample data.
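The same checks can be automated once the JSON is captured. As a hedged sketch, the function below verifies the envelope fields seen in the order history response above (success, count, data). The function name and the specific checks are assumptions, not part of the lab's tooling, and the sample is trimmed to the fields the checks use.

```python
import json


def check_order_history(raw_json, customer_id):
    """Return a list of problems found in an order-history response (empty if none)."""
    response = json.loads(raw_json)
    problems = []
    if response.get("success") is not True:
        problems.append("success flag is not true")
    data = response.get("data", [])
    if response.get("count") != len(data):
        problems.append("count does not match number of orders")
    if any(order.get("customerId") != customer_id for order in data):
        problems.append("response contains orders for another customer")
    return problems


# Trimmed copy of the response shown in this lab.
sample = json.dumps({
    "data": [{"orderNumber": "ORD-008", "customerId": "AROUT"},
             {"orderNumber": "ORD-003", "customerId": "AROUT"},
             {"orderNumber": "ORD-004", "customerId": "AROUT"}],
    "success": True,
    "count": 3,
    "message": "Order history retrieved successfully",
})

print(check_order_history(sample, "AROUT"))
```

Returning a list of problems rather than raising on the first one makes it easier to see everything that is wrong with a response in a single run.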

Next, you’ll deploy the MCP servers that will bridge these backend APIs to Llama Stack agents.

Deploy MCP Servers

MCP servers act as the translation layer between Llama Stack’s tool invocation protocol and the backend REST APIs. Each MCP server:

Implements the MCP specification: Provides standardized endpoints for tool discovery and invocation
Wraps backend APIs: Translates MCP tool calls into appropriate REST API requests
Handles authentication and validation: Ensures only authorized operations are performed
Formats responses: Converts API responses into MCP-compatible tool results

The FantaCo deployment includes 2 MCP servers, one for each domain: the Customer MCP Server and the Finance MCP Server.

The MCP servers don’t contain business logic. They are thin adapters that translate MCP tool invocations into REST API calls. The actual business rules, data validation, and database queries stay in the backend services. This separation means you can add AI capabilities to existing systems without modifying them.

helm install fantaco-mcp ./helm/fantaco-mcp

NAME: fantaco-mcp
LAST DEPLOYED: Fri Dec 12 23:01:44 2025
NAMESPACE: agentic-user1
STATUS: deployed
REVISION: 1
TEST SUITE: None

oc get pods

NAME                                            READY   STATUS    RESTARTS   AGE
fantaco-customer-main-7fd4ddb666-5cngz          1/1     Running   0          25m
fantaco-finance-main-75ffddb44b-knj6x           1/1     Running   0          25m
llamastack-distribution-vllm-77897d9f8f-xl6gp   1/1     Running   0          73m
mcp-customer-6bd8bcfc7b-f85dl                   1/1     Running   0          118s
mcp-finance-75bd497cfd-wtnpt                    1/1     Running   0          118s
postgresql-customer-ff78dffdf-tpj9c             1/1     Running   0          34m
postgresql-finance-689d97894f-2fq89             1/1     Running   0          34m

Summary

In this module, you deployed the infrastructure that enables agents to access business functions:

Deployed backend services

2 microservices (Customer and Finance) with PostgreSQL databases provide realistic business capabilities

Deployed MCP servers

2 MCP servers act as bridges between Llama Stack and the backend services, translating agent tool calls into API requests

Verified the integration

Using curl commands and oc get pods, you confirmed that:

Backend APIs respond to direct REST calls
MCP server pods are deployed and running

Key concepts demonstrated

Agent-to-business-function flow: How agents access enterprise capabilities without knowing implementation details
MCP as abstraction layer: Standardized protocol for tool discovery and invocation
Dynamic tool discovery: Agents learn available capabilities at runtime, not deployment time
Clean separation: Business logic stays in backend services, MCP handles translation, agents handle reasoning

The infrastructure is now ready for agents to invoke these tools. In the next module, you’ll use Python clients (Llama Stack Client and LangGraph) to build agents that interact with these MCP servers and combine these business functions to answer complex user queries.