Tools and Agents

So far, we’ve seen how we can query a given model within the Llama Stack Playground and receive static information based on the model’s training data. For many use cases, such as troubleshooting a broken CI/CD pipeline, we would like the model to provide real-time information by accessing external systems such as the web or the CI/CD runtime. The technical capability that allows a model to interact with a given system is called a tool, and the ability to use tools in order to achieve a given objective is considered a core feature of an AI agent.

For our use case, we’ll need the following tools:

  • Read container logs: fetch the logs of the container that ran the failing pipeline step.

  • Web search: query a web search engine with a given search string and receive the content of matching websites.

  • Create GitHub issue: open a new issue within a given GitHub repository with a given issue description.
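
The three tools can be pictured as plain functions. The sketch below uses hypothetical names, parameters, and return shapes purely for illustration; the actual tool interfaces will be provided by Llama Stack and MCP later in the lab:

```python
# Illustrative stubs of the three tools. Names, parameters, and return
# shapes are assumptions for this sketch, not real Llama Stack/MCP APIs.

def read_container_logs(namespace: str, pod: str, container: str) -> str:
    """Fetch the logs of the container that ran the failing pipeline step."""
    return f"(logs for {namespace}/{pod}/{container})"  # stub

def web_search(query: str, max_results: int = 3) -> list:
    """Query a search engine and return matching results with their content."""
    results = [{"url": "https://example.com", "content": f"results for: {query}"}]
    return results[:max_results]  # stub

def create_github_issue(repo: str, title: str, body: str) -> dict:
    """Open a new issue in the given GitHub repository."""
    return {"repo": repo, "title": title, "state": "open"}  # stub
```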

Llama Stack simplifies managing and providing tools for agentic use cases. There are three different ways to use tools in conjunction with Llama Stack:

  • Implement the tool on the client side and pass it to Llama Stack and the model for invocation. This approach can be used during early development if tools are not available otherwise, but it’s generally not recommended since it does not allow reuse of tool definitions.

  • Use an existing built-in (inline) tool provider that Llama Stack offers. One example is websearch, which uses the Tavily online search engine. Built-in tools are a good option for quickly getting started with tool usage while minimizing maintenance overhead.

  • Integrate an MCP server as a remote tool provider. We will cover MCP integration for accessing OpenShift and GitHub at a later step.
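
With the first option, your application owns both the tool’s schema (sent to the model so it knows the tool exists) and its implementation (executed when the model requests a call). A minimal sketch of that pairing, using an OpenAI-style schema and hypothetical names rather than a specific Llama Stack API:

```python
# Sketch of a client-side tool: the client sends the schema to the model
# and executes the implementation locally when the model emits a tool call.
# The schema format mirrors common OpenAI-style tool definitions; all names
# here are illustrative assumptions.

def web_search(query: str) -> str:
    """The actual implementation, executed on the client."""
    return f"search results for: {query}"  # stub

WEB_SEARCH_SCHEMA = {
    "name": "web_search",
    "description": "Query a web search engine and return matching content.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

TOOLS = {"web_search": web_search}

def dispatch(tool_call: dict) -> str:
    """Route a model-issued tool call to the local implementation."""
    return TOOLS[tool_call["name"]](**tool_call["arguments"])
```

The downside the text mentions follows directly from this shape: the schema and implementation live in one client, so other applications cannot reuse them.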

To get started, we’ll enable the built-in websearch tool with our own Tavily API key after reviewing how to configure Llama Stack.

Configuring Llama Stack

As part of Red Hat AI, a Llama Stack instance is represented by the LlamaStackDistribution custom resource (CR) that defines the Llama Stack server configuration including:

  • Server Configuration: Which distribution to use (including built-in tools etc.)

  • Container Specifications: Port, environment variables, resource limits

There is a unique instance of Llama Stack distribution deployed for each lab user.

  1. Locate the Llama Stack distribution.

    Llama Stack distribution
  2. Click on the existing distribution.

    Llama Stack distribution current
  3. Click on the YAML tab.

    Llama Stack distribution tab
  4. Review the Llama Stack Distribution we’re using for this lab.

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: user1-llama-stack
spec:
  replicas: 1
  server:
    containerSpec:
      env:
        - name: TELEMETRY_SINKS
          value: 'console, sqlite, otel_trace, otel_metric'
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: 'http://otel-collector-collector.observability-hub.svc.cluster.local:4318'
        - name: OTEL_SERVICE_NAME
          value: user1-llamastack
        - name: DEFAULT_MODEL_API_TOKEN   # [1]
          valueFrom:
            secretKeyRef:
              key: apiKey
              name: granite-3-2-8b-instruct
        - name: TAVILY_API_KEY            # [2]
          valueFrom:
            secretKeyRef:
              key: tavily-search-api-key
              name: tavily-search-key
      name: llama-stack
      port: 8321
    distribution:
      name: rh-dev
    userConfig:                           # [3]
      configMapName: llama-stack-config

As you can see, we’ve specified the secrets for the external services that the Llama Stack server interacts with:

  • [1] MaaS API

  • [2] Tavily search API. We will update this API key in a moment.

Also listed is [3] the user configuration used with this distribution (ConfigMap: llama-stack-config). Let’s review this configuration before updating the Tavily search key.

Llama Stack User Config

The user config ConfigMap stores the run.yaml configuration for each Llama Stack distribution. Updating the ConfigMap restarts the Pod so it loads the new data.

Your current Llama Stack config should look like the following. You can view the ConfigMap resource in the UI or from the terminal, whichever you prefer. Refer to the Llama Stack documentation for details and customization options.

kind: ConfigMap
apiVersion: v1
metadata:
  name: llama-stack-config
data:
  run.yaml: |
    # Llama Stack configuration
    version: '2'
    image_name: rh
    apis:
    - inference
    - tool_runtime
    - agents
    - safety
    - vector_io
    models:
      - metadata: {}
        model_id: granite-3-2-8b-instruct
        provider_id: vllm
        provider_model_id: granite-3-2-8b-instruct
        model_type: llm
      - metadata: {}
        model_id: Llama-Guard-3-1B
        provider_id: vllm
        provider_model_id: Llama-Guard-3-1B
        model_type: llm
    providers:
      inference:
      - provider_id: vllm
        provider_type: "remote::vllm"
        config:
          url: https://litellm-prod.apps.maas.redhatworkshops.io/v1
          context_length: 4096
          api_token: ${env.DEFAULT_MODEL_API_TOKEN}
          tls_verify: true
      tool_runtime:
      - provider_id: tavily-search
        provider_type: remote::tavily-search
        config:
          api_key: ${env.TAVILY_API_KEY}
          max_results: 3
      agents:
      - provider_id: meta-reference
        provider_type: inline::meta-reference
        config:
          persistence_store:
            type: sqlite
            db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/rh}/agents_store.db
          responses_store:
            type: sqlite
            db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/rh}/responses_store.db
    server:
      port: 8321
    tools:
      - name: builtin::websearch
        enabled: true
    tool_groups:
    - provider_id: tavily-search
      toolgroup_id: builtin::websearch

Set Up Websearch

Let’s now create our own Tavily key and update the Llama Stack distribution to use this key for our websearch tool calls.

Tavily search token: gather your Tavily web search API key.

  1. Set up a Tavily API key for web search. Log in using the GitHub account of one of your team members.

    Create Tavily API Key
    Figure 1. Tavily API Key
  2. Create a new secret object using this specification while replacing the PLACEHOLDER with your Tavily key. Use the + icon in the UI or use your terminal to apply the new object.

    kind: Secret
    apiVersion: v1
    metadata:
      name: new-tavily-search-key
      namespace: userX-llama-stack
    stringData:
      tavily-search-api-key: {PLACEHOLDER}
    type: Opaque
  3. Now, update the Llama Stack distribution via the CLI to reference this new secret instead of the old one. You must do this with oc edit or oc patch; the distribution cannot be updated in the UI.

    Edit command to open local editor:

    oc edit llamastackdistribution/userX-llama-stack -n userX-llama-stack

    Replace tavily-search-key with:

    new-tavily-search-key

    See below for placement reference:

    Updated Llama Stack config
    Figure 2. Updated Llama Stack Distribution
  4. Restart the playground by deleting its pod (starting with llama-stack-playground) in the userX-llama-stack namespace. Wait until it’s Ready and Running.
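
If you prefer oc patch over an interactive oc edit for step 3, a small script can assemble the JSON patch. The env index used below (4, i.e. TAVILY_API_KEY as the fifth entry, as in the CR shown earlier) is an assumption; verify it against your own resource before patching:

```python
import json

# Sketch: build the RFC 6902 JSON patch that points the TAVILY_API_KEY
# env entry at the new secret. Index 4 assumes TAVILY_API_KEY is the
# fifth env entry, as in the CR shown earlier in this lab.
patch = [{
    "op": "replace",
    "path": "/spec/server/containerSpec/env/4/valueFrom/secretKeyRef/name",
    "value": "new-tavily-search-key",
}]

cmd = (
    "oc patch llamastackdistribution/userX-llama-stack "
    "-n userX-llama-stack --type=json -p '" + json.dumps(patch) + "'"
)
print(cmd)
```

Replace userX with your own user number before running the printed command.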

Test Websearch integration

  1. Let’s now try out the websearch tool! Navigate back to the Llama Stack Playground UI.

  2. In the left menu, choose Agent-based. Then, select the built-in websearch tool.

    LlamaStack Playground with websearch
  3. Now, let’s ask the LLM the same question again:

    What is the weather today in Brisbane?

    The LLM is now able to answer with real-time data provided by the websearch tool.

    LlamaStack Playground regular websearch

Agent Types

As you noticed, under the Agent-based selection you can see Uses an agent (Regular or ReAct) with tools. What is the difference?

Regular Agents

In the previous example, once we enabled the websearch tool for our Regular agent, it invoked the tool to perform an online search to provide an up-to-date answer to our question.

The Regular agent applies the most basic and straightforward strategy for processing user queries with tools (see diagram):

  1. When a user query is provided, the agent processes the input and determines whether a tool is required to fulfill the request.

  2. Because the query involves retrieving current information, the agent invokes the websearch tool.

  3. The tool interacts with the online web search backend, which returns a list of matching results.

  4. The agent processes the tool response and generates a response to the user prompt.

This workflow ensures that the agent can handle a wide range of queries effectively.

Regular agent workflow
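
The four steps above collapse into a single decide → call tool → answer pass. A toy sketch with a stubbed model and tool (all names here are illustrative; a real agent delegates both to Llama Stack):

```python
# Toy sketch of a Regular agent: one decide -> call tool -> answer pass.
# The "model" and tool are stubs standing in for Llama Stack inference
# and the builtin::websearch tool.

def model_decide(query: str):
    """Stub 'LLM' that requests websearch for time-sensitive queries."""
    if "today" in query or "current" in query:
        return {"tool": "websearch", "arguments": {"query": query}}
    return None  # answer directly from training data

def websearch(query: str) -> str:
    return f"[search results for: {query}]"  # stub search backend

def regular_agent(query: str) -> str:
    call = model_decide(query)                  # 1. decide if a tool is needed
    if call is None:
        return f"answer from training data: {query}"
    result = websearch(**call["arguments"])     # 2-3. invoke tool, get results
    return f"answer grounded in {result}"       # 4. generate final response

print(regular_agent("What is the weather today in Brisbane?"))
```

Note the single pass: the Regular agent decides, calls, and answers once, without revisiting its plan after seeing the tool output.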

ReAct Agents

A ReAct agent uses Thinking, Action and Observation steps in its processing. These steps are performed in order for each user prompt, and the sequence may be repeated if the agent deems it necessary. This separation of Reasoning (Thinking) and Acting is the hallmark of the ReAct strategy.

For a given user query, the ReAct workflow includes the following steps (see diagram):

  1. Reason that a particular tool (such as websearch) may be needed to answer the user query.

  2. Act by calling that tool.

  3. Observe the tool response and conclude that it is sufficient to answer the user query. Alternatively, the agent may reason that more information is required, for which it may need additional tool calls.

  4. Generate a final response to the user prompt based on the accumulated observations.

Unlike the regular agent approach, ReAct dynamically breaks down tasks and adapts its approach based on the results of each step. This makes it more flexible and capable of handling complex, real-world queries effectively.

ReAct Agent
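
The repeatable Thought → Action → Observation cycle can be sketched as a loop. Everything below is stubbed for illustration; in a real ReAct agent the LLM produces the thoughts and chooses the actions:

```python
# Toy sketch of a ReAct loop: repeat Thought -> Action -> Observation until
# the agent decides its observations suffice, then answer. All reasoning
# here is stubbed; a real ReAct agent delegates it to the LLM.

def react_agent(query: str, tools: dict, max_steps: int = 3) -> str:
    observations = []
    for _ in range(max_steps):
        # Thought: stub reasoning -- search once, then decide we have enough.
        if not observations:
            thought = {"action": "websearch", "input": query}
        else:
            thought = {"action": "finish"}
        if thought["action"] == "finish":
            break
        # Action + Observation: call the chosen tool and record the result.
        observations.append(tools[thought["action"]](thought["input"]))
    # Final answer grounded in whatever was observed along the way.
    return f"answer based on {len(observations)} observation(s): {observations[-1]}"

tools = {"websearch": lambda q: f"[search results for: {q}]"}
print(react_agent("What is the weather today in Brisbane?", tools))
```

The loop is what distinguishes ReAct from the Regular agent’s single pass: after each observation the agent re-thinks, and may issue further tool calls before answering.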

Now that we’ve explored agent types with a simple built-in tool, let’s dive into MCP integration to enable more powerful capabilities!