2.1 Lab Overview & Setup

Now it’s your turn to get hands-on. In this lab, you will deploy the mock ServiceNow API and the Kubeflow Pipeline that ingests its data into our Milvus vector database.

To ensure you can get started quickly, we have pre-created a project for you and deployed some of the necessary components. This section will guide you through connecting to the environment, exploring your project, and running a simple test against the Large Language Model (LLM) provided for this workshop.

Getting Connected

For this workshop, we have provisioned a shared OpenShift Container Platform cluster with Red Hat OpenShift AI deployed on it. Each attendee has a unique user account.

Environment Information

If you are viewing these instructions in the deployed lab environment, the values below will be correctly rendered for you. If viewing from a static source like GitHub, placeholder values will appear.

Table 1. Lab Environment Access

Username

userX

Password

openshift

Login Procedure

  1. Click the Login with OpenShift button at the OpenShift AI Dashboard.

    The main login screen for Red Hat OpenShift AI
  2. Enter your user credentials (userX and openshift) provided above.

    Your browser might display a security warning. It is safe to ignore this message for the lab environment.

  3. After you authenticate, you will land on the OpenShift AI dashboard.

    The main dashboard of Red Hat OpenShift AI after logging in

Congratulations, you are now connected!

Reviewing Your Pre-Created Project

A project has been pre-created for you. Let’s take a quick tour.

  1. In the Red Hat OpenShift AI Dashboard, navigate to the Projects area and select the default Workspace that’s been created.

    Opening your dedicated user project
  2. Inside your project, you will find a pre-configured Jupyter Workbench to perform model testing. Later, you’ll setup a Pipeline and store data in the Cluster storage.

    Your dedicated user project home showing available components
    Table 2. Project Components Overview
    Component Purpose

    Workbench

    Jupyter environment for model testing and development

    Pipeline

    Data ingestion and processing workflows (to be configured)

    Cluster Storage

    Persistent storage for data and models

  3. These resources are all cloud-native, and to see the running pods for these services, you need to switch to the main OpenShift Console. On the left-hand side of your page, select the OpenShift Console tab.

    The console may prompt you to sign in again. Use your credentials: userX & openshift.

    Opening the OpenShift Console from the Quick Links menu
  4. First, make sure you have the userX project selected from the project dropdown at the top of the console. Then, using the left-hand menu, navigate to WorkloadsPods. Here you can see all the running components, like the Pipeline Server, database, and other pre-configured resources.

    OpenShift Topology view showing the deployed Milvus instance within your project

Interacting with a Large Language Model

Before we build our RAG pipeline, let’s connect to the local LLM that has been deployed for this workshop and ask it a basic question. This will help you get familiar with interacting with an LLM programmatically from your pre-created Jupyter environment via an OpenAI-compatible API.

The LLM service (granite-4-0-h-tiny) is already running in the another OpenShift AI cluster (MaaS) and accessible via API and API token.

  1. Return to the OpenShift AI Dashboard browser tab and navigate to the Workbenches tab within your project. Launch the workbench (my-workbench) by clicking on its name.

    Launching the Jupyter Workbench from the OpenShift AI dashboard
  2. Once JupyterLab opens, use the file browser on the left to navigate to the lab materials folder:

    hello-chris-rag-pipeline/lab-content/3.1
    Navigating to the lab materials folder in JupyterLab
  3. Open the notebook named 03-01-nb-llm-example.ipynb. This notebook contains pre-written Python code to connect to and query the LLM.

    Opening the LLM example notebook in JupyterLab
  4. Copy your API token into the openai_api_key variable, replacing the PASTE_YOUR_LITELLM_KEY_HERE string:

    {litellm_virtual_key}
  5. Execute the cells in the notebook one by one. Click the play button (▶️) next to each cell or use Shift+Enter to run them sequentially.

    Executing notebook cells showing the play button and cell execution flow

    The key cells perform the following actions:

    1. Connect to the LLM: This cell defines the connection to the internal service for the granite-4.0-h-tiny model that is running in the cluster.

      # LLM Inference Server URL
      inference_server_url = "https://litellm-prod.apps.maas.redhatworkshops.io" (1)
      
      # LLM definition using a client that speaks the OpenAI API format
      llm = VLLMOpenAI(
          openai_api_key="{litellm_virtual_key}", (2)
          openai_api_base=f"{inference_server_url}/v1", (3)
          model_name="granite-4.0-h-tiny", (4)
          # ... other parameters
      )
      1 Internal Kubernetes service URL for the LLM inference server
      2 Your personal API token ({litellm_virtual_key}) injected at runtime
      3 OpenAI-compatible API endpoint format
      4 Specific model identifier for the deployed Granite model
    2. Define the Prompt: This cell creates a prompt template that instructs the LLM how to behave and formats the user’s question.

      template="""<|system|>
      You are a helpful, respectful and honest assistant. (1)
      <|user|>
      ### QUESTION:
      {input} (2)
      ### ANSWER:
      <|assistant|>
      """
      prompt = PromptTemplate(input_variables=["input"], template=template) (3)
      1 System message defining the assistant’s behavior
      2 Placeholder for user input that will be dynamically replaced
      3 LangChain PromptTemplate object for structured prompt formatting
    3. Ask a Question: The final cell defines a question and sends it to the LLM. You should see the answer streamed back directly in the notebook output.

      Watch for the cell execution indicator [*] which shows when a cell is running, and [number] when it completes successfully.

Summary

  • Explored the pre-configured Data Science Project: workbench, pipeline server, and cluster storage provisioned per user

  • Switched between the Red Hat OpenShift AI Dashboard (ML-focused) and the OpenShift Container Platform Console (infrastructure-focused) to view the same resources from different perspectives

  • Queried a Granite LLM via the OpenAI-compatible /v1/chat/completions API — the same interface the RAG chain will use later