Introduction

Azure Red Hat OpenShift (ARO) clusters store log data inside the cluster by default. Understanding metrics and logs is critical to successfully running your cluster. Included with ARO is the OpenShift Cluster Logging Operator, which is intended to simplify log management and analysis within an ARO cluster, offering centralized log collection, powerful search capabilities, visualization tools, and integration with other Azure services such as Azure Files.

In this section of the workshop, we’ll configure ARO to forward logs and metrics to Azure Files and view them using Grafana.

Configure Autoscaling for All Worker MachineSets

OpenShift Cluster Logging runs in the cluster itself, so we need additional worker nodes. We'll create a MachineAutoscaler for each of the existing MachineSets so the cluster can scale up the worker count automatically.

  1. To deploy the MachineAutoscalers, run the following command:

    MACHINESETS=$(oc -n openshift-machine-api get machinesets -o name | cut -d / -f2 )
    
    for MACHINESET in $(echo ${MACHINESETS}); do
    cat <<EOF | oc apply -f -
    ---
    apiVersion: "autoscaling.openshift.io/v1beta1"
    kind: "MachineAutoscaler"
    metadata:
      name: "${MACHINESET}"
      namespace: "openshift-machine-api"
    spec:
      minReplicas: 1
      maxReplicas: 3
      scaleTargetRef:
        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
        name: "${MACHINESET}"
    EOF
    done
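
    To confirm the MachineAutoscalers were created (their names mirror the MachineSets), you can list them:

      oc -n openshift-machine-api get machineautoscaler
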
Be patient with this lab. As you progress through deploying logging, be aware that it takes about five minutes to provision each machine and another minute for the nodes to become Ready.
  2. Be prepared to watch the progress of the MachineAutoscaler as you work through this lab with the following commands.

    1. Get the list of Pods in state "Pending", which triggers the MachineAutoscaler to scale up the number of worker nodes

    2. Get the list of Machines in the cluster, which you will see in various states as they are being provisioned

    3. Get the list of Nodes as they become ready and the Pods in Pending are deployed to them

      oc get pods -A | grep Pending
      oc -n openshift-machine-api get machines -l "machine.openshift.io/cluster-api-machine-role=worker"
      oc get nodes -o wide
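
    If the watch utility is available in your shell (an assumption about your environment), you can re-run the Machine listing on a loop instead of repeating it by hand:

      watch -n 10 'oc -n openshift-machine-api get machines -l "machine.openshift.io/cluster-api-machine-role=worker"'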

Configure Metrics and Log Forwarding to Azure Files

  1. First, let’s create our Azure Files storage account. To do so, run the following command:

    AZR_STORAGE_ACCOUNT_NAME="storage${GUID}"
    
    ARO_LOCATION=$(az aro show --resource-group openenv-${GUID} --name aro-cluster-${GUID} --query location -o tsv)
    
    az storage account create --name "${AZR_STORAGE_ACCOUNT_NAME}" --allow-blob-public-access --resource-group "openenv-${GUID}" --location "${ARO_LOCATION}" --sku Standard_LRS
    Sample Output
    The public access to all blobs or containers in the storage account will be disallowed by default in the future, which means default value for --allow-blob-public-access is still null but will be equivalent to false.
    {
     "accessTier": "Hot",
      "accountMigrationInProgress": null,
      "allowBlobPublicAccess": true,
      "allowCrossTenantReplication": false,
    
    [... Lots of output Omitted ...]

    You may get a rate-limiting error when running the az storage account create command. If so, wait a few seconds and repeat the command.

  2. Next, let’s grab our storage account key. To do so, run the following command:

    AZR_STORAGE_KEY=$(az storage account keys list --resource-group "openenv-${GUID}" -n "${AZR_STORAGE_ACCOUNT_NAME}" --query "[0].value" -o tsv)
    
    echo ${AZR_STORAGE_KEY}
    Sample Output
    gr6ujd144KyO14BVQ5cEupJw/MWQx/XvXxQE/eG62oOvoVfnLVO68EcFGDygSXQD4pUGx+oA+wNJ+AStwccBSw==
  3. Now, let’s create separate storage containers for logs and metrics. To do so, run the following commands:

    az storage container create --name "aro-logs" \
      --account-name "${AZR_STORAGE_ACCOUNT_NAME}" \
      --account-key "${AZR_STORAGE_KEY}"
    az storage container create --name "aro-metrics" \
      --account-name "${AZR_STORAGE_ACCOUNT_NAME}" \
      --account-key "${AZR_STORAGE_KEY}"
    Sample Output
    {
      "created": true
    }
    {
      "created": true
    }
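
    To double-check that both containers exist, you can list them in table form:

      az storage container list \
        --account-name "${AZR_STORAGE_ACCOUNT_NAME}" \
        --account-key "${AZR_STORAGE_KEY}" \
        --output table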
  4. Deploy the Elasticsearch CRDs (not used directly, but needed as a workaround for a bug):

    oc create -f https://raw.githubusercontent.com/openshift/elasticsearch-operator/release-5.8/bundle/manifests/logging.openshift.io_elasticsearches.yaml
    Sample Output
    customresourcedefinition.apiextensions.k8s.io/elasticsearches.logging.openshift.io created
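
    To confirm the CRD registered, you can query it by name:

      oc get crd elasticsearches.logging.openshift.io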
  5. Check that helm is installed properly:

    helm version
    Sample Output
    version.BuildInfo{Version:"v3.15.4+60.el9", GitCommit:"fa384522f2878321c8b6b1a06f8ff5f86f47a937", GitTreeState:"clean", GoVersion:"go1.22.7 (Red Hat 1.22.7-2.el9_5)"}
  6. Next, let’s add the MOBB Helm Chart repository. To do so, run the following command:

    helm repo add mobb https://rh-mobb.github.io/helm-charts/
    
    helm repo update
    Sample Output
    "mobb" has been added to your repositories
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "mobb" chart repository
    Update Complete. ⎈Happy Helming!⎈
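
    To see which charts the repository provides, including the two used later in this lab, you can search it:

      helm search repo mobb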
  7. Now, we need to create a project (namespace) to deploy our logging resources to. To create that, run the following command:

    oc new-project custom-logging
    Sample Output
    Now using project "custom-logging" on server "https://api.rbrlitrg.westeurope.aroapp.io:6443".
    
    You can add applications to this project with the 'new-app' command. For example, try:
    
        oc new-app rails-postgresql-example
    
    to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:
    
        kubectl create deployment hello-node --image=k8s.gcr.io/e2e-test-images/agnhost:2.33 -- /agnhost serve-hostname
  8. Next, we need to install a few operators to run our logging setup. These operators include the Red Hat Cluster Logging Operator, the Loki operator, the Grafana operator, and more. First, we’ll create a list of all the operators we’ll need to install by running the following command:

    cat <<EOF > clf-operators.yaml
    subscriptions:
    - name: grafana-operator
      channel: v4
      installPlanApproval: Automatic
      source: community-operators
      sourceNamespace: openshift-marketplace
    - name: cluster-logging
      channel: stable
      installPlanApproval: Automatic
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      namespace: openshift-logging
    - name: loki-operator
      channel: stable
      installPlanApproval: Automatic
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      namespace: openshift-operators-redhat
    - name: resource-locker-operator
      channel: alpha
      installPlanApproval: Automatic
      source: community-operators
      sourceNamespace: openshift-marketplace
      namespace: resource-locker-operator
    - name: patch-operator
      channel: alpha
      installPlanApproval: Automatic
      source: community-operators
      sourceNamespace: openshift-marketplace
      namespace: patch-operator
    operatorGroups:
    - name: custom-logging
      targetNamespace: ~
    - name: openshift-logging
      namespace: openshift-logging
      targetNamespace: openshift-logging
    - name: openshift-operators-redhat
      namespace: openshift-operators-redhat
      targetNamespace: all
    - name: resource-locker
      namespace: resource-locker-operator
      targetNamespace: all
    - name: patch-operator
      namespace: patch-operator
      targetNamespace: all
    EOF
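
    If you want to preview what this values file turns into before installing anything, helm template renders the chart locally without touching the cluster; it should show one Subscription and one OperatorGroup per entry in the values file:

      helm template clf-operators mobb/operatorhub \
        --values ./clf-operators.yaml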
  9. Next, let’s deploy the Grafana, Cluster Logging, and Loki operators from the file we just created above. To do so, run the following command:

    oc create ns openshift-logging
    
    oc create ns openshift-operators-redhat
    
    oc create ns resource-locker-operator
    
    oc create ns patch-operator
    
    helm upgrade -n custom-logging clf-operators \
      mobb/operatorhub --install \
      --values ./clf-operators.yaml
    Sample Output
    namespace/openshift-logging created
    namespace/openshift-operators-redhat created
    namespace/resource-locker-operator created
    Release "clf-operators" does not exist. Installing it now.
    NAME: clf-operators
    LAST DEPLOYED: Tue Dec 19 09:40:44 2023
    NAMESPACE: custom-logging
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    .
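
    To confirm the Subscriptions were created in their respective namespaces, you can list them across the cluster:

      oc get subscriptions -A | grep -E 'grafana|logging|loki|locker|patch'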
  10. Now, let’s wait for the operators to be installed.

    These commands will loop through each type of resource until the CRDs for the Operators have been deployed.

    Eventually you’ll see the message No resources found in custom-logging namespace and be returned to a prompt.

    while ! oc get grafana; do sleep 5; echo -n .; done
    while ! oc get clusterlogging; do sleep 5; echo -n .; done
    while ! oc get lokistack; do sleep 5; echo -n .; done
    while ! oc get resourcelocker; do sleep 5; echo -n .; done
    Sample Output
    No resources found in custom-logging namespace.
    No resources found in custom-logging namespace.
    No resources found in custom-logging namespace.
    No resources found in custom-logging namespace.
  11. Now that the operators have been successfully installed, let’s use a helm chart to deploy Grafana and forward metrics to Azure Files. To do so, run the following command:

    helm upgrade -n "custom-logging" aro-thanos-af \
      --install mobb/aro-thanos-af --version 0.6.3 \
      --set "aro.storageAccount=${AZR_STORAGE_ACCOUNT_NAME}" \
      --set "aro.storageAccountKey=${AZR_STORAGE_KEY}" \
      --set "aro.storageContainer=aro-metrics" \
      --set "enableUserWorkloadMetrics=true"
    Sample Output
    Release "aro-thanos-af" does not exist. Installing it now.
    NAME: aro-thanos-af
    LAST DEPLOYED: Tue Dec 19 09:41:57 2023
    NAMESPACE: custom-logging
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None

    If you get an error during that command, wait a few seconds and try it again. Sometimes the Patch Operator takes a little longer to fully deploy.
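
    To watch the metrics stack come up (assuming the chart places its Thanos components in the custom-logging namespace), you can filter the pod list:

      oc -n custom-logging get pods | grep thanos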

  12. Wait until Grafana has successfully deployed. Run the following command:

    oc -n custom-logging rollout status deploy grafana-deployment
  13. Next, let’s ensure that we can access Grafana. To do so, fetch its route and browse to it in your web browser. To grab the route, run the following command:

    oc -n custom-logging get route grafana-route \
      -o jsonpath='{"https://"}{.spec.host}{"\n"}'
    Sample Output
    https://grafana-route-custom-logging.apps.nbybk9f3.eastus.aroapp.io
  14. You should already be logged in, since you logged into the web console earlier. Accept all permissions by clicking Allow selected permissions. You should then see the Grafana dashboard.

    If your browser displays an error that says 'Application is not available', wait a minute and try again.

    If it persists, you’ve hit a race condition with certificate creation.

    Run the following command to try to resolve it:

    oc patch -n custom-logging service grafana-alert -p '{ "metadata": { "annotations": null }}'
    
    oc -n custom-logging delete secret aro-thanos-af-grafana-cr-tls
    
    oc patch -n custom-logging service grafana-service \
        -p '{"metadata":{"annotations":{"retry": "true" }}}'
    
    sleep 5
    
    oc -n custom-logging rollout restart deployment grafana-deployment

Set up Log Forwarding

  1. Now, set the storage class to use for the persistent volumes that will be created, using whichever storage class is marked as the cluster default:

    STORAGE_CLASS=$(oc get storageclass -o=jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}')
    
    echo ${STORAGE_CLASS}
    Sample Output
    managed-csi
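
    If the variable comes back empty, no storage class carries the default annotation; list them directly and pick one (the default, if set, is marked next to its name):

      oc get storageclass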
  2. Next, let’s use another Helm chart to deploy log forwarding to Azure Files. To do so, run the following command:

    helm upgrade -n custom-logging aro-clf-blob \
     --install mobb/aro-clf-blob --version 0.1.3 \
     --set "azure.storageAccount=${AZR_STORAGE_ACCOUNT_NAME}"  \
     --set "azure.storageAccountKey=${AZR_STORAGE_KEY}"   \
     --set "azure.storageContainer=aro-logs" \
     --set "lokiStack.storageClassName=${STORAGE_CLASS}"
    Sample Output
    Release "aro-clf-blob" does not exist. Installing it now.
    NAME: aro-clf-blob
    LAST DEPLOYED: Tue Dec 19 09:43:20 2023
    NAMESPACE: custom-logging
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
  3. Once the Helm chart deploys its resources, we need to wait for the log collector agent to start. To watch its status, run the following command:

    oc -n openshift-logging rollout status daemonset collector
    Sample Output
    daemon set "collector" successfully rolled out
  4. Occasionally, the log collector agent starts before the operator has finished configuring Loki. To proactively address this, we need to restart the agent. To do so, run the following command:

    oc -n openshift-logging rollout restart daemonset collector
    Sample Output
    daemonset.apps/collector restarted
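
    To confirm the collector pods come back after the restart (one per node, since they belong to a DaemonSet), you can filter the pod list:

      oc -n openshift-logging get pods | grep collector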

View the Metrics and Logs

Now that metrics and logs are being forwarded to Azure Files, let’s view them in Grafana.

  1. First, we’ll need to fetch the route for Grafana and visit it in our web browser. To get the route, run the following command:

    oc -n custom-logging get route grafana-route \
       -o jsonpath='{"https://"}{.spec.host}{"\n"}'
    Sample Output
    https://grafana-route-custom-logging.apps.nbybk9f3.eastus.aroapp.io
  2. Browse to the provided route address in the same browser window as your OCP console and log in using your OpenShift credentials. If you tested this before, you are already logged in.

  3. View an existing dashboard such as custom-logging -> Node Exporter -> USE Method -> Cluster (click on the search icon on the left to see the custom-logging dashboard).

    These dashboards are copies of the dashboards that are available directly in the OpenShift web console under "Observability".

    [Figure: Grafana dashboard showing cluster metrics]

    If you don’t see the graphs as in the screenshot above wait a minute and refresh the browser window - it takes a few minutes for the Grafana dashboard to start showing data.

  4. Click the Explore (compass) icon in the left-hand menu, select “Loki (Application)” in the dropdown, and search for {kubernetes_namespace_name="custom-logging"}. Click the blue Run Query button on the top right to execute the search.

    [Figure: Grafana Explore view showing application logs]
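
    LogQL queries can be refined with pipeline filters. For example, to narrow the same stream to lines containing the word "error" (a hypothetical search term for illustration), enter this in the same Explore query box:

      {kubernetes_namespace_name="custom-logging"} |= "error"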

Enabling Custom Metrics

To display metrics from your own applications, you need to enable custom metrics.

  1. Check the cluster-monitoring-config ConfigMap object:

    oc -n openshift-monitoring get configmap cluster-monitoring-config -o yaml
    Sample Output
    apiVersion: v1
    data: {}
    kind: ConfigMap
    metadata:
      creationTimestamp: "2023-06-06T17:11:22Z"
      name: cluster-monitoring-config
      namespace: openshift-monitoring
      resourceVersion: "391968"
      uid: 5d84fef5-d798-4b11-bb2f-dd93fc6e76d8
  2. Enable User Workload Monitoring:

    oc patch configmap cluster-monitoring-config -n openshift-monitoring \
      --patch='{"data":{"config.yaml": "enableUserWorkload: true\n"}}'
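
    To verify the patch landed, you can print just the config.yaml key (the dot in the key name must be escaped in the JSONPath expression):

      oc -n openshift-monitoring get configmap cluster-monitoring-config \
        -o jsonpath='{.data.config\.yaml}'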
  3. Check that user workload monitoring is starting up (wait until the output below matches what you see):

    oc -n openshift-user-workload-monitoring get pods
    Sample Output
    NAME                                   READY   STATUS    RESTARTS   AGE
    prometheus-operator-78774d88c8-vq2pz   2/2     Running   0          23m
    prometheus-user-workload-0             6/6     Running   0          23m
    prometheus-user-workload-1             6/6     Running   0          23m
    thanos-ruler-user-workload-0           3/3     Running   0          23m
    thanos-ruler-user-workload-1           3/3     Running   0          23m
  4. Append remoteWrite settings to the user-workload-monitoring config to forward user workload metrics to Thanos.

    Check if the User Workload Config Map exists:

    oc -n openshift-user-workload-monitoring get \
      configmaps user-workload-monitoring-config -o yaml
    Sample Output
    apiVersion: v1
    kind: ConfigMap
    metadata:
      creationTimestamp: "2023-06-07T09:14:09Z"
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
      resourceVersion: "392232"
      uid: c1a3c96a-1773-4a56-ba4d-537c7cb9a92a
  5. Update the ConfigMap:

    cat << EOF | oc apply -f -
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        prometheus:
          remoteWrite:
          - url: "http://thanos-receive.custom-logging.svc.cluster.local:9091/api/v1/receive"
    EOF
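
    With user workload monitoring enabled and remoteWrite in place, any application that exposes a Prometheus endpoint can have its metrics scraped and forwarded to Thanos. Below is a minimal sketch of a ServiceMonitor, assuming a hypothetical app labelled app: my-app in a my-project namespace, serving metrics on a Service port named web:

      cat << EOF | oc apply -f -
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: my-app
        namespace: my-project
      spec:
        endpoints:
        - port: web        # name of the Service port that exposes /metrics
          interval: 30s
        selector:
          matchLabels:
            app: my-app
      EOF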

Congratulations!

Your cluster is now configured to allow custom metrics.

Summary

Here you learned how to:

  • Configure metrics and log forwarding to Azure Files

  • View the metrics and logs in a Grafana dashboard