How to Monitor Components of the IBM Cloud Pak For Business Automation

By Jorge Rodriguez posted Wed October 06, 2021 08:44 AM

  
Author:
Jorge D. Rodriguez 
STSM | Business Automation Solutions Architect | Automation SWAT Team

Contributors:
Dr. Matthias Jung 
Enterprise Content Services | Accelerated Value Specialist | Digital Business Automation - SWAT

Kevin Trinh 
STSM | Cloud Integration Architect | Digital Business Automation (ECM)

Piotr Godowski 
STSM | IBM Licensing Tools | Cloud Pak Foundational Services Architecture

James Hewitt 
IBM Master Inventor | Cloud Pak for Integration | Patent Incubator Program Lead

Overview

The IBM Cloud Pak for Business Automation provides a fully integrated platform that combines IBM’s best-in-class automation software to digitally transform and automate business operations.  The core automation features of the IBM Cloud Pak for Business Automation include capture capabilities to extract and classify data; content services to store, manage and share business data; a decision engine to create, automate and govern flexible business rules; and workflow capabilities for process and case management digitalization.

While the automation platform itself provides a simplified low-code experience for business and technical users alike, system administrators and operations teams still need to deploy, manage and monitor the containerized software that makes up the IBM Cloud Pak for Business Automation in order to keep the platform and the solutions built on top of it running optimally.

In this article I will discuss how to enable monitoring capabilities on the IBM Cloud Pak for Business Automation and how to extend OpenShift’s pre-configured monitoring stack to provide operational visibility across Cloud Pak components.

Pre-Requisites

  • OpenShift Container Platform version 4.6+
  • IBM Cloud Pak for Business Automation version 21.0.x
  • Workstation with oc command installed

Brief Introduction to the OpenShift Monitoring Stack

The OpenShift Container Platform monitoring stack is based on the Prometheus open source project. At a high level the pre-configured stack includes one or more instances of the following components:

Component     Description
Prometheus    Provides a time-series data store for metrics, a rule evaluation engine and alert generation.
AlertManager  Responsible for alert handling and notification to external systems.
Thanos        Responsible for metric aggregation across Prometheus instances and also acts as an alert generation engine.
Grafana       Provides dashboard and metric visualization capabilities.  This is a read-only instance of Grafana that shows platform metrics.


By default the OpenShift monitoring stack provides monitoring capabilities for core platform components only.  That is, the out-of-the-box deployment of the monitoring stack found under the openshift-monitoring project is fully dedicated to monitoring core OpenShift Container Platform systems and other essential Kubernetes services.

Optionally, since OpenShift version 4.6, the default monitoring stack deployment can be extended to monitor user-defined projects and custom deployments such as IBM Cloud Pak for Business Automation installations.  The additional components needed to monitor user-defined projects are automatically deployed under the openshift-user-workload-monitoring project once the default stack is configured to support user-defined projects.  After the configuration is completed you will be able to collect, query, visualize and create alerts based on custom metrics generated by your own deployments.  The remainder of this article describes how to enable these capabilities, how to expand them using additional monitoring components provided by the IBM Cloud Pak Foundational Services and how to define the necessary Kubernetes resources to collect and use custom metrics generated by the IBM Cloud Pak for Business Automation platform.  For more information on how to monitor user-defined projects see the Enabling Monitoring for User Defined Projects documentation.

Enable User Project Monitoring in OpenShift

The first step in monitoring IBM Cloud Pak for Business Automation components is to turn on user project monitoring in OpenShift.  To do that we need to create a ConfigMap named cluster-monitoring-config in the openshift-monitoring project.  The cluster-monitoring-config ConfigMap allows you to specify the configuration details for the OpenShift monitoring stack, such as enabling user project monitoring, the retention policy for collected metrics, resource limits for monitoring components and node selectors for POD deployments.  For a complete list of configurable parameters see Configuring the OpenShift Monitoring Stack.

For the purpose of this article we are going to deploy a simple ConfigMap that sets the enableUserWorkload attribute to true.  Setting enableUserWorkload to true enables monitoring for user-defined projects in addition to the default platform monitoring, and automatically triggers the deployment of additional monitoring components under the openshift-user-workload-monitoring project once the ConfigMap is created.  To create the cluster-monitoring-config ConfigMap complete the following steps:

  1. Create a cluster-monitoring-config.yaml file.

  2. Add the following content to the file:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
    
  3. Use the oc apply command to create the ConfigMap instance.

    oc apply -f cluster-monitoring-config.yaml -n openshift-monitoring

  4. Now that the cluster-monitoring-config ConfigMap instance has been created we should verify that the additional components required for user project monitoring have been deployed under the openshift-user-workload-monitoring project.  

    oc get pods -n openshift-user-workload-monitoring

    Wait until the PODs are running and ready.  Once the additional monitoring components have been deployed successfully the output of the oc get pods command should look similar to the following listing:
    NAME                                  READY   STATUS    RESTARTS   AGE
    prometheus-operator-5d66498b6-tgwhv   2/2     Running   0          2d4h
    prometheus-user-workload-0            4/4     Running   0          2d4h
    prometheus-user-workload-1            4/4     Running   0          2d4h
    thanos-ruler-user-workload-0          3/3     Running   0          2d4h
    thanos-ruler-user-workload-1          3/3     Running   0          2d4h

If you would like to customize the resources deployed under the openshift-user-workload-monitoring project, you can create a ConfigMap named user-workload-monitoring-config under that project. This ConfigMap is analogous to the cluster-monitoring-config created under the openshift-monitoring project and allows you to further customize the deployment of components specific to user project monitoring.  See Configuring the monitoring stack for additional details.
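
As an illustration of that customization, the following sketch writes a user-workload-monitoring-config ConfigMap that sets a retention period for the Prometheus instance that serves user-defined projects.  The 24h value and the file name are example choices rather than recommendations, and the apply step only runs where the oc command is available and you are logged in with sufficient privileges.

```shell
# Write an example user-workload-monitoring-config ConfigMap to a file.
# The retention value (24h) is an assumption chosen for illustration only.
cat > user-workload-monitoring-config.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retention: 24h
EOF

# Apply the ConfigMap only when the oc CLI is present on this workstation.
if command -v oc >/dev/null 2>&1; then
  oc apply -f user-workload-monitoring-config.yaml
else
  echo "oc not found; review user-workload-monitoring-config.yaml and apply it later"
fi
```

Keeping the settings in a file, as with cluster-monitoring-config.yaml earlier, makes it easier to review and version the configuration before it reaches the cluster.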

Configure Monitoring Services in the IBM Cloud Pak Foundational Services

While the OpenShift monitoring stack provides most of the capabilities required to address observability and operational visibility requirements for your Cloud Pak, there are some limitations around the ability to visualize metrics.  Specifically, the Grafana instance included in the OpenShift core monitoring stack is read-only.  This means that you can only visualize a predefined set of metrics through pre-configured dashboards provided by the platform, and that you cannot create new dashboards or visualize custom metrics in this installation.

Fortunately, the IBM Cloud Pak Foundational Services, installed as part of the IBM Cloud Pak for Business Automation, make it easy to deploy an additional instance of Grafana that is fully functional and connected to OpenShift’s Prometheus instance.  This configuration gives us the ability to use all metrics available in the platform, including custom metrics, so that we can create additional visualizations and dashboards on top of what is provided by the OpenShift user-defined workload monitoring stack.

Although the IBM Cloud Pak Foundational Services are installed as part of your IBM Cloud Pak For Business Automation installation, the monitoring services that include Grafana might not be installed.  To verify that Grafana was installed as part of your Cloud Pak deployment you can run the following command:

oc get pods -n ibm-common-services -l app.kubernetes.io/managed-by=ibm-monitoring-grafana-operator

If Grafana is installed as part of the IBM Cloud Pak Foundational Services you should see an output similar to the following listing:
NAME                                              READY   STATUS    RESTARTS   AGE
ibm-monitoring-grafana-78b9cd8688-xxq5q           4/4     Running   0          3d14h
ibm-monitoring-grafana-operator-54cbd464d-xqq82   1/1     Running   0          3d14h

If Grafana is not installed we can configure the monitoring services that are part of the IBM Cloud Pak Foundational Services to deploy an instance.  Note that the steps provided below assume that the IBM Cloud Pak Foundational Services are already installed on your cluster.  To verify whether the IBM Cloud Pak Foundational Services are installed you can run the following command:

oc get pods -n ibm-common-services -l app.kubernetes.io/managed-by=ibm-common-service-operator

If the IBM Cloud Pak Foundational Services are installed the output of the command should be similar to the following listing:
NAME                                            READY   STATUS  RESTARTS AGE
ibm-common-service-operator-84c9d8cc69-zsgxw    1/1     Running 0        3d15h

If the IBM Cloud Pak Foundational Services are not installed on your system you cannot complete the remaining steps in this section.  As already mentioned, the IBM Cloud Pak Foundational Services are installed automatically when you install the IBM Cloud Pak for Business Automation, so you can come back to this section once your Cloud Pak is installed.  See the IBM Cloud Pak Foundational Services documentation for additional information on these services.

Having verified that the IBM Cloud Pak Foundational Services are deployed in your cluster, we can customize that deployment to include Grafana-based services.  The Grafana-based services are deployed and managed by the IBM Monitoring Grafana Operator.  This operator can be installed using the Operand Deployment Lifecycle Manager framework that is part of the foundational services.  To install the IBM Monitoring Grafana Operator:

  1. Create a grafana-operand-request.yaml file.

  2. Add the following content to the file:
    apiVersion: operator.ibm.com/v1alpha1
    kind: OperandRequest
    metadata:
      name: common-service
      namespace: ibm-common-services
    spec:
      requests:
        - operands:
            - name: ibm-monitoring-grafana-operator
          registry: common-service
    Notice: CP4BA 21.0.3 gives you the ability to install the Cloud Pak Foundational Services (also known as IBM Common Services) in a namespace other than ibm-common-services.  If the Cloud Pak Foundational Services are installed in a different namespace, update the namespace attribute accordingly.  To determine the namespace where the common services were installed, look at the ConfigMap/common-service-maps resource in the kube-public namespace.  For more details on the structure of the common-service-maps ConfigMap see the Creating the configmap section of the Installing IBM Cloud Pak foundational services in multiple namespaces documentation.

  3. Deploy the OperandRequest instance using the oc apply command.  Notice that running this command might trigger the installation or update of additional components within the IBM Cloud Pak Foundational Services.

    oc apply -f grafana-operand-request.yaml

    The output of the command should be similar to the following listing:

    operandrequest.operator.ibm.com/common-service created

    If the approval strategy for operators has been changed to Manual, the install plan for the operator must be approved for installation to actually happen.

  4. Wait for services to install.  This could take up to 30 minutes depending on how many other components are being updated or installed.  Use the following command to monitor the installation.

    oc get pods -w -n ibm-common-services -l app=grafana

    Once the installation has completed successfully you should see output similar to the following listing:
    NAME                                      READY   STATUS   RESTARTS   AGE
    ibm-monitoring-grafana-797c5d4c5f-nxt8c   4/4     Running  0          78s


    Notice that the ibm-monitoring-grafana-xxxxxxxxx-xxxxx POD has multiple containers.  All of these containers should report Ready for the installation to be considered completed.

    Now that Grafana has been installed we need to find the URL that we can use to access the services.  The Grafana instance deployed through the IBM Foundational Monitoring Services is accessed via a common Cloud Pak route called cp-console.  To find the details of the cp-console route you can run the following command:

    oc get route -n ibm-common-services cp-console

    The output of the command should look similar to the following listing:
    NAME       HOST/PORT                      SERVICES               PORT  TERMINATION
    cp-console cp-console.<cluster-subdomain> icp-management-ingress https reencrypt/Redirect
    The full URL to access the cp-console route is as follows:
    https://cp-console.<cluster-subdomain>

    Where <cluster-subdomain> is the ingress subdomain assigned to your cluster.

  5. To access the Grafana user interface enter this URL in your browser.  You will be taken to the Cloud Pak authentication page. 

  6. Select OpenShift as the authentication type to be taken to the Cloud Pak Administration Hub using OpenShift credentials.

  7. Click on the Menu icon (top left) and select Monitoring from the menu.  This should take you to the Grafana UI.
    Note: Alternatively, you can open https://cp-console.<cluster-subdomain>/grafana directly to access the Grafana Dashboard.

Enable Monitoring Capabilities at the Component Level in the IBM Cloud Pak for Business Automation

The IBM Cloud Pak for Business Automation allows you to explicitly enable monitoring capabilities at the component level so that you can decide which components of the platform generate custom metrics.  By default monitoring is disabled in most components, but it can be enabled either during the Cloud Pak installation or as an update to the custom resource instance created when the Cloud Pak was installed.

Enabling the monitoring capabilities at the component level in the IBM Cloud Pak for Business Automation is a two-step process.  First we need to specify the global monitoring parameters used by the different Cloud Pak components.  Once the global parameters have been specified we need to configure the monitoring parameters specific to each component.  Typically this entails setting the monitor_enabled attribute to indicate whether the monitoring capabilities for the component should be enabled or not.

There are two ways to make these modifications.  The first option is to modify the generated-cr/ibm_cp4a_cr_final.yaml file created during the Cloud Pak installation process.  For details on how this file was generated see the Generating the custom resource documentation.  The second option is to modify the instance of the Cloud Pak custom resource definition in the cluster.  This article covers both cases.  Make sure you read through both sections, because explanations given in the first section are not repeated in the second.

Enabling Monitoring Capabilities via Custom Resource File

First we need to specify the global monitoring parameters for the Cloud Pak.  We can do this by completing the following steps:

  1. Open the custom resource file used to install the IBM Cloud Pak for Business Automation.  This is the generated-cr/ibm_cp4a_cr_final.yaml file created during the installation process.

  2. Add a monitoring_configuration section to the generated-cr/ibm_cp4a_cr_final.yaml file.  The following snippet shows the proper hierarchy for this section within the yaml file:
    spec:
      ##################################################################
      ########        Shared Monitoring configuration           ########
      ##################################################################
      monitoring_configuration:
        collectd_disable_host_monitoring: false
        collectd_interval: 10
        collectd_plugin_write_prometheus_port: 9103
        mon_enable_plugin_mbean: true
        mon_enable_plugin_pch: true
        mon_metrics_writer_option: 4
    Notice that the monitoring_configuration section does not have to be placed immediately after the spec stanza.  This snippet is meant to show hierarchy.

The configuration provided above is based on the default values assigned to these parameters.  The most important settings are mon_metrics_writer_option and collectd_plugin_write_prometheus_port.  The first setting, mon_metrics_writer_option with a value of 4, tells the monitoring framework to generate metrics for Prometheus consumption.  The second setting, collectd_plugin_write_prometheus_port, tells the monitoring framework which port to use when creating the metrics endpoint that Prometheus needs to scrape the custom metrics.  For a detailed description of each of these settings see the IBM Cloud Pak For Business Automation Monitoring Parameters documentation.

Now that we have specified the global configuration for the Cloud Pak we can focus on enabling monitoring capabilities on individual components.  As stated above, this can be done by setting a flag in the custom resource file for the specific component.

For example, to enable the monitoring capabilities of the IBM Business Automation Workflow Authoring component you would modify the ibm_cp4a_cr_final.yaml custom resource file as follows:

  1. Open the custom resource file generated to install the IBM Cloud Pak for Business Automation.  This is the generated-cr/ibm_cp4a_cr_final.yaml file created during the installation process.

  2. Look for the workflow_authoring_configuration section.

  3. If there is a monitor_enabled parameter under the section, set it to true.  If not, add the monitor_enabled parameter and set the value to true.  Make sure that the new parameter is properly indented under the workflow_authoring_configuration stanza.  The following snippet shows the hierarchy of the parameter within the ibm_cp4a_cr_final.yaml file:
    spec:
      ##############################################################################
      ########   IBM Business Automation Workflow Authoring configuration   ########
      ##############################################################################
      workflow_authoring_configuration:
        monitor_enabled: true

Other components in the IBM Cloud Pak For Business Automation, such as the FileNet Content Manager, allow you to configure monitoring capabilities at the sub-component level.  This means that we have finer control over which sub-components generate custom metrics, and can specify a monitor_enabled parameter for each of the sub-components configured.

For example, to enable monitoring capabilities in the Content Platform Engine (CPE) and the Content Management Interoperability Services (CMIS) sub-components of the FileNet Content Manager we can do the following:

  1. Open the custom resource file used to install the IBM Cloud Pak for Business Automation.  This is the generated-cr/ibm_cp4a_cr_final.yaml file created during the installation process.

  2. Look for the cpe stanza found under the ecm_configuration section where all the FileNet Content Manager configuration resides.

  3. If there is a monitor_enabled parameter under the cpe stanza, change the value to true.  If not, add the monitor_enabled parameter and set the value to true.  Make sure that the new parameter is indented properly under the cpe stanza.  The following snippet shows the hierarchy of the parameter within the ibm_cp4a_cr_final.yaml file:
    ecm_configuration:
      cpe:
        monitor_enabled: true
  4. Likewise for CMIS, look for the cmis stanza found under the ecm_configuration section where all the FileNet Content Manager configuration resides.

  5. If there is a monitor_enabled parameter under the cmis stanza, change the value to true.  If not, add the monitor_enabled parameter and set the value to true.  Make sure that the new parameter is indented properly under the cmis stanza.  The following snippet shows the hierarchy of the parameter within the ibm_cp4a_cr_final.yaml file:
    ecm_configuration:
      cpe:
        monitor_enabled: true
      cmis:
        monitor_enabled: true

Once the proper configuration is entered in the custom resource file you can use it to either install the Cloud Pak with monitoring capabilities pre-configured or to update your existing Cloud Pak installation.  Since installing the IBM Cloud Pak for Business Automation is outside of the scope of this article I will only list the command needed to update an existing installation.  To update your existing installation of the IBM Cloud Pak for Business Automation using the custom resource file you can run the following command:

oc apply -f generated-cr/ibm_cp4a_cr_final.yaml --overwrite=true
Notice that updating an existing Cloud Pak installation will force a restart on components where monitoring is enabled.

Enabling Monitoring Capabilities via Custom Resource Definition Instance

You can enable monitoring capabilities in the Cloud Pak by changing an existing instance of the ICP4AClusters custom resource definition directly.  To edit the ICP4AClusters instance in place, use the oc get command to find the name of the instance in your Cloud Pak and the oc edit command to modify the attributes of the instance.  The following steps show how to do this:

  1. Get the name of the ICP4AClusters instance that was created during your Cloud Pak installation.

    oc get ICP4ACluster -n <cp4ba_namespace>
    Where <cp4ba_namespace> should be replaced with the name of the project where the IBM Cloud Pak for Business Automation is installed.

    The output of the command should look similar to the following listing:
    NAME            AGE
    icp4adeploy     7d1h
  2. Use the name of the instance as listed in the previous step and run the oc edit command to modify the instance's configuration in place.
    oc edit ICP4ACluster <name-of-cp4ba-instance> -n <cp4ba_namespace>
    Where <cp4ba_namespace> should be replaced with the name of the project where the IBM Cloud Pak for Business Automation is installed and <name-of-cp4ba-instance> should be replaced with the name of the instance as listed in the previous step.
  3. Enter the global monitoring parameters as explained at the beginning of the previous section.

  4. Enter the individual monitor_enabled parameter as discussed in the previous section.
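
As a non-interactive alternative to oc edit, the same kind of change can be sketched as a JSON merge patch.  The example below assumes an instance named icp4adeploy in a project named cp4ba (both are placeholders that you can override through the environment variables shown) and enables monitoring for the CPE sub-component only.  It also assumes that the operator reconciles a patched resource in the same way as an edited one.

```shell
# Hypothetical defaults; replace them with your own project and instance names.
CP4BA_NAMESPACE="${CP4BA_NAMESPACE:-cp4ba}"
CP4BA_INSTANCE="${CP4BA_INSTANCE:-icp4adeploy}"

# JSON merge patch that sets spec.ecm_configuration.cpe.monitor_enabled to true.
# A merge patch leaves the other attributes of ecm_configuration untouched.
PATCH='{"spec":{"ecm_configuration":{"cpe":{"monitor_enabled":true}}}}'

if command -v oc >/dev/null 2>&1; then
  # Apply the patch to the ICP4ACluster instance.
  oc patch ICP4ACluster "$CP4BA_INSTANCE" -n "$CP4BA_NAMESPACE" --type merge -p "$PATCH"
  # Read the flag back to confirm that the change was stored.
  oc get ICP4ACluster "$CP4BA_INSTANCE" -n "$CP4BA_NAMESPACE" \
    -o jsonpath='{.spec.ecm_configuration.cpe.monitor_enabled}{"\n"}'
else
  echo "oc not found; patch that would be applied: $PATCH"
fi
```

A patch like this is easier to script and review than an interactive edit, but it only covers the component-level flag.  The global monitoring parameters still need to be present in the custom resource.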

You should not assume that every component in the IBM Cloud Pak for Business Automation supports the monitor_enabled parameter.  For example, the IBM Cloud Pak for Business Automation Operator component does not have a parameter to enable or disable the generation of custom metrics; in that case custom metrics are always generated by the operator.  The following table provides a list of IBM Cloud Pak for Business Automation components that currently support this flag and the proper yaml hierarchy where the flag should be specified.

Component                                               Parent Hierarchy                       Parameter
IBM Business Automation Navigator                       spec.navigator_configuration           monitor_enabled: true
IBM FileNet Content Manager - CPE                       spec.ecm_configuration.cpe             monitor_enabled: true
IBM FileNet Content Manager - CSS                       spec.ecm_configuration.css             monitor_enabled: true
IBM FileNet Content Manager - CMIS                      spec.ecm_configuration.cmis            monitor_enabled: true
IBM FileNet Content Manager - ES                        spec.ecm_configuration.es              monitor_enabled: true
IBM FileNet Content Manager - GraphQL                   spec.ecm_configuration.graphql         monitor_enabled: true
IBM Business Automation Workflow - Authoring            spec.workflow_authoring_configuration  monitor_enabled: true
Business Automation Workflow - Embedded Elastic Search  spec.elasticsearch_configuration       monitor_enabled: true
IBM Process Federation Server                           spec.pfs_configuration                 monitor_enabled: true
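
To see which of these flags are currently set on an existing installation, a small loop over the parent hierarchies in the table can read each value back from the custom resource.  This is a minimal sketch that assumes an instance named icp4adeploy in a project named cp4ba (both placeholders).  An empty result is reported as not set, which is not necessarily the same as monitoring being disabled.

```shell
# Hypothetical defaults; replace them with your own project and instance names.
CP4BA_NAMESPACE="${CP4BA_NAMESPACE:-cp4ba}"
CP4BA_INSTANCE="${CP4BA_INSTANCE:-icp4adeploy}"

# Parent hierarchies from the table above, relative to spec.
MONITOR_PATHS="navigator_configuration
ecm_configuration.cpe
ecm_configuration.css
ecm_configuration.cmis
ecm_configuration.es
ecm_configuration.graphql
workflow_authoring_configuration
elasticsearch_configuration
pfs_configuration"

if command -v oc >/dev/null 2>&1; then
  for path in $MONITOR_PATHS; do
    # An empty value means the parameter is absent from the custom resource.
    value=$(oc get ICP4ACluster "$CP4BA_INSTANCE" -n "$CP4BA_NAMESPACE" \
      -o jsonpath="{.spec.${path}.monitor_enabled}")
    echo "spec.${path}.monitor_enabled: ${value:-<not set>}"
  done
else
  echo "oc not found; paths that would be checked:"
  echo "$MONITOR_PATHS"
fi
```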

Enable Metric Scraping of IBM Cloud Pak For Business Automation Components via Service Monitors

Up to this point we have enabled the default monitoring stack in OpenShift to allow for the monitoring of custom deployments.  We have also enabled monitoring capabilities on specific IBM Cloud Pak for Business Automation components so that they start generating custom metrics that can be consumed by the monitoring stack. The next step before we are able to leverage monitoring data coming from IBM Cloud Pak for Business Automation components is to notify Prometheus of additional targets where it can scrape custom metrics from. This can be done through the use of service monitors. A ServiceMonitor is a custom resource definition provided by the Prometheus operator that allows you to specify additional endpoints for a Prometheus instance to scrape.

Creating an instance of a service monitor can be done through the deployment of a ServiceMonitor resource.  For example, to create a service monitor instance that will allow Prometheus to scrape metrics from the IBM Cloud Pak for Business Automation Operator we can deploy a ServiceMonitor instance in the following way:

  1. Create a file called cp4a-operator-monitor.yaml.

  2. Add the following content to the file:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: cp4a-operator-monitor
      namespace: <cp4ba_namespace>
    spec:
      endpoints:
        - interval: 30s
          port: http-metrics
          scheme: http
      selector:
        matchLabels:
          name: ibm-cp4a-operator
    Where <cp4ba_namespace> should be replaced with the name of the project where you deployed the IBM Cloud Pak for Business Automation.
  3. Deploy the ServiceMonitor instance using the oc apply command.

    oc apply -f cp4a-operator-monitor.yaml -n <cp4ba_namespace>

    Where <cp4ba_namespace> should be replaced with the name of the project where you deployed the IBM Cloud Pak for Business Automation.
  4. Verify that the ServiceMonitor has been deployed.

    oc get ServiceMonitor cp4a-operator-monitor -n <cp4ba_namespace>

    Where <cp4ba_namespace> should be replaced with the name of the project where you deployed the IBM Cloud Pak for Business Automation.

    The output should be similar to the following listing:
    NAME                    AGE
    cp4a-operator-monitor   41d

Once the cp4a-operator-monitor is deployed, Prometheus will start scraping metrics generated by the IBM Cloud Pak for Business Automation Operator and they will be available for analysis, visualization or generation of alerting rules in the monitoring stack.

The following table describes the main set of attributes used in the previous example to create a ServiceMonitor instance.

Attribute    Description
interval     How often Prometheus should scrape the metrics endpoint of our IBM Cloud Pak for Business Automation component.
port         Name of the Kubernetes service port that Prometheus should use when scraping metrics from the endpoint.  Typically metrics for IBM Cloud Pak for Business Automation components are exposed via a port called metrics or http-metrics.
scheme       HTTP scheme to use for scraping.  Typically the value for this attribute is http in IBM Cloud Pak for Business Automation components.
matchLabels  Map of key/value pairs used to select the Kubernetes service where the metric endpoint is defined.  This is specific to the component we want to monitor in the IBM Cloud Pak for Business Automation.

To validate that Prometheus is collecting metrics from your ServiceMonitor instance, you can use the OpenShift Console to query for a specific metric and create a visual representation of it.  For example, you can validate that you are getting metrics generated by the IBM Cloud Pak for Business Automation Operator by completing the following procedure:

  1. Open the Administrator perspective of your OpenShift Console.

  2. Go to the Monitoring menu on the left-hand side of the console.

  3. Select the Metrics option under the Monitoring menu.

  4. Enter the following query in the text area found under the Insert Metric at Cursor drop-down.  Do not worry about the meaning of the metric being used here; we will explain that in the following section.

    workqueue_unfinished_work_seconds{job="ibm-cp4a-operator-metrics"}

    This expression allows you to query Prometheus for the workqueue_unfinished_work_seconds metric generated by the IBM Cloud Pak for Business Automation Operator.  If you are not familiar with Prometheus querying see Querying Prometheus.

  5. Click the Run Queries button above the text area to trigger the drawing of the workqueue_unfinished_work_seconds metric.
      
    If your ServiceMonitor deployment succeeded you should see a line plotted on the graph area.  The following picture shows what this might look like.  The values coming from your system will be different, so the chart itself will most likely differ, but you should see values being plotted.

Although not necessarily obvious at first glance, a ServiceMonitor targets a specific Kubernetes service and its port using the matchLabels and port attributes respectively.  This association, as well as the overall chaining of resources from the Prometheus instance to the actual endpoint, can be better appreciated in the following diagram:

This diagram shows how the ServiceMonitor instance created in the previous example targets the ibm-cp4a-operator-metrics service using the name label, which in turn exposes the endpoints running on the IBM Cloud Pak for Business Automation Operator POD ibm-cp4a-operator-xxxxxxxxxx-xxxxx.  Prometheus then uses the Kubernetes service targeted by the ServiceMonitor instance to scrape metrics from the endpoint based on the interval that has been specified.
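
To trace this chain on your own cluster, you can look up the service that carries the label and list its named ports.  This is a minimal sketch that assumes a project named cp4ba (a placeholder) and the name=ibm-cp4a-operator label used in the ServiceMonitor example.

```shell
CP4BA_NAMESPACE="${CP4BA_NAMESPACE:-cp4ba}"   # hypothetical project name
SERVICE_LABEL="name=ibm-cp4a-operator"        # label used in the ServiceMonitor selector

if command -v oc >/dev/null 2>&1; then
  # Services selected by the ServiceMonitor, shown with all of their labels.
  oc get svc -n "$CP4BA_NAMESPACE" -l "$SERVICE_LABEL" --show-labels
  # Named ports on those services; the ServiceMonitor port attribute must match one of them.
  oc get svc -n "$CP4BA_NAMESPACE" -l "$SERVICE_LABEL" \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.ports[*].name}{"\n"}{end}'
else
  echo "oc not found; run these commands from a workstation that is logged in to the cluster"
fi
```

If the first command returns no services, the selector in the ServiceMonitor will not match anything and Prometheus will have no target to scrape.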

While creating service monitors is a relatively easy task, you need to create an instance of a service monitor for each of the IBM Cloud Pak for Business Automation components that you wish to monitor.  Each of these instances will require the knowledge of internal deployment information specific to the component such as the label that you can use to target the relevant Kubernetes service and the name of the port used to expose the metrics endpoint.  The following table provides a list of labels and port names that can be used to create ServiceMonitor instances for many of the IBM Cloud Pak for Business Automation components.

Component | Name of Kubernetes Service | Unique Kubernetes Service Label | Port Name
IBM Cloud Pak for Business Automation Operator | ibm-cp4a-operator-metrics | name: ibm-cp4a-operator | http-metrics
IBM Business Automation Navigator | icp4adeploy-navigator-svc | servicename: icp4adeploy-navigator-svc | metrics
IBM FileNet Content Manager - CPE | icp4adeploy-cpe-svc | servicename: icp4adeploy-cpe-svc | metrics
IBM FileNet Content Manager - CSS | icp4adeploy-css-svc-1 | servicename: icp4adeploy-css-svc-1 | metrics
IBM FileNet Content Manager - CMIS | icp4adeploy-cmis-svc | servicename: icp4adeploy-cmis-svc | metrics
IBM FileNet Content Manager - ES | icp4adeploy-es-svc | servicename: icp4adeploy-es-svc | metrics
IBM FileNet Content Manager - GraphQL | icp4adeploy-graphql-svc | servicename: icp4adeploy-graphql-svc | metrics
IBM Business Automation Application - Application Engine | icp4adeploy-workspace-aae-ae-service | app.kubernetes.io/instance: icp4adeploy-workspace-aae | metrics
IBM Business Automation Application - Resource Registry | icp4adeploy-dba-rr-client | app.kubernetes.io/name: resource-registry | metrics
IBM Business Automation Studio - Play Back Server | icp4adeploy-pbk-ae-service | app.kubernetes.io/instance: icp4adeploy-pbk | metrics
IBM Business Automation Studio | cp4adeploy-bastudio-service | app.kubernetes.io/name: bastudio | metrics
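
As an illustration of how a row of this table maps onto a ServiceMonitor, the following is a minimal sketch for the IBM Business Automation Navigator.  The label and port name come from the table above.  The resource name and the 30 second scrape interval are assumptions made for this example, not values prescribed by the Cloud Pak.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # Hypothetical name chosen for this example
  name: cp4ba-navigator-monitor
  # Project where the IBM Cloud Pak for Business Automation is deployed
  namespace: <cp4ba_namespace>
spec:
  selector:
    matchLabels:
      # "Unique Kubernetes Service Label" column of the table
      servicename: icp4adeploy-navigator-svc
  endpoints:
    # "Port Name" column of the table
    - port: metrics
      # Assumed scrape interval; adjust to your needs
      interval: 30s
```

Depending on how a given component serves its metrics, the endpoint entry may also need fields such as scheme, path or tlsConfig.  Those details vary by component and are not covered by the table.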

Create Cloud Pak For Business Automation Sample Dashboard

Now that we have configured the monitoring capabilities for the IBM Cloud Pak for Business Automation and deployed a fully functional instance of Grafana, we can create a sample dashboard to visualize some of the custom metrics generated by IBM Cloud Pak for Business Automation components.  For simplicity, this article shows how to create a dashboard that visualizes the ansible_operator_reconciles_count counter and the rate of change of the workqueue_unfinished_work_seconds metric.  Both metrics are being scraped from the IBM Cloud Pak for Business Automation Operator instance based on the ServiceMonitor we created in the previous section.  The ansible_operator_reconciles_count counter represents the total number of reconciliations completed by the operator.  The rate of change of the workqueue_unfinished_work_seconds metric can be used to identify the possibility of deadlocked threads in the operator.  We will deploy our dashboard by creating an instance of the MonitoringDashboard custom resource provided by the IBM Monitoring Grafana Operator, but you can also create or edit a dashboard directly from the Grafana user interface.  To create the Cloud Pak For Business Automation Sample Dashboard complete the following instructions:

  1. Create a file called cp4ba-sample-dashboard.yaml.

  2. Add the following content to the file:
    apiVersion: monitoringcontroller.cloud.ibm.com/v1
    kind: MonitoringDashboard
    metadata:
      name: cp4ba-sample-dashboard
      namespace: <cp4ba_namespace>
      labels:
        component: grafana
    spec:
      data: |
        {
          "annotations": {
            "list": [
              {
                "builtIn": 1,
                "datasource": "-- Grafana --",
                "enable": true,
                "hide": true,
                "iconColor": "rgba(0, 211, 255, 1)",
                "name": "Annotations & Alerts",
                "type": "dashboard"
              }
            ]
          },
          "editable": true,
          "gnetId": null,
          "graphTooltip": 0,
          "links": [],
          "panels": [
            {
              "aliasColors": {},
              "bars": false,
              "dashLength": 10,
              "dashes": false,
              "datasource": null,
              "fieldConfig": {
                "defaults": {
                  "custom": {}
                },
                "overrides": []
              },
              "fill": 1,
              "fillGradient": 0,
              "gridPos": {
                "h": 8,
                "w": 12,
                "x": 0,
                "y": 0
              },
              "hiddenSeries": false,
              "id": 4,
              "legend": {
                "avg": false,
                "current": false,
                "max": false,
                "min": false,
                "show": true,
                "total": false,
                "values": false
              },
              "lines": true,
              "linewidth": 1,
              "nullPointMode": "null",
              "percentage": false,
              "pluginVersion": "7.1.5",
              "pointradius": 2,
              "points": false,
              "renderer": "flot",
              "seriesOverrides": [],
              "spaceLength": 10,
              "stack": false,
              "steppedLine": false,
              "targets": [
                {
                  "expr": "rate(workqueue_unfinished_work_seconds{name=\"icp4acluster-controller\"}[5m])",
                  "interval": "",
                  "legendFormat": "",
                  "refId": "A"
                }
              ],
              "thresholds": [],
              "timeFrom": null,
              "timeRegions": [],
              "timeShift": null,
              "title": "CP4BA Operator Rate of Unfinished Work",
              "tooltip": {
                "shared": true,
                "sort": 0,
                "value_type": "individual"
              },
              "type": "graph",
              "xaxis": {
                "buckets": null,
                "mode": "time",
                "name": null,
                "show": true,
                "values": []
              },
              "yaxes": [
                {
                  "format": "short",
                  "label": null,
                  "logBase": 1,
                  "max": null,
                  "min": null,
                  "show": true
                },
                {
                  "format": "short",
                  "label": null,
                  "logBase": 1,
                  "max": null,
                  "min": null,
                  "show": true
                }
              ],
              "yaxis": {
                "align": false,
                "alignLevel": null
              }
            },
            {
              "datasource": null,
              "fieldConfig": {
                "defaults": {
                  "custom": {},
                  "mappings": [],
                  "thresholds": {
                    "mode": "absolute",
                    "steps": [
                      {
                        "color": "green",
                        "value": null
                      },
                      {
                        "color": "red",
                        "value": 80
                      }
                    ]
                  }
                },
                "overrides": []
              },
              "gridPos": {
                "h": 9,
                "w": 12,
                "x": 0,
                "y": 0
              },
              "id": 2,
              "options": {
                "reduceOptions": {
                  "calcs": [
                    "mean"
                  ],
                  "fields": "",
                  "values": false
                },
                "showThresholdLabels": false,
                "showThresholdMarkers": true
              },
              "pluginVersion": "7.1.5",
              "targets": [
                {
                  "expr": "ansible_operator_reconciles_count{job=\"ibm-cp4a-operator-metrics\"}",
                  "interval": "",
                  "legendFormat": "",
                  "refId": "A"
                }
              ],
              "timeFrom": null,
              "timeShift": null,
              "title": "CP4BA Operator Reconcile Count",
              "type": "gauge"
            }
          ],
          "schemaVersion": 26,
          "style": "dark",
          "tags": [],
          "templating": {
            "list": []
          },
          "time": {
            "from": "now-6h",
            "to": "now"
          },
          "timepicker": {
            "refresh_intervals": [
              "5s",
              "10s",
              "30s",
              "1m",
              "5m",
              "15m",
              "30m",
              "1h",
              "2h",
              "1d"
            ]
          },
          "timezone": "",
          "title": "Cloud Pak for Business Automation Sample Dashboard",
          "version": 1
        }
      enabled: true
    
        

    Notice how the dashboard is defined by the standard JSON representation of a Grafana dashboard in the data field.  Also notice that the MonitoringDashboard instance is defined in the namespace where the IBM Cloud Pak for Business Automation was installed.

    Where <cp4ba_namespace> should be replaced with the name of the project where you deployed the IBM Cloud Pak for Business Automation.
  3. Deploy the MonitoringDashboard instance using the oc apply command.
    oc apply -f cp4ba-sample-dashboard.yaml -n <cp4ba_namespace>
    The output of the command should be similar to the following listing:
    monitoringdashboard.monitoringcontroller.cloud.ibm.com/cp4ba-sample-dashboard created
  4. Make sure the MonitoringDashboard instance has been created successfully.
    oc get MonitoringDashboard cp4ba-sample-dashboard -n <cp4ba_namespace>
    The output of the command should be similar to the following listing:
    NAME                     ENABLED   AGE
    cp4ba-sample-dashboard   true      111s
  5. Now we should be able to see the new Cloud Pak For Business Automation Sample Dashboard in the Grafana interface.  Access your new Grafana instance using the URL of the cp-console route.  See the previous section on how to access the Grafana dashboard.  As stated before, the URL should have the following format:
    https://cp-console.<cluster-subdomain>/grafana
  6. Click on the Settings icon, found on the lower left hand side of the screen (above the help "?" icon).  This will take you to the settings panel for the user that is currently authenticated in Grafana.
  7. Go to the Organizations section under the Settings page and look for the namespace where your Grafana dashboard was deployed.  Once you find the namespace, click on the Select button next to it.  This switches the organization used by Grafana so that it can find the Cloud Pak For Business Automation Sample Dashboard under the namespace where it was created.
    Note: You will only be able to see Grafana organizations corresponding to the OpenShift projects that your authenticated user has access to.  For more information on role based access controls for the Grafana instance included in the IBM Cloud Pak Foundational Services see Monitoring Service.
  8. Click on the Dashboards menu, depicted by the four squares icon, to the left of the screen and select the Manage option.  You should be able to see the Cloud Pak For Business Automation Sample Dashboard listed.
  9. Click on the Cloud Pak for Business Automation Sample Dashboard entry.  Once in the Cloud Pak for Business Automation Sample Dashboard page, you should be able to see two charts similar to the ones depicted in the image below.
    While this is a simple implementation of a dashboard, you can now explore all custom metrics available via the IBM Cloud Pak for Business Automation components and expand the dashboard capabilities to include visualizations for additional metrics and components.
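
As one possible way to extend the dashboard capabilities, the sketch below defines a second, much smaller MonitoringDashboard with a single graph panel that plots the per-second rate of reconciliations over a five minute window.  The resource name, dashboard title and panel title are hypothetical, and the trimmed-down JSON assumes that Grafana fills in defaults for the fields that were left out; it has not been validated against a specific Grafana release.

```yaml
apiVersion: monitoringcontroller.cloud.ibm.com/v1
kind: MonitoringDashboard
metadata:
  # Hypothetical name chosen for this example
  name: cp4ba-reconcile-rate-dashboard
  namespace: <cp4ba_namespace>
  labels:
    component: grafana
spec:
  # Minimal dashboard JSON; omitted fields are assumed to fall back to Grafana defaults
  data: |
    {
      "editable": true,
      "panels": [
        {
          "datasource": null,
          "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
          "id": 1,
          "targets": [
            {
              "expr": "rate(ansible_operator_reconciles_count{job=\"ibm-cp4a-operator-metrics\"}[5m])",
              "refId": "A"
            }
          ],
          "title": "CP4BA Operator Reconcile Rate",
          "type": "graph"
        }
      ],
      "schemaVersion": 26,
      "time": { "from": "now-6h", "to": "now" },
      "title": "Cloud Pak for Business Automation Reconcile Rate (Sample)",
      "version": 1
    }
  enabled: true
```

It can be deployed and verified with the same oc apply and oc get MonitoringDashboard commands used in steps 3 and 4.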

Configure Alerts Based on IBM Cloud Pak For Business Automation Components

Arguably one of the most important features of a monitoring system is the ability to generate notifications based on conditions or events that need to be investigated in order to prevent or address application issues and potential failures. The OpenShift Container Platform leverages the combined capabilities of Prometheus, Thanos and AlertManager to generate, correlate and route notifications to a variety of notification targets.

In previous sections of this article we were able to configure OpenShift’s monitoring stack to support user defined projects and feed custom metrics generated by IBM Cloud Pak for Business Automation components into it.  In this section we will learn how to leverage this setup in order to create customized alerts to monitor the behavior of the Cloud Pak.  Specifically, we will create an alert to monitor the possibility of thread deadlocks in the IBM Cloud Pak for Business Automation Operator using the workqueue_unfinished_work_seconds metric discussed in the previous section.  While we are using the same metric to keep the context of the discussion, you can create alerts without using them in a dashboard or vice versa.

To define a custom alert in OpenShift’s monitoring stack we need to create an instance of the PrometheusRule custom resource.  This entails creating a resource of type PrometheusRule and specifying the rules that trigger the alert using the Prometheus expression language.  To create an alert that notifies you of potential thread deadlocks in the IBM Cloud Pak For Business Automation Operator we can do the following:

  1. Create a file called cp4ba-operator-deadlock-alert.yaml.

  2. Add the following content to the file:
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: cp4ba-operator-deadlock
      namespace: <cp4ba_namespace>
    spec:
      groups:
      - name: cp4basetup
        rules:
          - alert: cp4ba-deadlock-detected
            expr: rate(workqueue_unfinished_work_seconds{name="icp4acluster-controller"}[5m]) > 1
            for: 10m
            labels:
              severity: warning
            annotations:
              message: Potential thread deadlock detected in Cloud Pak For Business Automation Operator
    
    
    Where <cp4ba_namespace> should be replaced with the name of the project where the IBM Cloud Pak For Business Automation has been installed.
  3. Deploy the alert using the oc apply command.

    oc apply -f cp4ba-operator-deadlock-alert.yaml
    The output of the command should be similar to the following listing:
    prometheusrule.monitoring.coreos.com/cp4ba-operator-deadlock created


  4. Verify that the alert has been created.

    oc get PrometheusRule cp4ba-operator-deadlock -n <cp4ba_namespace>

    Where <cp4ba_namespace> should be replaced with the name of the project where the IBM Cloud Pak For Business Automation has been deployed.

    The output of the command should be similar to the following listing:
    NAME                       AGE
    cp4ba-operator-deadlock    18m


To validate that your alert is working properly, you can use the OpenShift Console.  From the OpenShift Console you can inspect the PrometheusRule instance that was created and verify whether the alert is active.  An active alert is one that is either firing or waiting for the duration specified by the for parameter to elapse before it fires.  We can validate the alert defined by the cp4ba-operator-deadlock PrometheusRule created above by completing the following procedure:

  1. Open the Administrator perspective of your OpenShift Console.

  2. Go to the Monitoring menu on the left hand side of the console.

  3. Select the Alerting option under the Monitoring menu.

  4. Click on the Clear all filters link to show active alerts not only for OpenShift Container Platform defined alerts but for custom alerts too.

    By default the Alerting panel shows platform based alerts, that is those monitoring OCP components.  You can expand the search to include non platform alerts by clicking on the Clear all filters link.  When all filters are removed, you should be able to see alerts that are part of the platform and custom alerts coming from user defined projects.  Notice that this panel does not show the actual instance of the PrometheusRule created in the previous step.  It shows only alerts that are active.  If you do not see your alert in this panel it means that the conditions that should trigger the alert have not been met yet.

  5. To view the actual instance of the PrometheusRule previously created in this section, click on the Alerting Rules tab found at the top of the Alerting panel.
    Once in the Alerting Rules panel, click on the Clear all filters link to include custom alerts.  Once you clear all filters you should be able to see the cp4ba-deadlock-detected alerting rule, defined by the cp4ba-operator-deadlock PrometheusRule, in the list.

As you can see, creating alerts for components of the IBM Cloud Pak for Business Automation can be done using the standard mechanism provided by Prometheus to create custom alerts.  A few things worth mentioning about this capability are:

  • Once custom metrics have been scraped by Prometheus you can use them to create alert rules.  This gives you the flexibility to create alerts that are specific to the capabilities and inner workings of the IBM Cloud Pak for Business Automation components and to get real time notifications based on potential threats to the reliability of your solutions.
  • Custom metrics can be used in combination with functions provided by Prometheus to test for a condition.  In this example we used the rate function to calculate the rate of change of the workqueue_unfinished_work_seconds metric, but there are many other functions available for use.  See the Prometheus Function documentation for additional details.
  • The instance of your PrometheusRule should be created under the namespace where the resource being monitored resides.  In this example, since we were monitoring the IBM Cloud Pak For Business Automation Operator, we created the alert under that namespace.
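
To illustrate the point about using other Prometheus functions, the sketch below uses the absent function to raise a warning when the operator's reconcile metric stops being reported at all, for example because the metrics endpoint is no longer being scraped.  The rule name, group name, alert name, duration and message are assumptions chosen for this example.  The job label value is the one used by the sample dashboard query earlier in this article.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # Hypothetical name chosen for this example
  name: cp4ba-operator-metrics-absent
  namespace: <cp4ba_namespace>
spec:
  groups:
    # Hypothetical group name
    - name: cp4ba-operator-availability
      rules:
        - alert: cp4ba-operator-metrics-missing
          # absent() returns a value only when no matching time series exists
          expr: absent(ansible_operator_reconciles_count{job="ibm-cp4a-operator-metrics"})
          # Assumed grace period before the alert fires
          for: 10m
          labels:
            severity: warning
          annotations:
            message: No reconcile metrics are being reported by the Cloud Pak For Business Automation Operator
```

This rule can be deployed and verified in the same way as the cp4ba-operator-deadlock rule shown earlier in this section.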

For a complete explanation on how to create alerts and the Prometheus expression language see the Alerting Rules documentation.

Conclusion

Operational visibility and the ability to understand the internal behavior of the IBM Cloud Pak for Business Automation are key to maintaining the health of the solutions built on top of your Cloud Pak.  You can monitor components of your Cloud Pak by leveraging the out of the box monitoring stack provided by the OpenShift Container Platform 4.6+ and enhance its capabilities by deploying the monitoring services provided with the IBM Cloud Pak Foundational Services.  This configuration provides a powerful open source based solution that allows you to identify and respond to issues in a timely manner in order to guarantee the readiness, level of service and availability of your business applications.

Want to start your business automation journey?  See how the IBM Cloud Pak for Business Automation can help you.


