Monitoring CP4AIOps using Instana
Co-authors: Ben Stern, Pratik Patel
Enterprises rely on Cloud Pak for AIOps (CP4AIOps) around the clock as their IT and network operations management solution. It is therefore important to monitor CP4AIOps and its underlying infrastructure to prevent outages and performance issues that might impact users. IBM Instana provides best-in-class monitoring for microservices-based applications like CP4AIOps. Instana automatically discovers an application's components and the underlying OpenShift platform, and provides a graphical representation of the application's topology.
This blog details how you can configure Instana monitoring for CP4AIOps.
Monitoring Setup:

1. Set Up Kubernetes Monitoring
Install the Instana Kubernetes agent (using the operator method):

On the cluster, check that an instana-agent pod is running on every worker node.
[root@api.aiops24.cp.fyre.ibm.com ~]# oc get pods -n instana-agent
NAME                        READY   STATUS    RESTARTS        AGE
instana-agent-2krj4         1/1     Running   0               8d
instana-agent-4sbb7         1/1     Running   0               8d
instana-agent-crrzw         1/1     Running   1 (7d20h ago)   8d
instana-agent-dx89d         1/1     Running   0               8d
instana-agent-h49vz         1/1     Running   0               8d
instana-agent-jz6kb         1/1     Running   0               8d
instana-agent-mxbq9         1/1     Running   0               8d
instana-agent-qms9z         1/1     Running   0               8d
instana-agent-tqnz7         1/1     Running   0               8d
instana-agent-vs5vw         1/1     Running   0               8d
instana-agent-xldtk         1/1     Running   0               8d
k8sensor-8657c6b5b9-7wckj   1/1     Running   0               8d
k8sensor-8657c6b5b9-r94j5   1/1     Running   0               8d
k8sensor-8657c6b5b9-ssq5f   1/1     Running   0               8d
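On a cluster with many worker nodes, scanning the pod listing by eye gets tedious. The following is a minimal sketch of a filter that prints only the pods that are not fully ready; the function name unhealthy_pods is hypothetical, and it assumes the default column layout of `oc get pods` (NAME, READY, STATUS first).

```shell
# Reads `oc get pods` output on stdin and prints the name of every pod
# whose STATUS is not "Running" or whose READY count is incomplete (e.g. 0/1).
unhealthy_pods() {
  awk '
    $1 == "NAME" { next }                 # skip the header row if present
    {
      split($2, ready, "/")               # READY column looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }
  '
}

# Usage against a live cluster (requires an active oc login):
#   oc get pods -n instana-agent | unhealthy_pods
```

An empty result means every agent and k8sensor pod in the listing is running and ready.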
Check the discovered Kubernetes cluster in the Instana UI under Platforms -> Kubernetes.

Click the cluster name to open the Kubernetes monitoring dashboard.

Instana provides built-in health rules for Kubernetes platform monitoring.
Reference: https://www.ibm.com/docs/en/instana-observability/current?topic=references-built-in-events-reference#kubernetes

2. Instrumenting CP4AIOps
2.1 Update the instana-agent ConfigMap for Kafka and PostgreSQL
[root@api.aiops24.cp.fyre.ibm.com ~]# oc edit configmap instana-agent -n instana-agent
apiVersion: v1
data:
  cluster_name: aiops24
  configuration-disable-kubernetes-sensor.yaml: |
    com.instana.plugin.kubernetes:
      enabled: false
  configuration.yaml: |
    com.instana.plugin.kafka:
      #jmxUsername: ''
      #jmxPassword: ''
      #jmxPort: '' # default jmx port is 1099
      topicsRegex: '.*'
      brokerPropertiesFilePath: '/opt/kafka/config/server.properties'
      collectLagData: 'true' # true or false. The default value is true
      #sslTrustStore: '/path/to/truststore.jks'
      #sslTrustStorePassword: 'kafkaTsPassword'
      #sslKeyStore: '/path/to/sslKeyStoreFile.jks'
      #sslKeyStorePassword: 'kafkaKsPassword'
    com.instana.plugin.postgresql:
      user: 'aiops_topology'
      password: 'password'
      database: 'aiops_topology' # by default PostgreSQL will use 'user' as database to connect to.
    # Manual a-priori configuration. Configuration will be only used when the sensor
    # is actually installed by the agent.
    # The commented out example values represent example configuration and are not
    # necessarily defaults. Defaults are usually 'absent' or mentioned separately.
    # Changes are hot reloaded unless otherwise mentioned.
    # It is possible to create files called 'configuration-abc.yaml' which are
    # merged with this file in file system order. So 'configuration-cde.yaml' comes
    # after 'configuration-abc.yaml'. Only nested structures are merged, values are
    # overwritten by subsequent configurations.
    # Secrets
    # To filter sensitive data from collection by the agent, all sensors respect
    # the following secrets configuration. If a key collected by a sensor matches
    # an entry from the list, the value is redacted.
    #com.instana.secrets:
    #  matcher: 'contains-ignore-case' # 'contains-ignore-case', 'contains', 'regex'
    #  list:
    #    - 'key'
    #    - 'password'
    #    - 'secret'
    # Host
    #com.instana.plugin.host:
    #  tags:
    #    - 'dev'
    #    - 'app1'
    # Hardware & Zone
    #com.instana.plugin.generic.hardware:
    #  enabled: true # disabled by default
    #  availability-zone: 'zone'
kind: ConfigMap
Instrumenting PostgreSQL in CP4AIOps:
Reference: https://www.ibm.com/docs/en/instana-observability/current?topic=technologies-monitoring-postgresql
The "com.instana.plugin.postgresql" section in the ConfigMap above requires a dedicated user in the Postgres database. To create it, log in to Postgres with the username and password stored in the secret <installation-name>-edb-postgres-superuser.
[root@api.aiops24.cp.fyre.ibm.com ~]# oc get secret/aiops-edb-postgres-superuser --template='{{.data.username | base64decode}}'
postgres
[root@api.aiops24.cp.fyre.ibm.com ~]# oc get secret/aiops-edb-postgres-superuser --template='{{.data.password | base64decode}}'
B6AK9Jqp****************************************************************
Now log in to any Postgres pod (running in the cp4aiops namespace) with the above username and password, create the new user 'aiops_topology', and grant it the access needed for metrics collection.
[root@api.aiops24.cp.fyre.ibm.com ~]# oc rsh aiops-edb-postgres-1
bash-4.4$ psql --host localhost --username postgres
Password for user postgres: <password from above secret>
psql (13.14)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
postgres=# create user aiops_topology with password 'password';
CREATE ROLE
postgres=# grant SELECT ON pg_stat_database to aiops_topology;
postgres=# GRANT ALL PRIVILEGES ON DATABASE aimanager TO aiops_topology;
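Before moving on, it can be worth confirming that the new user can actually read the statistics view. The sketch below is one way to do that; the helper name pg_stats_check_sql is hypothetical, the selected columns are standard pg_stat_database columns, and the connection flags in the usage comment assume the user and database names from the session above.

```shell
# Prints a read-only query against pg_stat_database, the view the
# aiops_topology user was granted SELECT on above.
pg_stats_check_sql() {
  cat <<'SQL'
SELECT datname, numbackends, xact_commit, blks_read, blks_hit, blk_read_time
FROM pg_stat_database
WHERE datname IS NOT NULL;
SQL
}

# Usage from a shell inside the Postgres pod (psql prompts for the password):
#   pg_stats_check_sql | psql --host localhost --username aiops_topology --dbname aimanager
```

If the grants took effect, the query returns one row per database. The blk_read_time column stays at 0 until track_io_timing is enabled.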
Enable statistics collection in the Postgres configuration by adding the track parameters to the existing YAML.
[root@api.aiops24.cp.fyre.ibm.com ~]# oc edit cluster common-service-db
  postgresql:
    enableAlterSystem: true
    parameters:
      track_activities: "on"
      track_counts: "on"
      track_io_timing: "on"
After making all the configuration changes, restart the instana-agent pods in the OCP cluster.
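One way to restart the agents is a rolling restart of the agent DaemonSet. This is a sketch under two assumptions: the DaemonSet is named instana-agent in the instana-agent namespace (which the pod names shown earlier suggest), and a five-minute timeout is enough for the rollout. The function name and the OC override are illustrative conveniences, not part of the Instana or OpenShift tooling.

```shell
# OC can be overridden (for example OC=echo) to preview the commands
# without touching the cluster; it defaults to the real oc binary.
OC="${OC:-oc}"

restart_instana_agents() {
  # Assumption: DaemonSet "instana-agent" in namespace "instana-agent".
  # Trigger a rolling restart, then wait until every agent pod is ready again.
  "$OC" rollout restart daemonset/instana-agent -n instana-agent &&
  "$OC" rollout status daemonset/instana-agent -n instana-agent --timeout=300s
}

# Usage (requires an active oc login):
#   restart_instana_agents
```

Running `OC=echo restart_instana_agents` prints the two commands instead of executing them, which is a cheap way to review them first.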
3. Instana Dashboards
In the Instana UI, define an application perspective from the namespace in which CP4AIOps is deployed.

You can then observe the CP4AIOps application in the Instana UI with the metrics that are discovered out of the box.

The Services tab shows latency, call rates, and other out-of-the-box metrics for the different services in the CP4AIOps application.

3.1 Kafka Monitoring Dashboards
Kafka Cluster Dashboard

Kafka Cluster Consumer Group Lag

References: https://www.ibm.com/docs/en/instana-observability/current?topic=technologies-monitoring-kafka
Kafka Built-in Events: https://www.ibm.com/docs/en/instana-observability/current?topic=references-built-in-events-reference#kafka
3.2 Postgres Monitoring Dashboards


PostgreSQL Built-in Events: https://www.ibm.com/docs/en/instana-observability/current?topic=references-built-in-events-reference#postgresql-db
3.3 Cassandra Monitoring Dashboards
Basic Cassandra monitoring is available out of the box (no custom config required).
Open Infrastructure and search for Cassandra, select the cluster node that runs Cassandra, and then open the Cassandra Cluster Dashboard.



References: https://www.ibm.com/docs/en/instana-observability/current?topic=technologies-monitoring-cassandra
Cassandra Built-in Events: https://www.ibm.com/docs/en/instana-observability/current?topic=references-built-in-events-reference#cassandra-cluster
4. Create Custom Alerts for CP4AIOps KPIs
You can create custom events in addition to the built-in events that Instana provides. For example, the sample alert below is raised when there is Kafka lag on any of the monitored consumer groups.
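To make the condition behind such an alert concrete: consumer lag is the gap between a partition's log end offset and the group's committed offset, summed per consumer group. The sketch below applies a threshold to the output of Kafka's own kafka-consumer-groups.sh --describe tool. It illustrates the idea only and is not how Instana evaluates events; the function name groups_over_lag and the threshold value are hypothetical.

```shell
# Reads `kafka-consumer-groups.sh --describe` style output on stdin and prints
# "<group> <total lag>" for each group whose summed lag exceeds the threshold
# passed as the first argument. Columns are located by header name.
groups_over_lag() {
  awk -v max="$1" '
    /LAG/ && !lagcol {                    # first header row: locate columns
      for (i = 1; i <= NF; i++) {
        if ($i == "LAG") lagcol = i
        if ($i == "GROUP") grpcol = i
      }
      next
    }
    # Data rows: add numeric lag values; repeated headers and "-" are skipped.
    lagcol && grpcol && $lagcol ~ /^[0-9]+$/ { total[$grpcol] += $lagcol }
    END { for (g in total) if (total[g] > max) print g, total[g] }
  '
}

# Usage (script location and bootstrap address depend on your deployment):
#   kafka-consumer-groups.sh --bootstrap-server <broker> --describe --all-groups | groups_over_lag 1000
```

A sustained non-empty result for a group is the kind of condition the custom event is meant to catch.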

Using the Instana connector in CP4AIOps, all these alarms can be ingested into CP4AIOps for correlation with platform and network alarms.

I hope this article is helpful. For more information about configuring Instana and its monitoring capabilities, see: https://www.ibm.com/docs/en/instana-observability/current?topic=configuring-monitoring-supported-technologies