Instana APM to Cloud Pak for Watson AIOps Integration
This blog post explains best practices for integrating Instana APM with Cloud Pak for Watson AIOps. It was written in February 2023 and reflects the capabilities of Instana version 239 on OpenShift 4.8.47 and Cloud Pak for Watson AIOps version 3.6.
IBM’s automation portfolio offers a broad set of capabilities with Instana Application Performance Management (APM), SevOne Network Performance Management (NPM) and Turbonomic Application Resource Management (ARM). APM, NPM and ARM are key elements of the IBM IT Automation architecture: they deliver critical data to an AIOps solution (Proactive Incident Resolution) and take it to the next level:
Figure 1a: IBM Automation IT Architecture
What are the main differences between NPM, ARM, APM and AIOps?
As their names say, APM and NPM are “Performance Managers”. ARM is a Resource Manager.
APM (Application Performance Management) tools like Instana are designed to observe and manage the performance of applications and to identify issues that may affect the user experience. They provide visibility into the performance of an application, including response times, error rates, and resource utilization.
An NPM (Network Performance Management) tool like SevOne is a highly scalable, application-aware network performance monitoring system. SevOne NPM is designed to observe and manage the performance of networks and to identify issues that may affect it. SevOne NPM also helps to deliver smooth transitions to virtualized networking and cloud services. By transforming raw network performance data from infrastructure across the entire delivery chain into actionable insights, SevOne NPM delivers a comprehensive view of what’s happening in the network and how that performance affects the applications driving modern businesses.
ARM (Application Resource Management) is a top-down, application-driven approach that continuously analyzes applications’ resource needs and generates automatable actions to ensure applications always get the resources they need to perform. ARM focuses on assuring performance while maintaining efficient use of resources. When performance and efficiency are both maintained, the environment is in the “desired state”.
AIOps tools like IBM Cloud Pak for Watson AIOps are designed to automate the IT operations process using artificial intelligence and machine learning. They use data from various IT systems to analyze patterns, identify issues, and suggest resolution strategies. AIOps tools aim to improve the speed and efficiency of IT operations, reduce downtime, and minimize human error. In short, APM focuses on monitoring and optimizing application performance, while AIOps aims to automate and optimize IT operations.
Let’s show an ideal use case, where APM and ARM work together in an AIOps solution.
Figure 1b : Sample use case of APM, ARM, NPM and AIOps working together
The use case scenario includes the following components :
• A customer has deployed Instana, Turbonomic and CP4WAIOPS to manage a sample application (Quote of the Day application, QuoteD)
• Instana is continuously observing the QuoteD application to provide accurate and real time information about QuoteD application performance to the SREs
• Turbonomic collects data about services and infrastructure (supporting QuoteD application) and analyzes the data for continuous optimization of the resources
• CP4WAIOPS receives and correlates all the information from Instana, Turbonomic and other sources in order to reduce noise and provide quick resolution of problems affecting the QuoteD application
Use Case steps :
1. QuoteD experiences performance issue(s)
2. Instana quickly discovers and detects performance issue(s) in QuoteD
3. Instana forwards topology, events and metrics to CP4WAIOPS
4. Upon receiving metrics CP4WAIOPS detects metric anomalies related to QuoteD and generates anomaly events
5. In parallel Turbonomic identifies and suggests an action plan to address performance issues in QuoteD services.
6. Turbonomic forwards Action event to CP4WAIOPS
7. CP4WAIOPS automatically creates a single Story that groups all related events from Instana, CP4WAIOPS, Turbonomic and other sources to reduce noise
8. The SRE quickly understands the cause-and-effect relationship across these events and decides to execute the action plan via runbook
9. After the action plan has been executed, the SRE can see that QuoteD returns to normal, events are cleared, and the story is closed
It’s important to highlight that each of these solutions (APM, NPM, ARM, AIOps) has many additional strengths and capabilities, which are beyond the scope of this article.
Let’s focus on Instana and APM first.
The goal of an APM solution like Instana is to “observe” your environment. But what is the big difference between “observability” and “monitoring”? Many articles have been written about it, e.g. https://www.ibm.com/cloud/blog/observability-vs-monitoring.
Monitoring and observability are two ways to identify the underlying cause of problems. Monitoring tells you when something is wrong, while observability can tell you what’s happening, why it’s happening and how to fix it. Another difference is in alerting and event management.
While monitoring alerts the team to a potential issue, observability helps the team detect and solve the root cause of the issue. Quickly solving the root cause of an issue is the main driver of observability. But the best software incident is one that never occurs. Therefore Instana, as an AI-infused observability tool, tries to prevent application issues before they occur.
From an event management perspective, observability and monitoring are quite similar. Both assess the health of a system and send events and alerts before issues turn into a problem.
In an incident management toolchain, an observability tool like Instana plays an important role, because it sits at the “front end” and collects relevant data from its agents. Instana can be integrated with Cloud Pak for Watson AIOps to supply topology, tracing, monitoring and metric information.
Figure 1c: Instana Observability adds critical data to an Incident Management Toolchain
Note that you do not need Cloud Pak for Watson AIOps for Instana APM to work and vice versa, as they are independent products. But it is often helpful to send informational messages or even critical alerts from Instana APM to an Event Manager console for further processing, e.g. root cause analysis, triggering runbook automation tasks that respond to and solve issues detected by Instana, or even creating a resolution “story” with AI Manager.
Modern observability and monitoring systems can send their event data to an event manager via webhook, where the event data fields are parsed and transformed into event records. Instana offers a customizable webhook-based integration with IBM Cloud Pak for Watson AIOps that uses the message bus probe. The SRE can fine-tune which event fields and event records are sent on the Instana side, and how they are received on the event management side, by setting the parameters and properties for the transport and by parsing the event records with rules. The outcome matters: event fields like severity, origin, application type, failure groups and many other extended attributes are stored in the event manager, either for quick root cause analysis or for event analytics and event correlation, such as noise reduction and supergrouping of events.
In the case of an incident, Instana triggers an alert to the incident management toolchain. The Event Manager of Cloud Pak for Watson AIOps collects and analyzes aggregated data from monitoring and observability systems based on a predefined set of event fields, topology data, tracing data, monitoring and metrics data. Cloud Pak for Watson AIOps can then trigger automated actions. If it executes these actions via runbooks, the environment maintains operating conditions that assure performance and SLAs / QoS for your customers.
Because Instana can also create events, alerts and incidents, it is important to react to the really critical alerts before issues arise. In Instana, you can define so-called “Smart Alerts” that “observe” your application for critical events, e.g. when too many anomalies are detected within a defined time frame, such as too many erroneous calls on an application.
Figure 2a : Application perspectives and smart alerts
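To make the idea of "too many erroneous calls within a defined time frame" concrete, here is a minimal Python sketch of a sliding-window threshold. It is not Instana's Smart Alert implementation; the class name, the 60-second window and the threshold of 3 are hypothetical values chosen only for illustration.

```python
from collections import deque


class ErroneousCallWindow:
    """Counts erroneous calls inside a sliding time window (illustrative only)."""

    def __init__(self, window_seconds: float, max_errors: int) -> None:
        self.window_seconds = window_seconds
        self.max_errors = max_errors
        self._timestamps = deque()

    def record_error(self, timestamp: float) -> bool:
        """Record one erroneous call; return True if the threshold is exceeded."""
        self._timestamps.append(timestamp)
        # Drop errors that have fallen out of the time window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_errors


# Hypothetical settings: alert on more than 3 errors within 60 seconds.
window = ErroneousCallWindow(window_seconds=60, max_errors=3)
fired = [window.record_error(t) for t in (0, 10, 20, 30, 100)]
```

In this sketch the fourth error (at second 30) exceeds the threshold, while the error at second 100 arrives after the earlier ones have left the window, so it does not trigger an alert on its own.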
The Smart Alert function then automatically sends an alert to the Alert Channel defined in Instana, which is integrated with the Event Manager of IBM Cloud Pak for Watson AIOps via the webhook URL. You can define multiple Alert Channels in Instana that point to the same Event Manager, for availability and scalability reasons, or simply for different virtual applications and tenants. Depending on the customer’s requirements, a webhook-based integration via message bus can be deployed on a VM (via the Message Bus Probe) or containerized in OpenShift via the Cloud Pak for Watson AIOps Event Integration Operator. Both implementations use the same parameter files, rules files and transport files and offer the same features; only the deployment format differs (VM or pod based).
Figure 2b : Instana Alert Channel definition
After an Alert Channel has been defined with the target webhook URL, you can create a Smart Alert for your Instana Application and select the Alert Channel through which the Smart Alert should send its alerts as event records to the Cloud Pak for Watson AIOps Event Manager. The Smart Alert is now connected to the message bus probe via that Alert Channel, and the Instana Application can send alerts as event records with the appropriate severity to the Event Manager.
Figure 2c : Instana Alert Channel properties to webhook triggered by a Smart Alert
Instana also lets you define additional custom event fields for a custom payload if you need further information about an Application, e.g. the names of its affected pods. In the Instana Smart Alert, you can specify custom event fields in the Global Custom Payload section and add either static or dynamic event fields. In the example in the next figure, we have defined an additional custom event field “app_pod_name” with dynamic field input, which includes the names of all failing pods of an Instana Application.
Figure 2d: Instana custom event fields for customPayloads.
When Instana sends the event record to the Event Manager, the customPayload fields are added as nested JSON value pairs at the end of the respective event record.
Note: This is the predefined Instana webhook format. It is not compatible with third-party tools that expect an incoming webhook in their own format. The Instana event record contains the predefined Instana event fields, which are fixed and are always sent. These are value pairs under the “issue” identifier. The customPayload fields are added to the event record in the dynamic field “custom:app_pod_name”.
Figure 2e: All “issue” based Instana JSON value pairs to be sent to the Cloud Pak for Watson AIOps webhook
For more information on how to define alert webhooks in Instana, refer to the latest Instana documentation.
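As a rough illustration of how a receiver might turn such a nested record into flat name-value pairs, consider the following Python sketch. The sample record is hypothetical: apart from the “issue” identifier and the “app_pod_name” field mentioned above, the field names, values and the exact nesting of the custom payload are assumptions and should be checked against a real event record.

```python
import json


def flatten(record: dict, prefix: str = "", sep: str = ":") -> dict:
    """Flatten nested JSON objects into a single level of name-value pairs."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent name along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical event record; layout and values are assumptions for illustration.
sample = json.loads("""
{
  "issue": {"state": "OPEN", "severity": 10, "text": "Erroneous call rate is too high"},
  "custom": {"app_pod_name": "qotd-web-1"}
}
""")

flat = flatten(sample)
```

With this assumed layout, the custom field ends up under the name custom:app_pod_name, next to the fixed fields under the issue prefix.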
Integration between Instana and Cloud Pak for Watson AIOps
Cloud Pak for Watson AIOps can integrate with Instana and ingest its data in two ways :
- 1) Via a direct connection defined in Cloud Pak for Watson AIOps, where you can define Data and Tool Connections via the GUI :
Define >> Data and Tool Connections >> Add Connection :
Figure 3a: Adding an Instana Connection in Cloud Pak for Watson AIOps
If you use the Cloud Pak for Watson AIOps AI Manager to connect to Instana, refer to the Cloud Pak for Watson AIOps documentation.
- 2) Via the Message Bus probe, which receives event data from external systems like legacy monitoring and third-party APM systems. The Cloud Pak for Watson AIOps documentation explains what a probe is:
Probes connect to an event source, detect, and acquire event data, and forward the data to the ObjectServer as alerts. Each probe is uniquely designed to acquire event data from a specific source. However, probes can be categorized based on how they acquire events.
Deploying the message bus probe for Instana and connecting it to Cloud Pak for Watson AIOps is not explicitly covered in the IBM documentation, neither for Netcool Operations Insight nor for Netcool/OMNIbus, but you can follow the instructions for a generic webhook probe here:
Depending on how you have implemented the connection to Instana, you need to configure the message bus probe parameters accordingly. The location of the Cloud Pak for Watson AIOps object server determines how the message bus probe connects to it.
- - If the Cloud Pak for Watson AIOps Event Manager is implemented in “on-prem” mode or “hybrid mode” (that is, the object servers run on VMs), we recommend deploying the message bus probe on a VM, too.
- - If the Cloud Pak for Watson AIOps Event Manager is implemented completely on OpenShift in “cloud mode” (that is, the object servers run in OpenShift pods), we recommend deploying the message bus probe in OpenShift pods using the Cloud Pak for Watson AIOps Event Integration Operator. Alternatively, the probe can be deployed on-prem on a VM.
It is therefore important to identify where the object servers are running and how the message bus probe connects to them. If the probe runs on an on-prem VM and the object servers run on OpenShift, the object servers’ NodePorts must be exposed so that the probe can reach them. Details can be found here:
If the message bus probe is deployed on OpenShift using the Cloud Pak for Watson AIOps Event Integrations Operator, you can follow the instructions in “Deploying the generic message bus probe for webhook integration”:
We recommend deploying a separate message bus probe instance for every integration type, e.g. Instana, Turbonomic, SevOne, or others. This lets you define the appropriate parameters for each integration individually, e.g. the message bus props, parser, transport and rules files. The next figure shows the Event Integrations Operator with message bus (webhook) probe instances deployed in pods for Instana and Turbonomic :
Figure 3b: Event Integrations for message bus webhook probes
If you would like to run your probe with a custom rules file rather than the default rules file, you have to create a PVC first and copy your custom rules file into a PVC of the configmap.
The on-prem, VM-based message bus probe software package does not include the parameter files for the Instana integration, but you can create and customize them from the generic message bus probe examples. We recommend creating the following message bus probe files for Instana :
These specific parameter files are not part of the default Message Bus probe product binaries. For your convenience, we have provided these files for download here :
The corresponding parameters can be passed to the containerized probe by using config maps in OpenShift. You pass the respective probe file to the message bus probe instance in a configmap, which is defined when deploying the probe instance.
In Cloud Pak for Watson AIOps, the http(s) address of the message bus probe is defined in the message bus transport properties file, e.g. message_bus_instana_transport.properties, with the following content :
In our example, the containerized message bus probe instance is reachable at the following webhook URI :
The containerized webhook probe does not need a port number, as it is clearly identified with the unique application link (ingress) in OpenShift.
A webhook definition generally includes:
– The URL of the application you send the webhook to
– An HTTP method (typically POST, sent over HTTP/1.1)
– A template for the webhook payload
– Authentication credentials to access the application
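The four elements listed above can be sketched with the Python standard library as follows. This is only an illustration of what a generic webhook call consists of, not the request that Instana actually emits: the URL, the payload, the credentials and the choice of HTTP basic authentication are all hypothetical.

```python
import base64
import json
import urllib.request


def build_webhook_request(
    url: str, payload: dict, user: str, password: str
) -> urllib.request.Request:
    """Assemble (but do not send) an HTTP POST carrying a JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    # Basic authentication is an assumption; the receiver may expect another scheme.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_webhook_request(
    "https://probe.example.com/webhook",  # hypothetical webhook URL
    {"issue": {"severity": 10, "text": "example"}},  # hypothetical payload
    "user",
    "secret",
)
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```

Building the request without sending it makes it easy to inspect the method, headers and body before pointing it at a real probe endpoint.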
Once the event record from Instana has been received by the Event Manager, it is parsed according to the policies in the message bus probe’s rules file and parsing properties file. The next figure shows the entire event record in JSON syntax before the Event Processor parses the event fields and maps them to the Event Manager’s object server fields according to the rules and parsing properties files.
Figure 3c: Event record received by the webhook before event processing
Usually, an event receiver can adapt to these event fields, which arrive as JSON value pairs, by transforming them into its expected record format. The Cloud Pak for Watson AIOps Event Manager can transform incoming event records from different senders by parsing the event fields according to customized rules files. When the Instana event record arrives at the Event Manager’s message bus probe, it is parsed. The progress of the parsing activity is reported in the message bus log file under the Event Processor category if you have set the log level to “Debug”.
Figure 3d: Event processing based on rules file at the Cloud Pak for Watson AIOps message bus probe
After the Instana event record has been parsed, the event fields are mapped to the designated Event Manager fields and saved in the Object Servers for further processing. The filtered event fields are displayed in the customizable Event Viewer GUI, which shows the important fields first.
Figure 4a: Instana Event record sent to Cloud Pak for Watson AIOps Event Viewer Console
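The parsing and mapping step can be pictured with a small Python sketch. It is written in Python rather than in the probe rules language and is not the rules file used in this article: the selection of target fields is hypothetical, and the severity scales (Instana 5 = warning, 10 = critical; object server 0 = clear, 1 = indeterminate, 2 = warning, 5 = critical) are stated here as assumptions to verify against your own environment.

```python
# Assumed severity scales, see the lead-in above.
SEVERITY_MAP = {5: 2, 10: 5}


def map_issue_to_event(issue: dict) -> dict:
    """Map a parsed Instana issue onto a few object-server-style fields."""
    if issue.get("state") == "CLOSED":
        severity = 0  # a closed issue clears the event
    else:
        # Unknown severities fall back to 1 (indeterminate).
        severity = SEVERITY_MAP.get(issue.get("severity"), 1)
    return {
        "Node": issue.get("fqdn", "unknown"),
        "Summary": issue.get("text", ""),
        "Severity": severity,
        "Manager": "Instana",
    }


# Hypothetical parsed issue, for illustration only.
event = map_issue_to_event(
    {
        "state": "OPEN",
        "severity": 10,
        "text": "Erroneous call rate is too high",
        "fqdn": "host1.example.com",
    }
)
```

Keeping the mapping in one place like this mirrors the role of the rules file: the sender's format stays untouched, and only the receiver decides which incoming field feeds which event field.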
All mapped event fields are summarized in the “Fields” Tab of the Event Viewer :
Figure 4b : Instana Event Record Fields Overview mapped to the Cloud Pak for Watson AIOps Event Viewer
All received Instana event fields are also reported in the Details section of the Event Viewer. In our case, we have specified in the rules file to do so, by adding a rules file function
This mapping adds all JSON value pairs to the Details section of the Event Viewer, which can be useful for detailed root cause analysis.
Figure 4c : Instana Event Record Details at Cloud Pak for Watson AIOps Event Viewer Console
According to the message bus probe rules file for Instana, the ExtendedAttr field contains all extended value pairs sent from Instana, as an aggregation. The rule parameter is defined as ($*), which means that all fields of an event record from Instana are aggregated in the ExtendedAttr field.
@ExtendedAttr = nvp_add($*)
(The probe’s rules file function “nvp_add” enables probes to generate events that contain extended attributes, which are supplied as name-value pairs).
The ExtendedAttr field enables the policy engine of Cloud Pak for Watson AIOps to filter for important value pairs, e.g. for further root cause analysis, for event enrichment, or for triggering actions like runbooks. The ExtendedAttr field and its content can therefore provide additional information in events and alerts for further processing and event analytics (and AIOps related tasks).
Instana is not only a technically advanced observability solution; it can also be an entry point to a comprehensive AIOps project. Instana APM integrates well, out of the box, with other IBM solutions like Turbonomic ARM and Cloud Pak for Watson AIOps. Although Instana can be deployed as a standalone APM solution and is totally independent of an AIOps project, it is a crucial source of application information, which can bring an AIOps implementation to the next level of business service management. The options for integrating Instana with Cloud Pak for Watson AIOps are technically proven and enrich proactive incident resolution. Cloud Pak for Watson AIOps keeps your environment in a stable state by consolidating events and alerts, enriching them for structured IT service management, analyzing them for quick root cause analysis, and triggering automated actions (like runbooks) based on defined policies. Instana makes IBM’s AIOps solution more powerful and valuable to the SRE.
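To illustrate the idea of aggregating all fields into one name-value string and filtering on a single pair later, here is a minimal Python sketch. The helper names nvp_build and nvp_lookup are hypothetical Python functions, not probe rules functions, and the string layout name="value";name="value" is an assumption about the serialization that should be checked against a real ExtendedAttr value.

```python
def nvp_build(pairs: dict) -> str:
    """Join name-value pairs into one string such as a="1";b="2"."""
    return ";".join(f'{name}="{value}"' for name, value in pairs.items())


def nvp_lookup(nvp: str, name: str):
    """Return the value stored under `name`, or None if it is absent."""
    for item in nvp.split('";'):
        key, _, value = item.partition('="')
        if key == name:
            return value.rstrip('"')
    return None


# Hypothetical field values, for illustration only.
extended_attr = nvp_build({"severity": 10, "custom:app_pod_name": "qotd-web-1"})
pod_name = nvp_lookup(extended_attr, "custom:app_pod_name")
```

This sketch does not handle values that themselves contain double quotes or the sequence ";, so it is only suitable for reasoning about the concept, not for parsing production data.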