AIOps

Cloud Pak for AIOps 4 tips: migrating Network Manager correlation to AIOps

By Zane Bray posted 7 days ago

  

ITNM ROOT CAUSE ANALYSIS DOWNSTREAM CORRELATION

IBM Tivoli Network Manager (ITNM) is in use around the world, providing IP network discovery and polling capabilities. Other features include downstream root-cause analysis (RCA) and event correlation based on its discovered topology. In Netcool/OMNIbus WebGUI, this is visualised in the event list via a Relationship. The Relationship links the root-cause event to the symptomatic events by writing the Serial of the root-cause event to the NmosSerial field of the symptom events. When the ITNM Relationship is selected in the WebGUI view, the parent-child relationships are visible, enabling an Operator to see problems in the context of the root-cause events, effectively suppressing the symptomatic ones. Operators can also opt to filter out Symptom events.

An earlier blog post outlines a generic way to migrate Netcool correlations to AIOps. That approach identifies an attribute common to all members of a Netcool event group, passes it to AIOps as a custom attribute, and then groups on it using AIOps scope-based grouping.

The same approach can be extended to ITNM RCA correlation by identifying an attribute common to a root-cause event and its children, then passing it to AIOps in a similar manner. This blog post outlines a suggested approach that can co-exist with any other correlations already in use in the existing AIOps system.

A real advantage of AIOps over traditional Netcool is that an alert can be a member of multiple groups at the same time, and all of these correlations can co-exist and be visualised together. This was notably not the case with ITNM root-cause correlation, which needed a separate view in WebGUI. With AIOps, however, the ITNM correlations can be combined with the other correlation mechanisms and fully contribute to super-grouping.

ITNM PARENT CHILD RELATIONSHIP IN NETCOOL

In Netcool, the ITNM root-cause alert is linked to its corresponding symptomatic alerts. This is done by copying the Serial field value of the root-cause event to the NmosSerial field of each of the child events. WebGUI then offers this as a Relationship option in the Event viewer View configuration:

Applying this View to your Event Viewer reveals ITNM RCA correlation:

MAP THE CUSTOM CORRELATION FIELD
Just like in the other blog post, the first step in bringing a correlation over to AIOps is to identify the attribute common to the group members and map it to a custom correlation attribute in AIOps. An AIOps scope-based grouping policy can then be created to form groups based on this common attribute.
In the case of ITNM RCA correlation events, the common attribute tying the events together is the Serial of the root-cause alert. The root-cause alert (NmosCauseType = 1) holds this value in the Serial field and the symptom events (NmosCauseType = 2) hold the same value in the NmosSerial field.
Hence we can map this value to our custom correlation field in the AIOps Netcool Connector mapping by using the following:
    ...
    "state": alert.@Severity = 0 ? "clear" : "open",
    "acknowledged": alert.@Acknowledged = 1 ? true : false,
    "expirySeconds": alert.@ExpireTime = 0 ? undefined : alert.@ExpireTime,
    "details": {
      "itnmCorrelation": alert.@NmosCauseType = 1 ? $string(alert.@Serial) :
                         alert.@NmosCauseType = 2 ? alert.@NmosSerial : undefined
    },
    ...
In this example, we are defining a sub-attribute of details called itnmCorrelation based on the value of NmosCauseType:
  • If NmosCauseType is 1, we set itnmCorrelation to the value of the Serial field from OMNIbus;
  • If NmosCauseType is 2, we set itnmCorrelation to the value of the NmosSerial field from OMNIbus;
  • If neither is the case, we leave this sub-attribute undefined.
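
The three rules above can be previewed outside the Connector with a small shell sketch. This is only an illustration of the logic, not part of the Connector: the function name itnm_correlation and the sample events are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the itnmCorrelation mapping rules.
# Arguments: $1 = NmosCauseType, $2 = Serial, $3 = NmosSerial
itnm_correlation() {
  case "$1" in
    1) printf '%s\n' "$2" ;;  # root cause: use its own Serial
    2) printf '%s\n' "$3" ;;  # symptom: use the root cause's Serial held in NmosSerial
    *) : ;;                   # anything else: leave undefined (print nothing)
  esac
}

# Made-up sample events, one per line: NmosCauseType|Serial|NmosSerial
while IFS='|' read -r cause serial nmos; do
  printf 'Serial=%s -> itnmCorrelation=%s\n' \
    "$serial" "$(itnm_correlation "$cause" "$serial" "$nmos")"
done <<'EOF'
1|1001|
2|1002|1001
2|1003|1001
0|1004|
EOF
```

In this sample, the root-cause event and its two symptoms all end up with the same key, while the unrelated event gets no key and so would not be grouped by this mechanism.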

After adding the new attribute to the Netcool ObjectServer Connector instance mapping, click Save and allow a couple of minutes for the Connector to reinitialise.

CREATE A NEW SCOPE-BASED GROUPING POLICY

Next, we create an AIOps scope-based grouping policy to do the actual grouping. Select Automations from the main menu in AIOps and select the Policies tab. Click the Create policy button, choose Group alerts based on scope as the type, and give your policy a name: ITNM Correlation.

For the Policy triggers section, we need to enable both:

  • Before an alert is created - if ITNM enriches the alert in OMNIbus before it is initially passed to AIOps
  • After an alert has been updated - if ITNM enriches the alert in OMNIbus after it is initially passed to AIOps

For the latter case, we would simply add the condition that the policy should fire only if our ITNM correlation attribute changes:

For the condition sets, the criterion is simply that our ITNM correlation attribute is not empty; its value is then used to group on:
Finally, set a time window for the correlation and choose a type:
Scroll back up to the top of the Policy definition window, and check your settings, then click Save to save your new Policy.
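
To get a feel for which groups such a policy would form, the grouping rule can be approximated offline. The following is a rough sketch only: it groups made-up sample alerts by their itnmCorrelation value and skips alerts where the value is empty. It ignores the time window, and the function name group_by_itnm_key and the input format are assumptions for this example.

```shell
#!/bin/sh
# Hypothetical preview of scope-based grouping on the itnmCorrelation value.
# Input lines: itnmCorrelation|Summary. Lines with an empty key are skipped,
# mirroring the "attribute is not empty" condition. No time window is applied.
group_by_itnm_key() {
  awk -F'|' '
    $1 != "" {
      if (!($1 in members)) order[++n] = $1   # remember keys in first-seen order
      members[$1] = members[$1] "\n  - " $2   # append this alert to its group
    }
    END {
      for (i = 1; i <= n; i++)
        printf "group %s:%s\n", order[i], members[order[i]]
    }
  '
}

# Made-up sample alerts
group_by_itnm_key <<'EOF'
1001|Link down on core-rtr-1
1001|Ping fail for access-sw-7
|Disk usage high on app-srv-3
1001|Ping fail for access-sw-8
2002|Interface down on edge-rtr-4
EOF
```

The alert without a correlation value is left out, which matches the intent of the condition set: only alerts that ITNM has enriched take part in this grouping.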
HIGHLIGHT THE ROOT-CAUSE ALERTS
To make the root-cause alerts stand out in the AIOps Alerts viewer, you can tweak the Netcool Connector mapping.
The Summary field can be augmented for ITNM root-cause alerts per the following example:
  ...
  { 
    "summary": alert.@NmosCauseType = 1 ? "ITNM Root Cause: " & alert.@Summary : alert.@Summary,
    "deduplicationKey": alert.@Identifier,
   ...
This tells the Connector to prepend the Summary field with the string "ITNM Root Cause: " if the NmosCauseType = 1, indicating it is an ITNM Root Cause event.
As before, if you make any changes to the Netcool ObjectServer Connector instance mapping, click Save and allow a couple of minutes for the Connector to reinitialise.
You should then see the ITNM RCA correlations being replicated in AIOps, with the root-cause alert highlighted:
Note that the root-cause alert is grouped together with its symptoms in AIOps, since AIOps does not support a real alert acting as the parent event in the view. It is, however, highlighted as an ITNM root-cause alert via its Summary field.
PRIME THE PROBABLE CAUSE ANALYSIS ENGINE
A final optional step is to prime the AIOps probable-cause analysis engine to increase the probable-cause score for ITNM root-cause alerts.
Take care when customising the probable-cause engine. ITNM RCA correlation is not the only type of correlation that AIOps uses to group alerts, so other alerts may have a higher probable-cause score, for example because of keywords contained in their Summary fields. It is nevertheless reasonable to boost the probable-cause score of ITNM root-cause events, since a correlated network root-cause event is highly likely to be the probable cause of any ongoing incident.
To prime the probable-cause analysis engine for the appearance of ITNM root-cause alerts, do the following:
  1. Log into your OpenShift cluster
  2. Connect to the probable-cause API with curl and download the current word list
  3. Add an entry for the ITNM root-cause alerts and save the file
  4. Upload your modified word list back to the probable-cause API
EXAMPLE:
Log into your cluster:
$ oc login --token=sha256~b5Tb989gl2FJxZfhgjxmjeW401234567890 --server=https://api.aiops-4.cp.fyre.ibm.com:6443
Logged into "https://api.aiops-4.cp.fyre.ibm.com:6443" as "kube:admin" using the token provided.
You have access to 78 projects, the list has been suppressed. You can list all projects with 'oc projects'
Using project "aiops".
$
Run the following to set up your environment parameters:
ROUTE=$(oc get route cpd -n aiops --no-headers | awk '{print $2}')
PASS=$(oc get secret admin-user-details -o jsonpath='{.data.initial_admin_password}' -n aiops | base64 -d)
TOKEN=$(curl -s -k -X POST https://$ROUTE/icp4d-api/v1/authorize -H 'Content-Type: application/json' \
-d '{"username": "admin","password": "'"$PASS"'"}' | jq -r .token)
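
Before going further, it can help to confirm that a token actually came back, since the later curl calls will otherwise fail with less obvious errors. A minimal sketch follows; the function name token_is_usable is made up for this example, and it assumes that a missing token shows up as an empty string or as the literal text null, which is what jq prints when the field is absent.

```shell
#!/bin/sh
# Hypothetical guard: reject an empty token or the literal "null" that jq
# prints when the authorize response has no .token field.
token_is_usable() {
  case "$1" in
    ""|null) return 1 ;;
    *) return 0 ;;
  esac
}

if token_is_usable "${TOKEN:-}"; then
  echo "Token acquired"
else
  echo "No token returned; check ROUTE, PASS and the admin user name" >&2
fi
```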
Run the following curl command to extract the current word list:
curl -k -X GET -H "Accept: application/json" \
-H "Authorization: Bearer ${TOKEN}" https://$ROUTE/aiops/api/issue-resolution/mime/v1/customisation/words \
-H "X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255" | jq . > out.json
Add an entry to the word list (i.e. edit out.json) and save the file:
{
  "words": [
    {
      "word": "ITNM Root Cause",
      "caseSensitive": true,
      "weight": 100.0
    },
    ...
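
Since the file is edited by hand, a quick check before uploading can catch a broken edit. The sketch below assumes jq is installed (it is already used in the commands above) and that the file keeps the "words" array layout shown in the snippet; the function name word_list_has is made up for this example.

```shell
#!/bin/sh
# Hypothetical pre-upload check. Succeeds only if the file parses as JSON and
# its "words" array contains an entry whose "word" equals the given string.
# $1 = path to the word list file, $2 = word to look for
word_list_has() {
  jq -e --arg w "$2" 'any(.words[]?; .word == $w)' "$1" >/dev/null
}

if word_list_has out.json "ITNM Root Cause"; then
  echo "out.json is valid and contains the ITNM entry"
else
  echo "out.json is invalid or missing the ITNM entry; fix it before uploading" >&2
fi
```

The same function could be pointed at a freshly downloaded copy of the word list after the upload, to confirm that the new entry was stored.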
Re-upload the word list to AIOps:
$ curl -k -X POST  -H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" https://$ROUTE/aiops/api/issue-resolution/mime/v1/customisation/words \
-H "X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255" -d @out.json
{"Result":"Success"}
$
You should now see the ITNM root-cause alert ranked higher as a probable cause than any of its symptoms in the incident view: