AIOps

Creating incidents based on multiple alert criteria - new in Cloud Pak for AIOps V4.10.1

By MATTHEW THORNHILL posted 2 days ago

  

Incidents in IBM Cloud Pak for AIOps allow you to cut through the noise, focus on the most important issues and understand all the context needed to resolve problems. It has always been possible to influence when incidents should be raised, but this capability has just got a lot more powerful.

What is an Incident?

In IBM Cloud Pak for AIOps, an Incident represents a holistic issue which requires immediate resolution. These are the most important problems which are directly impacting an organisation's key business.

An incident is created when an alert occurs that indicates such an issue. These alerts are identified by policies that specify what such alerts look like. Policies can range from generic, e.g. "create an incident when an alert occurs against a business-critical resource", to specific, e.g. "create an incident when there is a ping failure alert against my-important-e-commerce.example.com".

Incidents are then deduplicated based on alert correlation. Where multiple alerts are detected to share a cause, they are marked as related. If an alert triggers an incident creation policy, and an incident already exists for any related alert, then the alert is added to the existing incident rather than creating a new one. This means that each incident represents the issue as a whole, so the user doesn't have to context switch between many of them to understand a complex problem.
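The deduplication rule described above can be sketched in a few lines of Python. This is a toy in-memory model and not the product's implementation: the `Incident` and `IncidentStore` classes, the `promote` method and the alert IDs are all hypothetical names introduced for illustration, and the set of related alerts is assumed to have been worked out already by alert correlation.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    incident_id: int
    alert_ids: set = field(default_factory=set)


class IncidentStore:
    """Toy in-memory model of incident deduplication (hypothetical, for illustration)."""

    def __init__(self):
        self.incidents = []
        self._alert_to_incident = {}

    def promote(self, alert_id, related_alert_ids):
        """Add the alert to an incident that already holds it or any related
        alert; create a new incident only if no such incident exists."""
        for candidate in (alert_id, *related_alert_ids):
            incident = self._alert_to_incident.get(candidate)
            if incident is not None:
                break
        else:
            # No existing incident covers this alert or its related alerts.
            incident = Incident(incident_id=len(self.incidents) + 1)
            self.incidents.append(incident)
        incident.alert_ids.add(alert_id)
        self._alert_to_incident[alert_id] = incident
        return incident


store = IncidentStore()
first = store.promote("ping-fail-1", related_alert_ids=[])
second = store.promote("http-5xx-2", related_alert_ids=["ping-fail-1"])
third = store.promote("disk-full-3", related_alert_ids=[])
```

Here `first` and `second` end up as the same incident because the second alert is related to the first, while `third` gets an incident of its own.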

Finally, once an incident is created, additional contextual information is gathered to help a user understand and resolve the issue. This includes any other related alerts, topology information and suggested resolution steps.

How to Choose When Incidents Should Be Created

Incidents are created based on incident creation policies. These can be authored from [Main menu] → Automations → Policies → Create policy → Promote alerts to incident.

Alerts are selected based on conditions. For example, the following policy promotes any alert that has a severity of major or higher and is raised against a resource with a business criticality of gold:

Screenshot showing incident creation policy with business criticality and severity conditions

Once the policy is created, any alert matching that condition will cause an incident to be created, as long as one does not already exist for a related alert.
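As a rough illustration of what such a single-alert condition evaluates, here is a minimal Python sketch. The field names (`severity`, `resource`, `businessCriticality`), the severity ordering in `SEVERITY_RANK` and the `matches_policy` function are assumptions made for this example and may not match the product's actual alert schema.

```python
# Hypothetical severity ordering, lowest to highest (an assumption for this sketch).
SEVERITY_RANK = {
    "indeterminate": 1,
    "information": 2,
    "warning": 3,
    "minor": 4,
    "major": 5,
    "critical": 6,
}


def matches_policy(alert, min_severity="major", criticality="gold"):
    """Return True if the alert is against a resource with the given business
    criticality and its severity is at least min_severity."""
    resource = alert.get("resource", {})
    # Unknown or missing severities rank below everything else.
    severity_rank = SEVERITY_RANK.get(alert.get("severity"), 0)
    return (
        resource.get("businessCriticality") == criticality
        and severity_rank >= SEVERITY_RANK[min_severity]
    )


critical_gold = {"severity": "critical", "resource": {"businessCriticality": "gold"}}
minor_gold = {"severity": "minor", "resource": {"businessCriticality": "gold"}}
major_silver = {"severity": "major", "resource": {"businessCriticality": "silver"}}
```

Of the three sample alerts, only `critical_gold` satisfies both parts of the condition.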

What's New in v4.10.1?

There are two new capabilities in IBM Cloud Pak for AIOps v4.10.1 which enhance this experience:

1. Conditions Spanning Multiple Alerts

Often, the information in a single alert is not enough to determine whether an incident should be created. For example, for an internet service provider, a single end-user router going offline is unlikely to be incident-worthy; the customer may simply have unplugged it.

However, if a significant number are offline, and the system has detected they share a cause with a fault in a local exchange, then an incident should be raised.

It is now possible to capture these kinds of conditions in an incident creation policy:

Screenshot showing incident creation policy with multiple alert conditions

In the example above, the incident will only be created once 10 "end-user router offline" alerts and 1 "exchange equipment" alert are active and the system detects that they share a cause with each other.
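The logic of a condition that spans multiple alerts can be sketched as follows. This is only an illustration under stated assumptions: the `type` and `state` fields, the `REQUIRED_COUNTS` table and the `should_create_incident` function are hypothetical, and the input is assumed to be a group of alerts that the system has already detected as sharing a cause.

```python
from collections import Counter

# Minimum number of active alerts of each kind (mirrors the example above).
REQUIRED_COUNTS = {
    "end-user router offline": 10,
    "exchange equipment": 1,
}


def should_create_incident(related_group, required=REQUIRED_COUNTS):
    """related_group: alerts already marked as sharing a cause.
    Returns True once every required alert kind has enough active alerts."""
    counts = Counter(
        alert.get("type")
        for alert in related_group
        if alert.get("state") == "active"
    )
    return all(counts[kind] >= minimum for kind, minimum in required.items())


def routers(n):
    return [{"type": "end-user router offline", "state": "active"} for _ in range(n)]


exchange = [{"type": "exchange equipment", "state": "active"}]
```

With these helpers, `should_create_incident(routers(9) + exchange)` is false, `should_create_incident(routers(10))` is false because the exchange alert is missing, and `should_create_incident(routers(10) + exchange)` is true.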

2. Support for Advanced JSONata-based Conditions

While the condition builder is easy to use, it cannot express more advanced elements like deep nesting, arithmetic operations, and regular expressions.

These can now be expressed in incident creation policies using JSONata, an open-source query and transformation language for JSON that is already used elsewhere in the product, for example to define field mappings within connectors.

When building a condition, there is a choice between advanced and basic conditions:

Screenshot showing the option to choose between basic and advanced conditions

Choosing advanced will add a JSONata condition block:

Screenshot showing JSONata condition block

Launching the editor then provides an interactive environment for authoring and testing the condition, including syntax highlighting and autocomplete:

Screenshot showing the JSONata condition editor

In the example above, the "end-user-router" alert condition is enhanced to only consider alerts which fall outside of some defined maintenance windows. This condition can be as simple or as complex as required, and can make use of any alert field. It is also possible to mix and match advanced conditions with basic ones:

Screenshot showing mixed basic and advanced conditions
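To make the maintenance-window idea concrete, here is a Python sketch of the logic such an advanced condition might express. It is not JSONata and not the expression shown in the screenshots: the `firstOccurrenceTime` and `type` fields, the window dates and both function names are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical maintenance windows as (start, end) pairs in UTC.
MAINTENANCE_WINDOWS = [
    (
        datetime(2025, 1, 4, 22, 0, tzinfo=timezone.utc),
        datetime(2025, 1, 5, 2, 0, tzinfo=timezone.utc),
    ),
]


def outside_maintenance(alert, windows=MAINTENANCE_WINDOWS):
    """Advanced part: True if the alert did not occur inside any window.
    Assumes an ISO 8601 timestamp with an explicit UTC offset."""
    occurred = datetime.fromisoformat(alert["firstOccurrenceTime"])
    return not any(start <= occurred < end for start, end in windows)


def router_condition(alert):
    """Basic field comparison combined with the advanced check."""
    return alert.get("type") == "end-user router offline" and outside_maintenance(alert)


during = {"type": "end-user router offline", "firstOccurrenceTime": "2025-01-04T23:30:00+00:00"}
after = {"type": "end-user router offline", "firstOccurrenceTime": "2025-01-05T09:00:00+00:00"}
```

In this sketch, `during` is ignored because it falls inside the window, while `after` passes both the basic and the advanced part of the condition.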

Conclusion

With these new capabilities in IBM Cloud Pak for AIOps v4.10.1, you now have much more flexibility in defining when incidents should be created. The ability to create conditions that span multiple alerts allows for more sophisticated incident creation policies that better match real-world scenarios, while the addition of JSONata support enables advanced users to create highly customized conditions.

These enhancements help ensure that incidents are created only when truly necessary, reducing alert fatigue and helping operations teams focus on what matters most.

For more information on these features, please refer to the IBM Cloud Pak for AIOps v4.10.1 documentation.
