New Features in IBM Cloud Pak for AIOps v4.3

By Ricardo Olivieri posted Wed January 17, 2024 01:34 PM

Authors: Ricardo Olivieri, Pratik Patel

In this blog post, we briefly highlight some of the new officially supported features introduced in the latest major release of IBM Cloud Pak for AIOps (CP4AIOps), v4.3.

Programmatic integration with IT service management (ITSM) tools

As our readers may know, CP4AIOps supports out-of-the-box integration (in the CP4AIOps UI console) with ServiceNow for the implementation of these three use cases:

  • Incident creation
  • Similar tickets
  • Change risk assessment

After many customer engagements, we’ve learned that these features are very valuable because of the insights they add for new and recurring problems in customers’ IT environments. With CP4AIOps v4.3, customers that use an ITSM tool other than ServiceNow (e.g., BMC Remedy, Jira) can now implement the same use cases.

To do so, developers use the new set of ticketing APIs in the Connectors SDK for CP4AIOps. The SDK has been available for some time, but it previously supported only the ingestion of events, metrics, and topology (aka inventory) data. It now also supports the ingestion of ticketing data (e.g., incident records, change request records). The new API methods simplify getting ticketing data into CP4AIOps by shielding developers from the internal Kafka and Elasticsearch schemas that CP4AIOps imposes on that data. The recommended starting point for developing a custom connector for an ITSM system is the Ticket Integration Template repository, which is publicly available on GitHub. As a prelude, here’s a snippet of code that shows how to ingest a ticket record into CP4AIOps:

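// Deserialize the ITSM record (JSON) into the SDK's Ticket model
Ticket ticket = objectMapper.readValue(json.toString(), Ticket.class);

// Emit the ticket as an incident, with this connector as the source
ticketAction.emitIncident(ticket, ConnectorConstants.SELF_SOURCE.toString());

// Also write the incident record into Elasticsearch
ticketAction.insertIncidentIntoElastic(ticket);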
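Before a connector can call methods like the ones in the snippet above, it usually has to translate the ITSM tool's own record format into a ticket model. The sketch below illustrates only that mapping step, in plain Java with no SDK dependency. The source field names (`recordId`, `title`, `status`, `priority`), the state vocabulary, the 1 to 5 priority scale, and the `NormalizedTicket` record are all hypothetical stand-ins, not the Connectors SDK's actual `Ticket` class; the Ticket Integration Template is the place to look for the real model.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified ticket model; NOT the Connectors SDK's Ticket class.
record NormalizedTicket(String number, String shortDescription, String state, int priority) {}

final class ItsmRecordMapper {
    private ItsmRecordMapper() {}

    // Maps a flat record exported from an ITSM tool (hypothetical field names)
    // into the simplified ticket model above.
    static NormalizedTicket fromItsmRecord(Map<String, String> source) {
        String number = Objects.requireNonNull(source.get("recordId"), "recordId is required");
        String title = source.getOrDefault("title", "");

        // Normalize tool-specific states to a small shared vocabulary (assumed values).
        String state = switch (source.getOrDefault("status", "").toLowerCase()) {
            case "open", "new", "assigned" -> "open";
            case "resolved", "closed" -> "closed";
            default -> "unknown";
        };

        // Assumed 1 (highest) to 5 (lowest) scale; default to 5 when the record has none.
        int priority = Integer.parseInt(source.getOrDefault("priority", "5").trim());
        return new NormalizedTicket(number, title, state, priority);
    }
}
```

Keeping the mapping in one pure function makes it easy to unit test against sample exports from the ITSM tool before wiring it to the SDK calls.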

Custom dashboards

CP4AIOps v4.3 introduces a UI extension toolkit for developing custom dashboards, which are then accessible in the CP4AIOps UI console through the Cloud Pak’s navigation system. For example, you can implement a custom dashboard that summarizes and visualizes alert and incident data in a different format than the one shown on the Incident and Alert Viewer pages. Building these dashboards requires programming and web development skills, especially TypeScript and React. The UI extension toolkit gives developers access to the same data sources available to the CP4AIOps product. You can also bring third-party data into your custom dashboards and visualize it (for instance, by using AJAX to communicate asynchronously with an external data source). Custom dashboards deployed to the CP4AIOps base product inherit its built-in security for authentication and authorization. The UI extension toolkit is open source and available on GitHub, so we look forward to contributions (such as additional examples, enhancements, and fixes) from the large community of CP4AIOps users. Finally, we encourage our readers to check out the storybook that our CP4AIOps engineering team put together for visual examples.

Invoke Impact policies based on incident attributes

As mentioned in a previous blog post, CP4AIOps provides an out-of-the-box integration for Netcool/Impact. Using this integration, you can have CP4AIOps invoke remotely hosted Impact policies to, for example, enrich alerts or notify downstream tools and systems when certain alerts are seen. What is new in CP4AIOps v4.3 is that you can also trigger an Impact policy based on the attributes (or properties) of an incident; prior to this release, you could trigger an Impact policy only based on the attributes of alerts. For example, you can now invoke an Impact policy when an incident’s priority escalates from 5 to 1 (1 is the highest priority level in CP4AIOps). Because such actions can be tied to the incident lifecycle (i.e., creation and updates of incidents), and not only to the lifecycle of alerts, this enhancement makes it possible to execute external, coarse-grained actions.
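The escalation example can be expressed as a predicate over an incident's previous and current priority. The sketch below only illustrates that trigger logic and is not how CP4AIOps evaluates policies; the class and method names are hypothetical, and the 1 to 5 range is an assumption based on the 5-to-1 example in the text.

```java
// Hypothetical helper illustrating an escalation-based trigger condition.
final class EscalationTrigger {
    private EscalationTrigger() {}

    static final int HIGHEST_PRIORITY = 1; // 1 is the highest priority level in CP4AIOps
    static final int LOWEST_PRIORITY = 5;  // assumed lower bound of the scale

    /**
     * Returns true when an incident update raises its priority to the highest level,
     * e.g. from 5 to 1. Lower numbers mean higher priority.
     */
    static boolean escalatedToHighest(int previousPriority, int currentPriority) {
        validate(previousPriority);
        validate(currentPriority);
        return currentPriority == HIGHEST_PRIORITY && previousPriority > HIGHEST_PRIORITY;
    }

    private static void validate(int priority) {
        if (priority < HIGHEST_PRIORITY || priority > LOWEST_PRIORITY) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
    }
}
```

Comparing the previous and current values, rather than checking only the current one, keeps the action from firing again on every update to an incident that is already at priority 1.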

Infrastructure Management connector

A new out-of-the-box connector is now available for ingesting events that are sourced from the Infrastructure Management component in Infrastructure Automation (note: the Infrastructure Automation instance must be deployed to a Red Hat OpenShift cluster). Infrastructure Management collects events from underlying infrastructure such as hosts and virtual machines. Ingesting such events into CP4AIOps can provide insights into conditions (e.g., a VM was powered off, a VM migration has begun, a physical host has been disconnected) in the infrastructure that supports your applications and other workloads, which in turn can be very useful for incident and root cause analysis across your IT environments.

Leverage fields in the alert.details element when defining policies

When specifying conditions that define which alerts an automation policy should look for and match, you can now, in v4.3, take advantage of the new granular support for the fields contained in the alert.details element. The alert.details element (see Alert schema for further details) is generally formatted as an object of name-value pairs, where both names and values are scalar strings, for example:

{"additionalProp1":"prop1Value", "additionalProp2":"prop2Value", "additionalProp3":"prop3Value"}

With this new capability, you can define a condition that looks for a given name-value pair, such as "datacenter":"New York".
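Conceptually, such a condition is a lookup in a string-to-string map. The sketch below assumes alert.details has already been parsed into a `Map<String, String>`; the class and method names are hypothetical and do not reflect how CP4AIOps evaluates policy conditions internally.

```java
import java.util.Map;

// Hypothetical illustration of matching one name-value pair in alert.details.
final class AlertDetailsCondition {
    private AlertDetailsCondition() {}

    // True when alert.details contains the given name with exactly the given value
    // (exact, case-sensitive string comparison).
    static boolean matches(Map<String, String> details, String name, String expectedValue) {
        if (details == null) {
            return false; // alerts without a details element never match
        }
        return expectedValue.equals(details.get(name));
    }
}
```

For example, `matches(details, "datacenter", "New York")` returns true only for alerts whose details include that exact pair.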

Log anomaly - golden signals

The log anomaly - golden signals algorithm was introduced in the previous version, v4.2, as a technical preview. In v4.3, the golden signals pipeline is generally available (GA) and production-ready. Although log anomaly detection using natural language processing (NLP) and statistical baseline algorithms remains available in v4.3, we encourage customers to use the new golden signals pipeline.

Acting on feedback from our customers, this new log anomaly detection pipeline iteratively learns and adjusts its machine learning model as incoming logs are ingested. This significantly simplifies keeping the model up to date and improves the signal-to-noise ratio of log anomaly detection. In addition, classifying log data into one of CP4AIOps’ seven golden signals (latency, error, availability, exception, traffic, saturation, and information) aims to reduce alert noise, minimize false positives, and generate genuine alerts when abnormal behavior is detected. Explainability is also significantly improved: the CP4AIOps UI console provides access to computed log templates (aka log patterns), alert counts against those templates, and the raw logs.
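This post does not describe the golden signals model itself. As a loose illustration of a baseline that keeps adjusting as logs arrive, the sketch below tracks, per log template, the running mean and variance of message counts per time window using Welford's online algorithm, and flags a window whose count is more than `k` standard deviations above the baseline. The template keys, the threshold `k`, the warm-up length, and the floor of 1 on the standard deviation are assumptions; this is not the CP4AIOps algorithm.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: an incrementally updated per-template baseline (Welford's algorithm).
final class OnlineLogBaseline {
    private static final class Stats {
        long n;      // windows observed
        double mean; // running mean of counts
        double m2;   // running sum of squared deviations from the mean
    }

    private final Map<String, Stats> statsByTemplate = new HashMap<>();
    private final double k;           // anomaly threshold in standard deviations (assumed)
    private final long warmUpWindows; // windows to observe before flagging anything (assumed)

    OnlineLogBaseline(double k, long warmUpWindows) {
        this.k = k;
        this.warmUpWindows = warmUpWindows;
    }

    /**
     * Records the number of messages matching a template in one time window and returns
     * true if that count is anomalously high versus the baseline learned so far.
     * The baseline is updated after the check, so it keeps learning from every window.
     */
    boolean observe(String template, long countInWindow) {
        Stats s = statsByTemplate.computeIfAbsent(template, t -> new Stats());
        boolean anomalous = false;
        if (s.n >= warmUpWindows && s.n > 1) {
            double stdDev = Math.sqrt(s.m2 / (s.n - 1));
            // Floor the std dev at 1 so a perfectly flat baseline does not flag tiny changes.
            anomalous = countInWindow > s.mean + k * Math.max(stdDev, 1.0);
        }
        // Welford update of the running mean and variance.
        s.n++;
        double delta = countInWindow - s.mean;
        s.mean += delta / s.n;
        s.m2 += delta * (countInWindow - s.mean);
        return anomalous;
    }
}
```

Because the statistics are updated one window at a time, no retraining step is needed to keep the baseline current, which is the property the paragraph above highlights.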

IT operations teams can then use the anomalies identified in system and application logs to facilitate root cause analysis and incident resolution.

Custom Alert View

The Alert Viewer in CP4AIOps brings together alert details, alert severity, and alert filtering on a single, intuitive page. In addition, CP4AIOps v4.3 provides the ability to create and manage custom, detailed alert views to suit an IT organization’s business requirements. For example, a Network Operations Center (NOC) team can simplify its tasks by defining custom views in CP4AIOps tailored to its specific needs. Such views render incoming network alerts in a predefined, agreed-upon format, ensuring that all NOC operators work from the same tailored views.

This new capability in v4.3 makes it possible to rearrange the alert view in ways that were not possible in previous releases of CP4AIOps. For example, as part of the definition of a custom view, you can select or specify:

  • The columns (i.e., alert attributes) to render and the order of these columns in the view
  • Sorting prioritization for columns
  • The data alignment in the cells
  • A custom name for column headers
  • The columns that should be clickable, enabling navigation to other pages
  • The width for columns
  • The CP4AIOps users who can edit and use the view
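Sorting prioritization for columns amounts to a multi-key sort: rows are ordered by the first-priority column, and ties are broken by the next one. The sketch below shows that idea with a `Comparator` chain over a hypothetical `AlertRow` record (severity first, then most recent occurrence). It is not how the Alert Viewer is implemented, and it assumes that a larger severity number means a more severe alert.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical alert row with a few columns a custom view might display.
record AlertRow(String summary, int severity, Instant lastOccurrence) {}

final class CustomViewSort {
    private CustomViewSort() {}

    // Sort priority 1: severity, highest first (assumes larger number = more severe).
    // Sort priority 2: most recent occurrence first, used only to break severity ties.
    static final Comparator<AlertRow> VIEW_ORDER =
            Comparator.comparingInt(AlertRow::severity).reversed()
                    .thenComparing(AlertRow::lastOccurrence, Comparator.reverseOrder());

    static List<AlertRow> sortForView(List<AlertRow> rows) {
        List<AlertRow> sorted = new ArrayList<>(rows); // leave the caller's list untouched
        sorted.sort(VIEW_ORDER);
        return sorted;
    }
}
```

Adding another sort priority is one more `thenComparing` call, which mirrors how a view definition lists columns in priority order.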

Topology — Geographic information system data

CP4AIOps v4.3 features an exciting new geographic information system (GIS) capability for modeling the geographical location of managed resources, both physical (e.g., buildings, data centers, fans, servers) and logical (e.g., virtual machines, processes, containers), along with events (e.g., floods, fires, storms). This new GIS functionality enables location-aware management of applications, services, and infrastructure, making operations management more effective.

Along with the attributes associated with topology resources in previous releases (such as name, tags, and entityTypes), v4.3 adds a new attribute, geolocation, which captures the location of the resource.

Here are some examples of how organizations can leverage this new GIS capability:

  1. Enhance customer service by letting users know of problems in a specific geographical region.
  2. Determine whether bad weather (e.g., floods, storms), fires, or social unrest could impact infrastructure and services.
  3. Optimize truck rolls by identifying suitable locations for technicians to install or replace equipment.
  4. Leverage location data while assessing an organization’s resilience capabilities for infrastructure and services.
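The second example in that list boils down to a proximity query: which resources lie within some radius of an event's location. This post does not describe the format of the geolocation attribute, so the sketch below assumes latitude and longitude in decimal degrees and uses the haversine great-circle formula with a mean Earth radius of 6371 km. The `GeoResource` record and the resource names in the usage note are hypothetical, and this is an illustrative computation rather than a CP4AIOps API.

```java
import java.util.List;

// Hypothetical resource with an assumed lat/long geolocation in decimal degrees.
record GeoResource(String name, double latitude, double longitude) {}

final class GeoProximity {
    private GeoProximity() {}

    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance between two points, using the haversine formula.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Resources within radiusKm of an event (e.g., a storm centered at eventLat/eventLon).
    static List<GeoResource> withinRadius(List<GeoResource> resources,
                                          double eventLat, double eventLon, double radiusKm) {
        return resources.stream()
                .filter(r -> distanceKm(eventLat, eventLon, r.latitude(), r.longitude()) <= radiusKm)
                .toList();
    }
}
```

For instance, with hypothetical resources in New York, Newark, and London, a 50 km radius around a storm centered on New York returns the New York and Newark resources only.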

Wrap-up

Our Solution Engineering team at IBM Technology Expert Labs supports customers in their adoption of CP4AIOps (and of our other products in the IT Automation portfolio). We help customers define the deployment architecture for CP4AIOps and identify integration points with their existing IT monitoring and observability tools. CP4AIOps enables organizations to minimize disruptions and outages in their IT environments and to resolve IT problems quickly when they occur. With the advanced AI and analytics capabilities in CP4AIOps, site reliability engineers (SREs) and IT operations team members can save significant time in detecting and remediating incidents. For information on additional new capabilities and features (including tech previews) in this latest release of CP4AIOps, please see the release notes. Finally, for specific information about the included vulnerability fixes, see Support.

