Cloud Pak for AIOps 4 tips: migration from Netcool

By Zane Bray posted 10 days ago

  

This blog outlines a deployment architecture, migration strategy, and key considerations needed for deploying and integrating IBM Cloud Pak for AIOps (AIOps) with an existing IBM Netcool (Netcool) environment.

AIOps will typically be deployed alongside an existing Netcool deployment, with the two integrated. In this arrangement, Netcool continues to provide the majority of the functionality that it does today, and AIOps provides a wide array of new capabilities. AIOps will, however, replace some of the functions provided by Netcool, such as correlation, user tooling, and some automation.

During the course of the deployment, certain Netcool components can be decommissioned, freeing up hardware that can be used for something else, perhaps even repurposed as compute resource for AIOps. Figure 1 below depicts a typical AIOps/Netcool deployment architecture:

Figure 1: Deployment architecture of IBM Cloud Pak for AIOps integrating into an IBM Netcool deployment

POINTS OF NOTE:

  • An AIOps Netcool Connector instance connects to each pair of Aggregation Netcool/OMNIbus ObjectServers;
  • An AIOps Netcool/Impact Connector instance connects to each Netcool/Impact cluster;
  • An AIOps ITNM Observer job is created for each ITNM NCIM database present, to pull topology data;
  • Most, if not all, existing Netcool/Impact third-party integrations can remain unchanged;
  • Other data integrations can be implemented directly into AIOps via the various Connectors and integrations;
  • Event archiving can continue to happen from Netcool/OMNIbus, or be moved to AIOps, or be a combination.

The following sections outline a general approach to integration between AIOps and Netcool, and the eventual migration of users from the Netcool UI to AIOps. Note that the activities outlined in these sections are not exhaustive, and there are likely to be many more tasks and considerations along the way.

NOTE: The concepts outlined in this blog are discussed in more detail, along with many others, in the IBM Cloud Pak for AIOps 4 Best Practices guide.

Review Netcool custom fields and map only the essential ones

The primary source of event data in a Netcool to AIOps migration will be Netcool/OMNIbus, and it is best integrated via the Netcool Connector provided by AIOps. As part of this integration, it is likely that a selection of custom Netcool/OMNIbus fields will be needed in the AIOps layer. Review your custom fields in Netcool/OMNIbus and replicate to AIOps only those that are actually needed: the ones users need to see, plus those required by automation or tooling. Each additional field brought up to AIOps carries additional compute and storage cost, so it is prudent to replicate only the needed ones rather than bringing everything up. This is consistent with the long-standing Netcool best practice of configuring components to be as efficient as possible, to maximise performance.

The following example Connector mapping shows the correct place to list any custom data items: as sub-attributes of the details attribute. In this example, two fields are mapped, appId and region, which take their values from the Netcool event fields @AppID and @Region respectively.

Example AIOps Netcool Connector mapping:

(
  $isIPAddr := function($i){ $contains($i,/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)};
  { 
    "summary": alert.@Summary,
    "deduplicationKey": alert.@Identifier,
    "sender": {
        "service": alert.@Agent,
        "name": alert.@Manager
    },
    "resource": {
        "name": alert.@Node = "" ? alert.@NodeAlias = "" ? undefined : alert.@NodeAlias : alert.@Node,
        "location": alert.@Location = "" ? undefined : alert.@Location,
        "ipAddress": $isIPAddr(alert.@NodeAlias) ? alert.@NodeAlias : $isIPAddr(alert.@Node) ? alert.@Node : undefined,
        "hostname": $not($isIPAddr(alert.@Node)) ? alert.@Node : undefined,
        "sourceId": alert.@NodeAlias = "" ? undefined : alert.@NodeAlias,
        "service": alert.@Service = "" ? undefined : alert.@Service,
        "port": alert.@PhysicalPort = 0 ? undefined : alert.@PhysicalPort,
        "physicalslot": alert.@PhysicalSlot = 0 ? undefined : alert.@PhysicalSlot,
        "physicalcard": alert.@PhysicalCard = "" ? undefined : alert.@PhysicalCard,
        "scopeId": alert.@ScopeID = "" ? undefined : alert.@ScopeID,
        "application": alert.@AIOpsGroup = "" ? undefined : alert.@AIOpsGroup
    },
    "type": {
        "eventType": alert.@Type = 2 ? "resolution" : alert.@Type = 4 ? "resolution" : "problem",
        "classification": alert.@EventId = "" ? alert.@AlertGroup: alert.@EventId
    },
    "eventCount": alert.@Tally,
    "signature": alert.@Identifier,
    "firstOccurrenceTime": alert.@FirstOccurrence,
    "lastOccurrenceTime": alert.@LastOccurrence,
    "severity": alert.@Severity <=0 ? undefined : alert.@Severity = 1 ? 1 : alert.@Severity < 6 ? alert.@Severity + 1 : alert.@Severity >= 6 ? 6,
    "state": alert.@Severity = 0 ? "clear" : "open",
    "acknowledged": alert.@Acknowledged = 1 ? true : false,
    "expirySeconds": alert.@ExpireTime,
    "details": {
        "appId": alert.@AppID = "" ? undefined : alert.@AppID,
      "region": alert.@Region = "" ? undefined : alert.@Region
    },
    "insights": [
      {
        "details": {
          "lastProcessedEventOccurrenceTime": alert.@LastOccurrence,
          "alertOrigin":"OMNIbus"
        },
        "id": "event-occurrence",
        "type": "aiops.ibm.com/insight-type/deduplication-details"
      }
    ]
  }
)
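
The severity, state, event type, and IP address expressions in the mapping above are the easiest parts to misread, so the following plain JavaScript sketch restates them as small functions that can be run outside AIOps. It is illustrative only: the function names are hypothetical, the sample event is invented, and the IP check assumes the dots in the pattern are meant to match literal dots.

```javascript
// Sketch only: restates selected expressions from the Connector mapping above.
// The function names are hypothetical and are not part of any IBM API.

// Dotted-quad check; the dots are escaped so they match a literal "." only.
function isIPAddr(value) {
  return /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/.test(String(value));
}

// Netcool severity (0-5) to AIOps severity (1-6), following the "severity" line.
function mapSeverity(netcoolSeverity) {
  if (netcoolSeverity <= 0) return undefined;          // severity 0 (clear) is left unset
  if (netcoolSeverity === 1) return 1;                 // 1 stays at 1
  if (netcoolSeverity < 6) return netcoolSeverity + 1; // 2..5 shift up by one
  return 6;                                            // anything higher is capped at 6
}

// Follows the "state" line: only severity 0 is treated as clear.
function mapState(netcoolSeverity) {
  return netcoolSeverity === 0 ? "clear" : "open";
}

// Follows the "eventType" line: Type 2 and Type 4 are resolutions.
function mapEventType(netcoolType) {
  return netcoolType === 2 || netcoolType === 4 ? "resolution" : "problem";
}

// Invented sample event for illustration.
const sample = { Node: "10.1.2.3", Severity: 5, Type: 1 };
console.log(
  isIPAddr(sample.Node),
  mapSeverity(sample.Severity),
  mapState(sample.Severity),
  mapEventType(sample.Type)
); // prints: true 6 open problem
```

Note that under this mapping no Netcool severity lands on AIOps severity 2, since Netcool severity 1 stays at 1 and severities 2 to 5 shift up by one.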

Identify other integration sources

Netcool/OMNIbus and its extensive Probe catalogue will typically contribute the lion's share of the event estate into AIOps. Similarly, Netcool/Impact typically provides integrations into third-party systems, such as databases for enrichment, or ticketing systems for raising incidents. There are a large number of new event integration options available in AIOps including integrations for other types of data. The full listing of AIOps integrations can be found here.

At a high level, AIOps provides the following additional types of integrations:

  • Netcool/Impact (use the AIOps Netcool/Impact Connector);
  • Topology data;
  • Metric data;
  • Log data;
  • Ticket data and other ticketing functions (GitHub or ServiceNow);
  • ChatOps functions (MS Teams or Slack);
  • E-mail integration;
  • Automation (Ansible or SSH).

Identify the new sources of data or integrations that are needed in the AIOps deployment and plan accordingly.

Assess and plan event housekeeping regime and storm protection

Event housekeeping and event storm protection are related best practice concepts that provide important load protections to Netcool deployments, and they are typically implemented in all production environments. As such, these automations can remain in place and continue to provide the protections they do today. More information on Netcool/OMNIbus best practices, including event housekeeping and event storm protection options, can be found in the IBM Netcool/OMNIbus 8.1 Best Practices guide.

EVENT HOUSEKEEPING

Clearing out old events is important to ensure events in the system are relevant and up-to-date. It also helps to keep the system streamlined and performant. For these reasons, it is an essential best practice to ensure a comprehensive event housekeeping automation is in-place to keep event numbers optimised.

The Netcool Connector mapping example above maps the @ExpireTime field from Netcool/OMNIbus to the expirySeconds attribute in AIOps. There is an internal AIOps automation that will clear events in AIOps when an event last occurred more than expirySeconds ago. Note that this is equivalent to the expire trigger in Netcool/OMNIbus. Typically the @ExpireTime field will be set in Netcool/OMNIbus either in the Probes or via some event housekeeping automation. With this in-place, events coming from Netcool/OMNIbus will be automatically house-kept.
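
The expiry rule just described can be sketched as a single comparison. This is a minimal illustration rather than the AIOps implementation: the function name is hypothetical, times are assumed to be epoch seconds, and an expirySeconds of 0 or undefined is assumed to mean the event never expires.

```javascript
// Sketch of the expiry rule described above; not the actual AIOps automation.
// Assumptions: times are epoch seconds; 0 or undefined expirySeconds means "never expire".
function isExpired(lastOccurrenceTime, expirySeconds, nowSeconds) {
  if (!expirySeconds || expirySeconds <= 0) return false;
  // The event is cleared once "now" passes lastOccurrenceTime + expirySeconds.
  return nowSeconds > lastOccurrenceTime + expirySeconds;
}

const last = 1700000000; // invented timestamp for illustration
console.log(isExpired(last, 86400, last + 3600));  // false: one hour into a one-day window
console.log(isExpired(last, 86400, last + 90000)); // true: past the one-day window
console.log(isExpired(last, 0, last + 90000));     // false: no expiry set
```

Because the check is anchored on the last occurrence time, an event that keeps recurring keeps pushing its own expiry forward.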

For events that don't originate in Netcool/OMNIbus, for example events that flow into AIOps via an AIOps Connector, consideration should be given to implementing an equivalent event housekeeping automation. It is recommended to use Netcool/Impact for this purpose. A suggested implementation would be:

  • Create a Netcool/Impact policy that sets expirySeconds according to an agreed event expiration policy;
  • Create an AIOps policy that sends all new events with expirySeconds = 0 to the Netcool/Impact housekeeping policy;
  • The internal AIOps automation will automatically clear each event where the current time passes (lastOccurrenceTime + expirySeconds).

The ultimate goal in this exercise is to ensure every event in AIOps has expirySeconds set to a non-zero value.

An example Netcool/Impact housekeeping policy is provided below:

Example AIOps Netcool/Impact event housekeeping policy (JavaScript):

// ONLY SET expirySeconds FOR EVENTS WHERE IT IS NOT SET
if (String(EventContainer.alert.expirySeconds) == "undefined") {
    // KEEP CRITICAL ALERTS FOR SEVEN DAYS
    if (Int(EventContainer.alert.severity) == 6) {
        aiopsUtils.patchAlertNoWait(EventContainer.alert.id, {expirySeconds: 604800});
    }
    // KEEP MAJOR ALERTS FOR FIVE DAYS
    else if (Int(EventContainer.alert.severity) == 5) {
        aiopsUtils.patchAlertNoWait(EventContainer.alert.id, {expirySeconds: 432000});
    }
    // KEEP MINOR ALERTS FOR THREE DAYS
    else if (Int(EventContainer.alert.severity) == 4) {
        aiopsUtils.patchAlertNoWait(EventContainer.alert.id, {expirySeconds: 259200});
    }
    // KEEP WARNING ALERTS FOR ONE DAY
    else if (Int(EventContainer.alert.severity) == 3) {
        aiopsUtils.patchAlertNoWait(EventContainer.alert.id, {expirySeconds: 86400});
    }
    // KEEP INFORMATIONAL EVENTS FOR TWELVE HOURS
    else if (Int(EventContainer.alert.severity) == 2) {
        aiopsUtils.patchAlertNoWait(EventContainer.alert.id, {expirySeconds: 43200});
    }
    // KEEP INDETERMINATE EVENTS FOR SIX HOURS
    else if (Int(EventContainer.alert.severity) == 1) {
        aiopsUtils.patchAlertNoWait(EventContainer.alert.id, {expirySeconds: 21600});
    }
}
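
The retention periods in the policy above can also be held in one lookup table, which makes the agreed expiration policy easier to review and change. The following is a sketch in plain JavaScript intended to be run outside Netcool/Impact; RETENTION_SECONDS and expiryForSeverity are hypothetical names, not Impact or AIOps functions.

```javascript
// Sketch: the retention periods from the policy above, expressed as a lookup table.
// RETENTION_SECONDS and expiryForSeverity are hypothetical names for illustration.
const DAY = 86400; // seconds in one day

const RETENTION_SECONDS = {
  6: 7 * DAY, // critical: seven days
  5: 5 * DAY, // major: five days
  4: 3 * DAY, // minor: three days
  3: 1 * DAY, // warning: one day
  2: DAY / 2, // informational: twelve hours
  1: DAY / 4  // indeterminate: six hours
};

// Returns the retention in seconds, or undefined for a severity not in the table.
function expiryForSeverity(severity) {
  return RETENTION_SECONDS[severity];
}

console.log(expiryForSeverity(6)); // 604800
console.log(expiryForSeverity(1)); // 21600
console.log(expiryForSeverity(0)); // undefined
```

Inside an Impact policy, the looked-up value would then be passed to a single aiopsUtils.patchAlertNoWait call of the form shown in the policy above, in place of the chain of else-if branches.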

EVENT STORM PROTECTION

Just as for event housekeeping, production Netcool/OMNIbus deployments will typically employ some kind of event storm protection. This may be through Probe rules, ObjectServer automation, architecture design (i.e. inclusion of a Collection layer), or a combination of these. It is recommended to keep these existing processes in place in any AIOps deployment scenario, since they provide essential, effective protection against event storms.

Assess and plan user tooling, views, and filters

As part of the addition of AIOps to your deployment, the users will eventually migrate over from Netcool/WebGUI to the new AIOps UI. As such, all Event Viewer tooling will have to be reimplemented within AIOps, along with Views and Filters. What's new in AIOps is the ability to also define right-click tooling for topology, which provides further user options from the UI. Topology tools can perform similar tasks to alert tools, and can similarly have conditions applied to them, so that they're only available to certain types of resources. Topology tools should be created where it makes sense to do so.

Assess existing event correlation - disable in Netcool, reimplement in AIOps

Netcool/OMNIbus typically implements correlation relationships by linking an event to another event in a parent-child relationship. This involves either the creation and management of a synthetic parent event, or simply linking a set of events to another event, which then becomes a de facto parent. AIOps dispenses with the need to create additional events and renders a virtual event in the Alerts viewer rather than using a real one. The necessary metadata that links events together is encoded within the events themselves. This means that the event store is only holding actual events, which makes it more efficient.

This change in how events are grouped has implications, however, for a deployment where AIOps is added and users move to the AIOps UI. Since event grouping is implemented differently in AIOps, grouping has to be reimplemented in the AIOps layer. The following points summarise the main tasks:

  • The scope-based grouping trigger group should be disabled in Netcool/OMNIbus.
  • Any functions in Netcool that set the @ScopeID field can remain in-place. The @ScopeID field can then be mapped to the resource.scopeId attribute in AIOps, and scope-based grouping will automatically be applied by AIOps via an internal automation. Note that the example mapping above shows an example of this mapping in the Netcool Connector.
  • Any scope-based grouping policies that are required for non-Netcool event sources can be implemented as AIOps policies. Note that events can be members of multiple scope-based groups in AIOps. This is not the case in Netcool, where an event can only be part of a single scope-based group.
  • Any Netcool Operations Insight temporal event correlation functions should be disabled and the Temporal grouping AI automation configured, trained, and enabled instead. This will provide a comparable function.
  • After topology data is ingested into AIOps and merged, consideration should be given to what topology grouping opportunities exist. Not only is this useful for visualisation, it also provides additional event correlation opportunities. See the topology documentation or the AIOps Best Practices guide for more guidance on creating topology templates and groups.
  • Any other correlations, including custom correlations implemented in Netcool/OMNIbus or Netcool/Impact, or those generated by ITNM's Root Cause Analysis (RCA) engine, should use a combination of local custom fields and custom automation, then leverage the scope-based grouping mechanism at the AIOps layer to effect the grouping. For example, a custom field @RCACorrelation could be created in Netcool to contain the @ServerName + @ServerSerial of a root-cause event. A local automation could set this field value for the root-cause event itself, plus all its correlated child events. This field could then be mapped to AIOps via the Netcool Connector. Finally, an AIOps scope-based grouping policy could be created to group events by this attribute. The result would be a grouping formed with the root-cause event plus its symptomatic children.
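
The @RCACorrelation example in the last point can be sketched as follows, to show how a shared key value yields a group of a root-cause event plus its children. This is plain JavaScript with invented sample data, not an AIOps policy or ObjectServer automation. The function names are hypothetical, and the ":" separator between server name and serial is an assumption made here to keep the key unambiguous.

```javascript
// Sketch: builds an RCACorrelation-style key and groups events by it.
// Function names and sample data are hypothetical; the ":" separator is an assumption.
function rcaKey(rootCause) {
  return rootCause.ServerName + ":" + rootCause.ServerSerial;
}

// Groups event summaries by the value of keyField; events with no key are left ungrouped.
function groupByKey(events, keyField) {
  const groups = new Map();
  for (const event of events) {
    const key = event[keyField];
    if (key === undefined || key === "") continue;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(event.Summary);
  }
  return groups;
}

// The root-cause event and its child both carry the root cause's key.
const root = { ServerName: "AGG_P", ServerSerial: 1001, Summary: "Link down" };
const key = rcaKey(root);
const events = [
  { ...root, RCACorrelation: key },
  { ServerName: "AGG_P", ServerSerial: 1002, Summary: "BGP peer lost", RCACorrelation: key },
  { ServerName: "AGG_P", ServerSerial: 1003, Summary: "Disk full", RCACorrelation: "" }
];

const groups = groupByKey(events, "RCACorrelation");
console.log(key);             // AGG_P:1001
console.log(groups.get(key)); // [ 'Link down', 'BGP peer lost' ]
```

The unrelated "Disk full" event carries an empty key and so stays outside the group, which mirrors the intent that only the root cause and its symptomatic children are grouped.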

Review all customisations and decide best implementation point

All Netcool and AIOps deployments will include customisations, including integrations, automations, or other configuration. When adding AIOps to an existing Netcool environment, it is prudent to review each customisation and assess if it is better to leave it where it is in Netcool, or move it to AIOps. In most cases, it will be simpler and easier to leave it where it is. In some cases however, it might make sense to reimplement at the AIOps layer. For example, it would make sense to move a Netcool/Gateway for ServiceNow ticketing integration to an AIOps ServiceNow Connector. Not only will you get like-for-like ticket creation functionality, you will also get the added benefit of Similar ticket analysis and the collection of topology data. Additionally, the AIOps ServiceNow Connector instance can be set up within minutes via the UI.

Assess and plan event archiving (REPORTER database)

Most Netcool deployments employ a Netcool/Gateway for JDBC (or another type of Gateway) to archive events to an event archive. This data is then saved long-term and used for reporting, legal compliance, and other purposes. In most cases, this can remain as-is. For non-Netcool event sources, an AIOps policy that calls a Netcool/Impact policy can be used to write the events to the event archive. This can safely be configured to write to the same event archive as the Netcool Gateway, provided the mapping to the REPORTER schema is done correctly.

Migrate users, decommission WebGUI servers

After all the above tasks have been carried out, and the new AIOps system has been tested, verified, and validated, the users can begin to move over to the new AIOps UI. It is recommended to run AIOps in parallel with the existing Netcool/WebGUI system for a period of time, until all users have moved over and the new environment has been deemed ready, providing the level of function needed for operational use.

After all users have migrated over to AIOps, the Netcool/WebGUI servers can be shut down. It is advised to leave these servers in-situ for a time before full decommissioning, in case an emergency roll-back is required in exceptional circumstances.

Decommission Display layer Netcool/OMNIbus ObjectServers

After a successful deployment of AIOps to an existing Netcool environment, all users will now be logged into AIOps, and Netcool/WebGUI will be decommissioned. Since the Netcool Connectors connect to the Aggregation layer ObjectServers, the Display ObjectServers become redundant in the architecture. There will be some edge cases where Display ObjectServers are being used to fulfil specific purposes, but in general, they can be decommissioned.

Summary

This blog outlines some of the main considerations and steps that should be taken in the course of an AIOps deployment into an existing Netcool environment. There are many more functions provided by AIOps that aren't discussed here, such as Runbook Automation, that could be used to provide even more value, and these capabilities should also be considered for use. The IBM Cloud Pak for AIOps Best Practices guide goes into more detail around planning, deployment, and implementation of AIOps, and should be reviewed ahead of a deployment.
