
From Silos to Synergy: Transforming IT Operations with AIOps Insights

By Isabell Sippli posted Thu November 23, 2023 11:10 AM

  

This article highlights how silos in IT operations make it hard for operational staff to manage their environment and resolve incidents in a timely manner.
It shows how AIOps Insights helps enterprises manage complex, evolving IT environments by aggregating data from their existing tools. It also introduces how AIOps Insights uses AI to reduce resolution time, provides holistic visibility, and simplifies data collection, thereby enhancing productivity.


Ensuring that business-critical applications and systems run at the required availability and performance is no easy task. Most enterprises have environments that are organically grown, heterogeneous, and complex, but also rapidly evolving. What used to be on premises and primarily based on virtual machines has rapidly extended into modern, often container-based environments running on private and public clouds.

Alongside that, operating and monitoring these growing environments has led to siloed tools, as tools are often specialized to operate specific silos really well, and separate teams used to be (and still are!) in charge of the individual areas. It is not uncommon for a company to have more than 9 tools actively used by its IT Operations teams. All of these tools create events, and it is up to the teams to work out which events are useful and which are duplicates or noise.

The key challenges with that are:

  1. Incidents ignore silos. Solving critical incidents often requires knowledge from several silos, for example when networking issues lead to application problems. For incidents that impact multiple layers of your IT Operations, all affected teams are brought together to work out the issue in what is referred to as a ‘war room.’ From that point, it is a very manual and iterative process for each team to look into their tools and dashboards to figure out what happened.
  2. IT Operations need holistic answers to questions across their environment and tools. They need enterprise wide control.

So how do you face those challenges, especially if you cannot replace your existing tools? (1)

How does AIOps Insights address operational challenges?


AIOps Insights gives you the ability to easily and quickly ingest data from your existing tools. It collects 3 key data types: events, metrics, and managed entities together with their relationships.
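To make the 3 data types more concrete, here is a minimal Python sketch of how events, metrics, and managed entities with their relationships could be modelled. The class names, fields, and the `dependencies` helper are hypothetical illustrations, not the AIOps Insights data model. The point is only that once the dependencies of an application are known, an event on a network switch can be placed in the context of that application.

```python
from collections import deque
from dataclasses import dataclass, field


# Hypothetical, simplified model of the 3 data types (not the product schema).
@dataclass
class ManagedEntity:
    entity_id: str                      # e.g. "checkout-app", "vm-1"
    kind: str                           # e.g. "application", "vm", "switch"
    depends_on: list[str] = field(default_factory=list)  # relationships to other entities


@dataclass
class Event:
    entity_id: str                      # entity the event is about
    summary: str


@dataclass
class Metric:
    entity_id: str                      # entity the metric belongs to
    name: str
    value: float


def dependencies(entities: dict[str, ManagedEntity], start: str) -> set[str]:
    """IDs of all entities that `start` depends on, directly or transitively (breadth-first)."""
    seen: set[str] = set()
    queue = deque(entities[start].depends_on)
    while queue:
        entity_id = queue.popleft()
        if entity_id in seen:
            continue
        seen.add(entity_id)
        queue.extend(entities[entity_id].depends_on)
    seen.discard(start)  # ignore cycles that lead back to the start entity
    return seen


# Usage: an application problem whose cause may sit in the network layer.
entities = {
    "checkout-app": ManagedEntity("checkout-app", "application", ["vm-1"]),
    "vm-1": ManagedEntity("vm-1", "vm", ["switch-7"]),
    "switch-7": ManagedEntity("switch-7", "switch"),
}
events = [
    Event("checkout-app", "Response time above threshold"),
    Event("switch-7", "Port flapping"),
]
metrics = [Metric("vm-1", "cpu_percent", 42.0)]

scope = dependencies(entities, "checkout-app") | {"checkout-app"}
related_events = [e.summary for e in events if e.entity_id in scope]
related_metrics = [m.name for m in metrics if m.entity_id in scope]
```

Following the relationships is what lets the switch event and the VM metric show up alongside the application event, even though each could come from a different tool.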

That allows you to build holistic context across your environment, which in turn allows you to address the challenges above:

  1. Incidents and events: you can resolve your IT incidents across tools from a single place of control, and aggregate siloed data into a single incident.

  2. Actionable insights about your managed environment: the topology of your managed services and their dependencies gives you answers to your questions across tools.

The value of AIOps Insights manifests in 3 key areas:

  1. Resolution time reduction through incident context and AI-based remediation

  2. Full IT visibility on a single pane of glass, across previously siloed tools

  3. Productivity gains in hours through simplified data collection and configuration

This blog will walk you through these 3 key areas.

Alongside these areas, we will highlight how AI and automation make it really easy for you to take advantage of the insights provided by AIOps Insights :)

Resolution time reduction through incident context and AI-based remediation


In IT Ops, events provide information about your managed environment. An event could come from a third-party tool, or it could originate from a threshold breach for a key metric.
AIOps Insights correlates individual events about your environment into incidents.


The incident in the screenshot below comprises several individual events. AIOps Insights has intelligently grouped them into the incident, based on AI and topological relationships. This makes it much easier and faster for operational staff to spot what is going on and identify the root cause.
It also reduces noise, as there is one incident rather than 3 individual events, and there might be many more than just those 3.

Finally, it relieves operational staff of writing correlation rules, which saves both time and personnel effort.
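The post describes the grouping only as based on AI and topological relationships. As a rough illustration of the topological part alone, the following Python sketch uses a simple hypothetical rule instead of the product's algorithm: two events are linked when they concern the same or directly connected entities and occur within a time window, and linked events are then merged transitively with a union-find structure. The `correlate` function, its parameters, and the 300-second default window are assumptions.

```python
from collections import defaultdict


def correlate(events, edges, window_s=300):
    """Group events into incidents (hypothetical topology + time-window rule).

    events:  list of (event_id, entity_id, timestamp_in_seconds)
    edges:   iterable of (entity_a, entity_b) topology relationships
    Two events are linked when they concern the same entity or directly
    connected entities and occur within window_s seconds of each other.
    Linked events are merged transitively into one incident.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    parent = list(range(len(events)))  # union-find: each event starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (_, ent_i, t_i) in enumerate(events):
        for j in range(i + 1, len(events)):
            _, ent_j, t_j = events[j]
            related = ent_i == ent_j or ent_j in neighbours[ent_i]
            if related and abs(t_i - t_j) <= window_s:
                parent[find(i)] = find(j)  # merge the two groups

    incidents = defaultdict(list)
    for i, (event_id, _, _) in enumerate(events):
        incidents[find(i)].append(event_id)
    return list(incidents.values())


# Usage: two pods pull images from the same container registry.
events = [
    ("e1", "registry", 0),
    ("e2", "pod-a", 30),
    ("e3", "pod-b", 45),
    ("e4", "db", 5000),
]
edges = [("pod-a", "registry"), ("pod-b", "registry")]
incidents = correlate(events, edges)
```

Here the registry event and the two pod events end up in one incident, while the unrelated database event stays on its own. The pairwise loop is quadratic, which is fine for a sketch but not for large event volumes.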

The incident above appears to be a problem with a container registry, and one option is to respin the registry to see whether the error clears. Luckily, AIOps Insights provides in-context intelligent automation, so you can attempt troubleshooting or remediation directly from the incident.

Not only are there predefined actions, but AIOps Insights also provides an AI-based confidence rating: it rates actions and their applicability based on the text similarity between the action and the event. The action framework in AIOps Insights allows for flexible automation choices that accelerate the creation of automation content.

Those choices include:

  • Documentation links, referring to a URL such as a manual

  • Scripts, to run an automation script

  • HTTP calls, to invoke webhooks
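The post does not say which text-similarity measure the confidence rating uses. As a stand-in, the following Python sketch ranks the 3 kinds of action by bag-of-words cosine similarity between the event text and each action's description. The `Action` fields, the `rank_actions` function, and the example targets are hypothetical placeholders, and nothing is actually executed.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    kind: str          # "documentation" (URL), "script", or "http" (webhook)
    target: str        # URL, script path, or webhook endpoint (placeholders)
    description: str   # free text that is compared against the event


def tokens(text: str) -> Counter:
    """Lower-cased word counts (bag of words)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors, 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_actions(event_text: str, actions: list[Action]) -> list[tuple[float, Action]]:
    """Return (confidence, action) pairs, most applicable action first."""
    event_vec = tokens(event_text)
    scored = [(round(cosine(event_vec, tokens(a.description)), 2), a) for a in actions]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Usage with hypothetical actions for the container registry incident.
actions = [
    Action("Page database on-call", "http", "https://example.com/hooks/db-oncall",
           "notify database team about slow queries"),
    Action("Registry runbook", "documentation", "https://example.com/runbooks/registry",
           "troubleshooting guide for container registry errors"),
    Action("Restart container registry", "script", "/opt/runbooks/restart_registry.sh",
           "restart the container registry when image pulls fail"),
]
ranked = rank_actions("Image pull failed: container registry unavailable", actions)
```

With this event text, the restart script ranks first, the runbook link second, and the unrelated webhook last with a confidence of 0.0. A production system would likely use a richer similarity model than word counts.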

Full IT visibility on a single pane of glass, across previously siloed tools

Managing an IT environment holistically takes more than just incidents: you need a deep understanding of your managed environment, including key metrics.

AIOps Insights provides a health dashboard across all of your managed entities.
It shows a normalized view of your entire environment, across infrastructure, network, and APM data.

It also shows where you need to investigate issues, based on the colour coding of the boxes.

Each entity has key metrics associated with it, and AIOps Insights pulls them directly from the underlying tools without any configuration.
The data is normalized to the same set of metrics, which makes it very easy to understand and compare visually.
This makes AIOps Insights really special: its understanding goes beyond plain events and deep into the managed entities.

[Screenshot: Insights into managed entities]
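The post does not describe how the normalization or the colour coding works internally. The following Python sketch shows one plausible approach: map tool-specific metrics onto a shared set of names and units, then colour an entity by its worst metric. The tool names, metric names, and thresholds are all hypothetical.

```python
# Hypothetical mapping from (tool, tool-specific metric name) to a normalized
# metric name plus a conversion into percent. Tool and metric names are made up.
NORMALIZATION = {
    ("tool_a", "cpu_ratio"): ("cpu_percent", lambda v: v * 100.0),
    ("tool_b", "cpuPercent"): ("cpu_percent", lambda v: v),
    ("tool_a", "mem_used_ratio"): ("memory_percent", lambda v: v * 100.0),
    ("tool_b", "memoryPct"): ("memory_percent", lambda v: v),
}

# Hypothetical (warning, critical) thresholds in percent.
THRESHOLDS = {"cpu_percent": (70.0, 90.0), "memory_percent": (75.0, 90.0)}

SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def normalize(tool: str, raw: dict[str, float]) -> dict[str, float]:
    """Translate one tool's raw metrics into the shared metric names and units."""
    normalized = {}
    for metric, value in raw.items():
        if (tool, metric) not in NORMALIZATION:
            continue  # metrics without a mapping are ignored in this sketch
        name, convert = NORMALIZATION[(tool, metric)]
        normalized[name] = convert(value)
    return normalized


def health_colour(metrics: dict[str, float]) -> str:
    """Colour an entity by its worst metric: green, yellow (warning) or red (critical)."""
    worst = "green"
    for name, value in metrics.items():
        warning, critical = THRESHOLDS[name]
        if value >= critical:
            colour = "red"
        elif value >= warning:
            colour = "yellow"
        else:
            colour = "green"
        if SEVERITY[colour] > SEVERITY[worst]:
            worst = colour
    return worst


# Usage: the same kind of VM reported by two different tools.
vm_from_a = normalize("tool_a", {"cpu_ratio": 0.5, "mem_used_ratio": 0.25})
vm_from_b = normalize("tool_b", {"cpuPercent": 92.0, "memoryPct": 80.0})
```

After normalization, both VMs expose `cpu_percent` and `memory_percent` in the same unit, so they can be compared side by side; the first VM is green and the second is red because its CPU is above the critical threshold.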

Productivity gains in hours through simplified data collection

AIOps Insights connects to common APM tools, hyperscalers, and infrastructure and network monitors.

It collects all 3 data types through a single connection to the third-party tool. The screenshot below shows how you connect to a Datadog instance.

Once this is configured, you install a single integration agent into your environment (you only need one per third-party tool). This is really easy: there is no need to configure anything beyond the properties listed below.


That integration then connects to Datadog (or any of the other solutions listed below) and starts ingesting events, metrics and managed entities.
It also immediately links those data types with each other, for example the metrics that are associated with a VM, as outlined in the previous section.
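As a rough illustration of "one connection, 3 data types, linked immediately", here is a Python sketch that takes a single payload and attaches each event and metric to the managed entity it describes through a shared entity ID. The payload layout and the `ingest` function are hypothetical; they do not reflect the Datadog API or the AIOps Insights integration agent.

```python
def ingest(payload: dict) -> tuple[dict, list]:
    """Split one payload from a single integration into entities, events and
    metrics, and link each event and metric to the entity it describes.

    The payload layout used here is hypothetical; real tools expose their own APIs.
    """
    entities = {
        e["id"]: {"kind": e["kind"], "events": [], "metrics": {}}
        for e in payload.get("entities", [])
    }
    unlinked = []  # records that reference an entity that is not (yet) known
    for event in payload.get("events", []):
        target = entities.get(event["entity"])
        if target is None:
            unlinked.append(event)
        else:
            target["events"].append(event["summary"])
    for metric in payload.get("metrics", []):
        target = entities.get(metric["entity"])
        if target is None:
            unlinked.append(metric)
        else:
            target["metrics"][metric["name"]] = metric["value"]
    return entities, unlinked


# Usage: one VM with an event and a metric, plus an event for an unknown VM.
payload = {
    "entities": [{"id": "vm-1", "kind": "vm"}],
    "events": [
        {"entity": "vm-1", "summary": "High CPU utilization"},
        {"entity": "vm-9", "summary": "Disk almost full"},
    ],
    "metrics": [{"entity": "vm-1", "name": "cpu_percent", "value": 93.5}],
}
entities, unlinked = ingest(payload)
```

Keeping records for unknown entities aside, rather than dropping them, is a design choice of this sketch: they could be linked later once the tool reports the missing entity.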

Summary 

AIOps Insights allows you to reduce resolution time through rich context, AI-based remediation, and working on incidents from a single pane of glass.

It provides you full IT visibility, across previously siloed tools, by normalizing and stitching operational data together.

You will have productivity gains in hours through its simplified data collection and configuration.

In short: rather than worrying about your silos, leverage the synergies across them and accelerate bringing value to your business.

(1) Reasons for not replacing existing tools:

  • Stickiness due to a broad rollout of the tool, so that replacement would be risky or costly
  • Organisational boundaries, i.e., central IT Ops teams cannot control what business teams use.
