Have you considered AI fusion in your operations? part 3

By Samir Nasser posted Sun August 30, 2020 01:49 PM

  

In part 2, I listed the sub-activities of the Identify activity. In this blog post, I take a deeper dive into two of them: Identify Metrics and Identify Threshold. To identify the key metrics of every resource required by the production solution, you should start by doing the following:

  1. Identify the solution architecture/topology
  2. For each transaction supported by the topology, identify the series of resources required to execute the transaction
  3. For each resource, identify the key performance indicators
  4. For each resource, identify the key log messages that indicate significant events. Typically, these messages are provided by the developer of that resource.
  5. For each key performance indicator, identify the corresponding threshold(s)

Working through the steps above will, in turn, produce the following:

  • A map between transactions and resources (transaction1 uses resource1, resource2, etc.)
  • Relationships and dependencies between resources
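As a minimal sketch of what these two outputs could look like once captured, the Python below stores them as plain dictionaries. All transaction and resource names (`checkout`, `web_server`, `orders_db`, and so on) are hypothetical and chosen only for illustration; a real topology would come from the solution architect.

```python
# Hypothetical example data; none of these names come from a real system.

# Output 1: a map between transactions and the resources they use.
transaction_resources = {
    "checkout": ["web_server", "order_service", "orders_db"],
    "search": ["web_server", "search_service"],
}

# Output 2: dependencies between resources
# (each key depends directly on the resources in its list).
resource_dependencies = {
    "web_server": ["vm_1"],
    "order_service": ["vm_2"],
    "search_service": ["vm_2"],
    "orders_db": ["storage_1"],
}


def transactions_using(resource, mapping):
    """Return the transactions that list this resource directly."""
    return sorted(t for t, resources in mapping.items() if resource in resources)


shared = transactions_using("web_server", transaction_resources)
db_only = transactions_using("orders_db", transaction_resources)
```

Keeping the two structures separate mirrors the two outputs listed above: the first answers "which resources does a transaction touch?", and the second answers "what does a resource rely on?".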

This exercise is quite tedious and requires expertise. The more resources you have in the solution topology, the more expertise is required. Let us assign an expert to each step and to each resulting output:

  1. Identify the solution architecture/topology: Solution architect
  2. For each transaction supported by the topology, identify the series of resources required to execute the transaction: Solution architect/developer
  3. For each resource, identify the key performance indicators: Performance expert
  4. For each resource, identify the key log messages that indicate significant events: Developer
  5. For each key performance indicator, identify the corresponding threshold(s): Performance expert
  6. A map between transactions and resources: Solution architect/performance expert
  7. Relationships and dependencies between resources: Performance expert

These sub-activities are very challenging because today’s solution stack is more complex than ever. There are resources at multiple layers: application, middleware, physical, virtual (hypervisor), operating system, storage, and network. Each layer requires one or more performance experts to participate in one or more sub-activities.

When the key performance indicators and log messages from steps 3 and 4 indicate a problem with a resource, they should help identify the transactions that use that resource and the other resources that depend on it. Together, these impacted transactions and resources make up the so-called “blast radius”.
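One way to make the blast radius concrete is a graph traversal: starting from the failing resource, walk the dependency relationships in reverse to collect every resource that relies on it, then collect the transactions that touch any of those resources. The sketch below assumes the dependencies and the transaction map are held as dictionaries; the function name `blast_radius` and all resource and transaction names are hypothetical.

```python
from collections import deque


def blast_radius(failed, depends_on, transaction_resources):
    """Return (impacted resources, impacted transactions) for a failed resource."""
    # Build reverse edges: resource -> resources that depend on it directly.
    dependents = {}
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(resource)

    # Breadth-first walk over everything that depends on the failed resource.
    impacted = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)

    # A transaction is impacted if it uses at least one impacted resource.
    transactions = {
        name
        for name, resources in transaction_resources.items()
        if impacted & set(resources)
    }
    return impacted, transactions


# Hypothetical topology, for illustration only.
depends_on = {
    "web_server": ["vm_1"],
    "order_service": ["vm_2"],
    "search_service": ["vm_2"],
    "orders_db": ["storage_1"],
}
transaction_resources = {
    "checkout": ["web_server", "order_service", "orders_db"],
    "search": ["web_server", "search_service"],
}

resources_hit, transactions_hit = blast_radius("storage_1", depends_on, transaction_resources)
```

In this made-up topology, a storage problem reaches only the database and the `checkout` transaction, whereas a problem on `vm_2` would reach both services and both transactions.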

How do we know that specific key performance indicator values and/or log messages indicate a problem with the corresponding resource? This is what thresholds are for: they are set so that a breach signals that a problem has occurred.
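A threshold check can be sketched in a few lines. The example below assumes an upper-bound threshold with two levels, warning and critical, where higher values are worse (as with CPU utilization). The `Threshold` class, the metric name, and the numeric limits are all hypothetical; real values would come from the performance expert for each resource.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Upper-bound thresholds for one key performance indicator (assumed shape)."""
    metric: str
    warning: float
    critical: float


def evaluate(value: float, threshold: Threshold) -> str:
    """Classify a metric value against its thresholds; higher is assumed worse."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


# Hypothetical limits, for illustration only.
cpu = Threshold(metric="cpu_utilization_percent", warning=80.0, critical=90.0)

status_normal = evaluate(42.0, cpu)
status_warning = evaluate(85.0, cpu)
status_critical = evaluate(95.0, cpu)
```

Metrics where lower values are worse (free disk space, for example) would need the comparison reversed, which is one reason step 5 allows for more than one threshold per indicator.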

Please stay tuned for future blog posts, in which we will focus on the compelling reasons to fuse AI into operations.











