AIOps Topology Manager Unleashed: Expert Tips to Maximise your Post-Installation Experience

By Kilian Collender posted Wed November 27, 2024 10:02 AM

  

So, you have AIOps installed and are looking for what to do next to exploit the benefits of the Topology Manager (TM). This guide aims to provide some steps and links to make the most of your full estate overview.

Load

The first thing you need to do is get some data into the system. In TM we mainly use Observer jobs to collect resources from specific technologies and pull them into AIOps. There is a wide range of supported technologies, but we also have the capability to input proprietary data into the system using the File or REST observers. The combination of these jobs should allow you to observe as much of your estate as possible.

Tips:
  1. Ensure the observer you need is enabled in your system; otherwise, it won’t show up as a job option in the admin page.
  2. Read the docs for the job you are creating to ensure you have all the required credentials and details needed.
  3. When using a load job, ensure you set the schedule period to match the data you are observing. Is it static or very dynamic? You don’t want to miss key changes, but you also don’t want to generate unnecessary history.
  4. All data added from a particular job will be given a provider name. This acts as a namespace for that data and is important when updating or merging resources. If you want to see where the data came from you can open “Resource details” and check under the “Data origin” tab.  
  5. If you observe the same environment using two different jobs, the data will be duplicated. So, it is important to understand the scope of each job.

Merge

TM creates a connected topology representation of your estate’s resources. Each observer job described above creates a separate sub-topology of the observed resources. To create a fully connected end-to-end representation, TM uses merge rules to identify two or more resources that are in fact observations of the same thing. Merge rules use unique identifiers in the properties of the resources to create merge tokens. These tokens are then analysed by the system and, if they are identical, the resources are merged to form a “composite” resource. These composites stitch together all the data in TM. Some merge rules are defined out-of-the-box, but others require specific knowledge of your estate to ensure effective stitching.

Tips:

  1. Ensure merge tokens are truly unique identifiers; tokens that are not unique can lead to over-merging.
  2. Merge tokens are only generated as data is observed, so you will need to rerun observer jobs after creating a merge rule.
  3. You can identify whether a resource has been merged by checking for the _compositeOfIds property, which includes the individual ids of all of the contributing resources.
  4. If you have resources with the same property names but different values, be aware that you can have conflicts if those resources are merged. TM will use a last-in-wins method when this happens.
  5. You can undo a merge by disabling the rule and rerunning the observations.
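
To make the merge behaviour above more concrete, here is a minimal conceptual sketch in Python. It is not TM's implementation or API: the resource dictionaries, the `merge_by_token` function and the choice of `serial` as the token property are all hypothetical. It only illustrates how identical tokens lead to a composite, how last-in-wins resolves conflicting property values, and how _compositeOfIds records the contributors.

```python
from collections import defaultdict


def merge_by_token(resources, token_property):
    """Fold resources that share a token value into one composite (illustrative only)."""
    groups = defaultdict(list)
    for resource in resources:
        token = resource["properties"].get(token_property)
        # Resources without the token property never merge, so key them by their own id.
        key = ("token", token) if token is not None else ("id", resource["id"])
        groups[key].append(resource)

    composites = []
    for members in groups.values():
        merged = {}
        for member in members:
            # Last-in-wins: a later observation overwrites earlier values of the same property.
            merged.update(member["properties"])
        if len(members) > 1:
            merged["_compositeOfIds"] = [member["id"] for member in members]
        composites.append(merged)
    return composites


# Hypothetical observations of the same machine from two different jobs, plus one other machine.
observed = [
    {"id": "k8s-1", "properties": {"name": "node-a", "serial": "SN123", "owner": "platform"}},
    {"id": "vm-7", "properties": {"name": "NODE-A.example", "serial": "SN123"}},
    {"id": "vm-9", "properties": {"name": "node-b", "serial": "SN999"}},
]
composites = merge_by_token(observed, "serial")
```

In this sketch the first two resources share a serial number, so they become one composite whose name comes from the later observation. If the token property were something non-unique, such as a generic hostname like "localhost", unrelated resources would collapse together, which is the over-merging risk described in tip 1.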

Enrich

Now that we have some topology resources, how can we enrich their properties to make them more useful for your business? TM has a few different ways to augment the properties of the observed resources. We will now look at the three main options.

Tags

Tag rules allow you to tag resources as they are observed into the system, and they work using a similar definition to merge rules. There are several different conditions that can be used to decide how and when to tag a resource.

Business criticality

AIOps allows you to define business criticality levels in your system. These can be based on an out-of-the-box set, or you can provide the terminology that best fits your business. The idea of the criticality is to provide a way of identifying the business importance of a resource, resource group or application/service. Imagine you have two devices in your environment, and both are showing critical alerts. However, one is used to support a customer-facing API and the other is just part of an internal demo system. We know which should be fixed first, but it might not always be obvious. So, this is where business criticality comes in. Privileged users can manually set the criticality of entities in AIOps, or rules can be used to automatically set the criticality level as data is loaded. This then provides the required context to make the right decision.
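
The two-device scenario can be sketched as a simple ordering problem. The snippet below is a conceptual illustration only, not AIOps code: the level names in `CRITICALITY_RANK`, the `prioritise` function and the alert dictionaries are assumptions made for the example.

```python
# Hypothetical criticality levels, ordered from most to least important.
CRITICALITY_RANK = {"Gold": 0, "Silver": 1, "Bronze": 2}


def prioritise(alerts, criticality_by_resource, default_level="Bronze"):
    """Order alerts so that those on the most business-critical resources come first."""
    def rank(alert):
        level = criticality_by_resource.get(alert["resource"], default_level)
        return CRITICALITY_RANK[level]

    # sorted() is stable, so alerts with equal criticality keep their original order.
    return sorted(alerts, key=rank)


# Both devices report the same severity; only the business context tells them apart.
alerts = [
    {"resource": "demo-box", "severity": "critical"},
    {"resource": "api-gateway", "severity": "critical"},
]
criticality = {"api-gateway": "Gold"}  # the demo system has no criticality set
ordered = prioritise(alerts, criticality)
```

Here the customer-facing `api-gateway` is placed ahead of `demo-box` even though both alerts have the same severity, because criticality supplies the missing business context.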

File enrichment

File enrichment rules allow you to add properties or geospatial context to resources outside of the normal observations. This can be helpful to provide additional context to resources from other data sources. 

Tips:

  1. Tags, business criticality and custom properties can all be used as filter criteria.
  2. These enrichments provide more context and will help your team better understand what they are looking at.
  3. Tags and business criticality can be used when creating policies, e.g. should the system automatically create an incident if a related resource has a high criticality?
  4. Without the geolocation property (GeoJSON feature), the GIS mapping capabilities will not show your resources on the map.
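
For readers unfamiliar with GeoJSON, the following Python sketch builds a Point feature as defined by the GeoJSON standard (RFC 7946), where coordinates are written as longitude first, then latitude. The `point_feature` helper, the example resource and the idea of storing the feature directly under a `geolocation` key are assumptions for illustration; check the file enrichment documentation for the exact encoding TM expects.

```python
import json


def point_feature(longitude, latitude, label):
    """Build a GeoJSON Point feature; GeoJSON orders coordinates as [longitude, latitude]."""
    if not (-180 <= longitude <= 180 and -90 <= latitude <= 90):
        raise ValueError("coordinates out of range")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"name": label},
    }


# Hypothetical resource enriched with a location in Dublin.
resource = {"name": "dublin-dc-router-01"}
resource["geolocation"] = point_feature(-6.2603, 53.3498, "Dublin data centre")
print(json.dumps(resource, indent=2))
```

A common mistake is to swap latitude and longitude, which can place a resource on the wrong side of the world, so it is worth validating the ranges before loading the data.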

Group

You have created your universe, and now it is time to divide and conquer. You potentially have millions of resources in your system, all providing an insight into your estate. However, within that large, complex network of connected resources we can find subgroups which together form some business context. This could represent a service, an organisation or even a geographical region. TM has multiple ways of creating these groups, also known as “Resource groups”. I would recommend the article “AIOps from the source: Topology manager's Resource Groups and Templates Demystified”, which dives into this topic. The key takeaway with resource groups is that they should provide a dynamic definition for a collection of resources which can be used as building blocks to define Application/Service definitions in AIOps.

Tips:

  1. Applications/services and resource groups are treated as their own entities and can be viewed, searched and filtered in the Resource manager.
  2. These groupings are used by AIOps’ AI algorithms to form Topological event groups.
  3. The groups form context for incidents and users when understanding the impact of alerts affecting individual resources.
  4. Dynamic group templates allow AIOps to learn the patterns used for your business context and apply them automatically across your system.

Enhance

Make your data stand out and highlight what is most important. TM allows you to customise the appearance of your resources. This could take the form of using custom iconography that is more meaningful to your business, or it might be changing the size and colour of certain resource types to emphasise their importance. This customisation isn’t just static but can be dynamically defined based on resources’ current property values. I would recommend looking at the article “AIOps Bitesize: Are your Topology manager relationships as stylish as a Ferrari Formula 1 car?”, which dives into weather map style changes to your topology.

Tips:

  1. When adding icons, make sure they are the smallest possible SVG definitions. You can optimise them easily with the free SVGOMG tool.
  2. You may need to modify the stroke or fill of the SVG definition if it doesn’t render correctly, e.g. setting stroke to “none” to remove unwanted black outlines.
  3. When using dynamic styling, ensure any referenced properties are added to the system’s “required property” definitions. This ensures optimised retrieval of data.
  4. Think about how the styling can enhance the end user’s recognition, without increasing cognitive load.
  5. You can change the label of a resource and even show a different label when it is displayed in the topology viewer tooltip. This can be done by using the inAsmTooltip property in a custom label function to identify if the label is being used in that context.
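
As an illustration of the stroke fix in tip 2, the sketch below uses Python's standard `xml.etree.ElementTree` module to rewrite every existing stroke attribute in an SVG definition to “none” before you upload the icon. The `remove_outlines` function and the sample icon are hypothetical, and this is just one way to prepare the file outside of TM; it does not touch strokes set through inline style attributes or CSS.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep the SVG namespace as the default one so the output has no "ns0:" prefixes.
ET.register_namespace("", SVG_NS)


def remove_outlines(svg_text):
    """Return the SVG with every explicit stroke attribute set to "none"."""
    root = ET.fromstring(svg_text)
    for element in root.iter():
        if "stroke" in element.attrib:
            element.set("stroke", "none")
    return ET.tostring(root, encoding="unicode")


# Hypothetical icon with an unwanted black outline around a filled circle.
icon = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16">'
    '<circle cx="8" cy="8" r="6" fill="#0f62fe" stroke="black"/>'
    "</svg>"
)
cleaned = remove_outlines(icon)
```

The fill colour is left untouched, so the icon keeps its shape and colour while losing the outline.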

Act

Use the resource context to your users’ advantage. TM allows you to define custom tools and tooltips. Adding a property to a tooltip might just speed up the time taken to solve an issue. Have a look at “AIOps Bitesize: A quick recipe for more informative resource tooltips”. Or, even better, give your users a set of tools which act on the data from a given resource context. These tools could do anything from looking up an entry in a playbook to triggering automations. The tools are defined in JavaScript, and you can set the conditions for when they show.

Tips:

  1. Ensure the conditions for tools protect the user from firing them against the wrong resources.
  2. TM has helpers to access the properties of resources (asmProperties) and resources at each end of relationships (asmSourceProperties, asmTargetProperties).
  3. With great power comes great responsibility. Ensure that the tools are appropriate and use the properties safely.
  4. Conditions can use asynchronous lookups to check remote APIs; the tools will be dynamically added to the menu when the lookups complete.

Explore

AIOps has many ways to view and explore your data, each offering a unique opportunity to solve a different business need. The following section covers some of the key views.
Resource management: Provides a full overview of your estate, giving you an easy way to interrogate properties, launch into other views and perform inventory-like searches.
Topology viewer: Allows you to traverse your data, following the connectivity either via a specific number of hops or incrementally expanding neighbours to follow precise paths.
Application / service / resource group viewer: Allows you to view a fixed collection of resources based on some business context. This view allows you to see the data in tabular, graphical or geospatial viewing modes and provides graphs of changes occurring to the resources in that collection.
Map viewer: Shows all the resources with geospatial context on a map and allows you to understand the location of those resources across the globe.
Historical timeline: In the topology and app/group viewers you can view the historical changes of a given resource as its properties, state and relationships change. It provides the tooling to jump back in time and better understand what has happened and to compare two time points.
Alert: Shows all the alerts in AIOps. If topology resources have been matched, they are provided as context to the alert in the form of a mini viewer, and they can also help group alerts based on their association with topology resource groups and applications/services.
Incident: This is the main entry point for understanding problems that AIOps has identified, pulling together all the context from topology, alerts, runbooks and more.
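
The hop-based traversal mentioned for the Topology viewer can be pictured as a breadth-first search that stops at a fixed depth. The Python sketch below is purely conceptual and is not how TM is implemented: the `within_hops` function, the edge list and the resource names are all hypothetical.

```python
from collections import deque


def within_hops(edges, seed, max_hops):
    """Return each resource reachable from seed within max_hops, with its hop distance."""
    neighbours = {}
    for a, b in edges:
        # Treat relationships as undirected for the purpose of exploring connectivity.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the requested number of hops
        for nxt in neighbours.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance


# Hypothetical chain of connected resources.
edges = [("app", "pod"), ("pod", "node"), ("node", "switch"), ("switch", "router")]
reach = within_hops(edges, "app", 2)
```

With two hops from `app`, the result contains `app`, `pod` and `node` but not `switch` or `router`. Incrementally expanding neighbours in the viewer corresponds to choosing which single resource to expand next rather than growing every branch at once.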

Tips:

  1. AIOps provides a powerful filtering capability allowing you to craft and save views that are important to you and your team.
  2. If you find your resources are not showing on alerts, check that your Match rules are correct, as they decide if an alert is matched to one or more resources.
  3. The affected resource topology shown in incidents is based on resources that belong to resource groups. So, you need to make sure resources are grouped to get the best results.
  4. You can “favourite” applications/services and resource groups to make them easier to find; these will be pinned to the top of the Resource management page.
  5. AIOps now provides a new RBAC capability which lets you control the viewing lens for your different user groups. For more details, have a look at the article “AIOps from the source: v.4.7.1's new ABAC/RBAC capabilities”.

Wrap it up

Thanks for following along, and hopefully now you feel like you have a better understanding of how you can unleash the full power of AIOps Topology Manager to provide maximum value to your business. Feel free to leave comments on any other topics you would like to see and explore more.

