AIOps Bitesize: How and why to make Topology Manager forgetful

By Matthew Duggan posted 19 days ago

  
AI generated image of a Tony Hart style watercolour showing wishing to forget.

What and why?

IBM Cloud Pak for AIOps Topology Manager automatically remembers changes to the topology and, by default, this includes any resource property change, state change or relationship change. The aim is to give users comprehensive visibility of how their data has changed over time, but what if you'd rather not remember everything?

In this AIOps Bitesize, we explore how IBM Cloud Pak for AIOps Topology Manager can be told to selectively forget using history rules. History rules let you nominate which properties should be forgotten, so that you see only the most recently obtained value and not previous ones. There are a couple of reasons you may want to do this - one is that changes to properties like custom timestamps can mask the changes that are really of interest, such as changes to firmware versions. Another reason is efficiency - if we see updates to a million records that are never going to be of interest, then that's a lot of records to update and keep history for, for no reason. [This article] goes into depth about how our rule processing works.

Let's Go

Check out the product documentation at these links for this AIOps Bitesize:-

In this article, we'll use a File Observer file representing a set of connected devices, each of which has an uptimeSeconds property that we'll focus on. This file is available [from my Git repository].

What should I forget?

Our experience has shown that properties which increment on every observation tend not to be worth keeping history for: they provide little if any value and can drive a lot of history processing. A good example is MIB-II's sysUpTime, where you're really only interested when the value goes backwards (which suggests a restart) rather than when it simply increases. You can always federate to the source via a right-click tool or use a script to get it on-demand if needed.
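To illustrate the "reverse change" idea, here's a minimal Python sketch that flags the points at which an uptime-style counter drops. It's an illustration only, not how Topology Manager works: the function names and the sample readings are made up, and a counter wrapping around would look the same as a restart to this simple check.

```python
def counter_went_backwards(previous, current):
    """Return True when an uptime-style counter has dropped since the last reading."""
    return current < previous


def restarts(samples):
    """Return the indexes in `samples` where the counter fell below the prior value."""
    return [
        i
        for i in range(1, len(samples))
        if counter_went_backwards(samples[i - 1], samples[i])
    ]


# Hypothetical uptime readings from successive observations.
readings = [100, 160, 220, 15, 75]
print(restarts(readings))  # [3]
```

The steady increments are ignored and only the drop at the fourth reading is reported, which is the one change you'd actually care about.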

In our scenario, uptimeSeconds is a proprietary property that increments on every observation for devices that have remained up. If we can suppress the tracking of this property, it means that any property changes we see on the devices are more likely to be of interest.

How can I find what to forget?

In this article we'll use the UI to identify the tell-tale pattern that warrants attention but this could of course be automated via a script and appropriate use of our APIs, such as the inventory service query API. 
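If you did want to script it, the core check might look something like the Python sketch below. It deliberately doesn't call any product API: it assumes you've already fetched a resource's property snapshots by whatever means and have them as plain dictionaries, oldest first. The function name and the sample data are hypothetical.

```python
from numbers import Real


def always_incrementing(snapshots):
    """Return names of numeric properties that strictly increase in every snapshot.

    `snapshots` is a list of property dicts for one resource, oldest first.
    """
    if len(snapshots) < 2:
        return set()
    candidates = set()
    for name in set(snapshots[0]):
        values = [snap.get(name) for snap in snapshots]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            continue
        if all(later > earlier for earlier, later in zip(values, values[1:])):
            candidates.add(name)
    return candidates


# Hypothetical snapshots of one device across three observations.
snapshots = [
    {"name": "switch-01", "firmware": "1.2.0", "uptimeSeconds": 100},
    {"name": "switch-01", "firmware": "1.2.0", "uptimeSeconds": 160},
    {"name": "switch-01", "firmware": "1.3.0", "uptimeSeconds": 220},
]
print(always_incrementing(snapshots))  # {'uptimeSeconds'}
```

Anything this returns is only a candidate for forgetting; you'd still want to decide whether the property's history has any value to you.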

The first place it may become apparent that attention is needed is in the group view topology activity chart. The example below shows that this group had a series of property changes. In this case, I just updated the File Observer file several times with incrementing uptimeSeconds properties but in a real environment, one could expect to see the chart depict property changes over a long period of time. 

Looking at one of the devices in delta and timeline mode, we can further understand what's changing. Again we see the tell-tale signal of property change occurring over time. Note that I've moved the time pins to before and after some changes occurred, ready for our next step. Also note that every device has a blue-dot marker on it, which tells me they had all changed between the selected points in time.

Viewing the resource details now shows us how this resource (device) has changed between the two selected points on the timeline. As suspected, we can see that only one property has changed - uptimeSeconds has incremented. 

How can I create a history rule?

The simplest way we can stop this is with a history rule like the one below. It tells Topology Manager not to track history for any uptimeSeconds property it sees, regardless of resource type or origin. As described in the article referenced at the top of the page, you could be very selective about the conditions under which history should not be tracked, but let's keep it simple for now. Save the rule and let's see what happens...
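To make the effect of such a rule concrete, here's a small Python model of the idea. This is a conceptual sketch only and not the product's rule format or implementation: it compares two property snapshots and discards any differences in properties we've chosen to forget. All names and sample values are hypothetical.

```python
def changed_properties(before, after):
    """Return the names of properties whose values differ between two snapshots."""
    return {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}


def tracked_changes(before, after, forgotten):
    """Return changed property names, minus those we've chosen to forget."""
    return changed_properties(before, after) - set(forgotten)


FORGOTTEN = {"uptimeSeconds"}  # the property our rule targets

before = {"name": "switch-01", "firmware": "1.2.0", "uptimeSeconds": 100}
after_noise = {"name": "switch-01", "firmware": "1.2.0", "uptimeSeconds": 160}
after_upgrade = {"name": "switch-01", "firmware": "1.3.0", "uptimeSeconds": 220}

print(tracked_changes(before, after_noise, FORGOTTEN))    # set()
print(tracked_changes(before, after_upgrade, FORGOTTEN))  # {'firmware'}
```

With uptimeSeconds forgotten, an observation where only the uptime moved produces nothing to record, while the firmware upgrade still stands out.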

How can I tell it's worked?

Here are a couple of ways to check. The first is to look at the _appliedRules property of the resource, which is new in v4.9. This reveals the type and name of the rules that have been applied to this resource and is a great way of auditing how your data is being processed. Here we can see our new history rule included in the list. Again, you could automate this.
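As a rough idea of what that automation could look like, the Python sketch below checks a batch of resources for a named rule. It makes assumptions that you'd need to verify against your own data: that each resource has been retrieved as a dictionary, and that _appliedRules is a list of entries with "type" and "name" keys. The rule name, rule type value and device names are invented for the example.

```python
def has_rule(resource, rule_name, rule_type=None):
    """Return True if `resource` lists a rule with this name (and type, if given)."""
    # Assumed shape: "_appliedRules" is a list of {"type": ..., "name": ...} dicts.
    for rule in resource.get("_appliedRules", []):
        if rule.get("name") != rule_name:
            continue
        if rule_type is None or rule.get("type") == rule_type:
            return True
    return False


def missing_rule(resources, rule_name, rule_type=None):
    """Return the names of resources that do not have the rule applied."""
    return [r.get("name") for r in resources if not has_rule(r, rule_name, rule_type)]


# Hypothetical resources as they might look once retrieved.
resources = [
    {"name": "switch-01",
     "_appliedRules": [{"type": "history", "name": "forget-uptimeSeconds"}]},
    {"name": "switch-02", "_appliedRules": []},
    {"name": "switch-03"},
]
print(missing_rule(resources, "forget-uptimeSeconds", "history"))
# ['switch-02', 'switch-03']
```

A list of resources where the rule is missing is a quick way to spot data that was loaded before the rule was saved, or that the rule's conditions don't match.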

 

You can also investigate an area of the timeline where you'd ordinarily have expected to see the dubious property driving history, such as in the following example. When changes of genuine interest occur, they'll be the ones to stand out. 

Lastly, when a genuinely interesting change occurs, you 

Final Thoughts

In this AIOps Bitesize, we used Topology Manager's history rules to stop tracking history for properties that aren't worth it. This gives you a more focussed view of how your environment is changing while also being more efficient for us to process.

What would your scenario be?
