There is big data and then there is quality data. In data analytics, if you can isolate the useful data from the not-so-useful and base your analytics on the useful data, the net result is less noise.
In an effort to give you, the user, higher quality data analysis, and therefore less noise, the team behind IBM Operations Analytics Predictive Insights have introduced a new feature they have been humorously referring to as the ‘Sniff Test’. This new technology is officially known as model pruning.
The first thing to understand is the role of the data model. Predictive Insights first observes the behavior of your system for a period of time and builds a model of the observed normal behavior, a step we refer to as ‘training’. It then analyzes incoming metric data against that model and generates anomaly events when it sees metric data that contravenes the established model.
To ensure the model does not go stale, training is repeated frequently on new incoming data, and a new model is generated each time.
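To make the train-then-analyze cycle concrete, here is a minimal sketch in Python. It is not how Predictive Insights is implemented: the mean-and-standard-deviation "model", the function names and the threshold of 3 are all assumptions chosen purely for illustration.

```python
from statistics import mean, stdev

def train(values):
    """Build a toy model of normal behavior: the mean and standard deviation.
    (Assumed stand-in; the real product's algorithms are not described here.)"""
    return {"mean": mean(values), "std": stdev(values)}

def is_anomaly(model, value, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    if model["std"] == 0:
        return value != model["mean"]
    return abs(value - model["mean"]) / model["std"] > threshold

# Observe the system for a period of time, then train on what was seen.
history = [50, 52, 49, 51, 50, 48, 53, 50]
model = train(history)

# Analyze incoming metric data against the trained model.
incoming = [51, 49, 90]
flags = [is_anomaly(model, v) for v in incoming]
print(flags)
```

In this sketch, retraining simply means calling `train` again on a more recent window of data and replacing the old model.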
Model pruning works by testing model quality, isolating those models in which Predictive Insights has low confidence and discarding them. As Dr Donagh Horgan, a member of the Predictive Insights development team that worked on this feature, explains, “It’s a bit like when you open milk that’s been in the fridge for a few days: you give it a sniff and, if it smells ok, then you use it; if not, then you throw it away. We do the same thing for models now: if the model smells bad, we throw it away; if not, then it gets deployed as normal.” Donagh is harking back to his student days, I am sure.
So how does model pruning improve quality?
Predictive Insights raises anomaly events when it detects behavior it deems to be inconsistent with what is normal for your system. By definition anomalies are infrequent; therefore, if an anomaly detector finds lots of anomalies, it is probably not a good anomaly detector. So the quality of each model is tested against the same data the model was trained on. That data is what the model treats as normal behavior, so a good model should flag very little of it. If Predictive Insights finds very few anomalies, the model is deemed high-quality and deployed; if it finds lots of anomalies, the model is deemed low-quality and discarded.
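The sniff test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's actual logic: the median/MAD model, the threshold of 5, the 5% cut-off and every function name here are hypothetical.

```python
from statistics import median

def train(values):
    """Toy model (assumed): the median and the median absolute deviation (MAD)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return {"center": center, "mad": mad}

def is_anomaly(model, value, threshold=5.0):
    """Flag a value lying more than `threshold` MADs from the median."""
    spread = model["mad"] or 1e-9  # avoid dividing by zero on constant data
    return abs(value - model["center"]) / spread > threshold

def anomaly_rate(model, training_values):
    """Fraction of the model's own training data that it flags as anomalous."""
    flagged = sum(is_anomaly(model, v) for v in training_values)
    return flagged / len(training_values)

def sniff_test(model, training_values, max_rate=0.05):
    """Return True to deploy the model, False to discard it (assumed cut-off)."""
    return anomaly_rate(model, training_values) <= max_rate

# A steady metric: the model describes it well and flags none of it.
steady = [50, 52, 49, 51, 50, 48, 53, 50, 51, 49]
# A metric that changed level mid-way: the model fits only part of the data.
shifted = [50, 52, 49, 51, 50, 48, 100, 102, 99, 101]

for name, data in [("steady", steady), ("shifted", shifted)]:
    model = train(data)
    verdict = "deploy" if sniff_test(model, data) else "discard"
    print(name, anomaly_rate(model, data), verdict)
```

The model for the steady metric flags none of its own training data and would be deployed. The model for the shifted metric flags four of its ten training points as anomalous, so it smells bad and is thrown away.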
The net result is fewer bad models, leading to fewer spurious anomaly events or alarms being created. Ian Manning, lead developer on Predictive Insights, describes the benefit of model pruning as “less of the stuff you don’t want to see”, meaning the false alarm rate is lower, the number of alarms per algorithm employed is lower, and therefore the quality of the alarms is higher.
Model pruning is included in Predictive Insights version 1.3.1 and works out of the box, so no configuration is required to use this new feature.