This article was originally published by James Cancilla.
This article demonstrates how to use the AnomalyDetector operator, which is capable of detecting anomalous subsequences in a streaming time series.
The AnomalyDetector operator performs online anomaly detection on a time series. More specifically, it reports anomalies in the pattern of the incoming time series. This type of operator has many uses across a number of industries. One example is the medical industry: by using this operator to monitor patients, medical staff can be alerted immediately to changes in patient vital signs.
A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Examples of time series are ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average. (Source: Time series - Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/Time_series)
The following image was developed using actual output from the AnomalyDetector operator. As the time series was ingested by the operator, the anomaly detection algorithm analyzed the patterns to determine if there were any anomalies. The orange area was reported by the AnomalyDetector operator as being anomalous.
How it works
The AnomalyDetector operator maintains a recent history of the input time series, which is referred to as the reference pattern. Whenever the AnomalyDetector ingests a tuple, that tuple is added to a buffer called the current pattern (the current pattern is essentially the most recent set of data points received). When this occurs, the operator compares the current pattern with the reference pattern. This comparison calculates a score that indicates how similar or dissimilar the current pattern is to the reference pattern. The higher the score, the more dissimilar the patterns are. The following example demonstrates in more detail how the underlying anomaly detection algorithm works.
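The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the operator's implementation: the article does not say which distance measure the operator uses, so Euclidean distance is assumed here, and the function names dissimilarity and score are hypothetical. The score is taken as the smallest distance between the current pattern and any same-length subsequence of the reference pattern, so a pattern that resembles nothing in the recent history gets a high score.

```python
import math


def dissimilarity(a, b):
    """Euclidean distance between two equal-length sequences (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score(reference, current):
    """Smallest distance from `current` to any subsequence of `reference`.

    A higher score means the current pattern is less like anything
    found in the reference pattern.
    """
    n = len(current)
    if n == 0 or n > len(reference):
        raise ValueError("current pattern must be non-empty and fit in the reference")
    # Slide a window of length n across the reference pattern.
    windows = (reference[i:i + n] for i in range(len(reference) - n + 1))
    return min(dissimilarity(w, current) for w in windows)


# Hypothetical data: a repeating reference pattern of 10 points.
reference = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 2.0]
familiar = score(reference, [1.0, 2.0, 3.0])  # this shape occurs in the reference
unusual = score(reference, [9.0, 9.0, 9.0])   # this shape does not
```

In this sketch, familiar is zero because the current pattern matches part of the reference exactly, while unusual is much larger.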
In this example, I will provide a high-level demonstration of the algorithm that is used by the AnomalyDetector operator. Rather than discuss every possible parameter, I will focus only on those parameters that are necessary to understand the algorithm. For this example, the following parameter values will be used:
referenceLength: 10
patternLength: 3
patternCount: 5 (default)
The referenceLength parameter specifies the size of the reference pattern. The patternLength parameter specifies the size of the current pattern. The patternCount parameter specifies how many times the current pattern will be compared against subsequences of the reference pattern.
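To make the relationship between the three parameters concrete, here is a rough Python sketch. It uses the values from this example (a reference pattern of 10 points and a current pattern of 3 points) and assumes a patternCount of 5. How the operator actually selects which subsequences to compare is not described in this article; the evenly spaced selection below, and the names DetectorParams and comparison_windows, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorParams:
    reference_length: int = 10  # size of the reference pattern
    pattern_length: int = 3     # size of the current pattern
    pattern_count: int = 5      # number of comparisons (assumed value)

    def __post_init__(self):
        if self.pattern_length < 1 or self.pattern_count < 1:
            raise ValueError("lengths and counts must be positive")
        if self.pattern_length > self.reference_length:
            raise ValueError("current pattern cannot be longer than the reference")


def comparison_windows(params):
    """Return inclusive (start, end) index pairs of reference subsequences.

    Assumption: the windows are spread evenly across the reference pattern.
    """
    last_start = params.reference_length - params.pattern_length
    # There are only last_start + 1 distinct windows to choose from.
    count = min(params.pattern_count, last_start + 1)
    if count == 1:
        starts = [last_start]
    else:
        starts = [(i * last_start) // (count - 1) for i in range(count)]
    return [(s, s + params.pattern_length - 1) for s in starts]


windows = comparison_windows(DetectorParams())
```

With these values there are 10 - 3 + 1 = 8 possible subsequences of length 3 in the reference pattern, and the sketch picks 5 of them to compare against the current pattern.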
For this example, I will use the following time series. The red square represents the reference pattern, which has a length of 10 (as defined by the referenceLength parameter). The blue square represents the current pattern, which has a length of 3 (as defined by the patternLength parameter).
Note: The boxes shown in the following images include both the start and end data points. For example, the blue box in the image below includes points 8, 9 and 10 (in interval notation, this would be written as [8,10]).
The first step is to add a new point to this time series. When the new point is added, the current pattern will be updated to include the new point. (The reference pattern does not get updated until the end, once all of the comparison operations are performed.)
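As a rough illustration of this first step, the sketch below models the two buffers as fixed-size queues in Python. The point numbers stand in for actual data values, and the names reference, current and add_point are hypothetical. Two things are assumed rather than stated in the article: that the reference pattern covers points 1 through 10, and that the oldest point drops out of the current pattern when a new one arrives (inferred from the fixed pattern length of 3).

```python
from collections import deque

# Assumption: the reference pattern holds points 1..10 (referenceLength = 10).
reference = deque(range(1, 11), maxlen=10)
# The current pattern holds points 8, 9 and 10 (patternLength = 3).
current = deque([8, 9, 10], maxlen=3)


def add_point(pattern, point):
    """Append a new point; a deque with maxlen discards its oldest entry."""
    pattern.append(point)
    return list(pattern)


# Point 11 arrives: only the current pattern changes at this stage.
updated = add_point(current, 11)
# The reference pattern is deliberately left untouched here, since it is
# updated only after all of the comparison operations have been performed.
```

After this step the current pattern covers points 9, 10 and 11, while the reference pattern is the same as before.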