How Azure is transforming the way we do anomaly detection – Part 1


Introduction

First, I would like to apologize for publishing only two posts last April. It was a pretty hectic month for my beloved country, where our people overthrew a dictator who had ruled for 30 years, and I spent most of my time following the news and experts’ analysis.

Last month, we talked about some things that will MAKE YOUR CAREER SMARTER (Software is Eating the World Part 1 & Part 2). Today we get back to the AI world, i.e., things that will MAKE YOUR CODE SMARTER (did you subscribe to our mailing list on the right?). Do you still remember what I promised in my introductory blog post?

Today, we continue our cognitive services journey by discussing a new service that Microsoft launched last March: the Anomaly Detector service, which opens the door to new business application use cases. From an academic point of view, anomaly detection is “the identification of rare events, trends, and behaviors which could be of interest for the business.” In other words, it helps find outliers within a particular dataset that could be of interest to us. Figure 1 shows a simple representation of an anomaly.

Applications of anomaly detection include banking fraud, predictive maintenance, sales monitoring, login fraud, IoT monitoring, and tracking market changes. Anomaly detection works as a perfect “heads-up” tool whenever a weird data event occurs.

Figure 1 A visualization of an anomaly. Source

Microsoft Azure Anomaly Detector

Background

Microsoft’s Anomaly Detector targets a subset of the anomaly detection problem: time series anomaly detection. For convenience, we will use the term “anomaly detection” to refer to time series anomaly detection in this post. Previously, Microsoft provided anomaly detection through Anomaly Finder, which was retired on April 30, 2019, in favor of the newly launched Anomaly Detector. As of now, the Anomaly Detector service is available only in the West US 2 and West Europe regions, in preview mode (i.e., not recommended for production scenarios). It is worth mentioning that the service exists as a standalone Anomaly Detector resource rather than within a Cognitive Services resource, as most of the Azure cognitive services do.

The service supports a wide range of scenarios, since it automatically adjusts its underlying algorithm based on the ingested time series to provide the best accuracy, with only a single parameter to tune! Microsoft uses the anomaly detection service for a wide range of its own applications, such as Windows, Bing, and Office. Currently, Azure is the only major cloud provider that offers anomaly detection as an AI RESTful endpoint, and do not worry, Microsoft does not use or store your data to improve its service :).

Again, we enjoy all the AI as a Service benefits we discussed earlier: no need to buy expensive infrastructure, prepare and tag data, choose and tune algorithms, train models, or go through the other traditional ML steps. We get all of that for free 😊

Underlying Theory (Feel free to skip!)

To support single-parameter tuning, Microsoft has developed a gallery of anomaly detection algorithms and a specific technique to achieve the best possible balance between precision and recall (you don’t need to worry much about this ML terminology!). First, the system analyzes the API request to extract features from the time series, such as continuity, mean, standard deviation, and period. With these features in hand, the system can select the most appropriate algorithm.
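To make the feature-extraction step a little more tangible, here is a minimal Python sketch that computes the four features named above for a small series. This is my own illustration, not Microsoft’s implementation: the helper names (`extract_features`, `estimate_period`) are hypothetical, “continuity” is assumed to mean evenly spaced timestamps, and the period is guessed with a plain autocorrelation search.

```python
import statistics
from datetime import datetime, timedelta


def estimate_period(values, mean):
    """Return the lag (>= 2) with the highest positive autocorrelation, or None."""
    n = len(values)
    denom = sum((v - mean) ** 2 for v in values)
    if n < 4 or denom == 0:
        return None  # too short or perfectly flat: nothing periodic to find
    best_lag, best_score = None, 0.0
    for lag in range(2, n // 2 + 1):
        # Correlate the series with a copy of itself shifted by `lag` points.
        score = sum(
            (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
        ) / denom
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def extract_features(timestamps, values):
    """Compute simple descriptive features of a time series (illustrative only)."""
    # Assumption: "continuity" means every gap between timestamps is the same.
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    mean = statistics.fmean(values)
    return {
        "continuous": len(gaps) <= 1,
        "mean": mean,
        "std": statistics.pstdev(values),
        "period": estimate_period(values, mean),
    }


# Four weeks of daily data with a weekly pattern.
start = datetime(2019, 4, 1)
weekly_values = [0, 1, 2, 3, 4, 5, 6] * 4
daily_timestamps = [start + timedelta(days=i) for i in range(len(weekly_values))]
features = extract_features(daily_timestamps, weekly_values)
print(features)
```

A real selector would then map such features to an algorithm (for example, a seasonal method when a period is found), but how the service makes that choice is not something this sketch tries to reproduce.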

According to the Microsoft tech community, the following algorithms are used under the hood for anomaly detection:

  • Fourier Transformation
  • Extreme Studentized Deviate (ESD)
  • STL Decomposition
  • Dynamic Threshold
  • Z-score detector
  • Some advanced algorithms to be disclosed pending paper publishing.

Next, the anomaly detector makes sure the recall is high, and then it uses the previously extracted features along with the sensitivity setting to improve the precision.
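For the curious, the simplest technique in the list above, the z-score detector, fits in a few lines. The sketch below is a textbook version written for this post, not the service’s code, and the `threshold` parameter is my own stand-in for a sensitivity knob: lowering it flags more points (better recall, worse precision), while raising it does the opposite.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Flag points that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)  # a flat series has no outliers
    return [abs(v - mean) / std > threshold for v in values]


# A calm series with one obvious spike near the end.
readings = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
            11, 9, 10, 12, 10, 9, 11, 10, 50, 10]

strict = zscore_anomalies(readings, threshold=3.0)
loose = zscore_anomalies(readings, threshold=0.3)

print("strict:", [i for i, flagged in enumerate(strict) if flagged])
print("loose: ", [i for i, flagged in enumerate(loose) if flagged])
```

Note that the spike itself inflates the mean and the standard deviation, which is one reason a plain z-score struggles on short or seasonal series, and why a gallery of algorithms makes sense.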

Anomaly Detection API

The API supports two different endpoints: batch detection and latest point detection.

Types

Detect anomalies as a batch

Generates the anomaly status (and other useful data) of all data points and the positions of any detected anomalies. This API is useful for detecting anomaly status over a specified time range (historical data). A single statistical model is created and applied to all data points in the set. It works best for seasonal (periodic) time series with occasional anomalies and for flat-trend (straight) time series with occasional spikes (ups) and dips (downs). The batch detection mode is not recommended for real-time applications, since it relies on a single model and is slower due to the large number of points to analyze.
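As a rough idea of what a batch call looks like, the sketch below only builds the request (URL, headers, and JSON body) without sending it. Please treat the details as assumptions to verify against the official REST reference: the path, the `series`/`granularity`/`sensitivity` field names, and the `Ocp-Apim-Subscription-Key` header are what I recall from the v1.0 preview API, while `build_batch_request`, the endpoint URL, and the key are placeholders I made up.

```python
import json
from datetime import datetime, timedelta

# Assumed path of the batch endpoint in the v1.0 preview API.
BATCH_PATH = "/anomalydetector/v1.0/timeseries/entire/detect"


def build_batch_request(endpoint, key, values, start,
                        step=timedelta(days=1), granularity="daily",
                        sensitivity=95):
    """Build (url, headers, body) for a batch detection call; nothing is sent."""
    series = [
        {
            # One evenly spaced UTC timestamp per value.
            "timestamp": (start + i * step).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "value": value,
        }
        for i, value in enumerate(values)
    ]
    body = {"series": series, "granularity": granularity, "sensitivity": sensitivity}
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return endpoint.rstrip("/") + BATCH_PATH, headers, json.dumps(body)


# Placeholder endpoint and key: replace them with your own resource's values.
url, headers, body = build_batch_request(
    "https://example.invalid/", "your-key-here",
    [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 50, 10],
    datetime(2019, 4, 1),
)
print(url)
print(body[:80], "...")
```

To actually send it, you could pass these three pieces to `urllib.request.Request(url, data=body.encode(), headers=headers, method="POST")` from the standard library, or to any HTTP client you prefer.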

Detect the anomaly status of the latest data point

Generates the anomaly status (and other useful data) for the latest data point in the series. This API is useful for detecting anomaly status over a stream, in particular for the latest data point. It fits real-time applications, since each point is analyzed in the context of the previously seen data points.
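To see what “in the context of the previously seen data points” means in practice, here is a small local simulation of the streaming pattern. It does not call the service: `LatestPointDetector` is a hypothetical class of mine that judges each new point against a sliding window of recent values using a simple mean and standard deviation check, which is only a stand-in for whatever model the service picks.

```python
import statistics
from collections import deque


class LatestPointDetector:
    """Judge each incoming point against a sliding window of earlier points."""

    def __init__(self, window=12, threshold=3.0):
        self.history = deque(maxlen=window)  # oldest points fall off automatically
        self.threshold = threshold

    def observe(self, value):
        """Return True if `value` looks anomalous given the current window."""
        is_anomaly = False
        if len(self.history) == self.history.maxlen:
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history)
            is_anomaly = abs(value - mean) > self.threshold * std
        # The point joins the history only after it has been judged.
        self.history.append(value)
        return is_anomaly


detector = LatestPointDetector(window=12)
stream = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 50, 10]
flags = [detector.observe(value) for value in stream]
print(flags)
```

The first twelve points only fill the window, so nothing is flagged until enough history exists. With the real API you would follow the same loop, sending the recent window plus the newest point on each call and reading back the status of that last point.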

To be continued

That is it for today. In the next lesson, we will dive into more API details, such as input parameters, the output response, adjustment methods, and detailed explanations. Stay tuned, and do not forget to subscribe to the mailing list to get the latest and greatest on making your code smarter and your career smarter!