Anomaly detection in streaming time series data using Amazon Managed Service for Apache Flink

SeniorTechInfo

Have you ever worked with time series data? In this category of data, time is a fundamental element of the structure. Time series data consists of data points collected sequentially, typically at regular intervals, and often exhibits patterns such as trends, seasonal variations, or cyclical behavior. Examples include sales revenue, system performance metrics, credit card transactions, sensor readings, and user activity analytics.

Time series anomaly detection is the process of identifying unexpected or unusual patterns in data that evolves over time. An anomaly, also known as an outlier, is a data point that deviates significantly from the expected pattern, for example a minute in which a product sells several times more units than in comparable minutes.

In this post, we show how to build a real-time anomaly detection solution for streaming time series data using Amazon Managed Service for Apache Flink and other AWS managed services. The solution uses machine learning (ML) to detect anomalies without requiring prior ML expertise, which makes it accessible to a wide audience.

Solution Overview

The solution architecture diagram illustrates the core components of the Anomaly Detection Stack and shows how it applies ML to detect anomalies in real time.

When you deploy the solution, it creates an ML model that uses the Random Cut Forest (RCF) algorithm for anomaly detection. The model reads input time series data from Amazon Managed Streaming for Apache Kafka (Amazon MSK), using it first for training and then to monitor incoming data points continuously. The model evaluates each data point against historical trends, generates an anomaly score, and classifies anomalies according to customizable thresholds.
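Random Cut Forest builds an ensemble of random trees over a sample of the data and, roughly speaking, scores a point by how much it would alter the structure of those trees; a faithful implementation does not fit in a short example. The minimal sketch below therefore swaps in a much simpler technique, a rolling z-score, purely to illustrate the flow described above: collect history during a training stage, then score each new point against that history and compare the score to a threshold. The class name, window size, warm-up length, and threshold are all hypothetical and are not part of the solution.

```python
from collections import deque
from math import sqrt


class RollingScorer:
    """Stand-in for RCF (NOT Random Cut Forest): a rolling z-score detector.

    It only illustrates the train -> score -> threshold flow; all parameters
    are hypothetical.
    """

    def __init__(self, window: int = 30, warmup: int = 10, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # recent values act as the "historical trend"
        self.warmup = warmup                  # points to collect before scoring starts
        self.threshold = threshold            # hypothetical decision threshold

    def update(self, value: float) -> dict:
        if len(self.history) < self.warmup:
            # Training stage: learn the history, emit no verdict yet.
            self.history.append(value)
            return {"stage": "TRAINING", "score": 0.0, "anomaly": False, "expected": None}
        mean = sum(self.history) / len(self.history)
        var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
        std = sqrt(var) or 1e-9  # guard against division by zero on a flat history
        score = abs(value - mean) / std
        self.history.append(value)
        return {
            "stage": "INFERENCE",
            "score": score,
            "anomaly": score >= self.threshold,
            "expected": mean,
        }
```

For example, feeding the quantities 10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 40 returns ten TRAINING results, then a non-anomalous 10, then an anomalous 40. Note that this sketch appends every point, including anomalies, to its history, so a burst of outliers inflates the standard deviation; RCF is designed to be more robust than this.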

If you're interested in this anomaly detection solution and want to explore it further, request access by emailing anomalydetection-support-canvas@amazon.com.

To illustrate the solution, we use a hypothetical scenario: AnyBooks, an on-campus bookstore, needs to track anomalies in sales quantity for operational planning. The end-to-end architecture diagram outlines five layers: data ingestion, anomaly detection, transformation, visualization, and notification.

Ingestion

In the ingestion layer, an AWS Lambda function retrieves sales transactions, transforms them, and publishes them to an input Kafka topic on Amazon MSK for processing.
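The transform-and-publish step can be sketched as follows. This is a minimal sketch, assuming each raw transaction arrives as a dict with the hypothetical fields sku_name, units, and sold_at (a Unix timestamp); the output field names product_name and quantity mirror the dimension and measure names in the sample output record in the next section, while event_time is a hypothetical name. The Kafka producer is passed in as a callable so the transform can be tested without a cluster. In a Lambda function you might pass something like lambda k, v: producer.send("sales-input", key=k, value=v) using kafka-python's KafkaProducer; the topic name sales-input is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Iterable


def transform_transaction(raw: dict) -> dict:
    """Map one raw sales transaction (hypothetical schema) to an input-topic record."""
    sold_at = datetime.fromtimestamp(raw["sold_at"], tz=timezone.utc)
    return {
        "product_name": raw["sku_name"],  # becomes the detector's dimension
        "quantity": int(raw["units"]),    # the measure being monitored
        "event_time": sold_at.strftime("%Y-%m-%d %H:%M:%S"),
    }


def publish_transactions(
    transactions: Iterable[dict],
    publish: Callable[[bytes, bytes], object],
) -> int:
    """Transform each transaction, hand it to publish(key, value), return the count."""
    count = 0
    for raw in transactions:
        record = transform_transaction(raw)
        # Keying by product sends all of a product's events to the same partition,
        # which preserves their relative order within that partition.
        publish(record["product_name"].encode("utf-8"), json.dumps(record).encode("utf-8"))
        count += 1
    return count
```

Keying messages by product is a design choice, not something the solution prescribes; it matters only if the consumer relies on per-product ordering.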

Anomaly Detection Stack

The Flink application reads raw data from the input topic, trains the ML model, detects anomalies, and writes the results to an output topic for further analysis. The following is a sample output record:

{
  "detectorName": "canvas-ad-blog-demo-1",
  "measure": "quantity",
  "timeseriesId": "f3c7f14e7a445b79a3a9877dfa02064d56533cc29fb0891945da4512c103e893",
  "anomalyDecisionThreshold": 70,
  "dimensionList": [{"name": "product_name", "value": "item-A"}],
  "aggregatedMeasureValue": 14.0,
  "anomalyScore": 0.0,
  "detectionPeriodStartTime": "2024-08-29 13:35:00",
  "detectionPeriodEndTime": "2024-08-29 13:36:00",
  "processedDataPoints": 1261,
  "anomalyConfidenceScore": 80.4674989791107,
  "anomalyDecision": 0,
  "modelStage": "INFERENCE",
  "expectedValue": 0.0
}

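Each record identifies the detector (detectorName), the monitored measure (quantity), and the dimension values that define the time series (here, product_name is item-A). It reports the aggregated measure value for the detection period (14.0 for the one-minute period starting at 2024-08-29 13:35:00) together with the model's assessment: anomalyScore, anomalyConfidenceScore, the anomalyDecisionThreshold that was applied, and the resulting anomalyDecision (0 here, so this data point was not flagged). The modelStage field shows that the model is in the INFERENCE stage rather than still training.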
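A downstream consumer of the output topic might parse records like the sample above and keep only flagged anomalies. The following is a minimal sketch; based on the sample, it assumes that anomalyDecision is 1 for a flagged anomaly and 0 otherwise, and it ignores records whose modelStage is not INFERENCE. The AnomalyResult class and the function names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AnomalyResult:
    dimensions: dict   # e.g. {"product_name": "item-A"}
    period_start: str  # detectionPeriodStartTime
    period_end: str    # detectionPeriodEndTime
    value: float       # aggregatedMeasureValue
    expected: float    # expectedValue
    score: float       # anomalyScore
    stage: str         # modelStage
    is_anomaly: bool   # assumption: anomalyDecision == 1 means "flagged"


def parse_result(message: str) -> AnomalyResult:
    """Parse one JSON record from the output topic."""
    rec = json.loads(message)
    return AnomalyResult(
        # Flatten dimensionList into a plain dict of name -> value.
        dimensions={d["name"]: d["value"] for d in rec["dimensionList"]},
        period_start=rec["detectionPeriodStartTime"],
        period_end=rec["detectionPeriodEndTime"],
        value=float(rec["aggregatedMeasureValue"]),
        expected=float(rec["expectedValue"]),
        score=float(rec["anomalyScore"]),
        stage=rec["modelStage"],
        is_anomaly=rec["anomalyDecision"] == 1,
    )


def flagged_anomalies(messages: Iterable[str]) -> List[AnomalyResult]:
    """Keep only inference-stage records that the detector flagged."""
    results = (parse_result(m) for m in messages)
    return [r for r in results if r.stage == "INFERENCE" and r.is_anomaly]
```

Run against the sample record, parse_result reports item-A with a value of 14.0 and is_anomaly set to False, so flagged_anomalies returns an empty list.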

Transform

The transformation layer processes and transforms the detection results and stores them in an Amazon S3 data lake for later querying and analysis.
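One way to shape output records for the data lake is to flatten each one into a row and write it under a date-partitioned key. The minimal sketch below assumes JSON Lines objects and a Hive-style dt= partition, which lets query engines skip irrelevant dates; the anomaly-results prefix, the row field names, and the key layout are hypothetical choices, not the solution's actual schema.

```python
import json
from typing import Tuple


def to_lake_object(message: str, prefix: str = "anomaly-results") -> Tuple[str, str]:
    """Return (S3 key, JSON Lines body) for one output-topic record (hypothetical layout)."""
    rec = json.loads(message)
    dims = {d["name"]: d["value"] for d in rec["dimensionList"]}
    start = rec["detectionPeriodStartTime"]  # e.g. "2024-08-29 13:35:00"
    date = start.split(" ")[0]               # partition value, e.g. "2024-08-29"
    row = {
        "detector": rec["detectorName"],
        **dims,                               # e.g. "product_name": "item-A"
        "period_start": start,
        "period_end": rec["detectionPeriodEndTime"],
        "value": rec["aggregatedMeasureValue"],
        "expected": rec["expectedValue"],
        "score": rec["anomalyScore"],
        "is_anomaly": rec["anomalyDecision"] == 1,  # assumption: 1 means flagged
    }
    # Hive-style partition folder plus a per-series, per-period file name.
    ts = start.replace(" ", "T").replace(":", "")
    key = f"{prefix}/dt={date}/{rec['timeseriesId']}-{ts}.json"
    return key, json.dumps(row) + "\n"
```

The resulting key and body could be written with boto3's s3.put_object(Bucket=..., Key=key, Body=body). One object per record produces many small files, so in practice you would likely batch records before writing them.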

Visualize

An Amazon QuickSight dashboard connects to the S3 data lake so you can visualize anomalies and track them as new results arrive.
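The post doesn't describe how the dashboard connects to S3. One option QuickSight offers for reading files directly from S3 is a manifest file, and the sketch below generates one, assuming the results are stored as JSON objects under a single prefix. The bucket and prefix names are hypothetical; querying the lake through Amazon Athena is a common alternative, and the QuickSight documentation lists the file layouts it accepts.

```python
import json


def quicksight_manifest(bucket: str, prefix: str, file_format: str = "JSON") -> str:
    """Build a QuickSight S3 manifest that covers every object under a prefix.

    The bucket and prefix are hypothetical placeholders for the data lake location.
    """
    manifest = {
        # URIPrefixes includes all objects under the prefix, so result files
        # written after the dataset was created are read on a later refresh.
        "fileLocations": [{"URIPrefixes": [f"s3://{bucket}/{prefix.strip('/')}/"]}],
        "globalUploadSettings": {"format": file_format},
    }
    return json.dumps(manifest, indent=2)
```

Pointing at a prefix rather than listing individual URIs keeps the manifest stable as the Flink pipeline keeps adding files.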

Notification

Amazon SNS sends near real-time notifications when critical anomalies are detected, so the right people are alerted promptly.
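The notification step can be sketched as a filter followed by a publish call. In this minimal sketch, "critical" means that the detector flagged the point (assuming anomalyDecision is 1) and that its anomalyScore meets a hypothetical min_score cut-off that is stricter than the detector's own decision. The publish function is injected; in a Lambda function it could be functools.partial(boto3.client("sns").publish, TopicArn=topic_arn), because the SNS Publish API accepts TopicArn, Subject, and Message keyword arguments. The function names and message wording are hypothetical.

```python
import json
from typing import Callable, Iterable, List


def critical_alerts(messages: Iterable[str], min_score: float) -> List[dict]:
    """Turn flagged, high-scoring output records into SNS-style Subject/Message dicts."""
    alerts = []
    for msg in messages:
        rec = json.loads(msg)
        # Assumption: anomalyDecision == 1 means the detector flagged the point.
        if rec.get("anomalyDecision") == 1 and rec.get("anomalyScore", 0.0) >= min_score:
            dims = ", ".join(f"{d['name']}={d['value']}" for d in rec["dimensionList"])
            # SNS requires subjects shorter than 100 characters.
            subject = f"Anomaly: {rec['measure']} ({dims})"[:99]
            body = (
                f"{rec['measure']} was {rec['aggregatedMeasureValue']} "
                f"(expected {rec['expectedValue']}) between "
                f"{rec['detectionPeriodStartTime']} and {rec['detectionPeriodEndTime']}; "
                f"anomaly score {rec['anomalyScore']}."
            )
            alerts.append({"Subject": subject, "Message": body})
    return alerts


def notify(alerts: List[dict], publish: Callable[..., object]) -> int:
    """Send each alert through publish(Subject=..., Message=...); return the count."""
    for alert in alerts:
        publish(**alert)
    return len(alerts)
```

Filtering before publishing keeps subscribers from being paged for every flagged minute; where to set min_score is a business decision for AnyBooks, not something the detector determines.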

Conclusion

In this post, we showed how to combine AWS managed services and ML to identify anomalies in streaming time series data automatically and in real time. With this solution, you can uncover insights, optimize operations, and make better-informed decisions.

Try anomaly detection on AWS today and put these insights to work to improve operational efficiency across your business.


About the Authors

Noah Soprala, Dan Sinnreich, Syed Furqhan, and Nirmal Kumar wrote this post, drawing on their expertise in time series anomaly detection and machine learning on AWS.
