Time Series Anomaly Detection with PyCaret


Last updated 2 years ago

A step-by-step tutorial on unsupervised anomaly detection for time series data using PyCaret

PyCaret — An open-source, low-code machine learning library in Python

👉 Introduction

This is a step-by-step, beginner-friendly tutorial on detecting anomalies in time series data using PyCaret’s Unsupervised Anomaly Detection Module.

Learning Goals of this Tutorial

  • What is Anomaly Detection? Types of Anomaly Detection.

  • Anomaly Detection use-cases in business.

  • Training and evaluating an anomaly detection model using PyCaret.

  • Labeling anomalies and analyzing the results.

👉 PyCaret

PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating machine learning workflows. It is popular for its ease of use, simplicity, and ability to build and deploy end-to-end ML prototypes quickly and efficiently.

PyCaret is an alternate low-code library that can replace hundreds of lines of code with only a few. This makes the experiment cycle much faster and more efficient.

PyCaret is simple and easy to use. All the operations performed in PyCaret are sequentially stored in a Pipeline that is fully automated for **deployment**. Whether it’s imputing missing values, one-hot encoding, transforming categorical data, feature engineering, or even hyperparameter tuning, PyCaret automates all of it.

👉 Installing PyCaret

Installing PyCaret is very easy and takes only a few minutes. We strongly recommend using a virtual environment to avoid potential conflicts with other libraries.

# install slim version (default)
pip install pycaret

# install the full version
pip install pycaret[full]

👉 What is Anomaly Detection

Anomaly Detection is a technique used for identifying rare items, events, or observations that raise suspicions by differing significantly from the majority of the data.

Typically, the anomalous items will translate to some kind of problem such as:

  • bank fraud,

  • structural defect,

  • medical problem,

  • error, etc.

Anomaly detection algorithms can broadly be categorized into these groups:

**(a) Supervised:** Used when the data set has labels identifying which transactions are anomalous and which are normal (this is similar to a supervised classification problem).

**(b) Unsupervised:** There are no labels; the model is trained on the complete data and assumes that the majority of the instances are normal.

**(c) Semi-Supervised:** A model is trained on normal data only (without any anomalies). When the trained model is used on new data points, it can predict whether each new data point is normal or not (based on the distribution of the data in the trained model).
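To make the unsupervised case concrete, here is a minimal sketch that flags outliers without any labels by assuming most points are normal. It uses a robust z-score (median and median absolute deviation), which is a simple stand-in chosen for illustration and is not one of PyCaret's algorithms. The `flag_outliers` helper, the 3.5 cutoff, and the sample counts are all hypothetical.

```python
from statistics import median

def flag_outliers(values, cutoff=3.5):
    """Return 1 for suspected outliers and 0 for inliers (no labels needed)."""
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return [0] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [1 if abs(0.6745 * (v - center) / mad) > cutoff else 0 for v in values]

# Hypothetical hourly trip counts with one obvious spike.
trips = [100, 102, 98, 101, 99, 103, 97, 250, 100, 101]
print(flag_outliers(trips))
```

Only the spike of 250 is flagged here. Real models such as Isolation Forest follow the same contract (score every row, then label the extreme ones) but use far richer scoring.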

👉 PyCaret Anomaly Detection Module

👉 Dataset

import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv')

data['timestamp'] = pd.to_datetime(data['timestamp'])

data.head()

# create moving averages (48 half-hourly rows = 1 day, 336 rows = 1 week)
data['MA48'] = data['value'].rolling(48).mean()
data['MA336'] = data['value'].rolling(336).mean()

# plot
import plotly.express as px
fig = px.line(data, x="timestamp", y=['value', 'MA48', 'MA336'], title='NYC Taxi Trips', template = 'plotly_dark')
fig.show()
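The window sizes are easier to read once converted to time: the data arrives every 30 minutes, so 48 rows span one day and 336 rows span one week. The sketch below checks that arithmetic and shows, on made-up numbers, that a rolling mean is undefined until a full window is available (pandas reports those leading rows as NaN). The `rolling_mean` helper is a hypothetical pure-Python stand-in for `rolling(window).mean()`, not the pandas implementation.

```python
from datetime import timedelta

INTERVAL = timedelta(minutes=30)  # sampling interval of the taxi dataset

# Number of half-hourly rows in one day and in one week.
rows_per_day = timedelta(days=1) // INTERVAL
rows_per_week = timedelta(weeks=1) // INTERVAL
print(rows_per_day, rows_per_week)  # 48 336

def rolling_mean(values, window):
    """Trailing mean over `window` rows; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # pandas would show NaN here
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

print(rolling_mean([10, 20, 30, 40, 50], window=3))
```

This is why the MA336 line in the plot only starts about one week into the series.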

👉 Data Preparation

Since algorithms cannot directly consume date or timestamp data, we will extract the features from the timestamp and will drop the actual timestamp column before training models.

# drop moving-average columns
data.drop(['MA48', 'MA336'], axis=1, inplace=True)

# set timestamp to index
data.set_index('timestamp', drop=True, inplace=True)

# resample timeseries to hourly
data = data.resample('H').sum()

# create features from date
data['day'] = [i.day for i in data.index]
data['day_name'] = [i.day_name() for i in data.index]
data['day_of_year'] = [i.dayofyear for i in data.index]
data['week_of_year'] = [i.weekofyear for i in data.index]
data['hour'] = [i.hour for i in data.index]
# note: isoweekday() returns 1 (Monday) through 7 (Sunday), not a True/False flag
data['is_weekday'] = [i.isoweekday() for i in data.index]

data.head()
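To see what each derived column holds, the sketch below computes the same features for a single timestamp using only the standard library (pandas `Timestamp` objects subclass `datetime`, so the attribute semantics carry over). Note that `isoweekday()` returns 1 for Monday through 7 for Sunday, so the `is_weekday` column is really a day-of-week number. The `date_features` helper, the chosen timestamp, and the `true_weekday_flag` key are hypothetical and only illustrate what a real boolean flag could look like.

```python
from datetime import datetime

def date_features(ts):
    """Mirror the tutorial's timestamp features for one datetime."""
    return {
        "day": ts.day,
        "day_name": ts.strftime("%A"),              # pandas: ts.day_name()
        "day_of_year": ts.timetuple().tm_yday,      # pandas: ts.dayofyear
        "week_of_year": ts.isocalendar()[1],        # ISO week number
        "hour": ts.hour,
        "is_weekday": ts.isoweekday(),              # 1 = Monday ... 7 = Sunday
        "true_weekday_flag": ts.isoweekday() <= 5,  # hypothetical boolean variant
    }

features = date_features(datetime(2014, 7, 5, 13, 0))  # a Saturday afternoon
for name, value in features.items():
    print(name, value)
```

Because `is_weekday` takes seven distinct values, it behaves like a categorical day-of-week feature rather than a binary one.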

👉 Experiment Setup

# init setup
from pycaret.anomaly import *
s = setup(data, session_id = 123)

Whenever you initialize the setup function in PyCaret, it profiles the dataset and infers the data types for all input features. In this case, you can see that day_name and is_weekday are inferred as categorical and the remaining features as numeric. You can press enter to continue.

👉 Model Training

To check the list of all available algorithms:

# check list of available models
models()

# train model
iforest = create_model('iforest', fraction = 0.1)
iforest_results = assign_model(iforest)
iforest_results.head()

Notice that two new columns are appended: **Anomaly**, which contains 1 for an outlier and 0 for an inlier, and **Anomaly_Score**, a continuous value also known as the decision function (internally, the algorithm calculates the score based on which the anomaly is determined).
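One way to picture what `fraction = 0.1` does: the model scores every row, and roughly the top 10% of scores end up labeled as anomalies. The sketch below is a simplified illustration of that thresholding idea on made-up scores, assuming that higher scores mean more anomalous. It is not PyCaret's or the underlying library's actual code, and `label_top_fraction` is a hypothetical helper.

```python
def label_top_fraction(scores, fraction=0.1):
    """Label the highest-scoring `fraction` of rows as 1 (anomaly), the rest 0."""
    n_anomalies = round(len(scores) * fraction)
    if n_anomalies == 0:
        return [0] * len(scores)
    # The cutoff is the score of the n-th most anomalous row.
    threshold = sorted(scores, reverse=True)[n_anomalies - 1]
    # Rows tied with the cutoff are also flagged.
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical anomaly scores for ten rows.
scores = [-0.12, -0.08, 0.21, -0.15, -0.02, -0.11, 0.05, -0.09, -0.13, -0.10]
labels = label_top_fraction(scores, fraction=0.1)
print(labels)
```

The practical consequence is that `fraction` fixes how many rows get flagged, so it is worth choosing it based on how rare you expect anomalies to be.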

# check anomalies
iforest_results[iforest_results['Anomaly'] == 1].head()

We can now plot anomalies on the graph to visualize.

import plotly.graph_objects as go

# plot value on y-axis and date on x-axis
fig = px.line(iforest_results, x=iforest_results.index, y="value", title='NYC TAXI TRIPS - UNSUPERVISED ANOMALY DETECTION', template = 'plotly_dark')

# create list of outlier_dates
outlier_dates = iforest_results[iforest_results['Anomaly'] == 1].index

# obtain y value of anomalies to plot
y_values = [iforest_results.loc[i]['value'] for i in outlier_dates]

fig.add_trace(go.Scatter(x=outlier_dates, y=y_values, mode = 'markers',
                name = 'Anomaly',
                marker=dict(color='red',size=10)))

fig.show()

Notice that the model has picked several anomalies around Jan 1st, which falls right after New Year's Eve. The model has also detected a couple of anomalies around Jan 18 to Jan 22, which is when the North American blizzard (a fast-moving disruptive blizzard) moved through the Northeast, dumping 30 cm of snow in areas around New York City.

If you google the dates around the other red points on the graph, you will probably find leads on why the model flagged those points as anomalous (hopefully).
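To decide which dates are worth looking up, it can help to count how many flagged hours fall on each calendar day. Here is a minimal sketch on made-up timestamps; the `flagged` list is hypothetical, and in the tutorial the equivalent values would come from `outlier_dates`.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps of hours flagged as anomalies.
flagged = [
    datetime(2015, 1, 1, 0), datetime(2015, 1, 1, 1), datetime(2015, 1, 1, 2),
    datetime(2015, 1, 19, 5), datetime(2014, 11, 2, 1),
]

# Count flagged hours per calendar day, busiest days first.
per_day = Counter(ts.date() for ts in flagged)
for day, count in per_day.most_common():
    print(day.isoformat(), count)
```

Days with several flagged hours in a row are usually the most promising ones to investigate first.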

I hope you will appreciate the ease of use and simplicity of PyCaret. In just a few lines of code and a few minutes of experimentation, I have trained an unsupervised anomaly detection model and labeled the dataset to detect anomalies in time series data.

Installing PyCaret: PyCaret’s default installation is a slim version of pycaret that installs only the hard dependencies. When you install the full version of pycaret, all the optional dependencies are installed as well.

PyCaret Anomaly Detection Module: PyCaret’s **Anomaly Detection** Module is an unsupervised machine learning module that is used for identifying rare items, events, or observations. It provides over 15 algorithms and several plots to analyze the results of trained models.

Dataset: I am using the NYC taxi passengers dataset, which contains the number of taxi passengers from July 2014 to January 2015 at half-hourly intervals. The code in the Dataset section reads it directly from the Numenta Anomaly Benchmark (NAB) repository on GitHub.

Experiment Setup: Common to all modules in PyCaret, the setup function is the first and the only mandatory step to start any machine learning experiment in PyCaret. Besides performing some basic processing tasks by default, PyCaret also offers a wide array of pre-processing features. To learn more about all the preprocessing functionalities in PyCaret, see the PyCaret documentation.

Model Training: In this tutorial, I am using Isolation Forest, but you can replace the ID ‘iforest’ in the create_model call with any other model ID to change the algorithm.

Coming Soon!

Next week I will be writing a tutorial on training custom models in PyCaret using the PyCaret Regression Module. You can follow me on Medium, LinkedIn, and Twitter to get instant notifications whenever a new tutorial is released.

There is no limit to what you can achieve using this lightweight workflow automation library in Python. If you find this useful, please do not forget to give us ⭐️ on our GitHub repository.

To learn more about PyCaret, check out their GitHub. To hear more about PyCaret, follow us on LinkedIn and YouTube.

Join us on our Slack channel.

You may also be interested in:

  • Build your own AutoML in Power BI using PyCaret 2.0
  • Deploy Machine Learning Pipeline on Azure using Docker
  • Deploy Machine Learning Pipeline on Google Kubernetes Engine
  • Deploy Machine Learning Pipeline on AWS Fargate
  • Build and deploy your first machine learning web app
  • Deploy PyCaret and Streamlit app using AWS Fargate serverless
  • Build and deploy machine learning web app using PyCaret and Streamlit
  • Deploy Machine Learning App built using Streamlit and PyCaret on GKE

Important Links

  • Documentation
  • Blog
  • GitHub
  • StackOverflow
  • Install PyCaret
  • Notebook Tutorials
  • Contribute in PyCaret

Want to learn about a specific module?

Click on the links below to see the documentation and working examples.

  • Classification
  • Regression
  • Clustering
  • Anomaly Detection
  • Natural Language Processing
  • Association Rule Mining