Multiple Time Series Forecasting with PyCaret

A step-by-step tutorial to forecast multiple time series with PyCaret

PyCaret — An open-source, low-code machine learning library in Python

PyCaret

PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating machine learning workflows. It is incredibly popular for its ease of use, simplicity, and ability to build and deploy end-to-end ML prototypes quickly and efficiently.

PyCaret is an alternate low-code library that can replace hundreds of lines of code with only a few. This makes the experiment cycle much faster and more efficient.

PyCaret is simple and easy to use. All the operations performed in PyCaret are sequentially stored in a Pipeline that is fully automated for deployment. Whether it’s imputing missing values, one-hot encoding, transforming categorical data, feature engineering, or even hyperparameter tuning, PyCaret automates all of it.

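This tutorial assumes that you have some prior knowledge and experience with PyCaret. If you haven’t used it before, no problem: you can get a quick head start through these tutorials:

  • PyCaret 2.2 is here: what’s new
  • Announcing PyCaret 2.0
  • Five things you don’t know about PyCaret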

RECAP

Installing PyCaret

Installing PyCaret is very easy and takes only a few minutes. We strongly recommend using a virtual environment to avoid potential conflicts with other libraries.

# install slim version (default)
pip install pycaret

# install the full version
pip install pycaret[full]
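To confirm which version ended up in the active environment, a small check like the one below can help. It is a sketch that relies only on the Python standard library; the helper name `installed_version` is hypothetical and not part of PyCaret.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if PyCaret is installed in this environment, otherwise None.
print("pycaret:", installed_version("pycaret"))
```

Running this inside the virtual environment you installed into is a quick way to verify that the environment is the one being used.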

👉 PyCaret Regression Module

PyCaret Regression Module is a supervised machine learning module used for estimating the relationships between a dependent variable (often called the ‘outcome variable’, or ‘target’) and one or more independent variables (often called ‘features’, or ‘predictors’).

👉 Dataset

For this tutorial, I will show the end-to-end implementation of multiple time-series forecasting, including both training the models and predicting future values.

👉 Load and prepare the data

# read the csv file
import pandas as pd
data = pd.read_csv('train.csv')
data['date'] = pd.to_datetime(data['date'])

# combine store and item column as time_series
data['store'] = ['store_' + str(i) for i in data['store']]
data['item'] = ['item_' + str(i) for i in data['item']]
data['time_series'] = data[['store', 'item']].apply(lambda x: '_'.join(x), axis=1)
data.drop(['store', 'item'], axis=1, inplace=True)

# extract features from date
data['month'] = [i.month for i in data['date']]
data['year'] = [i.year for i in data['date']]
data['day_of_week'] = [i.dayofweek for i in data['date']]
data['day_of_year'] = [i.dayofyear for i in data['date']]

data.head()

# check the unique time_series
data['time_series'].nunique()
>>> 500
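The count of 500 follows directly from how the key is built: every store is paired with every item. Here is a minimal standard-library sketch of that pairing, assuming stores are numbered 1 to 10 and items 1 to 50 (the numbering is an assumption about the dataset, not something the code above shows):

```python
from itertools import product

stores = range(1, 11)  # assumed store ids 1..10
items = range(1, 51)   # assumed item ids 1..50

# Same naming scheme as the time_series column: "store_<s>_item_<i>"
keys = [f"store_{s}_item_{i}" for s, i in product(stores, items)]

print(len(keys))  # 500
print(keys[0])    # store_1_item_1
```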

👉 Visualize time-series

# plot multiple time series with moving avgs in a loop

import plotly.express as px

for i in data['time_series'].unique():
    # copy the slice so that adding a column does not trigger SettingWithCopyWarning
    subset = data[data['time_series'] == i].copy()
    subset['moving_average'] = subset['sales'].rolling(30).mean()
    fig = px.line(subset, x="date", y=["sales","moving_average"], title = i, template = 'plotly_dark')
    fig.show()
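Because rolling(30) needs 30 observations before it can produce a value, the first 29 points of each moving-average line are missing. The sketch below reproduces that behaviour in plain Python with a smaller window; `rolling_mean` is a hypothetical helper written for illustration, not a pandas or PyCaret function.

```python
def rolling_mean(values, window):
    """Trailing moving average; positions without a full window yield None."""
    out = []
    running = 0.0
    for idx, value in enumerate(values):
        running += value
        if idx >= window:
            running -= values[idx - window]  # drop the value that left the window
        out.append(running / window if idx >= window - 1 else None)
    return out


sales = [10, 20, 30, 40, 50]
print(rolling_mean(sales, window=3))  # [None, None, 20.0, 30.0, 40.0]
```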

👉 Start the training process

Now that we have the data ready, let’s start the training loop. Notice that verbose = False is set in all functions to avoid printing results on the console while training.

The training code is a loop around the time_series column we created during the data preparation step. There are a total of 500 time series (10 stores x 50 items).

Inside the loop, the dataset is first filtered for a single value of the time_series variable. The next part initializes the setup function, followed by compare_models to find the best model. The results are then captured and the performance metrics of the best model are appended to a list called all_results. The last part of the code uses the finalize_model function to retrain the best model on the entire dataset, including the 5% left in the test set, and saves the entire pipeline including the model as a pickle file.
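The training listing itself is not reproduced on this page. What follows is a minimal sketch consistent with the description, not the original listing. It uses setup, compare_models, pull, finalize_model and save_model from pycaret.regression. The function name `train_all_series`, the `model_dir` argument, train_size=0.95 (inferred from the 5% test set), the time-series fold settings, sorting by MAE, and session_id are all assumptions; depending on the installed PyCaret version, setup may accept or require additional arguments.

```python
import os


def train_all_series(data, model_dir="trained_models"):
    """Train, select and save one regression model per value of data['time_series'].

    `data` is expected to be the prepared DataFrame from the earlier steps.
    Returns a list with one single-row results DataFrame per series.
    """
    # Imported lazily so this function can be defined without PyCaret installed.
    from pycaret.regression import (
        setup, compare_models, pull, finalize_model, save_model,
    )

    os.makedirs(model_dir, exist_ok=True)
    all_results = []

    for ts in data["time_series"].unique():
        # keep only the rows of the current series
        df_subset = data[data["time_series"] == ts]

        # initialize the experiment; the last 5% of rows form the hold-out set
        # (all settings below are assumptions, see the note above)
        setup(
            data=df_subset,
            target="sales",
            train_size=0.95,
            data_split_shuffle=False,
            fold_strategy="timeseries",
            fold=3,
            ignore_features=["date", "time_series"],
            session_id=123,
            verbose=False,
        )

        # train the available estimators and keep the best one by MAE (assumed metric)
        best_model = compare_models(sort="MAE", verbose=False)

        # first row of the comparison grid holds the metrics of the best model
        best_row = pull().iloc[0:1].copy()
        best_row["time_series"] = str(ts)
        all_results.append(best_row)

        # refit on all rows, including the hold-out set, then save pipeline and model
        final_model = finalize_model(best_model)
        save_model(final_model, os.path.join(model_dir, str(ts)), verbose=False)

    return all_results
```

With the prepared data in memory, `all_results = train_all_series(data)` would produce the list that is concatenated in the next step.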

We can now create a data frame from all_results list. It will display the best model selected for each time series.

concat_results = pd.concat(all_results,axis=0)
concat_results.head()

Training Process 👇

👉 Generate predictions using trained models

Now that we have trained models, let’s use them to generate predictions, but first, we need to create the dataset for scoring (X variables).

# create a date range from 2013 to 2019
all_dates = pd.date_range(start='2013-01-01', end = '2019-12-31', freq = 'D')

# create empty dataframe
score_df = pd.DataFrame()

# add columns to dataset
score_df['date'] = all_dates
score_df['month'] = [i.month for i in score_df['date']]
score_df['year'] = [i.year for i in score_df['date']]
score_df['day_of_week'] = [i.dayofweek for i in score_df['date']]
score_df['day_of_year'] = [i.dayofyear for i in score_df['date']]

score_df.head()
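The scoring range deliberately extends beyond the training data: 2013 to 2017 overlaps the history, while 2018 and 2019 are dates with no observed sales. Below is a small standard-library sketch of the same grid, useful for checking row counts and feature values without pandas (the variable names are illustrative only):

```python
from datetime import date, timedelta

start, end = date(2013, 1, 1), date(2019, 12, 31)
n_days = (end - start).days + 1
all_dates = [start + timedelta(days=k) for k in range(n_days)]

# Same four features as score_df; weekday() is Monday=0, like pandas' dayofweek.
rows = [
    {
        "date": d,
        "month": d.month,
        "year": d.year,
        "day_of_week": d.weekday(),
        "day_of_year": d.timetuple().tm_yday,
    }
    for d in all_dates
]

# Dates after the training period, for which only predictions will exist.
future = [r for r in rows if r["year"] >= 2018]
print(len(rows), len(future))  # 2556 730
```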

Now let’s create a loop to load the trained pipelines and use the predict_model function to generate prediction labels.

from tqdm import tqdm
from pycaret.regression import load_model, predict_model

all_score_df = []

for i in tqdm(data['time_series'].unique()):
    # load the saved pipeline for this series and score every date in score_df
    l = load_model('trained_models/' + str(i), verbose=False)
    p = predict_model(l, data=score_df)
    p['time_series'] = i
    all_score_df.append(p)

concat_df = pd.concat(all_score_df, axis=0)
concat_df.head()

We will now join data and concat_df.

final_df = pd.merge(concat_df, data, how = 'left', on = ['date', 'time_series'])
final_df.head()

We can now create a loop to see the plots (the first five time series here).

import plotly.express as px

# 'Label' is the prediction column in PyCaret 2.x; PyCaret 3.x names it 'prediction_label'
for i in final_df['time_series'].unique()[:5]:
    sub_df = final_df[final_df['time_series'] == i]
    fig = px.line(sub_df, x="date", y=['sales', 'Label'], title=i, template = 'plotly_dark')
    fig.show()

I hope that you will appreciate the ease of use and simplicity in PyCaret. In less than 50 lines of code and one hour of experimentation, I have trained over 10,000 models (25 estimators x 500 time series) and productionalized 500 best models to generate predictions.

Coming Soon!

There is no limit to what you can achieve using this lightweight workflow automation library in Python. If you find this useful, please do not forget to give us ⭐️ on our GitHub repository.

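You may also be interested in:

  • Build your own AutoML in Power BI using PyCaret 2.0
  • Deploy Machine Learning Pipeline on Azure using Docker
  • Deploy Machine Learning Pipeline on Google Kubernetes Engine
  • Deploy Machine Learning Pipeline on AWS Fargate
  • Build and deploy your first machine learning web app
  • Deploy PyCaret and Streamlit app using AWS Fargate serverless
  • Build and deploy machine learning web app using PyCaret and Streamlit
  • Deploy Machine Learning App built using Streamlit and PyCaret on GKE

Important Links

  • Documentation
  • Blog
  • GitHub
  • StackOverflow
  • Install PyCaret
  • Notebook Tutorials
  • Contribute in PyCaret

Want to learn about a specific module?

Click on the links below to see the documentation and working examples.

  • Classification
  • Regression
  • Clustering
  • Anomaly Detection
  • Natural Language Processing
  • Association Rule Mining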

In my last tutorial, I demonstrated how you can use PyCaret to forecast time-series data using machine learning through the PyCaret Regression Module. If you haven’t read it yet, you can read the Time Series Forecasting with PyCaret Regression Module tutorial before continuing with this one, as this tutorial builds upon some important concepts covered there.

PyCaret’s default installation is a slim version of pycaret that installs only the hard dependencies listed here.

When you install the full version of pycaret, all the optional dependencies listed here are also installed.

The objective of regression is to predict continuous values such as sales amount, quantity, temperature, number of customers, etc. All modules in PyCaret provide many pre-processing features to prepare the data for modeling through the setup function. The Regression Module has over 25 ready-to-use algorithms and several plots to analyze the performance of trained models.

I have used the Store Item Demand Forecasting Challenge dataset from Kaggle. This dataset has 10 different stores and each store has 50 items, i.e. a total of 500 daily-level time series covering five years (2013–2017).

[Figures: sample dataset; sample rows from data; store_1_item_1 and store_2_item_1 time series with 30-day moving average; sample rows from concat_results; training process; sample rows from score_df, concat_df, and final_df; store_1_item_1 and store_2_item_1 actual sales and predicted labels]

Next week I will be writing a tutorial on unsupervised anomaly detection on time-series data using the PyCaret Anomaly Detection Module. Please follow me on Medium, LinkedIn, and Twitter to get more updates.

To hear more about PyCaret, follow us on LinkedIn and Youtube.

Join us on our Slack channel. Invite link here.
