Quickstart

Get Up and Running in No Time: A Beginner's Guide to PyCaret


🚀 Classification

PyCaret’s Classification Module is a supervised machine learning module that is used for classifying elements into groups.

The goal is to predict categorical class labels, which are discrete and unordered. Common use cases include predicting customer default (yes or no), predicting customer churn (the customer will leave or stay), and detecting a disease (positive or negative).

This module can be used for binary or multiclass problems.

Setup

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function. It takes two required parameters: data and target. All the other parameters are optional.

# load sample dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

PyCaret 3.0 has two APIs. You can choose either one based on your preference; the functionality and experiment results are consistent.

Functional API

from pycaret.classification import *
s = setup(data, target = 'Class variable', session_id = 123)

OOP API

from pycaret.classification import ClassificationExperiment
s = ClassificationExperiment()
s.setup(data, target = 'Class variable', session_id = 123)
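
Whichever API you use, setup stores the fitted pipeline and the train/test split, which can be inspected afterwards. A minimal sketch using get_config (s.get_config in the OOP API), assuming the session above:

# functional API: inspect the training features created by setup
X_train = get_config('X_train')
X_train.head()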

Compare Models

This function trains and evaluates the performance of all the estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

# functional API
best = compare_models()

# OOP API
best = s.compare_models()
print(best)
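
A minimal sketch of the metric functions mentioned above, assuming the functional API session; the metric id and name are illustrative, and target = 'pred_proba' feeds probabilities to the scorer:

# list the metrics evaluated during cross-validation
get_metrics()

# add log loss as a custom CV metric (requires scikit-learn)
from sklearn.metrics import log_loss
add_metric('logloss', 'Log Loss', log_loss, greater_is_better = False, target = 'pred_proba')

# remove it again by its id
remove_metric('logloss')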

Analyze Model

This function analyzes the performance of a trained model on the test set. It may require re-training the model in certain cases.

# functional API
evaluate_model(best)

# OOP API
s.evaluate_model(best)

evaluate_model can only be used in a Notebook since it uses ipywidgets. You can also use the plot_model function to generate plots individually.

# functional API
plot_model(best, plot = 'auc')

# OOP API
s.plot_model(best, plot = 'auc')
# functional API
plot_model(best, plot = 'confusion_matrix')

# OOP API
s.plot_model(best, plot = 'confusion_matrix')
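
plot_model can also write the figure to a file instead of rendering it, which is handy outside a Notebook; a small sketch, assuming the same session:

# functional API: save the AUC plot to the current working directory
plot_model(best, plot = 'auc', save = True)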

Predictions

This function scores the data and returns prediction_label and prediction_score (the probability of the predicted class). When data is None, it predicts the label and score on the test set (created during the setup function).

# functional API
predict_model(best)

# OOP API
s.predict_model(best)

The evaluation metrics are calculated on the test set. The second output is the pd.DataFrame with predictions on the test set (see the last two columns). To generate labels on an unseen (new) dataset, simply pass the dataset to the data parameter of the predict_model function.

# functional API
predictions = predict_model(best, data=data)
predictions.head()

# OOP API
predictions = s.predict_model(best, data=data)
predictions.head()

The score is the probability of the predicted class (NOT the positive class). If prediction_label is 0 and prediction_score is 0.90, this means a 90% probability of class 0. If you want to see the probabilities of both classes, simply pass raw_score=True to the predict_model function.

# functional API
predictions = predict_model(best, data=data, raw_score=True)
predictions.head()

# OOP API
predictions = s.predict_model(best, data=data, raw_score=True)
predictions.head()

Save the model

# functional API
save_model(best, 'my_best_pipeline')

# OOP API
s.save_model(best, 'my_best_pipeline')

To load the model back into the environment:

# functional API
loaded_model = load_model('my_best_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('my_best_pipeline')
print(loaded_model)
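
Because the saved object is the entire pipeline, the loaded model can score raw data directly; a minimal sketch, assuming the same session:

# functional API: score new data with the loaded pipeline
new_predictions = predict_model(loaded_model, data = data)
new_predictions.head()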

🚀 Regression

PyCaret’s Regression Module is a supervised machine learning module that is used for estimating the relationships between a dependent variable (often called the ‘outcome variable’, or ‘target’) and one or more independent variables (often called ‘features’, ‘predictors’, or ‘covariates’).

The objective of regression is to predict continuous values such as sales amount, quantity, temperature, and so on.

Setup

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function. It takes two required parameters: data and target. All the other parameters are optional.

# load sample dataset
from pycaret.datasets import get_data
data = get_data('insurance')

PyCaret 3.0 has two APIs. You can choose either one based on your preference; the functionality and experiment results are consistent.

Functional API

from pycaret.regression import *
s = setup(data, target = 'charges', session_id = 123)

OOP API

from pycaret.regression import RegressionExperiment
s = RegressionExperiment()
s.setup(data, target = 'charges', session_id = 123)

Compare Models

This function trains and evaluates the performance of all the estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

# functional API
best = compare_models()

# OOP API
best = s.compare_models()
print(best)
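
compare_models can also be restricted to a subset of estimators and ranked by a metric of your choice; a short sketch, assuming the same session:

# functional API: compare selected estimators only, ranked by MAE
best = compare_models(include = ['lr', 'ridge', 'rf'], sort = 'MAE')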

Analyze Model

This function analyzes the performance of a trained model on the test set. It may require re-training the model in certain cases.

# functional API
evaluate_model(best)

# OOP API
s.evaluate_model(best)

evaluate_model can only be used in a Notebook since it uses ipywidgets. You can also use the plot_model function to generate plots individually.

# functional API
plot_model(best, plot = 'residuals')

# OOP API
s.plot_model(best, plot = 'residuals')
# functional API
plot_model(best, plot = 'feature')

# OOP API
s.plot_model(best, plot = 'feature')

Predictions

This function predicts prediction_label using the trained model. When data is None, it predicts the label on the test set (created during the setup function).

# functional API
predict_model(best)

# OOP API
s.predict_model(best)

The evaluation metrics are calculated on the test set. The second output is the pd.DataFrame with predictions on the test set (see the last two columns). To generate labels on an unseen (new) dataset, simply pass the dataset to the data parameter of the predict_model function.

# functional API
predictions = predict_model(best, data=data)
predictions.head()

# OOP API
predictions = s.predict_model(best, data=data)
predictions.head()

Save the model

# functional API
save_model(best, 'my_best_pipeline')

# OOP API
s.save_model(best, 'my_best_pipeline')

To load the model back into the environment:

# functional API
loaded_model = load_model('my_best_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('my_best_pipeline')
print(loaded_model)

🚀 Clustering

PyCaret’s Clustering Module is an unsupervised machine learning module that performs the task of grouping a set of objects in such a way that objects in the same group (also known as a cluster) are more similar to each other than to those in other groups.

Setup

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function. It takes only one required parameter: data. All the other parameters are optional.

# load sample dataset
from pycaret.datasets import get_data
data = get_data('jewellery')

PyCaret 3.0 has two APIs. You can choose either one based on your preference; the functionality and experiment results are consistent.

Functional API

from pycaret.clustering import *
s = setup(data, normalize = True)

OOP API

from pycaret.clustering import ClusteringExperiment
s = ClusteringExperiment()
s.setup(data, normalize = True)

Create Model

This function trains and evaluates the performance of a given model. Metrics evaluated can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions. All the available models can be accessed using the models function.

# functional API
kmeans = create_model('kmeans')

# OOP API
kmeans = s.create_model('kmeans')
print(kmeans)
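
A short sketch of the related calls mentioned above, assuming the same session: models lists the available estimator ids, and create_model accepts the number of clusters:

# functional API: list available clustering models
models()

# train k-means with a specific number of clusters
kmeans = create_model('kmeans', num_clusters = 4)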

Analyze Model

This function analyzes the performance of a trained model.

# functional API
evaluate_model(kmeans)

# OOP API
s.evaluate_model(kmeans)

evaluate_model can only be used in a Notebook since it uses ipywidgets. You can also use the plot_model function to generate plots individually.

# functional API
plot_model(kmeans, plot = 'elbow')

# OOP API
s.plot_model(kmeans, plot = 'elbow')
# functional API
plot_model(kmeans, plot = 'silhouette')

# OOP API
s.plot_model(kmeans, plot = 'silhouette')

Assign Model

This function assigns cluster labels to the training data, given a trained model.

# functional API
result = assign_model(kmeans)
result.head()

# OOP API
result = s.assign_model(kmeans)
result.head()
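
The returned DataFrame appends a Cluster label column, so cluster sizes can be inspected with plain pandas; a small sketch, assuming the column name produced by assign_model:

# count the observations assigned to each cluster
result['Cluster'].value_counts()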

Predictions

This function generates cluster labels using a trained model on a new/unseen dataset.

# functional API
predictions = predict_model(kmeans, data = data)
predictions.head()

# OOP API
predictions = s.predict_model(kmeans, data = data)
predictions.head()

Save the model

# functional API
save_model(kmeans, 'kmeans_pipeline')

# OOP API
s.save_model(kmeans, 'kmeans_pipeline')

To load the model back into the environment:

# functional API
loaded_model = load_model('kmeans_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('kmeans_pipeline')
print(loaded_model)

🚀 Anomaly Detection

PyCaret’s Anomaly Detection Module is an unsupervised machine learning module that is used for identifying rare items, events, or observations that raise suspicions by differing significantly from the majority of the data.

Typically, the anomalous items translate to some kind of problem such as bank fraud, a structural defect, a medical problem, or an error.

Setup

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function. It takes only one required parameter: data. All the other parameters are optional.

# load sample dataset
from pycaret.datasets import get_data
data = get_data('anomaly')

PyCaret 3.0 has two APIs. You can choose either one based on your preference; the functionality and experiment results are consistent.

Functional API

from pycaret.anomaly import *
s = setup(data, session_id = 123)

OOP API

from pycaret.anomaly import AnomalyExperiment
s = AnomalyExperiment()
s.setup(data, session_id = 123)

Create Model

This function trains an unsupervised anomaly detection model. All the available models can be accessed using the models function.

# functional API
iforest = create_model('iforest')
print(iforest)

# OOP API
iforest = s.create_model('iforest')
print(iforest)
# functional API
models()

# OOP API
s.models()
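
create_model also accepts the expected proportion of outliers in the data through the fraction parameter; a small sketch, assuming the same session:

# functional API: assume roughly 10% of rows are anomalous
iforest = create_model('iforest', fraction = 0.1)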

Analyze Model

You can use the plot_model function to analyze the performance of a trained model.

# functional API
plot_model(iforest, plot = 'tsne')

# OOP API
s.plot_model(iforest, plot = 'tsne')
# functional API
plot_model(iforest, plot = 'umap')

# OOP API
s.plot_model(iforest, plot = 'umap')

Assign Model

This function assigns anomaly labels to the dataset, given a trained model (1 = outlier, 0 = inlier).

# functional API
result = assign_model(iforest)
result.head()

# OOP API
result = s.assign_model(iforest)
result.head()
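
Since the returned DataFrame appends Anomaly and Anomaly_Score columns, the flagged rows can be filtered with plain pandas; a small sketch:

# keep only the rows labeled as outliers
outliers = result[result['Anomaly'] == 1]
outliers.head()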

Predictions

This function generates anomaly labels using a trained model on a new/unseen dataset.

# functional API
predictions = predict_model(iforest, data = data)
predictions.head()

# OOP API
predictions = s.predict_model(iforest, data = data)
predictions.head()

Save the model

# functional API
save_model(iforest, 'iforest_pipeline')

# OOP API
s.save_model(iforest, 'iforest_pipeline')

To load the model back into the environment:

# functional API
loaded_model = load_model('iforest_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('iforest_pipeline')
print(loaded_model)

🚀 Time Series

PyCaret's Time Series module is a powerful tool for analyzing and predicting time series data using machine learning and classical statistical techniques. This module enables users to easily perform complex time series forecasting tasks by automating the entire process from data preparation to model deployment.

PyCaret's Time Series Forecasting module supports a wide range of forecasting methods such as ARIMA, Prophet, and LSTM. It also provides various features for handling missing values, performing time series decomposition, and visualizing data.

Setup

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function.

# load sample dataset
from pycaret.datasets import get_data
data = get_data('airline')

PyCaret 3.0 has two APIs. You can choose either one based on your preference; the functionality and experiment results are consistent.

Functional API

from pycaret.time_series import *
s = setup(data, fh = 3, fold = 5, session_id = 123)

OOP API

from pycaret.time_series import TSForecastingExperiment
s = TSForecastingExperiment()
s.setup(data, fh = 3, fold = 5, session_id = 123)

Compare Models

This function trains and evaluates the performance of all the estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

# functional API
best = compare_models()

# OOP API
best = s.compare_models()
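
compare_models can also return more than one model; a short sketch, assuming the same session:

# functional API: keep the top 3 models as a list
best3 = compare_models(n_select = 3)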

Analyze Model

You can use the plot_model function to analyze the performance of a trained model.

# functional API
plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 24})

# OOP API
s.plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 24})
# functional API
plot_model(best, plot = 'diagnostics')

# OOP API
s.plot_model(best, plot = 'diagnostics')
# functional API
plot_model(best, plot = 'insample')

# OOP API
s.plot_model(best, plot = 'insample')

Predictions

The finalize_model function refits the model on the entire dataset, including the hold-out set, before forecasting future periods with predict_model.

# functional API
final_best = finalize_model(best)
predict_model(final_best, fh = 24)

# OOP API
final_best = s.finalize_model(best)
s.predict_model(final_best, fh = 24)
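
predict_model can also return prediction intervals around the point forecast; a minimal sketch, assuming the return_pred_int parameter of the time series module:

# functional API: include lower/upper prediction intervals
predict_model(final_best, fh = 24, return_pred_int = True)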

Save the model

# functional API
save_model(final_best, 'my_final_best_model')

# OOP API
s.save_model(final_best, 'my_final_best_model')

To load the model back into the environment:

# functional API
loaded_model = load_model('my_final_best_model')
print(loaded_model)

# OOP API
loaded_model = s.load_model('my_final_best_model')
print(loaded_model)