Analyze

Analysis and model explainability functions in PyCaret


plot_model

This function analyzes the performance of a trained model on the hold-out set. It may require re-training the model in certain cases. PyCaret uses Yellowbrick for most of its plotting work; any argument that is acceptable to the Yellowbrick visualizers can be passed via the plot_kwargs parameter.

Example

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc')

Change the scale

The resolution scale of the figure can be changed with the scale parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', scale = 3)

Save the plot

You can save the plot as a PNG file using the save parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', save = True)
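In PyCaret 3, the save parameter should also accept a path string, in which case the file is written to that directory instead of the current working directory. The folder name below is just an example; treat the exact behavior as version-dependent.

# save plot to a folder of your choice (path support assumed)
plot_model(lr, plot = 'auc', save = 'my_plots')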

Customize the plot

Plots can be customized by passing visualizer arguments through the plot_kwargs parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'confusion_matrix', plot_kwargs = {'percent' : True})

Use train data

If you want to assess the plot on the training data instead of the hold-out set, you can pass use_train_data=True to the plot_model function.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', use_train_data = True)


Examples by module

Classification

| Plot Name | Plot |
| --- | --- |
| Area Under the Curve | 'auc' |
| Discrimination Threshold | 'threshold' |
| Precision Recall Curve | 'pr' |
| Confusion Matrix | 'confusion_matrix' |
| Class Prediction Error | 'error' |
| Classification Report | 'class_report' |
| Decision Boundary | 'boundary' |
| Recursive Feature Selection | 'rfe' |
| Learning Curve | 'learning' |
| Manifold Learning | 'manifold' |
| Calibration Curve | 'calibration' |
| Validation Curve | 'vc' |
| Dimension Learning | 'dimension' |
| Feature Importance (top 10) | 'feature' |
| Feature Importance (all) | 'feature_all' |
| Model Hyperparameter | 'parameter' |
| Lift Curve | 'lift' |
| Gain Curve | 'gain' |
| KS Statistic Plot | 'ks' |

Regression

| Plot Name | Plot |
| --- | --- |
| Residuals Plot | 'residuals' |
| Prediction Error Plot | 'error' |
| Cook's Distance Plot | 'cooks' |
| Recursive Feature Selection | 'rfe' |
| Learning Curve | 'learning' |
| Validation Curve | 'vc' |
| Manifold Learning | 'manifold' |
| Feature Importance (top 10) | 'feature' |
| Feature Importance (all) | 'feature_all' |
| Model Hyperparameter | 'parameter' |

Clustering

| Plot Name | Plot |
| --- | --- |
| Cluster PCA Plot (2d) | 'cluster' |
| Cluster t-SNE (3d) | 'tsne' |
| Elbow Plot | 'elbow' |
| Silhouette Plot | 'silhouette' |
| Distance Plot | 'distance' |
| Distribution Plot | 'distribution' |

Anomaly Detection

| Plot Name | Plot |
| --- | --- |
| t-SNE (3d) Dimension Plot | 'tsne' |
| UMAP Dimensionality Plot | 'umap' |

evaluate_model

The evaluate_model function displays a user interface for analyzing the performance of a trained model. It calls the plot_model function internally.

# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice,  target = 'Purchase')

# create model
lr = create_model('lr')

# launch evaluate widget
evaluate_model(lr)

NOTE: This function only works in Jupyter Notebook or an equivalent environment.
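Because evaluate_model calls plot_model internally, a rough workaround outside a notebook is to save the individual plots yourself. A minimal sketch, reusing the lr model from above:

# approximate the evaluate widget by saving selected plots to disk
for p in ['auc', 'confusion_matrix', 'feature']:
    plot_model(lr, plot = p, save = True)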

interpret_model

This function analyzes the predictions generated from a trained model. Most plots in this function are based on SHAP (SHapley Additive exPlanations). For more information, see https://shap.readthedocs.io/en/latest/.

Example

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost)

Save the plot

You can save the plot as a PNG file using the save parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, save = True)

NOTE: When save=True, no plot is displayed in the Notebook.

Change plot type

Several plot types are available and can be selected with the plot parameter.

Correlation

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'correlation')

By default, PyCaret uses the first feature in the dataset, but that can be changed using the feature parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'correlation', feature = 'Age (years)')

Partial Dependence Plot

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pdp')

By default, PyCaret uses the first available feature in the dataset, but this can be changed using the feature parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pdp', feature = 'Age (years)')

Morris Sensitivity Analysis

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'msa')

Permutation Feature Importance

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pfi')

Reason Plot

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'reason')

When you generate a reason plot without passing a specific index of the test data, you get an interactive plot that lets you select the x and y axes. This is only possible in Jupyter Notebook or an equivalent environment. If you want to see the plot for a specific observation, pass its index to the observation parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'reason', observation = 1)

Here, observation = 1 refers to index 1 of the test set.
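If you want to inspect the row being explained, you can pull the hold-out data from the experiment. A minimal sketch, assuming the get_config function with the 'X_test' key:

# look up the test-set row that observation = 1 refers to
X_test = get_config('X_test')
print(X_test.iloc[[1]])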

Use train data

By default, all plots are generated on the test dataset. If you want to generate plots using the training data instead (not recommended), you can use the use_train_data parameter.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, use_train_data = True)

dashboard

The dashboard function generates an interactive dashboard for a trained model. The dashboard is implemented using ExplainerDashboard (explainerdashboard.readthedocs.io).

Dashboard Example

# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice,  target = 'Purchase')

# train model
lr = create_model('lr')

# launch dashboard
dashboard(lr)
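How the dashboard is rendered can also be controlled; this sketch assumes the display_format parameter (default 'dash'), which may vary by version:

# render the dashboard inline in the notebook instead of a separate app
dashboard(lr, display_format = 'inline')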


check_fairness

There are many approaches to conceptualizing fairness. The check_fairness function follows the approach known as group fairness, which asks which groups of individuals are at risk of experiencing harm. check_fairness provides fairness-related metrics between different groups (also called subpopulations).

Check Fairness Example

# load dataset
from pycaret.datasets import get_data
income = get_data('income')

# init setup
from pycaret.classification import *
exp_name = setup(data = income,  target = 'income >50K')

# train model
lr = create_model('lr')

# check model fairness
lr_fairness = check_fairness(lr, sensitive_features = ['sex', 'race'])


get_leaderboard

This function returns the leaderboard of all models trained in the current setup.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# compare models
top3 = compare_models(n_select = 3)

# tune top 3 models
tuned_top3 = [tune_model(i) for i in top3]

# ensemble top 3 tuned models
ensembled_top3 = [ensemble_model(i) for i in tuned_top3]

# blender
blender = blend_models(tuned_top3)

# stacker
stacker = stack_models(tuned_top3)

# check leaderboard
get_leaderboard()

You can also access the trained Pipeline of each model through the leaderboard.

# check leaderboard
lb = get_leaderboard()

# select top model
lb.iloc[0]['Model']
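The retrieved Pipeline behaves like any other trained model, so it can be saved or used for scoring with the standard PyCaret functions. The file name below is just an example:

# reuse the top pipeline from the leaderboard
best = lb.iloc[0]['Model']

# persist to disk
save_model(best, 'best_pipeline')

# generate predictions
predictions = predict_model(best, data = diabetes)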

assign_model

This function assigns labels to the training dataset using the trained model. It is available for the Clustering, Anomaly Detection, and NLP modules.

Clustering

# load dataset
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

# init setup
from pycaret.clustering import *
clu1 = setup(data = jewellery)

# train a model
kmeans = create_model('kmeans')

# assign model
assign_model(kmeans)
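The returned dataframe is the training data with an appended label column. A minimal sketch of summarizing the assignments, assuming the default Cluster column name:

# count observations per cluster
results = assign_model(kmeans)
print(results['Cluster'].value_counts())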

Anomaly Detection

# load dataset
from pycaret.datasets import get_data
anomaly = get_data('anomaly')

# init setup
from pycaret.anomaly import *
ano1 = setup(data = anomaly)

# train a model
iforest = create_model('iforest')

# assign model
assign_model(iforest)
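Here the output typically carries a binary label and a score column (commonly Anomaly and Anomaly_Score; treat the names as version-dependent):

# filter the rows flagged as anomalies
results = assign_model(iforest)
print(results[results['Anomaly'] == 1].head())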
