
Analyze

Analysis and model explainability functions in PyCaret

plot_model

This function analyzes the performance of a trained model on the hold-out set. It may require re-training the model in certain cases.

Example

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc')
Output from plot_model(lr, plot = 'auc')

Change the scale

The resolution scale of the figure can be changed with the scale parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', scale = 3)
Output from plot_model(lr, plot = 'auc', scale = 3)

Save the plot

You can save the plot as a PNG file using the save parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', save = True)
Output from plot_model(lr, plot = 'auc', save = True)

Customize the plot

PyCaret uses Yellowbrick for most of its plotting work. Any argument that is acceptable to Yellowbrick visualizers can be passed through the plot_kwargs parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'confusion_matrix', plot_kwargs = {'percent' : True})
Output from plot_model(lr, plot = 'confusion_matrix', plot_kwargs = {'percent' : True})
Before Customization
After Customization

Use train data

If you want to assess the model plot on the training data, you can pass use_train_data = True to the plot_model function.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', use_train_data = True)
Output from plot_model(lr, plot = 'auc', use_train_data = True)

Plot on train data vs. hold-out data

Train Data
Hold-out Data

Examples by module

Classification

The following plots are available in the Classification module (pass the string to the plot parameter):

Area Under the Curve: 'auc'
Discrimination Threshold: 'threshold'
Precision Recall Curve: 'pr'
Confusion Matrix: 'confusion_matrix'
Class Prediction Error: 'error'
Classification Report: 'class_report'
Decision Boundary: 'boundary'
Recursive Feature Selection: 'rfe'
Learning Curve: 'learning'
Manifold Learning: 'manifold'
Calibration Curve: 'calibration'
Validation Curve: 'vc'
Dimension Learning: 'dimension'
Feature Importance (top 10): 'feature'
Feature Importance (all): 'feature_all'
Model Hyperparameter: 'parameter'
Lift Curve: 'lift'
Gain Curve: 'gain'
KS Statistic Plot: 'ks'
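If you want to generate several of these plots at once, you can loop over the plot strings from the table above. A minimal sketch, assuming the diabetes experiment and the lr model from the examples above are still active:

# generate and save several classification plots in one go
for p in ['auc', 'confusion_matrix', 'pr', 'class_report']:
    plot_model(lr, plot = p, save = True)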

Regression

The following plots are available in the Regression module:

Residuals Plot: 'residuals'
Prediction Error Plot: 'error'
Cooks Distance Plot: 'cooks'
Recursive Feature Selection: 'rfe'
Learning Curve: 'learning'
Validation Curve: 'vc'
Manifold Learning: 'manifold'
Feature Importance (top 10): 'feature'
Feature Importance (all): 'feature_all'
Model Hyperparameter: 'parameter'
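The same API works in the Regression module. A minimal sketch using the insurance dataset (this dataset and target are illustrative, not part of the examples above):

# load dataset
from pycaret.datasets import get_data
insurance = get_data('insurance')

# init setup
from pycaret.regression import *
reg1 = setup(data = insurance, target = 'charges')

# train a model
lr = create_model('lr')

# residuals plot
plot_model(lr, plot = 'residuals')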

Clustering

The following plots are available in the Clustering module:

Cluster PCA Plot (2d): 'cluster'
Cluster t-SNE (3d): 'tsne'
Elbow Plot: 'elbow'
Silhouette Plot: 'silhouette'
Distance Plot: 'distance'
Distribution Plot: 'distribution'
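As a sketch, the clustering plots above are generated the same way. This reuses the jewellery dataset shown later in the assign_model section:

# load dataset
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

# init setup
from pycaret.clustering import *
clu1 = setup(data = jewellery)

# train a model
kmeans = create_model('kmeans')

# elbow plot
plot_model(kmeans, plot = 'elbow')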

Anomaly Detection

The following plots are available in the Anomaly Detection module:

t-SNE (3d) Dimension Plot: 'tsne'
UMAP Dimensionality Plot: 'umap'
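Likewise for anomaly detection, a minimal sketch (reusing the anomaly dataset shown later in the assign_model section):

# load dataset
from pycaret.datasets import get_data
anomaly = get_data('anomaly')

# init setup
from pycaret.anomaly import *
ano1 = setup(data = anomaly)

# train a model
iforest = create_model('iforest')

# t-SNE plot
plot_model(iforest, plot = 'tsne')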

evaluate_model

The evaluate_model function displays a user interface for analyzing the performance of a trained model. It calls the plot_model function internally.
# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice, target = 'Purchase')

# create model
lr = create_model('lr')

# launch evaluate widget
evaluate_model(lr)
Output from evaluate_model(lr)
NOTE: This function only works in Jupyter Notebook or an equivalent environment.

interpret_model

This function analyzes the predictions generated by a trained model. Most plots in this function are based on SHAP (SHapley Additive exPlanations). For more information, see https://shap.readthedocs.io/en/latest/

Example

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost)
Output from interpret_model(xgboost)

Save the plot

You can save the plot as a png file using the save parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, save = True)
NOTE: When save = True, no plot is displayed in the Notebook.

Change plot type

A few different plot types are available; they can be selected with the plot parameter.

Correlation

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'correlation')
Output from interpret_model(xgboost, plot = 'correlation')
By default, PyCaret uses the first feature in the dataset, but that can be changed using the feature parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'correlation', feature = 'Age (years)')
Output from interpret_model(xgboost, plot = 'correlation', feature = 'Age (years)')

Partial Dependence Plot

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pdp')
Output from interpret_model(xgboost, plot = 'pdp')
By default, PyCaret uses the first available feature in the dataset, but this can be changed using the feature parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pdp', feature = 'Age (years)')
Output from interpret_model(xgboost, plot = 'pdp', feature = 'Age (years)')

Morris Sensitivity Analysis

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'msa')
Output from interpret_model(xgboost, plot = 'msa')

Permutation Feature Importance

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pfi')
Output from interpret_model(xgboost, plot = 'pfi')

Reason Plot

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'reason')
Output from interpret_model(xgboost, plot = 'reason')
When you generate a reason plot without passing a specific index from the test data, you get an interactive plot that lets you select the x- and y-axes. This is only possible in Jupyter Notebook or an equivalent environment. If you want to see the plot for a specific observation, pass its index to the observation parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'reason', observation = 1)
Here, observation = 1 refers to the row at index 1 of the test set.
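If you are unsure which index to pass, you can inspect the hold-out data held by the current experiment. A minimal sketch using get_config (in the classification module, the 'X_test' key holds the hold-out features):

# inspect the hold-out rows to pick an observation index
X_test = get_config('X_test')
X_test.head()

# explain the first row of the hold-out set
interpret_model(xgboost, plot = 'reason', observation = 0)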

Use train data

By default, all plots are generated on the test dataset. If you want to generate plots using the training dataset (not recommended), you can use the use_train_data parameter.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, use_train_data = True)
Output from interpret_model(xgboost, use_train_data = True)

dashboard

The dashboard function generates an interactive dashboard for a trained model. The dashboard is implemented using ExplainerDashboard (explainerdashboard.readthedocs.io).

Dashboard Example

# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice, target = 'Purchase')

# train model
lr = create_model('lr')

# launch dashboard
dashboard(lr)
Dashboard (Classification Metrics)
Dashboard (Individual Predictions)
Dashboard (What-if analysis)
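By default the dashboard launches as a separate Dash application. If you prefer to render it inline in a notebook, recent versions accept a display_format argument; treat the exact option name below as an assumption and check the API reference for your PyCaret version.

# render the dashboard inline in the notebook (assumed option name)
dashboard(lr, display_format = 'inline')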

check_fairness

There are many approaches to conceptualizing fairness. The check_fairness function follows the approach known as group fairness, which asks which groups of individuals are at risk of experiencing harm. check_fairness provides fairness-related metrics between different groups (also called sub-populations).

Check Fairness Example

# load dataset
from pycaret.datasets import get_data
income = get_data('income')

# init setup
from pycaret.classification import *
exp_name = setup(data = income, target = 'income >50K')

# train model
lr = create_model('lr')

# check model fairness
lr_fairness = check_fairness(lr, sensitive_features = ['sex', 'race'])

get_leaderboard

This function returns the leaderboard of all models trained in the current setup.
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# compare models
top3 = compare_models(n_select = 3)

# tune top 3 models
tuned_top3 = [tune_model(i) for i in top3]

# ensemble top 3 tuned models
ensembled_top3 = [ensemble_model(i) for i in tuned_top3]

# blender
blender = blend_models(tuned_top3)

# stacker
stacker = stack_models(tuned_top3)

# check leaderboard
get_leaderboard()
Output from get_leaderboard()
You can also use the leaderboard to access the trained Pipeline.
# check leaderboard
lb = get_leaderboard()

# select top model
lb.iloc[0]['Model']
Output from lb.iloc[0]['Model']
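Because each leaderboard entry is a fitted Pipeline, you can use it directly, for example to score the hold-out set with predict_model. A small sketch, assuming the experiment above is still active:

# select the top model from the leaderboard
best = lb.iloc[0]['Model']

# score the hold-out set with the selected pipeline
holdout_pred = predict_model(best)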

assign_model

This function assigns labels to the training dataset using the trained model. It is available in the Clustering, Anomaly Detection, and NLP modules.

Clustering

# load dataset
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

# init setup
from pycaret.clustering import *
clu1 = setup(data = jewellery)

# train a model
kmeans = create_model('kmeans')

# assign model
assign_model(kmeans)
Output from assign_model(kmeans)
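The returned DataFrame contains the original features plus a cluster label column (typically named Cluster), so you can, for example, inspect the cluster sizes:

# assign labels and inspect cluster sizes
results = assign_model(kmeans)
results['Cluster'].value_counts()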

Anomaly Detection

# load dataset
from pycaret.datasets import get_data
anomaly = get_data('anomaly')

# init setup
from pycaret.anomaly import *
ano1 = setup(data = anomaly)

# train a model
iforest = create_model('iforest')

# assign model
assign_model(iforest)
Output from assign_model(iforest)
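The returned DataFrame typically includes Anomaly and Anomaly_Score columns, so you can filter the flagged rows, for example:

# assign labels and keep only the rows flagged as anomalies
results = assign_model(iforest)
outliers = results[results['Anomaly'] == 1]
outliers.head()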