Analyze
Analysis and model explainability functions in PyCaret
The plot_model function analyzes the performance of a trained model on the hold-out set. It may require re-training the model in certain cases.
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc')
```

Output from plot_model(lr, plot = 'auc')
The resolution scale of the figure can be changed with the scale parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', scale = 3)
```

Output from plot_model(lr, plot = 'auc', scale = 3)
You can save the plot as a png file using the save parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', save = True)
```

Output from plot_model(lr, plot = 'auc', save = True)
PyCaret uses Yellowbrick for most of the plotting work. Any argument that is acceptable for a Yellowbrick visualizer can be passed via the plot_kwargs parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'confusion_matrix', plot_kwargs = {'percent' : True})
```

Output from plot_model(lr, plot = 'confusion_matrix', plot_kwargs = {'percent' : True})
Before Customization
After Customization


If you want to assess the model plot on the train data, you can pass use_train_data=True in the plot_model function.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
lr = create_model('lr')

# plot model
plot_model(lr, plot = 'auc', use_train_data = True)
```

Output from plot_model(lr, plot = 'auc', use_train_data = True)
Train Data
Hold-out Data


Plots available in the Classification module:

| Plot Name | Plot |
| --- | --- |
| Area Under the Curve | 'auc' |
| Discrimination Threshold | 'threshold' |
| Precision Recall Curve | 'pr' |
| Confusion Matrix | 'confusion_matrix' |
| Class Prediction Error | 'error' |
| Classification Report | 'class_report' |
| Decision Boundary | 'boundary' |
| Recursive Feature Selection | 'rfe' |
| Learning Curve | 'learning' |
| Manifold Learning | 'manifold' |
| Calibration Curve | 'calibration' |
| Validation Curve | 'vc' |
| Dimension Learning | 'dimension' |
| Feature Importance (Top 10) | 'feature' |
| Feature Importance (all) | 'feature_all' |
| Model Hyperparameter | 'parameter' |
| Lift Curve | 'lift' |
| Gain Curve | 'gain' |
| KS Statistic Plot | 'ks' |
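The 'auc' plot above is a ROC curve together with the area under it. As an illustrative sketch only (not PyCaret's internal code), the numbers behind that plot can be computed directly with scikit-learn on synthetic data:

```python
# Sketch: compute the ROC curve and AUC that the 'auc' plot visualizes,
# using scikit-learn on a synthetic binary classification dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # points on the ROC curve
auc = roc_auc_score(y_test, scores)               # area under that curve
print(f"AUC: {auc:.3f}")
```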
Plots available in the Regression module:

| Name | Plot |
| --- | --- |
| Residuals Plot | 'residuals' |
| Prediction Error Plot | 'error' |
| Cook's Distance Plot | 'cooks' |
| Recursive Feature Selection | 'rfe' |
| Learning Curve | 'learning' |
| Validation Curve | 'vc' |
| Manifold Learning | 'manifold' |
| Feature Importance (Top 10) | 'feature' |
| Feature Importance (all) | 'feature_all' |
| Model Hyperparameter | 'parameter' |
Plots available in the Clustering module:

| Name | Plot |
| --- | --- |
| Cluster PCA Plot (2d) | 'cluster' |
| Cluster t-SNE (3d) | 'tsne' |
| Elbow Plot | 'elbow' |
| Silhouette Plot | 'silhouette' |
| Distance Plot | 'distance' |
| Distribution Plot | 'distribution' |
Plots available in the Anomaly Detection module:

| Name | Plot |
| --- | --- |
| t-SNE (3d) Dimension Plot | 'tsne' |
| UMAP Dimensionality Plot | 'umap' |
The evaluate_model function displays a user interface for analyzing the performance of a trained model. It calls the plot_model function internally.

```python
# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice, target = 'Purchase')

# create model
lr = create_model('lr')

# launch evaluate widget
evaluate_model(lr)
```

Output from evaluate_model(lr)
NOTE: This function only works in Jupyter Notebook or an equivalent environment.
The interpret_model function analyzes the predictions generated from a trained model. Most plots in this function are based on SHAP (SHapley Additive exPlanations). For more information, see https://shap.readthedocs.io/en/latest/
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost)
```

Output from interpret_model(xgboost)
You can save the plot as a png file using the save parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, save = True)
```
NOTE: When save=True, no plot is displayed in the Notebook.

There are a few different plot types available that can be changed with the plot parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'correlation')
```

Output from interpret_model(xgboost, plot = 'correlation')
By default, PyCaret uses the first feature in the dataset, but that can be changed using the feature parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'correlation', feature = 'Age (years)')
```

Output from interpret_model(xgboost, plot = 'correlation', feature = 'Age (years)')
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pdp')
```

Output from interpret_model(xgboost, plot = 'pdp')
By default, PyCaret uses the first available feature in the dataset, but this can be changed using the feature parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pdp', feature = 'Age (years)')
```

Output from interpret_model(xgboost, plot = 'pdp', feature = 'Age (years)')
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'msa')
```

Output from interpret_model(xgboost, plot = 'msa')
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'pfi')
```

Output from interpret_model(xgboost, plot = 'pfi')
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'reason')
```

Output from interpret_model(xgboost, plot = 'reason')
When you generate the reason plot without passing a specific index of the test data, you get an interactive plot with the ability to select the x- and y-axes. This is only possible in Jupyter Notebook or an equivalent environment. If you want to see this plot for a specific observation, pass its index in the observation parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, plot = 'reason', observation = 1)
```

Here observation = 1 means index 1 from the test set.

By default, all plots are generated on the test dataset. If you want to generate plots using the train dataset (not recommended), you can use the use_train_data parameter.

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating a model
xgboost = create_model('xgboost')

# interpret model
interpret_model(xgboost, use_train_data = True)
```

Output from interpret_model(xgboost, use_train_data = True)
The dashboard function generates an interactive dashboard for a trained model. The dashboard is implemented using ExplainerDashboard (explainerdashboard.readthedocs.io).

```python
# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice, target = 'Purchase')

# train model
lr = create_model('lr')

# launch dashboard
dashboard(lr)
```

Dashboard (Classification Metrics)

Dashboard (Individual Predictions)

Dashboard (What-if analysis)
There are many approaches to conceptualizing fairness. The check_fairness function follows the approach known as group fairness, which asks which groups of individuals are at risk of experiencing harm. check_fairness provides fairness-related metrics between different groups (also called sub-populations).

```python
# load dataset
from pycaret.datasets import get_data
income = get_data('income')

# init setup
from pycaret.classification import *
exp_name = setup(data = income, target = 'income >50K')

# train model
lr = create_model('lr')

# check model fairness
lr_fairness = check_fairness(lr, sensitive_features = ['sex', 'race'])
```
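Conceptually, group fairness compares a metric across sub-populations defined by a sensitive feature. As a hand-rolled sketch of that idea (the actual check_fairness output is richer, and the data and column names here are made up for illustration):

```python
import pandas as pd

# Hypothetical predictions with a sensitive feature (illustrative data only)
df = pd.DataFrame({
    'sex':    ['M', 'M', 'F', 'F', 'M', 'F', 'F', 'M'],
    'y_true': [1, 0, 1, 0, 1, 1, 0, 0],
    'y_pred': [1, 0, 0, 0, 1, 1, 1, 0],
})

# Per-group accuracy: fraction of correct predictions within each group
df['correct'] = df['y_true'] == df['y_pred']
accuracy = df.groupby('sex')['correct'].mean()

# Per-group selection rate: fraction predicted positive within each group
selection_rate = df.groupby('sex')['y_pred'].mean()

print(accuracy)
print(selection_rate)
```

Large gaps between groups in metrics like these are what group-fairness checks surface.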


The get_leaderboard function returns the leaderboard of all models trained in the current setup.
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# compare models
top3 = compare_models(n_select = 3)

# tune top 3 models
tuned_top3 = [tune_model(i) for i in top3]

# ensemble top 3 tuned models
ensembled_top3 = [ensemble_model(i) for i in tuned_top3]

# blender
blender = blend_models(tuned_top3)

# stacker
stacker = stack_models(tuned_top3)

# check leaderboard
get_leaderboard()
```

Output from get_leaderboard()
You can also access the trained Pipeline from the leaderboard.
```python
# check leaderboard
lb = get_leaderboard()

# select top model
lb.iloc[0]['Model']
```

Output from lb.iloc[0]['Model']
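Since the leaderboard is a pandas DataFrame, standard pandas operations apply. A sketch of selecting the best pipeline by a metric, using an illustrative stand-in for the real leaderboard (the 'Accuracy' and 'Model' column names are assumed here):

```python
import pandas as pd

# Illustrative stand-in for the DataFrame returned by get_leaderboard()
lb = pd.DataFrame({
    'Model Name': ['Logistic Regression', 'Random Forest', 'KNN'],
    'Accuracy':   [0.78, 0.82, 0.74],
    'Model':      ['<pipeline lr>', '<pipeline rf>', '<pipeline knn>'],
})

# Sort by the metric of interest and grab the best pipeline
best = lb.sort_values('Accuracy', ascending=False).iloc[0]['Model']
print(best)
```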
The assign_model function assigns labels to the training dataset using a trained model. It is available for the Clustering, Anomaly Detection, and NLP modules.
```python
# load dataset
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

# init setup
from pycaret.clustering import *
clu1 = setup(data = jewellery)

# train a model
kmeans = create_model('kmeans')

# assign model
assign_model(kmeans)
```

Output from assign_model(kmeans)
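assign_model returns the dataset as a pandas DataFrame with a label column appended (named 'Cluster' in the clustering module), so you can summarize it with ordinary pandas. A small sketch using mock data in place of the real return value:

```python
import pandas as pd

# Mock stand-in for the DataFrame returned by assign_model(kmeans)
assigned = pd.DataFrame({
    'Age':     [29, 35, 41, 23, 52, 38],
    'Cluster': ['Cluster 0', 'Cluster 1', 'Cluster 0',
                'Cluster 2', 'Cluster 1', 'Cluster 0'],
})

# Size of each cluster
counts = assigned['Cluster'].value_counts()
print(counts)
```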
```python
# load dataset
from pycaret.datasets import get_data
anomaly = get_data('anomaly')

# init setup
from pycaret.anomaly import *
ano1 = setup(data = anomaly)

# train a model
iforest = create_model('iforest')

# assign model
assign_model(iforest)
```

Output from assign_model(iforest)
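In the anomaly module, assign_model appends 'Anomaly' (a 0/1 flag) and 'Anomaly_Score' columns to the data, so flagged rows can be filtered with ordinary pandas. A sketch with mock data standing in for the real return value (column names assumed from PyCaret's documented output):

```python
import pandas as pd

# Mock stand-in for the DataFrame returned by assign_model(iforest)
assigned = pd.DataFrame({
    'Col1':          [0.26, 0.78, 0.12, 0.91],
    'Anomaly':       [0, 1, 0, 1],
    'Anomaly_Score': [-0.05, 0.12, -0.08, 0.21],
})

# Keep only the rows flagged as anomalies
anomalies = assigned[assigned['Anomaly'] == 1]
print(anomalies)
```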