Quickstart
Get Up and Running in No Time: A Beginner's Guide to PyCaret
PyCaret’s Classification Module is a supervised machine learning module that is used for classifying elements into groups.
The goal is to predict categorical class labels, which are discrete and unordered. Common use cases include predicting customer default (yes or no), customer churn (the customer will leave or stay), and disease diagnosis (positive or negative).
This module can be used for binary or multiclass problems.
The setup function initializes the training environment and creates the transformation pipeline. It must be called before executing any other function and takes two required parameters: data and target. All other parameters are optional.
# load sample dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

PyCaret 3.0 has two APIs: a functional API and an object-oriented (OOP) API. You can choose either one based on your preference. The functionality and experiment results are the same.
# functional API
from pycaret.classification import *
s = setup(data, target = 'Class variable', session_id = 123)

# OOP API
from pycaret.classification import ClassificationExperiment
s = ClassificationExperiment()
s.setup(data, target = 'Class variable', session_id = 123)

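Conceptually, setup holds out part of the data as a test set, and session_id acts as a random seed (like random_state in scikit-learn), so the split and the other random steps can be reproduced. The stdlib-only sketch below illustrates this with a hypothetical train_test_split_indices helper. It is not PyCaret's internal code. The 0.7 ratio is chosen to mirror setup's default train_size.

```python
import random


def train_test_split_indices(n_rows, train_size=0.7, seed=None):
    """Hypothetical helper: shuffle row indices with a seeded RNG, then cut them into train/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle order
    cut = round(n_rows * train_size)
    return indices[:cut], indices[cut:]


# two calls with the same seed produce identical splits
train_a, test_a = train_test_split_indices(100, seed=123)
train_b, test_b = train_test_split_indices(100, seed=123)
print(len(train_a), len(test_a), train_a == train_b)
```

Changing the seed changes which rows land in the test set, which is why the examples in this guide fix session_id = 123.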
The compare_models function trains all the estimators available in the model library and evaluates their performance using cross-validation. Its output is a scoring grid with average cross-validated scores. The metrics evaluated during CV can be accessed with the get_metrics function. Custom metrics can be added or removed with the add_metric and remove_metric functions.
# functional API
best = compare_models()

# OOP API
best = s.compare_models()

print(best)

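The following sketch makes the scoring grid concrete. It runs k-fold cross-validation for two toy "models" (a hypothetical majority-class baseline and a one-feature threshold rule), averages validation accuracy across folds, and sorts the results. This is roughly what compare_models does for each estimator, since it sorts by Accuracy by default for classification. For simplicity the sketch uses plain contiguous folds and a single metric. PyCaret uses stratified folds for classification by default and reports several metrics. The data is made up.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) for k contiguous, non-overlapping folds (no shuffling)."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_rows) if not start <= i < start + size]
        yield train, valid
        start += size


def majority_model(X_train, y_train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


def threshold_model(X_train, y_train):
    """Predict 1 when the feature exceeds the midpoint between the two class means."""
    mean_0 = mean(x for x, y in zip(X_train, y_train) if y == 0)
    mean_1 = mean(x for x, y in zip(X_train, y_train) if y == 1)
    cut = (mean_0 + mean_1) / 2
    return lambda x: 1 if x > cut else 0


def cv_accuracy(fit, X, y, k):
    """Mean validation accuracy of `fit` across k folds."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in valid_idx)
        scores.append(correct / len(valid_idx))
    return mean(scores)


X = list(range(1, 13))  # one made-up feature
y = [0] * 6 + [1] * 6   # made-up binary labels
candidates = {"majority": majority_model, "threshold": threshold_model}
grid = sorted(
    ((name, cv_accuracy(fit, X, y, k=3)) for name, fit in candidates.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, accuracy in grid:
    print(f"{name:<10} {accuracy:.3f}")
```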
The evaluate_model function analyzes the performance of a trained model on the test set. In certain cases, it may need to re-train the model.
# functional API
evaluate_model(best)

# OOP API
s.evaluate_model(best)

evaluate_model can only be used in a notebook because it relies on ipywidgets. You can also use the plot_model function to generate plots individually.
# functional API
plot_model(best, plot = 'auc')

# OOP API
s.plot_model(best, plot = 'auc')

# functional API
plot_model(best, plot = 'confusion_matrix')

# OOP API
s.plot_model(best, plot = 'confusion_matrix')

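A confusion matrix counts true and false positives and negatives. Accuracy, precision, recall, and F1, all of which appear in PyCaret's classification scoring grid, are derived from those counts. The minimal stdlib sketch below does this for a binary problem. It assumes 1 is the positive class, and the label lists are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"Accuracy": accuracy, "Prec.": precision, "Recall": recall, "F1": f1}


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
print(binary_metrics(y_true, y_pred))
```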
The predict_model function scores the data and returns prediction_label and prediction_score (the probability of the predicted class). When data is None, it predicts the label and score on the test set (created during setup).
# functional API
predict_model(best)

# OOP API
s.predict_model(best)

The evaluation metrics are calculated on the test set. The second output is a pd.DataFrame with predictions on the test set (see the last two columns). To generate labels for an unseen (new) dataset, pass the dataset to the data parameter of the predict_model function.
# functional API
predictions = predict_model(best, data=data)
predictions.head()

# OOP API
predictions = s.predict_model(best, data=data)
predictions.head()

prediction_score is the probability of the predicted class (NOT the positive class). If prediction_label is 0 and prediction_score is 0.90, there is a 90% probability of class 0. To see the probability of each class, pass raw_score=True to the predict_model function.
# functional API
predictions = predict_model(best, data=data, raw_score=True)
predictions.head()

# OOP API
predictions = s.predict_model(best, data=data, raw_score=True)
predictions.head()

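The relationship between class probabilities, prediction_label, and prediction_score can be sketched in plain Python. The label is the class with the highest probability, and the score is that probability. For a binary problem, the positive-class probability can then be recovered from the label and score. The helper names and probability values below are hypothetical.

```python
def label_and_score(class_probabilities):
    """Return (predicted label, probability of that label) from a {label: probability} dict."""
    label = max(class_probabilities, key=class_probabilities.get)
    return label, class_probabilities[label]


def positive_class_probability(label, score, positive=1):
    """Binary case only: P(positive) is the score, or 1 - score if the other class won."""
    return score if label == positive else 1 - score


for probs in [{0: 0.90, 1: 0.10}, {0: 0.25, 1: 0.75}]:
    label, score = label_and_score(probs)
    print(label, score, round(positive_class_probability(label, score), 2))
```

Passing raw_score=True avoids this reconstruction, because the probability of every class is returned directly.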
# functional API
save_model(best, 'my_best_pipeline')

# OOP API
s.save_model(best, 'my_best_pipeline')

# functional API
loaded_model = load_model('my_best_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('my_best_pipeline')
print(loaded_model)

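save_model writes the whole pipeline, meaning the preprocessing steps created by setup together with the trained model, to a .pkl file, and load_model restores it. The round trip can be sketched with the standard pickle module. The ScaleThenThreshold class is a hypothetical stand-in for a real pipeline, and PyCaret itself serializes with joblib rather than plain pickle. Unpickling needs the same class definitions in the loading environment (for PyCaret, an installation with compatible library versions).

```python
import os
import pickle
import tempfile


class ScaleThenThreshold:
    """Hypothetical two-step 'pipeline': scale each value, then apply a threshold."""

    def __init__(self, scale, threshold):
        self.scale = scale
        self.threshold = threshold

    def predict(self, values):
        return [1 if v * self.scale > self.threshold else 0 for v in values]


pipeline = ScaleThenThreshold(scale=0.5, threshold=2.0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_best_pipeline.pkl")
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)  # save: fitted parameters travel with the object
    with open(path, "rb") as f:
        loaded = pickle.load(f)  # load: an equivalent, ready-to-use object

print(loaded.predict([1, 5, 8]))
```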
PyCaret’s Regression Module is a supervised machine learning module that is used for estimating the relationships between a dependent variable (often called the ‘outcome variable’, or ‘target’) and one or more independent variables (often called ‘features’, ‘predictors’, or ‘covariates’).
The objective of regression is to predict continuous values such as predicting sales amount, predicting quantity, predicting temperature, etc.
The setup function initializes the training environment and creates the transformation pipeline. It must be called before executing any other function and takes two required parameters: data and target. All other parameters are optional.
# load sample dataset
from pycaret.datasets import get_data
data = get_data('insurance')

PyCaret 3.0 has two APIs: a functional API and an object-oriented (OOP) API. You can choose either one based on your preference. The functionality and experiment results are the same.
# functional API
from pycaret.regression import *
s = setup(data, target = 'charges', session_id = 123)

# OOP API
from pycaret.regression import RegressionExperiment
s = RegressionExperiment()
s.setup(data, target = 'charges', session_id = 123)

The compare_models function trains all the estimators available in the model library and evaluates their performance using cross-validation. Its output is a scoring grid with average cross-validated scores. The metrics evaluated during CV can be accessed with the get_metrics function. Custom metrics can be added or removed with the add_metric and remove_metric functions.
# functional API
best = compare_models()

# OOP API
best = s.compare_models()

print(best)

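The regression scoring grid reports error metrics such as MAE, MSE, RMSE, and R2, alongside RMSLE and MAPE, and compare_models sorts by R2 by default. The stdlib sketch below computes the first four on made-up values, which shows what each column measures.

```python
from math import sqrt
from statistics import mean


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R2 for paired lists of actual and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = mean(abs(e) for e in errors)   # average absolute error
    mse = mean(e * e for e in errors)    # average squared error
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": 1 - ss_res / ss_tot}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```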
The evaluate_model function analyzes the performance of a trained model on the test set. In certain cases, it may need to re-train the model.
# functional API
evaluate_model(best)

# OOP API
s.evaluate_model(best)

evaluate_model can only be used in a notebook because it relies on ipywidgets. You can also use the plot_model function to generate plots individually.
# functional API
plot_model(best, plot = 'residuals')

# OOP API
s.plot_model(best, plot = 'residuals')

# functional API
plot_model(best, plot = 'feature')

# OOP API
s.plot_model(best, plot = 'feature')

The predict_model function generates prediction_label using the trained model. When data is None, it generates predictions on the test set (created during setup).
# functional API
predict_model(best)

# OOP API
s.predict_model(best)

The evaluation metrics are calculated on the test set. The second output is a pd.DataFrame with predictions on the test set (see the last two columns). To generate predictions for an unseen (new) dataset, pass the dataset to the data parameter of the predict_model function.
# functional API
predictions = predict_model(best, data=data)
predictions.head()

# OOP API
predictions = s.predict_model(best, data=data)
predictions.head()

# functional API
save_model(best, 'my_best_pipeline')

# OOP API
s.save_model(best, 'my_best_pipeline')

# functional API
loaded_model = load_model('my_best_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('my_best_pipeline')
print(loaded_model)

PyCaret’s Clustering Module is an unsupervised machine learning module that performs the task of grouping a set of objects in such a way that objects in the same group (also known as a cluster) are more similar to each other than to those in other groups.
The setup function initializes the training environment and creates the transformation pipeline. It must be called before executing any other function and takes only one required parameter: data. All other parameters are optional.
# load sample dataset
from pycaret.datasets import get_data
data = get_data('jewellery')

PyCaret 3.0 has two APIs: a functional API and an object-oriented (OOP) API. You can choose either one based on your preference. The functionality and experiment results are the same.
# functional API
from pycaret.clustering import *
s = setup(data, normalize = True)

# OOP API
from pycaret.clustering import ClusteringExperiment
s = ClusteringExperiment()
s.setup(data, normalize = True)

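With normalize = True, setup rescales the numeric features before the clustering model sees them, and PyCaret's default normalize_method is 'zscore'. Scaling matters for distance-based algorithms such as k-means. Without it, a feature measured in large units (income, for example) would dominate the distance. The stdlib sketch below applies z-score scaling to made-up rows. It uses the population standard deviation, as scikit-learn's StandardScaler does.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        # a constant column (sigma == 0) is mapped to 0.0 instead of dividing by zero
        [(value - mu) / sigma if sigma else 0.0 for value, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]


rows = [[18, 20000.0], [40, 60000.0], [62, 100000.0]]  # made-up age and income
scaled = zscore_columns(rows)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, both columns span the same range, so age and income contribute comparably to Euclidean distances.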
The create_model function trains a given model and evaluates its performance. The evaluated metrics can be accessed with the get_metrics function. Custom metrics can be added or removed with the add_metric and remove_metric functions. All available models can be listed with the models function.
# functional API
kmeans = create_model('kmeans')

# OOP API
kmeans = s.create_model('kmeans')

print(kmeans)

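At its core, k-means runs Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. Assigning a new point to its nearest centroid is also how k-means labels unseen data. The stdlib sketch below uses random initialization and made-up 2-D points for brevity. Library implementations, such as the scikit-learn KMeans that PyCaret wraps, typically use smarter initialization (k-means++) and several restarts.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, n_iter=100, seed=123):
    """Lloyd's algorithm with random initialization; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new_centroids = []
        for i in range(k):  # update step
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(nearest((1.2, 1.1), centroids) == labels[0])  # a new point joins the first group
```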
The evaluate_model function analyzes the performance of a trained model.
# functional API
evaluate_model(kmeans)

# OOP API
s.evaluate_model(kmeans)

evaluate_model can only be used in a notebook because it relies on ipywidgets. You can also use the plot_model function to generate plots individually.
# functional API
plot_model(kmeans, plot = 'elbow')

# OOP API
s.plot_model(kmeans, plot = 'elbow')

# functional API
plot_model(kmeans, plot = 'silhouette')

# OOP API
s.plot_model(kmeans, plot = 'silhouette')

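A silhouette plot shows the silhouette coefficient of each sample, s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from the sample to the other members of its own cluster, and b(i) is the smallest mean distance to the members of any other cluster. Values range from -1 to 1, and higher values mean better-separated clusters. The stdlib sketch below computes the mean coefficient for made-up 1-D points. It follows the common convention that s(i) = 0 for a singleton cluster.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        # a: mean distance to the other members of the point's own cluster
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other) / labels.count(other)
            for other in clusters - {own}
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # mixed-up grouping
print(round(good, 3), round(bad, 3))
```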
The assign_model function assigns cluster labels to the training data, given a trained model.
# functional API
result = assign_model(kmeans)
result.head()

# OOP API
result = s.assign_model(kmeans)
result.head()

The predict_model function uses a trained model to generate cluster labels for a new/unseen dataset.
# functional API
predictions = predict_model(kmeans, data = data)
predictions.head()

# OOP API
predictions = s.predict_model(kmeans, data = data)
predictions.head()

# functional API
save_model(kmeans, 'kmeans_pipeline')

# OOP API
s.save_model(kmeans, 'kmeans_pipeline')

# functional API
loaded_model = load_model('kmeans_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('kmeans_pipeline')
print(loaded_model)

PyCaret’s Anomaly Detection Module is an unsupervised machine learning module that identifies rare items, events, or observations that raise suspicion by differing significantly from the majority of the data.
Typically, anomalous items correspond to problems such as bank fraud, structural defects, medical problems, or errors.
The setup function initializes the training environment and creates the transformation pipeline. It must be called before executing any other function and takes only one required parameter: data. All other parameters are optional.
# load sample dataset
from pycaret.datasets import get_data
data = get_data('anomaly')

PyCaret 3.0 has two APIs: a functional API and an object-oriented (OOP) API. You can choose either one based on your preference. The functionality and experiment results are the same.
# functional API
from pycaret.anomaly import *
s = setup(data, session_id = 123)

# OOP API
from pycaret.anomaly import AnomalyExperiment
s = AnomalyExperiment()
s.setup(data, session_id = 123)

The create_model function trains an unsupervised anomaly detection model. All available models can be listed with the models function.
# functional API
iforest = create_model('iforest')
print(iforest)

# OOP API
iforest = s.create_model('iforest')
print(iforest)

# functional API
models()

# OOP API
s.models()

# functional API
plot_model(iforest, plot = 'tsne')

# OOP API
s.plot_model(iforest, plot = 'tsne')

# functional API
plot_model(iforest, plot = 'umap')

# OOP API
s.plot_model(iforest, plot = 'umap')

The assign_model function assigns anomaly labels (1 = outlier, 0 = inlier) to the dataset for a given model.
# functional API
result = assign_model(iforest)
result.head()

# OOP API
result = s.assign_model(iforest)
result.head()

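create_model accepts a fraction parameter, the expected proportion of outliers (0.05 by default). The 1/0 labels are derived from each row's anomaly score using that fraction. The sketch below is a simplified, ranking-based illustration with a hypothetical helper that flags the highest-scoring fraction of rows. The PyOD detectors that PyCaret's anomaly module wraps instead derive a score threshold from a percentile, so results can differ slightly when scores are tied. The scores are made up.

```python
def labels_from_fraction(scores, fraction=0.05):
    """Flag the top `fraction` of anomaly scores as outliers (1); everything else is 0."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    n_outliers = round(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:n_outliers])  # indices of the highest scores
    return [1 if i in outliers else 0 for i in range(len(scores))]


scores = [0.12, 0.08, 0.95, 0.10, 0.11, 0.09, 0.13, 0.07, 0.10, 0.88]
print(labels_from_fraction(scores, fraction=0.2))
```

Raising the fraction flags more rows as outliers. It is therefore worth setting it to a realistic estimate of the contamination rate in the data.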
The predict_model function uses a trained model to generate anomaly labels for a new/unseen dataset.
# functional API
predictions = predict_model(iforest, data = data)
predictions.head()

# OOP API
predictions = s.predict_model(iforest, data = data)
predictions.head()

Output from predict_model(iforest, data = data)
# functional API
save_model(iforest, 'iforest_pipeline')

# OOP API
s.save_model(iforest, 'iforest_pipeline')

To load the model back in the environment:
# functional API
loaded_model = load_model('iforest_pipeline')
print(loaded_model)

# OOP API
loaded_model = s.load_model('iforest_pipeline')
print(loaded_model)

PyCaret Time Series module is a powerful tool for analyzing and predicting time series data using machine learning and classical statistical techniques. This module enables users to easily perform complex time series forecasting tasks by automating the entire process from data preparation to model deployment.
The PyCaret Time Series Forecasting module supports a wide range of forecasting methods, such as ARIMA, exponential smoothing, and Prophet. It also provides features for handling missing values, time series decomposition, and data visualization.
The setup function initializes the training environment and creates the transformation pipeline. It must be called before executing any other function.
# load sample dataset
from pycaret.datasets import get_data
data = get_data('airline')

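PyCaret 3.0 has two APIs: a functional API and an object-oriented (OOP) API. You can choose either one based on your preference. The functionality and experiment results are the same. In the setup calls below, fh = 3 sets the forecast horizon (the number of periods to forecast) and fold = 5 sets the number of cross-validation folds.
# functional API
from pycaret.time_series import *
s = setup(data, fh = 3, fold = 5, session_id = 123)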

# OOP API
from pycaret.time_series import TSForecastingExperiment
s = TSForecastingExperiment()
s.setup(data, fh = 3, fold = 5, session_id = 123)

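In the setup call above, fh = 3 is the forecast horizon and fold = 5 is the number of cross-validation folds. Time series CV cannot shuffle rows. With an expanding window, which is PyCaret's default fold strategy for time series, each fold trains on all observations up to a cut-off and tests on the next fh points. The training window therefore grows from fold to fold, and no future data leaks into training. The sketch below generates such splits with a hypothetical helper. It assumes consecutive test windows do not overlap (step = fh), which is a simplification and may not match PyCaret's exact fold placement. PyCaret also holds out the last fh observations as the test set in setup, so its CV folds are built on the data before that.

```python
def expanding_window_splits(n_obs, fh, folds):
    """Hypothetical helper: (train_end, test_indices) pairs for expanding-window CV.

    Fold k trains on rows [0, train_end) and tests on the next fh rows;
    the last fold's test window ends at the final observation.
    """
    first_train_end = n_obs - folds * fh
    if first_train_end < 1:
        raise ValueError("not enough observations for this many folds")
    splits = []
    for k in range(folds):
        train_end = first_train_end + k * fh  # training window grows by fh each fold
        splits.append((train_end, list(range(train_end, train_end + fh))))
    return splits


for train_end, test_idx in expanding_window_splits(n_obs=30, fh=3, folds=5):
    print(f"train on rows 0-{train_end - 1}, test on rows {test_idx}")
```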
The compare_models function trains all the estimators available in the model library and evaluates their performance using cross-validation. Its output is a scoring grid with average cross-validated scores. The metrics evaluated during CV can be accessed with the get_metrics function. Custom metrics can be added or removed with the add_metric and remove_metric functions.
# functional API
best = compare_models()

# OOP API
best = s.compare_models()

# functional API
plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 24})

# OOP API
s.plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 24})

# functional API
plot_model(best, plot = 'diagnostics')

# OOP API
s.plot_model(best, plot = 'diagnostics')

# functional API
plot_model(best, plot = 'insample')

# OOP API
s.plot_model(best, plot = 'insample')

# functional API
final_best = finalize_model(best)
predict_model(final_best, fh = 24)

# OOP API
final_best = s.finalize_model(best)
s.predict_model(final_best, fh = 24)

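finalize_model retrains the pipeline on the entire dataset, including the test set held out by setup. A forecast with fh = 24 from the finalized model therefore covers the 24 periods that follow the last observation. For a monthly series, the sketch below works out which calendar months those 24 steps cover. The future_months helper and the December 1960 end date are illustrative assumptions.

```python
def future_months(last_year, last_month, fh):
    """Return 'YYYY-MM' labels for the fh monthly periods after (last_year, last_month)."""
    labels = []
    for step in range(1, fh + 1):
        # count months from year 0, add the step, then convert back to (year, month)
        total = last_year * 12 + (last_month - 1) + step
        labels.append(f"{total // 12:04d}-{total % 12 + 1:02d}")
    return labels


horizon = future_months(1960, 12, fh=24)
print(horizon[0], horizon[-1], len(horizon))
```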
# functional API
save_model(final_best, 'my_final_best_model')

# OOP API
s.save_model(final_best, 'my_final_best_model')

# functional API
loaded_model = load_model('my_final_best_model')
print(loaded_model)

# OOP API
loaded_model = s.load_model('my_final_best_model')
print(loaded_model)
