Optimize
Optimization functions in PyCaret
This function tunes the hyperparameters of the model. The output of this function is a scoring grid with cross-validated scores by fold. The best model is selected based on the metric defined in the optimize parameter. Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(data = boston, target = 'medv')
# train model
dt = create_model('dt')
# tune model
tuned_dt = tune_model(dt)

Output from tune_model(dt)
To compare the hyperparameters before and after tuning:
# default model
print(dt)
# tuned model
print(tuned_dt)

Model hyperparameters before and after tuning
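Since PyCaret models are scikit-learn estimators, you can also diff the two parameter sets programmatically instead of eyeballing the printouts. A minimal sketch with plain scikit-learn (the tuned values below are illustrative, not PyCaret's actual output):

```python
from sklearn.tree import DecisionTreeRegressor

default_dt = DecisionTreeRegressor()
# hypothetical tuned values, for illustration only
tuned_dt = DecisionTreeRegressor(max_depth=5, min_samples_leaf=4)

# keep only the parameters whose value changed
changed = {
    k: (v, tuned_dt.get_params()[k])
    for k, v in default_dt.get_params().items()
    if tuned_dt.get_params()[k] != v
}
print(changed)  # → {'max_depth': (None, 5), 'min_samples_leaf': (1, 4)}
```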
Hyperparameter tuning is, at the end of the day, an optimization constrained by the number of iterations, which ultimately depends on how much time and resources you have available. The number of iterations is defined by the n_iter parameter. By default, it is set to 10.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(data = boston, target = 'medv')
# train model
dt = create_model('dt')
# tune model
tuned_dt = tune_model(dt, n_iter = 50)

Output from tune_model(dt, n_iter = 50)
n_iter = 10
n_iter = 50
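With the default random search, n_iter is simply the number of candidate hyperparameter settings sampled from the space. A quick sketch of that sampling with scikit-learn's ParameterSampler (illustrative search space):

```python
from sklearn.model_selection import ParameterSampler

# illustrative search space for a decision tree
space = {"max_depth": range(1, 11), "min_samples_leaf": range(1, 8)}

# n_iter controls how many candidate settings the tuner draws from the space
candidates = list(ParameterSampler(space, n_iter=10, random_state=0))
print(len(candidates))  # → 10
```

Raising n_iter widens the search at a proportional cost in training time, since each candidate is cross-validated.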


When you are tuning the hyperparameters of the model, you must know which metric to optimize for. That can be defined under the optimize parameter. By default, it is set to Accuracy for classification experiments and R2 for regression experiments.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(data = boston, target = 'medv')
# train model
dt = create_model('dt')
# tune model
tuned_dt = tune_model(dt, optimize = 'MAE')

Output from tune_model(dt, optimize = 'MAE')
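One subtlety worth knowing: MAE is an error (lower is better) while R2 is maximized, and PyCaret handles that direction for you. If you were wiring the same thing up manually in scikit-learn, the greater_is_better flag of make_scorer is what flips the sign for the search:

```python
import numpy as np
from sklearn.metrics import make_scorer, mean_absolute_error

y_true = np.array([3.0, 2.5, 4.0])
y_pred = np.array([2.5, 3.0, 4.5])

print(mean_absolute_error(y_true, y_pred))  # → 0.5

# error metrics are minimized, so a scorer used for search must flip the sign
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
```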
The tuning grid for hyperparameters is already defined by PyCaret for all the models in the library. However, if you wish, you can define your own search space by passing a custom grid using the custom_grid parameter.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(boston, target = 'medv')
# train model
dt = create_model('dt')
# define search space
import numpy as np
params = {"max_depth": np.random.randint(1, int(len(boston.columns)*.85), 20),
          "max_features": np.random.randint(1, len(boston.columns), 20),
          "min_samples_leaf": [2,3,4,5,6]}
# tune model
tuned_dt = tune_model(dt, custom_grid = params)

Output from tune_model(dt, custom_grid = params)
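The custom grid is just a dict mapping parameter names to candidate values (lists or arrays). A quick standalone check that the sampled space above stays inside its intended bounds (14 hard-codes the boston column count of 13 predictors plus the target, purely for illustration):

```python
import numpy as np

np.random.seed(0)
n_cols = 14  # boston: 13 predictors + the medv target

params = {
    "max_depth": np.random.randint(1, int(n_cols * .85), 20),
    "max_features": np.random.randint(1, n_cols, 20),
    "min_samples_leaf": [2, 3, 4, 5, 6],
}

# every sampled depth stays within [1, 85% of the column count)
print(params["max_depth"].min() >= 1)                       # → True
print(params["max_depth"].max() < int(n_cols * .85))        # → True
```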
PyCaret integrates seamlessly with many different libraries for hyperparameter tuning. This gives you access to many different types of search algorithms, including random, Bayesian, Optuna, TPE, and a few others, all just by changing a parameter. By default, PyCaret uses random grid search from scikit-learn, and you can change that by using the search_library and search_algorithm parameters in the tune_model function.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(boston, target = 'medv')
# train model
dt = create_model('dt')
# tune model sklearn
tune_model(dt)
# tune model optuna
tune_model(dt, search_library = 'optuna')
# tune model scikit-optimize
tune_model(dt, search_library = 'scikit-optimize')
# tune model tune-sklearn
tune_model(dt, search_library = 'tune-sklearn', search_algorithm = 'hyperopt')
Output from each search library: scikit-learn, optuna, scikit-optimize, tune-sklearn
By default, PyCaret's tune_model function only returns the best model as selected by the tuner. Sometimes you may need access to the tuner object itself, as it may contain important attributes; for that, you can use the return_tuner parameter.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(boston, target = 'medv')
# train model
dt = create_model('dt')
# tune model and return tuner
tuned_model, tuner = tune_model(dt, return_tuner=True)

Output from tune_model(dt, return_tuner=True)
type(tuned_model), type(tuner)

Output from type(tuned_model), type(tuner)
print(tuner)

Output from print(tuner)
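With the default search library, the tuner you get back is a scikit-learn RandomizedSearchCV object, so the usual attributes apply. A sketch on synthetic data showing the ones you will most often want:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=150, n_features=8, random_state=1)

tuner = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=1),
    param_distributions={"max_depth": range(1, 9)},
    n_iter=4,
    cv=3,
    random_state=1,
).fit(X, y)

print(tuner.best_params_)       # winning hyperparameters
print(tuner.best_score_)        # their mean cross-validated score
results = tuner.cv_results_     # full per-candidate results table
```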
Oftentimes the tune_model function will not improve the model performance. In fact, it may end up making performance worse than the model with default hyperparameters. This may be problematic when you are not actively experimenting in a Notebook but rather have a Python script that runs a workflow of create_model --> tune_model or compare_models --> tune_model. To overcome this issue, you can use choose_better. When set to True, it will always return the better-performing model, meaning that if hyperparameter tuning doesn't improve the performance, it will return the input model.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(boston, target = 'medv')
# train model
dt = create_model('dt')
# tune model
dt = tune_model(dt, choose_better = True)

Output from tune_model(dt, choose_better = True)
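Conceptually, choose_better just compares the cross-validated score of the tuned model against the input model and keeps the winner. A hypothetical sketch of that selection logic (the helper name and synthetic data are illustrative, not PyCaret's internals):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

default_model = DecisionTreeRegressor(random_state=0)
tuned_model = DecisionTreeRegressor(max_depth=3, random_state=0)  # stand-in "tuned" model

def choose_better(a, b, X, y):
    """Return whichever model has the higher mean CV score (R2 for regressors)."""
    score_a = cross_val_score(a, X, y, cv=5).mean()
    score_b = cross_val_score(b, X, y, cv=5).mean()
    return a if score_a >= score_b else b

best = choose_better(default_model, tuned_model, X, y)
print(type(best).__name__)
```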
NOTE: choose_better doesn't affect the scoring grid that is displayed on the screen. The scoring grid will always present the performance of the best model as selected by the tuner, regardless of whether output performance < input performance.

Ensemble Model
This function ensembles a given estimator. The output of this function is a scoring grid with CV scores by fold. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(boston, target = 'medv')
# train model
dt = create_model('dt')
# ensemble model
bagged_dt = ensemble_model(dt)

Output from ensemble_model(dt)
type(bagged_dt)
>>> sklearn.ensemble._bagging.BaggingRegressor
print(bagged_dt)

Output from print(bagged_dt)
# load dataset
from pycaret.datasets import get_data
boston = get_data('boston')
# init setup
from pycaret.regression import *
reg1 = setup(boston, target = 'medv')