The tune_model function tunes the hyperparameters of a given model. The output is a scoring grid with cross-validated scores by fold of the best model selected based on the optimize parameter. Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

The number of search iterations the tuner runs can be defined using the n_iter parameter. By default, it is set to 10.
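A minimal sketch of this in code, assuming a classification experiment on one of PyCaret's bundled datasets (the dataset, target, and model choice are illustrative, not part of the text above):

```python
# Minimal sketch, assuming a classification experiment on PyCaret's bundled 'juice' dataset.
from pycaret.datasets import get_data
from pycaret.classification import setup, create_model, tune_model, get_metrics

data = get_data('juice')
setup(data, target='Purchase', session_id=123)

dt = create_model('dt')                # baseline decision tree with default hyperparameters
tuned_dt = tune_model(dt, n_iter=50)   # 50 search iterations instead of the default 10

print(get_metrics())                   # metrics evaluated during cross-validation
```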
The metric to optimize can be defined in the optimize parameter. By default, it is set to Accuracy for classification experiments and R2 for regression.

PyCaret already defines a tuning grid for every model in its library, but you can pass your own search space through the custom_grid parameter. The default tuner is a random grid search (scikit-learn's RandomizedSearchCV), and you can change that using the search_library and search_algorithm parameters of the tune_model function.
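A sketch of these options, continuing the same experiment; the grid values and the choice of Optuna as an alternative backend are illustrative:

```python
# Sketch: optimize AUC over a user-defined search space (grid values are illustrative).
custom_grid = {
    'max_depth': [2, 4, 6, 8, None],
    'min_samples_leaf': [1, 2, 5, 10],
}
tuned_dt = tune_model(dt, optimize='AUC', custom_grid=custom_grid)

# Switching the tuner backend, e.g. to Optuna (must be installed separately).
tuned_dt = tune_model(dt, search_library='optuna', search_algorithm='tpe')
```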
The tune_model function only returns the best model as selected by the tuner. Sometimes you may need access to the tuner object itself, since it may contain important attributes; for that, use the return_tuner parameter.
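For example, continuing the sketch above:

```python
# Sketch: return the underlying tuner object along with the best model.
tuned_dt, tuner = tune_model(dt, return_tuner=True)

print(type(tuner))           # e.g. a RandomizedSearchCV instance
print(tuner.best_params_)    # hyperparameters chosen by the tuner
```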
Sometimes tune_model will not improve the model performance. In fact, it may end up making performance worse than the model with default hyperparameters. This can be problematic when you are not actively experimenting in a Notebook but instead have a Python script that runs a workflow of create_model --> tune_model or compare_models --> tune_model. To overcome this issue, you can use the choose_better parameter. When set to True, it will always return the better-performing model, meaning that if hyperparameter tuning doesn't improve the performance, the input model is returned.

Note that choose_better doesn't affect the scoring grid that is displayed on the screen. The scoring grid will always show the performance of the best model as selected by the tuner, regardless of whether that performance is worse than the input model's.
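A sketch of the script-style workflow this describes:

```python
# Sketch: in an unattended script, guarantee that tuning never degrades the model.
from pycaret.classification import compare_models

best = compare_models()
tuned_best = tune_model(best, choose_better=True)   # falls back to `best` if tuning doesn't help
```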
The ensemble_model function ensembles a given trained estimator. The output is a scoring grid with cross-validated scores by fold. Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

There are two ways you can ensemble a model with ensemble_model, Bagging or Boosting. You can define this in the method parameter. By default, 10 estimators are used in the ensemble; you can increase that by changing the n_estimators parameter.
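For example, continuing the classification sketch above:

```python
# Sketch: bagging and boosting ensembles of the decision tree trained earlier.
from pycaret.classification import ensemble_model

bagged_dt = ensemble_model(dt, method='Bagging', n_estimators=50)
boosted_dt = ensemble_model(dt, method='Boosting')   # keeps the default 10 estimators
```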
Sometimes ensemble_model will not improve the model performance either. In fact, it may end up making performance worse than the model without ensembling. This can be problematic when you are not actively experimenting in a Notebook but instead have a Python script that runs a workflow of create_model --> ensemble_model or compare_models --> ensemble_model. To overcome this issue, you can use the choose_better parameter. When set to True, it will always return the better-performing model, meaning that if ensembling doesn't improve the performance, the input model is returned.
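The observation below refers to a regression run of this kind; here is a sketch, assuming PyCaret's bundled insurance dataset (dataset and target are illustrative):

```python
# Sketch of a regression run, assuming PyCaret's bundled 'insurance' dataset.
from pycaret.datasets import get_data
from pycaret.regression import setup, create_model, ensemble_model

setup(get_data('insurance'), target='charges', session_id=123)
lr = create_model('lr')                              # plain LinearRegression
bagged_lr = ensemble_model(lr, choose_better=True)   # returns lr itself if bagging doesn't help
```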
Notice that with choose_better = True the model returned from ensemble_model is a simple LinearRegression instead of a BaggingRegressor. This is because the performance of the model didn't improve after ensembling, and hence the input model is returned.
The blend_models function trains a Soft Voting / Majority Rule classifier for the models passed in the estimator_list parameter. The output of this function is a scoring grid with cross-validated scores by fold. Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
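A sketch of this pattern, continuing the classification experiment above:

```python
# Sketch: blend the top 3 models found by compare_models.
from pycaret.classification import compare_models, blend_models

top3 = compare_models(n_select=3)   # list of the 3 best models by the sort metric
blender = blend_models(top3)        # a VotingClassifier built from those models
```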
Here compare_models(n_select = 3) is passed as an input to blend_models. What happens internally is that the compare_models function is executed first, and the top 3 models are then passed as an input to the blend_models function. In this example, the top 3 models selected by compare_models are LogisticRegression, LinearDiscriminantAnalysis, and RandomForestClassifier.
When method = 'soft', it predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. When method = 'hard', it uses the predictions (hard labels) from the input models instead of probabilities. The default is auto, which means it will try the soft method and fall back to hard if the former is not supported; this may happen when one of your input models does not support the predict_proba method.
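A sketch of both voting methods:

```python
# Sketch: hard vs. soft voting for the blender.
blender_hard = blend_models(top3, method='hard')   # majority vote on predicted labels
blender_soft = blend_models(top3, method='soft')   # argmax of summed predicted probabilities
```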
The blender returned by this function can itself be passed to tune_model like any other trained model.

Sometimes blend_models will not improve the model performance. In fact, it may end up making performance worse than the model without blending. This can be problematic when you are not actively experimenting in a Notebook but instead have a Python script that runs a workflow of compare_models --> blend_models. To overcome this issue, you can use the choose_better parameter. When set to True, it will always return the better-performing model, meaning that if blending the models doesn't improve the performance, the single best-performing input model is returned.
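For example, continuing with the top three models from above:

```python
# Sketch: blending with a fallback to the single best input model.
blender = blend_models(top3, choose_better=True)
```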
Notice that with choose_better=True the final model returned by this function is LogisticRegression instead of VotingClassifier, because Logistic Regression performed best out of all the given input models plus the blender.
The stack_models function trains a meta-model over the models passed in the estimator_list parameter. The output of this function is a scoring grid with cross-validated scores by fold. Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
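A sketch, using the same dynamic-input pattern as with blend_models:

```python
# Sketch: stack the top 3 models from compare_models under a meta-model.
from pycaret.classification import compare_models, stack_models

stacker = stack_models(compare_models(n_select=3))
```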
Here compare_models(n_select = 3) is passed as an input to stack_models. What happens internally is that the compare_models function is executed first, and the top 3 models are then passed as an input to the stack_models function. In this example, the top 3 models selected by compare_models are LogisticRegression, RandomForestClassifier, and LGBMClassifier.
The method parameter can be left as auto to be automatically determined. When set to auto, it will invoke, for each model, predict_proba, decision_function, or predict, in that order. Alternatively, you can define the method explicitly.
When no meta_model is passed explicitly, LogisticRegression is used for classification experiments and LinearRegression is used for regression experiments. You can also pass a specific model to be used as the meta-model.
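A sketch combining an explicit method with a custom meta-model (both choices are illustrative):

```python
# Sketch: explicit stacking method and a custom meta-model (choices are illustrative).
from pycaret.classification import create_model, stack_models

lightgbm = create_model('lightgbm')
stacker = stack_models(
    compare_models(n_select=3),
    method='predict_proba',    # instead of letting auto pick per model
    meta_model=lightgbm,       # overrides the default LogisticRegression meta-model
)
```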
The optimize_threshold function optimizes the probability threshold of a trained classifier. It iterates over performance metrics at different values of probability_threshold, with a step size defined in the grid_interval parameter. This function will display a plot of the performance metrics at each probability threshold and returns the best model based on the metric defined under the optimize parameter.
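A sketch, reusing the tuned classifier from the earlier sketches (the metric and step size are illustrative):

```python
# Sketch: search thresholds in steps of 0.05, optimizing F1 instead of Accuracy.
from pycaret.classification import optimize_threshold

best_dt = optimize_threshold(tuned_dt, optimize='F1', grid_interval=0.05)
```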
Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
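Finally, a sketch of adding and removing a custom metric; log loss is used purely as an example:

```python
# Sketch: add log loss as a custom metric, then remove it again.
from sklearn.metrics import log_loss
from pycaret.classification import add_metric, remove_metric, get_metrics

add_metric('logloss', 'Log Loss', log_loss, greater_is_better=False, target='pred_proba')
print(get_metrics())       # the new metric now appears in the metrics table
remove_metric('logloss')   # remove it by id
```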