Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

compare_models returns only the top-performing model based on the criteria defined in the sort parameter, which defaults to Accuracy for classification experiments and R2 for regression. You can change the sort order by passing the name of the metric you want to base model selection on, for example F1, as in the sketch below.
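A minimal sketch of changing the sort metric; the diabetes dataset, target column, and variable names below are illustrative assumptions rather than values taken from this page:

```python
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

# assumed example dataset and target column for illustration
data = get_data('diabetes')
s = setup(data, target='Class variable', session_id=123)

# rank candidate models by F1 instead of the default Accuracy
best = compare_models(sort='F1')
```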
If you only want to compare a few select models, you can pass their IDs to the include parameter; conversely, you can leave specific models out of the comparison with the exclude parameter. A sketch follows below.
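For example (the model IDs chosen here are just illustrative entries from the model library):

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import compare_models

# compare only logistic regression, decision tree, and random forest
best_subset = compare_models(include=['lr', 'dt', 'rf'])

# or compare everything except k-nearest neighbors and linear SVM
best_without = compare_models(exclude=['knn', 'svm'])
```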
By default, compare_models returns only the top-performing model, but you can get the top N models instead of just one, as in the sketch below. The return value of best then changes: it will contain a list of the top 3 models instead of just one model as seen previously.
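For example:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import compare_models

# return the top 3 models as a list instead of a single estimator
best = compare_models(n_select=3)
```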
If you are running short on time, you can set a time budget in minutes for the comparison using the budget_time parameter, as sketched below.
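A minimal sketch, assuming an illustrative budget of half a minute:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import compare_models

# stop the comparison after roughly 0.5 minutes (30 seconds)
best = compare_models(budget_time=0.5)
```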
When performing binary classification, you can also change the probability threshold, or cut-off value, used to convert predicted probabilities into hard labels; by default, all classifiers use 0.5 as the threshold. If you change it, as in the sketch below, every metric except AUC changes: AUC is unaffected because it does not depend on the hard labels, while all other metrics depend on the hard labels, which are now obtained using probability_threshold=0.25.
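For example:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import compare_models

# label an observation positive when its predicted probability exceeds 0.25
best = compare_models(probability_threshold=0.25)
```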
If you would rather skip cross-validation and just train the models and see their metrics on the test/hold-out set, you can pass cross_validation=False, as in the sketch below.
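For example:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import compare_models

# train each candidate once and score it on the hold-out set (no CV)
best = compare_models(cross_validation=False)
```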
To scale to large workloads, you can run the compare_models function on a cluster in distributed mode using a parameter called parallel. It leverages the Fugue abstraction layer to run compare_models on Spark or Dask clusters. Note that n_jobs should be set to 1 in the setup when testing with local Spark, because some models will already try to use all available cores, and running such models in parallel can cause deadlocks from resource contention. For Dask, you can pass "dask" inside FugueBackend and it will pull the available Dask client. A sketch of both follows below.
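A minimal sketch of the distributed setup, assuming a local Spark session for testing; the dataset and session details are illustrative:

```python
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models
from pycaret.parallel import FugueBackend

# n_jobs=1 avoids deadlocks when individual models already use all cores
data = get_data('diabetes')
s = setup(data, target='Class variable', session_id=123, n_jobs=1)

# distribute the comparison over a local Spark session via Fugue
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
best = compare_models(parallel=FugueBackend(spark))

# or hand Fugue the string "dask" to pick up the available Dask client
best = compare_models(parallel=FugueBackend("dask"))
```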
The create_model function trains and evaluates the performance of a single estimator using cross-validation. Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions. All the available models can be accessed using the models function.

The output of create_model is a scoring grid with the cross-validated scores by fold. The number of folds defaults to 10 and can be changed either globally in the setup function or locally within create_model. To check which models are available for training, call the models function, as shown below.
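For example:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import models

# list all estimators available in the model library, with their IDs
all_models = models()
print(all_models)
```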
When you run create_model('dt'), it trains a Decision Tree with all default hyperparameter settings. If you would like to change them, simply pass the attributes in the create_model function, as sketched below.
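A minimal sketch; the specific hyperparameter values and fold count are illustrative:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import create_model

# default Decision Tree, evaluated with 10-fold cross-validation
dt = create_model('dt')

# the same model with custom hyperparameters and 5 folds instead of 10
dt_custom = create_model('dt', fold=5, max_depth=5, criterion='entropy')
```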
The scoring grid you see after create_model is only displayed and is not returned. As such, if you want to access that grid as a pandas.DataFrame, you have to use the pull command after create_model, as in the sketch below.
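For example:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import create_model, pull

dt = create_model('dt')

# grab the scoring grid that was just displayed as a pandas DataFrame
dt_results = pull()
print(dt_results)
```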
If you don't need cross-validation, you can disable it with cross_validation=False. When cross_validation is set to False, the model is only trained one time, on the entire training dataset, and scored using the test/hold-out set. You can also report training-set performance alongside the validation scores by using the return_train_score parameter. Both options are sketched below.
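For example:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import create_model

# train once on the full training set and score on the hold-out set
dt_holdout = create_model('dt', cross_validation=False)

# show train-set metrics alongside the cross-validated scores
dt_train_scores = create_model('dt', return_train_score=True)
```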
When performing binary classification, you can also change the probability threshold, or cut-off value, used to produce hard labels; by default, all classifiers use 0.5 as the threshold. Pass a different value through the probability_threshold parameter, as in the sketch below.
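For example:

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import create_model

# classify an observation as positive only when its probability exceeds 0.25
rf = create_model('rf', probability_threshold=0.25)
```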
You can also use the create_model function in a loop to train multiple models, or even the same model with different configurations, and compare their results, as in the sketch below.
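A minimal sketch of looping over a few model IDs and collecting each scoring grid with pull (the chosen IDs are illustrative):

```python
# assumes the setup call from the first sketch has already been run
from pycaret.classification import create_model, pull

results = {}
for model_id in ['lr', 'dt', 'knn']:
    create_model(model_id)          # train and cross-validate each model
    results[model_id] = pull()      # keep its scoring grid for comparison

for model_id, grid in results.items():
    print(model_id)
    print(grid)
```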
Finally, you can train your own custom models, or models from other libraries that are not part of PyCaret. As long as their API is consistent with sklearn, it will work like a breeze. You can even write your own class: as long as it has fit and predict functions, PyCaret will be compatible with it. Here is a simple example:
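The sketch below is one illustrative possibility: a minimal majority-class classifier that only implements fit and predict, passed to create_model like any built-in estimator.

```python
# an illustrative custom estimator: it always predicts the most frequent class
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    def fit(self, X, y):
        # remember the most common label seen during training
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        # predict that label for every row
        return np.full(len(X), self.majority_)

# assumes the setup call from the first sketch has already been run
from pycaret.classification import create_model
custom = create_model(MajorityClassifier())
```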