Initialize
Initialize experiment in PyCaret
This function initializes the experiment in PyCaret and creates the transformation pipeline based on all the parameters passed in the function. The setup function must be called before executing any other function. It takes two required parameters: data (or, alternatively, data_func) and target. All the other parameters are optional.
PyCaret 3.0 has two APIs: a Functional API and an object-oriented (OOP) API. You can choose either one based on your preference; the functionality and experiment results are consistent between them.
There are only two required parameters in the setup: target and either data or data_func:
target: float, int, str or sequence, default = -1
If int or str, respectively index or name of the target column in data. The default value selects the last column in the dataset. If sequence, it should have shape (n_samples,).
data_func: Callable[[], DATAFRAME_LIKE] = None
The function that generates data (the dataframe-like input). This is useful when the dataset is large and you need parallel operations such as compare_models. It can avoid broadcasting a large dataset from the driver to the workers. Note that one and only one of data and data_func must be set.
data: dataframe-like = None
Data set with shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features. If data is not a pandas dataframe, it's converted to one using default column names.
NOTE: The target parameter is not required in the pycaret.clustering and pycaret.anomaly modules.
You can automatically track entire experiments in PyCaret. Enabling the log_experiment parameter in the setup automatically tracks all the metrics, hyperparameters, and model artifacts. By default, PyCaret uses MLflow for experiment logging. Other available options are wandb, cometml, and dagshub.
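Initialize the MLflow server on localhost by running mlflow ui from a terminal; by default, the tracking UI is served at http://localhost:5000.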
To learn more about experiment tracking in PyCaret, see this page.
There are quite a few parameters in the setup function that are not directly related to preprocessing or data transformation but are used as part of the model validation and selection strategy, such as train_size, fold_strategy, or the number of folds (fold) for cross-validation. To learn more about all the model validation and selection settings in the setup, see this page.
With PyCaret, you can train models on a GPU and speed up your workflow by up to 10x. To train models on a GPU, simply pass use_gpu = True in the setup function. The API usage does not change; however, in some cases, additional libraries have to be installed because they are not included in either the default or the full version. To learn more about GPU support, see this page.
To see the use of the setup in other modules of PyCaret, see below:
All the examples in the following sections are shown using Functional API only.