Initialize
Initialize experiment in PyCaret
This function initializes the experiment in PyCaret and creates the transformation pipeline based on all the parameters passed to it. The `setup` function must be called before executing any other function. It takes two required parameters: `data` and `target`. All the other parameters are optional. PyCaret 3.0 has two APIs. You can choose either one based on your preference; the functionalities and experiment results are consistent.
```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable', session_id = 123)
```

```python
# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import ClassificationExperiment
clf1 = ClassificationExperiment()
clf1.setup(data = diabetes, target = 'Class variable', session_id = 123)
```

There are only two required parameters in the `setup`: the data (via `data` or `data_func`) and the `target`. All the other parameters are optional.

- `target`: float, int, str or sequence, default = -1. If int or str, respectively the index or name of the target column in data. The default value selects the last column in the dataset. If sequence, it should have shape (n_samples,).
- `data_func`: Callable[[], DATAFRAME_LIKE] = None. The function that generates `data` (the dataframe-like input). This is useful when the dataset is large and you need parallel operations such as `compare_models`, since it avoids broadcasting a large dataset from the driver to the workers. Note that one and only one of `data` and `data_func` must be set (a minimal sketch follows this list).
- `data`: dataframe-like = None. Data set with shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features. If `data` is not a pandas dataframe, it is converted to one using default column names.
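Below is a minimal sketch of the `data_func` route, using the same diabetes dataset; the `load_diabetes` helper is a hypothetical function defined here only for illustration:

```python
# pass a loader callable instead of the dataframe itself, so setup()
# fetches the data lazily rather than receiving a large broadcast object
from pycaret.datasets import get_data
from pycaret.classification import ClassificationExperiment

def load_diabetes():
    # hypothetical helper: any callable returning a dataframe-like object works
    return get_data('diabetes')

exp = ClassificationExperiment()
exp.setup(data_func = load_diabetes, target = 'Class variable', session_id = 123)
```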
NOTE: The `target` parameter is not required in the `pycaret.clustering` and `pycaret.anomaly` modules.

You can automatically track entire experiments in PyCaret. A parameter in the setup can be enabled to automatically track all the metrics, hyperparameters, and model artifacts. By default, PyCaret uses MLflow for experiment logging. Other available options are `wandb`, `cometml`, and `dagshub` (a sketch with a different logger follows the MLflow example below).
```python
# load dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data, target = 'Class variable', log_experiment = True, experiment_name = 'diabetes1')

# model training
best_model = compare_models()
```
Initialize the MLflow server on localhost:

```python
# init server
!mlflow ui
```
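As a hedged sketch, `log_experiment` in PyCaret 3 also accepts the logger name as a string; the example below assumes the `wandb` library is installed and configured:

```python
# init setup with Weights & Biases logging instead of the default MLflow
from pycaret.datasets import get_data
from pycaret.classification import *

data = get_data('diabetes')
clf1 = setup(data, target = 'Class variable',
             log_experiment = 'wandb',
             experiment_name = 'diabetes1')
```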

There are quite a few parameters in the setup function that are not directly related to preprocessing or data transformation but are used as part of the model validation and selection strategy, such as `train_size`, `fold_strategy`, or the number of folds for cross-validation. To learn more about all the model validation and selection settings in the setup, see this page.
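As a hedged illustration of these settings, the sketch below passes example values (not recommendations) to `setup`:

```python
# init setup with a custom validation strategy: 70% of the rows for
# training, evaluated with stratified 5-fold cross-validation
from pycaret.datasets import get_data
from pycaret.classification import *

data = get_data('diabetes')
clf1 = setup(data, target = 'Class variable',
             train_size = 0.7,
             fold_strategy = 'stratifiedkfold',
             fold = 5,
             session_id = 123)
```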
With PyCaret, you can train models on GPU and speed up your workflow by 10x. To train models on GPU, simply pass `use_gpu = True` in the setup function. There is no change in the use of the API; however, in some cases, additional libraries have to be installed, as they are not included in the default or the full version. To learn more about GPU support, see this page.
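A minimal sketch, assuming the additional GPU libraries required by your estimators are already installed:

```python
# init setup with GPU training enabled; the rest of the API is unchanged
from pycaret.datasets import get_data
from pycaret.classification import *

data = get_data('diabetes')
clf1 = setup(data, target = 'Class variable', use_gpu = True, session_id = 123)

# training runs on GPU where the underlying estimator supports it
best_model = compare_models()
```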
To see the use of the `setup` function in other modules of PyCaret, see below. All the examples in the following sections are shown using the Functional API only.