Deploy

MLOps and deployment related functions in PyCaret


predict_model

This function generates predictions using a trained model. When data is None, it predicts the label and score on the hold-out set.

Hold-out predictions

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
xgboost = create_model('xgboost')

# predict on hold-out
predict_model(xgboost)

Unseen data predictions

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
xgboost = create_model('xgboost')

# predict on new data
new_data = diabetes.copy()
new_data.drop('Class variable', axis = 1, inplace = True)
predict_model(xgboost, data = new_data)

Probability by class

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
xgboost = create_model('xgboost')

# predict on new data
new_data = diabetes.copy()
new_data.drop('Class variable', axis = 1, inplace = True)
predict_model(xgboost, raw_score = True, data = new_data)

Setting probability threshold

The threshold for converting predicted probabilities into class labels. Unless this parameter is set, it defaults to the value set during model creation; if that was never set, the default is 0.5 for all classifiers. Only applicable to binary classification.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
xgboost = create_model('xgboost')

# probability threshold 0.3
predict_model(xgboost, probability_threshold = 0.3)

Comparison between different thresholds on the hold-out data

finalize_model

This function trains a given model on the entire dataset including the hold-out set.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
rf = create_model('rf')

# finalize a model
finalize_model(rf)

This function doesn't change any parameters of the model. It only refits the model on the entire dataset, including the hold-out set.

deploy_model

This function deploys the entire ML pipeline on the cloud.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
lr = create_model('lr')

# finalize a model
final_lr = finalize_model(lr)

# deploy a model
deploy_model(final_lr, model_name = 'lr_aws', platform = 'aws', authentication = {'bucket': 'pycaret-test'})

AWS

Before deploying a model to AWS S3 ('aws'), environment variables must be configured using the command-line interface. To configure AWS environment variables, type aws configure in your terminal. The following information is required and can be generated from the Identity and Access Management (IAM) portal of your AWS console account:

  • AWS Access Key ID

  • AWS Secret Access Key

  • Default Region Name (can be seen under Global settings on your AWS console)

  • Default output format (must be left blank)
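
Once the credentials are configured and the pipeline is deployed (see the deploy_model example above), it can be loaded back from S3 with load_model. A minimal sketch, assuming the 'pycaret-test' bucket used above:

# load a deployed pipeline back from S3 (assumes the bucket used above)
from pycaret.classification import load_model
loaded_lr = load_model('lr_aws', platform = 'aws',
                       authentication = {'bucket': 'pycaret-test'})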

GCP

To deploy a model on Google Cloud Platform ('gcp'), the project must first be created using the command-line interface or the GCP console. Once the project is created, you must create a service account and download the service account key as a JSON file, which is used to set the environment variables in your local environment. Learn more about it: https://cloud.google.com/docs/authentication/production
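
A minimal sketch of a GCP deployment; the key file path, project name, and bucket name are placeholders to replace with your own:

# point the Google SDK at your service account key (placeholder path)
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/service-account-key.json'

# deploy the finalized model from the example above to a GCP bucket
deploy_model(final_lr, model_name = 'lr_gcp', platform = 'gcp',
             authentication = {'project': 'my-gcp-project', 'bucket': 'pycaret-test'})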

Azure

To deploy a model on Microsoft Azure ('azure'), the environment variable for the connection string must be set in your local environment. Go to the settings of your storage account on the Azure portal to access the required connection string. Learn more about it: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?toc=%2Fpython%2Fazure%2FTOC.json

  • AZURE_STORAGE_CONNECTION_STRING (required as environment variable)
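
A minimal sketch, assuming the connection string is set and the container already exists ('container' is the authentication key PyCaret expects for Azure):

# set the connection string (placeholder value) before deploying
import os
os.environ['AZURE_STORAGE_CONNECTION_STRING'] = 'DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...'

# deploy the finalized model from the example above to an Azure container
deploy_model(final_lr, model_name = 'lr_azure', platform = 'azure',
             authentication = {'container': 'pycaret-test'})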

save_model

This function saves the transformation pipeline and a trained model object into the current working directory as a pickle file for later use.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
dt = create_model('dt')

# save pipeline
save_model(dt, 'dt_pipeline')

load_model

This function loads a previously saved pipeline.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
dt = create_model('dt')

# save pipeline
save_model(dt, 'dt_pipeline')

# load pipeline
load_model('dt_pipeline')

save_experiment

The save_experiment function saves the experiment to a pickle file. The experiment is saved using cloudpickle to deal with lambda functions. The data or test data is NOT saved with the experiment and will need to be specified again when loading using load_experiment.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# save experiment
save_experiment('my_saved_experiment1')

load_experiment

The load_experiment function loads an experiment from a path or a file. The data (and test_data) is not saved with the experiment and will need to be specified again at the time of loading.

# load dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

# load experiment function
from pycaret.classification import load_experiment
clf2 = load_experiment('my_saved_experiment1', data = data)

check_drift

The check_drift function generates a drift report file using the evidently library.

# load dataset
from pycaret.datasets import get_data
data = get_data('insurance')

# generate drift report
check_drift(reference_data = data.head(500), current_data = data.tail(500), target = 'charges')

It will generate an HTML report locally.

convert_model

This function transpiles the trained machine learning model's decision function into different programming languages such as Python, C, Java, Go, C#, etc. It is very useful if you want to deploy models in environments where you can't install your normal Python stack to support model inference.

# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice,  target = 'Purchase')

# train a model
lr = create_model('lr')

# convert a model
convert_model(lr, 'java')
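
convert_model returns the transpiled source as a string (in addition to displaying it), so it can be written straight to a file. A small sketch; the file name is a placeholder:

# write the transpiled decision function to a file (placeholder name)
java_code = convert_model(lr, 'java')
with open('lr_model.java', 'w') as f:
    f.write(java_code)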


create_api

This function takes an input model and creates a POST API for inference. It only creates the API and doesn't run it automatically. To run the API, you must run the generated Python file using !python.

# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice,  target = 'Purchase')

# train a model
lr = create_model('lr')

# create api
create_api(lr, 'lr_api')

# run api
!python lr_api.py

Once you initialize the API with the !python command, you can see the server running on localhost:8000/docs.
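
To verify from Python that the server is up, a minimal sketch (assumes the default host and port shown above):

# ping the auto-generated FastAPI docs page (assumes default localhost:8000)
import requests

r = requests.get('http://localhost:8000/docs')
print(r.status_code)  # prints 200 once the server is running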


create_docker

This function creates a Dockerfile and a requirements.txt for productionizing the API endpoint.

# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice,  target = 'Purchase')

# train a model
lr = create_model('lr')

# create api
create_api(lr, 'lr_api')

# create docker
create_docker('lr_api')

You can see that two files, a Dockerfile and a requirements.txt, are created for you.


create_app

This function creates a basic Gradio app for inference. It will later be expanded to other app types such as Streamlit.

# load dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# init setup
from pycaret.classification import *
exp_name = setup(data = juice,  target = 'Purchase')

# train a model
lr = create_model('lr')

# create app
create_app(lr)
