Machine Learning in Alteryx with PyCaret


Last updated 2 years ago


A step-by-step tutorial on training and deploying machine learning models in Alteryx Designer using PyCaret

Introduction

👉 What is PyCaret and how to get started?

👉 What is Alteryx Designer and how to set it up?

👉 Train an end-to-end machine learning pipeline in Alteryx Designer, including data preparation steps such as missing value imputation, one-hot encoding, scaling, and transformations.

👉 Deploy the trained pipeline and generate predictions during ETL.

PyCaret

Alteryx Designer

Tutorial Pre-Requisites:

👉We are ready now

Open Alteryx Designer and click on File → New Workflow

At the top, there is a palette of tools that you can drag and drop onto the canvas. You build a workflow by connecting the tools to one another and then run it.

Dataset

I will create two separate Alteryx workflows: the first for model training and selection, and the second for scoring new data using the trained pipeline.

👉 Model Training & Selection

Let’s first read the CSV file with the **Input Data** tool, followed by a **Python Script**. Inside the Python script, execute the following code:

# install pycaret
from ayx import Package
Package.installPackages('pycaret')

# read data from input data tool
from ayx import Alteryx
data = Alteryx.read("#1")

# init setup, prepare data
# (silent=True skips the interactive data-type confirmation in PyCaret 2.x;
# the parameter was removed in PyCaret 3.0, so omit it there)
from pycaret.regression import *
s = setup(data, target = 'charges', silent=True)

# model training and selection
best = compare_models()

# store the results, print and save
results = pull()
results.to_csv('c:/users/moezs/pycaret-demo-alteryx/results.csv', index = False)
Alteryx.write(results, 1)

# finalize best model and save
best_final = finalize_model(best)
save_model(best_final, 'c:/users/moezs/pycaret-demo-alteryx/pipeline')

This script imports the regression module from PyCaret and then calls the setup function, which automatically handles the train/test split and all data preparation tasks such as missing value imputation, scaling, and feature engineering. compare_models then trains and evaluates all the available estimators using k-fold cross-validation and returns the best model.
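To make the idea behind compare_models more concrete, here is a minimal, standard-library-only sketch of comparing candidate models by k-fold cross-validation. It is not PyCaret's implementation: the helper names (k_fold_indices, fit_mean, fit_line, cross_validated_mae, compare) and the toy data are hypothetical and exist only for illustration.

```python
import random
import statistics


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and split them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training target."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares line through a single feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validated_mae(fit, xs, ys, k=5):
    """Average the mean absolute error over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(model(xs[i]) - ys[i]) for i in held_out]
        scores.append(statistics.fmean(errors))
    return statistics.fmean(scores)


def compare(candidates, xs, ys, k=5):
    """Return (name, MAE) rows sorted best-first, like a small leaderboard."""
    table = [(name, cross_validated_mae(fit, xs, ys, k))
             for name, fit in candidates.items()]
    return sorted(table, key=lambda row: row[1])


# Toy data, made up for illustration (not the insurance dataset).
ages = [float(a) for a in range(20, 60)]
charges = [250.0 * a + 1000.0 for a in ages]

leaderboard = compare({"mean_baseline": fit_mean, "linear": fit_line}, ages, charges)
best_name, best_mae = leaderboard[0]
```

Each candidate is always scored on rows it did not see during fitting, which is why the leaderboard is a fairer basis for model selection than training error.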

The pull function returns the model performance metrics as a DataFrame, which is then saved as results.csv on disk and also written to the first output anchor of the Python tool in Alteryx (so that you can view the results on screen).

Finally, save_model saves the entire transformation pipeline including the best model as a pickle file.
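The reason it matters that the whole pipeline is saved, and not just the model, can be sketched with the standard library alone. The example below is an illustration under simplifying assumptions, not what save_model does internally: the "pipeline" is a plain dictionary of learned parameters, and fit_pipeline, predict, and the toy numbers are hypothetical.

```python
import os
import pickle
import statistics
import tempfile


def fit_pipeline(ages, charges):
    """Learn imputation, scaling, and model parameters from training data."""
    known = [a for a in ages if a is not None]
    impute_value = statistics.fmean(known)            # fill value for missing ages
    filled = [impute_value if a is None else a for a in ages]
    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled) or 1.0            # avoid dividing by zero
    scaled = [(a - mean) / std for a in filled]
    mean_y = statistics.fmean(charges)
    # Least-squares slope on the scaled feature (which has mean 0).
    slope = (sum(s * (y - mean_y) for s, y in zip(scaled, charges))
             / sum(s * s for s in scaled))
    return {"impute_value": impute_value, "mean": mean, "std": std,
            "slope": slope, "intercept": mean_y}


def predict(pipeline, age):
    """Apply the same preprocessing that was learned at training time."""
    if age is None:
        age = pipeline["impute_value"]
    scaled = (age - pipeline["mean"]) / pipeline["std"]
    return pipeline["slope"] * scaled + pipeline["intercept"]


# Toy training data, made up for illustration; one age is missing.
train_ages = [20.0, 30.0, None, 50.0, 60.0]
train_charges = [2000.0, 3000.0, 4000.0, 5000.0, 6000.0]
pipeline = fit_pipeline(train_ages, train_charges)

# Save preprocessing and model parameters together as one pickle file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "pipeline.pkl")
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

prediction = predict(restored, 25.0)
prediction_for_missing = predict(restored, None)
```

Because the imputation value and scaling parameters travel with the model, the scoring step can accept raw input, including missing values, without repeating any preparation logic.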

When you successfully execute this workflow, it generates the pipeline.pkl and results.csv files. You can also see the best models and their cross-validated metrics on screen.

This is what results.csv contains:

These are the cross-validated metrics for all the models. The best model, in this case, is Gradient Boosting Regressor.

👉 Model Scoring

We can now use pipeline.pkl to score a new dataset. Since I do not have a separate copy of ***insurance.csv without the label***, I will drop the target column, charges, and then generate predictions using the trained pipeline.

I have used the **Select Tool** to remove the target column, charges. In the Python script, execute the following code:

# read data from the input tool
from ayx import Alteryx
data = Alteryx.read("#1")

# load pipeline
from pycaret.regression import load_model, predict_model
pipeline = load_model('c:/users/moezs/pycaret-demo-alteryx/pipeline')

# generate predictions and save to csv
predictions = predict_model(pipeline, data)
predictions.to_csv('c:/users/moezs/pycaret-demo-alteryx/predictions.csv', index=False)

# display in alteryx
Alteryx.write(predictions, 1)

When you successfully execute this workflow, it will generate predictions.csv.
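If you would rather remove the target column in code than with the Select Tool, the step is small. Inside the Alteryx Python tool the data arrives as a pandas DataFrame, so something like data.drop(columns=['charges']) should be enough. The sketch below shows the same idea with only the standard library so that it runs anywhere; drop_column is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def drop_column(csv_text, column):
    """Return the CSV text with one column removed (e.g. the target label)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up rows with a subset of insurance-style columns, for illustration only.
sample = "age,bmi,smoker,charges\n19,27.9,yes,16884.92\n18,33.77,no,1725.55\n"
unlabeled = drop_column(sample, "charges")
```

Either way, the data passed to predict_model should contain the same feature columns the pipeline was trained on, minus the target.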

Coming Soon!

There is no limit to what you can achieve using this lightweight workflow automation library in Python. If you find this useful, please do not forget to give us ⭐️ on our GitHub repository.

Important Links

More PyCaret related tutorials:

In this tutorial, I will show you how to train and deploy machine learning pipelines in Alteryx, a very popular ETL tool, using PyCaret, an open-source, low-code machine learning library in Python. The learning goals of this tutorial are the four points listed under the Introduction heading.

PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating machine learning workflows. PyCaret is known for its ease of use, simplicity, and ability to quickly and efficiently build and deploy end-to-end machine learning pipelines. To learn more about PyCaret, check out their GitHub.

Alteryx Designer is a proprietary tool developed by **Alteryx** and is used for automating every step of analytics, including data preparation, blending, reporting, predictive analytics, and data science. You can access any data source, file, application, or data type, and experience the simplicity and power of a self-service platform with 260+ drag-and-drop building blocks. You can download the one-month free trial version of Alteryx Designer from the Alteryx website.

https://www.alteryx.com

For this tutorial, you will need two things. The first is Alteryx Designer, a desktop application that you can download from the Alteryx website. The second is Python. The easiest way to get Python is to install the Anaconda Distribution, which can be downloaded from the Anaconda website.

New Workflow in Alteryx Designer

For this tutorial, I am using a regression dataset called insurance from PyCaret’s repository, where it is available for download.

Sample Dataset
Training Workflow
Scoring Workflow
predictions.csv

Next week I will take a deep dive and focus on more advanced functionalities of PyCaret that you can use within Alteryx to enhance your machine learning workflows. If you would like to be notified automatically, you can follow me on Medium, LinkedIn, and Twitter.

PyCaret — Image by Author

To hear more about PyCaret, follow us on LinkedIn and YouTube.

Join us on our Slack channel.

Documentation
Blog
GitHub
StackOverflow
Install PyCaret
Notebook Tutorials
Contribute in PyCaret
Machine Learning in KNIME with PyCaret: A step-by-step guide on training and deploying end-to-end machine learning pipelines in KNIME using PyCaret (towardsdatascience.com)
Easy MLOps with PyCaret + MLflow: A beginner-friendly, step-by-step tutorial on integrating MLOps in your Machine Learning experiments using PyCaret (towardsdatascience.com)
Write and train your own custom machine learning models using PyCaret (towardsdatascience.com)
Build with PyCaret, Deploy with FastAPI: A step-by-step, beginner-friendly tutorial on how to build an end-to-end Machine Learning Pipeline with PyCaret and… (towardsdatascience.com)
Time Series Anomaly Detection with PyCaret: A step-by-step tutorial on unsupervised anomaly detection for time series data using PyCaret (towardsdatascience.com)
Supercharge your Machine Learning Experiments with PyCaret and Gradio: A step-by-step tutorial to develop and interact with machine learning pipelines rapidly (towardsdatascience.com)
Multiple Time Series Forecasting with PyCaret: A step-by-step tutorial on forecasting multiple time series using PyCaret (towardsdatascience.com)