
Build your own AutoML in Power BI using PyCaret 2.0

by Moez Ali

PyCaret — An open source, low-code machine learning library in Python

PyCaret 2.0

Last week we announced PyCaret 2.0, an open source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the machine learning experiment cycle and helps data scientists become more efficient and productive.

In this post we present a step-by-step tutorial on how PyCaret can be used to build an Automated Machine Learning solution within Power BI, allowing data scientists and analysts to add a layer of machine learning to their dashboards without any additional license or software costs. PyCaret is an open source and **free to use** Python library that comes with a wide range of functions that are built to work within Power BI.

By the end of this article you will learn how to implement the following in Power BI:

  • Setting up a Python conda environment and installing pycaret==2.0.

  • Linking the newly created conda environment with Power BI.

  • Building your first AutoML solution in Power BI and presenting the performance metrics on a dashboard.

  • Productionalizing / deploying your AutoML solution in Power BI.

Microsoft Power BI

Power BI is a business analytics solution that lets you visualize your data and share insights across your organization, or embed them in your app or website. In this tutorial, we will use Power BI Desktop for machine learning by importing the PyCaret library into Power BI.

What is Automated Machine Learning?

Automated machine learning (AutoML) is the process of automating the time-consuming, iterative tasks of machine learning. It allows data scientists and analysts to build machine learning models efficiently while sustaining model quality. The final goal of any AutoML solution is to finalize the best model based on some performance criteria.

The traditional machine learning model development process is resource-intensive, requiring significant domain knowledge and time to produce and compare dozens of models. With automated machine learning, you can accelerate the time it takes to get production-ready ML models with great ease and efficiency.

How Does PyCaret Work?

PyCaret is a workflow automation tool for supervised and unsupervised machine learning. It is organized into six modules, and each module has a set of functions available to perform some specific action. Each function takes an input and returns an output, which in most cases is a trained machine learning model. Modules available as of the second release are:

  • Classification
  • Regression
  • Clustering
  • Anomaly Detection
  • Natural Language Processing
  • Association Rule Mining

All modules in PyCaret support data preparation (over 25 essential preprocessing techniques), come with a huge collection of untrained models and support for custom models, and provide automatic hyperparameter tuning, model analysis and interpretability, automatic model selection, experiment logging, and easy cloud deployment options.

“PyCaret is democratizing machine learning and the use of advanced analytics by providing a free, open source, and low-code machine learning solution for business analysts, domain experts, citizen data scientists, and experienced data scientists.”

To learn more about PyCaret, read our official release announcement or see the user guide at https://www.pycaret.org/guide.

If you want to get started in Python, see our gallery of example notebooks.

Before we start

Setting up the Environment

Before we start using PyCaret's machine learning capabilities in Power BI, we need to create a virtual environment and install pycaret. This is a three-step process:

✅ Step 1 — Creating an anaconda environment

If you are using Python for the first time, installing the Anaconda Distribution is the easiest way to get started. You can download the Anaconda Distribution with Python 3.7 or greater from:

https://www.anaconda.com/products/individual

Open Anaconda Prompt from the Start menu and execute the following code:

conda create --name myenv python=3.7

✅ Step 2 — Installing PyCaret

Execute the following code in Anaconda Prompt:

pip install pycaret==2.0

Installation may take 15–20 minutes. If you are having issues with installation, please see our GitHub page for known issues and resolutions.

✅ Step 3 — Setting up a Python Directory in Power BI

The virtual environment created must be linked with Power BI. This can be done using Global Settings in Power BI Desktop (File → Options → Global → Python scripting). The Anaconda environment is installed by default under:

C:\Users\username\AppData\Local\Continuum\anaconda3\envs\myenv
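If you are not sure which folder to paste into the Python scripting setting, one quick check (a small sketch, assuming you run it from the activated myenv environment) is to ask Python where it lives:

```python
# run inside the activated environment (conda activate myenv, then python);
# the parent folder of the printed path is the directory Power BI asks for
import sys
print(sys.executable)  # e.g. ...\envs\myenv\python.exe
```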

👉 Let's get started

Setting the Business Context

An insurance company wants to improve its cash flow forecasting by better predicting the patient charges using the demographic and basic patient health risk metrics at the time of hospitalization.

Objective

To train and select the best performing regression model that predicts patient charges based on the other variables in the dataset, i.e. age, sex, bmi, children, smoker, and region.

👉 Step 1 — Load the dataset

You can load the dataset directly from our GitHub by going to Power BI Desktop → Get Data → Web.

Link to dataset: https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/insurance.csv
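If you prefer to stay in Python, the same CSV can also be pulled in with pandas from a Python script step (Get Data → Python script); a minimal sketch (the dataframe name becomes the table name in Power BI):

```python
# load the insurance dataset directly from the PyCaret repository with pandas
import pandas as pd

url = 'https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/insurance.csv'
dataset = pd.read_csv(url)
```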

Create a duplicate dataset in Power Query:

👉 Step 2 — Run AutoML as Python Script

Run the following code in Power Query (Transform → Run Python script):

```python
# import regression module
from pycaret.regression import *

# init setup
reg1 = setup(data=dataset, target='charges', silent=True, html=False)

# compare models
best_model = compare_models()

# finalize best model
best = finalize_model(best_model)

# save best model
save_model(best, 'c:/users/moezs/best-model-power')

# return the performance metrics df
dataset = pull()
```

The first two lines of code import the relevant module and initialize the setup function. The setup function performs several essential steps required in machine learning such as cleaning missing values (if any), splitting the data into train and test sets, setting up the cross-validation strategy, defining metrics, performing algorithm-specific transformations, etc.
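The setup call above leans on defaults. As a rough illustration (not what the tutorial itself runs), here is a hypothetical variant with a few commonly used parameters made explicit:

```python
from pycaret.regression import *

# an illustrative, more explicit setup call (values are examples only)
reg1 = setup(
    data=dataset,        # dataframe handed over by Power Query
    target='charges',    # column we want to predict
    train_size=0.7,      # share of rows used for training
    normalize=True,      # scale numeric features before modeling
    silent=True,         # skip the interactive data-type confirmation
    html=False           # required when running outside a notebook
)
```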

The magic function that trains multiple models and compares and evaluates their performance metrics is **compare_models**. It returns the best model based on the **sort** parameter that can be defined inside compare_models. By default, it uses 'R2' for regression use cases and 'Accuracy' for classification use cases.
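For example, if mean absolute error mattered more to the business than R2, the ranking criterion could be switched with the sort parameter (a small sketch, reusing the setup above):

```python
# rank candidate models by MAE instead of the default R2
best_model = compare_models(sort='MAE')
```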

The rest of the lines finalize the best model returned by compare_models and save it as a pickle file in your local directory. The last line returns a dataframe with details of the models trained and their performance metrics.

Output:

With just a few lines of code we have trained over 20 models, and the table presents the performance metrics based on 10-fold cross-validation.

The top performing model, Gradient Boosting Regressor, will be saved along with the entire transformation pipeline as a pickle file in your local directory. This file can be consumed later to generate predictions on a new dataset (see step 3 below).

PyCaret works on the idea of modular automation. As such, if you have more resources and time for training, you can extend the script to perform hyperparameter tuning, ensembling, and other available modeling techniques. See the example below:

```python
# import regression module
from pycaret.regression import *

# init setup
reg1 = setup(data=dataset, target='charges', silent=True, html=False)

# compare models
top5 = compare_models(n_select=5)
results = pull()

# tune top 5 models
tuned_top5 = [tune_model(i) for i in top5]

# select best model
best = automl()

# save best model
save_model(best, 'c:/users/moezs/best-model-power')

# return the performance metrics df
dataset = results
```

We have now returned the top 5 models instead of the single highest performing model. We have then created a list comprehension (loop) to tune the hyperparameters of the top candidate models, and finally the **automl** function selects the single best performing model, which is then saved as a pickle file (note that we didn't use **finalize_model** this time because the automl function returns the finalized model).
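tune_model has a few knobs of its own; as an illustrative sketch (values picked arbitrarily, not from the tutorial), the tuning loop above could target a specific metric with a larger random-grid budget:

```python
# tune each of the top 5 models for MAE using 50 random-grid iterations
tuned_top5 = [tune_model(i, n_iter=50, optimize='MAE') for i in top5]
```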

Sample Dashboard

A sample dashboard is created. The PBIX file is available in the GitHub repository linked at the end of this post.

👉 Step 3 — Deploy Model to generate predictions

Once we have a final model saved as a pickle file, we can use it to predict charges on a new dataset.

Loading new dataset

For demonstration purposes, we will load the same dataset again and remove the 'charges' column. Execute the following code as a Python script in Power Query to get the predictions:

```python
# load functions from regression module
from pycaret.regression import load_model, predict_model

# load model in a variable
model = load_model('c:/users/moezs/best-model-power')

# predict charges
dataset = predict_model(model, data=dataset)
```

Output:
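The tutorial removes the 'charges' column through the Power Query interface; if you would rather do it inside the script, a minimal sketch (assuming the incoming data may still carry the column):

```python
# drop the target column (if present) before scoring, then predict as before
dataset = dataset.drop(columns=['charges'], errors='ignore')
dataset = predict_model(model, data=dataset)
```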

Deploy on Power BI Service

When you publish a Power BI report with Python scripts to the service, these scripts will also be executed when your data is refreshed through the on-premises data gateway.

To enable this, you must ensure that the Python runtime with the dependent Python packages is also installed on the machine hosting your personal gateway. Note that Python script execution is not supported for on-premises data gateways shared by multiple users.

There is no limit to what you can achieve using this lightweight workflow automation library in Python. If you find this useful, please do not forget to give us a ⭐️ on our GitHub repo.

The PBIX files used in this tutorial are uploaded to this GitHub repository: https://github.com/pycaret/pycaret-powerbi-automl

If you would like to learn more about PyCaret 2.0, read this announcement.

If you have used PyCaret before, you might also be interested in the release notes for the current release.

To hear more about PyCaret, follow us on LinkedIn and YouTube.

You may also be interested in:

  • Machine Learning in Power BI using PyCaret
  • Build your first Anomaly Detector in Power BI using PyCaret
  • How to implement Clustering in Power BI using PyCaret
  • Topic Modeling in Power BI using PyCaret

Important Links

  • Blog
  • Release Notes for PyCaret 2.0
  • User Guide / Documentation
  • GitHub
  • Stack Overflow
  • Install PyCaret
  • Notebook Tutorials
  • Contribute in PyCaret

Want to learn about a specific module?

Click on the links below to see the documentation and working examples.

  • Classification
  • Regression
  • Clustering
  • Anomaly Detection
  • Natural Language Processing
  • Association Rule Mining
