Build and deploy your first machine learning web app
In our last post we demonstrated how to train and deploy machine learning models in Power BI using PyCaret. If you haven’t heard about PyCaret before, please read our announcement to get a quick start.
In this tutorial we will use PyCaret to develop a machine learning pipeline that includes preprocessing transformations and a regression model to predict patient hospitalization charges based on demographic and basic patient health risk metrics such as age, BMI and smoking status.
What deployment is and why we deploy machine learning models.
Develop a machine learning pipeline and train models using PyCaret.
Build a simple web app using a Python framework called ‘Flask’.
Deploy a web app on ‘Heroku’ and see your model in action.
PyCaret is an open source, low-code machine learning library in Python to train and deploy machine learning pipelines and models in production. PyCaret can be installed easily using pip.
Flask is a framework that allows you to build web applications. A web application can be a commercial website, a blog, e-commerce system, or an application that generates predictions from data provided in real-time using trained models. If you don’t have Flask installed, you can use pip to install it.
GitHub is a cloud-based service used to host, manage and version-control code. It is especially useful when you work in a large team where multiple people (sometimes hundreds of them) are making changes to the same code. PyCaret is itself an example of an open-source project where hundreds of community developers continuously contribute to the source code. If you haven’t used GitHub before, you can sign up for a free account.
Heroku is a platform as a service (PaaS) that enables the deployment of web apps based on a managed container system, with integrated data services and a powerful ecosystem. In simple words, it allows you to take the application from your local machine to the cloud so that anybody can access it using a Web URL. In this tutorial we have chosen Heroku for deployment because it provides free resource hours when you sign up for a new account.
The deployment of machine learning models is the process of making models available in production where web applications, enterprise software and APIs can consume the trained model by providing new data points and generating predictions.
Normally machine learning models are built so that they can be used to predict an outcome (a binary value, i.e. 1 or 0, for Classification, continuous values for Regression, labels for Clustering etc.). There are two broad ways of generating predictions: (i) predict by batch; and (ii) predict in real-time. In our last tutorial we demonstrated how to deploy a machine learning model in Power BI and predict by batch. In this tutorial we will see how to deploy a machine learning model to predict in real-time.
An insurance company wants to improve its cash flow forecasting by better predicting patient charges using demographic and basic patient health risk metrics at the time of hospitalization.
To build a web application where demographic and health information of a patient is entered in a web form to predict charges.
Train and validate models and develop a machine learning pipeline for deployment.
Build a basic HTML front-end with an input form for independent variables (age, sex, bmi, children, smoker, region).
Build the back-end of the web application using the Flask framework.
Deploy the web app on Heroku. Once deployed, it will become publicly available and can be accessed via Web URL.
Training and model validation are performed in an Integrated Development Environment (IDE) or a Notebook, either on your local machine or in the cloud. In this tutorial we will use PyCaret in a Jupyter Notebook to develop the machine learning pipeline and train regression models. If you haven’t used PyCaret before, click here to learn more about PyCaret or see Getting Started Tutorials on our website.
In this tutorial, we have performed two experiments. The first experiment is performed with default preprocessing settings in PyCaret (missing value imputation, categorical encoding etc). The second experiment has some additional preprocessing tasks such as scaling and normalization, automatic feature engineering and binning continuous data into intervals. See the setup example for the second experiment:
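As a rough sketch, the second experiment's configuration could look like the block below. The parameter names follow the PyCaret 1.0 `setup()` API and may differ in later releases; the target column name `charges`, the `session_id` value and the helper name `run_experiment_2` are assumptions made for illustration.

```python
# Keyword arguments for pycaret.regression.setup(). Names follow the
# PyCaret 1.0 API; the target column "charges" is an assumption.
EXPERIMENT_2_SETUP = {
    "target": "charges",
    "session_id": 123,                      # fixed seed for reproducibility
    "normalize": True,                      # scaling and normalization
    "polynomial_features": True,            # automatic feature engineering
    "trigonometry_features": True,
    "feature_interaction": True,
    "bin_numeric_features": ["age", "bmi"], # bin continuous data into intervals
}


def run_experiment_2(data):
    """Initialise the PyCaret experiment on a pandas DataFrame."""
    # Imported lazily so the configuration can be inspected without PyCaret.
    from pycaret.regression import setup

    return setup(data, **EXPERIMENT_2_SETUP)
```

Here `data` is expected to be a pandas DataFrame holding the six input features and the target column.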
The magic happens with only a few lines of code. Notice that in Experiment 2 the transformed dataset has 62 features for training derived from only 7 features in the original dataset. All of the new features are the result of transformations and automatic feature engineering in PyCaret.
Sample code for model training and validation in PyCaret:
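A minimal sketch of that step, assuming `setup()` has already been run in the same session. The helper name `train_linear_regression` is hypothetical; `create_model` and `plot_model` are PyCaret's regression functions.

```python
def train_linear_regression():
    """Train a cross-validated linear regression and plot its residuals."""
    # Imported lazily; both functions require a prior call to setup().
    from pycaret.regression import create_model, plot_model

    lr = create_model("lr")            # "lr" is PyCaret's ID for linear regression
    plot_model(lr, plot="residuals")   # residual plot used to compare experiments
    return lr
```

Running this once per experiment gives the cross-validated scores and the residual plots that are compared in the discussion that follows.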
Notice the impact of transformations and automatic feature engineering. The R2 has increased by 10% with very little effort. We can compare the residual plot of the linear regression model for both experiments and observe the impact of transformations and feature engineering on the **heteroskedasticity** of the model.
Machine learning is an *iterative* process. The number of iterations and the techniques used within them depend on how critical the task is and what the impact will be if predictions are wrong. The stakes for a model that predicts a patient outcome in real-time in the ICU of a hospital are far higher than for a model built to predict customer churn.
In this tutorial, we have performed only two iterations and the linear regression model from the second experiment will be used for deployment. At this stage, however, the model is still only an object within the notebook. To save it as a file that can be transferred to and consumed by other applications, run the following code:
When you save a model in PyCaret, the entire transformation pipeline based on the configuration defined in the **setup()** function is created. All inter-dependencies are orchestrated automatically. See the pipeline and model stored in the ‘deployment_28042020’ variable:
We have finished our first task of training and selecting a model for deployment. The final machine learning pipeline and linear regression model are now saved as a file on the local drive under the location defined in the **save_model()** function. (In this example: c:/username/ins/deployment_28042020.pkl).
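A minimal sketch of the save step, using PyCaret's `save_model()`. The wrapper `save_pipeline` is a hypothetical helper; the model name matches the file name used later in this tutorial.

```python
MODEL_NAME = "deployment_28042020"


def save_pipeline(model, model_name=MODEL_NAME):
    """Persist the trained model together with its preprocessing pipeline."""
    # Imported lazily so this block can be read and tested without PyCaret.
    from pycaret.regression import save_model

    save_model(model, model_name=model_name)  # PyCaret appends ".pkl" itself
    return model_name + ".pkl"
```

Passing a full path as `model_name` writes the file to that location instead of the current working directory.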
Now that our machine learning pipeline and model are ready we will start building a web application that can connect to them and generate predictions on new data in real-time. There are two parts of this application:
Front-end (designed using HTML)
Back-end (developed using Flask in Python)
Generally, the front-end of a web application is built using HTML, which is not the focus of this article. We have used a simple HTML template and a CSS style sheet to design an input form. Here’s the HTML snippet of the front-end page of our web application.
You don’t need to be an expert in HTML to build simple applications. There are numerous free platforms that provide HTML and CSS templates as well as enable building beautiful HTML pages quickly by using a drag and drop interface.
**CSS Style Sheet** CSS (also known as Cascading Style Sheets) describes how HTML elements are displayed on a screen. It is an efficient way of controlling the layout of your application. Style sheets contain information such as background color, font size and color, margins etc. They are saved externally as a .css file and linked to the HTML by including one line of code.
The back-end of the web application is developed using the Flask framework. For beginners, it is intuitive to think of Flask as a library that you can import just like any other library in Python. See the sample code snippet of our back-end written using the Flask framework in Python.
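The sketch below shows one way such a back-end could be structured; it is not necessarily the exact ‘app.py’ from the repository. It assumes the form fields are named after the six input columns, that the template is called `home.html` and accepts a `pred` variable, and that `predict_fn` is any callable that turns one input record into a number.

```python
# Input columns, in the order used during training.
COLS = ["age", "sex", "bmi", "children", "smoker", "region"]


def form_to_record(form):
    """Pick the six model inputs out of the submitted form, by field name."""
    return {col: form[col] for col in COLS}


def create_app(predict_fn):
    """Build the Flask app; predict_fn maps one input record to a charge."""
    # Imported lazily so the helper above can be used without Flask installed.
    from flask import Flask, render_template, request

    app = Flask(__name__)

    @app.route("/")
    def home():
        # Serve the empty input form (templates/home.html is assumed).
        return render_template("home.html")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read the form, score it, and re-render the page with the result.
        record = form_to_record(request.form)
        charge = int(predict_fn(record))
        return render_template("home.html", pred=f"Expected Bill will be {charge}")

    return app
```

Reading the fields by name rather than by position keeps the back-end independent of the order in which the HTML form lists its inputs.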
If you remember from Step 1 above, we finalized a linear regression model that was trained on 62 features automatically engineered by PyCaret. However, the front-end of our web application has an input form that collects only six features, i.e. age, sex, bmi, children, smoker and region.
How do we transform the 6 features of a new data point in real-time into the 62 features on which the model was trained? With a sequence of transformations applied during model training, coding this by hand becomes an increasingly complex and time-consuming task.
In PyCaret all transformations such as categorical encoding, scaling, missing value imputation, feature engineering and even feature selection are automatically executed in real-time before generating predictions.
Imagine the amount of code you would have had to write to apply all the transformations in strict sequence before you could even use your model for predictions. In practice, when you think of machine learning, you should think about the entire ML pipeline and not just the model.
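To make this concrete, here is a hedged sketch of scoring one new record with the saved pipeline. `load_model` and `predict_model` are PyCaret functions; the helper names, the casting of numeric form fields (our own precaution, since form values arrive as strings) and the handling of the prediction column name (`Label` in early releases, `prediction_label` in later ones) are assumptions.

```python
# Numeric inputs and the Python types they should be cast to.
NUMERIC_COLS = {"age": int, "bmi": float, "children": int}


def coerce_record(record):
    """Cast numeric form fields from strings to numbers; leave the rest as-is."""
    return {
        key: NUMERIC_COLS[key](value) if key in NUMERIC_COLS else value
        for key, value in record.items()
    }


def make_predict_fn(model_name="deployment_28042020"):
    """Load the saved pipeline once and return a function that scores a record."""
    # Imported lazily so coerce_record can be used without pandas or PyCaret.
    import pandas as pd
    from pycaret.regression import load_model, predict_model

    pipeline = load_model(model_name)  # reads <model_name>.pkl

    def predict_charge(record):
        # One-row DataFrame with the six raw features only; the pipeline
        # applies every preprocessing step before the model is called.
        data_unseen = pd.DataFrame([coerce_record(record)])
        scored = predict_model(pipeline, data=data_unseen)
        label_col = "Label" if "Label" in scored.columns else "prediction_label"
        return float(scored[label_col].iloc[0])

    return predict_charge
```

Note that no transformation code appears here: the six raw inputs go in, and the 62 engineered features are produced inside the pipeline.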
**Testing App** One final step before we publish the application on Heroku is to test the web app locally. Open Anaconda Prompt and navigate to the folder where ‘app.py’ is saved on your computer. Run the Python file with the following command: python app.py
Once executed, copy the URL into a browser and it should open a web application hosted on your local machine (127.0.0.1). Try entering test values to see if the predict function is working. In the example below, the expected bill for a 19-year-old female smoker with no children in the southwest is $20,900.
Congratulations! You have now built your first machine learning app. Now it’s time to take this application from your local machine into the cloud so other people can use it with a Web URL.
Now that the model is trained, the machine learning pipeline is ready, and the application is tested on our local machine, we are ready to start our deployment on Heroku. There are a couple of ways to upload your application source code onto Heroku. The simplest way is to link a GitHub repository to your Heroku account.
If you would like to follow along you can fork this repository from GitHub. If you don’t know how to fork a repo, please read this official GitHub tutorial.
By now you are familiar with all the files in the repository shown above except for two, i.e. ‘requirements.txt’ and ‘Procfile’.
**requirements.txt** is a text file containing the names of the Python packages required to execute the application. If these packages are not installed in the environment where the application is running, it will fail.
**Procfile** is simply one line of code that provides startup instructions to the web server, indicating which file should be executed first when somebody opens the application. In this example the name of our application file is ‘**app.py**’ and the name of the application is also ‘app’ (hence app:app).
Once all the files are uploaded onto the GitHub repository, we are now ready to start deployment on Heroku. Follow the steps below:
Step 1 — Sign up on heroku.com and click on ‘Create new app’
Step 2 — Enter App name and region
Step 3 — Connect to your GitHub repository where code is hosted
Step 4 — Deploy branch
Step 5 — Wait 5–10 minutes and BOOM
App is published to URL: https://pycaret-insurance.herokuapp.com/
There is one last thing to see before we end the tutorial.
So far we have built and deployed a web application that works with our machine learning pipeline. Now imagine that you already have an enterprise application in which you want to integrate predictions from your model. What you need is a web service where you can make an API call with input data points and get the predictions back. To achieve this we have created the predict_api function in our ‘app.py’ file. See the code snippet:
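A sketch of such an endpoint is shown below; it may differ from the `predict_api` function in the repository. The route path `/predict_api`, the validation step and the helper names are assumptions, and `predict_fn` is again any callable that scores one input record.

```python
# Fields the JSON payload is expected to contain.
COLS = ["age", "sex", "bmi", "children", "smoker", "region"]


def predict_from_json(payload, predict_fn):
    """Validate a decoded JSON payload and return the prediction as a float."""
    missing = [col for col in COLS if col not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return float(predict_fn({col: payload[col] for col in COLS}))


def register_predict_api(app, predict_fn):
    """Attach a JSON prediction endpoint to an existing Flask app."""
    # Imported lazily so the validation helper can be tested without Flask.
    from flask import jsonify, request

    @app.route("/predict_api", methods=["POST"])
    def predict_api():
        payload = request.get_json(force=True)
        try:
            return jsonify(predict_from_json(payload, predict_fn))
        except ValueError as err:
            # Tell the caller which inputs were missing.
            return jsonify({"error": str(err)}), 400

    return app
```

Keeping the validation in a plain function means it can be checked without starting a web server.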
Here’s how you can use this web service in Python using the requests library:
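A minimal client sketch, assuming the endpoint lives at `/predict_api` on the published app and accepts a JSON object with the six input fields. The sample values, including the BMI, are illustrative only, and the helper names are hypothetical.

```python
# Assumed endpoint: the published app URL plus the /predict_api route.
API_URL = "https://pycaret-insurance.herokuapp.com/predict_api"


def example_payload():
    """One illustrative patient record; the values are made up for the demo."""
    return {
        "age": 19,
        "sex": "female",
        "bmi": 27.9,
        "children": 0,
        "smoker": "yes",
        "region": "southwest",
    }


def request_prediction(payload, url=API_URL, timeout=10):
    """POST the record as JSON and return the decoded response body."""
    # Imported lazily so the payload can be built without requests installed.
    import requests

    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of bad data
    return response.json()
```

Calling `request_prediction(example_payload())` sends the record and returns whatever the service responds with, provided the app is reachable.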
In the next tutorial of this series, we will dive deeper into deploying machine learning pipelines using Docker containers. We will demonstrate how to easily deploy and run containerized machine learning applications on Linux.
Follow our LinkedIn and subscribe to our YouTube channel to learn more about PyCaret.
User Guide / Documentation GitHub Repository Install PyCaret Notebook Tutorials Contribute in PyCaret
As of the first release 1.0.0, PyCaret has the following modules available for use. Click on the links below to see the documentation and working examples in Python.
Classification Regression Clustering Anomaly Detection Natural Language Processing Association Rule Mining
PyCaret getting started tutorials in Notebook:
Clustering Anomaly Detection Natural Language Processing Association Rule Mining Regression Classification
We are actively working on improving PyCaret. Our future development pipeline includes a new **Time Series Forecasting** module, integration with **TensorFlow**, and major improvements to the scalability of PyCaret. If you would like to share your feedback and help us improve further, you may fill in this form on the website or leave a comment on our GitHub or LinkedIn page.
PyCaret is an open source project. Everybody is welcome to contribute. If you would like to contribute, please feel free to work on open issues. Pull requests are accepted with unit tests on the dev-1.0.1 branch.
Please give us ⭐️ on our GitHub repo if you like PyCaret.
Medium : https://medium.com/@moez_62905/
LinkedIn : https://www.linkedin.com/in/profile-moez/
Twitter : https://twitter.com/moezpycaretorg1