Trust your ML models with Streamlit and IBM Cloud Pak for Data (Part 2)

By Jerome Kafrouni posted Wed February 02, 2022 08:02 PM

  
Part 2: Build models and test them in Streamlit

Welcome back to our series on trusting machine learning models by using Cloud Pak for Data and Streamlit!


In Part 1, we gathered data in a Watson Studio project and built a Streamlit app to support data discussions with project stakeholders. In Part 2, we'll keep working on the same credit scoring use case and build a page to interact with models deployed in Cloud Pak for Data.


In this post, you’ll learn:


  • How to add pages to your app
  • How to build models by using code, low-code, and no-code tools
  • How to deploy models for online scoring and use their serving REST APIs
  • How to create a page to test models and perform a what-if analysis
  • How to improve the performance of the app by using the built-in caching mechanism
Want to follow along by exploring the code? You can do so here.


How to add pages to your app
Blog posts about making a Streamlit app multi-page all use the same idea: add an input that lets the user select a page (typically radio buttons in the sidebar, i.e., st.sidebar.radio()), then render a different page depending on the value of that input. Pretty easy, right? A minimal sketch of this pattern follows the screenshot below. (Also, keep an eye on Streamlit's forum: multipage apps will be natively supported in Streamlit soon!)

Adding radio buttons in the sidebar makes an easy multi-page app!
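
Here is that pattern in its simplest form; the page names and their contents are placeholders:

```python
import streamlit as st

# The sidebar radio buttons act as the page switcher.
page = st.sidebar.radio("Navigation", ["EDA", "Test models"])

# Render a different page depending on the selected value.
if page == "EDA":
    st.title("Exploratory Data Analysis")
    # ... EDA widgets go here ...
elif page == "Test models":
    st.title("Test models")
    # ... model testing widgets go here ...
```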

Once you add more pages, modularize your code so that each page lives in a separate script. In Part 1, we started off with a single app.py script, and only some logic (the API calls to Cloud Pak for Data) was abstracted away into a separate module. In Part 2, we'll keep app.py simple and create a pages folder. I like to give each page a function called write(), so that rendering a page corresponds to calling my_page.write().
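
With that convention, app.py boils down to a small router, as in this sketch (the module names under pages/ are illustrative):

```python
# app.py -- kept deliberately small: it only routes to the pages.
import streamlit as st

# Hypothetical page modules in a pages/ folder, each exposing write().
from pages import eda, model_testing

PAGES = {
    "EDA": eda,
    "Test models": model_testing,
}

selection = st.sidebar.radio("Navigation", list(PAGES.keys()))
PAGES[selection].write()  # rendering a page == calling my_page.write()
```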

Once you start building the pages, a new requirement appears: how do you make the pages talk to each other? The easiest way is to use the Session State mechanism. st.session_state is a dictionary-like object that you can read from and write to, and it holds each user's state. For example, since users authenticate to Cloud Pak for Data on the first page, we can reuse their authentication token across pages by storing it in the session state, as seen here.
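
As a sketch, the first page might store the token like this (authenticate() below is a stand-in for the real token request in cpd_helpers.py):

```python
import streamlit as st

def authenticate(username: str, password: str) -> str:
    """Stand-in for the real call that trades credentials for a token."""
    return "dummy-token"

# Store the token once; every page can then read it from the session state.
if "token" not in st.session_state:
    username = st.text_input("Username")
    password = st.text_input("Password", type="password")
    if st.button("Log in") and username:
        st.session_state["token"] = authenticate(username, password)
else:
    st.success("Authenticated: reusing the token stored in session state.")
```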

Before creating a model testing page, let's head back to Watson Studio and discuss the options for building machine learning models.


How to build models by using code, low-code, and no-code tools
In the past few years, I've worked with many data science teams. Some empowered citizen data scientists, who had a deep understanding of the business but a limited background in predictive modeling. Others had statisticians who had worked for years in statistical modeling software and slowly ramped up to newer open-source machine learning tools. And yet others wanted to build everything in Python or R.

Supporting all of these personas is one of the strengths of Watson Studio. Want a fully automated approach? You can rely on AutoAI to build your predictive models. Want more control over the modeling process while still using a UI? Then try SPSS Modeler.

As mentioned in Part 1, on the coding side you can use Jupyter notebooks (with a Python, R, or Scala kernel) and RStudio (embedded in the platform). You can even start with an AutoAI experiment and then turn it into a notebook. For more complex use cases, deep learning experiments let you perform hyperparameter optimization and distributed deep learning on GPUs, while the Federated Learning service helps when your data is distributed across locations.

Building a pipeline for XGBoost and SVM models by using SPSS Modeler's drag-and-drop interface

How to deploy models for online scoring and use their serving REST APIs
Models built with any of these tools can be stored and shared in Watson Studio and deployed to a fully managed hosting service. This means that no matter how the data scientists on our sample project decide to build models, we'll be able to interact with their results through a consistent API. That makes our Streamlit app development much simpler!

Depending on the modeling tool you're using, you can save and deploy a model either from the UI or from code, by using either a Python library or the underlying REST API. The same REST API includes prediction endpoints, which you can see in action in our app in cpd_helpers.py.
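
As a rough sketch, a call to a deployment's scoring endpoint could look like this, assuming the Watson Machine Learning v4 REST API (the host, version date, and deployment ID below are placeholders):

```python
import requests

CPD_URL = "https://<your-cpd-host>"
DEPLOYMENT_ID = "<your-deployment-id>"

def predict(token, fields, values):
    """Send rows to an online deployment and return the first prediction."""
    response = requests.post(
        f"{CPD_URL}/ml/v4/deployments/{DEPLOYMENT_ID}/predictions",
        params={"version": "2021-06-01"},  # placeholder API version date
        headers={"Authorization": f"Bearer {token}"},
        # Scoring payload: column names plus one list of values per row.
        json={"input_data": [{"fields": fields, "values": values}]},
    )
    response.raise_for_status()
    return response.json()["predictions"][0]
```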


How to create a page to test models and perform a what-if analysis
Now that we know where to build models in Watson Studio and how to make them available as a web service to serve the app's requests, let's get back to building the app.

We're going to add a page that lists the deployed models the end user can access, then sends them data to be scored. To help the user do that, we let them select a row from the same dataframe they loaded on the EDA page and change its values. This answers questions such as, "What would the prediction be if feature X were one unit higher or lower?"


Our app lets end users predict selected rows and modify feature values to check their effect on predictions

Check out the code for this page in model_testing.py. It'll introduce you to a few new Streamlit components.

Since this page shows more information, we'll start using containers (st.columns) to display components side by side. We'll also use a form that is populated on the fly with the input data from the selected dataframe row, so the user can change values and click the "Predict" button again; to keep the UI simple, the form is hidden by default inside an st.expander. When we show predictions, we use the session state to keep track of the previously obtained prediction. This way, if a user changes the value of one feature, they'll see not only the new prediction but also how much it changed (colored green or red depending on the direction of the change).
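
Here is a simplified sketch of that layout; the feature names and the predict() stub are illustrative (in the real app, the row comes from the table and predict() calls the scoring endpoint):

```python
import streamlit as st

# Stand-in for the row the user selected in the table.
selected_row = {"Age": 35, "LoanAmount": 2500}

def predict(age: int, amount: int) -> float:
    """Hypothetical stub returning a probability of default."""
    return min(1.0, max(0.0, 0.05 + 0.0001 * amount - 0.001 * age))

left, right = st.columns(2)

with right:
    # Hide the form inside an expander to keep the UI simple.
    with st.expander("Edit features and predict"):
        # The form is pre-populated from the selected row.
        with st.form("what_if"):
            age = st.number_input("Age", value=selected_row["Age"])
            amount = st.number_input("LoanAmount", value=selected_row["LoanAmount"])
            submitted = st.form_submit_button("Predict")

if submitted:
    new_pred = predict(age, amount)
    previous = st.session_state.get("last_prediction")
    delta = None if previous is None else round(new_pred - previous, 4)
    # st.metric colors the delta green or red depending on its sign.
    left.metric("Probability of default", f"{new_pred:.3f}", delta)
    st.session_state["last_prediction"] = new_pred
```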

In model_testing.py, the dataframe isn't displayed with the native Streamlit function we used on the first page. Instead, it's displayed with a community extension called streamlit-aggrid. This library lets us build more interactive tables: here, it lets users select a row, insert it into the form on the right (the table could also be made editable in place), and turn it into a payload ready to be sent to the prediction endpoint.
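
A minimal sketch of the row-selection pattern, assuming streamlit-aggrid's API at the time of writing (install it with pip install streamlit-aggrid; the dataframe is a stand-in for the credit data):

```python
import pandas as pd
import streamlit as st
from st_aggrid import AgGrid, GridOptionsBuilder

# Stand-in for the credit scoring dataframe loaded on the EDA page.
df = pd.DataFrame({"Age": [35, 52], "LoanAmount": [2500, 7800]})

gb = GridOptionsBuilder.from_dataframe(df)
gb.configure_selection("single")  # let the user pick exactly one row

grid = AgGrid(df, gridOptions=gb.build())

# The selection comes back as a list of dicts, ready to become a payload.
selected = grid["selected_rows"]
if selected:
    st.write("Row to send to the prediction endpoint:", selected[0])
```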

What struck me when I started building the Streamlit app is that the community is extremely active. Whenever something wasn't yet available natively in Streamlit, not only did I find a ton of reusable code in tutorials, but there were also quite a few well-built extensions such as this one. Another example of a great extension is the Pandas Profiling extension, which I could've added to the EDA part of this app. Give it a try!


How to improve the performance of the app by using the built-in caching mechanism

As you go through the code, notice the changes in cpd_helpers.py, where I added the @st.cache() decorator (a built-in caching mechanism in Streamlit) to a couple of functions.

Why use caching?

When you run your first Streamlit app, you'll notice the behavior at the center of how Streamlit works: every time the app's state changes (e.g., when the user interacts with the UI), the Python script re-runs from top to bottom. This means every piece of code may be re-run many times.

In the first section, we discussed how session state helps you keep information across re-runs. But if you think about this app's logic, quite a few API calls happen that we'd rather not repeat. In other apps, it may not be API calls but expensive data-wrangling logic.

Caching in Streamlit is as simple as it looks in the code: once you add the decorator, every time the function is called again in the same context (with the same parameters), Streamlit reuses the cached result instead of re-running the function. Check out this guide to learn more about caching.
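
As a sketch, a cached helper could look like this (list_deployments() is a hypothetical example; the endpoint and version date are placeholders):

```python
import requests
import streamlit as st

@st.cache()  # memoize on the (cpd_url, token) argument values
def list_deployments(cpd_url, token):
    """Fetch deployments once; repeat calls with the same args hit the cache."""
    response = requests.get(
        f"{cpd_url}/ml/v4/deployments",
        params={"version": "2021-06-01"},  # placeholder API version date
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()
```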


What's next?

You learned how to take your app further and use it to call different models deployed in Watson Studio. While doing that, you learned about modeling tools in Watson Studio and different components and techniques in Streamlit: adding pages, leveraging the session state, using powerful extensions, and caching.

In Part 3 of this series, we'll take this app further and add a page to explore SHAP values and other model details. To do that, we'll show you how Notebook jobs in Watson Studio can help you offload heavy computations from Streamlit to Watson Studio. We'll discuss the importance of monotonicity constraints for this use case and how to check and enforce them. Lastly, we'll discuss hosting options such as Streamlit Cloud or IBM Cloud Code Engine.

If you have any questions, please leave them in the comments below or add me on LinkedIn.


Until the next post!