Trust your ML models with Streamlit and IBM Cloud Pak for Data (Part 3)

By Jerome Kafrouni posted Thu February 03, 2022 05:21 PM

  
Part 3: Diving deeper into model behavior

Welcome back to our series on trusting machine learning models by using Cloud Pak for Data and Streamlit!

In Part 1, we gathered data in a Watson Studio project and built a Streamlit app to support data discussions with project stakeholders. In Part 2, we trained and deployed candidate models in Cloud Pak for Data using code, low-code, and no-code tools. We also built a page in Streamlit to test different predictions.

In Part 3 (the last part), we'll add a final page to our sample app to dive deeper into model behavior by using SHAP values. You'll learn:

  • How to compute SHAP values and use them for local explainability
  • How to use Watson Studio Notebooks to precompute SHAP values in batch
  • How to display global explainability plots using SHAP in Streamlit
  • How to turn the SHAP computations into a Notebook job and trigger that job remotely from Streamlit
  • How to host the final app on IBM Cloud or Streamlit Cloud

Want to follow along by exploring the code? You can do so here.

A model inspection page to explore global explainability and other charts that help us pin down how different candidate models behave once deployed

How to compute SHAP values and use them for local explainability

Your project stakeholders want to make informed decisions about their models. To give them the right information, you can use explainability methods such as SHAP (SHapley Additive exPlanations). Personally, I often prefer Partial Dependence Plots (PDP) because they're easier to reason about and share with non-technical subject matter experts, but for simplicity I'll rely on SHAP in this post. Keep an eye on my Medium account for a post on PDP!



Never heard of SHAP? Read chapters 9.5 and 9.6 of the Interpretable Machine Learning book. At a high level, SHAP is a technique built on top of Shapley values. It's a game-theoretic approach where we think of features as members of coalitions playing a game: collectively, they contribute to predictions, and our job is to evaluate the marginal contribution of each feature to the prediction.
Force plots are a common local explainability plot using SHAP values,
which show each feature's contribution (absolute value and direction) to the output prediction 
Using SHAP is simple. After a model has been trained, keep it in memory, load the shap library in Python, and get SHAP values for each row with the high-level shap.Explainer() class. Use these values to plot explanations of a given prediction, then add them to our model testing page from Part 2.
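
To make that concrete, here's a minimal sketch of the workflow. It uses a demo dataset bundled with shap and an XGBoost model rather than the HELOC model from Part 2, so treat it as an illustration, not the repo's code:

```python
# Minimal sketch: compute SHAP values and plot a local explanation for one prediction.
import shap
import xgboost
from sklearn.model_selection import train_test_split

X, y = shap.datasets.adult()                     # demo dataset bundled with shap
X_train, X_test, y_train, y_test = train_test_split(X, y.astype(int), random_state=0)

model = xgboost.XGBClassifier().fit(X_train, y_train)

explainer = shap.Explainer(model)     # high-level entry point; picks a suitable algorithm
shap_values = explainer(X_test)       # shap.Explanation: one row of SHAP values per sample

# Local explainability: a force plot for the first test prediction
shap.plots.force(shap_values[0], matplotlib=True)
```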


I like using individual SHAP values for:
  1. Looking at the "biggest" model mistakes;
  2. Looking at mistake explanations with a subject matter expert to refine the model (and once in production, sending them alongside the prediction to explain the decision to the end-user);

How to use Watson Studio Notebooks to precompute SHAP values in batch


Individual SHAP values also work great for global explainability, which is how we'll use them in this app. Compute SHAP values for your entire training or testing dataset, then look at them to understand model behavior. Since you are computing them in batch, this can get computationally heavy. Streamlit runs in Python, so you could technically load the model into memory, compute the values locally, and display them. But depending on the data size, that can make your UI less responsive (the user may have to wait several minutes for the values to be computed).


This is where Streamlit and Watson Studio work well together. We'll precompute the SHAP values in a Notebook in Watson Studio, store them somewhere (attached as metadata of the saved models), and pull and display them in the Streamlit app.


I do this in compute-and-store-shap-values.ipynb, a Notebook that you can import into your Watson Studio project. It loads one of the stored models using the appropriate IBM Python client library, then uses the shap package to compute SHAP values and attach them as additional metadata to that stored model.
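
Here's a hedged sketch of the idea behind that Notebook (not its exact code). The environment variable names (CPD_URL, SPACE_ID, MODEL_UID) and the credential format are assumptions for illustration; check the Notebook and your CP4D version for the specifics:

```python
# Hedged sketch: load a stored model, compute SHAP values in batch, and build a
# JSON-serializable payload that can be attached to the model's metadata.
import os

import shap
from ibm_watson_machine_learning import APIClient


def compute_shap_payload(model, X):
    """Compute SHAP values for a whole dataset and return a JSON-serializable dict."""
    explainer = shap.Explainer(model)   # pipelines may need shap.Explainer(model.predict_proba, X)
    shap_values = explainer(X)
    return {
        "values": shap_values.values.tolist(),
        "base_values": shap_values.base_values.tolist(),
        "data": shap_values.data.tolist(),
        "feature_names": list(X.columns),
    }


# Inside the Watson Studio Notebook, roughly:
client = APIClient({"url": os.environ["CPD_URL"],              # credential format varies
                    "token": os.environ["USER_ACCESS_TOKEN"],  # by CP4D version
                    "instance_id": "openshift",
                    "version": "4.0"})
client.set.default_space(os.environ["SPACE_ID"])
model = client.repository.load(os.environ["MODEL_UID"])   # bring the stored model into memory
# payload = compute_shap_payload(model, X_test)   # X_test: the held-out data from Parts 1-2
# ...then attach `payload` to the stored model's metadata with the same client.
```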


By doing this, we gain two advantages:
  1. We can compute the SHAP values once and for all, before the user visualizes them in the app, so that the UI remains blazing fast;
  2. Since we’re using a separate Python environment to do the heavy calculations, that environment can be short-lived but have a higher memory and CPU footprint (we’ll only get billed for the few minutes it takes to do the calculations and store them);
How to display global explainability plots using SHAP in Streamlit


Once we have the SHAP values stored as metadata, we'll automatically receive them as part of the model_details we were already retrieving from an API call in Part 2. The same get_deployment_details function now returns SHAP values inside the model metadata dictionary. There's no need to initialize a shap.Explainer(), which would require access to the model itself. Simply load the SHAP values into a shap.Explanation() object, as done in the new model_inspection.py page, then call one of shap's built-in visualization functions, such as this beeswarm plot:
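
As a rough sketch (the metadata key and serialization format are assumptions for illustration, not the repo's exact schema), the page can rebuild the Explanation and hand the resulting matplotlib figure to Streamlit:

```python
# Rough sketch: rebuild a shap.Explanation from serialized values and render it in Streamlit.
import matplotlib.pyplot as plt
import numpy as np
import shap
import streamlit as st


def render_beeswarm(shap_meta: dict) -> None:
    """Rebuild a shap.Explanation from stored metadata and draw a beeswarm plot."""
    explanation = shap.Explanation(
        values=np.array(shap_meta["values"]),
        base_values=np.array(shap_meta["base_values"]),
        data=np.array(shap_meta["data"]),
        feature_names=shap_meta["feature_names"],
    )
    shap.plots.beeswarm(explanation, show=False)   # draws onto the current matplotlib figure
    st.pyplot(plt.gcf())                           # hand that figure to Streamlit
    plt.clf()

# e.g. render_beeswarm(model_details["shap_values"])   # hypothetical metadata key
```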
 


The beeswarm plot is one of the most popular SHAP plots. It shows SHAP and feature values in one plot.


Each point on a row represents one dataset sample. Its position on the x-axis shows whether the feature has a positive or negative impact on the prediction, while its color shows the value of the feature itself. For example, MSinceMostRecentInqexcl7days tends to have a negative impact on predictions (negative SHAP values on the x-axis) when its value is low (blue points). This feature captures how long it's been since this person's credit was checked by a financial institution.


Typically, a recent credit check is an indicator of bad credit. If we look at the FICO data dictionary (request access to it here), we’ll see that this feature should have a monotonically increasing constraint with respect to the "Good" class (notice that FICO uses "Bad" as the reference/positive class). This means that the higher the value, the higher the likelihood that the credit is good.


Overall, our model follows this desired behavior. We can also observe a couple of blue points with high SHAP values. This chart helps a subject matter expert tell us whether that behavior is acceptable (and how to understand and mitigate it). We could improve this page by letting users hover over points and review them one by one (or use Partial Dependence Plots to identify the monotonicity issues).


Finally, if this behavior is prevalent and is indeed coming from our model, we can load the model back into a Notebook in Watson Studio and retrain it with a monotonicity constraint. Several blog posts dig deeper into monotonicity constraints, such as this Medium post on Analytics Vidhya about gradient boosting models.
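
For illustration, here's a hedged sketch of what that retraining could look like with XGBoost, assuming X_train and y_train are the training data prepared in Part 2:

```python
# Hedged sketch: retrain with a monotonically increasing constraint on one feature.
# X_train / y_train stand in for the training data prepared in Part 2.
import xgboost

constraints = tuple(
    1 if col == "MSinceMostRecentInqexcl7days" else 0   # +1: increasing, -1: decreasing, 0: none
    for col in X_train.columns
)

model = xgboost.XGBClassifier(monotone_constraints=constraints).fit(X_train, y_train)
```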


How to turn the SHAP computations into a Notebook job and trigger that job remotely from Streamlit


Our users can now understand their model by using global explainability plots while offloading computations to a remote environment in Watson Studio. But how do you trigger these computations? Manually opening and running the compute-and-store-shap-values.ipynb notebook is tedious. Wouldn't it be great to automate it?


In Watson Studio, you can run Notebooks (and Python scripts and other assets) as automatic jobs. Check out Jobs in projects to learn more. If you look at the beginning of the Notebook, you’ll notice that some parameters are not hardcoded but rather received as environment variables. You can set up a job to pass values for these parameters dynamically to the Notebook. Once you create a job, you can trigger its execution via yet another REST API provided by Watson Studio. You can also schedule the job to check periodically for newly stored models and compute their SHAP values automatically (by using the built-in scheduling feature).
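
Here's a hedged sketch of what triggering a job run could look like from the app. The endpoint path and payload shape follow the public Watson Data API documentation, but treat them as assumptions to verify against your Cloud Pak for Data version; names like CPD_URL are placeholders:

```python
# Hedged sketch: start a job run and pass environment variables to the Notebook job.
import os

import requests


def trigger_shap_job(job_id: str, project_id: str, model_uid: str, token: str) -> dict:
    """Start a run of the precompute-SHAP Notebook job via the Jobs REST API."""
    url = f"{os.environ['CPD_URL']}/v2/jobs/{job_id}/runs"
    payload = {"job_run": {"configuration": {"env_variables": [f"MODEL_UID={model_uid}"]}}}
    resp = requests.post(
        url,
        params={"project_id": project_id},
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    resp.raise_for_status()
    return resp.json()   # includes the run's state, which the app can poll later
```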


Next, we check whether the selected model already has precomputed SHAP values attached as metadata. If not, we give end users the option to trigger that remote job. Once it finishes, the user can reload the page and see the SHAP visualizations:



If SHAP values aren't precomputed yet, we can let users trigger a remote Notebook job on Watson Studio to launch that calculation asynchronously


Look closely at the Streamlit code. One Streamlit number input controls how many text inputs appear below it, so the UI stays flexible and lets users enter as many parameters as they want. You'll also notice that I'm using a callback. For simplicity, in Part 1 and Part 2, I checked whether certain buttons were clicked and then made certain function calls (for example, when loading the dataset). The concept of a callback in Streamlit lets our app behave a bit more like a typical web app, performing certain actions automatically in response to events such as a widget's on_change or on_click.
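
Here's a rough sketch of that pattern (not the repo's exact code): a number input drives how many parameter fields are rendered, and a button callback gathers them when clicked:

```python
# Rough sketch: a number input controls how many parameter fields are shown,
# and a button callback collects them from session state.
import streamlit as st


def on_run_clicked():
    # Gather every dynamically created field from session state
    params = {st.session_state[f"param_name_{i}"]: st.session_state[f"param_value_{i}"]
              for i in range(int(st.session_state["n_params"]))}
    st.session_state["job_request"] = params   # e.g. pass these to trigger_shap_job(...)


n = st.number_input("Number of job parameters", min_value=0, max_value=10,
                    value=1, step=1, key="n_params")
for i in range(int(n)):
    st.text_input(f"Parameter {i + 1} name", key=f"param_name_{i}")
    st.text_input(f"Parameter {i + 1} value", key=f"param_value_{i}")

st.button("Compute SHAP values", on_click=on_run_clicked)
```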


How to host the final app on IBM Cloud or Streamlit Cloud
Once your app is ready, you can easily host it on Streamlit Cloud or on IBM Cloud.


Streamlit Cloud is the first-party solution for hosting, sharing, and collaborating on Streamlit apps—and it’s completely free to get started. It takes the seamless live updating experience you love using locally and puts it on the Cloud so you can quickly iterate on prototypes and ideas with your teammates. Deploy apps in minutes, share them securely by hooking into your SSO, and watch as edits update automatically when you push changes to the code. 


If you want to host this app on IBM Cloud, the go-to solution is a service called IBM Cloud Code Engine. Code Engine is a service for serverless app hosting. It lets you host and scale web applications by providing a Docker image or a Dockerfile (you can start for free!).


Streamlit itself is super easy to Dockerize. The main readme of this series' repo includes instructions on installing the command-line tools and provisioning the right services to host this app that way. With Code Engine, you can update existing deployments using the CLI (I mention it in the instructions) without any downtime, which means you can integrate the CLI update commands into your CI/CD pipelines so your deployed app picks up newer versions pushed to GitHub.


Streamlit Cloud makes that process even simpler by watching for edits to your app's source code and taking care of app updates automatically, giving you an experience almost as seamless as running the app on your laptop.


Where can we take this sample app next?
We saw how to combine Streamlit with various Watson Studio features in an architecture that helps us build machine learning models we can trust.


As I progress in my data science career, I realize that decision-making isn’t easy. Not only do you need the right tool to build things, but you also need to put your results in front of stakeholders and make choices together. By using Streamlit to build this type of Model Inspection app, you'll boost your team's confidence in the machine learning models they build and mitigate typical risks that emerge in data science projects.


Feel free to fork the repo that contains all the starter code and take it further. There are other explainability techniques to help you dig deeper into your model's behavior, such as Weak Slice Analysis (check out FreaAI from IBM Research) and Contrastive Explanations that power the Explainability service of Watson Studio. On the Streamlit side, the community is publishing amazing apps every day in the Gallery that showcase how to consume ML models in highly interactive apps.


If you liked this series and have questions, follow me here or on Medium, or add me on LinkedIn! I’m part of a team called Data Science and AI Elite at IBM. My job is to solve business use cases leveraging the latest open-source technologies such as Streamlit in conjunction with IBM's Data Science platform.


To learn how to kick-start your data science project with the right expertise, tools, and resources, the Data Science and AI Elite (DSE) is here to help. The DSE team can plan, co-create and prove the project with you based on our proven Agile AI methodology. Visit ibm.biz/datascienceelite to connect with us, explore our resources and learn more about Data Science and AI Elite. Request a complimentary consultation: ibm.co/DSE-Consultation.