Cloud Pak for Data


Evaluating machine learning model deployments with Watson OpenScale

By Saloni Saluja posted 13 days ago


If your organization is using AI and machine learning models for making crucial decisions, you must ensure the performance, fairness, quality, and explainability of these models. Such rigorous evaluation is not just a best practice; it's a necessity. Fortunately, with the launch of Cloud Pak for Data version 4.7, IBM has made it easier than ever to monitor, evaluate, and improve your AI deployments by combining the capabilities of Watson OpenScale with Watson Machine Learning. In this blog post, we will dive into how you can monitor deployments with Watson OpenScale, as detailed in the document Evaluating deployments in spaces with Watson OpenScale.

Why evaluate AI models?

Before we delve into configuring Watson OpenScale evaluations in deployment spaces, let's understand why model evaluations are important. AI models are used in various applications, from healthcare to finance, and they can significantly impact real-world outcomes. Therefore, it's essential to:

  • Measure performance: Evaluations help measure how well your AI model is performing. Are the predictions accurate? Is the model consistent over time?

  • Ensure fairness: AI models must produce unbiased results, especially when dealing with sensitive attributes like gender, race, or age. Evaluating fairness ensures that the model's predictions are not discriminatory.

  • Assess quality: Quality evaluations determine the model's ability to produce correct outcomes. By comparing model predictions to labeled test data, you can assess whether your model meets quality thresholds.

  • Monitor drift: With changes in data over time, models can become less accurate. Drift evaluations help ensure that your models remain up-to-date and consistent by identifying shifts in data distribution and prediction accuracy.

  • Explain predictions: Understanding why a model makes specific predictions is essential for trust and transparency. Explainability evaluations help interpret the factors that influence a prediction.
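To make the quality idea above concrete, here is a minimal Python sketch that compares a model's predictions against labeled test data and flags a breach when accuracy drops below a threshold. The labels, predictions, and 0.8 threshold are invented for illustration and are not taken from Watson OpenScale itself.

```python
# Hedged sketch: a quality check against labeled test data.
# Accuracy is one of several quality metrics a monitor might track.

def accuracy(labels, predictions):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# Illustrative labeled test data (not real OpenScale payloads).
labels      = ["approve", "deny", "approve", "approve", "deny", "approve"]
predictions = ["approve", "deny", "deny",    "approve", "deny", "approve"]

THRESHOLD = 0.8  # acceptable lower bound for the metric
score = accuracy(labels, predictions)
print(f"accuracy = {score:.2f}, breach = {score < THRESHOLD}")
```

A real quality monitor evaluates many such metrics on each feedback upload; the point here is only the comparison of a metric value against a configured threshold.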

Now that we understand the importance of model evaluations, let's explore how to configure Watson OpenScale evaluations in deployment spaces.

Evaluating model deployments with Watson OpenScale

Evaluating models in deployment spaces involves several steps, as described in the documentation. Here's a simplified walkthrough:

1. Create a deployment space

To get started, create a deployment space, and associate a Watson OpenScale instance with it. Depending on your requirements, you can choose the type of space, such as production or pre-production.

2. Promote your model

Promote a trained machine learning model and its input data to the deployment space, then create an online deployment for the model.

3. Configure evaluations

Now, configure evaluations to monitor your model's performance:

  • Fairness monitoring
    Configure a monitor for fairness to check whether your model produces biased results for groups defined by attributes such as gender or race. Set fairness thresholds that flag when favorable outcomes for a monitored group fall too far below those for a reference group.

  • Quality monitoring
    Configure a monitor for quality to assess your model's performance based on labeled test data. Set quality thresholds to track when a metric value falls outside an acceptable range.

  • Drift monitoring
    Configure a monitor for drift to ensure your deployments are up-to-date and consistent. Use feature importance to determine the impact of feature drift on your model.
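The monitored-versus-reference comparison behind fairness monitoring can be sketched with the disparate impact ratio, one common fairness metric: the favorable-outcome rate for the monitored group divided by the rate for the reference group. The records, group names, and 0.8 threshold (the "four-fifths rule") below are illustrative; Watson OpenScale computes its own fairness scores.

```python
# Hedged sketch of a disparate impact check (illustrative data only).

def favorable_rate(records, group, favorable="approve"):
    """Share of a group's predictions that are the favorable outcome."""
    outcomes = [pred for grp, pred in records if grp == group]
    return sum(1 for p in outcomes if p == favorable) / len(outcomes)

def disparate_impact(records, monitored, reference):
    """Ratio of favorable-outcome rates: monitored group vs. reference group."""
    return favorable_rate(records, monitored) / favorable_rate(records, reference)

# Invented (group, prediction) pairs for illustration.
records = [
    ("female", "approve"), ("female", "deny"), ("female", "deny"), ("female", "approve"),
    ("male", "approve"), ("male", "approve"), ("male", "approve"), ("male", "deny"),
]
ratio = disparate_impact(records, monitored="female", reference="male")
print(f"disparate impact = {ratio:.2f}, biased = {ratio < 0.8}")
```

Here the monitored group receives the favorable outcome 50% of the time versus 75% for the reference group, so the ratio of about 0.67 falls below the 0.8 threshold and would be flagged.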
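For intuition on drift, here is a sketch of one common drift statistic, the population stability index (PSI), computed over binned distributions of a single feature. The histograms and the 0.2 rule-of-thumb threshold are invented for illustration; OpenScale's drift monitor builds its own drift detection model rather than computing PSI directly.

```python
import math

# Hedged sketch: population stability index between a feature's
# distribution at training time and on recent production payloads.

def psi(expected, actual):
    """PSI between two bucketed probability distributions (same buckets)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

training_dist   = [0.5, 0.3, 0.2]  # feature histogram at training time
production_dist = [0.3, 0.4, 0.3]  # same histogram on recent payloads

score = psi(training_dist, production_dist)
print(f"PSI = {score:.3f}, drifted = {score > 0.2}")
```

A larger PSI means the production data has shifted further from the training distribution, which is exactly the kind of shift a drift monitor is meant to surface before accuracy degrades.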

4. Explain transactions

Configure explainability settings to understand which features influence your model's predictions. Different methods like SHAP and LIME are available to suit your needs.
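A perturbation-based attribution, in the spirit of LIME and SHAP, can be sketched with a toy model: reset each feature to a baseline value and record how much the prediction moves. The model, weights, and inputs below are invented for illustration; OpenScale's explainers are far more sophisticated than this.

```python
# Hedged sketch: perturbation-style feature attributions on a toy model.

def model(features):
    """Toy risk score: weighted sum of numeric features (invented weights)."""
    weights = {"income": 0.5, "debt": -0.8, "age": 0.1}
    return sum(weights[name] * value for name, value in features.items())

def attributions(features, baseline):
    """Prediction change when each feature is reset to its baseline value."""
    full = model(features)
    return {
        name: full - model(dict(features, **{name: baseline[name]}))
        for name in features
    }

x        = {"income": 4.0, "debt": 2.0, "age": 3.0}
baseline = {"income": 0.0, "debt": 0.0, "age": 0.0}
print(attributions(x, baseline))
```

Features with large positive or negative attributions are the ones driving the prediction, which is the same question the explainability settings answer for your deployed model's transactions.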

5. Provide model details

To configure evaluations effectively, provide model details, including information about your training data and model output. This step ensures that Watson OpenScale understands how your model is set up.

6. Run evaluations

After configuring evaluations, you can run them by selecting Evaluate now in the Actions menu on the Evaluations tab. This sends model transactions for analysis.

7. Review results

Analyze the evaluation results on the Evaluations tab to gain insights into your model's performance. The charts and details provided will help you understand if your model meets the set thresholds.

8. Explore transactions

On the Transactions tab, you can analyze individual model transactions to understand why the model made specific predictions. Experiment with what-if scenarios to explore ways to improve the model.


The integration of Watson OpenScale capabilities with deployments empowers organizations to make AI and machine learning models more accountable and reliable. By following the steps described in the documentation, you can configure and monitor your AI models effectively, ensuring they perform with fairness, quality, and transparency.

As AI and machine learning continue to evolve, Watson OpenScale provides the tools needed to stay ahead of the curve, ensuring your models deliver value while maintaining ethical standards and accuracy. So, if you're on the journey of AI adoption, remember that Watson OpenScale has your back, helping you unlock the true potential of your AI deployments.

Watch a video showing how to evaluate models in deployment spaces.

To learn more, see Evaluating deployments in spaces with Watson OpenScale in IBM Cloud Pak for Data documentation.