This post explains MLOps and trustworthy AI through an example, with a focus on monitoring AI activities. The purpose is to give developers a good understanding of MLOps and trustworthy AI practices. The articles used as input are listed in the references section. To understand the capabilities of Cloud Pak for Data (CP4D), the learning path "Get started with IBM Cloud Pak for Data" is recommended. Many articles and tutorials already cover model development, deployment, and monitoring, and this post does not attempt to duplicate that content. The second link in the references section is a lab that covers all the implementation details of model development, deployment, and monitoring. The first link describes the common problems faced in the AI journey and presents MLOps and trustworthy AI as the solution; it serves as the baseline for this post. Building on it, this post elaborates on data governance, data lineage, and policy management, which help teams quickly discover, curate, categorize, and share data assets. Deployment of models from Watson Studio to Watson Machine Learning (WML) is also covered. The main focus, however, is monitoring AI activities with AI Factsheets, the component of Watson Knowledge Catalog (WKC) that aims to automatically document all information about a model throughout its lifecycle.
This section provides pointers and reference examples for dealing with issues such as the lack of the right data and the difficulty of deploying ML models efficiently. Data and data governance problems are addressed by Watson Knowledge Catalog (WKC). The last item in the references section uses a credit risk data example to show how data governance, data quality, and active policy management help protect and govern data and trace data lineage. Data governance helps users quickly discover, curate, categorize, and share data assets, data sets, analytical models, and their relationships with other members of the organization. It serves as a single source of truth, giving data engineers, data stewards, data scientists, and business analysts self-service access to data they can trust. The workshop also explains how data governance is separated for viewers and admins. It covers developing a machine learning model with a Jupyter notebook and with the AutoAI feature of Watson Studio, deploying it, and monitoring it with OpenScale. A more comprehensive example is provided by the second link in the references section; readers can follow either one for model development and deployment. The next section explains how to use AI Factsheets to monitor the ML model lifecycle.
AI Factsheets captures model metadata across the develop, deploy/test, validate, and operate phases of the model lifecycle, so data scientists and ML engineers can spend their time building models instead of writing model documentation. The following steps show how to use Factsheets to capture model metadata. For brevity, configuration and setup details are not covered; only the steps to capture metadata are discussed.
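To make the idea of phase-by-phase metadata capture concrete, here is a minimal Python sketch of how facts could be grouped by lifecycle phase. It is a hypothetical in-memory model, not the AI Factsheets API: the `FactSheet` class, its methods, the model name, and the feature names are illustrative assumptions.

```python
from collections import defaultdict
from enum import Enum


class Phase(Enum):
    """Lifecycle phases named in the post: develop, deploy/test, validate, operate."""
    DEVELOP = "Develop"
    TEST = "Test"
    VALIDATE = "Validate"
    OPERATE = "Operate"


class FactSheet:
    """Hypothetical in-memory stand-in for a model factsheet (not the WKC API).

    Facts are grouped by lifecycle phase, mirroring how the Factsheets
    view organizes a model's metadata by stage.
    """

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self._facts: dict[Phase, dict[str, object]] = defaultdict(dict)

    def record(self, phase: Phase, key: str, value: object) -> None:
        # A later value for the same key in the same phase overwrites the earlier one.
        self._facts[phase][key] = value

    def facts_for(self, phase: Phase) -> dict[str, object]:
        # Return a copy so callers cannot mutate the stored facts.
        return dict(self._facts.get(phase, {}))

    def phases_with_facts(self) -> list[Phase]:
        # Report phases in lifecycle order, not in the order facts were recorded.
        return [p for p in Phase if self._facts.get(p)]


# Hypothetical example values for a credit risk model.
sheet = FactSheet("credit-risk-model")
sheet.record(Phase.DEVELOP, "algorithm", "RandomForestClassifier")
sheet.record(Phase.DEVELOP, "training_features", ["LoanAmount", "LoanDuration"])
sheet.record(Phase.TEST, "deployment_space", "credit-risk-space")
print([p.value for p in sheet.phases_with_facts()])  # ['Develop', 'Test']
```

Keying facts by phase, rather than keeping one flat dictionary, lets the same fact name (a quality metric, for example) hold different values in the Test and Operate phases.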
Step 1: Create Model Use Case/Inventory
Create an instance of Watson Knowledge Catalog (WKC) in the IBM Cloud console and associate the service with the project created in Watson Studio. The "Catalogs" option then appears in the Watson Studio menu, as shown below. Select Model Inventory and create a model use case.
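A model use case is the container that tracked models are later attached to. The following sketch models an inventory of use cases in plain Python to show that relationship; `ModelInventory`, `ModelUseCase`, and the "Credit risk" use case are hypothetical names chosen for illustration, and in WKC the use case is created through the UI steps described above.

```python
from dataclasses import dataclass, field


@dataclass
class ModelUseCase:
    """Hypothetical model use case: a named business problem that groups models."""
    name: str
    description: str = ""
    tracked_models: list[str] = field(default_factory=list)


class ModelInventory:
    """Hypothetical model inventory holding use cases keyed by name."""

    def __init__(self) -> None:
        self._use_cases: dict[str, ModelUseCase] = {}

    def create_use_case(self, name: str, description: str = "") -> ModelUseCase:
        # Refuse duplicates so each business problem maps to exactly one use case.
        if name in self._use_cases:
            raise ValueError(f"use case {name!r} already exists")
        use_case = ModelUseCase(name, description)
        self._use_cases[name] = use_case
        return use_case

    def get(self, name: str) -> ModelUseCase:
        return self._use_cases[name]

    def names(self) -> list[str]:
        return sorted(self._use_cases)


inventory = ModelInventory()
inventory.create_use_case("Credit risk", "Predict loan default risk")
print(inventory.names())  # ['Credit risk']
```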
Step 2: Enable Tracking of the model
After developing the model, click the model in the Assets section, enable tracking by clicking the "Track this model" button, and select the appropriate model use case. Next, select Model Inventory from Catalogs in the menu and choose the model use case. On the Asset tab, the tracking data appears in the "Develop" section. Click the model to see its metadata, including the model information and training features.
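Conceptually, "Track this model" links a model to an existing use case and records its development facts. The sketch below expresses that link in Python under stated assumptions: `TrackedModel`, `track_model`, and the metadata keys are hypothetical, and the rule that tracking fails for an unknown use case is an assumption that mirrors the order of Step 1 and Step 2 rather than documented WKC behavior.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedModel:
    """Hypothetical record of a model after tracking has been enabled."""
    name: str
    use_case: str
    stage: str = "Develop"  # assumption: newly tracked models start in the Develop stage
    metadata: dict[str, object] = field(default_factory=dict)


def track_model(registry: dict[str, list["TrackedModel"]], use_case: str,
                model_name: str, **metadata: object) -> TrackedModel:
    """Hypothetical helper: attach a model to an existing use case.

    Raises KeyError if the use case was not created first (Step 1).
    """
    if use_case not in registry:
        raise KeyError(f"unknown model use case: {use_case!r}")
    model = TrackedModel(model_name, use_case, metadata=dict(metadata))
    registry[use_case].append(model)
    return model


# Hypothetical registry mapping use case names to their tracked models.
registry: dict[str, list[TrackedModel]] = {"Credit risk": []}
model = track_model(registry, "Credit risk", "credit-risk-model",
                    training_features=["LoanAmount", "LoanDuration"],
                    label_column="Risk")
print(model.stage, sorted(model.metadata))  # Develop ['label_column', 'training_features']
```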
Step 3: Deploy the model and view the updated Factsheets
Deploy the model to the Watson Machine Learning (WML) service and notice that the Factsheets are updated. The deployed model is shown in the Pending Evaluation state. The Develop and Test stages of the Factsheets are shown below.
This screenshot shows the Factsheets for the model implemented by following the second link in the references section. The Validate and Operate stages are not covered.
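The effect of Step 3 on the Factsheets can be thought of as a stage transition: deployment moves a model from Develop to Test and sets its state to Pending Evaluation. This minimal sketch encodes that as a small state machine; the `Stage` enum, the one-step-forward transition table, and the `on_deploy` hook are assumptions for illustration and do not describe how WKC implements the transition.

```python
from enum import Enum


class Stage(Enum):
    """Factsheet stages in lifecycle order, as named in the post."""
    DEVELOP = "Develop"
    TEST = "Test"
    VALIDATE = "Validate"
    OPERATE = "Operate"


# Hypothetical transition table: a model only moves one stage forward.
NEXT_STAGE = {
    Stage.DEVELOP: Stage.TEST,
    Stage.TEST: Stage.VALIDATE,
    Stage.VALIDATE: Stage.OPERATE,
}


class ModelEntry:
    """Hypothetical factsheet entry for one tracked model."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.DEVELOP
        self.status = "In development"  # hypothetical label, not a WKC state name
        self.history = [Stage.DEVELOP]

    def advance(self, status: str) -> None:
        # Move one stage forward; Operate has no successor.
        if self.stage not in NEXT_STAGE:
            raise RuntimeError(f"{self.name} is already in {self.stage.value}")
        self.stage = NEXT_STAGE[self.stage]
        self.status = status
        self.history.append(self.stage)


def on_deploy(entry: ModelEntry) -> None:
    """Hypothetical hook run after the model is deployed to WML."""
    if entry.stage is not Stage.DEVELOP:
        raise RuntimeError("only a model in the Develop stage can be deployed here")
    entry.advance("Pending Evaluation")


entry = ModelEntry("credit-risk-model")
on_deploy(entry)
print(entry.stage.value, "|", entry.status)  # Test | Pending Evaluation
```

Keeping the allowed moves in one table means the Validate and Operate stages, which this post does not cover, are reached through the same `advance` method rather than separate code paths.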
This post has shown how easy it is to use the AI Factsheets component of WKC on Cloud Pak for Data. Its main advantages are that it tracks metadata throughout the model lifecycle and facilitates collaboration.
References
1. MLOps and Trustworthy AI made simple
2. Data fabric lab - MLOps and trustworthy AI
3. Data Threads: AI Factsheets on Cloud Pak for Data
4. Demystifying AI: Understanding the Importance and Value of AI Factsheets
5. Analyzing Credit Risk with Cloud Pak for Data on OpenShift