Global AI and Data Science


watsonx.governance 2.0 is here! — Learn what’s new

By NICK PLOWDEN posted 21 days ago

  

Note: Posting on Behalf of Doug Stauber, Director of Product Management, watsonx.governance

I am excited to share the new features in the next major release of watsonx.governance, known as watsonx.governance 2.0. This release brings major features to support our AI Anywhere initiative, hallucination detection, and regulatory requirements out of the box.

AI Anywhere for both development and runtime

I wrote earlier this year about how watsonx.governance can monitor ML models and LLMs from any vendor. With this release, watsonx.governance can now monitor both development-time and runtime metrics. This means the software can monitor all metrics, from quality to faithfulness to drift, regardless of the AI platform you are using. So whether you are using watsonx.ai, Amazon Bedrock, Microsoft Azure, or ChatGPT, watsonx.governance can monitor a huge variety of metrics both in development and in production.

The rest of the software stack works with third-party models now too: calculating metrics, automatically updating a model Factsheet, and then automatically updating the enterprise-wide dashboard are all supported. In short, it no longer matters where you are building or deploying your AI: watsonx.governance is now fully compatible.
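As an illustration of this vendor-agnostic idea, here is a minimal Python sketch, not the watsonx.governance API, with all names hypothetical: inferences from different providers are normalized into one record shape so a single evaluation pipeline can consume them, wherever the model runs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical record format: any provider's inference can be
# normalized into this shape and fed to the same evaluation pipeline.
@dataclass
class InferenceRecord:
    provider: str   # e.g. "watsonx.ai", "bedrock", "azure-openai"
    model_id: str
    prompt: str
    response: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def evaluate(records):
    """Toy runtime evaluation over normalized records: counts
    responses per provider, regardless of where the model runs."""
    counts = {}
    for r in records:
        counts[r.provider] = counts.get(r.provider, 0) + 1
    return counts

records = [
    InferenceRecord("watsonx.ai", "granite-13b", "Q1", "A1"),
    InferenceRecord("bedrock", "claude-v2", "Q2", "A2"),
    InferenceRecord("bedrock", "claude-v2", "Q3", "A3"),
]
print(evaluate(records))  # one pipeline, many providers
```

The point of the sketch is the normalization step: once every platform's output lands in a common record, downstream metric calculation and dashboarding do not need provider-specific code.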

For any third-party ML models and LLMs, watsonx.governance can evaluate development and runtime metrics, document metadata, and create dashboards and workflows.

Hallucination Detection

Retrieval-augmented generation, or RAG, is the most popular use case for LLMs today. With it, enterprises can infuse proprietary data, keep the LLM relevant, and increase accuracy. To enhance this popular technique, watsonx.governance now supports out-of-the-box evaluation of RAG metrics during development and at runtime. These new metrics include:

  • Faithfulness measures how faithful the model output is to the reference data provided.
  • Answer relevance measures how relevant the LLM's response is to the user query.
  • Unsuccessful requests measures the ratio of questions answered unsuccessfully to the total number of questions.

All of these metrics provide a score from 0 to 1. Together, they will help developers and prompt engineers create more accurate, more efficient AI use cases with less worry about hallucinations.
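To make the 0-to-1 scale concrete, here is a toy Python sketch, not the actual watsonx.governance implementation: a naive token-overlap proxy for faithfulness (the real metric is model-based) and the unsuccessful-requests ratio as defined above.

```python
def faithfulness(response: str, reference: str) -> float:
    """Toy proxy: fraction of response tokens that also appear in the
    reference text. Only illustrates the 0-to-1 scale; production
    faithfulness metrics use model-based scoring."""
    resp_tokens = response.lower().split()
    ref_tokens = set(reference.lower().split())
    if not resp_tokens:
        return 0.0
    return sum(t in ref_tokens for t in resp_tokens) / len(resp_tokens)

def unsuccessful_requests(outcomes):
    """Ratio of unsuccessfully answered questions to total questions.
    `outcomes` is a list of booleans, True meaning success."""
    if not outcomes:
        return 0.0
    return outcomes.count(False) / len(outcomes)

print(faithfulness("paris is the capital", "the capital of france is paris"))  # 1.0
print(unsuccessful_requests([True, True, False, True]))  # 0.25
```

Both functions return values in [0, 1], matching the scale the product reports, so thresholds and alerts can be set uniformly across metrics.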


Using the watsonx.governance notebook to set up evaluations of RAG metrics

Regulatory Requirements Out of the Box

We are introducing two new features to help ensure AI systems conform to AI regulations and have a clear risk score associated with them.

  • First, we are introducing AI Model Risk Assessments. This feature helps users understand which AI risks apply to their use case via a customizable questionnaire. After a few questions are answered, the questionnaire yields a risk score for that particular AI use case. The out-of-the-box questionnaire is based on the EU AI Act, but it can be customized to suit the needs of your particular geography or organizational policies. By attributing a risk score to the AI use case, the AI project lead can determine how best to monitor the AI system, including the number of metrics, the frequency of monitoring, and who should approve models as they progress through the AI lifecycle. This provides more confidence that high-risk models get the attention they deserve, while lower-risk models are placed into production efficiently.
  • Second, we are introducing an out-of-the-box ability to assess the applicability of AI systems against the EU AI Act. The product will now include a user-friendly questionnaire dedicated to the applicability of an AI system to the EU AI Act. Similar to the Model Risk Assessment, the responses determine both applicability to the EU AI Act and the risk category (Prohibited, High, Limited, or Minimal Risk). Use of this assessment will help an organization determine the applicability and corresponding risk categorization of an AI use case under the EU AI Act, improving efficiency and lowering the risk of non-conformance.
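As a rough illustration of how a questionnaire can yield a risk score and an EU-AI-Act-style category, here is a hypothetical Python sketch. The questions, weights, and cutoffs below are invented for this example; the product's actual questionnaire is customizable and not reproduced here.

```python
# Hypothetical questionnaire: each "yes" answer contributes a weight
# toward a 0-100 risk score. Questions and weights are illustrative.
QUESTIONS = {
    "Processes biometric data?": 40,
    "Used for credit or employment decisions?": 30,
    "Interacts directly with consumers?": 15,
    "Fully automated (no human in the loop)?": 15,
}

def risk_score(answers):
    """Sum the weights of every question answered 'yes'."""
    return sum(w for q, w in QUESTIONS.items() if answers.get(q))

def risk_category(score):
    """Map a numeric score to an EU-AI-Act-style risk tier
    (cutoffs are invented for illustration)."""
    if score >= 90:
        return "Prohibited"
    if score >= 60:
        return "High"
    if score >= 30:
        return "Limited"
    return "Minimal"

answers = {
    "Processes biometric data?": True,
    "Used for credit or employment decisions?": True,
}
score = risk_score(answers)
print(score, risk_category(score))  # 70 High
```

The takeaway is the shape of the workflow, not the numbers: answers roll up into a score, and the score maps to a category that drives how closely the use case is monitored.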

The built-in questionnaire allows a model owner to assess risk level under the EU AI Act

These new assessments for regulatory compliance fit well with our already available product features:

  • AI Factsheets for technical documentation and record keeping
  • Governance Console as a quality management system
  • Model Monitoring and evaluation tools for accuracy and robustness
  • Risk assessment with our Risk Atlas tool

Taken together, these features make watsonx.governance a comprehensive platform for honoring both internal policies and external regulations.

Learn More and Try Today

The 2.0 software release will be generally available mid-June, and on Cloud shortly. Stay up to date on the latest information on AI governance by joining the watsonx.governance Community.

To learn more about watsonx.governance or give it a try for free, visit our website at https://www.ibm.com/products/watsonx-governance

Here you can start a trial or set up a briefing to request a demo of the above features in action.
