watsonx.governance

Direct, manage and monitor your AI using a single toolkit to speed responsible, transparent, explainable AI


IBM’s answer to governing AI Agents: Automation and Evaluation with watsonx.governance

By NICK PLOWDEN posted Wed March 05, 2025 02:40 PM

Agentic AI is transforming IT landscapes globally, but most organizations still face uncertainty over how to use AI agents safely and effectively. This is due to the complexity of developing and managing these agents, ensuring compliance and governance, and mitigating risks associated with models, users and data sets.

The potential for agents is immense, which is why Gartner predicts that by 2028, one-third of gen AI interactions will use action models and autonomous agents. Generative AI and machine learning already carry significant risks, especially for certain use cases; adding AI agents amplifies those risks further.

We are excited to announce that a tech preview of new agentic evaluation capabilities will be available the week of March 3. These metrics can help organizations track agents more closely, confirming that they are acting appropriately and detecting early warning signs when they are not.

Here are the new RAG and agentic AI evaluation metrics you'll find in watsonx.governance:

  • Context Relevance: Measures how well the data retrieved by the model aligns with the question specified in the prompt. Scores range from 0 to 1. Higher scores indicate that the context is more relevant to the question in the prompt.
  • Faithfulness: Indicates how accurately and reliably the generated response reflects the information contained in the retrieved documents or context. It measures the extent to which the generative model stays true to the retrieved content without introducing errors, hallucinations (information not supported by the retrieved context), or misleading details absent from the source material. Scores range from 0 to 1. Higher scores indicate that the output is more grounded and less hallucinated.
  • Answer Relevance: Measures how well the response generated by the model aligns with the user's question in terms of meaning and usefulness. Scores range from 0 to 1. Higher scores indicate that the output is more relevant to the user's question.
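To make the three metrics concrete, here is a minimal, illustrative sketch of how scores like these could be computed. This is not the watsonx.governance implementation (which the article does not show); it uses a crude token-overlap similarity purely as a stand-in, and all function names here are hypothetical.

```python
# Hypothetical sketch of RAG evaluation metrics, NOT the actual
# watsonx.governance implementation. A simple token-overlap ratio
# stands in for whatever similarity measure the product uses.

def _tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization; real systems use embeddings or LLM judges."""
    return set(text.lower().split())

def overlap_score(source: str, target: str) -> float:
    """Fraction of target's tokens that appear in source; a crude proxy in [0, 1]."""
    src, tgt = _tokens(source), _tokens(target)
    return len(src & tgt) / len(tgt) if tgt else 0.0

def context_relevance(question: str, context: str) -> float:
    """How well the retrieved context covers the question's terms."""
    return overlap_score(context, question)

def faithfulness(context: str, answer: str) -> float:
    """How much of the answer is grounded in the retrieved context."""
    return overlap_score(context, answer)

def answer_relevance(question: str, answer: str) -> float:
    """How well the answer addresses the question's terms."""
    return overlap_score(question, answer)
```

For example, if the retrieved context fully supports every word of the generated answer, `faithfulness` returns 1.0, while an answer containing terms absent from the context scores lower, signaling possible hallucination.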

Read the full article.


#watsonx.governance
