The release of watsonx.governance 2.0.1 includes several new features that make it the best AI Governance solution for your customers. One highly requested feature is the ability to evaluate and monitor generative AI prompts for Retrieval Augmented Generation (RAG) use cases. UI enhancements and new RAG-specific metrics are now available!
The Retrieval Augmented Generation (RAG) prompt task type makes these new evaluations available as part of the Generative AI Quality metric group in watsonx.governance. You can use Generative AI Quality evaluations to measure how well your foundation model performs tasks. Most of the following new RAG evaluation metrics can be used without providing ground truth. Ground truth is the expected answer to each question asked when calling a model prompt.
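For illustration, an evaluation record for a RAG prompt could be shaped roughly like the sketch below. The field names here are assumptions for this example, not the exact watsonx.governance schema, and the ground truth field is only needed by metrics that compare the output to a reference answer.

```python
# Hypothetical shape of one RAG evaluation record (field names are
# assumptions for this sketch, not the watsonx.governance schema).
rag_eval_record = {
    "question": "What is the refund window for online orders?",
    "context": "Orders placed online may be returned within 30 days of delivery.",
    "generated_answer": "You can return online orders within 30 days of delivery.",
    # Ground truth is optional; it is only required by metrics such as
    # answer similarity that compare the output to a reference answer.
    "ground_truth": "Online orders can be returned within 30 days of delivery.",
}
```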
Faithfulness measures how grounded the model output is in the model context and provides attributions from the context to show the most important sentences that contribute to the model output. How it works: Higher scores indicate that the output is more grounded and less hallucinated.
Answer relevance measures how relevant the answer in the model output is to the question in the model input. How it works: Higher scores indicate that the model provides relevant answers to the question.
Unsuccessful requests measures the ratio of questions that are answered unsuccessfully out of the total number of questions. How it works: Higher scores indicate that the model cannot provide answers to the questions.
Answer similarity measures how similar the answer or generated text is to the ground truth or reference answer to determine the quality of your model performance. How it works: Higher scores indicate that the answer is more similar to the reference output.
Context relevance measures how relevant the context that your model retrieves is to the question that is specified in the prompt. How it works: Higher scores indicate that the context is more relevant to the question in the prompt.
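To make these definitions concrete, here is a minimal sketch of how similar scores could be approximated outside the product, assuming the open-source sentence-transformers library for embeddings. This is not the watsonx.governance implementation; it only illustrates the intuition behind unsuccessful requests, answer similarity, and context relevance.

```python
# Illustrative approximations of a few RAG metrics, assuming
# sentence-transformers is installed. Not the product's implementation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def unsuccessful_requests(answers, refusal_markers=("i don't know", "cannot answer")):
    """Ratio of answers that fail to address the question (markers are assumed)."""
    failed = sum(any(m in a.lower() for m in refusal_markers) for a in answers)
    return failed / len(answers)

def answer_similarity(generated, ground_truth):
    """Cosine similarity between the generated answer and the reference answer."""
    emb = model.encode([generated, ground_truth], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

def context_relevance(question, context):
    """Cosine similarity between the question and the retrieved context."""
    emb = model.encode([question, context], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))
```

Higher values from sketches like these would correspond to the "higher scores" described above; the product computes and tracks these metrics for you as part of each evaluation run.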
You can now see transaction-level detail and source attribution for RAG-based prompt transactions in the watsonx.governance UI. In a project or deployment space, choose the section called "Answer Quality" and choose an evaluation run. You will see each transaction's evaluation metrics for that run. Click Analyze on a transaction to see a color-coded view of the answers, the associated source context, and the relevance of that context to the answer.
This is a great way to show the power of watsonx.governance when evaluating RAG use cases.