watsonx.governance


Use the watsonx.governance Monitoring Toolkit to Evaluate the Quality of an LLM-Powered LangChain Application

By Ravi Chamarthy posted Tue September 03, 2024 08:07 AM

  

Greetings!

The Use Case

Let’s consider a scenario where we are developing an LLM-powered application using LangChain, say for mobile issues summarization, then classifying each issue, and finally generating an issue resolution for that issue type.

So, in total there are 3 processing steps in the chain, each backed by a different large language model:

  • Issue Summarization — Azure OpenAI GPT Turbo 8K model.
  • Issue classification — IBM watsonx.ai Flan T5 XXL model.
  • Issue Resolution — IBM watsonx.ai Llama 2 13B model.
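The three steps above can be sketched as a simple pipeline. This is a minimal, self-contained illustration, not the actual application: the model calls are stubbed with keyword heuristics, and all function names (`summarize_issue`, `classify_issue`, `resolve_issue`, `mobile_issues_chain`) are hypothetical. In the real chain, each step would invoke its respective LLM.

```python
# Sketch of the three-step mobile-issues chain. Each function stands in for
# an LLM call: step 1 for Azure OpenAI GPT, step 2 for watsonx.ai Flan-T5 XXL,
# step 3 for watsonx.ai Llama 2 13B. The logic here is a placeholder.

def summarize_issue(issue_text: str) -> str:
    # Step 1: issue summarization (stub: take the first sentence)
    return issue_text.split(".")[0].strip()

def classify_issue(summary: str) -> str:
    # Step 2: issue classification (stub: keyword lookup)
    categories = {"battery": "Battery", "screen": "Display", "signal": "Network"}
    for keyword, label in categories.items():
        if keyword in summary.lower():
            return label
    return "Other"

def resolve_issue(issue_type: str) -> str:
    # Step 3: issue resolution (stub: canned resolution per issue type)
    resolutions = {
        "Battery": "Check battery health and reduce background app usage.",
        "Display": "Restart the device; if the flicker persists, service it.",
        "Network": "Toggle airplane mode and reset network settings.",
    }
    return resolutions.get(issue_type, "Escalate to support for diagnosis.")

def mobile_issues_chain(issue_text: str) -> dict:
    # Compose the three steps, passing each output to the next stage
    summary = summarize_issue(issue_text)
    issue_type = classify_issue(summary)
    resolution = resolve_issue(issue_type)
    return {"summary": summary, "issue_type": issue_type, "resolution": resolution}
```

The key point is that each stage's output feeds the next, so a quality problem at any one step propagates down the chain.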

The Problem

But wait, how do we know the quality of each processing step in the chain? Quality, as in: is the generated mobile issue summary comparable to, say, a ground-truth summary? Is the generated issue resolution comparable to a ground-truth resolution?

For this, the IBM watsonx.governance monitoring SDK provides a wide range of metrics, such as ROUGE, BLEU, Text Quality, input/output HAP (hate, abuse, and profanity), and input/output PII, that can be evaluated on the generated summary and generated content.
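To make the comparison concrete, here is what a reference-based metric like ROUGE measures. This is a plain-Python sketch of ROUGE-1 F1 (unigram overlap between generated and ground-truth text), not the watsonx.governance SDK's implementation, which handles tokenization, stemming, and the longer-n-gram variants properly.

```python
from collections import Counter

def rouge_1_f(generated: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a generated text and a ground-truth reference."""
    gen_tokens = Counter(generated.lower().split())
    ref_tokens = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as in the reference
    overlap = sum((gen_tokens & ref_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means the generated summary uses exactly the reference's vocabulary; scores near 0 signal a summary that has drifted from the ground truth and warrants a closer look.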

Read the full article.


#watsonx.governance
