Getting Started
By Haris Pozidis, Principal Scientist and Manager of AI for Infrastructure
Welcome back!
Last time, we delved into the fascinating world of Agent Assist, an AI cloud service that aids IBM support agents in swiftly locating the most relevant information within the IBM knowledge base to recommend optimal solutions for resolving cases. Click here to check out part 1 of this blog and podcast.
Today, let's explore our quality assessment approach and next steps.
Measuring quality
The most critical step in building our solution was the quality assessment task. But how do we evaluate quality in a complex pipeline like ours? While the final result matters most, every step affects the overall quality, since each stage builds on the output of the previous one.
To that end, we implemented a "divide-and-conquer" strategy, evaluating the quality of every step before proceeding to the next one. Since we are dealing with text and NLP techniques, we adopted established metrics in the NLP community to compare our results with reference results.
We created a reference dataset, a so-called ground truth, by sampling several customer cases and asking Subject Matter Experts (SMEs) to go through the respective case feeds and extract and curate the problem description and the resolution.
We compared every intermediate result in our pipeline to either the reference problem description or the reference resolution for the cases in the ground truth dataset.
We calculated several metrics for each comparison, both lexical and semantic, and also used LLMs to compute the similarity between extracted and reference answers.
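As a minimal sketch of what a lexical comparison metric looks like, the following computes a unigram-overlap F1 score between an extracted answer and its reference. This is an illustrative stand-in, not the exact metric suite used in the service; established implementations (e.g. ROUGE) work similarly at the token level.

```python
from collections import Counter

def token_f1(extracted: str, reference: str) -> float:
    """Unigram-overlap F1 between an extracted answer and its reference."""
    ext = Counter(extracted.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((ext & ref).values())  # tokens shared by both texts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ext.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical extracted vs. reference resolution text
score = token_f1("reboot the storage controller", "reboot the controller")
```

Semantic metrics replace the token counts with embedding similarity, and LLM-based scoring prompts a model to grade the pair directly; the same extracted-versus-reference framing applies to all three.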
Human experts reviewed and graded the extracted answers. One of our main goals was to establish a metric that would be a good proxy for the human expert preference, as this would be invaluable for continuous quality assessment and scalability to large numbers of cases.
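Checking whether an automatic metric is a good proxy for human preference typically comes down to measuring how well the two rankings agree. A minimal sketch, using hypothetical scores and a plain Pearson correlation:

```python
def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: automatic metric scores vs. human expert grades (1-5)
metric_scores = [0.91, 0.55, 0.78, 0.30, 0.84]
human_grades = [5, 3, 4, 2, 5]
r = pearson(metric_scores, human_grades)
```

A metric that correlates strongly with expert grades can then stand in for manual review when assessing quality over large numbers of cases.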
Next steps
Over the past months, we've been gathering extensive user feedback on our AI feature through star ratings and detailed comments, and using this valuable information to improve the quality of the service.
We carry out periodic, manual failure analysis, categorize the root causes and identify remediation strategies, then rank the issues and mitigation methods and use them as guidelines for our next steps.
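The ranking step can be sketched as a frequency-weighted severity tally. The categories and severity scale below are hypothetical, purely to illustrate the idea:

```python
from collections import Counter

# Hypothetical failure-analysis log: (root_cause, severity 1-3) per failed case
failures = [
    ("retrieval_miss", 3), ("bad_extraction", 2), ("retrieval_miss", 3),
    ("formatting", 1), ("retrieval_miss", 2), ("bad_extraction", 3),
]

# Rank root causes by total impact: each occurrence adds its severity
impact: Counter = Counter()
for cause, severity in failures:
    impact[cause] += severity

ranked = impact.most_common()  # highest-impact root cause first
```

The highest-ranked causes then get remediation effort first, which is what drives the quality metrics upward between analysis rounds.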
This process has led to consistent improvement of our quality metrics. However, it relies on human observation and judgment; therefore, one important next step for us is to explore automated and reinforcement learning techniques for scalable, continuous improvement.
Another area we are currently working on is ingesting, indexing, and using the rich information available in IBM documents. These sources complement the information we ingest from customer support cases.
We are currently finalizing a prototype that builds separate collections of such data and queries those in parallel to querying historical cases.
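Querying several collections in parallel can be sketched with a thread pool that fans a query out to every index and merges the hits. The two search functions below are hypothetical placeholders; in the real service they would query the historical-case index and the document index:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-collection search functions (stand-ins for real index queries)
def search_cases(query: str) -> list[str]:
    return [f"case-hit for {query!r}"]

def search_docs(query: str) -> list[str]:
    return [f"doc-hit for {query!r}"]

def search_all(query: str) -> list[str]:
    """Fan a query out to every collection in parallel and merge the hits."""
    collections = [search_cases, search_docs]
    with ThreadPoolExecutor(max_workers=len(collections)) as pool:
        results = pool.map(lambda search: search(query), collections)
    return [hit for hits in results for hit in hits]

hits = search_all("disk array timeout")
```

Running the queries concurrently keeps the added document collections from increasing end-to-end latency, since the slowest collection, not the sum of all of them, bounds the response time.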
Last but not least, one fascinating area of ongoing work is combining knowledge graphs with GenAI-based pipelines, aiming to handle complex support cases more effectively.
Final remarks
So far, our journey in the GenAI world has been exciting, a continuous learning experience, where we are constantly trying and testing different things, discussing with human experts, and going back to the whiteboard to tweak and try again.
It has been gratifying to offer a useful service to our support agents to aid them in everyday tasks. With further exploration and integration of additional useful services and APIs as they emerge, there's ample room for growth.
The journey has just begun!