by Haris Pozidis, Principal Scientist and Manager of AI for Infrastructure
Generative AI (GenAI) offers a whole new world of capabilities, complementary to those that machine learning and AI brought to businesses and consumers.
Motivated by the power of this technology to improve efficiency in everyday life, we wondered if it could also boost productivity in a business setting - in particular, by helping human agents in the IT support center resolve customer cases faster and with higher quality.
This was the starting point for developing Agent Assist (AA), a tool to empower human agents to be more effective and efficient.
Technical approach
Most newly opened cases are probably not entirely unique. Therefore, to get a head start with troubleshooting and resolving a new case, one could look up similar, successfully closed cases and understand how those had been resolved. To implement a prototype around this idea we needed three main components:
- A repository of historical cases, ideally covering a broad range of customer issues,
- A mechanism to search that repository, to retrieve the most relevant similar historical cases,
- A method to extract and summarize the resolutions.
Noisy data
Everything starts with the data. In this case, we needed the content of past closed cases associated with IBM infrastructure products. Thankfully, all IBM-supported cases are handled by a common platform and there is a repository of all historical cases dating back a long time.
Moreover, before a closed case is persisted in the repository, all Personally Identifiable Information (PII) is stripped from the so-called "feed", i.e. the case text. We quickly had an API that offered us access to troves of historical data, ready for processing.
However, it was difficult to handle such a diverse set of cases from the get-go. To work around that problem we decided to focus on one category of products from the IBM Storage business, namely the class of IBM FlashSystem products. That gave us a good combination of a relatively confined set of customer issues and a large enough body of cases to work with.
Similar historical cases
The second key component of our solution was to design a method to search our repository for similar past cases. We wanted to retrieve the most relevant ones, but there were several decisions that we had to make, such as:
- Deciding what information was critical to identify a similar case,
- How to perform an effective search,
- How to make the search efficient and scalable to large repositories,
- And how many similar cases to retrieve.
When a new case is opened, we know the product name and type, the problem subject, and the problem description as provided either by the customer or by the machine itself for automatically generated tickets. We would use all of these fields to search for similar cases, matching both the product and the problem reported in the case.
We decided to implement a search for past cases with similar products and problems that would not match the exact words found in the problem description, but rather extract and match their semantics. This was made possible by embedding models and vector similarity search, technologies that have become well established for natural language queries and information retrieval over the past few years.
This choice also addressed our next challenge: making the search efficient across a repository of historical cases that was expected to expand continuously. By adopting FAISS, a library for fast similarity search, we verified that we could perform similarity search in a few tens of milliseconds across many tens of millions of objects, which would remain sufficient for our application well into the future.
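For illustration, here is a minimal sketch of how such a retrieval step can be assembled from an off-the-shelf embedding model and FAISS. The model name, the way the case fields are concatenated, the index type, and the `load_closed_cases` helper are assumptions made for this example; they are not the exact components used in Agent Assist.

```python
# Minimal sketch: semantic retrieval of similar historical cases.
# Assumptions: a sentence-transformers embedding model and a flat FAISS index.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice (384-dim)

def embed_cases(cases):
    """Encode product + subject + description into one vector per case."""
    texts = [f"{c['product']} | {c['subject']} | {c['description']}" for c in cases]
    vectors = encoder.encode(texts, normalize_embeddings=True)
    return np.asarray(vectors, dtype="float32")

# Build the index once over the repository of closed cases.
historical_cases = load_closed_cases()   # hypothetical loader for the case repository
index = faiss.IndexFlatIP(384)           # inner product == cosine on normalized vectors
index.add(embed_cases(historical_cases))

def retrieve_similar(new_case, k=100):
    """Return the k most similar closed cases for a newly opened ticket."""
    query = embed_cases([new_case])
    scores, ids = index.search(query, k)
    return [historical_cases[i] for i in ids[0]]
```

The retrieval size `k` is left as a parameter here; as discussed below, the value we eventually settled on was driven by the downstream processing steps.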
Another important decision was the number of historical cases to retrieve. Our initial intuition was approximately 100, especially as we were expecting inaccurate results for ill-formulated problem descriptions.
Although this was a good intuition, it proved to be only one criterion for the retrieval size. In the end, we had to tune this number based on the next step in our processing pipeline, which defines what happens with the retrieved cases.
Extracting the resolution
The next step was to extract the resolutions of the retrieved cases. We thought that would be easy - just use the text in the field tagged "Resolution", which is present in every case, since support agents are asked to fill that field with a summary. However, in some cases the text entered there is sparse or uninformative.
What do we do on such occasions? Luckily, this happened at about the same time that generative AI was becoming popular and IBM was launching its watsonx.ai service, which offered the ability to run inference on many popular open-source and inner-source GenAI models (LLMs) via an API, at low latency.
Armed with that revolutionary capability, we went back to the drawing board and came up with a method that involved extracting relevant context from a case's text, feeding that to an LLM, and instructing it to understand and summarize the resolution. As expected, we had to experiment with various prompts and different models until we arrived at a satisfactory solution. This resolution extraction step proved to be the most critical one for the quality of the provided resolutions.
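As a rough illustration of this step, the sketch below builds a simple extraction prompt and delegates generation to a placeholder `call_llm` helper, standing in for whatever inference API is used (in our case, watsonx.ai). The prompt wording is illustrative only, not the prompt we converged on.

```python
# Minimal sketch: LLM-based resolution extraction from a closed case's text.
# call_llm() is a hypothetical wrapper around the chosen inference API.
def build_resolution_prompt(case_text: str) -> str:
    return (
        "You are an IT support assistant. Below is the text of a closed support case. "
        "Summarize, in two or three sentences, the action that actually resolved the "
        "customer's problem. If no resolution is stated, answer 'No clear resolution'.\n\n"
        f"Case text:\n{case_text}\n\nResolution summary:"
    )

def extract_resolution(case_text: str) -> str:
    prompt = build_resolution_prompt(case_text)
    return call_llm(prompt)  # hypothetical call to the LLM inference service
```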
Handling non-unique resolutions
Most customer issues are not uniquely resolved; rather, there may be several possible solutions for the same problem. As an example, consider that a customer reports a slow disk in their storage system. The solution to that problem could be either:
- To replace the disk as it may be faulty,
- To reseat the disk in a different slot in the system, or
- To upgrade the firmware to a newer version.
If we retrieve enough such "faulty disk" historical cases, we will capture all of these possible resolutions. This is where the number of retrieved historical cases becomes relevant, as discussed in the section "Similar historical cases" above. Through experimentation and SME consultation, we decided to retrieve 300 cases in the similarity search.
The next step was to identify the various unique topics from the resolutions of the retrieved tickets. Our solution was again to resort to vector embedding technology. By encoding each resolution as a numerical vector, we could easily apply unsupervised clustering techniques to group resolutions by topic. Embedding a few hundred text chunks can be done extremely fast even on a CPU, so the latency of our solution was not affected.
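The sketch below shows one plausible way to implement this grouping, using agglomerative clustering over the resolution embeddings. The embedding model, the clustering algorithm, and the distance threshold are assumptions for illustration; any unsupervised clustering over the embeddings would serve the same purpose.

```python
# Minimal sketch: group extracted resolutions into topics via embedding + clustering.
# Assumptions: sentence-transformers embeddings and agglomerative clustering (scikit-learn).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical embedding model

def cluster_resolutions(resolutions, distance_threshold=0.4):
    """Embed each resolution text and group semantically similar ones by topic."""
    vectors = encoder.encode(resolutions, normalize_embeddings=True)
    clustering = AgglomerativeClustering(
        n_clusters=None,                       # let the threshold decide the number of topics
        distance_threshold=distance_threshold, # illustrative value, would need tuning
        metric="cosine",
        linkage="average",
    )
    labels = clustering.fit_predict(np.asarray(vectors))
    groups = {}
    for label, text, vec in zip(labels, resolutions, vectors):
        groups.setdefault(label, []).append((text, vec))
    return groups
```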
Interface to the human agent
The last step was to figure out how to offer our solution to the end user, our support agent. At the end of the previous step, we could group resolutions into buckets of similar topics. Our goal was to provide a short list of concise recommendations, and to accomplish that, we decided to rank the groups according to a relevance metric, select one representative resolution from each group, and provide the top-3 resolutions to the user.
How to select the most representative member of each group, and how to prioritize the groups, were subjects of intense experimentation and iteration between the engineering team and the Support SMEs. Finally, the resolution recommendations are pushed to the platform support agents use to troubleshoot cases. Within less than a minute of a ticket being opened, the platform is updated with these recommendations, well before an agent is assigned to the case for further processing.
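As a rough sketch of this last step, the snippet below ranks clusters by size and picks the member closest to each cluster centroid as its representative. Both choices are simplifying assumptions for illustration; the actual relevance metric and selection rule were tuned together with the Support SMEs.

```python
# Minimal sketch: turn resolution clusters into the top-3 recommendations.
# Assumptions: rank groups by size; representative = member closest to the centroid.
import numpy as np

def top_recommendations(groups, top_k=3):
    """groups: {label: [(resolution_text, embedding_vector), ...]}"""
    ranked = sorted(groups.values(), key=len, reverse=True)[:top_k]
    recommendations = []
    for members in ranked:
        vectors = np.asarray([vec for _, vec in members])
        centroid = vectors.mean(axis=0)
        # Pick the resolution whose embedding is most similar to the cluster centroid.
        best = int(np.argmax(vectors @ centroid))
        recommendations.append(members[best][0])
    return recommendations
```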
The journey has just begun. Stay tuned for Part 2!
#InfrastructureServices