As reliance on AI and Large Language Models (LLMs) continues to increase, it’s more important than ever to ensure responses are accurate and up to date. Retrieval Augmented Generation (RAG) is the premier prompting pattern for addressing this concern. RAG is a framework that lets LLMs use external information to ground their answers. With the new on-premises release of IBM Cloud Pak for Data (CPD), you can implement RAG directly in IBM watsonx.ai Prompt Lab.
Milvus is a free, open-source, high-performance vector database well suited to be the “Retriever” of RAG. Milvus facilitates the encoding and searching of relevant context so that your LLM can generate accurate responses. You can use PDFs, PowerPoint presentations, images, and more to give your AI model context for the questions you want answered.
If you’re also an administrator of an IBM watsonx.data instance, you can deploy Milvus within the lakehouse to take advantage of the many permissions and storage integration features. Hosting Milvus on watsonx.data gives you the ability to specify granular access controls for existing platform users and groups directly from the UI.
This blog will cover the basics of using Milvus in IBM watsonx.ai Prompt Lab for RAG and highlight unique advantages that come with hosting Milvus on watsonx.data.
Setting up Milvus
There are many ways to deploy Milvus. If deploying on watsonx.data, all you have to do is specify the storage size and location. Refer to the official documentation for options.
When hosting Milvus on watsonx.data, you can use the pre-defined access roles or set your own granular permissions using access policies.
Steps to Utilise Milvus in Prompt Lab
Integrate your Milvus service with IBM watsonx.ai by creating a project and a Milvus connection. In watsonx.data, the connection details can be copied from the UI. Be sure to input the CA certificate in the form’s certificate field. You can always test the connection before creating it to ensure your credentials are correct.
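The same connection details can also be sanity-checked from Python. The following is a minimal sketch, assuming the pymilvus client is installed wherever you run it. The helper names and all values passed to them are hypothetical placeholders. The keyword names (host, port, user, password, secure, server_pem_path) follow pymilvus's connections.connect.

```python
import os


def build_milvus_connect_kwargs(host, port, user, password, ca_cert_path):
    """Assemble keyword arguments for pymilvus.connections.connect over TLS.

    Hypothetical helper: every value is a placeholder for the connection
    details you copy from the watsonx.data UI.
    """
    # Fail early if the CA certificate file is missing, since a TLS
    # connection cannot be verified without it.
    if not os.path.isfile(ca_cert_path):
        raise FileNotFoundError(f"CA certificate not found: {ca_cert_path}")
    return {
        "alias": "default",
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "secure": True,                  # use TLS
        "server_pem_path": ca_cert_path,  # CA certificate used to verify the server
    }


def connect_to_milvus(connect_kwargs):
    """Open the connection. Requires pymilvus (assumed to be installed)."""
    # Imported lazily so the helper above works without pymilvus.
    from pymilvus import connections

    connections.connect(**connect_kwargs)
```

Keeping the argument assembly separate from the connect call makes it easy to check the certificate path and credentials before anything goes over the network.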
Create a new Prompt Lab session. Here, you can select which LLM you want to converse with, tweak parameters, and ground your chat.
Most LLMs are able to use text to ground their responses. If your model can handle additional file types like images and PowerPoint presentations, watsonx.ai facilitates using those for RAG as well.
Start by creating a new vector index using your Milvus connection. Vector indexes allow you to perform similarity searches on your uploaded documents.
Next, you can choose to use an existing collection or create a new one. Collections store the vector representations of your documents, which Milvus uses for similarity searching. Collections created outside of Prompt Lab (like in a notebook) can also be used to ground your chats.
When your documents are uploaded, a background job starts to encode them into vectors. To verify that the job is complete, navigate to the “Jobs” tab in your project. Ensure that the build job finishes successfully before sending a prompt to your LLM.
You now have all the tools to ground AI chats with documents of your choosing. When you send a prompt, Milvus retrieves relevant context, that context is added to your prompt, and the LLM responds using it.
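The retrieve, augment, and generate flow can be sketched in a few lines of Python. This is an illustration of the general RAG pattern, not how Prompt Lab is implemented internally. The function names, the prompt wording, and the two stand-in callables are all assumptions made for the example.

```python
from typing import Callable, List


def build_grounded_prompt(question: str, passages: List[str]) -> str:
    """Augment the user's question with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run the three RAG steps in order."""
    passages = retrieve(question)                       # 1. retrieve context
    prompt = build_grounded_prompt(question, passages)  # 2. augment the prompt
    return generate(prompt)                             # 3. generate a response


# Stand-ins for the vector search and the LLM call (both hypothetical).
def fake_retrieve(question: str) -> List[str]:
    return ["Example passage from an uploaded document."]


def fake_generate(prompt: str) -> str:
    return f"(model reply to a {len(prompt)}-character prompt)"


print(answer_with_rag("What does the document say?", fake_retrieve, fake_generate))
```

Passing the retriever and the model in as callables keeps the flow itself independent of any particular vector database or LLM.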
Example Use Case
RAG enhances the user’s ability to search for information. For instance, you can summarize parts of contracts or articles using RAG. This is perfect when your source of knowledge is constantly being updated (like terms of service). When information changes, you can simply swap out the document instead of retraining or re-tuning your whole model.
Suppose you’re looking for something in a company’s terms of service. Instead of using basic keyword search that naively matches any instance of a phrase, Milvus searches for semantically similar sections and finds the most relevant information to answer your question.
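To make the difference concrete, here is a toy comparison of exact-phrase matching and vector similarity. The passages are invented for illustration, and the three-dimensional vectors are made up by hand. A real deployment would get its vectors from an embedding model and let Milvus do the ranking.

```python
import math


def keyword_search(query, passages):
    """Return passages that contain the exact query phrase (case-insensitive)."""
    return [p for p in passages if query.lower() in p.lower()]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, passages, passage_vecs, top_k=1):
    """Rank passages by cosine similarity to the query vector."""
    ranked = sorted(
        zip(passages, passage_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [passage for passage, _ in ranked[:top_k]]


# Invented passages and hand-made vectors, for illustration only.
passages = [
    "You may terminate your paid plan at any time.",
    "We may update the list of supported devices.",
]
passage_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]

query = "cancel my subscription"
query_vec = [0.8, 0.2, 0.1]

print(keyword_search(query, passages))                    # no exact phrase match
print(semantic_search(query_vec, passages, passage_vecs))  # closest passage by meaning
```

The keyword search returns nothing because neither passage contains the phrase, while the similarity search still surfaces the passage about ending a paid plan.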
Below is an example of using RAG to learn about the legal considerations in the Spotify terms of use.
This is not immediately helpful to the user. Terms of use are often hard to parse without knowledge of legalese or the surrounding context.
The un-grounded LLM references an outdated Swedish terms of service contract. Without specific reference to which “terms of service” contract the user asked about, the LLM had to make assumptions which led to minor hallucinations. Since I’m in Canada, the Canadian terms of use would be the correct version to refer to.
With Milvus augmenting the prompt, the LLM does a much better job of outlining the legal considerations described in the contract the user provided. This response is clearer and draws on the grounding text in its answer.
Going Beyond
This blog only scratches the surface of the many integration features between watsonx.ai and watsonx.data. You can ground a Prompt Lab session with several documents at once, allowing you to reference multiple sources of knowledge all in one place. If you just want to use the similarity search Milvus offers, you can send queries into the vector index directly in your project. You can also access Milvus elsewhere in CPD using the Milvus connector, like in Python notebooks.
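As a rough sketch of what a notebook query might look like, the helper below runs a similarity search against an already-connected pymilvus Collection. The field names ("vector", "text"), the collection name in the usage comment, and the default metric are assumptions. The metric in particular must match the one the collection's index was built with.

```python
def search_collection(collection, query_vector, vector_field="vector",
                      text_field="text", metric_type="L2", limit=3):
    """Return (distance, text) pairs for the passages closest to query_vector.

    `collection` is expected to behave like a pymilvus Collection.
    Field names and metric are assumptions; adjust them to your schema.
    """
    results = collection.search(
        data=[query_vector],        # one query vector
        anns_field=vector_field,    # field holding the embeddings
        param={"metric_type": metric_type, "params": {}},
        limit=limit,
        output_fields=[text_field],  # return the stored text with each hit
    )
    # results holds one list of hits per query vector; we sent exactly one.
    return [(hit.distance, hit.entity.get(text_field)) for hit in results[0]]


# Usage inside a notebook (requires pymilvus and an open connection):
# from pymilvus import Collection
# collection = Collection("my_collection")  # hypothetical collection name
# collection.load()
# print(search_collection(collection, query_vector))
```

The query vector has to come from the same embedding model that was used to build the collection, otherwise the distances are not meaningful.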
Summary
RAG is a powerful framework that allows users to improve both the accuracy and relevance of their LLM’s responses. By using Milvus alongside watsonx.ai, users can ground their AI chats with up-to-date knowledge and get relevant answers to their queries. And by deploying Milvus on watsonx.data, administrators can enjoy granular access control for existing users without having to configure it programmatically.