An effective way to harness the power of generative AI to produce relevant content is to ground it with specific, curated content. Retrieval-augmented generation (RAG) turns an LLM from a creative text generator into a truly useful assistant. Learn how the AutoAI tool in IBM watsonx.ai makes creating a RAG solution for your use case fast and efficient with an automated, no-code approach.
Interaction with a RAG application follows this pattern:
- A user submits a question to a generative AI app.
- A search component retrieves relevant context from a set of grounding documents.
- The large language model generates an answer that incorporates the retrieved information.
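The three-step pattern above can be sketched in a few lines of code. This is a minimal, illustrative skeleton: the keyword-overlap retriever and the `generate()` stub are stand-ins for a real vector search and a hosted LLM, not the watsonx.ai implementation.

```python
# Minimal sketch of the retrieve-then-generate pattern.
# retrieve() and generate() are illustrative placeholders only.

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank grounding documents by naive keyword overlap with the question."""
    q_terms = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; a real app would call a hosted model here."""
    return f"Answer based on:\n{prompt}"

def rag_answer(question: str, documents: list[str]) -> str:
    # Step 2: retrieve context; step 3: generate a grounded answer.
    context = "\n".join(retrieve(question, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

docs = [
    "Employees accrue 20 vacation days per year.",
    "The office cafeteria opens at 8 a.m.",
    "Vacation days roll over for one year only.",
]
print(rag_answer("How many vacation days do employees get per year?", docs))
```

In production, the retriever would query a vector store and `generate()` would call the selected LLM, but the control flow stays the same.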
The applications for RAG solutions are too numerous to list. Some representative examples include:
- An HR chat app grounded in the organization’s policy documents provides relevant answers to employee questions.
- A financial services firm generates shareholder reports based on recent documents showing market trends.
- A medical app recommends a treatment plan based on a curated set of medical literature.
Automating the configuration of a RAG search
The utility of RAG solutions is clear, but, as with so much in data science, the devil is in the execution. Finding the optimal RAG pattern can be a labor-intensive, time-consuming job of testing combinations of configuration options, embedding models, and LLMs. Automating the process with IBM’s AutoAI tool dramatically accelerates the time to value.
RAG comes with many configuration parameters, including which large language model to choose, how to chunk the grounding documents, and how many documents to retrieve. Configuration choices that work well for another use case might not be the best choice for your data. To create the best possible RAG pattern for your dataset, you might explore all the possible combinations of RAG configuration options to find, evaluate, and deploy the best solution. Just as you can use AutoAI to rapidly train and optimize machine learning models, you can use AutoAI capabilities to automate the search for the optimal RAG solution based on your data and use case. Accelerating the experimentation can dramatically reduce the time to production.
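Document chunking, mentioned above, is one such configuration choice. The sketch below shows fixed-size chunking with overlap; the chunk size and overlap values are arbitrary examples for illustration, not AutoAI defaults.

```python
# Illustrative fixed-size chunking with overlap. The size/overlap
# values are example settings, not AutoAI defaults.

def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "Retrieval-augmented generation grounds a language model in your own documents."
for c in chunk_text(doc):
    print(repr(c))
```

Smaller chunks give more precise retrieval hits; larger chunks preserve more context per hit. That trade-off is exactly the kind of parameter the automated search explores for you.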
Key features of the AutoAI approach include:
- Full exploration and evaluation of a constrained set of configuration options.
- Rapid reevaluation and modification of the configuration when something changes. For example, you can easily rerun the training process when a new model becomes available or when evaluation results signal a change in the quality of responses.
- Embedding models that turn the document collection into indexed vectors for retrieval.
- LLMs curated through watsonx.ai for your RAG tasks.
- Choice of vector store for the document index:
  - A default, in-memory Chroma database that holds the index for the duration of the experiment. Because the index does not persist, this option is best suited for prototyping and demos.
  - A Milvus database that persists the document index, which is the better choice for production work.
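Conceptually, a vector store holds one vector per document and answers queries by similarity. The toy index below illustrates that embed-then-retrieve flow with a bag-of-words stand-in for a real embedding model; Chroma or Milvus would replace the plain dictionary used here.

```python
# Toy in-memory vector index. embed() is a stand-in for a real
# embedding model; a plain dict stands in for Chroma/Milvus.
import math
from collections import Counter

def embed(text: str) -> dict:
    """Bag-of-words 'embedding' used only for illustration."""
    return dict(Counter(text.lower().split()))

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Build the index: one vector per document, held in memory.
docs = {
    "policy": "remote work is allowed two days per week",
    "travel": "travel expenses require manager approval",
}
index = {doc_id: embed(text) for doc_id, text in docs.items()}

def query(question: str) -> str:
    """Return the id of the document vector closest to the question."""
    qv = embed(question)
    return max(index, key=lambda doc_id: cosine(qv, index[doc_id]))

print(query("how many days of remote work are allowed"))
```

Because this index lives only in process memory, it disappears when the program exits, which is exactly why the in-memory option suits demos while Milvus suits production.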
How does the process work?
To create a RAG experiment with AutoAI, you follow these steps:
- Prepare, then upload a collection of documents of up to 20 files or folders of files in the following formats: HTML, PDF, MD, DOCX, or TXT. These documents provide the grounding content that the model retrieves from when generating answers.
- Prepare, then upload a JSON file with a set of test questions and answers; a template is provided for you. These question-and-answer pairs serve as benchmark data for evaluating candidate RAG patterns. For example, the sample in the AutoAI tool uses a set of test questions to evaluate RAG patterns grounded in the AutoAI SDK documentation.
- Choose vector storage. When the experiment runs, the documents are embedded and turned into vectors to make the content retrievable. You can use either a default, in-memory vector store that is suitable for prototyping or demo purposes, or set up a more durable solution with an external Milvus vector store.
- Run the experiment to create the RAG pipelines by using the default settings. Note: The AutoAI approach also provides a number of configuration options that let you exercise control over your RAG solution.
- View the resulting pipelines to find and save the pattern that performed best for the optimized metric.
- View the details for a pipeline to see how it performed across all metrics.
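As a sketch of the benchmark file in step 2, the snippet below builds one question-and-answer entry and serializes it as JSON. The exact schema comes from the template that AutoAI provides; the field names and file name here are illustrative assumptions, not the official format.

```python
# Hypothetical benchmark-file contents; field names are illustrative
# assumptions, not the official AutoAI template schema.
import json

benchmark = [
    {
        "question": "How many vacation days do employees accrue per year?",
        "correct_answer": "Employees accrue 20 vacation days per year.",
        "correct_answer_document_ids": ["hr_policy.pdf"],
    }
]
print(json.dumps(benchmark, indent=2))
```

Each entry pairs a question with a reference answer (and the document that contains it), so the experiment can score how faithfully each candidate pipeline answers from the grounding content.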
Behind the scenes: the AutoAI RAG optimization process
Rather than exhaustively testing every RAG configuration option (a grid search), AutoAI RAG uses a hyperparameter optimization algorithm. The following diagram shows a subset of the RAG configuration search space with 16 RAG patterns to choose from. If the experiment evaluated them all, they would be ranked 1 to 16, with the three highest-ranking configurations tagged as best performing. Instead, the optimization algorithm determines which subset of the RAG patterns to evaluate and skips the others, which are shown in gray. This approach avoids exploring an exponential search space while still selecting well-performing RAG patterns in practice.
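The idea of sampling a subset instead of sweeping the full grid can be illustrated with a toy example. Here a 16-configuration grid (matching the diagram) is searched by random sampling; the scoring function is a made-up stand-in for actually building and benchmarking a pipeline, and the real AutoAI algorithm is more sophisticated than plain random search.

```python
# Sketch of sampling a search space instead of grid-searching it.
# evaluate() is a toy stand-in for building and benchmarking a
# full RAG pipeline; AutoAI's actual algorithm is more sophisticated.
import itertools
import random

chunk_sizes = [256, 512]
overlaps = [64, 128]
top_ks = [3, 5]
models = ["model-a", "model-b"]

grid = list(itertools.product(chunk_sizes, overlaps, top_ks, models))
assert len(grid) == 16  # the full search space from the diagram

def evaluate(config) -> float:
    """Toy quality score for a (chunk, overlap, k, model) configuration."""
    chunk, overlap, k, model = config
    return chunk / 512 + overlap / 128 + k / 5 + (1.0 if model == "model-b" else 0.5)

random.seed(0)
sampled = random.sample(grid, 6)      # evaluate only a subset of patterns
best = max(sampled, key=evaluate)     # keep the best pattern found
print(best, round(evaluate(best), 2))
```

Evaluating 6 of 16 patterns instead of all 16 is a modest saving here, but the gap grows exponentially as more configuration dimensions are added, which is where guided search pays off.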
Deploying and using a RAG pattern
When you are satisfied with a RAG pattern, you can save and deploy it so that you can submit new questions to an inferencing endpoint. For the fast-path approach, save the pattern as an auto-generated notebook in the project, reusing the indexed documents from the experiment.
If you create an experiment with a Milvus vector store, you can save a pattern notebook for refreshing the document collection index, or a notebook for retrieval and generation. You can also save and deploy the pattern as an AI service, which creates an online deployment with an endpoint that you can use for inferencing against the RAG pattern.
Summary: Reaping the rewards of automating RAG solutions
With AutoAI, you can dramatically accelerate the process of finding the best RAG pattern for your use case. Use AutoAI to rapidly prototype or demonstrate solutions for proofs of concept, then build a production-ready solution. Review the experiment code in notebooks for full transparency while enjoying the ease of an automated approach. Put your RAG patterns to productive use in chat apps, or extend their capabilities with complementary technologies such as text extraction. For more information about working with RAG patterns in AutoAI, see Automating a RAG pattern with AutoAI in the IBM watsonx.ai documentation.
To get started with watsonx.ai, visit the product page and start a free trial, then visit the Developer Hub to leverage quickstarts and examples designed to help you jumpstart your development of AI-based solutions.
#watsonx.ai #AutoAI #GenerativeAI