watsonx.ai

Use watsonx.ai with LlamaIndex to build RAG applications

By Elena Lowery posted Tue May 28, 2024 12:38 PM

  

In last week’s post, I discussed LangChain, an orchestration framework for developing applications with Large Language Models (LLMs). In the fast-evolving domain of Generative AI, it’s common to have multiple frameworks offering similar capabilities. In this article, I will introduce another popular framework, LlamaIndex, and explain how it can be used to build LLM-driven applications.

Launched shortly after LangChain, LlamaIndex is positioned as a framework for building context-aware applications. In Generative AI terminology, this application pattern is known as Retrieval Augmented Generation (RAG). RAG is widely used in Generative AI applications because it enables LLMs to work with additional data that they were not originally trained on, such as internal company information.

A key component of the RAG pattern is a vector database. Vector databases store vector representations of various types of unstructured data, such as documents and web content. Figure 1 illustrates the RAG process, where search and retrieval is the first step. During this step, text passages relevant to the user query are fetched from the vector database. These passages are then attached to the prompt sent to the LLM.

Figure 1: RAG Workflow
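
To make this workflow concrete, here is a minimal sketch of the two steps in Python. The retrieve_passages and generate functions are hypothetical placeholders standing in for a vector database query and an LLM call; the point is how retrieved passages are attached to the prompt.

# A minimal sketch of the RAG workflow in Figure 1.
# retrieve_passages() and generate() are hypothetical placeholders
# for a vector database query and an LLM call, respectively.

def retrieve_passages(query: str, top_k: int = 3) -> list[str]:
    """Fetch the top_k passages most relevant to the query
    from a vector database (implementation not shown)."""
    ...

def generate(prompt: str) -> str:
    """Send a prompt to an LLM and return its completion
    (implementation not shown)."""
    ...

def answer(query: str) -> str:
    # Step 1: search and retrieval.
    passages = retrieve_passages(query)
    # Step 2: attach the retrieved passages to the prompt.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    # Step 3: generation.
    return generate(prompt)
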
While this workflow appears straightforward, the complexity lies in the accurate retrieval of passages. Since we’re working with natural language queries instead of SQL, the accuracy of data retrieval can vary. For instance, if we want to build an HR AI assistant, we would load various HR policies into a vector database in formats such as Word documents, PDFs, or web pages. Example queries to the vector database might include:
  • I have worked at IBM for 3 years, how many vacation days do I have?
  • I need to add a dependent to my insurance. What steps should I follow?
  • What childcare benefits does IBM provide?

This type of search is called semantic search, which is based on meaning rather than keyword matching. To enable semantic search, we need to convert our input data into vectors: series of numbers stored in the vector database. Once unstructured data is represented numerically, the vector database can find the “closest match” by comparing distances between vectors. In natural language processing (NLP), converting unstructured data to vectors is called “creating an embedding,” and in database terminology, it is referred to as “indexing.”
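
To illustrate what “closest match” means, here is a toy sketch of semantic search using cosine similarity. The three-dimensional vectors below are invented for readability; real embedding models produce vectors with hundreds of dimensions.

import numpy as np

# Each document is represented by an embedding vector. These
# 3-dimensional values are made up; real embeddings are much longer.
documents = {
    "vacation policy": np.array([0.9, 0.1, 0.2]),
    "insurance enrollment": np.array([0.1, 0.8, 0.3]),
    "childcare benefits": np.array([0.2, 0.3, 0.9]),
}

def cosine_similarity(a, b):
    # Cosine similarity: values near 1.0 mean the vectors point
    # in nearly the same direction, i.e., similar meaning.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend this vector is the embedding of the query
# "How many vacation days do I have?"
query_vector = np.array([0.85, 0.15, 0.25])

# The "closest match" is the document whose vector is most similar.
best = max(documents, key=lambda name: cosine_similarity(query_vector, documents[name]))
print(best)  # vacation policy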

From this look at how RAG is implemented, it is easy to see that the original purpose of the LlamaIndex framework was to streamline the RAG process. LlamaIndex provides a comprehensive set of APIs for all steps in the RAG pattern. It integrates seamlessly with multiple vector databases and embedding models, which are essential for creating vectors. This flexibility is a key reason for its wide adoption.
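
To show how much of the RAG pattern LlamaIndex handles, here is a minimal sketch of the full pipeline using its core APIs. The ./hr_policies folder of HR documents is an assumption, and unless other models are configured, LlamaIndex defaults to OpenAI models for embeddings and generation (a watsonx.ai configuration is sketched at the end of this article).

# A minimal LlamaIndex RAG pipeline (llama-index 0.10+ import paths).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load and parse documents (Word, PDF, etc.) from a local folder.
documents = SimpleDirectoryReader("./hr_policies").load_data()

# Create embeddings and build a vector index ("indexing").
index = VectorStoreIndex.from_documents(documents)

# The query engine retrieves relevant passages and attaches them
# to the prompt sent to the LLM.
query_engine = index.as_query_engine()

response = query_engine.query(
    "I have worked at IBM for 3 years, how many vacation days do I have?"
)
print(response)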

LLMs are foundational to AI-driven applications, and frameworks like LangChain and LlamaIndex offer both an architectural blueprint and the utilities necessary to complete an application. While it is possible to build LLM-driven applications without such frameworks, using established open source APIs tends to produce well-structured applications that are easier to maintain.

Examining the APIs provided by LangChain and LlamaIndex reveals some overlap, particularly because LangChain provides support for RAG. Many developers have shared their opinions on this subject, with a general consensus that LlamaIndex excels as a framework for building RAG applications and can be used in conjunction with LangChain.

watsonx.ai supports integration with both LlamaIndex and LangChain. For code samples, refer to the LlamaIndex product documentation.
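
For a sense of what the integration looks like, the sketch below configures LlamaIndex to use watsonx.ai models through the llama-index-llms-ibm and llama-index-embeddings-ibm packages. The model IDs, region URL, and credentials are placeholders; verify the exact parameters against the product documentation.

# Sketch: wiring watsonx.ai models into LlamaIndex.
# pip install llama-index-llms-ibm llama-index-embeddings-ibm
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.ibm import WatsonxLLM
from llama_index.embeddings.ibm import WatsonxEmbeddings

# Placeholder model IDs, region URL, and credentials; credentials can
# also be supplied through environment variables.
Settings.llm = WatsonxLLM(
    model_id="ibm/granite-13b-instruct-v2",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="YOUR_PROJECT_ID",
    apikey="YOUR_IBM_CLOUD_API_KEY",
)
Settings.embed_model = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="YOUR_PROJECT_ID",
    apikey="YOUR_IBM_CLOUD_API_KEY",
)

# With watsonx.ai set as the default LLM and embedding model, the same
# RAG pipeline shown earlier now runs on watsonx.ai.
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./hr_policies").load_data()
)
print(index.as_query_engine().query("What childcare benefits does IBM provide?"))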


#watsonx.ai
