Build RAG with watsonx.data Milvus and LangChain
Retrieval-Augmented Generation (RAG) is quickly becoming the go-to approach for building smarter, more reliable AI systems. Instead of relying only on what a language model already knows, RAG brings in external data at the time of the request—giving the model access to fresh, relevant information it might not have seen during training.
This documentation walks you through how to build a RAG system using LangChain and Milvus. By leveraging LangChain’s orchestration capabilities and Milvus’s high-performance vector search, we create a system capable of semantically retrieving documents and generating insightful answers in real time.
What is LangChain?
LangChain is an open-source framework designed to help developers build applications using large language models (LLMs). It offers a modular architecture that enables seamless integration between language models, prompt engineering, external data sources, and downstream tasks.

At its core, LangChain enables the following:
- Prompt Templates: Easily manage and reuse prompts with variables (see the sketch after this list)
- Language Model Support: Plug in models like Granite or Slate from watsonx.ai
- Data Source Integration: Pull in external data from APIs, databases, and vector stores like Milvus
- Chains: Create workflows that chain multiple steps together (e.g., retrieval + generation)
- Memory & Agents: Keep track of previous interactions or let the system decide what tool to use
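For instance, a prompt template is a reusable prompt with named variables. A minimal sketch using langchain_core (the template text here is only an example):

from langchain_core.prompts import PromptTemplate

# A reusable prompt with a {topic} variable
prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")

# Fill in the variable to produce the final prompt string
print(prompt.format(topic="vector databases"))
# -> Explain vector databases in one sentence.

Later in this tutorial, the same pattern is composed with a retriever and an LLM into a full RAG chain.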
Prerequisites
Step 1. Create Milvus Instance on watsonx.data
To provision an instance, refer to Getting Started with IBM watsonx.data Milvus.
Step 2. Set up a Watson Machine Learning service instance and API key
- Create a Watson Machine Learning (WML) service instance (the free Lite plan is sufficient).
- Generate an API key in WML and save it for use in this tutorial (as shown below).
- Associate the WML service with the project you created in watsonx.ai.
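To avoid hard-coding secrets, you can read the API key from an environment variable or prompt for it at run time. A minimal sketch (the WATSONX_APIKEY variable name is an example, not an IBM convention):

import os
from getpass import getpass

# Read the WML API key from the environment, or prompt for it interactively
api_key = os.environ.get("WATSONX_APIKEY") or getpass("Enter your WML API key: ")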
Step 3. Install the necessary libraries:
Before proceeding, ensure that your environment is set up with the necessary libraries. These are essential for loading, processing, and embedding documents. Run the following command to install all required dependencies:
> pip install --upgrade --quiet langchain langchain-core langchain-community langchain-text-splitters langchain-milvus langchain-ibm ibm-watsonx-ai bs4 unstructured
Note: In this notebook we used the following package versions to avoid compatibility issues. If you encounter discrepancies, pin your environment to the same versions:
numpy==2.2.4
pandas==2.2.3
ibm-watsonx-ai==1.3.8
python >= 3.10
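If you need to pin these versions explicitly, you can install them directly (a convenience command derived from the list above):

> pip install numpy==2.2.4 pandas==2.2.3 ibm-watsonx-ai==1.3.8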
Authentication Setup
from ibm_watsonx_ai import APIClient

# Set up watsonx API credentials
my_credentials = {
    "url": "<watsonx_url>",  # Replace with your service instance URL (watsonx URL)
    "apikey": "<watsonx_api_key>",  # Replace with your watsonx API key
}

# Initialize the watsonx client for embeddings
client = APIClient(my_credentials)
Embedding Configuration
IBM watsonx.ai offers several embedding models. Here we use SLATE_30M_ENGLISH_RTRVR with input truncation enabled.
from ibm_watsonx_ai.foundation_models.embeddings import Embeddings
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames as EmbedParams

model_id = client.foundation_models.EmbeddingModels.SLATE_30M_ENGLISH_RTRVR

# Define embedding parameters
embed_params = {
    EmbedParams.TRUNCATE_INPUT_TOKENS: 128,  # Adjust token truncation as needed
    EmbedParams.RETURN_OPTIONS: {"input_text": True},
}

# Set up the embedding model
embedding = Embeddings(
    model_id=model_id,
    credentials=my_credentials,
    params=embed_params,
    project_id="<project_id>",  # Replace with your project ID
    space_id=None,
    verify=False,
)
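As a quick sanity check, you can embed a single string and inspect the vector length (the slate-30m-english-rtrvr model returns 384-dimensional vectors):

# Embed one query string and check the vector dimensionality
sample_vector = embedding.embed_query("What is Milvus?")
print(len(sample_vector))  # expected: 384 for slate-30m-english-rtrvr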
Loading and Splitting Documents
We use LangChain's DirectoryLoader to read all .txt files from a directory and split them into chunks using the RecursiveCharacterTextSplitter.
Note: You can also use WebBaseLoader to load documents from web sources; a sketch follows the chunking step below.
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load all .txt files from a directory
loader = DirectoryLoader("/root/scripts", glob="*.txt")
documents = loader.load()

# Split the documents into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.split_documents(documents)

# Let's take a look at the first chunk
print(docs[0])

As we can see, the documents have been split into chunks, and the content in this example is about Milvus.
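As noted above, documents can also come straight from the web. A minimal sketch with LangChain's WebBaseLoader (the URL is only an example; it reuses the splitter defined above):

from langchain_community.document_loaders import WebBaseLoader

# Load a web page into LangChain documents (example URL)
web_loader = WebBaseLoader("https://milvus.io/docs/overview.md")
web_documents = web_loader.load()

# Chunk the web content with the same splitter
web_docs = splitter.split_documents(web_documents)
print(len(web_docs))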
Generate Embeddings
Each document chunk is converted into a dense, high-dimensional vector. (Milvus.from_documents in the next step calls the embedding model automatically, so the explicit call below mainly illustrates the API.)
embedding_vectors = embedding.embed_documents(texts=[doc.page_content for doc in docs])
Store Embeddings in Milvus
We initialize a Milvus vector store from the documents; this loads the chunks into Milvus and builds an index under the hood.
from langchain_milvus import Milvus

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=embedding,
    connection_args={
        "uri": "http://<host>:<port>",  # Replace with your watsonx.data Milvus URI or IP
        "user": "<user>",
        "password": "<password>",
        "secure": True,  # Set to True if TLS is enabled
        "server_pem_path": "<path of ca.cert of your Milvus instance>",
    },
    drop_old=True,
)
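The collection can also be extended after the initial load; like any LangChain vector store, the Milvus wrapper exposes add_documents, which embeds and inserts new chunks. A minimal sketch with a hypothetical extra document:

from langchain_core.documents import Document

# A hypothetical extra chunk to add to the existing collection
extra_doc = Document(page_content="Milvus collections contain entities with scalar and vector fields.")
vectorstore.add_documents([extra_doc])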
Perform a Search Query
query = "What is a collection in Milvus ?"
print(vectorstore.similarity_search(query, k=1))
This finds the top 1 document most semantically similar to the query.
[Document(metadata={'pk': 457499506939836870, 'source': '/root/Scripts/collection.txt'}, page_content='Collection Explained In Milvus, you can create multiple collections to manage your data, and insert your data as entities into the collections. Collection and entity are similar to tables and records in relational databases. This page helps you to learn about the collection and related concepts.\n\nCollection A collection is a two-dimensional table with fixed columns and variant rows. Each column represents a field, and each row represents an entity.\n\nThe following chart shows a collection with eight columns and six entities.\n\nCollection explained\n\nCollection explained')]
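If you also want to see how close each hit is, similarity_search_with_score returns (document, score) pairs; with the default distance metric, a lower score typically means a closer match:

# Retrieve matches together with their distance scores
for doc, score in vectorstore.similarity_search_with_score(query, k=2):
    print(f"{score:.4f}", doc.metadata.get("source"))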
Set up watsonx.ai Language Model
We use ibm/granite-3-3-8b-instruct to answer the user query:
from ibm_watsonx_ai.foundation_models import ModelInference
from langchain_ibm import WatsonxLLM

# Initialize model inference
model_inference = ModelInference(
    model_id="ibm/granite-3-3-8b-instruct",  # Use a supported model
    credentials=my_credentials,
    params={
        "max_new_tokens": 1024,
    },
    project_id="<project_id>",  # Replace with your project ID
)

# Wrap with LangChain's WatsonxLLM
llm = WatsonxLLM(watsonx_model=model_inference)
We wrap the Granite model for use in LangChain.
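Before wiring in the retriever, you can sanity-check the wrapped model on its own; like any LangChain LLM, it supports invoke with a plain prompt string:

# Quick standalone test of the wrapped model
print(llm.invoke("In one sentence, what is a vector database?"))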
Compose the Final RAG Chain
We define a prompt template and compose the whole pipeline as a LangChain Runnable. This chain turns the query into retrieved context and then a final answer.
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define the prompt template for generating AI responses
PROMPT_TEMPLATE = """
Human: You are an AI assistant that provides answers to questions using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>
<question>
{question}
</question>
The response should be specific and use statistics or numbers when possible.
Assistant:"""
# Create a PromptTemplate instance with the defined template and input variables
prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)
# Convert the vector store to a retriever
retriever = vectorstore.as_retriever()

# Concatenate retrieved chunks into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Compose the RAG chain: retrieve -> format -> prompt -> LLM -> string output
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
response = rag_chain.invoke(query)
print("\n Answer:", response)
Retrieved Response:

There you go! You have built a basic RAG chain powered by watsonx.data Milvus and LangChain.
References
- watsonx-ai-python-sdk
- LangChain | Milvus
- Choosing a foundation model in watsonx.ai | IBM watsonx
- Supported foundation models in watsonx.ai | IBM watsonx
- LangChain | IBM watsonx.ai
- Retrieval-augmented generation | IBM watsonx
- Converting text to text embeddings | IBM watsonx