Build RAG with watsonx.data Milvus and LangChain
Retrieval-Augmented Generation (RAG) is quickly becoming the go-to approach for building smarter, more reliable AI systems. Instead of relying only on what a language model already knows, RAG brings in external data at the time of the request—giving the model access to fresh, relevant information it might not have seen during training.
This documentation walks you through how to build a RAG system using LangChain and Milvus. By leveraging LangChain’s orchestration capabilities and Milvus’s high-performance vector search, we create a system capable of semantically retrieving documents and generating insightful answers in real time.
What is LangChain?
LangChain is an open-source framework designed to help developers build applications using large language models (LLMs). It offers a modular architecture that enables seamless integration between language models, prompt engineering, external data sources, and downstream tasks.

At its core, LangChain enables the following:
- Prompt Templates: Easily manage and reuse prompts with variables (see the sketch after this list)
- Language Model Support: Plug in models like Granite or Slate from watsonx.ai
- Data Source Integration: Pull in external data from APIs, databases, and vector stores like Milvus
- Chains: Create workflows that chain multiple steps together (e.g., retrieval + generation)
- Memory & Agents: Keep track of previous interactions or let the system decide what tool to use
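For instance, a prompt template is a reusable prompt with named variables. A minimal sketch using langchain_core (the template text here is only an example):

from langchain_core.prompts import PromptTemplate

# A reusable prompt with a {topic} variable
prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")

# Fill in the variable to produce the final prompt string
print(prompt.format(topic="vector databases"))
# -> Explain vector databases in one sentence.

Later in this tutorial, the same pattern is composed with a retriever and an LLM into a full RAG chain.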
Prerequisites
Step 1. Create Milvus Instance on watsonx.data
To provision an instance, refer to Getting Started with IBM watsonx.data Milvus.
Step 2. Set up a Watson Machine Learning service instance and API key
- Create a Watson Machine Learning (WML) service instance (the free Lite plan is sufficient).
- Generate an API key in WML and save it for use in this tutorial (as shown below).
- Associate the WML service with the project you created in watsonx.ai.
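To avoid hard-coding secrets, you can read the API key from an environment variable or prompt for it at run time. A minimal sketch (the WATSONX_APIKEY variable name is an example, not an IBM convention):

import os
from getpass import getpass

# Read the WML API key from the environment, or prompt for it interactively
api_key = os.environ.get("WATSONX_APIKEY") or getpass("Enter your WML API key: ")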
Step 3. Install the necessary libraries:
Before proceeding, ensure that your environment is set up with the necessary libraries. These are essential for loading, processing, and embedding documents. Run the following command to install all required dependencies:
> pip install --upgrade --quiet langchain langchain-core langchain-community langchain-text-splitters langchain-milvus langchain-ibm ibm-watsonx-ai bs4 unstructured
Note: In this notebook we used the following package versions to avoid compatibility issues. If you encounter discrepancies, pin your environment to the same versions:
numpy==2.2.4
pandas==2.2.3
ibm-watsonx-ai==1.3.8
python >= 3.10
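If you need to pin these versions explicitly, you can install them directly (a convenience command derived from the list above):

> pip install numpy==2.2.4 pandas==2.2.3 ibm-watsonx-ai==1.3.8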
Authentication Setup
from ibm_watsonx_ai import APIClient

# Set up watsonx API credentials
my_credentials = {
    "url": "<watsonx_url>",  # Replace with your service instance URL (watsonx URL)
    "apikey": "<watsonx_api_key>",  # Replace with your watsonx API key
}

# Initialize the watsonx client for embeddings
client = APIClient(my_credentials)
Embedding Configuration
IBM watsonx.ai offers several embedding models. Here we use SLATE_30M_ENGLISH_RTRVR with input truncation enabled.
from ibm_watsonx_ai.foundation_models.embeddings import Embeddings
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames as EmbedParams

model_id = client.foundation_models.EmbeddingModels.SLATE_30M_ENGLISH_RTRVR

# Define embedding parameters
embed_params = {
    EmbedParams.TRUNCATE_INPUT_TOKENS: 128,  # Adjust token truncation as needed
    EmbedParams.RETURN_OPTIONS: {"input_text": True},
}

# Set up the embedding model
embedding = Embeddings(
    model_id=model_id,
    credentials=my_credentials,
    params=embed_params,
    project_id="<project_id>",  # Replace with your project ID
    space_id=None,
    verify=False,
)
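As a quick sanity check, you can embed a single string and inspect the vector length (the slate-30m-english-rtrvr model returns 384-dimensional vectors):

# Embed one query string and check the vector dimensionality
sample_vector = embedding.embed_query("What is Milvus?")
print(len(sample_vector))  # expected: 384 for slate-30m-english-rtrvr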
Loading and Splitting Documents
We use LangChain's DirectoryLoader to read all .txt files from a directory and split them into chunks using the RecursiveCharacterTextSplitter.
Note: You can also use WebBaseLoader to load documents from web sources; a sketch follows the chunking step below.
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load all .txt files from a directory
loader = DirectoryLoader("/root/scripts", glob="*.txt")
documents = loader.load()

# Split the documents into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.split_documents(documents)

# Let's take a look at the first chunk
print(docs[0])

As we can see, the documents have been split into chunks, and the content in this example is about Milvus.
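As noted above, documents can also come straight from the web. A minimal sketch with LangChain's WebBaseLoader (the URL is only an example; it reuses the splitter defined above):

from langchain_community.document_loaders import WebBaseLoader

# Load a web page into LangChain documents (example URL)
web_loader = WebBaseLoader("https://milvus.io/docs/overview.md")
web_documents = web_loader.load()

# Chunk the web content with the same splitter
web_docs = splitter.split_documents(web_documents)
print(len(web_docs))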
Generate Embeddings
Each document chunk is converted into a dense, high-dimensional vector. (Milvus.from_documents in the next step calls the embedding model automatically, so the explicit call below mainly illustrates the API.)
embedding_vectors = embedding.embed_documents(texts=[doc.page_content for doc in docs])
Store Embeddings in Milvus
We initialize a Milvus vector store from the documents; this loads the chunks into Milvus and builds an index under the hood.
from langchain_milvus import Milvus

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=embedding,
    connection_args={
        "uri": "http://<host>:<port>",  # Replace with your watsonx.data Milvus URI or IP
        "user": "<user>",
        "password": "<password>",
        "secure": True,  # Set to True if TLS is enabled
        "server_pem_path": "<path of ca.cert of your Milvus instance>",
    },
    drop_old=True,
)
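The collection can also be extended after the initial load; like any LangChain vector store, the Milvus wrapper exposes add_documents, which embeds and inserts new chunks. A minimal sketch with a hypothetical extra document:

from langchain_core.documents import Document

# A hypothetical extra chunk to add to the existing collection
extra_doc = Document(page_content="Milvus collections contain entities with scalar and vector fields.")
vectorstore.add_documents([extra_doc])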
Perform a Search Query
query = "What is a collection in Milvus ?"
print(vectorstore.similarity_search(query, k=1))
This finds the top 1 document most semantically similar to the query.
[Document(metadata={'pk': 457499506939836870, 'source': '/root/Scripts/collection.txt'}, page_content='Collection Explained In Milvus, you can create multiple collections to manage your data, and insert your data as entities into the collections. Collection and entity are similar to tables and records in relational databases. This page helps you to learn about the collection and related concepts.\n\nCollection A collection is a two-dimensional table with fixed columns and variant rows. Each column represents a field, and each row represents an entity.\n\nThe following chart shows a collection with eight columns and six entities.\n\nCollection explained\n\nCollection explained')]
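If you also want to see how close each hit is, similarity_search_with_score returns (document, score) pairs; with the default distance metric, a lower score typically means a closer match:

# Retrieve matches together with their distance scores
for doc, score in vectorstore.similarity_search_with_score(query, k=2):
    print(f"{score:.4f}", doc.metadata.get("source"))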
Set up watsonx.ai Language Model
We use ibm/granite-3-3-8b-instruct to answer the user query:
from ibm_watsonx_ai.foundation_models import ModelInference
from langchain_ibm import WatsonxLLM

# Initialize model inference
model_inference = ModelInference(
    model_id="ibm/granite-3-3-8b-instruct",  # Use a supported model
    credentials=my_credentials,
    params={
        "max_new_tokens": 1024,
    },
    project_id="<project_id>",  # Replace with your project ID
)

# Wrap with LangChain's WatsonxLLM
llm = WatsonxLLM(watsonx_model=model_inference)
We wrap the Granite model for use in LangChain.
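Before wiring in the retriever, you can sanity-check the wrapped model on its own; like any LangChain LLM, it supports invoke with a plain prompt string:

# Quick standalone test of the wrapped model
print(llm.invoke("In one sentence, what is a vector database?"))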
Compose the Final RAG Chain
We define a prompt template and compose the whole pipeline as a LangChain Runnable. This chain turns the query into retrieved context and then a final answer.
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define the prompt template for generating AI responses
PROMPT_TEMPLATE = """
Human: You are an AI assistant that provides answers to questions using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>
<question>
{question}
</question>
The response should be specific and use statistics or numbers when possible.
Assistant:"""
# Create a PromptTemplate instance with the defined template and input variables
prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)
# Convert the vector store to a retriever
retriever = vectorstore.as_retriever()

# Concatenate retrieved chunks into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Compose the RAG chain: retrieve -> format -> prompt -> LLM -> string output
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
response = rag_chain.invoke(query)
print("\n Answer:", response)
Retrieved Response:

There you go! You have built a basic RAG chain powered by watsonx.data Milvus and LangChain.
References
- watsonx-ai-python-sdk
- LangChain | Milvus
- Choosing a foundation model in watsonx.ai | IBM watsonx
- Supported foundation models in watsonx.ai | IBM watsonx
- LangChain | IBM watsonx.ai
- Retrieval-augmented generation | IBM watsonx
- Converting text to text embeddings | IBM watsonx