Observability for Multi-Agent RAG Workflows with Instana

By Divya Pathak posted 2 days ago

  

As AI evolves, so does the way it understands and responds to our questions. A popular technique powering many advanced AI systems today is RAG — Retrieval-Augmented Generation. RAG works in three steps: 

  1. Retrieval — fetch relevant documents from a vector database. 

  2. Augmentation — insert those documents into the prompt as extra context. 

  3. Generation — let a large language model (LLM) craft the final answer. 
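
In code, the three steps collapse to a few lines. The snippet below is a minimal sketch only: embed(), vector_store, and llm are hypothetical placeholders for whatever embedding model, vector database, and LLM client you use, not code from this article's repository.

    # Minimal RAG sketch; embed(), vector_store, and llm are hypothetical placeholders.
    def answer_with_rag(question: str, k: int = 3) -> str:
        # 1. Retrieval: embed the question and fetch the k most similar documents.
        query_vector = embed(question)
        documents = vector_store.search(query_vector, top_k=k)

        # 2. Augmentation: insert the retrieved text into the prompt as extra context.
        context = "\n\n".join(doc.text for doc in documents)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

        # 3. Generation: let the LLM craft the final answer from the augmented prompt.
        return llm.generate(prompt)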

But as tasks get more complex — like researching, planning, or writing multi-step reports — traditional RAG starts to fall short. That’s where Agentic RAG comes in. It builds on the basic RAG framework by introducing intelligent agents that can plan, decide, and collaborate. As shown in Figure 1, the key component here is the Retrieval Router Agent, which acts as a smart coordinator. Based on the input query, it decides which specialized retriever agent (X, Y, or Z) is best suited for the task — or even assigns the task to multiple retrievers in parallel. By breaking down responsibilities and routing tasks intelligently, Agentic RAG makes complex reasoning and multi-step generation much more scalable and effective.  
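
To make the router's job concrete, here is a deliberately simplified sketch of the routing idea. The retriever names and keyword rules are hypothetical; in a system like the one in Figure 1, an LLM (or a classifier) would typically make the routing decision instead of hard-coded rules.

    # Hypothetical router: pick one or more specialized retrievers for a query.
    def retrieve_x(q): return [f"doc-from-X for: {q}"]   # e.g. product documentation
    def retrieve_y(q): return [f"doc-from-Y for: {q}"]   # e.g. recent web/news content
    def retrieve_z(q): return [f"doc-from-Z for: {q}"]   # e.g. internal tickets

    RETRIEVERS = {"X": retrieve_x, "Y": retrieve_y, "Z": retrieve_z}
    KEYWORDS = {"X": ("api", "install"), "Y": ("news", "latest"), "Z": ("error", "ticket")}

    def route(query: str) -> list[str]:
        # In practice an LLM scores each retriever; a trivial keyword rule stands in here.
        chosen = [name for name, kws in KEYWORDS.items()
                  if any(kw in query.lower() for kw in kws)]
        return chosen or ["X"]  # fall back to a default retriever

    def retrieve(query: str) -> list[str]:
        # Fan out to every selected retriever (possibly in parallel) and merge results.
        results = []
        for name in route(query):
            results.extend(RETRIEVERS[name](query))
        return results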

Figure 1: An Overview of Multi-Agent Agentic RAG Systems 

 

As shown in Figure 1, a Retrieval Router delegates queries to specialized retrievers (e.g., X, Y, Z). When retrieval fails — due to timeouts, misconfigurations, or mismatched embeddings — the system may silently degrade, returning poor or irrelevant results. 

Without observability, you can’t: 

  • Diagnose failures (such as vector store issues, malformed queries, or incorrect search metrics) 

  • Understand why a retriever was chosen or skipped for the task 

  • Detect document quality issues such as irrelevant or redundant documents 

Instrumenting observability lets you track everything from query input and performance to document retrieval and generation—crucial since generation quality depends on retrieval quality.  

In this article, we’ll guide you through integrating Traceloop into your Multi-agent RAG system to emit traces for each decision cycle. We’ll then demonstrate how to use the Instana Agent to forward these traces to Instana, where Workflow View and Tool View provide real-time insights into agent performance, tool usage patterns, and the overall decision-making process. Lastly, we will showcase a custom dashboard created solely for Agentic RAG applications.

  

Agentic RAG Application Workflow 

We build a Multi-Agent RAG application, shown in Figure 2, inspired by LangGraph’s Hierarchical Agent Teams tutorial, and extend it into a Retrieval-Augmented Generation (RAG) system tailored for complex, multi-step tasks like research and content creation. At its core, the system features a Supervisor agent that intelligently routes user queries to specialized agent teams—Research and Writing—based on task requirements. Each agent follows the ReACT (Reasoning and Acting) paradigm, blending language models, structured prompts, and external tools to operate step-by-step with traceable logic. 
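
A rough sketch of that wiring in LangGraph is shown below. The node names, stub team functions, and routing rule are illustrative stand-ins for the real supervisor LLM and agent teams, not the exact code from the repository linked later in this article.

    from langgraph.graph import StateGraph, MessagesState, START, END

    def supervisor(state: MessagesState):
        return {}  # the supervisor adds no messages in this sketch; routing happens in route()

    def research_team(state: MessagesState):
        # Stand-in for rag_agent: retrieve from Milvus, then summarize the findings.
        return {"messages": [("ai", "FINDINGS: top AI agents are ...")]}

    def writing_team(state: MessagesState):
        # Stand-in for doc_writer: turn the research output into a blog post and save it.
        return {"messages": [("ai", "Draft blog written and saved.")]}

    def route(state: MessagesState) -> str:
        # Hypothetical rule standing in for the supervisor LLM's decision:
        # research first, hand over to writing once findings exist.
        text = " ".join(m.content for m in state["messages"])
        return "writing_team" if "FINDINGS:" in text else "research_team"

    builder = StateGraph(MessagesState)
    builder.add_node("supervisor", supervisor)
    builder.add_node("research_team", research_team)
    builder.add_node("writing_team", writing_team)
    builder.add_edge(START, "supervisor")
    builder.add_conditional_edges("supervisor", route, ["research_team", "writing_team"])
    builder.add_edge("research_team", "supervisor")
    builder.add_edge("writing_team", END)
    graph = builder.compile()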

Figure 2: A High-Level View of the Multi-Agent RAG System Referenced Throughout This Article 

 

For retrieval, the system leverages a standalone Milvus vector store populated with documents on leading AI agents. We ingest the corpus with pymilvus, create indexes, and expose it through an embedded retriever. When a user asks, “What are the top AI agents and write a blog on it,” the Supervisor first dispatches the query to the Research Team. The rag_agent has a retriever tool that performs a semantic vector search over Milvus to fetch relevant documents. These documents are then summarized before the Writing Team takes over, using agents like doc_writer to transform the research output into a well-structured blog post. 
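
A condensed pymilvus sketch of that ingestion and retrieval path is given below. The collection name, vector dimension, sample documents, and the embed() helper are placeholders (any embedding model works); only the tool name mirrors the one referenced later in Figure 5.

    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")  # standalone Milvus instance
    client.create_collection(collection_name="ai_agents", dimension=384)

    # Ingest: embed each document and store its text alongside the vector.
    # embed() is a placeholder for your embedding model (must match the dimension above).
    docs = ["Agent A excels at ...", "Agent B is designed for ..."]
    client.insert(
        collection_name="ai_agents",
        data=[{"id": i, "vector": embed(d), "text": d} for i, d in enumerate(docs)],
    )

    # Retriever tool used by rag_agent: semantic vector search over Milvus.
    def retrieve_recent_events_rag(query: str, k: int = 3) -> list[str]:
        hits = client.search(
            collection_name="ai_agents",
            data=[embed(query)],
            limit=k,
            output_fields=["text"],
        )
        return [hit["entity"]["text"] for hit in hits[0]]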

The complete code for the Multi-Agent RAG application is available in the following GitHub repository: https://github.com/IBM/multiagentic-rag-monitoring-blog

Setting Up Observability for Multi-Agent RAG using Traceloop and Instana 

The following are the steps to set up Traceloop and Instana for observing the Multi-Agent RAG application:

  1. Clone the repository: MultiAgentic-RAG-Monitoring (IBM GitHub) 

  2. Environment Configuration: Create an environment and the Milvus vector store  

  3. Add Observability with Traceloop (Step 2 in Figure 3); see the setup sketch after this list 

  4. Set up the Instana Agent (Step 3 in Figure 3) 
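
For steps 3 and 4, the instrumentation itself is only a few lines. A minimal sketch, assuming the Instana host agent runs locally and accepts OTLP over HTTP on its default port 4318 (gRPC on 4317); adjust the endpoint to your agent's configuration:

    from traceloop.sdk import Traceloop

    # Initialize Traceloop so LLM, retriever, and tool calls emit OpenTelemetry spans,
    # and point the exporter at the local Instana agent instead of Traceloop's cloud.
    Traceloop.init(
        app_name="multiagent-rag",              # illustrative service name
        api_endpoint="http://localhost:4318",   # Instana agent OTLP/HTTP endpoint (assumed)
        disable_batch=True,                     # flush spans immediately while developing
    )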

 

Figure 3: Instrumenting Multi-Agent RAG Application with Traceloop and Routing Traces/Metrics to Instana 

Given how crucial retrieval quality is to the performance of RAG systems, having deep visibility into agent behavior is vital for debugging and trust. By instrumenting the system with Traceloop, we gain complete traceability across the RAG lifecycle—from prompt creation and agent routing to document retrieval and response generation. Instana augments this with dashboards, alerts, and performance validation tools, allowing teams to monitor and fine-tune their RAG workflows effectively. 
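
Much of that traceability comes from Traceloop's automatic instrumentation of LangGraph, LangChain, and pymilvus. Custom steps can additionally be annotated with Traceloop's decorators so they show up as named workflow, agent, and tool spans; the function bodies below are illustrative stubs, not the repository's code.

    from traceloop.sdk.decorators import workflow, agent, tool

    @tool(name="retrieve_recent_events_rag")
    def retrieve_docs(query: str) -> list[str]:
        return ["..."]  # Milvus search; recorded as a tool span with its inputs and outputs

    @agent(name="rag_agent")
    def run_rag_agent(query: str) -> str:
        docs = retrieve_docs(query)
        return f"summary of {len(docs)} documents"  # LLM summarization would go here

    @workflow(name="multiagent_rag")
    def handle_request(query: str) -> str:
        return run_rag_agent(query)  # top-level span grouping the whole decision cycle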

Figure 4: Traces View showcasing (A) Retriever Milvus Span (B) Retrieval Attributes related to Input query and Performance 

 

Figure 4 (A) illustrates the Milvus search spans captured via Traceloop pymilvus instrumentation. Figure 4 (B) shows how key attributes such as result count, query vector dimensions, total query data searched, and the similarity metric used provide valuable insights into the retrieval behavior and performance of vector searches. 

Figure 5 displays the Retrieval Tool output, which logs each document retrieved along with its ID, source entity, full text, and similarity score. This helps assess the relevance of retrieved vectors and understand how effectively the retriever is returning contextually useful documents. Figure 6 shows the summarization output of the rag_agent LLM based on the retrieved documents.

 

Figure 5: Retrieval: Output after calling the Milvus Search using tool retrieve_recent_events_rag  

Figure 6: Generation: Retrieved documents are summarized by the rag_agent LLM 

One insightful observation made through Instana involves prompt-specific behavior in the RAG workflow. In Scenario 1, when the input prompt is “What are the top AI agents and write a blog on it and save it,” the system triggers 810 sub-calls, logs 3 errors, and takes approximately 2.8 minutes to complete. As seen in Figure 7, Instana’s Workflow View clearly highlights this spike—LLM calls surge, and errors arise due to recursion limits. This occurs because the Agent Writing Supervisor is repeatedly invoked by the Agent Supervisor, struggling to determine the correct file name to save the output. 

Figure 7: Workflow View: Not specifying the file name in the initial prompt leads to 810 sub-calls! 

In Scenario 2, simply specifying the file name in the prompt—“AI_agents.txt”—dramatically optimizes the workflow. Sub-calls drop to 68, and the total execution time shrinks to just 17 seconds (Figure 8). Thanks to Instana Dashboards, such inefficiencies become immediately visible, allowing teams to diagnose and fix issues with minimal friction. 
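
The errors in Scenario 1 surface through LangGraph's recursion limit, which caps how many supervisor/agent hops a single run may take (25 by default). The limit is set per invocation; the snippet below is a sketch using the graph compiled earlier, with an illustrative value.

    # LangGraph aborts a run with an error once it exceeds the configured recursion
    # limit, which is what Scenario 1 ran into; the limit is passed per invocation.
    result = graph.invoke(
        {"messages": [("user", "What are the top AI agents? Write a blog and save it to AI_agents.txt")]},
        config={"recursion_limit": 50},  # illustrative value; the default is 25
    )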

Figure 8: Workflow View: After specifying the filename, the sub-calls reduced to just 68! 

 

Creating Dashboards using Instana

Figure 9 shows a custom Instana dashboard for monitoring Agentic RAG systems. It provides end-to-end visibility into critical performance metrics such as RAG mean latency, retriever-specific latency, success vs. failure rates of retrieval calls, and error categories. With visual breakdowns of search types (search, hybrid search), sub-types (basic, range, filter, range + filter, hybrid), and retrieval metrics (e.g., L2, Cosine, IP), teams can quickly identify bottlenecks, inefficient search strategies, or misaligned similarity measures. Time-series charts reveal query volume spikes, retriever errors, and latency fluctuations—empowering developers and SREs to diagnose issues, understand system behavior in real time, and continuously improve retrieval and generation quality. 

 

Figure 9: Instana Custom Dashboard

Conclusion 

In this article, we showed how to make Multi-Agent RAG systems observable using Traceloop and Instana. With trace data flowing into Instana, you can easily monitor agent decisions, tool usage, and retrieval performance in real time. Clear visualizations and custom dashboards help you quickly spot issues, reduce latency, and improve output quality—making your RAG pipeline easier to debug, optimize, and trust. 


#Agent
#Tracing
#BusinessObservability
#OpenTelemetry
#LLM


#CustomDashboards