Elevating AI observability: Monitoring OpenAI agents with Instana

By Adharsh H posted 2 days ago

  

In today’s fast-paced world of AI-powered applications, maintaining the performance, reliability, and cost-effectiveness of your OpenAI agents is critical. As these intelligent agents independently execute tasks, make decisions, and engage with users or systems, robust observability becomes essential. This is where an advanced observability platform such as Instana proves invaluable. Instana delivers in-depth monitoring for both your OpenAI agents and the large language models (LLMs) they rely on.

The challenge of observing AI agents

Observing AI agents presents unique challenges that traditional monitoring tools are not equipped to handle effectively:

  • Dynamic and non-deterministic behavior: Unlike conventional applications, AI agents often make autonomous decisions that can vary with the same input. This non-determinism renders static monitoring approaches ineffective, as they fail to capture the fluid and adaptive nature of agent behavior.

  • The black box nature of LLMs: Large language models (LLMs) are inherently opaque, making it difficult to understand the reasoning behind a specific response. In agent workflows, this lack of transparency poses a major challenge for debugging, auditing, and ensuring accountability.

  • Complex chains of reasoning and tool use: AI agents typically perform multi-step tasks that involve reasoning, calling external tools or APIs, and sometimes coordinating with other agents. Monitoring and tracing these sequences end-to-end is essential for understanding how the agent reaches conclusions and where failures may occur.

  • Cost visibility and management: Each interaction with an LLM—especially through APIs like OpenAI—incurs a cost. Without fine-grained visibility into usage patterns and token consumption, it becomes difficult to control spending and optimize for cost-efficiency.

  • Ensuring quality, safety, and compliance: Beyond technical performance, it’s critical to ensure that the AI agent delivers accurate, safe, and unbiased results. This is crucial in regulated or high-stakes domains where errors can have significant consequences.
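To make the cost concern concrete, here is a minimal sketch of estimating spend from token counts. The per-1K-token rates below are placeholder assumptions for illustration, not current OpenAI pricing:

```python
# Illustrative cost estimate from token counts.
# PRICES holds hypothetical per-1K-token rates -- check your provider's
# current pricing before relying on numbers like these.
PRICES = {
    "gpt-4.1": {"prompt": 0.002, "completion": 0.008},  # USD per 1K tokens (assumed)
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return an estimated USD cost for a single LLM call."""
    rates = PRICES[model]
    return (prompt_tokens / 1000) * rates["prompt"] + \
           (completion_tokens / 1000) * rates["completion"]

cost = estimate_cost("gpt-4.1", prompt_tokens=1200, completion_tokens=300)
print(f"Estimated cost: ${cost:.4f}")
```

Tracking this kind of per-call estimate against observed token metrics is what fine-grained cost visibility makes possible.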

These challenges demonstrate the need for an automated observability tool tailored to the specific demands of LLM-powered systems.

Instana for OpenAI agent monitoring

Instana addresses these challenges with a comprehensive, automated observability platform. Instana is an enterprise-grade, AI-powered observability platform designed to deliver real-time insights into complex, cloud-native environments. It automatically discovers services, traces transactions, and monitors performance across the entire technology stack with minimal manual intervention.

Here is how it enables effective monitoring and optimization of your OpenAI agents:

  1. Seamless OpenTelemetry integration via Traceloop: Instana natively supports OpenTelemetry (OTel), the open standard for telemetry data. By using Traceloop—an OTel-native SDK purpose-built for OpenAI agent instrumentation—you can generate standardized traces, metrics, and logs directly from your AI applications. Instana ingests this data effortlessly, providing a unified, real-time view of your agent workflows and the underlying LLM interactions.

  2. LLM-specific metrics: Beyond generic application metrics, Instana focuses on the unique performance indicators crucial for AI agents and LLMs:

    • Token usage: Monitor prompt, completion, and total token counts to directly track consumption and manage costs.

    • Latency: Measure the end-to-end response time of your agents, including LLM inference time and tool execution.

    • Error rates: Quickly identify and diagnose failures in agent runs, function calls, and LLM interactions.
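In code terms, the per-call data behind these metrics looks roughly like the sketch below. The record fields and aggregation are illustrative of the metrics described above, not Instana's internal schema:

```python
from dataclasses import dataclass

@dataclass
class LLMCall:
    """One recorded LLM interaction (illustrative fields, not Instana's schema)."""
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    error: bool = False

def summarize(calls: list[LLMCall]) -> dict:
    """Aggregate the token, latency, and error-rate metrics described above."""
    total = len(calls)
    return {
        "total_tokens": sum(c.prompt_tokens + c.completion_tokens for c in calls),
        "avg_latency_ms": sum(c.latency_ms for c in calls) / total,
        "error_rate": sum(c.error for c in calls) / total,
    }

calls = [
    LLMCall(prompt_tokens=900, completion_tokens=150, latency_ms=820.0),
    LLMCall(prompt_tokens=400, completion_tokens=60, latency_ms=430.0, error=True),
]
print(summarize(calls))
```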

  3. End-to-end distributed tracing: Instana’s powerful distributed tracing capabilities provide complete visibility into the execution of OpenAI agents. You can trace the journey from the user's initial request, through the agent's internal decision-making and tool usage, to the final response. This end-to-end view is crucial for identifying "soft failures," where an agent appears to function correctly but returns inaccurate or less-than-ideal outcomes.

  4. Customizable alerts and events: Instana allows you to define events and alerting conditions based on the behavior of your OpenAI agents. Rather than relying solely on automatic anomaly detection, you can set specific thresholds—for example, sudden spikes in latency, increased token usage, or higher error rates. When these thresholds are crossed, Instana triggers alerts, helping teams respond quickly and maintain reliable AI performance.
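Conceptually, a threshold rule of this kind compares a current metric value against a fixed limit. The sketch below mirrors that logic in plain Python; the metric names and limits are illustrative examples, not Instana alert configuration:

```python
def check_thresholds(metrics: dict, limits: dict) -> list[str]:
    """Return an alert message for every metric exceeding its limit."""
    return [
        f"ALERT: {name}={metrics[name]} exceeds limit {limit}"
        for name, limit in limits.items()
        if metrics.get(name, 0) > limit
    ]

# Hypothetical thresholds: request latency, tokens per request, error rate.
limits = {"latency_ms": 2000, "total_tokens": 4000, "error_rate": 0.05}
alerts = check_thresholds(
    {"latency_ms": 3100, "total_tokens": 1800, "error_rate": 0.01}, limits
)
print(alerts)  # only the latency threshold is breached
```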

  5. Custom dashboards for focused insights: Instana allows you to build custom dashboards to visualize the metrics most important to your specific OpenAI agent use cases. By visualizing these metrics in a customizable and interactive interface, teams can quickly detect anomalies, optimize model usage, and make informed decisions to improve reliability and efficiency across AI-driven workflows.

Together, these capabilities position Instana as a purpose-built solution for monitoring the complex and dynamic behavior of OpenAI agents. It converts raw observability data into actionable insights, closing visibility gaps and removing the manual effort usually needed to monitor LLM-powered systems.

Get started with Instana and OpenAI agents

To get started with Instana and OpenAI agents, follow these steps:

  1. Install OpenAI agents: To get started with building and monitoring OpenAI agents, install the official openai-agents package. You can install it using pip:

    pip install openai-agents==0.0.19

  2. Deploy the Instana agent: Deploy the Instana agent to your host to collect telemetry data. This enables real-time monitoring and observability for your OpenAI agent applications. To deploy the Instana agent, see Deploy Instana agent.

  3. Install the OTel LLM Data Collector: Install and set up the OTel LLM Data Collector to enable metrics collection. For installation and configuration instructions, refer to the official guide on OTel Data Collector for LLM.

  4. Instrument your application with Traceloop: Leverage the Traceloop SDK to seamlessly instrument your OpenAI agent application. With OpenTelemetry embedded, Traceloop simplifies the process of capturing traces and metrics from OpenAI API calls and agent frameworks. To install Traceloop, use the following command:

    pip install traceloop-sdk==0.41.0

  5. Configure environment variables for Instana export: Configure the environment variables where your OpenAI agents application runs to enable the export of traces, logs, and metrics to Instana. To view the required environment variables, see Configuring the environment.
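As an illustration, the exporter configuration typically amounts to a handful of environment variables like the following. The hostnames, ports, and exact variable set here are assumptions; follow the Configuring the environment page for the authoritative list for your Instana deployment:

```shell
# Point the Traceloop/OTel exporters at your Instana backend.
# Host and port values below are placeholders -- substitute the
# endpoint details from your own Instana deployment.
export TRACELOOP_BASE_URL="http://<instana-agent-host>:4318"        # OTLP/HTTP endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT="http://<instana-agent-host>:4317"
export OTEL_SERVICE_NAME="travel-itinerary-assistant"
```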

Monitoring OpenAI agents with Instana

In this section, we’ll walk through a simple example app — the Travel Itinerary Assistant — to show how you can use OpenAI Agents to build a smart, multi-step workflow for planning a trip. Once you’ve created the app, you can see how metrics and traces collected during execution are visualized in Instana, offering deep insight into agent behavior and LLM-driven workflows.

The Travel Itinerary Assistant is an OpenAI agent-powered application that generates personalized travel plans from natural language input. It uses a modular, multi-agent workflow where each agent handles a specific task — from extracting destinations and planning routes to estimating travel times and summarizing the itinerary. This step-by-step design makes the planning process both accurate and conversational. The application is integrated with the Traceloop SDK for observability, enabling real-time tracing and debugging.

import asyncio
from agents import Agent, Runner, function_tool
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# Initialize Traceloop for observability
Traceloop.init(app_name="Travel_Itinerary_Assistant", disable_batch=True)

@function_tool(
    name_override="calculate_travel_time",
    description_override="Estimate travel time based on cities and transport type."
)
def calculate_travel_time(source: str, destination: str, mode: str) -> str:
    if mode.lower() == "flight":
        return f"Approx. 2 hours from {source} to {destination} by air."
    elif mode.lower() == "train":
        return f"Approx. 6 hours from {source} to {destination} by train."
    else:
        return f"Approx. 10 hours from {source} to {destination} by road."

useModel = "gpt-4.1"

# 1. Destination Extractor
Destination_Extractor_Agent = Agent(
    name="Destination Extractor",
    instructions="""
    You are a helpful assistant that extracts destination cities from the user's travel request.
    Only return a comma-separated list of places they want to visit.
    """,
    model=useModel,
)

# 2. Planner Agent
Planner_Agent = Agent(
    name="Planner Agent",
    instructions="""
    You are a smart travel planner. Given a list of destinations, return an ideal travel order and suggest how many days to spend in each place.
    Keep it realistic and concise.
    """,
    model=useModel,
)

# 3. Travel Time Estimator
Time_Estimator_Agent = Agent(
    name="Travel Time Estimator",
    instructions="""
    You are a travel time estimator. Use the 'calculate_travel_time' tool to estimate how long it takes to travel between the destinations.
    Assume travel is by flight unless stated otherwise.
    """,
    tools=[calculate_travel_time],
    model=useModel,
)

# 4. Final Recommender
Recommendation_Agent = Agent(
    name="Recommendation Agent",
    instructions="""
    You are a friendly travel assistant. Based on the itinerary and travel times, summarize the full travel plan for the user.
    Keep it organized and engaging.
    """,
    model=useModel,
)



@workflow(name="travel_itinerary_builder")
async def main():
    user_input = """I'm planning a trip next month. I want to visit New York, San Francisco, and Las Vegas.
    I'd like to spend more time in San Francisco and travel by flights where possible."""

    # Step 1: Extract destinations
    result = await Runner.run(
        starting_agent=Destination_Extractor_Agent,
        input=f"Extract all cities mentioned: {user_input}",
    )
    destinations = result.final_output
    print("\n Destinations:\n", destinations)

    # Step 2: Plan order and duration
    result = await Runner.run(
        starting_agent=Planner_Agent,
        input=f"Plan the travel route and suggest stay durations for: {destinations}",
    )
    itinerary = result.final_output
    print("\n Itinerary Plan:\n", itinerary)

    # Step 3: Estimate travel time
    result = await Runner.run(
        starting_agent=Time_Estimator_Agent,
        input=f"Estimate time to travel between: {destinations}",
    )
    travel_times = result.final_output
    print("\n Travel Times:\n", travel_times)

    # Step 4: Final recommendation
    result = await Runner.run(
        starting_agent=Recommendation_Agent,
        input=f"User wants to travel to {destinations}. Here is the plan: {itinerary}. Travel time info: {travel_times}. Summarize everything.",
    )
    print("\nFinal Recommendation:\n", result.final_output)


if __name__ == "__main__":
    asyncio.run(main())

The following image displays the outputs generated by the Travel Itinerary Assistant application after processing the user's input. It visually represents the results from each stage of the agent-driven workflow.

Here is an in-depth view of the traces collected during the execution of the user’s input. These traces offer a sequential breakdown of how each agent handled the input, triggered relevant tools, and produced corresponding outputs, providing clear insights into the internal flow and behavior of the system.

The following image shows metrics collected from the large language model (LLM) used by agents in the Travel Itinerary Assistant application. These metrics provide valuable insights into the model’s performance, including token usage, response time, number of requests, and overall latency. Monitoring these metrics helps evaluate the efficiency and cost-effectiveness of the application, identify performance bottlenecks, and ensure optimal resource utilization.

Conclusion

Monitoring OpenAI agents with Instana delivers the real-time, in-depth observability required to navigate the complexity of AI-driven systems. With full-stack monitoring, seamless OpenTelemetry integration, and actionable insights, Instana offers a clear window into the performance of your AI workloads, so teams can quickly detect and address issues, fine-tune performance, manage costs, and improve the overall effectiveness of their intelligent agents. Leveraging Instana’s observability ensures you get the most out of your OpenAI agent deployments.

#Infrastructure
#OpenTelemetry
#Tracing
#LLM
