High latency in agent responses

Question

Hello,
I'm currently working on a multi-agent setup using watsonx Orchestrate, where a primary orchestrator agent delegates tasks to multiple collaborator (sub) agents. These sub-agents are configured with a combination of workflows and tool integrations.

Issue 1: High Latency in Agent Interactions
I'm observing noticeable latency in the following scenarios:
- Tool invocation within agents
- Handoffs between orchestrator and collaborator agents
- Overall response time when multiple agents are involved in a single user request

This latency becomes more pronounced in multi-step interactions involving multiple tool calls and agent transfers.

Could you please suggest:
- Recommended best practices for optimizing latency in multi-agent setups
- Any configuration changes, architectural patterns, or limits (e.g., number of agents, tool calls) to be aware of
- Whether there are known performance constraints specific to workflows vs direct tool calls

Issue 2: Inconsistent Agent Transfer & Context Loss
I am also encountering inconsistencies in agent-to-agent transfers, specifically:
- Partial or complete loss of context when transitioning between agents
- Occasional failure in maintaining continuity of user intent across collaborator agents

I understand that context propagation across agents may have known limitations. As a workaround, I am currently adding more explicit instructions and structured inputs, which has helped to some extent.

However, I would like guidance on:
- Recommended approaches for ensuring reliable context transfer between agents
- Any built-in mechanisms, patterns, or roadmap features that address this limitation

I would appreciate any guidance, best practices, or documentation references that can help improve performance and reliability in this setup.

Thank you,
Ramya

Answer

Hi Ramya,

Great questions. We've been working through very similar challenges on a multi-agent WxO deployment and wanted to share what's worked for us.

Issue 1: Latency

First thing to check: pull a trace, inspect the orchestrator's LLM span, and look at the gen_ai.request.functions.* attributes. Confirm your collaborators actually appear under the names your instructions reference. WxO derives runtime tool names from the agent's display_name (lowercased, spaces become underscores, parentheses stripped), NOT from the YAML name field. If your orchestrator instructions say "delegate to my_agent_name" but the LLM only sees my_agent_display_name_pha in its available tools, it can't match them, so it asks endless clarifying questions instead of delegating. This looks exactly like latency, but it's actually a name mismatch.
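To catch this mismatch before deploying, the naming rule described above can be approximated in a few lines of Python. This is only a sketch: the normalization follows our description (lowercase, spaces to underscores, parenthesis characters removed) rather than any documented WxO function, and both helper names are hypothetical.

```python
import re

def runtime_tool_name(display_name: str) -> str:
    """Approximate the runtime tool name derived from an agent's display_name.

    Assumption: only the parenthesis characters are removed, not the text
    inside them, which matches the my_agent_display_name_pha example.
    """
    name = display_name.lower()
    name = re.sub(r"[()]", "", name)           # strip parentheses
    name = re.sub(r"\s+", "_", name.strip())   # spaces become underscores
    return name

def missing_references(instruction_names, display_names):
    """Return names used in the instructions that no collaborator would expose."""
    available = {runtime_tool_name(d) for d in display_names}
    return sorted(n for n in instruction_names if n not in available)

if __name__ == "__main__":
    print(missing_references(["my_agent_name"], ["My Agent Display Name (PHA)"]))
```

Any name this reports is one the orchestrator instructions mention but the LLM would likely never see in its tool list. Treat the trace as the source of truth and this check only as a quick pre-flight.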

On instruction size: on gpt-oss-120b, orchestrator instructions become unusable past roughly 5K characters. Our sweet spot is under 2.5K using a lean router pattern: just identity, intent categories, numbered delegation steps, and restrictions. Move all procedural detail into specialist agent instructions or KB documents. IBM's own developer docs confirm this model should be limited to about 3 sequential reasoning steps and ~150-word responses.
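If it helps to enforce that budget while iterating, a small check like the one below could run against the instruction text before each import. The two thresholds are just the numbers we observed on gpt-oss-120b, not documented limits, and the function name is made up for illustration.

```python
# Thresholds are our own observations on gpt-oss-120b, not IBM-documented limits.
SWEET_SPOT_CHARS = 2500
HARD_LIMIT_CHARS = 5000

def instruction_budget(instructions: str) -> str:
    """Classify orchestrator instructions by character count."""
    length = len(instructions)
    if length < SWEET_SPOT_CHARS:
        return "ok"
    if length < HARD_LIMIT_CHARS:
        return "trim"       # still works, but move detail to specialists or KB docs
    return "too_long"       # expect routing quality to degrade
```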

On architecture: IBM's docs confirm collaborator agents run sequentially; there's no parallelism. If you have sequential multi-agent chains or more than 4 collaborators, the architecture itself becomes the latency driver. A centralized orchestrator with 3-6 tools per agent works best. Beyond 8-10 tools per agent, coordination overhead eats into reasoning quality.

Issue 2: Context Loss

Trace the collaborator's own LLM span. If gen_ai.request.functions.* is empty on the sub-agent (no tool schemas bound), the sub-agent has nothing to call and returns empty content back to the orchestrator, which then hallucinates data to fill the gap. Re-importing the tool with --app-id and re-importing the agent YAML usually restores the binding. Testing the sub-agent directly (not as a collaborator) is a fast way to isolate whether it's the tool binding or the handoff path.
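If you export traces and want to scan many spans at once, the empty-binding check can be scripted. The sketch below assumes each span's attributes are available as a flat dictionary keyed by attribute name; that export shape, the sample keys, and the helper names are assumptions, so adapt them to whatever your tracing backend returns.

```python
FUNCTIONS_PREFIX = "gen_ai.request.functions."

def bound_functions(span_attributes: dict) -> dict:
    """Return only the attributes that describe tool schemas bound to the LLM call."""
    return {
        key: value
        for key, value in span_attributes.items()
        if key.startswith(FUNCTIONS_PREFIX)
    }

def has_tool_bindings(span_attributes: dict) -> bool:
    """True if at least one gen_ai.request.functions.* attribute is present."""
    return bool(bound_functions(span_attributes))

if __name__ == "__main__":
    # Hypothetical sub-agent span with no tool schemas bound.
    sub_agent_span = {"gen_ai.request.model": "gpt-oss-120b"}
    print(has_tool_bindings(sub_agent_span))
```

A sub-agent span for which this returns False is a candidate for the re-import steps described above.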

Also worth noting: changes you make to a collaborator agent in draft don't propagate to the orchestrator's routing view until that collaborator is deployed to the live environment. If you're iterating in draft, the orchestrator may still be routing against the last-deployed version.

One more thing on gpt-oss-120b: when delegating to collaborators, make sure the orchestrator passes input_message as a plain string, not a JSON object. The LLM sometimes sends {"input_message": {"key": "value"}} instead of {"input_message": "plain text context"}, which causes a pydantic validation error and drops context silently.
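To make the difference between the two payload shapes concrete, here is a small sketch that flattens a nested input_message into a plain string. It is only useful somewhere you control the arguments yourself, such as a test harness or a wrapper around your own tooling; whether WxO exposes a hook at this point is not something we can confirm, and the function name is hypothetical.

```python
import json

def normalize_delegation_args(args: dict) -> dict:
    """Ensure input_message is a plain string, serializing nested objects to JSON text."""
    if "input_message" not in args or isinstance(args["input_message"], str):
        return args  # nothing to fix
    fixed = dict(args)  # copy so the caller's dict is not mutated
    fixed["input_message"] = json.dumps(args["input_message"], sort_keys=True)
    return fixed

if __name__ == "__main__":
    bad = {"input_message": {"key": "value"}}
    print(normalize_delegation_args(bad))
```

Serializing to JSON text keeps the structured context intact while satisfying the string type. In practice the more reliable fix is an explicit line in the orchestrator instructions telling it to pass input_message as plain text.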

References that helped us:

IBM's high-performance agent design guide: https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-designing-high-performance-ai
Best practices for instructions and descriptions: https://developer.watson-orchestrate.ibm.com/agents/descriptions#best-practices-for-writing-instructions

Hope this helps. Happy to compare notes if you want to dig into traces together.

watsonx Orchestrate