watsonx Orchestrate

High latency in agent responses

Ramya Swarna posted Mon April 13, 2026 06:42 AM

Hello,
I'm currently working on a multi-agent setup using watsonx Orchestrate, where a primary orchestrator agent delegates tasks to multiple collaborator (sub) agents. These sub-agents are configured with a combination of workflows and tool integrations.

Issue 1: High Latency in Agent Interactions
I'm observing noticeable latency in the following scenarios:
- Tool invocation within agents
- Handoffs between orchestrator and collaborator agents
- Overall response time when multiple agents are involved in a single user request

This latency becomes more pronounced in multi-step interactions involving multiple tool calls and agent transfers.

Could you please suggest:
- Recommended best practices for optimizing latency in multi-agent setups
- Any configuration changes, architectural patterns, or limits (e.g., number of agents, tool calls) to be aware of
- Whether there are known performance constraints specific to workflows vs direct tool calls

Issue 2: Inconsistent Agent Transfer & Context Loss
I am also encountering inconsistencies in agent-to-agent transfers, specifically:
- Partial or complete loss of context when transitioning between agents
- Occasional failure in maintaining continuity of user intent across collaborator agents

I understand that context propagation across agents may have known limitations. As a workaround, I am currently adding more explicit instructions and structured inputs, which has helped to some extent.

However, I would like guidance on:
- Recommended approaches for ensuring reliable context transfer between agents
- Any built-in mechanisms, patterns, or roadmap features that address this limitation

I would appreciate any guidance, best practices, or documentation references that can help improve performance and reliability in this setup.

Thank you,
Ramya

Laurent de Clermont-Tonnerre

Hello Ramya, thank you for your inquiry.

Are you using guidelines or plugins in addition to instructions? Note that guidelines add a performance cost regardless of how many you define, so if you only need a few, stick to instructions.

Make good use of the agent and tool descriptions: the orchestrator/parent agent uses them, along with its own instructions, to decide not only whether to invoke an agent or tool but also how. This is also a good way to reuse instructions and keep the parent agent's own instructions simpler to manage.

Finally, make sure you refer to those pages in our docs:

I hope this helps!

Harold Bergeron

Hi Ramya,

Great questions — we've been working through very similar challenges on a multi-agent WxO deployment and wanted to share what's worked for us.

Issue 1 — Latency

First thing to check: pull a trace and inspect the orchestrator's LLM span — look at the gen_ai.request.functions.* attributes. Confirm your collaborators actually appear under the names your instructions reference. WxO derives runtime tool names from the agent's display_name (lowercased, spaces become underscores, parentheses stripped) — NOT from the YAML name field. If your orchestrator instructions say "delegate to my_agent_name" but the LLM only sees my_agent_display_name_pha in its available tools, it can't match them — so it asks endless clarifying questions instead of delegating. This looks exactly like latency but it's actually a name mismatch.
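To make the name check above concrete, here is a minimal sketch in Python. It only approximates the rule as described in this post (lowercase, spaces to underscores, parenthesis characters stripped); the function names are hypothetical and this is not a WxO API, so verify the result against the names you actually see in your traces.

```python
import re


def runtime_tool_name(display_name: str) -> str:
    """Approximate the display_name -> runtime tool name rule described above.

    Assumption: only the parenthesis characters are stripped, not their content.
    """
    name = display_name.lower()
    name = re.sub(r"[()]", "", name)           # strip parenthesis characters
    name = re.sub(r"\s+", "_", name.strip())   # runs of whitespace become underscores
    return name


def unmatched_references(referenced: list[str], display_names: list[str]) -> list[str]:
    """Return names used in the orchestrator instructions that no collaborator exposes."""
    available = {runtime_tool_name(d) for d in display_names}
    return [r for r in referenced if r not in available]


# Hypothetical example: the instructions say "my_agent", but the display name
# produces "my_agent_pha", so the reference cannot be matched.
missing = unmatched_references(
    ["my_agent", "billing_agent"],
    ["My Agent (PHA)", "Billing Agent"],
)
print(missing)  # ['my_agent']
```

Any name this reports as missing is worth comparing against the gen_ai.request.functions.* attributes in the trace before you look anywhere else for the cause of the slowdown.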

On instruction size: on gpt-oss-120b, orchestrator instructions become unusable past roughly 5K characters. Our sweet spot is under 2.5K using a lean router pattern — just identity, intent categories, numbered delegation steps, and restrictions. Move all procedural detail into specialist agent instructions or KB documents. IBM's own developer docs confirm this model should be limited to about 3 sequential reasoning steps and ~150 word responses.
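If you want to catch instruction bloat before importing an agent, a small check like the following can help. This is only a sketch: the two thresholds are the rough figures from our experience above (not documented limits), and the function name is made up for illustration.

```python
# Rough thresholds from the experience described above, not official limits.
SWEET_SPOT_CHARS = 2500
UNUSABLE_CHARS = 5000


def instruction_budget(instructions: str) -> str:
    """Classify orchestrator instructions by character count."""
    length = len(instructions)
    if length < SWEET_SPOT_CHARS:
        return "ok"
    if length <= UNUSABLE_CHARS:
        return "trim"       # still works, but move detail to specialists or KB docs
    return "too_long"       # in our experience, routing degrades badly past this point


print(instruction_budget("You are a router agent. " * 50))  # ok
```

Running this in CI against the instructions field of each orchestrator definition is one way to keep the lean router pattern from drifting over time.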

On architecture: IBM's docs confirm collaborator agents run sequentially — there's no parallelism. If you have sequential multi-agent chains or more than 4 collaborators, the architecture itself becomes the latency driver. A centralized orchestrator with 3-6 tools per agent works best. Beyond 8-10 tools per agent, coordination overhead eats into reasoning quality.

Issue 2 — Context Loss

Trace the collaborator's own LLM span. If gen_ai.request.functions.* is empty on the sub-agent (no tool schemas bound), the sub-agent has nothing to call and returns empty content back to the orchestrator, which then hallucinates data to fill the gap. Re-importing the tool with --app-id and re-importing the agent YAML usually restores the binding. Testing the sub-agent directly (not as a collaborator) is a fast way to isolate whether it's the tool binding or the handoff path.
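The binding check on the sub-agent span can be scripted once you have the span attributes in hand. The sketch below assumes you have exported a span's attributes as a flat Python dict; the exact key shape after the gen_ai.request.functions. prefix (for example the ".0.name" suffix) is an assumption for illustration and may differ in your traces.

```python
FUNCTIONS_PREFIX = "gen_ai.request.functions."


def bound_function_attributes(span_attributes: dict) -> dict:
    """Return only the attributes that describe functions/tools bound to the LLM call."""
    return {
        key: value
        for key, value in span_attributes.items()
        if key.startswith(FUNCTIONS_PREFIX)
    }


def has_tool_bindings(span_attributes: dict) -> bool:
    """True if at least one tool schema was sent with the LLM request."""
    return len(bound_function_attributes(span_attributes)) > 0


# Hypothetical span attribute dumps for illustration only.
orchestrator_span = {
    "gen_ai.request.model": "gpt-oss-120b",
    "gen_ai.request.functions.0.name": "billing_agent",
}
sub_agent_span = {"gen_ai.request.model": "gpt-oss-120b"}

print(has_tool_bindings(orchestrator_span))  # True
print(has_tool_bindings(sub_agent_span))     # False
```

A False on a sub-agent that is supposed to have tools points at the binding problem described above rather than at the handoff itself.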

Also worth noting — changes you make to a collaborator agent in draft don't propagate to the orchestrator's routing view until that collaborator is deployed to the live environment. If you're iterating in draft, the orchestrator may still be routing against the last-deployed version.

One more thing on gpt-oss-120b: when delegating to collaborators, make sure the orchestrator passes input_message as a plain string, not a JSON object. The LLM sometimes sends {"input_message": {"key": "value"}} instead of {"input_message": "plain text context"}, which causes a pydantic validation error and drops context silently.
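To illustrate the two payload shapes, here is a minimal sketch of a normalizer that flattens a nested input_message into a plain string. It assumes you have some point in your own code where the delegation payload passes through; whether WxO exposes such a point is not something this thread establishes, and the function name is hypothetical.

```python
import json


def normalize_input_message(payload: dict) -> dict:
    """Return a copy of the payload whose input_message is a plain string."""
    message = payload.get("input_message")
    if message is None or isinstance(message, str):
        return dict(payload)  # missing or already a string: nothing to fix
    fixed = dict(payload)
    # Serialize the nested object so the context survives as text instead of
    # failing validation and being dropped.
    fixed["input_message"] = json.dumps(message, ensure_ascii=False, sort_keys=True)
    return fixed


bad = {"input_message": {"key": "value"}}
good = {"input_message": "plain text context"}

print(normalize_input_message(bad))   # {'input_message': '{"key": "value"}'}
print(normalize_input_message(good))  # {'input_message': 'plain text context'}
```

Where no such interception point exists, the same intent can be stated in the orchestrator instructions, for example by telling it to pass context to collaborators as a single plain-text string.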

References that helped us:

Hope this helps — happy to compare notes if you want to dig into traces together.