Hi Ramya,
Great questions — we've been working through very similar challenges on a multi-agent WxO deployment and wanted to share what's worked for us.
Issue 1 — Latency
First thing to check: pull a trace and inspect the orchestrator's LLM span — look at the gen_ai.request.functions.* attributes. Confirm your collaborators actually appear under the names your instructions reference. WxO derives runtime tool names from the agent's display_name (lowercased, spaces become underscores, parentheses stripped) — NOT from the YAML name field. If your orchestrator instructions say "delegate to my_agent_name" but the LLM only sees my_agent_display_name_pha in its available tools, it can't match them — so it asks endless clarifying questions instead of delegating. This looks exactly like latency but it's actually a name mismatch.
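A quick way to sanity-check this before pulling traces is to derive the name you expect the LLM to see and compare it with what your instructions say. The sketch below only mirrors the rule as described above (lowercase, spaces to underscores, parentheses stripped but their contents kept, as in the my_agent_display_name_pha example). The function names are made up for illustration, and WxO's real normalization may handle other characters differently, so treat the trace as the source of truth.

```python
import re


def expected_tool_name(display_name: str) -> str:
    """Approximate the runtime tool name from an agent's display_name.

    Assumption: this follows the rule described in the post, not an
    official WxO specification.
    """
    name = display_name.lower()
    name = re.sub(r"[()]", "", name)           # drop the parentheses, keep their contents
    name = re.sub(r"\s+", "_", name.strip())   # each run of whitespace becomes one underscore
    return name


def missing_references(instructions: str, display_names: list[str]) -> list[str]:
    """Return derived tool names that never appear in the orchestrator instructions."""
    derived = [expected_tool_name(d) for d in display_names]
    return [name for name in derived if name not in instructions]
```

Calling missing_references("delegate to my_agent_name", ["My Agent Display Name (PHA)"]) flags my_agent_display_name_pha, which is the mismatch described above.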
On instruction size: on gpt-oss-120b, orchestrator instructions become unusable past roughly 5K characters. Our sweet spot is under 2.5K using a lean router pattern — just identity, intent categories, numbered delegation steps, and restrictions. Move all procedural detail into specialist agent instructions or KB documents. IBM's own developer docs confirm this model should be limited to about 3 sequential reasoning steps and ~150 word responses.
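If you want to keep yourself honest on size while iterating, a tiny check like the one below can run before each import. The thresholds are just the numbers from our experience above (2.5K and 5K characters on gpt-oss-120b), not documented limits, and the function name is hypothetical.

```python
SWEET_SPOT_CHARS = 2_500   # our own target for a lean router prompt
UNUSABLE_CHARS = 5_000     # roughly where we saw instructions stop working


def instruction_budget(instructions: str) -> str:
    """Classify orchestrator instructions by character count."""
    length = len(instructions)
    if length < SWEET_SPOT_CHARS:
        return "ok"
    if length <= UNUSABLE_CHARS:
        return "trim"       # still works for us, but worth moving detail elsewhere
    return "too_long"
```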
On architecture: IBM's docs confirm collaborator agents run sequentially — there's no parallelism. If you have sequential multi-agent chains or more than 4 collaborators, the architecture itself becomes the latency driver. A centralized orchestrator with 3-6 tools per agent works best. Beyond 8-10 tools per agent, coordination overhead eats into reasoning quality.
Issue 2 — Context Loss
Trace the collaborator's own LLM span. If gen_ai.request.functions.* is empty on the sub-agent (no tool schemas bound), the sub-agent has nothing to call and returns empty content to the orchestrator, which then hallucinates data to fill the gap. Re-importing the tool with --app-id and re-importing the agent YAML usually restores the binding. Testing the sub-agent directly (not as a collaborator) is a fast way to isolate whether the problem is the tool binding or the handoff path.
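If you export traces and want to check many spans at once, something like this can flag sub-agent spans with no tools bound. It assumes each span's attributes are available as a flat dict keyed by attribute name. That shape, the helper names, and the sub-keys used in the example are assumptions about your export, not a WxO API.

```python
TOOL_ATTR_PREFIX = "gen_ai.request.functions."


def bound_tool_attributes(span_attributes: dict) -> dict:
    """Return only the attributes that describe tool schemas on an LLM span."""
    return {
        key: value
        for key, value in span_attributes.items()
        if key.startswith(TOOL_ATTR_PREFIX)
    }


def has_tool_bindings(span_attributes: dict) -> bool:
    """True if at least one gen_ai.request.functions.* attribute is present."""
    return bool(bound_tool_attributes(span_attributes))


# Hypothetical span attributes, for illustration only.
example_span = {
    "gen_ai.request.model": "gpt-oss-120b",
    "gen_ai.request.functions.0.name": "lookup_order",
}
```

A span where has_tool_bindings returns False is the case described above: the sub-agent has nothing to call.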
Also worth noting — changes you make to a collaborator agent in draft don't propagate to the orchestrator's routing view until that collaborator is deployed to the live environment. If you're iterating in draft, the orchestrator may still be routing against the last-deployed version.
One more thing on gpt-oss-120b: when delegating to collaborators, make sure the orchestrator passes input_message as a plain string, not a JSON object. The LLM sometimes sends {"input_message": {"key": "value"}} instead of {"input_message": "plain text context"}, which causes a pydantic validation error and drops context silently.
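To spot this in captured delegation payloads, a small check like the one below is enough. It only inspects a dict you have already pulled out of a trace. The function name and return messages are made up for illustration, and it does not reproduce WxO's actual pydantic model.

```python
from typing import Optional


def input_message_problem(arguments: dict) -> Optional[str]:
    """Return a description of what is wrong with input_message, or None if it looks fine."""
    if "input_message" not in arguments:
        return "input_message is missing"
    message = arguments["input_message"]
    if not isinstance(message, str):
        # This is the shape that failed validation for us: a nested object
        # where a plain string was expected.
        return f"input_message is {type(message).__name__}, expected str"
    return None


bad = {"input_message": {"key": "value"}}
good = {"input_message": "plain text context"}
```

Here input_message_problem(bad) reports the nested object, while input_message_problem(good) returns None.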
Hope this helps — happy to compare notes if you want to dig into traces together.