Global AI and Data Science

Anatomy of an AI Agent / Watsonx Orchestrate

By Patrick Meyer posted 10 hours ago

  

Today's AI agents are no longer the chatbots with sophisticated prompts that we once knew. They are structured software entities that combine an LLM "brain," a scheduling engine, tool APIs, and memory stores to pursue goals autonomously. Agent Development Kits (ADKs) illustrate distinctive design patterns that all solve the same fundamental problem, how to plan and execute a multi-step job, while differing in the level of control they expose to the developer.

Image: Anatomy of an AI Agent based on the painting "The Anatomy Lesson of Doctor Tulp" by Rembrandt (1632). AI generated

The boom of agentic AI and planning

Since the release of ChatGPT in late 2022, global interest in "AI agents" and "agentic AI" has increased sharply, reflecting a shift from passive text generation to complex problem solving, where the goal drives how activities are broken down. Early single-agent prototypes like AutoGPT and BabyAGI sparked the imagination, showing that large language models could chain tasks together by acting on their own instructions. But these early systems, limited in their planning, tended to get stuck in loops of action or reflection without reaching a conclusion or completing the task at hand.

Planning is difficult because it requires the system to:

  • Break down ambiguous goals into executable steps, based on knowledge of the external world or of the organization's internal processes;
  • Assign the right tools, or delegate to specialized agents, for each subtask; and
  • Dynamically adjust plans in response to failures, tool outputs, signals from other agents, or feedback from the human in the loop (HITL).

Some of the first responses to these challenges came from Microsoft's AutoGen, one of the first open-source frameworks to formalize agent communication, recursive reasoning, and planning as first-class concepts. Released in 2023, AutoGen laid the groundwork for more structured agent design by introducing multi-agent conversations, tool integration, and memory support in a modular, developer-friendly framework. Its early influence helped establish the building blocks for later design frameworks.

Building on this foundation, a new wave of frameworks has emerged since AutoGen, including CrewAI, LangChain's LangGraph, and more recently Google's Agent Development Kit (ADK), each introducing more sophisticated planning layers to tackle coordination issues head-on. These frameworks go beyond single-agent loops to offer explicit orchestration, dynamic delegation, and task decomposition engines, helping agents operate more effectively in real-world environments.

Through successive software iterations, agents have become much easier to set up and configure. They can now manage multi-step tasks with memory, tool use, and in-flight replanning, rather than collapsing at the first unexpected result. What started as a set of scattered demos has evolved into a growing ecosystem of ADKs, each offering tools to manage complexity, apply guardrails, and design agents that can reason, act, and adapt.


What exactly is an AI agent?

The recently published research paper by Sapkota et al. (May 2025): "AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges" describes a taxonomy that distinguishes AI agents (single actors augmented with tools) from agentic AI (multi-agent collaborative systems). In both categories, an individual agent implements the same internal anatomy.

According to Sapkota, an AI agent is a self-contained program designed to achieve a specific goal in a digital environment. It is built on foundation models such as large language models (LLMs) or image models (LIMs), and combines perception, reasoning, and action skills. These agents can perform targeted tasks such as writing assistance, planning, or information retrieval. An agent acts sequentially, sometimes uses tools, and follows a "perceive → reason → act" logic within a well-defined perimeter. It may appear intelligent, but its behavior is often scripted or limited to simple reasoning loops, without sophisticated memory or the ability to reorganize itself. It is an intelligent executor, but not a very reflective or self-adaptive one.

Its operation is based on techniques such as prompt engineering, step-by-step reasoning (chain of thought), or the use of external tools (APIs, databases). However, its autonomy remains limited to the scope of the task, and it is subject to notorious weaknesses such as hallucinations, the absence of persistent memory, and difficulty generalizing in complex contexts.

The AI agent ecosystem is typically made up of several functional building blocks, each responsible for a distinct part of the decision-making and action pipeline. Understanding these building blocks helps clarify how agents operate, make decisions, and interact with tools and environments:

  1. Loop of goals and tasks. At the top layer, the agent must be able to receive goals and track their completion. This loop works in a REPL (read-eval-print loop) style, either in conversation with the user or in response to an event trigger, initiating the reasoning and planning cascade each time a new goal is introduced. Upon receipt of the goal, the agent's reasoning engine interprets the context and generates a high-level plan. This engine is typically powered by a call to a large language model (LLM) such as GPT-4o, Gemini, or Mistral Large, which provides contextual understanding and strategic thinking.
  2. Scheduling engine. Once the high-level plan is created, the scheduling engine breaks it down into smaller, ordered subtasks. This engine organizes the steps necessary to complete the task, often based on approaches such as ReAct (Reasoning + Acting) or Chain of Thought (CoT). The module is usually driven by dynamic prompt engineering or by high-level scripts that structure the agent's behavior. Some frameworks support agent-to-agent communication to transfer subtasks efficiently between agents.
  3. Runtime. An execution mechanism orchestrates the entire process, ensuring the smooth transition between perception, reasoning, action, and feedback.
  4. Interface with tools. To interact with the external world, whether through APIs, code execution, or information retrieval, the agent uses a tool interface. This interface lets it reach external resources, such as APIs, knowledge bases, or web browsers, as needed. Tools are called opportunistically, often triggered by explicit queries generated by the model itself: JSON function calls, code run in Python sandboxes, or external services such as web search. These interfaces turn the agent from a passive reasoning engine into an active system capable of producing effects in the real world, a relationship now amplified by approaches such as Anthropic's Model Context Protocol (MCP).
  5. Memory. To ensure continuity between turns or sessions, the agent needs memory systems (sometimes absent or rudimentary) that persist state and recall previous interactions. Working memory allows agents to maintain the context of an active session: it stores recent conversation information and the current state of the system, allowing for contextual understanding in real time. Long-term memory (persistent memory) retains information across multiple sessions, allowing for continuous personalization and adaptive learning. It is subdivided into episodic memory, which stores the specific events experienced by the agent with their spatial and temporal context; semantic memory, which contains factual and general knowledge about the world; and procedural memory, which encodes automated skills and know-how.
  6. Communication between agents. In a collaborative system, collaboration between specialized agents is essential to accomplish complex objectives effectively. This communication is based on standardized or adaptive protocols that allow agents to share information, delegate tasks, or synchronize their internal states. The structure of the messages exchanged can follow formal formats such as JSON (for encoded instructions and return results) or semi-structured formats closer to natural language, interpreted by models. Exchanges can be synchronous (immediate request-response between agents) or asynchronous (delayed triggering of actions, often managed by a message queue).
  7. Safeguards and observability. Finally, a robust agent includes guardrails to prevent abuse and hallucinations, including control planes, secured tool configurations, and access control mechanisms (ACLs). These safeguards are essential for both ethical operation and oversight of the system.
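Taken together, the building blocks above form a control loop. The following is an illustrative Python skeleton of that loop, not any particular framework's API; `call_llm`, the tool registry, and the stopping criterion are all assumptions made for the sketch.

```python
# Minimal sketch of the agent anatomy described above: a goal loop that
# asks a reasoning engine for the next step, dispatches tools through a
# tool interface, and keeps working memory. `call_llm` is a stub standing
# in for a real LLM call.
def call_llm(goal, memory):
    # Stub: a real agent would send the goal and memory to an LLM and
    # parse its reply. Here we finish after one tool call.
    if not memory:
        return {"action": "tool", "tool": "search", "input": goal}
    return {"action": "finish", "answer": f"Done: {goal}"}

TOOLS = {"search": lambda q: f"results for '{q}'"}  # tool interface

def run_agent(goal, max_turns=5):
    memory = []                      # working memory for this session
    for _ in range(max_turns):       # goal/task loop with a hard stop
        step = call_llm(goal, memory)
        if step["action"] == "finish":
            return step["answer"]
        observation = TOOLS[step["tool"]](step["input"])
        memory.append({"step": step, "observation": observation})
    return "gave up"                 # guardrail: bounded iterations

print(run_agent("find the latest ADK release"))
```

The `max_turns` bound is the simplest possible safeguard against the action/reflection loops mentioned earlier; real frameworks add richer termination and escalation policies.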


Anatomy of an AI Agent

An AI agent is therefore an autonomous program focused on a single task, operating within a small perimeter and without advanced orchestration or meaningful multi-agent collaboration. In contrast, an agentic AI system is a structured set of AI agents or specialized modules, capable of collaborating, self-organizing, and adapting their strategies according to the environment or intermediate outcomes. It is no longer a single agent but a distributed, dynamic system, often composed of several agents that coordinate (via messages, shared plans, or hierarchy), with mechanisms for global planning, shared memory, or feedback loops. Such systems can exhibit complex, adaptive behaviors closer to what is considered "agentic."

It is important to remember that an AI agent is an autonomous building block dedicated to a predetermined task, while an agentic AI system is an orchestrated architecture of several agents or functions, aiming for a form of collective intelligence that is more robust, strategic, and resilient, but not deterministic.

| Criterion | AI Agent | Agentic System |
| --- | --- | --- |
| Unit | One agent | System composed of several agents |
| Goal | Targeted task | Broader or dynamic goals |
| Behavior | Sequential, prompt-driven | Strategic, adaptive, decomposed |
| Coordination | Low or absent | Strong (between agents) |
| Memory / contextualization | Often limited | Persistent memory, shared contexts |
| Adaptability | Low, depends on engineering | Higher, with the possibility of self-adjustment |
| Examples of ADKs | LangChain, AgentKit (Inngest), Griptape | Google ADK, OCI ADK, LangGraph, AutoGen, Semantic Kernel, Watsonx Orchestrate (IBM), CrewAI, MetaGPT, GPTSwarm, FastAgency, Agent Assist / Now Assist (ServiceNow), AgentForce (Salesforce), Joule (SAP)… |

Summary of the differences between AI agents and agentic AI


Tool interface: the MCP protocol

Regarding the tool interface, a recent and influential development in this space is MCP (Model Context Protocol), introduced by Anthropic at the end of 2024. Rather than a framework or SDK, MCP is a protocol that defines how language models, like Claude, can interact with structured tools, memory, and user context in a standardized way. It allows developers to compose workflows where the model can call tools, access repositories, or interpret structured state transitions, all through a clean, declarative interface.

MCP enables agentic behavior without requiring a separate agent runtime, making the model itself part of a contextual, tool-using system. This design marks the shift from hard-coded orchestration at the individual agent level to native autonomy of the model, where the boundaries between scheduler, executor, and reasoner blur in favor of more fluid, protocol-based interactions.
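Concretely, MCP is built on JSON-RPC 2.0: clients discover a server's tools with a `tools/list` request and invoke one with `tools/call`. The sketch below shows the shape of those messages; the `get_weather` tool and its arguments are hypothetical examples, and field details should be checked against the MCP specification.

```python
# Shape of the JSON-RPC 2.0 messages MCP uses for tool discovery and
# invocation. The "get_weather" tool and its arguments are hypothetical.
import json

list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",          # ask the server which tools exist
}

call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",          # invoke one tool by name
    "params": {
        "name": "get_weather",       # hypothetical tool
        "arguments": {"city": "Paris"},
    },
}

print(json.dumps(call_request, indent=2))
```

Because the protocol, not the host application, defines these shapes, any MCP-aware model or client can drive any MCP server's tools without bespoke glue code.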


How does Watsonx Orchestrate implement these layers?

IBM's Watsonx Orchestrate offers a distinct vision of AI agent orchestration, structuring it around five pillars: philosophy, planning, tooling, memory, and deployment.

Philosophy

  • Centrality of multi-agent orchestration: Watsonx Orchestrate focuses on creating a collaborative ecosystem where specialized agents, often from different business areas (HR, sales, purchasing, support, etc.), interact to execute complex workflows.
  • Conversational interface: via a natural language interface, the platform aims to transform each employee into a "super-user", capable of delegating actions to AI without the need for technical skills.
  • Advanced customization: agents are adaptable to the specific needs of each business, by being configurable both by specialists (scripts, APIs, advanced configurations) and by business lines (visual tools, no-code).

Planning

  • Built-in scheduler: Watsonx Orchestrate has internal logic that can break down projects into subtasks, then assign the right sequence of agents and tools at each stage.
  • Varied agent styles: the platform introduces agent styles (Default, ReAct), each governing a different behavior. The "ReAct" style allows the model to think, act, observe, and refine its approach until the task is completed.
  • Proactive orchestration: workflows take into account business rules, user context, agents' declared skills, and any external conditions.

Tooling

  • Extensive "skills" catalog: more than 1,600 native or custom integrations with market tools (Outlook, Salesforce, SAP, Jira, etc.), extensible via API, scripts, or RPA.
  • No-code & pro-code construction: the Agent Development Kit (ADK) allows you to define the structure, logic, tools, style, and collaboration of agents via YAML, JSON, or Python; it is also possible to integrate third-party or external agents (such as those built on LangGraph or CrewAI).
  • Centralized management: tools are shared between agents and can be assigned according to the business, task, or context, facilitating traceability and reuse.
  • Discoverability: Watsonx Orchestrate has a search engine in the catalog of agents and tools (as of July 2025, there are more than one hundred agents and more than four hundred tools).
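As a rough illustration of the YAML route mentioned above, an ADK native-agent definition looks broadly like the following sketch. The field names and model identifier are drawn from public examples and may differ between ADK versions, and the agent name, tools, and collaborator are hypothetical; treat this as a shape, not a reference.

```yaml
# Illustrative native-agent definition for the watsonx Orchestrate ADK.
# Field names and the llm identifier are indicative and may vary by version.
spec_version: v1
kind: native
name: hr_assistant
description: Answers HR policy questions and files leave requests.
llm: watsonx/meta-llama/llama-3-405b-instruct
style: react            # agent style: default | react
instructions: >
  Answer questions about HR policy. Use the leave_request tool when the
  user asks to book time off; otherwise answer from the policy documents.
collaborators:
  - payroll_agent       # delegate payroll questions to another agent
tools:
  - leave_request
  - search_policy_docs
```

The `collaborators` and `tools` lists are what wire a single agent into the multi-agent orchestration described earlier: delegation targets and callable skills are declared, not hard-coded.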

Memory

  • Contextual memory and user session: Watsonx Orchestrate enriches each exchange with context variables (e.g., employee preferences) to personalize interactions and maintain consistency over time.
  • Chat with documents capability: Ability to upload and query documents directly in the conversation (RAG) while maintaining context, improving agent autonomy and relevance.
  • Memory structuring: Memory logic can be customized in agent configuration to remember status, goals, or execution reports.
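The two memory tiers described above, session-scoped working memory and cross-session persistent memory, can be sketched generically as follows. This is an illustrative pattern, not watsonx Orchestrate's API; the class and field names are invented for the example.

```python
# Illustrative sketch of the two memory tiers: working memory scoped to a
# session, and persistent memory keyed by user across sessions.
class AgentMemory:
    def __init__(self):
        self.persistent = {}          # survives across sessions, per user

    def start_session(self, user_id):
        return {
            "user_id": user_id,
            "turns": [],              # working memory: recent exchanges
            "context": dict(self.persistent.get(user_id, {})),
        }

    def end_session(self, session):
        # Promote selected facts (e.g. preferences) to persistent memory.
        self.persistent.setdefault(session["user_id"], {}).update(
            session["context"]
        )

memory = AgentMemory()
s = memory.start_session("u42")
s["turns"].append(("user", "I prefer replies in French"))
s["context"]["language"] = "fr"       # fact worth remembering
memory.end_session(s)

s2 = memory.start_session("u42")      # new session, old preference kept
print(s2["context"]["language"])      # fr
```

Deciding which facts get promoted from working to persistent memory is exactly the "memory structuring" configuration point mentioned above.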

Deployment

  • Deployment flexibility: Offered in IBM cloud SaaS mode, on-premises (IBM Cloud Pak for Data, Software Hub from version 5.1.x) or via a local developer edition, Watsonx Orchestrate offers options for any context, from prototyping to large-scale production.
  • Interoperability: native, external (via Agent Connect Framework, A2A, APIs or third-party providers) and assistants can be orchestrated in the same interface. This makes it possible to compose workflows with agents from different technological ecosystems.
  • Observability and governance: Integrated tools to monitor, manage usage, automate supervision, and track agent performance across the organization.

Inside a Scheduling Engine

At the heart of any intelligent agent is a planning engine, the layer responsible for turning abstract goals into structured, achievable steps. While different frameworks take different approaches, they all share a common set of needs: break down complex problems, schedule tasks, adapt to execution conditions, and learn as execution proceeds.

Decomposition of the problem

The first step in planning is to break down a high-level goal into manageable subtasks. This process typically relies on a language model that interprets the agent or crew specifications and generates a logically ordered plan.
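A minimal sketch of that decomposition step, assuming the model is prompted to reply with a JSON list of subtasks. Here `call_llm` is a stub returning a canned reply; a real system would call an actual model with the prompt shown and validate the response.

```python
# Sketch of problem decomposition: ask the model for an ordered JSON plan
# and parse it. `call_llm` is a stub standing in for a real LLM call.
import json

PLAN_PROMPT = (
    "Break the goal below into 3-5 ordered subtasks. "
    "Reply with a JSON list of strings only.\nGoal: {goal}"
)

def call_llm(prompt):
    # Stub: canned reply a planner model might produce.
    return '["gather requirements", "draft report", "review and send"]'

def decompose(goal):
    reply = call_llm(PLAN_PROMPT.format(goal=goal))
    subtasks = json.loads(reply)      # a real system would validate this
    return list(enumerate(subtasks, start=1))

for i, task in decompose("produce the weekly status report"):
    print(i, task)
```

Parsing into an explicit, ordered structure (rather than keeping the plan as free text) is what lets the next layer assign and schedule each subtask.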

Assignment and Planning

Once the subtasks are defined, the system must decide how and when to perform them. Some ADKs address this issue by encapsulating subtasks as independent child agents within a parent workflow agent. This parent can schedule execution in multiple modes: sequentially for ordered dependencies, in parallel for speed, or iteratively for return-based refinement. This flexibility allows developers to fine-tune the control flow while maintaining the clarity and reusability of agent definitions.
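The parent/child pattern above can be sketched as follows. The class names are hypothetical; the pattern echoes the workflow agents found in kits such as Google's ADK, but this is not any framework's actual API.

```python
# Sketch of a parent workflow agent scheduling child agents in the two
# modes described above: sequentially (ordered dependencies) or in
# parallel (independent subtasks).
from concurrent.futures import ThreadPoolExecutor

class ChildAgent:
    def __init__(self, name):
        self.name = name

    def run(self, task):
        return f"{self.name} handled {task}"

class WorkflowAgent:
    def __init__(self, children, mode="sequential"):
        self.children = children
        self.mode = mode

    def run(self, task):
        if self.mode == "parallel":   # independent subtasks: run at once
            with ThreadPoolExecutor() as pool:
                return list(pool.map(lambda c: c.run(task), self.children))
        results = []                  # ordered dependencies: run in turn
        for child in self.children:
            results.append(child.run(task))
            task = results[-1]        # feed each result to the next child
        return results

crew = WorkflowAgent([ChildAgent("research"), ChildAgent("write")])
print(crew.run("Q3 summary"))
```

The key design point is that the schedule (sequential, parallel, iterative) lives in the parent, while each child agent stays a small, reusable definition.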

Dynamic rescheduling

Real-world scenarios rarely play out exactly as planned, so agents must be able to adapt as they run.

State supervision

A well-designed planning system also incorporates execution feedback to fine-tune decisions on the fly. Once each subtask has been executed, its results (tool outputs, errors, or success signals) are fed back into the reasoning engine. This allows the agent to revise its course, retry with adjustments, or escalate if necessary. Frameworks typically capture and record these execution traces, which enables observability, debugging, and performance analysis. Beyond supporting an understanding of what happened (a forensic approach), these logs serve as a foundation for guardrails and auditability in production environments.
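That supervision loop can be sketched as follows: execute a subtask, feed failures back into the next attempt, and escalate after repeated errors. The function names and trace format are invented for the illustration.

```python
# Sketch of the supervision loop described above. The trace list doubles
# as an execution log for observability and audit.
def supervise(subtask, execute, max_attempts=3):
    trace = []                            # execution trace for audit/debug
    feedback = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = execute(subtask, feedback)
            trace.append(("ok", attempt, result))
            return result, trace
        except Exception as exc:          # tool error or failed check
            feedback = str(exc)           # fed back into the next attempt
            trace.append(("error", attempt, feedback))
    trace.append(("escalate", max_attempts, subtask))
    return None, trace                    # hand off to a human or parent

# Toy executor that succeeds once it has seen feedback from a failure.
def flaky(subtask, feedback):
    if feedback is None:
        raise RuntimeError("rate limited")
    return f"{subtask}: done after '{feedback}'"

result, trace = supervise("fetch prices", flaky)
print(result)
print([t[0] for t in trace])          # ['error', 'ok']
```

Keeping every attempt in the trace, successes and failures alike, is what makes the forensic and guardrail uses mentioned above possible.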


What are the good design rules?

Building AI agents that behave reliably in dynamic environments is not only about choosing the right model, but also about designing a robust system architecture. Below is a handy checklist to guide you through the design of multi-agent systems.

1. Clarify the level of autonomy

Start by asking yourself: is a single agent enough, or does the task require a collaborative team? Single-agent architectures tend to be simpler to build, easier to monitor, and simpler to debug. However, they often struggle to plan complex tasks, especially when long-term memory, role specialization, or parallel execution is required. For advanced scenarios such as search agents, product comparison bots, or data pipelines, adopting a multi-agent model from the start can provide better scalability and resiliency.

2. Choose a planning strategy

The way your agent plans its actions is fundamental. Some ADKs favor LLM-generated plans, which offer flexibility and dynamic reasoning (ideal for open, creative, or exploratory tasks where structure can emerge on the fly). Others favor static workflows, which are easier to audit, deterministic by design, and often preferred in environments with demanding compliance or critical security. Still others bring a third model, delegation on the fly: agents can dynamically reassign running tasks based on tool responses or internal trust, which is especially effective when the environment is unpredictable or when agents specialize in narrow areas.

Frameworks are not limited to these paradigms, however. In many cases, it is entirely possible, and sometimes preferable, to use traditional workflow modeling tools, such as Business Process Model and Notation (BPMN) or other rule-based systems, to define high-level logic. These visual (e.g., n8n) or declarative workflows can then be executed or translated into agent actions. This hybrid approach bridges classic business automation with modern LLM-based reasoning, providing both auditability and flexibility. By adapting your scheduling strategy to your application's constraints from the beginning, whether it is language-native planning, strict workflow logic, or a mix of both, you will reduce friction and architectural rewrites down the road.

3. Select tooling and memory backends in advance

Agents do not think in a vacuum: they rely heavily on how they access tools and remember state. Choosing the appropriate backends for memory and tool invocation is a crucial early decision. These components influence everything from function-call syntax to the context visible during execution. Swapping them mid-development can invalidate benchmarks, disrupt model behavior, or even break existing workflows. Make these choices deliberately and document assumptions about memory access patterns and tools.

4. Instrument guardrails from the start

Security and observability are not optional features but essential architectural elements. These mechanisms make it possible to intercept hallucinations and failures and to enforce expected behavior. Unit testing of agents, output qualification, and sandboxed tool execution are all part of a necessary guardrail strategy.
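As one concrete example of output qualification, an agent can refuse to act on model output that does not parse or does not match the expected shape. This is a minimal illustrative check, with an invented tool allow-list; production systems layer schema validation, ACLs, and sandboxing on top.

```python
# Minimal output-qualification guardrail: validate a model-proposed tool
# call before executing it. ALLOWED_TOOLS acts as a simple ACL.
import json

ALLOWED_TOOLS = {"search", "summarize"}   # access control for tool calls

def qualify(raw_output):
    try:
        step = json.loads(raw_output)
    except json.JSONDecodeError:
        return None, "output is not valid JSON"
    if step.get("tool") not in ALLOWED_TOOLS:
        return None, f"tool {step.get('tool')!r} is not allowed"
    if not isinstance(step.get("input"), str):
        return None, "missing string 'input'"
    return step, None                     # safe to execute

ok, err = qualify('{"tool": "search", "input": "ADK docs"}')
print(ok, err)
bad, err2 = qualify('{"tool": "delete_all", "input": "x"}')
print(bad, err2)
```

Rejections like these can be fed back to the model as feedback (the retry loop shown earlier) rather than silently dropped, which turns the guardrail into a corrective signal.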

5. Measure the quality of planning, not just the responses

A good agent is not only one that gives the right final answer; it is also one that reasons effectively along the way. This means evaluating multi-step chains of reasoning, success rates on entire task trees, and the system's ability to recover from its mistakes. Cross-task logging, as well as built-in assessment tools, can support this type of analysis. Designing evaluation criteria around planning quality will help you identify fragility before it becomes a production problem.
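Such criteria can be scored directly from execution traces. The sketch below computes per-step success, whole-tree success, and a recovery rate (errors the agent later corrected) from a trace; the trace format is hypothetical and would depend on your framework's logging.

```python
# Sketch of measuring planning quality over a task tree rather than only
# the final answer. Each node records its status and, for errors,
# whether the agent later recovered.
def score_run(nodes):
    # nodes: list of {"status": "ok"|"error", "recovered": bool}
    total = len(nodes)
    ok = sum(n["status"] == "ok" for n in nodes)
    errors = [n for n in nodes if n["status"] == "error"]
    recovered = sum(n.get("recovered", False) for n in errors)
    return {
        "step_success": ok / total,
        "tree_success": ok == total,
        "recovery_rate": recovered / len(errors) if errors else 1.0,
    }

trace = [
    {"status": "ok"},
    {"status": "error", "recovered": True},   # retried successfully
    {"status": "ok"},
]
print(score_run(trace))
```

A run that fails one step but recovers scores lower on step success yet full on recovery, surfacing exactly the fragility-versus-resilience distinction the checklist item calls for.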


Conclusion

Understanding the anatomy of modern AI agents reveals how far we have gone beyond simple prompt-based chatbots to adopt complex, goal-driven systems. At their core, these agents incorporate a powerful reasoning engine (usually a large language model), a scheduling engine to break down and orchestrate tasks, a versatile tool interface for interacting with the external world, persistent memory to maintain context, and essential guardrails to ensure safe and auditable operation. The frameworks exemplify distinct but converging approaches to designing these components into reliable, autonomous agents that can navigate the unpredictable real world.

In the future, this structured yet flexible agent architecture lays the foundation for increasingly sophisticated autonomous systems. We can expect AI agents to move from single-task resolvers to dynamic multi-agent teams, collaborating, delegating, and adapting seamlessly in real-time. Protocols such as MCP and A2A point to a future where models themselves become native orchestrators, blurring the lines between reasoning, planning, and execution, thanks to standardized interfaces that facilitate smooth interactions with tools and memory without runtime overhead.

As AI agents become more capable, modular, and trustworthy, they will open new applications across industries, from automated search and personalized assistants to adaptive enterprise workflows and intelligent infrastructure management. The future of AI agents is one of continuous integration between language understanding, structured planning, and contextual action, a constructive collaboration that promises to transform the way humans and machines collaborate to solve complex challenges at scale.
