IBM MQ Agent - A Technical Deep Dive

By Chris Leonard posted Tue April 28, 2026 10:53 AM

  

Co-Authors - Anthony Beardsmore, Kieran Murphy

How we built the IBM MQ AI Agent, and what we learned along the way

This article delves into IBM MQ’s first foray into the use of generative AI within the product. We explore the architecture of the solution, describe the components within it, discuss some of the lessons learned, and take a peek at potential future plans.

Introduction

The IBM MQ Agent is the first use of large language models (LLMs) in the IBM MQ product. That meant a lot of first-of-a-kind decisions. To what aspect of the product should we apply AI? What will this look like architecturally? How can we do this safely in a product often embedded in business-critical solutions? After some exploration, we concluded that MQ administrators and operators had some of the most well-suited use cases for a first excursion into AI, and the IBM MQ Agent was born. In this article we’re going to explore how we built it, and what lessons we picked up along the way. We’ll also take a peek at what we might do next.

What does the IBM MQ Agent do today?

One operational use case stood out as ripe for optimisation - diagnosing the cause of queue build up. This is almost certainly the most common issue an MQ administrator is called upon to investigate, and also one of the more time consuming.

There could be many reasons for an increasing number of messages on a queue, such as applications not consuming messages, network connectivity problems, incorrect configuration, poison messages and more. Reaching a conclusion often means following many threads of investigation, which takes time and requires a deep specialist who knows how to navigate the messaging landscape.

This turns out to be a perfect use case for an AI Agent. It can make a plan based on the available evidence, make a series of queries to help us quickly narrow down where the issues are, what’s causing them, and even recommend how to resolve them. All without the operator having to remember the details of the suite of commands required.

To be able to do the above diagnostics use case and respond intelligently, the agent inherently has to have a good knowledge of IBM MQ concepts. This means that the agent can also answer questions about IBM MQ based on its knowledge of the documentation. We anticipate this being particularly useful for users who are new to IBM MQ, enabling them to get up to speed much more quickly, but it is likely that experienced practitioners will also find it a more intuitive way of interacting with the documentation too.

This use case has the added advantage of being non-disruptive (read-only) so an administrator can roll this out safe in the knowledge that it cannot make any unintended changes to an existing IBM MQ landscape.

The details of this use case, and what the outcome looks like, are described in more depth in the blog “IBM MQ Agents: Detect the Signal. Diagnose the Cause” and further detailed in the documentation. There's also a video demo, linked at the end of this post. What we’re interested in here is how we built it, and what the key learning points were.

IBM MQ Agent architecture

At a high level, the MQ user/administrator simply makes requests in a chat window. The chat window passes the requests to the IBM MQ Agent, which uses an LLM to decide what the request means and what information it needs to gather and interpret before responding to the user. Mostly the agent gathers information from queue managers, but it also has access to a knowledge store of documentation, and of course the knowledge trained into the LLM itself. As we will see, these multiple sources of information can conflict with one another in some intriguing ways.

[Figure: IBM MQ Agent architecture]

A few components make up the IBM MQ Agent architecture: the agent chat extension, the agent itself, a knowledge store, the LLM, a Model Context Protocol (MCP) server, and of course the queue managers themselves. Let’s look at each of these in turn.

Agent Chat Extension

This is a chat window that can be configured in the MQ Web Console from IBM MQ 9.4.5 onwards. It forwards the natural language requests from a user to the Agent over an HTTPS-based API.

One IBM MQ Agent can support multiple chat assistant sessions, so multiple users can have simultaneous, independent conversations with the same agent.

Agent

The Agent is a Python application based on the LangGraph framework. It receives a user’s natural language requests from the Agent Chat Extension and orchestrates calls to LLMs, knowledge bases, and queue managers, then responds to the user. This is the heart of the solution and gets a more in-depth explanation later in the article.

At the time of writing, the Agent and the associated MCP Server are containers that live together within a single Kubernetes pod. This design won out for reasons of minimised latency and simplified security. The alternative of having them in separate pods would offer independent scaling and re-use of the MCP server, which may become more important in the future. They are installed using supplied Helm charts and are currently only supported on the OpenShift Container Platform.

MCP Server

The MCP Server provides connectivity to all the queue managers, so the agent doesn’t need to worry about initiating and maintaining connections, nor how to perform the variety of different commands needed to gain status from them. The agent can only communicate with queue managers via the MCP server, which is “read-only” – in other words it can’t perform any operations on the queue managers that would change their state or configuration.

MCP provides descriptions of the capabilities it hosts. In this case, it describes the types of information it can get from the queue managers. These descriptions enable the LLM to prepare a plan of action to resolve your requests.

The MCP server is written in Python and uses the FastMCP framework to simplify MCP exposure over HTTP. For those in the know, we currently use the Streamable HTTP transport of MCP.
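To make the idea of described, read-only capabilities concrete, here is a minimal sketch in plain Python. It is not the product's MCP server and does not use FastMCP. The `ReadOnlyToolRegistry` class, the naming rule, and the `inquire_queue_status` tool are all hypothetical, and the handler returns canned data instead of issuing a real PCF inquiry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    description: str              # the text an LLM would read when planning
    handler: Callable[..., dict]


class ReadOnlyToolRegistry:
    """Holds inquiry tools only; anything else is refused at registration."""

    # Hypothetical naming rule: only inquiry-style tool names are accepted.
    ALLOWED_PREFIXES = ("inquire_", "display_")

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str):
        if not name.startswith(self.ALLOWED_PREFIXES):
            raise ValueError(f"{name!r} is not an inquiry tool")

        def decorator(func: Callable[..., dict]) -> Callable[..., dict]:
            self._tools[name] = Tool(name, description, func)
            return func

        return decorator

    def describe(self) -> List[dict]:
        """Return the name and description of every tool, for the planner."""
        return [
            {"name": t.name, "description": t.description}
            for t in self._tools.values()
        ]

    def call(self, name: str, **kwargs) -> dict:
        if name not in self._tools:
            raise KeyError(f"unknown tool {name!r}")
        return self._tools[name].handler(**kwargs)


registry = ReadOnlyToolRegistry()


@registry.register(
    "inquire_queue_status",
    "Return the current depth of a queue on a named queue manager.",
)
def inquire_queue_status(queue_manager: str, queue: str) -> dict:
    # Canned data standing in for a real inquiry against a queue manager.
    return {"queue_manager": queue_manager, "queue": queue, "current_depth": 42}
```

Calling `registry.describe()` gives the planner a list of names and descriptions to reason over, while an attempt to register something like `delete_queue` fails before it can ever be offered to the LLM.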

To talk to the queue managers, it uses the recently refreshed IBM MQ Python client over a TLS secured connection to send PCF commands over the Message Queue Interface (MQI). This is a well-established administrative interface to IBM MQ, and so benefits from all of the audit and security capabilities that MQ administrators are already familiar with.

Vector database

The agent, and indeed the LLM, need access to a trusted and curated set of reference data about IBM MQ. As such we have a dedicated vector database, populated with a specially prepared cocktail of IBM MQ documentation. This enables the agent to respond to questions about concepts in IBM MQ and helps with the wording of responses generally. It also gives the agent trusted knowledge about how IBM MQ works to help with its reasoning and planning of more complex queries.

The vector database is delivered as part of the agent container, helping to reduce latency of requests, simplifying the security model, and providing a simple way to deliver updates to the reference data.

LLM

The MQ Agent uses large language models (LLMs) to interpret your questions, make plans on how to gather the information necessary to answer them, evaluate the resulting information, and formulate its responses back to you. It uses different types of models for different purposes. At the time of writing, it uses three models: IBM Granite and GPT open-source models for differing types of general reasoning, and IBM Slate (embedding model) for retrieval from its internal knowledge base. The latest information on models used is listed in the documentation.

Today IBM MQ Agent accesses these LLMs from a separately obtained IBM watsonx.ai runtime in an IBM Cloud account. We are exploring other model deployment options including the use of on-premise models, which are likely to be popular, especially with the current focus on data sovereignty.

We expect we will remain opinionated on the models used for different purposes. We have tested many and the current choices give us the best value for tokens, best performance, and best quality responses.

Queue managers

The MCP server is given an explicit list of queue managers to which it can connect. It should be noted that this is a separate list to the ones accessible to the MQ Web Console, and this can be handy from a governance perspective. The number of queue managers per agent is currently limited to 20, but you can of course have multiple agents, each with a specific set of queue managers, to suit different roles in the organisation.

The queue managers can be on a variety of platforms and form factors of IBM MQ, and can even have different security mechanisms specified if necessary. The MCP server reads the queue manager connection information from a client channel definition table (CCDT) stored in a Kubernetes secret.
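As a rough illustration of how an explicit list of queue managers might be derived from a CCDT, the sketch below parses a JSON-format CCDT and enforces the limit of 20 mentioned above. The article does not say whether the product uses the JSON or the binary CCDT format, so JSON is an assumption here, as are the function name, the host names, and the channel names. In a pod the secret would typically be mounted as a file; the sample is inlined to keep the example self-contained.

```python
import json

MAX_QUEUE_MANAGERS = 20  # per-agent limit stated in the article


def queue_managers_from_ccdt(ccdt_text: str) -> dict:
    """Map each queue manager name to its list of (host, port) endpoints."""
    ccdt = json.loads(ccdt_text)
    result: dict = {}
    for channel in ccdt.get("channel", []):
        if channel.get("type") != "clientConnection":
            continue  # ignore anything that is not a client connection channel
        client = channel.get("clientConnection", {})
        name = client.get("queueManager", "")
        endpoints = [(c["host"], c["port"]) for c in client.get("connection", [])]
        result.setdefault(name, []).extend(endpoints)
    if len(result) > MAX_QUEUE_MANAGERS:
        raise ValueError(
            f"{len(result)} queue managers listed, limit is {MAX_QUEUE_MANAGERS}"
        )
    return result


# Hypothetical sample CCDT with two queue managers.
SAMPLE_CCDT = """
{
  "channel": [
    {
      "name": "AGENT.SVRCONN",
      "type": "clientConnection",
      "clientConnection": {
        "connection": [{"host": "qm1.example.com", "port": 1414}],
        "queueManager": "QM1"
      }
    },
    {
      "name": "AGENT.SVRCONN",
      "type": "clientConnection",
      "clientConnection": {
        "connection": [{"host": "qm2.example.com", "port": 1415}],
        "queueManager": "QM2"
      }
    }
  ]
}
"""

print(queue_managers_from_ccdt(SAMPLE_CCDT))
```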

The agent can connect to queue managers at any version level of IBM MQ but will of course only be able to use the features available at that level. A good example of this is the introduction of access to queue manager logs via the PCF interface in v9.4.5. This was in part introduced for the MQ Agent, to enable it to draw information from the queue manager log without needing direct access to the underlying file system.

Deep dive into the agent application

The agent itself has a modular design, with one overarching “supervisor” agent, which then orchestrates requests across a number of more targeted “specialists”. These in turn know how to make use of the MCP Server for particular types of enquiry.

The agent is built on the LangGraph/LangChain framework because it enables a combination of LLM based reasoning and deterministic behaviour. A common theme in Agentic AI design is that you only want the LLM reasoning on things that you couldn’t have done deterministically. The most efficient and effective agents have specialist knowledge burnt into them where the path is obvious and only reach out to LLMs for broader help on reasoning. We have therefore put significant investment into codifying subject matter expert (SME) knowledge of diagnostic workflows into the solution. In this first use case, this includes complex decision paths around how queue build up issues are commonly diagnosed. This both saves money on tokens (frontier models can be expensive) and improves the quality and speed of responses.
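The split between deterministic rules and LLM reasoning can be sketched as a function that answers when a rule is decisive and otherwise defers. The rules below are illustrative only and are not the product's actual decision paths. The `QueueSnapshot` fields are hypothetical names for values an administrator would recognise from queue status (current depth, the number of handles open for input, and whether gets are inhibited).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueSnapshot:
    name: str
    current_depth: int
    open_input_count: int   # handles currently open for input (consumers)
    get_inhibited: bool     # True when gets are disabled on the queue


def triage(snapshot: QueueSnapshot) -> Optional[str]:
    """Return a finding when a rule is decisive, or None to defer to the LLM."""
    if snapshot.current_depth == 0:
        return "no build-up: the queue is empty"
    if snapshot.get_inhibited:
        return "gets are inhibited on the queue"
    if snapshot.open_input_count == 0:
        return "no application has the queue open for input"
    return None  # nothing obvious, so broader reasoning is worth the tokens


print(triage(QueueSnapshot("ORDERS", current_depth=5000, open_input_count=0, get_inhibited=False)))
```

Only the `None` case would cost any tokens, which is the point of codifying the obvious paths first.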

Furthermore, there are watsonx.ai plugins for LangGraph/LangChain, enabling rapid integration for the initial LLM usage whilst retaining the option to support more sources of LLMs in the future.

Supervisor

The supervisor part of the agent is responsible for the overall orchestration of the request. There are essentially three phases:

  1. Analyse the question: Understand intent, ensure the question is appropriate for the agent and it is unambiguous.
  2. Create a plan of action: Using the curated version of the question, along with details of the available specialists, work out a set of steps to achieve a response.
  3. Execute the plan: Run each step and re-evaluate, potentially adding steps to the plan as you go along. When you have enough information, compose a suitable response.
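The three phases above can be sketched as a small loop. This is a toy under stated assumptions, not the LangGraph implementation: keyword checks stand in for the LLM calls, and the specialists are hypothetical stubs that return a finding plus any follow-up specialists they want consulted.

```python
from typing import Callable, Dict, List, Tuple

# A specialist takes the curated question and returns (finding, follow_ups).
Specialist = Callable[[str], Tuple[str, List[str]]]


def analyse(question: str) -> str:
    """Phase 1: tidy the question and decline anything not about IBM MQ."""
    curated = " ".join(question.split())
    lowered = curated.lower()
    if "queue" not in lowered and "mq" not in lowered:
        raise ValueError("question is not about IBM MQ")
    return curated


def make_plan(question: str, specialists: Dict[str, Specialist]) -> List[str]:
    """Phase 2: choose the specialists whose domain the question mentions."""
    plan = [name for name in specialists if name in question.lower()]
    return plan or ["documentation"]


def run(question: str, specialists: Dict[str, Specialist], max_steps: int = 10) -> str:
    """Phase 3: execute steps, re-evaluating and extending the plan as we go."""
    curated = analyse(question)
    steps = make_plan(curated, specialists)
    findings: List[str] = []
    done = set()
    while steps and len(done) < max_steps:
        step = steps.pop(0)
        if step in done:
            continue
        done.add(step)
        finding, follow_ups = specialists[step](curated)
        findings.append(f"{step}: {finding}")
        # Re-evaluate: add follow-up steps that exist and have not run yet.
        steps.extend(s for s in follow_ups if s in specialists and s not in done)
    return " | ".join(findings)


# Hypothetical specialists with canned findings.
SPECIALISTS: Dict[str, Specialist] = {
    "queues": lambda q: ("depth is rising on Q1", ["channels"]),
    "channels": lambda q: ("sender channel is retrying", []),
    "documentation": lambda q: ("see the product documentation", []),
}

print(run("Why are my queues filling up?", SPECIALISTS))
```

The `max_steps` guard is there because a plan that can grow while it runs needs an upper bound on how far it is allowed to grow.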

The first step has a tough job as the incoming request could be literally anything. What type of a question is it? Is it a conceptual question about how IBM MQ works, or is it a question about the available MQ environment? Is it even a question about MQ? If it’s not, the agent will politely decline to answer.

Ambiguity

We then need to establish whether the question is clear or whether there are ambiguities. For example, “which is the biggest queue” could be referring to queue depth, or total size in memory/disk. The question “is QueueX a shared queue?” could be asking whether the queue is available as part of a queue sharing group, whether it was opened for shared input, or whether it is configured for sharing (SHARE attribute).

As well as disambiguating the question itself, our responses also include reasoning such that the user can gain confidence that the agent actually answered the question they asked. For example, “Yes, Q1 on QM1 is a shared queue. Its DEFSOPT attribute is set to shared, and the SHARE attribute is yes…”

Another ambiguity issue we found was that sometimes the LLM would try to make assumptions based on common patterns that were part of its base training knowledge. A good example here would be the fact that often, transmission queue names contain the name of the target queue manager. The LLM would be happy to assume that something is a transmission queue based on its name, but to be sure, you should always check by looking deeper at its attributes. The agent code performs a number of checks and balances to catch issues such as these.
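A check of this kind can be very small. The sketch below decides from the queue's attributes and ignores its name entirely. The dictionary shape is a hypothetical stand-in for whatever the inquiry returns, with keys that mirror the MQSC attribute names, where a transmission queue is a local queue defined with USAGE(XMITQ).

```python
def is_transmission_queue(attributes: dict) -> bool:
    """Decide from the queue's attributes, never from its name."""
    return attributes.get("TYPE") == "QLOCAL" and attributes.get("USAGE") == "XMITQ"


# A queue named after a queue manager that is nonetheless an ordinary local queue.
looks_like_xmitq = {"QUEUE": "QM2", "TYPE": "QLOCAL", "USAGE": "NORMAL"}

# A queue with an unremarkable name that really is a transmission queue.
really_xmitq = {"QUEUE": "OUTBOUND.A", "TYPE": "QLOCAL", "USAGE": "XMITQ"}

print(is_transmission_queue(looks_like_xmitq), is_transmission_queue(really_xmitq))
```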

Amusingly there is some ambiguity concerning what exactly constitutes an ambiguous question, but let’s not go there.

Explainability and debugging

An essential part of our testing was ensuring not only that we came up with correct responses, but we got there through the right reasoning. We have made good use of Langfuse to trace the path of orchestrations and routing decisions. We have also implemented custom logging for areas where Langfuse is unable to go and have a state store in LangGraph that allows the user to gain explainability insights into previous responses in order to gain confidence that the reasoning is working correctly.

A whole new way of testing: Evals

We in IBM MQ engineering are very conscious of the level of trust that our customers place in our product and the business critical solutions that it is a part of. We have built deep expertise into our testing capabilities to ensure we provide the level of trust that our customers expect. How then do we build that same level of confidence in this new agentic space?

The equivalent of traditional testing in the agentic space is of course evaluations, or “evals”: quality checks performed on the responses from the LLM. We have incorporated a number of different techniques, including code-based (pattern matching, static keyword analysis) and model-based (using an LLM as a judge). An example test was to provide the agent with an MQ environment, break that environment in increasingly creative ways, drive the agent to diagnose it, and evaluate the responses.

Even human-based evaluation, which has obvious drawbacks, remains a necessary and essential part of the test suite. We did a particularly revealing test with users who knew MQ conceptually but were not hands-on practitioners and gave them a broken MQ environment to diagnose and fix. We had one group use the new MQ Agent. We then had another control group use regular admin tools, a search engine and gave them access to a subject matter expert (SME).
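A code-based eval of the pattern-matching kind might look like the sketch below. The `EvalCase` structure, the patterns, and the sample responses are invented for illustration and are not taken from the team's test suite. A model-based eval would replace `evaluate` with a call to a judging LLM.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvalCase:
    name: str
    required_patterns: List[str]                 # regexes the response must match
    forbidden_patterns: List[str] = field(default_factory=list)


def evaluate(case: EvalCase, response: str) -> dict:
    """Check one response against the required and forbidden patterns."""
    missing = [
        p for p in case.required_patterns
        if not re.search(p, response, re.IGNORECASE)
    ]
    found = [
        p for p in case.forbidden_patterns
        if re.search(p, response, re.IGNORECASE)
    ]
    return {
        "case": case.name,
        "passed": not missing and not found,
        "missing": missing,
        "forbidden_found": found,
    }


# Hypothetical scenario: the consuming application for Q1 has been stopped.
stopped_consumer = EvalCase(
    name="stopped-consumer",
    required_patterns=[r"\bQ1\b", r"no (consumer|application)"],
    forbidden_patterns=[r"\bDELETE\b"],
)

good = "Messages are building up on Q1 because no application has it open for input."
bad = "Everything looks healthy."

print(evaluate(stopped_consumer, good)["passed"], evaluate(stopped_consumer, bad)["passed"])
```

Checks like this are cheap and repeatable, which makes them a sensible first gate before the more expensive model-based and human evaluations.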

The control group - even with substantial help from the SME - were 33-48% slower than the group who used the MQ Agent. This was an isolated experiment, but the results certainly gave us some confidence in the approach.

Specialists

The supervisor has a number of related internal components that we have called “specialists”: a set of sub-agents that have specific domain knowledge. Breaking capabilities up into specialists with well-defined roles makes it easier for the LLM to reason about ways to solve a problem and come up with an execution plan.

We’ve used the term specialist here because as with many terms in IT, “agent” has become overloaded and definitions vary dramatically. Not all these specialists necessarily warrant the term agent, and we don’t need or want to get caught up in semantics.

The scope and granularity of the specialists is, as you might expect, still in flux, but in short, they can each provide answers centred around particular domains such as the following:

  • Queue manager: configuration and status.
  • Channels: client, server, sender, receiver, status, and related diagnostics.
  • Queues: local, remote, transmission, model queues, status, and related diagnostics.
  • Listeners: listener status and related diagnostics.
  • Clustering: clustering objects, status, and related diagnostics.
  • Applications: application connectivity status and related diagnostics.
  • Message build-up: analysis of queue managers to identify message build-up and recommend remedial actions.
  • Documentation: context enhanced search and next-step planning by using IBM MQ documentation.

The knowledge specialist that provides information from the IBM MQ documentation is a little different to the others in that it is working from a knowledge store rather than querying queue managers. It provides an optimised way to retrieve data from the knowledge store using the Retrieval Augmented Generation (RAG) pattern. The knowledge store contains a curated set of IBM MQ documentation and other sources such as tech notes.

We quickly realised that simply plugging in the vector database for semantic search would not be effective on its own. Furthermore, the LLM itself often “thinks” it knows answers based on its own model training, even though that data may well be out of date due to the training cut-off. So, for searches of reference data where an exact match suits the question better than semantic similarity (e.g. looking up error codes), we have supplemented the vector search with BM25 text search, which gives better, more traceable results.
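For readers who have not met BM25, the sketch below is a small, self-contained Okapi BM25 scorer plus a heuristic that spots queries better served by exact matching. It is not the product's retrieval code. The `is_exact_lookup` rule, the `k1` and `b` values, and the paraphrased sample documents are assumptions chosen for illustration, and the heuristic would also match any other four-digit number starting with 2.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def is_exact_lookup(query: str) -> bool:
    """Heuristic: AMQ message IDs or 2xxx reason codes suit keyword search."""
    return re.search(r"\b(AMQ\d{4}[A-Z]?|2\d{3})\b", query, re.IGNORECASE) is not None


class BM25:
    def __init__(self, documents: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in documents]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        # Number of documents that contain each term at least once.
        self.doc_freq = Counter(term for c in self.docs for term in c)

    def idf(self, term: str) -> float:
        n = self.doc_freq.get(term, 0)
        return math.log((len(self.docs) - n + 0.5) / (n + 0.5) + 1)

    def score(self, query: str, index: int) -> float:
        counts, length = self.docs[index], self.lengths[index]
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def best(self, query: str) -> int:
        """Return the index of the highest-scoring document."""
        return max(range(len(self.docs)), key=lambda i: self.score(query, i))


# Paraphrased sample entries, not verbatim documentation text.
DOCS = [
    "AMQ9208E: Error on receive from host.",
    "2035 MQRC_NOT_AUTHORIZED: the user is not authorized to perform the operation.",
    "2033 MQRC_NO_MSG_AVAILABLE: no message is available on the queue.",
]

index = BM25(DOCS)
query = "what does 2035 mean"
if is_exact_lookup(query):
    print(DOCS[index.best(query)])
```

Because each score is a sum of per-term contributions, it is easy to show which terms caused a document to be returned, which is where the traceability comes from.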

What’s the future for the IBM MQ Agent?

There are plenty of directions we could take the IBM MQ Agent in the future. The three key aspects we’re currently evaluating are:

  • Deployment options: As noted above, we expect to remain opinionated about what models we use, but we expect to provide more flexibility in where those models are deployed. We are very close to enabling on-premises model (inferencing) deployment already. We also expect to expand beyond the current deployment of the agent itself on OpenShift, to other Kubernetes environments.
  • Agent skill set: This is where you come in. We’d love to hear what use cases you’d like us to focus on. What issues do your team spend most of their time handling? What are the biggest barriers when onboarding new members to your team?
  • Composability: We see strong requirements to make these AI capabilities available to other agents. This could be other agents in the IBM ecosystem, or from our partners, or indeed customers creating their own agentic capabilities. There are two levels at which we could do this. We could make the agent available over the A2A protocol such that for example a broader monitoring agent might make requests to the IBM MQ Agent as part of an end to end diagnostic workflow. We could also make the MCP server available in some form directly for other agents to use. Note that way back in July 2025 we provided an open source example of a basic MCP server for IBM MQ that enables an agent to send MQSC to a queue manager. Clearly to be used with caution, but please do feel free to experiment with this and keep us informed.

Conclusion

The IBM MQ Agent targets two key personas. Firstly, it aims to make experienced MQ administrators more productive, by enabling them to simply articulate an end goal such as “Where are messages building up?”. They can then sit back and let the agent work through the various queue managers in the estate, issuing the relevant commands to answer the question.

Secondly, it aims to enable new users of IBM MQ to rapidly familiarise themselves with deeper concepts without having to piece together snippets of information spread across documentation and tech notes. This significantly reduces the time it takes to onboard new IBM MQ administrators or to enable developers who may only have occasional use of IBM MQ.

We’ve already seen from our own research just how effective it can be for these use cases. Perhaps the most refreshing thing has been how quickly we have been able to bring this to market, and how rapidly we can iterate improvements into it. If you get a chance to try it out, we’d love to hear your feedback.

Want to see the IBM MQ Agent in action?

A video demo is available which includes some example uses as well as a walk-through of the key design points discussed in this document.
