
MQ in Containers: How are you managing scaling for Readers? Writers?

  • 1.  MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Fri March 30, 2018 03:34 PM

    It's easy to install the MQ software in a Docker container.  It's just as easy to create a Queue Manager in the container, either within the Dockerfile or at deploy time.  That's fine for a stand-alone Queue Manager; it's a little more difficult for a Queue Manager that will be in a Cluster.  How would your clients connect to the Cluster?  How would the Cluster grow with additional container instances?  How would it shrink as container instances are removed?  This is the difference between a stand-alone Queue Manager and an MQ network.

     

    Regards,

    Glen Brumbaugh



  • 2.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Mon April 02, 2018 06:29 AM

    One thought we have had is to create a "Client Concentrator" MQ cluster out of 2-4 persistent servers and all clients would connect to these servers. Then we could spin up/down containers that join the MQ Cluster that is hosted on these servers to actually do the work (most likely these containers would be IIB rather than pure MQ.)

    However, it is these questions, along with "How do you recover persistent messages from a container?", that prevent us from going down a containerized path for MQ. But man, I would love to go down a Docker path rather than set up another multi-instance QMGR!

     



  • 3.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Mon April 02, 2018 10:33 AM

    Devin,

    It seems that we have been thinking along similar lines.  Here's where my current thinking is:  

    Writers (MQPut only) and/or Request/Reply

    This pattern is for software that either sends datagrams (fire & forget) or that generates the Request message and reads a Reply message in a Request/Reply paradigm.  This use case is typical of front-end application needs.  For these patterns, use a Virtual IP/Load Balancer in front of as many Clustered Queue Managers/containers as required.  Kubernetes can be configured to scale these front-end Queue Managers as needed.  In order to route to the back-end, each Queue Manager must be a member of a Cluster.  All back-end queues must be Cluster queues.  Front-end Reply queues will be local.  Reply messages will be routed back to the originating Queue Manager; dynamic reply queues could also be used.  See Container and Cluster notes below.
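    As a concrete illustration, a Kubernetes Service can play the Virtual IP/Load Balancer role in front of the front-end Queue Manager pods.  This is only a minimal sketch: the service name and labels are hypothetical, and 1414 is simply the conventional MQ listener port.

        apiVersion: v1
        kind: Service
        metadata:
          name: mq-frontend          # hypothetical name
        spec:
          type: LoadBalancer
          selector:
            app: mq-frontend         # must match the labels on the QM pods
          ports:
            - name: mq-listener
              port: 1414             # conventional MQ listener port
              targetPort: 1414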

     

    Readers

    This pattern is for software that either reads datagrams (fire & forget) or that generates the Reply message after reading a Request message in a Request/Reply paradigm.  This use case is typical of back-end application needs.  For this pattern, reading applications need to read from Cluster queues so that the front-end Queue Managers can load balance requests to the back end.  Kubernetes can also be configured to scale these back-end Queue Managers as needed.  In order to read from Cluster queues, all back-end Queue Managers must be members of a Cluster.  All back end datagram and request queues must be Cluster queues.  Requests must be routed to the Reply Queue and Queue Manager.  See Container and Cluster notes below.  

     

    Cluster notes

     Scaling requires that all of the front-end and back-end Queue Managers be members of a Cluster.  All of the back-end datagram and Request queues must be Cluster queues.  This allows new back-end Queue Managers to be added with messages being dynamically routed to them.  All back-end software processing request messages and generating reply messages must send the reply to the "Reply To" Queue and Queue Manager.  These reply queues must be local in order for routing to work correctly.  
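    To make the queue placement concrete, here is a minimal MQSC sketch under assumed names (a cluster called APPCLUS, one request queue, one reply queue; all names are hypothetical):

        * On each back-end Queue Manager: advertise the request queue to the
        * cluster so the front end can load balance across instances.
        DEFINE QLOCAL(APP.REQUEST) CLUSTER(APPCLUS) DEFBIND(NOTFIXED)

        * On each front-end Queue Manager: the reply queue stays local (not
        * clustered) so replies route back to the originating Queue Manager.
        DEFINE QLOCAL(APP.REPLY)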

    All Queue Managers need to join the Cluster exactly once and to maintain a stable IP address regardless of which server they are brought up on.  So, from a Cluster point of view, a Container environment is no different than a server based environment.  

     

    Container notes

     There are two ways to create a Queue Manager in a Docker container.  One way is to build the Queue Manager in the Dockerfile.  This creates an image with a specific Queue Manager.  The second way is to build the Queue Manager when the image is run.  The Queue Manager would then be defined in the startup script.  Note that the samples on developerWorks use this second approach. 

    While this second approach is fine for a stand-alone Queue Manager, that's really a special case and not generally useful.  For a Cluster, you don't want Queue Managers with the same name popping in and out of the cluster with different IP addresses!  The generic Container model works perfectly for stateless types of solutions, but MQ is a network product (OSI levels 4, 5, and 6) and networks must maintain network topology state.  Therefore, defining a Queue Manager in the Dockerfile image is the correct approach.  

    Please note that each Queue Manager image could be built upon a common Linux-with-MQ base image, so the Linux and MQ standard configuration only needs to be done once.  Each Queue Manager container should be configured to have a constant network address.  Look for an upcoming (soon, I hope) blog on this.
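    A sketch of what "baking" a Queue Manager into an image on top of a common base might look like.  The base image name, Queue Manager name, and MQSC file are all hypothetical, and a real image would also need an entrypoint that keeps the Queue Manager running in the foreground:

        # Build on a common Linux-with-MQ base image (hypothetical name)
        FROM mycorp/mq-base:latest
        # MQSC script with this Queue Manager's channels and queues
        COPY qm1-config.mqsc /tmp/
        USER mqm
        # Create the Queue Manager and bake in its configuration
        RUN crtmqm QM1 && \
            strmqm QM1 && \
            runmqsc QM1 < /tmp/qm1-config.mqsc && \
            endmqm -w QM1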

     

    Container State & Persistence notes

    There is some confusion between "state" and "persistence" with containers.  Containers are "stateless" in the sense that they have no knowledge of anything that happened before they were run unless it's baked into the container image (like our Queue Managers).  We need these Queue Managers to have names, network locations, channels (Cluster Sender, Cluster Receiver, SvrConn), and queues.

    When these containers are first run, they are stateless in the sense that they have no connections with the Cluster.  Part of the startup script must be to start the Cluster Sender channel to announce this Queue Manager to the cluster.  Each time the container is started, this startup script will be executed.  The container doesn't remember whether or not this is the first time it has run.  The startup script, however, could be smart enough to detect state, for example based on the existence of a marker file.
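    A minimal sketch of such a startup script, assuming a marker file under /var/mqm and hypothetical Queue Manager and channel names:

        #!/bin/bash
        QMGR=QM1
        MARKER=/var/mqm/.${QMGR}.cluster-joined

        strmqm ${QMGR}

        # Starting the Cluster Sender channel is harmless on every start,
        # but the marker file lets the script detect a true first run.
        echo "START CHANNEL(APPCLUS.QMFR1)" | runmqsc ${QMGR}

        if [ ! -f "${MARKER}" ]; then
            echo "First start of ${QMGR}; announcing it to the cluster."
            touch "${MARKER}"
        fi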

    Containers DO PERSIST DATA.  All file I/O in a container (queues, logs, etc.) is maintained across starts and stops.  There is a common misconception that this is not the case.  Starting or stopping the container will have the same effect as starting or stopping a Queue Manager.  The MQ state is persisted in the container file system.  Deleting a container, however, deletes the file system associated with the Queue Manager, so all is lost.  This is similar to the result of a dltmqm command, except that any container administrator could execute the command.  If necessary, the container file system can be externalized onto shared disk, so that even a delete-container command would not destroy the Queue Manager data.
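    For example (hypothetical image and names), externalizing /var/mqm onto a named Docker volume means that even deleting the container leaves the Queue Manager's queues and logs intact:

        docker volume create qm1data
        docker run -d --name qm1 \
          --volume qm1data:/var/mqm \
          --publish 1414:1414 \
          mycorp/mq-qm1:latest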

     

    HA notes

    I believe that containers are the future of HA architecture and that Multi-Instance Queue Managers will eventually become completely obsolete.  Note that if the Queue Manager is defined in the Dockerfile, Kubernetes provides automatic HA: it will restart the one instance if it stops running.
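    A sketch of that behavior in Kubernetes terms: a Deployment with replicas: 1 will restart or reschedule the single Queue Manager pod if it dies (image name and labels are hypothetical):

        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: qm1
        spec:
          replicas: 1            # exactly one instance; restarted on failure
          selector:
            matchLabels:
              app: qm1
          template:
            metadata:
              labels:
                app: qm1
            spec:
              containers:
                - name: qm1
                  image: mycorp/mq-qm1:latest
                  ports:
                    - containerPort: 1414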

     

    In summation, I believe that there's a brave new world in containers, but we administrators have a lot to learn.  Docker and Kubernetes are beginning to become required skills.  

     

    Regards,

    Glen Brumbaugh



  • 4.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Tue April 03, 2018 02:09 AM

    Glen,

    Many thanks for this thread; the subject is one that I too have been wondering about. Since leaving IBM 5 years ago, I have been working with MQ customers and have worked on a number of projects where some sort of Cluster change has been required (code upgrade with/without Cluster topology/infrastructure change). While I am pleased to say these have been successful, they all involved a manual process (following a script) with a number of intervening "check steps" (lots of DIS CLUSQMGR and DIS QCLUSTER commands) to ensure all was well before proceeding.
    And the reason (as you allude to) is that, by and large, MQ Clusters were permanent things that didn't change very often, and I wanted to be as certain as I could that step "n" had worked before moving on to step "n+1".

    (Plus the fact that a few years ago, while doing a full repository move using MQ V7.5, I hit an error which turned out to be a valid PMR.)
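    For anyone unfamiliar with those check steps, they are typically MQSC displays of this sort, run under runmqsc between each change to confirm the cluster's view of itself:

        DISPLAY CLUSQMGR(*)
        DISPLAY QCLUSTER(*)
        DISPLAY CHSTATUS(*)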

    The moral is that I am not certain how many Cluster Admin actions I would be happy to delegate to an automated script but that might just be me.

    But in the Cloud world, as you point out, things will be a lot more dynamic and anyway some of the Cloud/Docker/Kubernetes functionality may be used to replace/augment "traditional" MQ Clustering.

    I have searched the internet for any presentations that cover this and while I have found some that pose questions and list a set of issues to be addressed, yours is the first set of posts to suggest a possible solution.

    Digging down a bit more, I assume that the Full Repositories are more or less permanent and we are looking at partial repositories being spun up, joining the cluster, leaving the cluster and being taken down.

    Security of the cluster will also be interesting.

    I look forward to following other MQ practitioner thoughts on this fascinating (and I think important) subject.

    Dermot



  • 5.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Tue April 03, 2018 06:41 AM

    Dermot,

    Thank you for your kind words.  As a follow up, the Queue Managers in containers that I described above would only need to join the Cluster on their first spin-up, although starting the Cluster Sender channel each time the container is started would be a great practice. 

    Kubernetes would automatically create an HA behavior for each Queue Manager, so there would be no need for multi-instance.

    However, if a container is stopped, just as with a Queue Manager, any messages still in the Queue Manager would be stranded until the container was re-started.  

    I can envisage a scenario in which Queue Managers could be dynamically added to a Cluster, but that ups the complexity level quite a bit.  You would need:

    • To build the Queue Managers dynamically, at run time, instead of pre-baked into a container.  This would require some mechanism to: (1) Name the Queue Managers and (2) provide them with a stable network address; e.g. some kind of registry.  
    • An intelligent process that would not shut down a Queue Manager while it had messages to process.  I see this as something hard to maintain over time and am not yet sure how I would implement that in Kubernetes.  I'm currently exploring this, but think that this function would have to reside in another layer.  The Kubernetes model is for stateless containers, which doesn't really apply here.  

    Frankly, the idea of popping Queue Managers into and out of a Cluster automatically should give most administrators some concern.  I totally agree with your comments.  I have always treated Clusters gingerly and haven't automated any removal from the Cluster.  I've always done this manually, in part because it's a very rare event.  

    A containerized Cluster with thousands to tens of thousands of phantom Queue Managers defined is truly a terrifying thought.  I think that until the Hursley lab modifies Clusters to be more container friendly and to formally support some kind of transient presence, such a concept should not be entertained.  Or, if it is, I don't want to be the one to support it!

     

    Regards,

    Glen Brumbaugh



  • 6.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Thu April 05, 2018 08:00 AM

    I'm trying to write some more formal guidance on this subject, so watch this space.  Here are a few thoughts for now though:

    1. An MQ cluster will "remember" the queue manager IDs of all members for the last 90 days.  My understanding is that in older MQ versions, this might have caused some problems, but I'm not aware of why this would really be a problem in recent versions (happy to be educated here).  So in the simple case, adding and removing lots of queue managers seems like it would be fine to me (BTW, given the 90 day limit, you'd have to remove 111 queue managers from a cluster every day to reach your feared thought of ten thousand).
    2. From your patterns, you need to be careful with putting a load balancer in front of the "put" clients, if you want to do request/reply messaging.  It's often fine for simple fire-and-forget puts, but if you want a reply, you need to remember that your client could easily get restarted or disconnected in a container environment.  If you've got a load balancer in front (e.g. a Kubernetes service), then you might re-connect to a different queue manager, and not be able to receive your replies.  Also, if you've been restarted, you might not understand the replies, unless you've either a.) persisted everything you need to process the reply; or b.) included everything needed to process the reply in the message itself.  In general, anything that introduces server affinity is something to be careful about with this sort of L4 load balancing.  Watch out for use of XA transactions, "request/reply" patterns, durable subscriptions and use of JMS (which typically creates multiple TCP connections).
    3. I'm not sure I understand how you're saying you'd create the queue manager in a Docker image.  The persistent data (under /var/mqm) is the critical bit here.  If you use a volume (a Docker volume, or a Kubernetes Persistent Volume), then the MQ Docker image will populate that volume with the right directory structure, before creating and starting a queue manager, all when the container first starts.  If you do this at image-creation time, then your data will only be persisted in the copy-on-write Docker filesystem, and your container will not be portable to other hosts.  This is an important benefit of containerization, and not one to miss out on.  So I think it's nearly always better to create and start the queue manager at runtime.  The MQ Docker image changes the queue manager name based on an environment variable.  You can also create new image layers for each queue manager configuration that you want.
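    For illustration, running the sample MQ Docker image with a volume might look like this (the image name/tag and environment variables follow the ibmcom/mq samples of the time and may differ by version):

        docker volume create qm1data
        docker run -d --name qm1 \
          --env LICENSE=accept \
          --env MQ_QMGR_NAME=QM1 \
          --volume qm1data:/var/mqm \
          --publish 1414:1414 \
          ibmcom/mq:9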

    As I say, I'd like to spend more time on producing some clearer, more concise guidance, but in the meantime, I hope this is useful.

    Arthur Barr

    IBM MQ Cloud Architect



  • 7.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Thu April 05, 2018 10:54 AM

    Arthur,

    Thank you for your response.  I look forward to some of your future posts.  A point and then a question. 

     

    Point:

    The issues of both MQ architectural topology and Containers/Orchestration lend themselves to a number of options that can produce different qualities of service.  The MQ architect/administrator needs to know the business requirements to produce the appropriate configuration.  The application designer/programmer needs to understand the configuration to produce the appropriate code.  Multiple configurations can be made to serve, but they place differing requirements on both the infrastructure and the application.  

    This necessarily means that a number of application behavior assumptions are implicit from the infrastructure point of view and vice versa.  I now think that perhaps we should start adding these assumptions to the discussion so that the implications are explicit and everyone is on the same page.  This is really a subject of Redbook length and so not that easy to summarize in a few paragraphs.  

    Question:

    I haven't run into any significant Cluster FR issues in the last few years, but then they would not have arisen in my particular work assignments.  I can't remember when Clusters were first introduced (v5, maybe), but most experienced MQ administrators and consultants that you talk to will have been through at least one serious Cluster problem.  Since Clusters tend to be business critical, both their failures and their repairs tend to happen under very visible and uncomfortable circumstances.  For historical reasons, most real-world administrators are cautious when dealing with Clusters.  Is this attitude still necessary, or is it a historical artifact that should be jettisoned?

    Since Kubernetes is so good about spinning up new containers and can respond dynamically to throughput issues, it's very conceivable to have from hundreds to thousands of container instances created and torn down each day.  This could lead to tens to hundreds of thousands of entries in the FR.  For clarity, I'm not talking about a Mom & Pop shop here.  Imagine the Cloud based front-end of a major retailer using IBM Cloud or Amazon and handling web requests that dispatch messages to the back-end.  

    Some of the production issues that have happened with Clusters over the decades have occurred not due to design/programming defects but due to unintended consequences.  An example of this would be message storms from the FRs to the network.  The Cluster "logic" is correct, but the resulting network loads can cause operational issues.  Have the MQ v9 Clusters been tested with the potentially large level of registration events that could come from this kind of environment?  Have they been tested in a sufficiently large Cluster (thousands of Queue Managers)?  Many retail customers have Queue Manager numbers measuring in the thousands, so this is not an idle question.  I should have done my homework by reviewing the latest performance reports, but I don't recall anything along these lines.

     

    Regards,

    Glen Brumbaugh



  • 8.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Fri April 06, 2018 12:54 AM

    Hello,

    I work a lot with MQ clusters and have implemented relatively complex architectures with inter-connected clusters, SSL, and AMS. I think I know a little about it.

    When MQ clusters first became available in version 5.0, there were a number of issues, which gave customers and administrators a bad image of them.
    Since MQ 6.0, the product has been very stable and very reliable, but you must have a good knowledge of how to use it and how to solve problems. The low quality of the available documentation (including the Knowledge Center) does not help. Most of the MQ cluster issues I have seen at my clients come from a lack of skill and/or understanding of this technology on the part of the administrators.

    In short, in my opinion, the MQ cluster is today incompatible with a container approach. Why?

    When a Partial Repository Queue Manager (or the container of this Queue Manager) is destroyed, the cluster keeps the memory of this Queue Manager for 60 + 30 days, including the memory of queues belonging to this Queue Manager.

    Any topology changes in this cluster (for example, a new Queue Manager) will be sent by the Full Repository to ALL known Queue Managers in the cluster (including those that are destroyed). This volume of messages can be very significant.

    The technical messages for these changes will remain pending in the SYSTEM.CLUSTER.TRANSMIT.QUEUE of the Full Repositories for 30 days (EXPIRY).
    These pending messages will slow the processing of legitimate messages (technical and application), very significantly so if the CURDEPTH exceeds 2000.
    It is for this reason that I am one of the people who advise having dedicated Full Repositories (without application activity).
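    A quick way to watch for this on a Full Repository (run under runmqsc):

        * A depth that keeps growing suggests changes pending for members
        * that are no longer there.
        DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH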

    In some cases, application messages may be sent to queues belonging to deleted Queue Managers, before being redirected to active Queue Managers after a certain delay.

    To be able to use MQ cluster technology in a container environment, Hursley would have to modify some points, for example:
    - send changes to a Partial Repository only if it is active (and have the Partial Repository request these changes when it reconnects to the Full Repository)
    - make the delay (60 + 30 days) modifiable to something like 60 + 30 minutes, via a parameter at the level of the Full Repositories.

    HTH, LMD.



  • 9.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Fri April 06, 2018 05:24 AM
    Just to make a point about the above: it was mentioned that messages will stay in the cluster transmit queue, but from version 7.5 we can have a different cluster transmit queue for each queue manager connection. If DEFCLXQ(CHANNEL) is enabled on the queue manager, messages pending for a deleted queue manager sit on that queue manager's own transmit queue, so they will not impact the other queue managers.

    By the way, I am not saying that this is supported in containers; I just want to add that having those pending messages will not really create issues with transmit queues if we have a separate XMITQ for each cluster sender channel.
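    For reference, switching a queue manager to per-channel cluster transmit queues is a one-line MQSC change (available from MQ 7.5):

        * The default is DEFCLXQ(SCTQ), the shared SYSTEM.CLUSTER.TRANSMIT.QUEUE;
        * CHANNEL gives each cluster sender channel its own transmit queue.
        ALTER QMGR DEFCLXQ(CHANNEL)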


    Regards

    Vinay Kumar



  • 10.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Fri April 06, 2018 06:02 AM

    Luc-Michel,

    Terrific discussion of the issues with Containers and Clusters.  Everyone should pay close attention to this post.  Managing the Cluster is currently the number one challenge driving my MQ Cloud designs.  I'm sure we're all waiting to hear what the Lab's direction is.  

     

    Regards,

    Glen Brumbaugh



  • 11.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Fri April 06, 2018 06:04 AM

    Arthur,

    I look forward to your future posts.  Thanks for adding to the discussion.  Are there any statements of direction regarding Clusters, as per Luc-Michel's comments above?

     

    Regards,

    Glen Brumbaugh



  • 12.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Sat April 07, 2018 03:29 AM

    We are starting to get an idea of the issues involved here, and I know for a fact that this is getting a lot of attention in IBM; Hursley are working on it. As to what the solution(s) might be, and when it (they) will be delivered, I don't know, but I am as certain as I can be that if MQ code changes are required to support MQ Clustering in a dynamic Cloud environment, then MQ development will deliver what's necessary.

    In the meantime, I would like to add a few general comments based on my experience with MQ Clustering.

    First of all, in my customer experience, I find IIB involved a lot of the time so any discussion about how MQ Clustering best fits in the Cloud had better include IIB (or App Connect these days). Quite often I have found the IIB Qmgr being the FR (despite advice to isolate FRs),  or more usually a pair of IIB/FRs for HA, surrounded by PRs each of which is acting as a Gateway for Client connections.

    Secondly, while I can't disagree with the comments on particular patterns (request/reply, fire and forget, etc.), in practice these are not that easy to discover. Typically you'll be talking to infrastructure people who will have only the haziest idea of what the App Developers are using the MQ system for. MQ Health Checks will show up queues that do not appear to be used anywhere, etc. Maybe this is getting better with the growth of DevOps and Containerisation, but I don't know. All I know is that this is very typical of "legacy" MQ/IIB installations.

    Finally, as regards the Cluster code itself, it is incredibly powerful and I am a fan. But I have been involved in the past few years with a number of customer Cluster upgrade projects, and even with MQ V7.5 and well planned (and correct) procedures, I have found instances of Cluster messages sitting on queues, channels not quite stopping, etc., such that, as has been mentioned before, I would be extremely reluctant to trust some cluster admin operations to scripts.

    So like a lot of people, I eagerly await IBM advice.



  • 13.  Future MQ support for Clustering in a Containerized environment

    Posted Sat April 07, 2018 10:48 AM

    Dermot,

    I agree with all of your comments above.  Like you, I also have great faith that Hursley Park will develop whatever features are necessary for MQ to be a leader in the containerized environment.  This will be the next battleground for messaging dominance and IBM can't afford to lose, so, of course, the lab will deliver a solid product.

    It may be a significant effort, however, and we shouldn't expect the complete solution in the next FixPack (but we can always dream).  The problem is that MQ spans OSI levels 3, 4, and 5; this is really transport-level software.  Imagine if containers required DNS support at the network level and new containers had to update the registered DNS servers.  The latencies involved would be utterly incompatible with the objectives of containerization.

    The answer for containers has been to localize these "network" changes.  Some of this dynamic "network" configuration only exists within the Docker engine and the local OS.  This allows containers to be provisioned rapidly.  MQ Clusters need a similar solution for ephemeral Queue Managers.  It may take some foundational work by the lab before this can be implemented.  I'm also pretty sure they're already working on it, but I have no official word.  

    Until then, we have to make do with what we have and communicate with the lab the challenges that we are facing.  This could turn out to be a good forum for that conversation.  

     

    Regards,

    Glen Brumbaugh



  • 14.  RE: Future MQ support for Clustering in a Containerized environment

    Posted Sun April 08, 2018 05:18 PM

    Last year we implemented IIB v10 in the Cloud, with 4 Linux servers each hosting an IIB instance that primarily does MQ message transformation and routing, but also does web service calls and DB SQL select/update. Production volume is a few million transactions per day, mainly during business hours. Approx. 100 Integration Servers (IS's).

    We avoided the issues of MQ queue managers in the Cloud and of MQ Clusters by creating one qmgr, not in the Cloud, to host all of the local queues used by message flows, and having the Cloud IIB instances connect to that qmgr via MQ Client.
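    A minimal sketch of that client-attach pattern (channel, host, and port are hypothetical): the central qmgr exposes a server-connection channel, and each Cloud IIB instance points at it, for example via the MQSERVER environment variable:

        * On the central (non-Cloud) queue manager, under runmqsc:
        DEFINE CHANNEL(APP.SVRCONN) CHLTYPE(SVRCONN) TRPTYPE(TCP)

        # On each Cloud IIB server (shell):
        export MQSERVER='APP.SVRCONN/TCP/mqhost.example.com(1414)'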

    Cheers,
    Glenn Baddeley

    Coles Supermarkets Australia Pty Ltd



  • 15.  RE: Future MQ support for Clustering in a Containerized environment

    Posted Sun April 08, 2018 05:32 PM

    Glenn,

    Thank you for the response.  That pattern seems to work well.  IIB Message Flows are often stateless and so make a nice fit for containers.  I've had a couple of people tell me that they are doing the same thing, so you're not alone.  It looks like you're putting a pretty reasonable amount of traffic through the system.  

     Did you use the new IIB Cloud product or did you use traditional IIB deployed into the Cloud?

     

    Cheers,

    Glen Brumbaugh



  • 16.  RE: Future MQ support for Clustering in a Containerized environment

    Posted Mon April 09, 2018 05:17 PM

    We used traditional IIB deployment, but will be looking at containerised solutions in the near future. The bottom line is that Cloud app servers must be stateless, with no persistent data (e.g. DBs), disposable, and easy to rebuild from scratch on demand. This makes it problematic to use MQ Queue Managers on Cloud servers for MQ Clusters and for application message queuing and routing.

    Glenn Baddeley

    Coles Supermarkets Australia Pty Ltd



  • 17.  RE: MQ in Containers: How are you managing scaling for Readers? Writers?

    Posted Thu May 03, 2018 02:06 AM

    I have written a blog entry on IBM developerWorks which discusses some of these issues in more depth.  This is an evolving subject, but I hope you find it useful!