
Detecting RDQM kernel module compatibility issues (and how to fix them)

By Alex Chatt posted Mon November 27, 2023 06:03 AM

  

Many of the RDQM support tickets we receive from customers come down to the same issue: an incompatibility between the installed DRBD kernel module and the RHEL kernel level.

It's easy for this to happen. An unnoticed kernel update applied during routine patching, or different teams managing MQ and managing the machine, can leave a node with a DRBD kernel module that is not supported by its new kernel. A node with this incompatibility cannot host or run the queue manager and, if two or more nodes in an HA group have the issue, the queue manager will not be able to start anywhere within the group.

This leads us to some obvious questions:

  • How do we spot when we are in this situation?
  • How do we resolve this situation?
  • How can we spot this early and try to prevent it happening again?

So, let’s tackle these one by one.

How do we spot when we are in this situation?

Generally, if a DRBD kernel module compatibility issue is present on a node, any queue manager expected to run on that node will either not be running at all (DR-only setup) or will have failed over and be running on a different node (HA setup).

The best place to start is to run the command ‘rdqmstatus’ on the node where you expect the queue manager to be running.

On an affected node, the output contains a couple of clues that point to a mismatch. The first is the “DRBD kernel module status” field, which is reported as “Partially loaded”. The second is a difference in the “-{version}” number between the “OS kernel version” and “DRBD OS kernel version” fields. If these two values differ and you are seeing problems, the installed DRBD kernel module is very likely incompatible with the OS kernel level.
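
As a rough sketch of how these two clues could be checked automatically, the Python below parses ‘rdqmstatus’ output. It assumes each field is printed on its own line as “Label: value” using the labels named above, and that a healthy node reports the module status as “Loaded”; both are assumptions, so compare them with the output on your own nodes. The helper names and the sample values are hypothetical, for illustration only.

```python
import re


def parse_rdqmstatus(text: str) -> dict:
    """Map each 'Label: value' line to {label.lower(): value}.

    Assumption: rdqmstatus prints one field per line with a colon separator.
    """
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip().lower()] = value.strip()
    return fields


def kernel_release_number(version: str) -> str:
    """Return the '-{version}' number, e.g. '4.18.0-513.5.1.el8_9' -> '513'."""
    match = re.match(r"[^-]*-(\d+)", version)
    return match.group(1) if match else version


def drbd_mismatch_clues(text: str) -> list:
    """Return readable clues that the DRBD kernel module does not match the OS kernel."""
    fields = parse_rdqmstatus(text)
    clues = []

    # Assumption: a healthy node reports "Loaded"; anything else
    # (such as "Partially loaded") is flagged.
    status = fields.get("drbd kernel module status", "")
    if status and status.lower() != "loaded":
        clues.append(f"DRBD kernel module status is {status!r}")

    # Compare only the '-{version}' number of the two kernel version fields.
    os_kernel = fields.get("os kernel version", "")
    drbd_kernel = fields.get("drbd os kernel version", "")
    if os_kernel and drbd_kernel and (
        kernel_release_number(os_kernel) != kernel_release_number(drbd_kernel)
    ):
        clues.append(f"OS kernel {os_kernel} vs DRBD OS kernel {drbd_kernel}")
    return clues


# Hypothetical sample output from an affected node.
sample = """\
Node:                          node1.example.com
OS kernel version:             4.18.0-513
DRBD OS kernel version:        4.18.0-372
DRBD kernel module status:     Partially loaded
"""
for clue in drbd_mismatch_clues(sample):
    print(clue)
```

An empty list means neither clue was found; it does not by itself prove the module is compatible.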

There are a few other commands and logs we can use to check for a DRBD kernel module mismatch. In the rsyslog messages file on the affected node, you might find an entry like the following, which suggests that the DRBD kernel module is not available:
Nov 12 03:35:07 <####> pacemaker-schedulerd[2333]: warning: Unexpected result (not installed: DRBD kernel (module) not available?) was recorded for probe of <####>:0 on <####> at July 01 09:30:00 2023
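
If you would rather search for this warning than read the log by eye, a minimal Python sketch could look like the following. It matches on the “not installed: DRBD kernel” text from the entry above and assumes the rsyslog messages file is /var/log/messages, the RHEL default; adjust the path if your rsyslog configuration writes elsewhere. The function names are hypothetical.

```python
# Text taken from the pacemaker-schedulerd warning quoted above.
SIGNATURE = "not installed: DRBD kernel"


def find_drbd_warnings(lines):
    """Return the lines that contain the DRBD 'not installed' probe warning."""
    return [line.rstrip("\n") for line in lines if SIGNATURE in line]


def scan_messages(path="/var/log/messages"):
    """Scan an rsyslog file; reading /var/log/messages usually needs root."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_drbd_warnings(log)
```

Calling scan_messages() on the troubled node returns any matching entries; an empty list only means the warning has not been logged in that file, which may have been rotated.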

You can also run the command “drbdadm status” which, if an incompatibility is present, may show output like the following:

modinfo: ERROR: Module drbd not found.
modinfo: ERROR: Module drbd not found.
modprobe: FATAL: Module drbd not found in directory /lib/modules/#######.x86_64
Failed to modprobe drbd (No such file or directory)
Command '/usr/sbin/drbdsetup status' terminated with exit code 20
command status exited with code 20
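
The ‘modprobe’ error above names the /lib/modules directory for the running kernel. As a quick, approximate check, you could look for a DRBD module file under that directory with a sketch like this one. It is only an approximation: modprobe itself resolves modules through the modules.dep index rather than by scanning files, and the drbd.ko* file name pattern (covering compressed variants such as drbd.ko.xz) is an assumption about how the module is packaged. The function name is hypothetical.

```python
import os
from pathlib import Path


def drbd_module_files(release=None, modules_root="/lib/modules"):
    """List drbd module files under <modules_root>/<release>.

    release defaults to the running kernel, the same value as `uname -r`.
    """
    release = release or os.uname().release
    base = Path(modules_root) / release
    if not base.is_dir():
        return []
    # Matches drbd.ko and compressed forms such as drbd.ko.xz, but not helper
    # modules like drbd_transport_tcp.ko; is_file() skips broken symlinks.
    return sorted(p for p in base.rglob("drbd.ko*") if p.is_file())
```

An empty result from drbd_module_files() on the running kernel matches the “Module drbd not found” symptom; a non-empty result does not guarantee the module is compatible.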


With these commands and their output, we should be able to confidently conclude that there is a DRBD kernel module mismatch. As a final step, if the “modver” command (provided as part of the MQ installation media) is available on the node, running it shows which DRBD kernel module we believe you should have installed for your OS kernel.


How do we resolve this situation?

Now that we have identified a DRBD kernel module compatibility issue, how do we resolve it?

You will first need to obtain the DRBD kernel module that supports your node's kernel level. It can be found either in the RDQM directory of your installation media or in the iFix zip for your MQ version on Fix Central. Links to these are on the RDQM kernel modules support page http://ibm.biz/mqrdqmkernelmods.

The detailed steps to upgrade your DRBD kernel module are in https://www.ibm.com/docs/en/ibm-mq/9.3?topic=aour-update-drbd-kernel-module-after-node-has-rebooted-into-new-kernel but the basic process is:

  • Identify the DRBD kernel module you need to install.
  • Upgrade to the identified DRBD kernel module.
  • Reboot the node.

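If all goes well, your node should now be able to host RDQM HA/DR queue managers again. Running “rdqmstatus” should no longer report the “DRBD kernel module status” as “Partially loaded”, and the “-{version}” numbers in the “OS kernel version” and “DRBD OS kernel version” fields should now match.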

How can we spot this early and try to prevent it happening again?

Of course, teams can co-operate internally in many ways to ensure that the kernel is not upgraded before they know whether a new DRBD kernel module will need to be installed. To do that, though, they need to know when it is safe to upgrade the kernel.

The best place to find this information is our RDQM DRBD kernel module support page, http://ibm.biz/mqrdqmkernelmods, which is updated to flag when a kernel level is not yet compatible with the available DRBD kernel modules. The page is updated again when a supporting DRBD kernel module becomes available in an iFix (links to these can be found on the page).

To spot the issue early, it can help to have tooling that looks for the symptoms described above, such as checking the rsyslog logs for the “not installed: DRBD kernel” message that can be logged when the issue has occurred. Another suggestion is to exclude just the kernel packages from automatic patching, and instead co-ordinate kernel updates with DRBD kernel module updates.
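
One way to co-ordinate the two is for a patching pipeline to refuse a kernel update unless its level appears in a list your team maintains from the support page. The Python below is a minimal sketch of such a gate. The allow-list values are placeholders rather than real support statements, the function names are hypothetical, and matching on the “major.minor.patch-N” prefix of the kernel version (the “-{version}” number discussed earlier) is an assumption about the right granularity; check the support page for how each kernel level is actually listed.

```python
import re

# Hypothetical allow-list, maintained by hand from
# http://ibm.biz/mqrdqmkernelmods. Placeholder values only.
SUPPORTED_KERNEL_LEVELS = {"4.18.0-477", "4.18.0-513"}


def kernel_level(version: str) -> str:
    """Reduce a full kernel version to 'major.minor.patch-N'.

    e.g. '4.18.0-513.5.1.el8_9.x86_64' -> '4.18.0-513'
    """
    match = re.match(r"\d+\.\d+\.\d+-\d+", version)
    return match.group(0) if match else version


def kernel_update_allowed(candidate: str, supported=SUPPORTED_KERNEL_LEVELS) -> bool:
    """True only if the candidate kernel's level is in the maintained allow-list."""
    return kernel_level(candidate) in supported
```

With the placeholder list, kernel_update_allowed("4.18.0-513.5.1.el8_9.x86_64") returns True and kernel_update_allowed("4.18.0-553.el8_10.x86_64") returns False, so the second update would be held back until the list is updated.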

Summary

I hope this information proves helpful in managing your RDQM setups, both in spotting when a DRBD kernel module mismatch is the culprit behind an issue and in fixing it.

Other useful links

https://www.ibm.com/docs/en/ibm-mq/9.3?topic=problems-after-upgrading-rdqm
