IBM MQ Little Gem #46: SYSTEM compression algorithm

By Morag Hughson posted Wed March 25, 2020 02:25 AM

This is part of a series of small blog posts which will cover some of the smaller, perhaps less likely to be noticed, features of IBM MQ. Read other posts in this series.

I've written before about channel compression, specifically on how to enable it across the various clients. There are two types of compression on IBM MQ channels: message-level compression, which offers some well-defined algorithms (run-length encoding (RLE) and two flavours of ZLIB), and header compression, which only offers SYSTEM. This post is going to look at what SYSTEM means.

The idea of COMPHDR(SYSTEM) is that IBM MQ knows how best to compress its own headers, and a bespoke 'algorithm' will do a better job of compression than an off-the-shelf algorithm would manage.

SYSTEM header compression works by comparing the data in the current message header with the data in the previous message header. Many of the bytes in the 440 bytes that make up an MQXQH plus MQMD will be the same for each message sent across the channel, and so this data is extremely compressible.

For the first message sent over a channel the entire header needs to be sent. For the second and subsequent messages, however, the header structures are first XORed with those of the previous message. Every byte that is unchanged becomes NULL (zero) and only the bytes which differ are non-zero, leaving a lot of long runs of NULL characters.
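As a minimal sketch of that XOR step, assuming the headers are simply equal-length byte strings: the function name `xor_delta` and the sample data are made up for illustration and are not taken from the MQ code.

```python
def xor_delta(previous: bytes, current: bytes) -> bytes:
    """XOR two equal-length headers; bytes that match come out as 0x00."""
    if len(previous) != len(current):
        raise ValueError("headers must be the same length")
    return bytes(p ^ c for p, c in zip(previous, current))

# Two made-up 14-byte "headers" that differ in a single byte.
previous = b"AAAA-BBBB-CCCC"
current = b"AAAA-BXBB-CCCC"

delta = xor_delta(previous, current)
print(sum(b != 0 for b in delta))            # 1 (only one byte differs)
print(xor_delta(previous, delta) == current)  # True (XOR is its own inverse)
```

The second print is the property the receiving end relies on: XOR-ing the delta with the previous header gives back the current one.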

Long runs of any character that is the same, including NULL characters, are very well suited to compression using a Run-length encoding (RLE) technique.

Let's just look at the MQXQH header that accompanies each message sent over a channel.

StrucId      :'XQH '
Version      :1
Remote Q     :'SALES.INPUT                                     '
Remote QMgr  :'MQG1                                            '

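Here we have 104 bytes (4 + 4 + 48 + 48) that might well be the same for each message that goes over the channel. If they are, XOR-ing them with the previous message's header gives 104 NULL bytes, which would result in an RLE encoded string of:-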

ESC - NULL - 104

that is shorthand for conveying 104 NULL characters. This takes up 3 bytes rather than the 104 bytes uncompressed.
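To see why that is a single triple, here is a small sketch that counts the runs in the XORed data. The `ESC` value of 0xFF and the one-byte count are my assumptions for illustration; this post doesn't give the real values used on the wire.

```python
from itertools import groupby

ESC = 0xFF  # hypothetical escape marker, not a documented MQ constant

def runs(data: bytes) -> list[tuple[int, int]]:
    """Return (byte value, run length) pairs for consecutive equal bytes."""
    return [(value, len(list(group))) for value, group in groupby(data)]

delta = bytes(104)  # XOR of two identical 104-byte header portions: all NULL

found = runs(delta)
value, length = found[0]
encoded = bytes([ESC, value, length])  # ESC - NULL - 104

print(found)         # [(0, 104)]
print(len(encoded))  # 3
```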

Now if we imagine that the next message is going to a different queue thus:-

StrucId      :'XQH '
Version      :1
Remote Q     :'ACCOUNT.INP                                    '
Remote QMgr  :'MQG1                                            '

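XOR-ing this header with the previous header, and then run-length encoding the result, would give the following (the 11 changed bytes are shown as the new queue name for readability; strictly, what is sent is the XOR of the old and new names):-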

ESC - NULL - 8 - "ACCOUNT.INP" - ESC - NULL - 85

Or to think of that another way,

  • 8 bytes the same (accounting for the 4-byte StrucId, and the 4-byte Version)
    Represented by sending 3 bytes
  • 11 bytes that are different - ACCOUNT.INP
  • 85 bytes the same - the 37 spaces filling the remainder of the 48 character Remote Q name, and the entire 48 character Remote QMgr name field.
    Represented by sending 3 bytes

This means that instead of sending 104 bytes, the channel just sends 17 bytes (3 + 11 + 3).
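The arithmetic above can be checked with a short sketch that builds the two 104-byte header portions, XORs them and run-length encodes the result. Everything here is an assumption for illustration rather than the real channel code: the helper names, the `ESC` value of 0xFF, the little-endian Version field, the minimum run length of 4 and the one-byte count.

```python
import struct
from itertools import groupby

ESC = 0xFF   # hypothetical escape marker
MIN_RUN = 4  # assumed threshold: shorter runs are cheaper sent as-is

def xqh_prefix(remote_q: str, remote_qmgr: str) -> bytes:
    """First 104 bytes of an MQXQH: StrucId, Version and the two 48-byte names."""
    return (b"XQH " + struct.pack("<i", 1)
            + remote_q.encode("ascii").ljust(48)
            + remote_qmgr.encode("ascii").ljust(48))

def compress_header(previous: bytes, current: bytes) -> bytes:
    """XOR with the previous header, then replace long runs with ESC triples."""
    delta = bytes(p ^ c for p, c in zip(previous, current))
    out = bytearray()
    for value, group in groupby(delta):
        remaining = len(list(group))
        while remaining:
            chunk = min(remaining, 255)  # the count has to fit in one byte
            if chunk >= MIN_RUN or value == ESC:
                out += bytes([ESC, value, chunk])  # a literal ESC is always escaped
            else:
                out += bytes([value]) * chunk
            remaining -= chunk
    return bytes(out)

prev = xqh_prefix("SALES.INPUT", "MQG1")
curr = xqh_prefix("ACCOUNT.INP", "MQG1")
encoded = compress_header(prev, curr)

print(len(curr), len(encoded))  # 104 17
```

Note that the 11 bytes in the middle of `encoded` are the XOR of the two queue names, not the characters of ACCOUNT.INP themselves.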

The receiving end of the channel first decodes the run-length encoded data and then XORs the decoded data with a saved copy of the previous message's header to recover the current header.
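A sketch of the receiving side, under the same illustrative assumptions (a made-up `ESC` value of 0xFF, ESC - value - count triples with a one-byte count, and hypothetical function names), might look like this:

```python
ESC = 0xFF  # hypothetical escape marker, not a documented MQ constant

def rle_decode(encoded: bytes) -> bytes:
    """Expand ESC - value - count triples; copy every other byte as-is."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == ESC:
            value, count = encoded[i + 1], encoded[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return bytes(out)

def restore_header(previous: bytes, encoded: bytes) -> bytes:
    """Decode the run-length data, then XOR it with the saved previous header."""
    delta = rle_decode(encoded)
    return bytes(p ^ d for p, d in zip(previous, delta))

# Saved copy of the previous header portion (Version assumed little-endian).
previous = (b"XQH " + b"\x01\x00\x00\x00"
            + b"SALES.INPUT".ljust(48) + b"MQG1".ljust(48))

# The 17 bytes received: 8 unchanged, 11 XORed name bytes, 85 unchanged.
changed = bytes(a ^ b for a, b in zip(b"SALES.INPUT", b"ACCOUNT.INP"))
encoded = bytes([ESC, 0, 8]) + changed + bytes([ESC, 0, 85])

current = restore_header(previous, encoded)
print(current[8:56].decode("ascii").rstrip())  # ACCOUNT.INP
```

The restored header then becomes the saved copy used for the next message.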

Now we've only looked at MQXQH here, but there's a whole MQMD following it for every message, and some internal headers used by the channel too, which are also treated in this way.

Measurements show that a Sender-Receiver channel using COMPHDR(SYSTEM) will have its header length reduced from 476 bytes to 66 bytes (an 86% reduction); and a client channel moving an MQPUT request (which has similar headers) would have its header length reduced from 500 bytes to 89 bytes (an 82% reduction).
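Those percentages are the share of header bytes saved, which is easy to check (the function name is just for illustration):

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of bytes saved when `before` bytes shrink to `after` bytes."""
    return 100 * (before - after) / before

print(round(reduction_percent(476, 66)))  # 86
print(round(reduction_percent(500, 89)))  # 82
```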

Morag Hughson is an MQ expert. She spent 18 years in the MQ Devt organisation before taking on her current job writing MQ Technical education courses with MQGem. She also blogs for MQGem. You can connect with her here on IMWUC or on Twitter and LinkedIn.