
Performance of NativeHA MQ with Redhat ODF Storage

By ABHISHEK VISHWAKARMA posted Wed April 10, 2024 04:15 PM

  


Based on research conducted by the IBM team, we have identified that ODF serves as a general-purpose replicated storage solution for OpenShift. It protects the data on a single volume by replicating it to other nodes, which makes it highly valuable for many general-purpose workloads and a suitable starting point. However, it is not suitable for every scenario, particularly for workloads that demand low disk latency, such as IBM MQ in certain situations. IBM MQ can be deployed in a Native HA configuration, where the queue manager itself replicates data to different nodes for high availability, making ODF's replication redundant in this regard.

It is advisable to utilize:

a) Any non-replicating block storage for Native HA MQ.

or

b) Red Hat ODF "non-resilient storage class" for Native HA MQ.
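For reference, when the queue manager is deployed with the IBM MQ Operator, the storage class used by a Native HA queue manager is selected on the QueueManager custom resource. The sketch below is only illustrative and assumes the operator is installed; the resource name, namespace, license values, MQ version and the storage class name (any non-replicating block storage class, per option (a) above) are placeholders for your own environment.

cat <<EOF | oc apply -f -
apiVersion: mq.ibm.com/v1beta1
kind: QueueManager
metadata:
  name: nativeha-qm                  # placeholder resource name
  namespace: mq-test                 # placeholder namespace
spec:
  license:
    accept: true
    license: <license-id>            # license ID for your MQ version
    use: NonProduction
  version: <mq-version>              # MQ container version in use
  queueManager:
    name: EIPMQMHA
    availability:
      type: NativeHA                 # three replicas; MQ itself replicates the recovery log
    storage:
      defaultClass: <non-replicating-block-storage-class>
EOF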

After testing the environment, we found that, for this particular workload, omitting ODF for MQ significantly reduces latency and eliminates timeout issues while maintaining reliability and data integrity. The objective of the performance testing is to determine the configuration that delivers maximum throughput, and we are free to choose from various storage types to achieve that goal.

Testing architecture employed:

Multi-instance application --msg--> ACE1 --msg--> ACE2 --msg--> ACE3 --msg--> ACE4 --msg--> destination system
                                                                                                    |
                                                                                                    DB

MQ is present at every step between ACE1, ACE2, ACE3, ACE4, and the destination system. All communication between the ACE flows takes place through MQ Input and Output nodes. Both ACE and MQ run as pods on the OpenShift worker nodes.
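To illustrate the hand-off pattern, each MQ Output node in one ACE flow puts to a local queue that the MQ Input node of the next flow reads from. A minimal MQSC sketch is shown below; the queue names are hypothetical and only indicate the pattern, not the actual objects used in the test.

runmqsc EIPMQMHA <<EOF
DEFINE QLOCAL('ACE1.TO.ACE2') DEFPSIST(YES) REPLACE
DEFINE QLOCAL('ACE2.TO.ACE3') DEFPSIST(YES) REPLACE
DEFINE QLOCAL('ACE3.TO.ACE4') DEFPSIST(YES) REPLACE
DEFINE QLOCAL('ACE4.TO.DESTINATION') DEFPSIST(YES) REPLACE
EOF

DEFPSIST(YES) is used in the sketch because only persistent messages are hardened to the recovery log that Native HA replicates.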

TEST 1 (Conducted with NativeHA MQ using ODF storage)

Configurations :-  

Max No. of Connections - 500/port (Total 9 ports used) 

Timeout on TCPServerInput Node, TCPServerReceive Node and TCPServerOutput Node - 5s 

Native HA MQ CPU - 2Core 

Native HA MQ Memory - 16Gi 

replica=3 is set in ODF configuration
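The replication factor can be confirmed on the Ceph block pool that backs the ODF storage class. A possible check is sketched below; it assumes the default ODF pool name ocs-storagecluster-cephblockpool in the openshift-storage namespace, so adjust for your cluster.

oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage \
  -o jsonpath='{.spec.replicated.size}{"\n"}'
# prints 3 for the configuration used in this test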

Results of Testing :- 

a) Total Records Hit by UPI Switch - 12000 

b) Total Records Received by ACE - 1455 

c) ./amqsrua -m EIPMQMHA -c DISK -t Log 

Publication received PutDate:20240228 PutTime:08105414 Interval:4.261 seconds 

Log - bytes in use 16777216 

Log - bytes max 1677721600 

Log file system - bytes in use 1173692416 

Log file system - bytes max 52576092160 

Log - physical bytes written 442368 103809/sec 

Log - logical bytes written 368640 86508/sec 

Log - write latency 53425 uSec 

Log - write size 6025 

Log - current primary space in use 0.72% 

Log - workload primary space utilization 1.42% 

Log - bytes required for media recovery 21MB  

Log - bytes occupied by reusable extents 0MB  
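To reproduce the capture above, amqsrua is shipped in the MQ container image under /opt/mqm/samp/bin and can be run inside the active Native HA pod. The pod and namespace names below are placeholders; with the MQ Operator the Native HA pods typically follow the <queuemanager-name>-ibm-mq-0/1/2 naming pattern.

oc exec -it nativeha-qm-ibm-mq-0 -n mq-test -- \
  /opt/mqm/samp/bin/amqsrua -m EIPMQMHA -c DISK -t Log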

 

TEST 2 (Conducted with Single Resilient QM using ODF storage) 

Configurations :-  

Max No. of Connections - 500/port (Total 9 ports used) 

Timeout on TCPServerInput Node, TCPServerReceive Node and TCPServerOutput Node - 5s 

Single Resilient MQ CPU - 2Core

Single Resilient MQ Memory - 16Gi

replica=3 is set in ODF configuration

Results of Testing :- 

a) Total Records Hit by UPI Switch - 12000 

b) Total Records Received by ACE - 2165 

c) ./amqsrua -m EIPMQMSI -c DISK -t Log 

Publication received PutDate:20240228 PutTime:08382303 Interval:9.644 seconds 

Log - bytes in use 1006632960 

Log - bytes max 1677721600 

Log file system - bytes in use 1111490560 

Log file system - bytes max 53687091200 

Log - physical bytes written 1077248 111694/sec 

Log - logical bytes written 983804 102006/sec 

Log - write latency 46177 uSec 

Log - write size 6622 

Log - current primary space in use 1.71% 

Log - workload primary space utilization 1.71% 

TEST 3 (Conducted with VM QM)

Configurations :-

Max No. of Connections - 500/port (Total 9 ports used)

Timeout on TCPServerInput Node, TCPServerReceive Node and TCPServerOutput Node - 5s

VM CPU - 1 Core

Results of Testing :- 

a) Total Records Hit by UPI Switch - 12000 

b) Total Records Received by ACE - 6168 

c) ./amqsrua -m EIPMQM -c DISK -t Log

Publication received PutDate:20240228 PutTime:09405414 Interval:1 minutes,4.785 seconds

Log - bytes in use 2013265920

Log - bytes max 3355443200

Log file system - bytes in use 2944016384

Log file system - bytes max 17169383424

Log - physical bytes written 26169344 403939/sec

Log - logical bytes written 4382523 67647/sec

Log - write latency 526 uSec

Log - write size 4612

Log - current primary space in use 3.07%

Log - workload primary space utilization 3.07%

Upon comparing the "Log - write latency" across different environments, it is evident that there are variations in the output:

Log - write latency: 526 uSec (VM QM) < 46177 uSec (Single Resilient QM) < 53425 uSec (NativeHA QM)

Although the "Log - write latency" doesn't show significant differences between Single Resilient QM and NativeHA QM, the recorded values are notably high. This suggests a potential issue with the disk performance, storage utilized, or extra replication on the container.

TEST 4 (Conducted with VM QM using mqldt tool) 

The mqldt tool writes files with various block sizes and measures metrics such as total writes, bytes written, and write latency.

./mqldtTest.sh

Executing test for write blocksize 16384 (16k). Seconds elapsed -> 60/60

Total writes to files                                :          81717
Total bytes written to files                         :     1338851328
Max bytes/sec written to files (over 1 sec interval) :       23379968
Min bytes/sec written to files (over 1 sec interval) :       21217280
Avg bytes/sec written to files                       :       22321743

Max latency of write (ns)     :       17236448 (#23599) (17236 uS)
Min bytes/sec (slowest write) :         950543
Min latency of write (ns)     :         364030 (#68893)
Max bytes/sec (fastest write) :       45007280
Avg latency of write (ns)     :         713826 (713.826 uS)

mqldt (VM):

Total bytes written to files                        :     1338851328  (1.3 GB)

Avg latency of write (ns)     :         713826 (713.826 uSec) 

Avg bytes/sec written to files                      :       22321743 

Comparison:

a) Total Writes:

a.1) amqsrua: 26,169,344 bytes (physical) + 4,382,523 bytes (logical) ≈ 30,551,867 bytes

a.2) mqldt: 1,338,851,328 bytes

b) Latency:

b.1) amqsrua: 526 microseconds

b.2) mqldt: 713.826 microseconds

c) Write Throughput Rate:

c.1) amqsrua: 403,939 bytes/second (physical) + 67,647 bytes/second (logical) ≈ 471,586 bytes/second

c.2) mqldt: 22,321,743 bytes/second

While the two tests measure different aspects of disk performance, the comparison suggests the disk itself performs consistently: the write latencies are of the same order of magnitude (526 uSec for amqsrua versus 713.826 uSec for mqldt). The mqldt test writes far more bytes in total because it drives the disk as hard as it can, whereas amqsrua reflects only the I/O generated by the MQ workload, and amqsrua reports the lower write latency. Both tests indicate decent disk performance on the VM.

Please note that we have already conducted a comprehensive analysis by comparing the amqsrua output across the three types of queue manager (VM QM, Single Resilient QM, and NativeHA QM). The mqldt/mqldt-c examination was carried out solely for additional verification, and it broadly confirms the accuracy of the amqsrua results.

Illustration Demonstrating the Impact of RHODF (with a Replication Factor of 3) on Increased Latency for NativeHA MQ

MQ Native HA is designed for cloud-native RWO/block storage and performs its own replication. If ODF replication is layered underneath it, additional latency is introduced because every write must also wait for ODF to complete its replication. In the illustration above, each ODF replication path, represented by the lines, adds unnecessary lag.

Performance is diminished because the data is replicated eightfold:

  • Systems such as MQ Native HA or API Connect, which have built-in replication mechanisms, send two copies of the data to additional replicas for resilience.
  • Once that data has been written to the local block storage volume of each instance, a replicating storage provider then copies it to two further locations for each volume, resulting in a total of eight replicas (two Native HA replicas, plus two storage-layer copies of each of the three volumes).
  • The system ordinarily has to wait for all of these synchronous writes to complete before returning control to the original requester, which degrades performance during operation.
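For context on option (b) mentioned earlier, a non-resilient pool can be expressed at the Ceph/Rook layer as a replica-1 block pool, as sketched below. This is only illustrative, not a supported procedure: the pool name is a placeholder, a StorageClass referencing the pool still has to be created, and the Red Hat ODF documentation for your version should be followed.

cat <<EOF | oc apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: mq-non-resilient-pool       # placeholder pool name
  namespace: openshift-storage
spec:
  failureDomain: host
  replicated:
    size: 1                         # single copy: MQ Native HA provides the resilience
    requireSafeReplicaSize: false   # needed to permit size 1
EOF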

Comments

26 days ago

We have a statement of direction around high availability and cross-region replication that you might find interesting:

https://www.ibm.com/docs/en/announcements/mq-935-api-connect-app-connect-enterprise-noname-advanced-api-security-111-noname-advanced-api-security-as-service-deliver-new-integration-capabilities#statement_direction__title__1

You could also consider using a Uniform Cluster with one queue manager in each site, running active-active.

26 days ago

This is a good article, but mainly for public cloud deployments. However, if the entire OpenShift cluster has a problem, the entire application is down. With on-prem deployments where there are only two sites, it would be good to decouple the load balancing of MQ across two OpenShift clusters, which means deploying MQ Native HA across two OpenShift clusters rather than just one, and I know that is not available today.

Wed April 17, 2024 06:01 AM

In response to the comment about Native HA not being available on other platforms, there's an Aha Idea for that here: Make MQ Native HA available in Linux installations | Integration (ibm.com). Your votes do count.

Tue April 16, 2024 07:14 AM

One of the primary motivations for the development of MQ's Native HA capability was the lack of any common shared storage capability in the 'cloud' at the time this capability was needed.

The cloud environments are a kind of lowest common denominator in terms of MQ deployment options and native HA was designed to provide an HA capability in a very basic hardware environment to overcome this limitation. 

As the cloud environment matures, with the availability of things such as replicated or shared storage, other HA configurations become possible, but it really doesn't make ANY sense (to me) to use this sort of technology with native HA.

Native HA does, however, have some useful advantages over shared-storage (typically multi-instance) or replicated-storage (e.g. RDQM) queue managers, in that, with the queue manager itself doing the replication, it understands which I/Os relate to persistent message integrity and which do not.

Take, for example, a cluster transmit queue used to forward messages to other queue managers. This queue is likely to host both persistent and non-persistent messages. If any of these messages spill to disk, neither a shared nor a replicated storage solution can tell which I/Os relate to persistent messages and which relate to non-persistent messages, so the HA overhead applies to all of the messages spilled to disk, while native HA only ships the persistent data to the partner sites. With sufficient effort a customer might be able to configure things so that persistent messages use one set of transmit queues and non-persistent messages another, but this adds a lot of unnecessary complication and does not maintain the same message order.

The queue manager has a very simplistic buffer manager to avoid spilling all messages to disk, but it's incredibly basic and is not suited to managing large amounts of memory. When messages spill out of this simplistic buffer manager, they then spill to the file system's buffer manager, but at that point they become of interest to the replication or sharing technology, and non-persistent messages risk being shipped over the network in some fashion.

It's been a bit of a surprise to me that native HA has not therefore been made available in more MQ environments. As a customer I think I'd be struggling to understand the distributed MQ HA strategy, and therefore which horse to back.