Impact of TLS ciphers and key-size on MQ SVRCONN channels into a z/OS queue manager

By Anthony Sharkey posted Wed April 10, 2024 04:36 AM

  

Whilst writing my earlier blog which looked at the impact of the key-size of the certificate on MQ channels protected with TLS ciphers, it occurred to me that there was nothing that considered the impact on SVRCONN channels.

When I did start to look for information, I recalled that several (many) years ago, I wrote a blog on the impact of TLS ciphers on an MQ for z/OS channel – but again that did not cover SVRCONN channels on z/OS.

As a result, this blog looks at the impact of adding TLS protection to a SVRCONN channel and what additional impact larger certificate key-sizes may have. The blog also discusses the load on the MQ channel initiator resources when using TLS-protected SVRCONN channels.

Measurement environment:

z/OS LPAR with z/OS 3.1 and 3 dedicated processors running on IBM z16 (3931):

  • IBM MQ queue manager at MQ 9.3
  • IBM MQ queue manager is configured with TRACE(S) CLASS(4) enabled.
  • IBM MQ channel initiator configured with SVRCONN using SHARECNV(0).
  • 2 CryptoExpress 8S co-processors and 1 CryptoExpress 8S accelerator configured for use by IBM MQ.

Linux on Z – hosted on IBM z15 running RHEL 8.9, with 6 cores configured with 2 threads per core, giving 12 CPUs, and 128GB of memory.

  • This partner machine uses MQ-CPH (the MQ C Performance Harness) to run its client workload. MQ-CPH is available for download from GitHub as MQ-CPH.

The two systems (z/OS and Linux on Z) are connected via a dedicated 10Gb network over a short distance (<50 metres) for minimal network latency.

For the purposes of these measurements the workload runs a CPH application on the Linux on Z partner, which is configured to:

  • Connect, open the request and reply queues, put a message, get the reply, close the request and reply queues, disconnect and then repeat until 60 seconds has passed.

On the z/OS LPAR:

  • A batch application is started to MQGET from the request queue and put the same message to the pre-determined reply-to-queue. This task will run until cancelled.
  • The light workload being run ensures that there are sufficient channel initiator and CPU resources such that there are no extraneous delays.

Measurements run:

The baseline measurement is to have the CPH application run for 60 seconds with 1KB non-persistent messages.

Subsequently, the measurements were repeated with the TLS_RSA_WITH_AES_128_CBC_SHA256 cipher configured.

The certificates on both the z/OS queue manager and the Linux on Z client will be of equal size for each measurement unless otherwise stated, and will be 1024, 2048 and 4096-bits. For simplicity of setting the test up, all certificates are self-signed.
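For reference, the TLS-protected channel configuration might be defined along the following lines. This is an illustrative MQSC sketch only – the channel name, key-ring name and SSLTASKS value are assumptions, not taken from the test system:

```mqsc
* Illustrative MQSC only - channel and key-ring names are assumptions;
* tune SSLTASKS to your own workload.
ALTER QMGR SSLKEYR('CSQ1RING') SSLTASKS(8)
DEFINE CHANNEL(TEST.SVRCONN) CHLTYPE(SVRCONN) +
       TRPTYPE(TCP) +
       SHARECNV(0) +
       SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256) +
       SSLCAUTH(REQUIRED)
```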

Headline performance

| Certificate key-size | Baseline (N/A) | TLS 1024-bits | TLS 2048-bits | TLS 4096-bits |
|---|---|---|---|---|
| Transaction rate (per second) | 390 | 46 | 40 | 23 |
| Cost per transaction (CPU microseconds) | 455 | 754 | 757 | 758 |
| Transaction round-trip time (milliseconds) | 2.5 | 21.7 | 25 | 42.7 |

As the table demonstrates, there is a significant impact to the transaction rate and round-trip times when TLS ciphers are enabled on the SVRCONN channel.

Enabling the TLS cipher with z/OS' default key size of 2048 bits resulted in the round-trip time increasing to 10 times that of the non-TLS equivalent workload.

For this application model, enabling TLS ciphers adds 65% to the cost of the transaction on z/OS in the channel initiator address space.

Once TLS ciphers are enabled, the certificate key-size does not significantly affect the transaction cost, which is similar to what was observed using MCA channels in the impact of TLS ciphers on an MQ for z/OS channel blog.
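These headline ratios can be reproduced with a few lines of arithmetic. The values below are copied directly from the table; this is a quick sanity check, not part of the measurement tooling:

```python
# Headline figures from the table (CPU microseconds and milliseconds)
baseline_cost = 455
tls_costs = {1024: 754, 2048: 757, 4096: 758}

baseline_rtt = 2.5
tls_rtt = {1024: 21.7, 2048: 25.0, 4096: 42.7}

# TLS adds roughly the same CPU cost regardless of key-size...
for bits, cost in tls_costs.items():
    overhead = (cost - baseline_cost) / baseline_cost * 100
    print(f"{bits}-bit: +{overhead:.0f}% CPU per transaction")

# ...but round-trip time grows sharply with key-size
for bits, rtt in tls_rtt.items():
    print(f"{bits}-bit: round-trip {rtt / baseline_rtt:.1f}x the baseline")
```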

Channel initiator task usage:

The channel initiator runs several types of tasks that are used for specific purposes. These include:

  • Adapters (CHIADAPS) – used to interact with the queue manager.
  • Dispatchers (CHIDISPS) – used by channels.
  • SSL (SSLTASKS) – can use the cryptographic co-processors available to the LPAR, as well as performing encryption and decryption of data.
  • Domain Name Server (DNS) task – which can go out of your enterprise to look up the IP address associated with a name as well as performing local DNS API requests.

Using IBM MQ for z/OS class(4) statistics trace, we can determine the load on the channel initiator tasks and since these workloads form the entirety of the load, we can also determine the number of task requests per transaction.

Note that determining the number of channel initiator task requests is only possible as we know the exact processing workflow of this simple workload and that this is the only workload passing through the channel initiator at the time of the measurement.
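As a minimal sketch of that calculation: divide each task's total request count for the interval by the number of completed transactions. The totals below are illustrative numbers consistent with the baseline run (390 transactions per second for 60 seconds), not actual trace output:

```python
# Illustrative class(4) totals for a 60-second baseline interval;
# real figures come from the SMF 115 statistics records.
transactions = 390 * 60          # 23,400 transactions completed

task_requests = {                # total requests per chinit task type
    "adapter":    444_600,
    "dispatcher": 982_800,
    "dns":         23_400,
    "ssl":              0,
}

# Only meaningful because this workload is the sole traffic through
# the channel initiator during the measurement.
for task, total in task_requests.items():
    print(f"{task}: {total / transactions:g} requests per transaction")
```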

Channel Initiator Task Usage – calls per transaction:

| Certificate key-size | Baseline (N/A) | TLS 1024-bits | TLS 2048-bits | TLS 4096-bits |
|---|---|---|---|---|
| Adapter | 19 | 19 | 19 | 19 |
| Dispatcher | 42 | 100 | 100 | 100 |
| DNS | 1 | 11 | 11 | 11 |
| SSL | 0 | 37 | 37 | 37 |

The table shows that for the baseline transaction, i.e. MQ connect, MQOPEN of 2 queues, MQPUT, MQGET, MQCLOSE of 2 queues and MQ disconnect, there were 19 adapter requests, 42 dispatcher requests, 1 DNS request and 0 SSL requests.

As the table indicates, enabling TLS protection on SVRCONN channels makes no difference to the number of adapter calls per transaction.

Enabling TLS protection does however affect the number of dispatcher, DNS and SSL task requests.

The number of DNS requests may be surprising, but not all of these requests result in a call to an external name server – many are resolved locally, much faster.

The increase in dispatcher calls in the TLS-protected configurations occurs because the dispatcher requests action from an associated SSL task, and each SSL completion drives a subsequent dispatcher request, so multiple dispatcher requests are needed to complete the same work.

EDIT

You may wonder about the number of adapter calls in this transaction model – 19 does appear to be a large number for such a simple transaction.

To help understand this number, I have broken the transaction down into the MQ APIs:

| MQ API | Adapter calls |
|---|---|
| MQCONNX | 10 |
| MQOPEN (request queue) | 1 |
| MQOPEN (reply queue) | 1 |
| MQPUT | 1 |
| MQGET | 1 |
| MQCLOSE (request queue) | 1 |
| MQCLOSE (reply queue) | 1 |
| MQCMIT | 1 |
| MQDISC | 2 |
| Total calls | 19 |

The above table may also help to explain why this short-lived transaction is relatively expensive compared with a long-running connection.
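A quick check that the breakdown is self-consistent, and what it implies for connection reuse. The reuse figures are simple derivations from the table, not separate measurements:

```python
# Adapter calls per MQ API for this transaction model (from the table)
adapter_calls = {
    "MQCONNX": 10, "MQOPEN (request)": 1, "MQOPEN (reply)": 1,
    "MQPUT": 1, "MQGET": 1,
    "MQCLOSE (request)": 1, "MQCLOSE (reply)": 1,
    "MQCMIT": 1, "MQDISC": 2,
}
assert sum(adapter_calls.values()) == 19

# Connect plus disconnect account for 12 of the 19 calls, so a
# long-running connection that keeps its queues open pays only
# MQPUT + MQGET + MQCMIT per round trip.
per_round_trip = sum(adapter_calls.values()) \
    - adapter_calls["MQCONNX"] - adapter_calls["MQDISC"] \
    - 4   # minus the two opens and two closes
print(per_round_trip)  # 3
```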

Channel Initiator Task Usage – average cost per request (CPU microseconds):

| Certificate key-size | Baseline (N/A) | TLS 1024-bits | TLS 2048-bits | TLS 4096-bits |
|---|---|---|---|---|
| Adapter | 4 | 4 | 4 | 4 |
| Dispatcher | 9 | 5 | 5 | 5 |
| DNS | 2 | 3 | 3 | 3 |
| SSL | 0 | 4 | 4 | 4 |

For this workload, the individual requests for each task are generally quite low and consistent.

The exception is the dispatcher cost for the non-TLS configuration. As mentioned in the previous section, in the TLS configurations the dispatcher processing performed in the baseline is split into multiple (smaller) requests interspersed with SSL requests. As such, we expect to see more dispatcher requests per transaction but with a lower average cost.
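The two views are consistent: multiplying the dispatcher requests per transaction by the average cost per request reproduces the per-transaction dispatcher cost reported by the trace (values copied from the measurement tables):

```python
# Dispatcher: calls per transaction x average CPU cost per request
baseline = 42 * 9      # 378 us, vs the measured 377 us per transaction
tls      = 100 * 5     # 500 us, matching the TLS measurements
print(baseline, tls)
```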

Channel Initiator Task Usage – average elapsed time per request (microseconds):

For this non-persistent workload, the average elapsed time per request mirrored the average CPU cost for each of the channel initiator tasks except for the SSL tasks.

For the SSL tasks, the key-size of the certificate had a significant impact on the elapsed time per request:

| Key size (bits) | Average elapsed time per SSL task request |
|---|---|
| 1024 | 11 microseconds |
| 2048 | 22 microseconds |
| 4096 | 107 microseconds |

Channel Initiator Task Usage – cost per transaction (CPU microseconds):

| Certificate key-size | Baseline (N/A) | TLS 1024-bits | TLS 2048-bits | TLS 4096-bits |
|---|---|---|---|---|
| Adapter | 76 | 76 | 76 | 76 |
| Dispatcher | 377 | 500 | 500 | 501 |
| DNS | 2 | 30 | 33 | 33 |
| SSL | 0 | 148 | 148 | 148 |
| Total | 455 | 754 | 757 | 758 |

Enabling TLS protection on SVRCONN channels increases the total cost on each of the dispatcher, DNS, and SSL tasks for each transaction.

Channel Initiator Task Usage – elapsed time per transaction (microseconds):

| Certificate key-size | Baseline (N/A) | TLS 1024-bits | TLS 2048-bits | TLS 4096-bits |
|---|---|---|---|---|
| Adapter | 76 | 76 | 76 | 76 |
| Dispatcher | 377 | 500 | 500 | 501 |
| DNS | 2 | 33 | 33 | 33 |
| SSL | 0 | 408 | 815 | 3969 |
| Total | 456 | 1017 | 1425 | 4579 |
| Wait time (elapsed – cost) | 1 | 262 | 667 | 3820 |

In the TLS-protected SVRCONN channels there is a notable difference between the total elapsed time and the total cost per transaction, which is labelled “wait time”.

Since the workload is light, the wait is not due to queuing for tasks but rather the time taken to process a request by the CryptoExpress 8S co-processor.

In the RMF Cryptographic report for these workloads, we see approximately one request per transaction – and the EXEC TIME corresponds quite well with the reported “wait time”.

A large disparity between the “wait time” and the EXEC TIME in the RMF Cryptographic report could indicate that requests are queueing for the Cryptographic hardware even though the co-processors are less than 100% utilised – for example, when 5 concurrent requests are made but only 2 co-processors are available.
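Putting the published numbers side by side makes the correspondence easy to see; the one-microsecond differences against the table's “wait time” row are rounding in the reported figures:

```python
# Per-transaction elapsed time vs CPU cost, and the RMF-reported
# EXEC TIME of the CryptoExpress hardware (values from the tables)
elapsed   = {1024: 1017, 2048: 1425, 4096: 4579}   # microseconds
cpu_cost  = {1024:  754, 2048:  757, 4096:  758}
exec_time = {1024:  226, 2048:  631, 4096: 3761}   # RMF EXEC TIME

for bits in sorted(elapsed):
    wait = elapsed[bits] - cpu_cost[bits]
    print(f"{bits}-bit: wait {wait} us, crypto EXEC TIME {exec_time[bits]} us")
```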

CryptoExpress usage:

As discussed in the previous section, the RMF Cryptographic report indicates that there is approximately one request to the hardware for each transaction.

The EXEC TIME per request is shown in the following table:

| Key size (bits) | Average EXEC TIME of Cryptographic hardware per transaction |
|---|---|
| 1024 | 226 microseconds |
| 2048 | 631 microseconds |
| 4096 | 3761 microseconds |

What if the client has a different key-size to the server?

The key-size of the certificate on the client partner also affects the round-trip time of the transaction.

When using certificates with a key size of 4096 bits on both client and server, we saw a transaction rate of 23 transactions per second.

By reducing the size of the client certificate, our client application was able to drive 48% more transactions per second, for a total of 34 transactions per second (per client). This also had the effect of reducing the round-trip time by 31%.

The cost and elapsed times on the z/OS server remained the same.

Resource Usage:

An SVRCONN channel on an MQ 9.3 queue manager using messages up to 10KB typically uses 83KB of channel initiator storage, as per “How many clients can I connect to my queue manager?” in the MQ for z/OS 9.3 performance report.

Adding a TLS cipher typically adds 30KB of channel initiator storage for each MQ channel started, as per MP16, i.e. 113KB per TLS SVRCONN channel for messages up to 10KB.
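Those per-channel figures make rough capacity estimates straightforward. A small sketch, using the 83KB and 30KB MP16 values quoted above:

```python
# Approximate channel initiator storage for N SVRCONN channels,
# messages up to 10KB (per-channel figures quoted from MP16)
BASE_KB = 83        # non-TLS SVRCONN channel
TLS_EXTRA_KB = 30   # additional storage when a TLS cipher is used

def chinit_storage_mb(channels: int, tls: bool = True) -> float:
    per_channel_kb = BASE_KB + (TLS_EXTRA_KB if tls else 0)
    return channels * per_channel_kb / 1024

print(f"{chinit_storage_mb(1000):.0f} MB for 1000 TLS channels")   # ~110 MB
```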

With regards to channel initiator task usage, it is good practice to monitor the usage by enabling TRACE(S) CLASS(4) and reviewing the resulting data.

  • For adapters, there should always be at least 1 with low usage. More adapter tasks may be required when the workload uses persistent messages or message selection.
  • For dispatchers and SSL tasks, monitor the usage to determine that none of the tasks are 100% utilised, either for CPU or elapsed time. If the task(s) are fully utilised, additional tasks may be required, or a re-balancing of the channels to tasks – refer to MP16’s CHIDISPs and MAXCHL section.
  • For DNS, a high utilisation or a large disparity between elapsed and CPU time may indicate slow responses from the DNS. There is no ability to configure a second DNS in the channel initiator.
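A minimal sketch of that kind of monitoring check, assuming the class(4) records have already been summarised into per-task busy time. The record layout and field names here are illustrative, not the real SMF 115 format:

```python
# Flag channel initiator tasks approaching saturation in a
# 60-second statistics interval. All figures are illustrative.
INTERVAL_US = 60_000_000

tasks = [
    {"name": "DISPATCHER.0", "cpu_us": 12_000_000, "busy_us": 15_000_000},
    {"name": "SSL.0",        "cpu_us": 55_000_000, "busy_us": 59_500_000},
]

for t in tasks:
    utilisation = t["busy_us"] / INTERVAL_US
    if utilisation > 0.9:
        print(f"{t['name']}: {utilisation:.0%} busy - consider adding "
              "tasks or rebalancing channels (see MP16)")
```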

Summary

The purpose of this blog was to demonstrate the impact of enabling TLS ciphers on an MQ SVRCONN channel on z/OS.

Given the high cost of an MQ connect, the application model in use in this blog is sub-optimal, but it is recognised that there are situations where a client application may connect, do some small MQ interaction and then disconnect.

Adding TLS cipher protection into this model has a significant impact on the response time of the client application as the key-negotiation occurs at connection time.

As this blog discusses, enabling TLS ciphers meant an increase in load on the dispatcher, SSL, and DNS tasks. As such it may be necessary to increase the number of dispatcher and SSL tasks based on data collected from the class(4) statistics trace.

When increasing the key-size of the z/OS queue manager's certificate, monitor the SSL task usage, particularly the elapsed time, as well as the usage of the Cryptographic hardware.

Finally, if CryptoExpress hardware is not available for use with MQ TLS-enabled channels, the CPU and elapsed time of the SSL tasks may be significantly higher as the cryptographic work will be performed on CPU or Central Process Assist Cryptographic Function (CPACF) at the time of the MQ connect.
