MQ for z/OS: Impact of key-size on Advanced Message Security performance

By Anthony Sharkey posted Fri January 19, 2024 07:45 AM

  

For IBM MQ configurations that use Advanced Message Security (AMS) protection, the key-size specified when the certificate is generated can have a significant impact on the performance achieved. 

To achieve the best performance on MQ for z/OS with queues protected by AMS policies, cryptographic hardware (Crypto Express) cards should be made available.

Cryptographic hardware coupled with CPACF (CP Assist for Cryptographic Functions) offers the best options for offloading the expensive asymmetric and symmetric functions, including data encryption.

Whilst the volume of workload involving MQ messages protected by AMS policies can be limited by the available CPU, it is also important to ensure that sufficient cryptographic resources are available.

The purpose of this blog is to demonstrate the impact that changing the key-size has on a particular known workload. These impacts include:

  • Transaction throughput
  • Cost per transaction
  • When more cryptographic hardware resources might be required.

The blog does not intend to suggest what key-size is appropriate for your individual environment.

Cryptographic Hardware on an IBM z16

Crypto Express adapters can be configured as a coprocessor for secure key operation or as an accelerator for clear key RSA operations.

Cryptographic algorithms can be implemented in both software and specialized hardware. A hardware solution is often desirable because it provides these advantages:

  • More secure protection to maintain the secrecy of keys
  • Greater transaction rates.

As the data later in this blog shows, the AMS protection offered by MQ Advanced for z/OS can exploit both coprocessor and accelerator features, so our performance test environments are configured with both types of cryptographic hardware support.

RMF Cryptographic report

The RMF Crypto Hardware Activity report is generated as part of long-term overview reporting by the postprocessor application ERBRMFPP with the option “REPORTS(CRYPTO)”. It indicates the type(s) of cryptographic processor available to the system and provides statistics on their use, including the request rate, execution time and utilization percentage.

An extract from an example RMF Cryptographic Hardware report is provided below:

RMF Cryptographic Hardware report - example

The report tells us that there are 2 CEX8C (coprocessors) that are each approximately 21% busy for the interval. Each request processed by the coprocessor is taking approximately 0.596 milliseconds.

Additionally, the report shows that there is 1 CEX8A (accelerator) that is 46.6% utilized for the interval with an average request time of 0.138 milliseconds.

The Cryptographic Accelerator section of the report provides further detail: ME-format RSA operations with key length 2048 account for 2706 of the 3382 operations per second, with an average execution time of 0.046 milliseconds.

Additionally, the largest usage of the Cryptographic Accelerator is the CRT-format RSA operation with key length 2048, which whilst only performing 676 of the total 3382 operations per second, has a significantly longer average execution time of 0.507 milliseconds.
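The figures quoted from the example report can be cross-checked with some simple arithmetic. The following Python sketch multiplies each function's rate by its average execution time to estimate how many milliseconds of accelerator time are consumed per second. It assumes that utilization can be approximated as busy time divided by elapsed time; that is our simplification for illustration, not a description of how RMF calculates the value.

```python
# Per-function accelerator figures quoted from the example report:
# (operations per second, average execution time in milliseconds).
accelerator_ops = {
    "RSA ME 2048": (2706, 0.046),
    "RSA CRT 2048": (676, 0.507),
}


def busy_ms_per_second(ops):
    """Milliseconds of accelerator time consumed per elapsed second, by function."""
    return {name: rate * exec_ms for name, (rate, exec_ms) in ops.items()}


busy = busy_ms_per_second(accelerator_ops)
total_rate = sum(rate for rate, _ in accelerator_ops.values())
total_busy_ms = sum(busy.values())

# Rate-weighted average execution time across both functions.
average_exec_ms = total_busy_ms / total_rate

# Assumption: utilization is approximated as busy time / elapsed time.
utilization_pct = total_busy_ms / 1000 * 100

print(f"total rate: {total_rate} ops/sec")
print(f"average EXEC TIME: {average_exec_ms:.3f} ms")
print(f"estimated utilization: {utilization_pct:.1f}%")
for name, ms in busy.items():
    print(f"{name}: {ms:.1f} ms busy per second")
```

With these inputs the weighted average comes out at about 0.138 milliseconds and the estimated utilization at roughly 46.7%, which is close to the values in the report summary. It also shows why the CRT-format operations dominate the accelerator usage despite their lower rate.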

How do I check the size of my key?

In ISPF you can issue:

TSO RACDCERT ID(<user>) LIST(LABEL('<my label>'))

This will return data including:

Digital certificate information for user <user>:

  Label: <my label>

  Certificate ID: …

 

  Key Type: RSA

  Key Size: 1024

  Private Key: YES
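If you need to check many certificates, the key type and size can be picked out of the listing programmatically. The following Python sketch assumes the command output has already been captured as text (how that is done is not covered here) and that the fields are laid out as in the excerpt above. The function name `parse_key_info` is purely illustrative.

```python
import re


def parse_key_info(listing: str) -> dict:
    """Extract the 'Key Type' and 'Key Size' fields from captured listing text."""
    info = {}
    for line in listing.splitlines():
        match = re.match(r"\s*(Key Type|Key Size):\s*(\S+)", line)
        if match:
            info[match.group(1)] = match.group(2)
    return info


# Sample text based on the excerpt above; placeholders are kept as-is.
sample = """\
  Label: <my label>
  Key Type: RSA
  Key Size: 1024
  Private Key: YES
"""

info = parse_key_info(sample)
key_size = int(info["Key Size"])
print(info["Key Type"], key_size)
```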

 

What data does MQ for z/OS publish?

The end-of-LTS-release performance reports, such as “MQ for z/OS 9.3”, can be found on the MQ performance GitHub page. Each report provides a snapshot of our regression tests and allows comparison of the currently supported MQ releases on the latest hardware available to us at the time of publishing.

The section on Advanced Message Security (AMS) performance offers a comparison of the three types of protection, namely Integrity, Privacy and Confidentiality, with the latter allowing for symmetric key reuse over a sequence of messages.

The data in this section of the performance report is based upon using certificates with a key-size of 1024 bits.

Why run MQ performance tests with keys smaller than the system default?

When running performance tests against MQ for z/OS, we are primarily focused on the performance of MQ rather than being constrained by hardware limitations. As such our AMS tests typically use the smallest supported key-size (1024 bits) rather than the z/OS default (2048 bits) or indeed the maximum 4096 bits that is currently supported. This also allows us to stress MQ Advanced for z/OS with fewer cryptographic hardware resources.

Our test configuration

As Appendix B – System Configuration of the MQ for z/OS 9.3 performance report stated, our performance test system is configured with 2 Crypto Express Coprocessors on each LPAR and 1 Crypto Express Accelerator that is shared across the sysplex.

Since that report was published, our system has moved from IBM z15 with Crypto Express7S to an IBM z16 with Crypto Express8S cards.

For the purposes of this blog, a simple request/reply workload is configured using a single z/OS queue manager (MQ for z/OS 9.3) with a pair of queues protected by AMS security policies. In this scenario, there is 1 requester task and 1 server task that run as long-running batch tasks.

The AMS policies defined are:

  • Integrity – Messages digitally signed with SHA256.
  • Privacy – Messages digitally signed with SHA256 and encrypted with AES256.
  • Confidentiality – Messages encrypted with AES256, and the key can be re-used:

o   For 1 message

o   For 32 messages

o   For 64 messages

o   For an unlimited number of messages while the application remains connected.

In this configuration a transaction is defined as:

  • Requester puts a message to the protected request queue,
  • Server gets the message from the protected request queue and puts the message onto the protected reply queue,
  • Requester then gets the message from the protected reply queue.

This means that each transaction involves 2 MQPUTs involving the signing and/or encryption of the messages and 2 MQGETs which validate and/or decrypt the messages from the queues.

Later in this blog, we will compare the performance of these workloads with the following key sizes:

  • RSA 1024, 2048 and 4096 bits

When calculating the cost of the transaction, the costs are based on the application address spaces, as well as the MQ MSTR and MQ AMSM subsystems.

Impact on transaction cost

Changing the certificate key-size can affect the transaction cost of AMS workloads, particularly with smaller messages. However, the impact is exaggerated in these example measurements because they are micro-benchmarks with very simple processing requirements.

2KB workloads:

With Confidentiality using key reuse 32 or higher, the impact of larger key-sizes on the transaction cost is insignificant as there is less reliance on the cryptographic hardware.

By contrast, Integrity, Privacy and Confidentiality with key reuse 1:

  • Migrating key size from 1024 to 2048: Cost increases by up to 10%
  • Migrating key size from 2048 to 4096: Cost increases by up to 60%

The first graph compares the cost per transaction for the different workload types when a 2KB non-persistent message is used.

Transaction cost using 2KB non-persistent messages

If we compare the transaction cost of the privacy workload where the cost has increased most significantly and break the costs down by address space, we can plot the following chart:

Transaction cost by address space for AMS Privacy - using 2KB non-persistent messages

What this chart shows is that most of the increase is incurred in the MQ MSTR address space. 

The reason for the increase in cost is largely down to the increasing response times from the cryptographic hardware which results in the transaction rate decreasing (see next section) and generally the MQ MSTR address space incurring higher “administration” costs per transaction.

 

64KB workloads:

With 64KB messages, a similar pattern of performance to that of the 2KB workloads can be observed although the impact is smaller.

With Confidentiality using key reuse 32 or higher, the impact of larger key-sizes on the transaction cost is insignificant.

Integrity, Privacy and Confidentiality with key reuse 1 show that:

  • Migrating key size from 1024 to 2048: Cost increases by up to 4%
  • Migrating key size from 2048 to 4096: Cost increases by up to 30%

The following graph offers a comparison of the transaction cost of a 64KB non-persistent messaging workload:

Transaction cost of 64KB non-persistent workload

 

As with the 2KB workload, the primary contributor of the increase in transaction cost is the MQ MSTR address space.

4MB workloads:

With the increased cost of processing larger messages, there is less impact to the workload from the increase in the certificate key-size.

The following chart shows that even for Integrity, Privacy and Confidentiality with key reuse 1, the impact of larger keys is less than 4%.

Transaction cost of 4MB non-persistent workload

Impact on throughput

As alluded to in the “Impact on transaction cost” section, the longer EXEC TIMEs seen with larger key sizes extend the round-trip time of a transaction. As a result, larger key sizes in a single-threaded transaction model can significantly degrade the throughput.

The section following this one reviews the total EXEC TIME for our particular transaction model for each of the AMS protection types, whilst this section considers the overall performance because of the changes to certificate key-size.

The impact of the key size on the throughput affects almost all the AMS measurements although larger messages and key reuse of 32 or more are less affected.

2KB transaction rate:

The impact on transaction rate due to the larger key size can be observed with all configurations except with AMS Confidentiality reusing the key unlimited times.

As the chart below demonstrates, AMS Integrity throughput decreases from 1342 to 122 transactions per second, equivalent to a 91% drop, when using a 4096-bit key size rather than a 1024-bit key.
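To relate the drop in transaction rate to round-trip time, the arithmetic can be sketched in Python as follows. It assumes that the single requester has only one message in flight at a time, so the average round-trip time is simply the reciprocal of the transaction rate; that is a simplification of our test model rather than a measured value.

```python
def percent_drop(before: float, after: float) -> float:
    """Percentage decrease going from 'before' to 'after'."""
    return (before - after) / before * 100


def round_trip_ms(transactions_per_second: float) -> float:
    """Average elapsed time per transaction, assuming one request in flight."""
    return 1000 / transactions_per_second


# AMS Integrity transaction rates with 2KB messages, as quoted above.
rate_1024 = 1342
rate_4096 = 122

drop = percent_drop(rate_1024, rate_4096)
rt_1024 = round_trip_ms(rate_1024)
rt_4096 = round_trip_ms(rate_4096)

print(f"throughput drop: {drop:.1f}%")
print(f"round trip with 1024-bit key: {rt_1024:.3f} ms")
print(f"round trip with 4096-bit key: {rt_4096:.3f} ms")
```

Under that assumption, the round trip grows from roughly 0.75 milliseconds to roughly 8.2 milliseconds per transaction, an 11-fold increase.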

Transaction rate achieved when using 2KB non-persistent messages

64KB transaction rate:

The impact of the certificate's key-size is less with 64KB messages than with 2KB messages. For example, with key-size 4096, AMS Confidentiality with key reuse 32 can sustain approximately 50% of the rate achieved when using a key-size of 1024. With 2KB messages, the equivalent 4096-bit measurement was able to sustain only 25% of the 1024-bit rate.

Transaction rate achieved when using 64KB non-persistent messages

4MB transaction rate:

For AMS workloads using larger messages, such as 4MB, the impact of the key-size specified in the certificate is much less significant than for smaller messages but still can affect the transaction rates by up to 30% (AMS Privacy).

Transaction rate achieved with 4MB non-persistent messages

Impact of cryptographic EXEC TIME on the MQ transaction

Whilst the size of the key makes a significant difference to the MQ transaction rate, particularly with higher-volume small messages, the typical EXEC TIME reported on the RMF Crypto Hardware Activity report remains similar regardless of message size.

The type of AMS workload using the cryptographic hardware is more of a factor and as such, the following 3 charts show the total cryptographic time per transaction. These values are based on the RMF Cryptographic Hardware Activity reports’ “EXEC TIME” fields multiplied by the number of cryptographic hardware requests for each transaction.
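The calculation behind these charts can be sketched in Python as follows. The execution times and request counts in the example are hypothetical placeholders chosen only to show the arithmetic; they are not taken from our measurements, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CryptoFunction:
    name: str
    exec_time_ms: float      # "EXEC TIME" reported for this function
    requests_per_txn: float  # hardware requests issued per MQ transaction


def total_crypto_ms_per_txn(functions) -> float:
    """Sum of EXEC TIME multiplied by requests per transaction."""
    return sum(f.exec_time_ms * f.requests_per_txn for f in functions)


# Hypothetical values for illustration only.
example = [
    CryptoFunction("coprocessor", 0.6, 2),
    CryptoFunction("accelerator RSA ME", 0.05, 4),
    CryptoFunction("accelerator RSA CRT", 0.5, 1),
]

total = total_crypto_ms_per_txn(example)
print(f"total cryptographic time per transaction: {total:.2f} ms")
```

The number of requests per transaction could be estimated by dividing the request rate in the RMF report by the measured transaction rate for the same interval.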

Total Cryptographic EXEC TIME for AMS Integrity workload:

AMS Integrity workload - total Cryptographic execution time

 

The AMS Integrity workload relies primarily on the cryptographic coprocessor, but can exploit the accelerators, specifically the ME (Modulus-Exponent) format functions, which typically have shorter execution times.

Total Cryptographic EXEC TIME for AMS Privacy workload:

Total execution time in cryptographic hardware with AMS privacy workload

 

The AMS Privacy workload relies almost equally on the cryptographic coprocessor and the accelerator but, unlike the AMS Integrity workload, relies more heavily on the CRT (Chinese Remainder Theorem) format functions than the ME format functions in the accelerator.

Total Cryptographic EXEC TIME for AMS Confidentiality with key reuse 1 workload:

AMS Confidentiality with key reuse 1 and the total execution time spent in cryptographic hardware

The AMS Confidentiality workload relies heavily on the accelerator’s CRT function with less ME function use and very little use of the coprocessor – which is used at MQOPEN.

With increased successful AMS key reuse, the time spent on the cryptographic hardware is spread across the number of transactions covered by the key reuse value.

Capacity planning

The RMF Cryptographic hardware report provides usage information for adapters configured as coprocessors as well as those configured as accelerators.

The accelerator section of the report provides an overall summary as well as separating “ME” type requests from “CRT” type requests.

Due to the way the report is written, the summary usage is capped at 100% utilized per accelerator, even though the sum of the “ME” type and “CRT” type requests may exceed 100% - as demonstrated in the extract below:

RMF Cryptographic accelerator report showing more than 100% usage

In this report, the summary shows the accelerator as 100% utilized.

The RSA ME 2048 row suggests that it is 73.8% utilized and the RSA CRT 2048 row suggests that it is 78.9% utilized, for a total of 152.7%.

A cryptographic hardware adapter configured as an accelerator can process both ME and CRT type requests – but there is some common processing which allows the by-function part of the report to give numbers that when added together can exceed 100%.

As a result, it is difficult to know precisely when the card is running at absolute capacity.

The summary section of the report is designed to be somewhat defensive and is capped at 100%.

When the summary section of the report indicates that the 100% value is met, then it is advisable to provision additional accelerator resources.
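As a minimal sketch of that guidance, the following Python function takes the summary utilization and the by-function values from the report and flags when the summary has reached its cap. The function name and the dictionary layout are assumptions made for illustration; the values are those from the extract above.

```python
def check_accelerator(summary_util_pct: float, by_function_util_pct: dict):
    """Return (at_capacity, combined), where combined is the by-function sum."""
    combined = sum(by_function_util_pct.values())
    # The summary is capped at 100%, so reaching it is the signal to act on.
    at_capacity = summary_util_pct >= 100.0
    return at_capacity, combined


by_function = {"RSA ME 2048": 73.8, "RSA CRT 2048": 78.9}
at_capacity, combined = check_accelerator(100.0, by_function)

print(f"sum of by-function utilization: {combined:.1f}%")
if at_capacity:
    print("summary is at 100%: consider provisioning additional accelerator resources")
```

The by-function sum is printed for information only. Because of the common processing between the ME and CRT functions, it should not be read as a precise measure of how far over capacity the card is.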

Similarly, it is worth considering the type of workloads that use your Cryptographic hardware resources – RMF may show the resources being lightly loaded but queuing may occur if multiple tasks are making requests at the same time. As such it may be advisable to ensure there is sufficient resource for parallel requests.

Non-RSA certificates

As the RACDCERT GENCERT documentation states, RSA is not the only key type available on z/OS.

If we wanted to use shorter keys of the ECC type such as NISTECC, the documentation states:

RSA and NISTECC key size mapping

So, how do NISTECC with key size 192, 224 and 521 bits compare?

Whilst NISTECC with 521 bits may seem an unfair comparison, as it equates to an RSA key size of 15360 bits, it is the maximum supported for NISTECC, just as 4096 bits is the maximum supported for RSA.

It is important to note, as per the KEYUSAGE information in the RACDCERT GENCERT documentation, that NISTECC-type keys do not support the DATAENCRYPT option. With AMS, they can therefore only be used for Integrity-type protection.

AMS Integrity performance with NISTECC certificates

In our measurements where there is a direct comparison in key strength, e.g. RSA 1024 and NISTECC 192, or RSA 2048 and NISTECC 224, we saw better performance from the RSA configurations than the NISTECC configurations.

As an example, with 2KB messages NISTECC 192-bits achieved up to 67% lower throughput than RSA 1024-bits.

When comparing NISTECC 521-bits, which has the equivalent of RSA 15360, NISTECC was able to achieve up to 22% higher throughput than the RSA 4096-bits configuration.

The table below provides an indication of the percent change in throughput achieved by NISTECC compared to RSA:

| Comparison             | 2KB  | 64KB | 4MB |
| NISTECC 192 v RSA 1024 | -67% | -57% | -5% |
| NISTECC 224 v RSA 2048 | -33% | -29% | -3% |
| NISTECC 521 v RSA 4096 | +22% | +22% | +5% |

The lower transaction rate, particularly for NISTECC 192-bits, does result in higher transaction costs in the MQ MSTR address space, again due to MQ administration tasks having a larger impact at the slower transaction rate.

NISTECC and Cryptographic hardware

The largest factor affecting NISTECC’s performance with an AMS Integrity workload is the execution time in the cryptographic hardware.

Unlike requests for RSA-type certificates, NISTECC requests are limited to the Crypto Express coprocessor.

The following chart compares the EXEC TIME per transaction of the RSA and NISTECC-type certificates.

AMS Integrity workload comparing RSA and NISTECC certificates

For the smaller key-sizes, including z/OS 3.1’s default for RSA of 2048-bits, RSA performs better than NISTECC at the comparable key-size. However, using NISTECC 521-bits does provide better EXEC TIMEs than RSA 4096-bits, so if a larger key size is required for an AMS Integrity workload, then a NISTECC certificate type may be of benefit.

Summary

This blog does not suggest that smaller key-sizes should be used with certificates for AMS workloads simply to give the best response times.

The key-size of the certificate may be mandated by business and security requirements, and that clearly overrides performance benefits. Hopefully this blog provides some guidance as to what to consider should you need to increase your certificate key-size, i.e. potentially more cryptographic hardware resources and the ability to scale up the workload.

The blog has focused on performance of a very simple transaction with specific requirements as to how the message data is serviced. Some environments will be able to mitigate the increased round-trip times by parallelizing the applications.

Finally, it is always worth measuring applications in your own environment, as your requirements and configuration may differ from our very specific configuration. Do consider the load and usage of the cryptographic hardware: for a workload protected by AMS policies, this is potentially just as important as the traditional bottlenecks of general-purpose CPUs, disk response time and network.

