MQ for z/OS: Impact of AMS policy protection on MQPUT1

By Anthony Sharkey posted Fri January 21, 2022 09:03 AM

  

I recently received a question about why the performance of a transaction using MQPUT1 degraded so much when an AMS Confidentiality policy was applied to the queue. This blog aims to answer that question.

 

What is Advanced Message Security (AMS)?

IBM MQ Advanced Message Security (IBM MQ AMS) uses a public key cryptography model to protect sensitive data flowing through the MQ network, and offers different levels of protection.

 

IBM MQ AMS provides three qualities of protection: Integrity, Privacy and Confidentiality.

 

Integrity protection is provided by digital signing, which provides assurance on who created the message, and that the message has not been altered or tampered with.

 

Privacy protection is provided by a combination of digital signing and encryption. Encryption ensures that message data is viewable by only the intended recipient, or recipients.

 

Confidentiality protection is provided by encryption only. Confidentiality policies allow for symmetric key reuse over a sequence of messages.

 

Why call out MQPUT1?

When using the MQPUT1 verb, AMS Confidentiality is unable to reuse the symmetric key and loses some of the performance benefits which are discussed in the MQ performance report “MQ for z/OS V920 Performance Report”.

What basic performance do we see when issuing MQPUT1 and MQCMIT?

In this initial measurement we have configured a queue manager on z/OS running MQ Advanced for z/OS V920.

 

The measurements are run on our performance IBM z15 running z/OS v2r4, configured with 3 dedicated CPUs and access to dedicated CryptoExpress7S processors.

 

The LPAR has access to dedicated cryptographic processors configured thus:

  • 2 co-processors
  • 2 accelerators

 

The queue manager is configured with 1 million pages in the buffer pool and a page set that is large enough to contain the workload without resorting to page set expansion.

 

There is a single batch application that generates a 1KB non-persistent message and then in a tight loop issues MQPUT1 followed by MQCMIT. This tight loop is repeated 250,000 times.

 

The following table shows the costs by address space per transaction of a set of configurations as well as the achieved transaction rate:

 

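Costs are in CPU uSeconds per iteration. Idle is the round-trip time minus the total cost.

| Configuration | MQ MSTR cost | MQ AMSM cost | Application cost | Total cost | Transaction rate (per second) | Round-trip time (uSeconds) | Idle (uSeconds) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BASELINE | 2 | 0 | 10 | 12 | 67905 | 14.7 | 2.7 |
| AMS INTEGRITY | 3 | 99 | 22 | 124 | 2970 | 337 | 213 |
| AMS PRIVACY | 3 | 136 | 22 | 161 | 2305 | 434 | 273 |
| AMS CONFIDENTIALITY | 3 | 116 | 21 | 140 | 4951 | 202 | 62 |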

           

Notes on table:

  • Measurements were run on AMS Confidentiality using key reuse values of 1, 32 and “*” with no difference to performance.
  • AES256 was specified for data encryption.
  • SHA256 was specified for data signing.
  • Round-trip time is the number of microseconds in a second (1,000,000) divided by the transaction rate.
  • Idle time is based on subtracting the CPU cost per transaction from round trip time.
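
The two derived columns can be reproduced from the measured values. The following is a minimal sketch rather than the tooling used for the measurements; the function names are illustrative, and the inputs are the baseline row from the table above.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def round_trip_us(transactions_per_second):
    """Average elapsed microseconds per transaction for a single-threaded loop."""
    return MICROSECONDS_PER_SECOND / transactions_per_second


def idle_us(transactions_per_second, total_cpu_cost_us):
    """Elapsed time per transaction that is not accounted for by CPU cost."""
    return round_trip_us(transactions_per_second) - total_cpu_cost_us


# Baseline row: 67905 transactions per second, 12 CPU microseconds per transaction.
print(round(round_trip_us(67905), 1))  # 14.7
print(round(idle_us(67905, 12), 1))    # 2.7
```

The same calculation applied to the AMS Confidentiality row (4951 transactions per second, 140 CPU microseconds) gives the 62 microseconds of idle time shown in the table.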

For the purposes of this investigation, we are primarily looking at the impact of AMS Confidentiality on the MQPUT1 request, but have included AMS Privacy and AMS Integrity for completeness.

 

Comparing the baseline performance of MQPUT1 with the performance when putting to a queue with an AMS Confidentiality policy applied we see:

  • The cost in the AMSM region increases by 116 CPU microseconds per transaction.
  • The cost in the application region increases by 11 CPU microseconds, from 10 to 21.
  • The cost in the MQ MSTR region increases by 1 CPU microsecond.

In this instance we are seeing a cost increase of 128 CPU microseconds per transaction, but the amount of time spent “idle” increases from 2.7 to 62 microseconds.

 

We know from previous investigations into AMS performance that the CPU time is spent on essential processing, so the CPU cost is relatively static for this type of workload on this particular CPU configuration. But what about the “idle” time?

 

In this configuration, idle time may vary for a number of reasons including:

  1. Waiting for CPU.
  2. Waiting for cryptographic processing performed on CryptoExpress hardware, including queuing due to the cryptographic hardware being busy processing requests for other tasks.
  3. MQ Commit processing – waiting for commit to complete.

 

In order to minimize the wait for CPU and cryptographic hardware, the workload is single threaded on a z/OS LPAR with low CPU utilization.

 

In a system with multiple applications running, there may be some time spent waiting both for CPU and for the CryptoExpress hardware to complete its processing.

What is happening in the “idle” time?

For the baseline tests where no AMS protection is applied, the idle time, i.e. the difference between the round-trip time and the cost of the transaction, is minimal and can largely be attributed to the elapsed time to complete the MQCMIT.

 

In the AMS Confidentiality measurement, we also see some idle time accrued for the MQCMIT, but this is just a small percentage of the overall time.

 

In addition we are also making requests to the CryptoExpress hardware as per the following RMF report:

[Figure: RMF Cryptographic Hardware Report]

Details of this report can be found at: https://www.ibm.com/docs/en/zos/2.4.0?topic=cchar-contents-report

 

What does the report tell us?

Co-processor:

The report shows that we have 2 CEX7C (Co-processors) that are processing at a combined rate of 105.54 requests per second, with an average response time of 138 microseconds and they are each lightly utilized (0.7%).

For the interval of 54.727 seconds, this means that the co-processors have processed 5776 requests.

As the system is being used solely for this measurement we know that we processed 250,000 MQPUT1 requests in this interval, which suggests that we made 1 call to the co-processor every 43 transactions.

Given the average EXEC time of 138 microseconds, we can determine the impact of these co-processor requests to be 3.2 microseconds per MQPUT1.

 

Accelerator:

The report shows that the MQPUT1 workload has used the accelerator hardware at a more frequent rate than the co-processors.

In this instance the report shows 9129 requests per second with an average response time of 24 microseconds, and a heavier usage of the resource (22.2%).

For the interval of 54.727 seconds, this means that the accelerators have processed 499,603 requests, which is approximately 2 requests per MQPUT1.

For 2 accelerator requests per MQPUT1, this equates to 48 microseconds spent per transaction.

 

Adding together the average time spent in the cryptographic co-processor and accelerator per MQPUT1 gives 51.2 microseconds, which is a significant proportion of the 62 microseconds of idle time.
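
The arithmetic behind these per-MQPUT1 figures can be sketched as follows. This is an illustrative calculation only, assuming the request rates, response times and interval length quoted from the RMF report above; the function names are not part of any RMF or MQ interface.

```python
INTERVAL_SECONDS = 54.727   # RMF reporting interval
PUTS = 250_000              # MQPUT1 requests issued during the interval


def requests_in_interval(rate_per_second, interval_seconds=INTERVAL_SECONDS):
    """Total requests handled by a crypto feature during the interval."""
    return rate_per_second * interval_seconds


def crypto_time_per_put_us(rate_per_second, avg_exec_us, puts=PUTS):
    """Average crypto hardware time attributable to each MQPUT1."""
    return requests_in_interval(rate_per_second) * avg_exec_us / puts


coprocessor_us = crypto_time_per_put_us(105.54, 138)
accelerator_us = crypto_time_per_put_us(9129, 24)

print(round(requests_in_interval(105.54)))         # 5776 co-processor requests
print(round(PUTS / requests_in_interval(105.54)))  # 43 MQPUT1s per co-processor request
print(round(coprocessor_us, 1))                    # 3.2
print(round(accelerator_us, 1))                    # 48.0
print(round(coprocessor_us + accelerator_us, 1))   # 51.2
```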

 

MQCMIT – Log I/O time:

From the MQ class(3) accounting data we also know that the MQCMIT is taking an average of 9 microseconds.

In this particular configuration, the time spent in cryptographic hardware plus log I/O time equates to 60.2 of 62 microseconds.

 

When adding AMS Confidentiality to this particular workload, we can attribute the increase in round-trip times to:

  • Increased CPU cost in AMSM and Application for cryptographic key processing and encryption
  • Time spent waiting for requests to the CryptoExpress co-processors and accelerators to complete
  • Less busy z/OS LPAR leading to longer elapsed times for commit (cache is less “hot”)

What other factors should I consider?

Impact of key size

In the previous measurements using AMS, the size of the private key used in the RACF certificates was set to 1024 bits. An increased key size will have a detrimental effect on the response times of the cryptographic hardware.

  

Key size and AMS Confidentiality-protected queues

In the following measurements we repeated the AMS Confidentiality measurement using key sizes of 2048 and 4096 bits in addition to the 1024 used previously.

The table reports only the AMS Confidentiality configuration with a range of key sizes.

 

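Costs are in CPU uSeconds per iteration. Idle is the round-trip time minus the total cost.

| Size of private key (bits) | MQ MSTR cost | MQ AMSM cost | Application cost | Total cost | Transaction rate (per second) | Round-trip time (uSeconds) | Idle (uSeconds) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1024 | 3 | 116 | 21 | 140 | 4951 | 202 | 62 |
| 2048 | 3 | 120 | 24 | 147 | 3891 | 257 | 110 |
| 4096 | 3 | 122 | 24 | 149 | 2365 | 422 | 273 |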

 

This table shows that there is a small impact to the transaction cost of increasing the private key size from 1024 to either 2048 or 4096 bits but there is a much larger impact to the transaction rate, and by implication to the “idle” time.

 

Why the increase in idle time?

Once more, we look at the RMF Cryptographic reports for the answer.

 

For a private key size of 2048:

Cryptographic co-processor response time remains at 138 microseconds per request but we are seeing an increased number of requests: approximately 1 in 26 MQPUT1 calls results in a co-processor request. This means the impact to an MQPUT1 of the co-processor request is now 5 microseconds.

 

With regards to the accelerator usage, there are still 2 requests per MQPUT1 but the average response time has increased to 46 microseconds per request, or a total of 92 microseconds per MQPUT1.

This means that with a private key size of 2048 bits, the time spent in CryptoExpress hardware increases from 51.2 to 97 microseconds.

Including 9 microseconds of elapsed time per commit, we account for 106 of the 110 idle microseconds per transaction.

 

For a private key size of 4096 bits:

Cryptographic co-processor response time increases slightly to 139 microseconds per request but again we see an increased number of requests: approximately 1 in 14 MQPUT1 requests results in a co-processor request. This means the impact to each MQPUT1 of the co-processor request is now 10 microseconds.

With regards to the accelerator usage, there are still 2 requests per MQPUT1 but the average response time has increased to 131 microseconds per request, or a total of 262 microseconds per MQPUT1.

This means that with a private key size of 4096 bits, the time spent in CryptoExpress hardware increases to 272 microseconds.

 

Summary of CryptoExpress7S usage for AMS Confidentiality with varied key size:

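| Private key size (bits) | Co-processor requests / MQPUT1 | Co-processor time spent / MQPUT1 (microseconds) | Accelerator requests / MQPUT1 | Accelerator time spent / MQPUT1 (microseconds) | TOTAL (microseconds) | Idle time (microseconds) |
| --- | --- | --- | --- | --- | --- | --- |
| 1024 | 1 in 43 | 3.2 | 2 | 48 | 51.2 | 62 |
| 2048 | 1 in 26 | 5 | 2 | 92 | 97 | 110 |
| 4096 | 1 in 14 | 10 | 2 | 262 | 272 | 273 |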

 

As the measurements show, private key size is an important factor that affects the performance of MQPUT1’s asymmetric key processing with an AMS Confidentiality policy applied to the MQ queue.

 

In these AMS Confidentiality measurements, the co-processor and accelerator execution time accounts for nearly all of the idle time in the transaction round-trip time.
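
As a rough cross-check of that statement, the sketch below recomputes the time spent in cryptographic hardware per MQPUT1 from the response times and request frequencies quoted above, and expresses it as a share of the measured idle time. The dictionary layout and function name are illustrative assumptions; because the inputs are rounded averages, the results only approximately match the table.

```python
# key size -> (co-processor response time in us,
#              MQPUT1 calls per co-processor request,
#              accelerator response time in us,
#              accelerator requests per MQPUT1,
#              measured idle time in us)
MEASUREMENTS = {
    1024: (138, 43, 24, 2, 62),
    2048: (138, 26, 46, 2, 110),
    4096: (139, 14, 131, 2, 273),
}


def crypto_wait_us(cop_response_us, puts_per_cop_request,
                   acc_response_us, acc_requests_per_put):
    """Average time per MQPUT1 spent in co-processor plus accelerator."""
    coprocessor = cop_response_us / puts_per_cop_request
    accelerator = acc_response_us * acc_requests_per_put
    return coprocessor + accelerator


for key_bits, (cop, per, acc, count, idle) in MEASUREMENTS.items():
    crypto = crypto_wait_us(cop, per, acc, count)
    share = 100 * crypto / idle
    print(f"{key_bits} bits: {crypto:.1f} us in crypto hardware, "
          f"{share:.0f}% of {idle} us idle")
```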

 

Key size on AMS Integrity-protected queues

As with AMS Confidentiality, the original AMS Integrity configuration was measured with a private key size of 1024.

To demonstrate the impact of larger private key sizes, we have repeated the measurements with key sizes set to both 2048 and 4096 bits. 

The following table reports only the AMS Integrity configuration with a range of key sizes.

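Costs are in CPU uSeconds per iteration. Idle is the round-trip time minus the total cost.

| Private key size (bits) | MQ MSTR cost | MQ AMSM cost | Application cost | Total cost | Transaction rate (per second) | Round-trip time (uSeconds) | Idle (uSeconds) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1024 | 3 | 99 | 22 | 124 | 2970 | 337 | 213 |
| 2048 | 3 | 100 | 24 | 127 | 1355 | 737 | 610 |
| 4096 | 3 | 104 | 25 | 132 | 261 | 3825 | 3693 |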

 

This table shows again that there is only a small impact to the transaction cost of increasing the key size from 1024 to either 2048 or 4096 but there is a much larger impact to the transaction rate, and by implication to the “idle” time.

 

In the case of AMS Integrity, we observe no usage of the CryptoExpress7S Accelerator but do see requests made to the CryptoExpress7S Co-processor – and there is 1 request per MQPUT1.

 

The key size makes a significant difference to the execution time of the co-processor request:

| Private key size (bits) | Co-processor requests / MQPUT1 | Co-processor time spent / MQPUT1 (microseconds) | Idle time from previous table (microseconds) |
| --- | --- | --- | --- |
| 1024 | 1 | 204 | 213 |
| 2048 | 1 | 603 | 610 |
| 4096 | 1 | 3684 | 3693 |

 

In the AMS Integrity configuration, the co-processor execution time accounts for nearly all of the idle time in the transaction round-trip time.

 

Key size on AMS Privacy-protected queues

As with AMS Confidentiality and AMS Integrity, the original AMS Privacy configuration was measured with a private key size of 1024 bits.

 

To demonstrate the impact of larger key sizes, we have repeated the measurements with key sizes set to both 2048 and 4096 bits.

The following table reports only the AMS Privacy configuration with a range of private key sizes.

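Costs are in CPU uSeconds per iteration. Idle is the round-trip time minus the total cost.

| Private key size (bits) | MQ MSTR cost | MQ AMSM cost | Application cost | Total cost | Transaction rate (per second) | Round-trip time (uSeconds) | Idle (uSeconds) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1024 | 3 | 136 | 22 | 161 | 2305 | 434 | 273 |
| 2048 | 3 | 141 | 25 | 169 | 1127 | 886 | 717 |
| 4096 | 7 | 147 | 27 | 181 | 233 | 4291 | 4010 |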

 

Once more, this table shows that there is a small impact to the transaction cost of increasing the key size from 1024 to either 2048 or 4096 bits but there is a much larger impact to the transaction rate, and by implication to the “idle” time.

In the case of AMS Privacy, we observe 1 request per MQPUT1 in the CryptoExpress7S Co-processor – and 2 requests per MQPUT1 in the CryptoExpress7S accelerator.

For AMS Privacy, the key size makes a significant difference to the execution time of both the co-processor and accelerator requests:

 

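| Private key size (bits) | Co-processor requests / MQPUT1 | Co-processor time spent / MQPUT1 (microseconds) | Accelerator requests / MQPUT1 | Accelerator time spent / MQPUT1 (microseconds) | TOTAL (microseconds) | Idle time (microseconds) |
| --- | --- | --- | --- | --- | --- | --- |
| 1024 | 1 | 201 | 2 | 48 | 249 | 273 |
| 2048 | 1 | 594 | 2 | 92 | 686 | 717 |
| 4096 | 1 | 3475 | 2 | 262 | 3737 | 4010 |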

 

In the AMS Privacy configuration, the co-processor and accelerator execution time accounts for nearly all of the idle time in the transaction round-trip time.

 

It is worth noting that with a key size of 4096 bits, the AMS Privacy test does see some additional idle time that is not accounted for by the time spent on cryptographic hardware.

This additional time was spent in page set expansion as the size of the message arriving on the queue manager exceeded 1 page per message when including MQ implementation headers.

 

Impact of AMS protection and private key size on message size

When protecting an MQ message using AMS, the size of the message actually put to the queue will typically exceed the size of the message being put by the application.

When MQ adds the AMS implementation headers and encrypts the message data, the message size increases.

 

The following table offers a guide to message size with AMS protection on the queue when putting a 1KB message. These message sizes are collected from class(3) accounting trace data:

 

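Message size on the queue, in bytes, by private key size:

| Policy type | 1024 bits | 2048 bits | 4096 bits |
| --- | --- | --- | --- |
| AMS Integrity | 2215 | 2605 | 3373 |
| AMS Privacy | 2508 | 3023 | 4047 |
| AMS Confidentiality | 1520 | 1651 | 1907 |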

Note: The table shows the size of the messages as reported by Accounting trace class(3).

For example, with an AMS Integrity policy and a key size of 1024 bits, a 1KB message was put to the queue with a size of 2215 bytes, an overhead of 1191 bytes.
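
The overhead for every combination in the table can be derived in the same way. The following is a small illustrative sketch, assuming the 1KB application message is exactly 1024 bytes (as the 1191-byte example implies); the names are hypothetical.

```python
APPLICATION_MESSAGE_BYTES = 1024  # the 1KB message put by the application

# Message sizes on the queue as reported by class(3) accounting,
# keyed by policy type and then by private key size in bits.
QUEUED_BYTES = {
    "AMS Integrity": {1024: 2215, 2048: 2605, 4096: 3373},
    "AMS Privacy": {1024: 2508, 2048: 3023, 4096: 4047},
    "AMS Confidentiality": {1024: 1520, 2048: 1651, 4096: 1907},
}


def overhead_bytes(queued_bytes, application_bytes=APPLICATION_MESSAGE_BYTES):
    """Extra bytes added by AMS headers and protection."""
    return queued_bytes - application_bytes


for policy, sizes in QUEUED_BYTES.items():
    for key_bits, queued in sizes.items():
        print(f"{policy}, {key_bits}-bit key: +{overhead_bytes(queued)} bytes")
```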

 

The effect of this is that larger messages put to MQ will potentially use additional space, whether in buffer pool, page set or logging, as well as costing additional CPU.

 

Type of key pair

When defining the RACF certificates to use with AMS with MQ for z/OS, the default definition is RSA.

 

The RACDCERT GENCERT documentation specifies that the default option stores the private key in a RACF database as an RSA key.

 

It is possible to specify RSA(PKDS(*)) which will use hardware protection but we have seen no performance benefit from using this option.

 

Alternatively, equivalent strength NISTECC-type keys can be used. These are typically shorter in length than RSA private keys; for example, an RSA key length of 1024 bits is equivalent to a NISTECC key length of 192 bits.

 

Using a shorter NISTECC key can offer some benefits to performance, particularly when using NISTECC keys of 224 bits or larger, which are equivalent to RSA keys of 2048 bits or larger.

 

Unfortunately the NISTECC key does not support a key usage value of DATAENCRYPT, which means that the NISTECC key can only be used for AMS Integrity-type policies and not ones which involve encryption of the MQ message, i.e. Privacy or Confidentiality-type policies.

 

How much difference might NISTECC make to AMS Integrity policies?

The following table is an extension of the table previously shown in “Key size on AMS Integrity-protected queues” with the performance including NISTECC keys.

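Costs are in CPU uSeconds per iteration. Idle is the round-trip time minus the total cost.

| Private key size (bits) | MQ MSTR cost | MQ AMSM cost | Application cost | Total cost | Transaction rate (per second) | Round-trip time (uSeconds) | Idle (uSeconds) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1024 | 3 | 99 | 22 | 124 | 2970 | 337 | 213 |
| 192 (NISTECC) | 3 | 94 | 23 | 120 | 2708 | 369 | 249 |
| 2048 | 3 | 100 | 24 | 127 | 1355 | 737 | 610 |
| 224 (NISTECC) | 3 | 94 | 23 | 120 | 2552 | 392 | 272 |
| 4096 | 3 | 104 | 25 | 132 | 261 | 3825 | 3693 |
| 384 (NISTECC) | 3 | 95 | 23 | 121 | 1755 | 569 | 448 |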

 

Notes on table:

  • The performance of RSA key size 1024 exceeds that of the NISTECC key size 192, so whilst they are comparable in strength, our measurement suggests RSA 1024 bits offers some benefit.
  • With larger NISTECC key sizes, the performance significantly exceeds that of the equivalent strength RSA keys.
    • For example, NISTECC keys of 384 bits, which are equivalent to RSA keys of 7680 bits, are able to achieve more than 6 times the throughput of RSA keys of 4096 bits, which is due to the reduced execution time on the co-processor.
  • As the NISTECC keys are smaller than the equivalent strength RSA keys, there is less additional data added to the message stored on the MQ queue.
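
To put numbers on these notes, the sketch below compares the transaction rates of each RSA key size with the NISTECC key size listed beside it in the table. It is an illustrative calculation only; the pairing follows the row order of the table and the function name is hypothetical.

```python
# (RSA key bits, RSA transaction rate, NISTECC key bits, NISTECC transaction rate)
# Rates are transactions per second for AMS Integrity, copied from the table above.
PAIRS = [
    (1024, 2970, 192, 2708),
    (2048, 1355, 224, 2552),
    (4096, 261, 384, 1755),
]


def throughput_ratio(nistecc_rate, rsa_rate):
    """How many times the RSA throughput the NISTECC key achieved."""
    return nistecc_rate / rsa_rate


for rsa_bits, rsa_rate, ecc_bits, ecc_rate in PAIRS:
    ratio = throughput_ratio(ecc_rate, rsa_rate)
    print(f"NISTECC {ecc_bits} vs RSA {rsa_bits}: {ratio:.2f}x")
```

A ratio below 1 (the 192-bit case) means the RSA key performed better; the 384-bit case is the "more than 6 times" figure mentioned in the notes.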

 

The following table compares the Cryptographic co-processor execution times for the RSA and NISTECC key sizes of equivalent or greater strength:

| RSA key size (bits) | Time spent / MQPUT1 (microseconds) | NISTECC key size (bits) | Time spent / MQPUT1 (microseconds) |
| --- | --- | --- | --- |
| 1024 | 204 | 192 | 241 |
| 2048 | 603 | 224 | 262 |
| 4096 | 3684 | 384 | 440 |

 

Whilst work executed on the CryptoExpress processors is not charged to the overall workload, a longer execution time means that requests from other workloads are potentially delayed waiting for the processor.

Use of Cryptographic hardware

Throughout this investigation, the measurements have relied on Cryptographic hardware features to assist with the AMS protection of messages.

 

What this has shown is that the transaction can appear to spend a significant proportion of the round-trip time in a wait or idle state. In our measurements this wait time is primarily due to the execution time of calls to the cryptographic hardware, or indeed waiting for access to said hardware.

 

It is possible to run MQ workload against AMS protected queues without having the Cryptographic features available but it is not necessarily advisable.

 

General purpose processors are less efficient at cryptographic processing than the dedicated cryptographic co-processor and accelerator features.

 

The following table demonstrates the difference in performance to the AMS Confidentiality workload when one or more of the cryptographic features are not available:

 

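Costs are in CPU uSeconds per iteration. Idle is the round-trip time minus the total cost.

| Configuration | MQ MSTR cost | MQ AMSM cost | Application cost | Total cost | Transaction rate (per second) | Round-trip time (uSeconds) | Idle (uSeconds) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cryptographic H/W available | 3 | 116 | 21 | 140 | 4951 | 202 | 62 |
| Accelerator only | 3 | 115 | 24 | 140 | 4932 | 203 | 63 |
| Co-processor only | 3 | 116 | 24 | 142 | 2656 | 376 | 234 |
| No cryptographic H/W available | 3 | 337 | 21 | 361 | 2732 | 366 | 5 |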

 

Notes on table:

  • If only one type of cryptographic feature is enabled, whether co-processor or accelerator, the cryptographic work will be run on that available feature.
    • For AMS Confidentiality protected queues, the work is served more efficiently on an accelerator. As a guide, on our system each accelerator request took 24 microseconds but when run on a co-processor the equivalent execution time was 111 microseconds, for a total of 222 microseconds per MQPUT1.
  • If no cryptographic features are available, the cryptographic work will be run on the general purpose processors, at additional cost.
    • In the table, relying on general purpose CPUs increased the AMSM cost by 221 microseconds per transaction.
    • As there is no cryptographic feature to use and there is sufficient CPU in the LPAR, there is little ‘idle’ time.

 

If cryptographic hardware features are not available, the private key size will have a significant effect on the cost of the work performed by the AMSM region.

 

For example using the AMS Confidentiality configuration:

  • With a key size of 2048 bits, the AMSM cost per transaction increases from 120 microseconds to 940, nearly an 8x increase in cost.
  • With a key size of 4096 bits, the AMSM cost per transaction increases from 122 microseconds to 3268, nearly a 27x increase in cost.

 

For AMS Integrity, the cryptographic hardware usage is limited to co-processor. Should the co-processor be unavailable, we saw the AMSM cost per transaction increase from 97 microseconds to 2.5 milliseconds, an increase of nearly 26x in chargeable CPU.

 

For AMS Privacy, the cryptographic hardware uses both co-processor and accelerators. In the event of no cryptographic hardware being available, the increase to the AMSM cost per transaction was of the order 20x, as we saw a cost of 137 microseconds increase to 2.7 milliseconds.
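
The multipliers quoted above follow directly from the AMSM costs with and without cryptographic hardware. The sketch below is illustrative only; it uses the figures as stated in the text, with 2.5 and 2.7 milliseconds taken as 2500 and 2700 microseconds, so the Integrity and Privacy results are approximate.

```python
# (description, AMSM cost with crypto hardware in us, AMSM cost without it in us)
CASES = [
    ("AMS Confidentiality, 1024-bit key", 116, 337),
    ("AMS Confidentiality, 2048-bit key", 120, 940),
    ("AMS Confidentiality, 4096-bit key", 122, 3268),
    ("AMS Integrity", 97, 2500),
    ("AMS Privacy", 137, 2700),
]


def cost_multiplier(with_hw_us, without_hw_us):
    """How many times more chargeable CPU the AMSM region uses without crypto hardware."""
    return without_hw_us / with_hw_us


for description, with_hw, without_hw in CASES:
    print(f"{description}: {cost_multiplier(with_hw, without_hw):.1f}x")
```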

 

Our guidance is to ensure that, at a minimum, the cryptographic hardware types shown in the following table are available for the AMS policy type(s) in use:

| Policy type | Co-processor | Accelerator |
| --- | --- | --- |
| AMS Integrity | Yes | - |
| AMS Privacy | Yes | Yes |
| AMS Confidentiality | - | Yes |

 

Why might you consider relying on general purpose processors?

As we have shown, there is a significant cost associated with running cryptographic processing on general purpose processors, but it is possible that your system may have more CPUs configured than cryptographic features.

 

By running on general purpose CPUs, there is significant additional cost but potentially a reduction in queuing time for a more limited resource, which may in some circumstances improve the overall throughput.

Summary 

With regards to AMS protection, particularly when using MQPUT1 to protected queues where key reuse is unable to offer its typical performance benefits, the response time of the Cryptographic hardware is a large factor.

 

Even with the smallest private RSA key size (1024), we are seeing a significant proportion of the round trip time spent waiting for functions in cryptographic hardware to complete – of the order of 62 microseconds, or 30% of our very contained workload.

 

For larger private key sizes, there can be a far larger impact from waiting for cryptographic hardware processing, for example:

  • For AMS Confidentiality and a key size of 4096, 272 microseconds or 65% of the round-trip time was spent waiting for cryptographic hardware.
  • For AMS Privacy with a key size of 4096, the time spent waiting for cryptographic hardware accounted for 3.7 milliseconds or 87% of the round-trip time.

 

Even these relatively large wait times are on a system that has sufficient resource to avoid being delayed.

 

 

Now consider a workload with 20 tasks attempting MQPUT1 requests to AMS protected queues, where the system has 2 cryptographic accelerators.

 

At any time, 2 requests can be serviced by the accelerators, and with a key size of 1024 bits we have observed that each MQPUT1 spends 48 microseconds in the accelerator. If 10 tasks are all attempting processing that requires cryptographic hardware support, the last task may have to wait for the 9 other tasks to complete, so it could take 480 microseconds for its own processing to complete. With 2 accelerators we might hope the work is spread evenly across the resources so that this time falls to 240 microseconds, but that will still have a notable effect on the round-trip time of the transaction.
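
This back-of-envelope estimate can be written down as a small sketch. It is a deliberately simplified model, assuming every request takes the same time, all tasks arrive at once, and the work is spread evenly across the devices; the function name is hypothetical and real queuing behaviour will be less regular.

```python
import math


def worst_case_completion_us(waiting_tasks, service_time_us, devices=1):
    """Time until the last of `waiting_tasks` simultaneous requests completes,
    assuming equal service times and an even spread of work across devices."""
    rounds = math.ceil(waiting_tasks / devices)
    return rounds * service_time_us


print(worst_case_completion_us(10, 48))             # 480: one device serves all 10 tasks
print(worst_case_completion_us(10, 48, devices=2))  # 240: work shared by 2 accelerators
```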

 

The RMF Cryptographic report does report a utilization figure but this does not always show the full picture. For example it might report the processor is 50% busy over the interval.

 

It could be that the processor was 50% busy over the full interval or it might have been 100% for half of the interval and unused for the remainder. There is no indication of delays for queuing.
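
To illustrate why an interval average can hide contention, the sketch below compares two made-up utilization profiles. The sample values are invented for illustration and are not taken from any RMF report; they simply show that a steady and a bursty profile can share the same average while having very different peaks.

```python
from statistics import mean

# Hypothetical utilization samples for ten equal sub-intervals (1.0 = fully busy).
steady = [0.5] * 10              # 50% busy throughout the interval
bursty = [1.0] * 5 + [0.0] * 5   # saturated for half the interval, then unused


def summarize(samples):
    """Return (average utilization, peak utilization) for the samples."""
    return mean(samples), max(samples)


print(summarize(steady))  # (0.5, 0.5)
print(summarize(bursty))  # (0.5, 1.0)
```

Both profiles would be reported as 50% busy over the interval, but only the second spends time saturated, which is when requests are likely to queue.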

 

It is always worth having additional resources available – for example we have 2 accelerators and 2 co-processors configured.

 

In all of these measurements, we see all of the accelerator processing performed by only 1 of those 2 processors, which suggests we were not delayed waiting for an accelerator processor.

 

By contrast, with the co-processors we see the load shared evenly across the 2 resources which might suggest that even though the utilization was no more than 2%, there were concurrent requests – or it could be that co-processors are chosen in a round-robin order.

 

What is useful to monitor:

  • CPU utilization - Monitor CPU utilization and distribution of “in-ready work unit queue” – there may be tasks waiting for CPU.
  • Cryptographic usage
    • What AMS protection policy are you implementing?
      • Integrity and privacy will benefit from co-processors.
      • Privacy and confidentiality will benefit from accelerators.
    • Have you got the correct cryptographic processors configured?
    • How busy are the cryptographic resources?
    • Are all of the tasks busy at any point in time?
    • Is the key size appropriate?

 

Hopefully this investigation has given some indication of where the additional round-trip time is spent when applying AMS policies to your MQ queues.

It is worth highlighting that in AMS Confidentiality configurations when key reuse is achievable, the use of the CryptoExpress hardware is reduced. This will result in less idle time that can be attributed to Cryptographic processing.

 

Whilst we have concentrated primarily on MQPUT1 and AMS Confidentiality, it is worth reviewing the performance data in the AMS regression section of “MQ for z/OS V920 Performance Report” to see the benefit of AMS Confidentiality when key reuse can be used.

 


