Yes, the recently issued report, available here: “IBM MQ for z/OS – Managed File Transfer Performance”, has already received a minor refresh!
The main reason for the refresh is that text file transfers from UNIX System Services files performed so much worse than transfers from z/OS data sets when sent to the xLinux partner machine, and I wanted to work through how I investigated the difference. Details are available in the "Debugging the variability in MFT measurements" section that follows.
In addition, since the initial release, we have looked at the performance of a number of tuning options for the MFT agent on z/OS and will be applying those options in subsequent releases.
These options were found to provide benefit to our file transfer workloads in terms of both reduced transfer cost and increased transfer rates of up to 30% and are:
- Large page memory allocation
- Garbage collection using the pause-less option
- Disabling IBM Health Center data gathering in “headless” mode
To reiterate, the published performance report does not contain the results of these tuning options; these will be made available in the next major release!
Finally, we have added a brief section discussing message-to-file transfer performance.
Debugging the variability in MFT measurements
In the recently issued performance report “IBM MQ for z/OS – Managed File Transfer Performance”, I noted an anomaly in the performance of outbound text transfers from UNIX System Services (USS) files when compared with similar transfers from z/OS data sets.
In particular when sending larger files (100MB and 1GB) to our xLinux partner, the USS transfer achieved a transfer rate of 32-33 MB/second whereas the z/OS transfer achieved 70 MB/second.
Upon investigation, I started to see more variable results – such that text files sent from both USS and z/OS achieved transfer rates in the range of 20 to 70 MB/second.
I thought it might be useful to show the process I went through to understand why this occurred and to resolve the performance differences. That process follows.
In this instance the files are sent from either USS or z/OS datasets over a dedicated 1Gb performance network to the destination running on xLinux.
The files transferred to the xLinux partner machine are:
- 1024 files of 1MB
- 200 files of 10MB
- 100 files of 100MB
- 10 files of 1GB
The files are sent as both binary and text transfers.
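To put the workloads in perspective, the total volume moved by each one, and the elapsed time implied by a given transfer rate, can be worked out with a few lines of shell. This is a rough sketch rather than part of the test harness: the function names are made up for illustration, and it assumes 1GB = 1024MB, which the report does not state.

```shell
#!/bin/sh
# Illustrative helpers only; names are hypothetical and not part of MFT.

# Total volume in MB for a workload of <count> files of <size_mb> MB each.
workload_volume_mb() {
    echo $(( $1 * $2 ))
}

# Seconds (rounded) needed to move <volume_mb> at <rate> MB/second.
elapsed_seconds() {
    awk -v v="$1" -v r="$2" 'BEGIN { printf "%.0f\n", v / r }'
}

# The four workloads listed above, assuming 1GB = 1024MB.
for spec in "1024 1" "200 10" "100 100" "10 1024"; do
    set -- $spec
    echo "$1 files of ${2}MB: $(workload_volume_mb "$1" "$2") MB in total"
done

# The 10 x 1GB workload at the two rates quoted earlier for text transfers.
vol=$(workload_volume_mb 10 1024)
echo "at 70 MB/second: $(elapsed_seconds "$vol" 70) seconds"
echo "at 32 MB/second: $(elapsed_seconds "$vol" 32) seconds"
```

On those assumptions the 10 x 1GB workload takes roughly 146 seconds at 70 MB/second and 320 seconds at 32 MB/second, which gives a feel for how much run time the variability represents.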
Working through the problem:
Given the files being transferred from z/OS and USS are essentially the same, including code page, record length, and the resulting data conversion required on the remote system, we might expect similar performance from the two configurations, and yet as reported earlier, the z/OS file transfer is sometimes significantly out-performing the USS transfer.
The JZOS classes that provide access to z/OS datasets use LE file management methods, such as fopen, fread, fwrite and fremove, and the performance of these is limited by the underlying z/OS services. Native Java classes are used to access USS files and typically show slightly lower cost.
Is the warm-up phase and workload similar in each case?
Yes. To ensure the agents are suitably warmed up, the MFT agents are given at least 4 individual transfers, each consisting of 1024 files of 1MB sent in the same direction as the measured workload. The file transfers are then run in the same order: binary transfers for each of the specified file sizes, followed by text transfers.
Does the variability remain when the network is not a constraining factor?
Yes. Sending the file transfers over the 10Gb network did not reduce the variability of the workloads.
Is the source system the cause of the variability?
In all cases, there was sufficient CPU and network available, and the network response times and MQ channel batch sizes were comparable in both the high and low transfer rate measurements.
This left the DASD response times for reading the data from file. SMF type 42 subtype 6 records suggested that the performance was consistent.
Despite this consistent I/O performance, we wanted to remove as much potential for variability as possible, so we investigated message-to-file transfers (now included in the performance report), but this did not resolve the variability.
Does the destination file system affect the performance?
Following guidance in the “SAN tuning” section of the “Persistent Messaging Performance” document, we looked at the impact of write merges. With minimal write merges occurring on the writes to the SAN, disabling the feature had little impact.
To rule out the impact of the file system on the xLinux partner, we configured a RAM disk, which allowed us to prove that the inconsistencies observed were not the result of file system variability.
Is the destination agent processing causing the variability?
When running the file transfers, we monitored the destination system using the following command to poll the system: top -S -u mqperf -d 5 -n 24 -b
- Where mqperf is the user id running the MFT agent.
This allowed us to monitor the CPU utilisation of the Java™ process performing the data conversion.
For each interval during the text-type transfer, the Java™ process was using 100-108% of a CPU, for example:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
404574 mqperf 20 0 5686512 686052 16404 S 105.6 2.1 10:17.27 java
Given the xLinux partner has 12 cores and is dedicated to this workload, there is CPU capacity to spare should the workload require it.
This might suggest that whilst the xLinux partner has capacity for more work, the particular workload may be CPU bound on a single thread.
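The per-interval readings can be pulled out of the batch output with a small filter rather than read by eye. The sketch below is one way to do that, assuming top prints its default column layout as in the sample above (%CPU in column 9, COMMAND in column 12); the function names are hypothetical.

```shell
#!/bin/sh
# Illustrative helpers; names are hypothetical. Assumes the default top
# column layout shown above: %CPU is field 9 and COMMAND is field 12.

# Read `top -b` output on stdin and print %CPU for each java process line.
# Header and summary lines are skipped because their first field is not a PID.
java_cpu_samples() {
    awk '$1 ~ /^[0-9]+$/ && $12 == "java" { print $9 }'
}

# Read one %CPU value per line and report the count, minimum and maximum.
summarise_cpu() {
    awk '{ v = $1 + 0
           if (NR == 1 || v < min) min = v
           if (NR == 1 || v > max) max = v }
         END { if (NR > 0) printf "samples=%d min=%.1f max=%.1f\n", NR, min, max }'
}

# Demonstration using the sample line from the text above.
java_cpu_samples <<'EOF' | summarise_cpu
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
404574 mqperf 20 0 5686512 686052 16404 S 105.6 2.1 10:17.27 java
EOF
```

In a live run the same filter would be fed from the command above, for example: top -S -u mqperf -d 5 -n 24 -b | java_cpu_samples | summarise_cpu. A maximum that stays just above 100% on a 12-core machine is consistent with the work being bound to a single busy thread.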
What is the Java workload doing on the xLinux partner?
For this we looked at several options – enabling trace as well as sampling using the IBM Health Center.
Enabling trace by specifying trace=com.ibm.wmqfte=all in the agent.properties file degraded performance to such an extent that the trace was of little benefit. Targeted trace would have had less impact on the file transfer, but at this stage it was difficult to determine which classes should be traced.
Enabling the IBM Health Center sampling in “headless” mode as part of the agent process created the HCD files for use in the Health Center client.
To enable sampling we configured:
export BFG_JVM_PROPERTIES="-Xmx1024M -Xhealthcenter:level=headless -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0"
Enabling Health Center requires the MFT agent process to be restarted, and the HCD files were created once the agent was stopped. These files could be found in the agent log directory.
Viewing the data in the Health Center client indicated that 80% of the CPU used was in the data conversion routines, whether converting the source data to Unicode byte-by-byte, or subsequently converting each byte to a character.
From a processing perspective, the time spent in data conversion, and in the methods that perform it, makes sense, and further suggests that the process is somewhat constrained by the clock speed on the xLinux partner.
Up to now, we have been able to demonstrate that the transfer is limited by CPU, rather than network or file system performance, but this does not explain why we can see such variability in transferring the text files.
Is it Java™?
If the source and destination systems are not changing between high and low achieved throughput file transfers, and the file contents are constant, then surely the variability must be due to Java differences?
To see whether the JIT would affect the performance, negatively or otherwise, we looked at the JIT options.
- Disable JIT using -Xnojit
Unsurprisingly this option did not result in well performing code.
- Configure the JIT to compile the method once it has been called a fixed number of times using -Xjit:count=10
This option did result in more consistent performance but at the lower end of the throughput range.
- Force the JIT compiler to compile all methods at a specific optimization level, for example -Xjit:optlevel=scorching
We selected option “scorching”, with the intention of giving Java a nudge that performance is critical. This resulted in the 1GB text transfer achieving 62 MB/second and, perhaps more importantly from a performance perspective, consistent results.
The “veryHot” option also resulted in consistent performance results but peaked at 58 MB/second – however this is within acceptable boundaries for test variability for these workloads.
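For reference, the JIT options above can be passed to the agent in the same way as the Health Center options shown earlier, through the BFG_JVM_PROPERTIES environment variable. The following is a sketch under that assumption. Treat the value as illustrative rather than the exact string used in the measurements; any other JVM options already in use would need to be kept in the same string.

```shell
#!/bin/sh
# Illustrative only: select one JIT setting in the agent's environment.
# Alternatives discussed above: -Xnojit, -Xjit:count=10, -Xjit:optlevel=veryHot
export BFG_JVM_PROPERTIES="-Xjit:optlevel=scorching"
echo "BFG_JVM_PROPERTIES=$BFG_JVM_PROPERTIES"
```

As with any JVM option, the agent has to be restarted for the change to take effect.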
As the following chart indicates, by specifying the “scorching” level of optimisation on the xLinux MFT agent we were able to nearly double the throughput rates for files of 10MB and larger in a consistent manner.
With regards to the cost of transferring the data, by specifying the JIT optimization level on the partner, the z/OS costs were up to 20% lower than in the baseline, primarily because we are recording less “z/OS overheads” due to the reduced run time.
Of course, by specifying the JIT optimization level on the xLinux partner, we are slightly increasing the load on that partner machine:
Early stages of the file transfer showed higher CPU usage, where the JIT was aggressively optimizing the code:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
632991 mqperf 20 0 5697764 639272 16356 S 231.1 2.0 7:18.24 java
Once the JIT process was complete:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
632991 mqperf 20 0 5683432 686644 16392 S 111.8 2.1 12:28.78 java
Does this apply to the MFT agent running on z/OS?
When we apply the same level of JIT optimization to the z/OS agent, we do not see a benefit in terms of throughput but do see an increase in the transfer costs.
Should I override the JIT optimization level on the distributed MFT agent?
As the Java documentation for -Xjit:optlevel= suggests, this may have an unexpected effect on performance, including reduced overall performance, so it is advisable to test this in your own environment before implementing in your production environment.
With hindsight and a better understanding of Java™ and the JIT process, it now seems obvious that the performance variability could be affected in this way.
It is likely that the method of testing we use does not lend itself to achieving the best results, as each configuration uses the following model of testing:
- Create and start agent processes
- Warm up agents by sending 4 * 1024 * 1MB files in the desired direction e.g. z/OS to xLinux
- Run binary transfers – using 1MB, 10MB, 100MB and 1GB files
- Run text transfers – again using 1MB, 10MB, 100MB and 1GB files
- Shut down and delete agent processes
This means the MFT agents are not particularly long running – existing for no more than 2 hours at a time, whereas a production environment may see the agents available for months at a time.
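The test cycle above can be written down as a dry run that only prints the order of the steps. This is purely illustrative: it issues no MFT commands, and the function name is made up for this sketch.

```shell
#!/bin/sh
# Dry run of the test cycle described above. Prints steps; runs no MFT commands.
print_test_plan() {
    echo "create and start agents"
    # Warm-up: 4 transfers, each of 1024 files of 1MB.
    i=1
    while [ "$i" -le 4 ]; do
        echo "warm-up $i: 1024 files of 1MB"
        i=$((i + 1))
    done
    # Measured workloads: all binary sizes first, then all text sizes.
    for mode in binary text; do
        for size in 1MB 10MB 100MB 1GB; do
            echo "$mode transfer: $size files"
        done
    done
    echo "shut down and delete agents"
}

print_test_plan
```

Laid out this way, it is easy to see that the text transfers are the last work each short-lived agent performs before it is shut down.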
As such, for longer running MFT agent processes, it may not be necessary to encourage the JIT to compile the classes at the scorching optimization level on the distributed partner – and we saw no benefit from applying the same optimization on the z/OS MFT agent.
Of course, as with many performance options, clients may see different results with their own MFT workloads.