In a recent quality assurance exercise against a body of code for a customer, I came across a brute-force method of moving data from the InputRoot to the OutputRoot that looked very inefficient to me. The input data was an ISO8583-1993 dataset and the message flow was reading it in via a TCPIP Server Input node.
The input data structure consists of about 260 fields, and apart from a few exceptions, the output data structure consists of the same set of fields.
The compute node was copying these fields one by one from the InputRoot to the OutputRoot. Later in this document I’m going to show you how to get a really good performance improvement by replacing that copy altogether, but first I’m going to show you the improvement you can get just by making use of reference variables.
1.1 Hardware and Software versions
The tests ran on my trusty laptop, a ThinkPad W540 with 16 GB of memory, running IBM Integration Bus V10.0.0.4 and IBM MQ V8.
1.2 Input data
The input data, as can be seen in Figure 1, has about 260 fields in it, with 128 of them in the PrimaryBitmap and SecondaryBitmap fields. I’ve blanked out some fields to protect the innocent.
Figure 1: Input Data
The PrimaryBitmap and SecondaryBitmap fields look as in Figure 2: just a list of bits and terribly boring, 128 fields altogether.

Figure 2: Bitmap Fields
1.3 The Original code
The original code had a very big section in it that looked like the extract in Figure 3.

Figure 3: Original code
From Figure 3 you can see many references to the correlation names “OutputRoot.DFDL.ISO8583_1987” and “InputRoot.DFDL.ISO8583_1987”. This is not a very deep path, but the IIB documentation says that it should be faster to use a reference variable than to navigate the path on every statement; see http://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/bj28653_.htm
I changed the code by declaring two reference variables (see Figure 4) and then used those reference variables in the assignments in place of the full paths (see Figure 5).

Figure 4: Reference variables

Figure 5: Code with reference variables
Compare Figure 3 and Figure 5 with each other.
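In text form, the difference between the two styles can be sketched roughly as follows. This is a minimal illustration, not the customer's code: FieldA and FieldB are hypothetical field names, the reference names inRef and outRef are my own choice, and header handling is left out.

```esql
CREATE COMPUTE MODULE CopyDataLineByLineWithRef
	CREATE FUNCTION Main() RETURNS BOOLEAN
	BEGIN
		-- Message headers/Properties handling omitted for brevity.

		-- Original style (shown as comments): the full path is navigated on every statement.
		-- SET OutputRoot.DFDL.ISO8583_1987.FieldA = InputRoot.DFDL.ISO8583_1987.FieldA;
		-- SET OutputRoot.DFDL.ISO8583_1987.FieldB = InputRoot.DFDL.ISO8583_1987.FieldB;

		-- Make sure the output parent exists before pointing a reference at it.
		CREATE FIELD OutputRoot.DFDL.ISO8583_1987;

		-- Navigate each path once and keep a pointer to the result.
		DECLARE inRef  REFERENCE TO InputRoot.DFDL.ISO8583_1987;
		DECLARE outRef REFERENCE TO OutputRoot.DFDL.ISO8583_1987;

		-- FieldA and FieldB are hypothetical field names.
		SET outRef.FieldA = inRef.FieldA;
		SET outRef.FieldB = inRef.FieldB;

		RETURN TRUE;
	END;
END MODULE;
```

The CREATE FIELD statement matters here: a reference declared against an element that does not exist yet would not point where you expect, so the output parent is created first.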
1.4 Replace line by line copy with root copy
All of the changes above were just to verify that using reference variables is faster than using long correlation names. The real solution for this specific example was to first copy the entire input data to the output data (SET OutputRoot = InputRoot;) and then to make the specific changes to the output data.
This solution was practical in this case because most of the data stayed exactly the same in the output as in the input, so only a few changes to the output data were necessary. For the optimized code see Figure 6. Again, I’ve blanked out some text to protect the innocent, and I’ve removed some temp variable manipulations that are not important for this example. For interest's sake, this module is only 53 lines of code compared to the original 348 lines.
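The shape of the root-copy approach can be sketched like this. It is only an outline under assumptions: FieldA, FieldB and the literal values are hypothetical, and the real module applies the customer's own handful of changes.

```esql
CREATE COMPUTE MODULE CopyDataRoot
	CREATE FUNCTION Main() RETURNS BOOLEAN
	BEGIN
		-- Copy the whole message tree (headers and body) in one statement.
		SET OutputRoot = InputRoot;

		-- Point a reference at the copied body for the few fields that differ.
		DECLARE outRef REFERENCE TO OutputRoot.DFDL.ISO8583_1987;

		-- Hypothetical exceptions: FieldA, FieldB and the values are illustrative only.
		SET outRef.FieldA = 'new value';
		IF outRef.FieldB = 'X' THEN
			SET outRef.FieldB = 'Y';
		END IF;

		RETURN TRUE;
	END;
END MODULE;
```

The point of the design is that the number of statements now scales with the number of exceptions rather than with the number of fields in the message.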

Figure 6: Optimized code
1.5 Test Message Flow
In order to test the performance of the original compute node versus the changed ones, I created the message flow shown in Figure 7. This flow consumes input via the TCPIP Server Input node. The Flow Order node first follows the top path, where I terminate the TCPIP input just to avoid timeouts. I then create a message with only the current time in it and output it to an MQ queue. This message signals the start of my tests.
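A compute node that produces such a timestamp-only message could look something like the sketch below. The module name, the XMLNSC domain and the element names Timing and Timestamp are assumptions for illustration; the actual node may build its message differently.

```esql
CREATE COMPUTE MODULE AddTimestamp
	CREATE FUNCTION Main() RETURNS BOOLEAN
	BEGIN
		-- Carry the message properties across from the input.
		SET OutputRoot.Properties = InputRoot.Properties;

		-- Build a body that holds only the current time.
		-- The XMLNSC domain and the element names are assumptions for this sketch.
		SET OutputRoot.XMLNSC.Timing.Timestamp = CURRENT_TIMESTAMP;

		RETURN TRUE;
	END;
END MODULE;
```

CURRENT_TIMESTAMP returns the same value for every call within one invocation of a node, which is fine here because the start and end timestamps are taken in separate node invocations.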
In the second path of the Flow Order node, the LoopAroundXTimes compute node executes one of the three compute nodes (OriginalCopyDataByLine, CopyDataLineByLineWithRef or CopyDataRoot, depending on which one is wired) 1 000 000 times. After that, the flow continues to the Add Timestamp compute node, which results in another message in the output queue. This timestamp signals the end of the test and allows me to calculate the number of messages executed per second.
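One way to write such a looping node is with PROPAGATE inside a WHILE loop, as sketched below. The terminal wiring is an assumption on my part: here 'out' is taken to lead to the compute node under test and 'out1' to the Add Timestamp node.

```esql
CREATE COMPUTE MODULE LoopAroundXTimes
	CREATE FUNCTION Main() RETURNS BOOLEAN
	BEGIN
		DECLARE i INTEGER 1;

		WHILE i <= 1000000 DO
			-- PROPAGATE clears OutputRoot by default, so rebuild it on each pass.
			SET OutputRoot = InputRoot;
			-- Assumed wiring: 'out' leads to the compute node under test.
			PROPAGATE TO TERMINAL 'out';
			SET i = i + 1;
		END WHILE;

		-- After the loop, send one more message towards the timestamp node.
		-- Assumed wiring: 'out1' leads to the Add Timestamp node.
		SET OutputRoot = InputRoot;
		PROPAGATE TO TERMINAL 'out1';

		-- Everything was propagated explicitly; suppress the implicit propagation.
		RETURN FALSE;
	END;
END MODULE;
```

The root copy inside the loop adds a fixed cost to every iteration, but that cost is the same for all three setups, so the comparison between them stays fair.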

Figure 7: Test message flow
For my comparison I wired up OriginalCopyDataByLine, CopyDataLineByLineWithRef and CopyDataRoot in turn. The OriginalCopyDataByLine compute node executed the original code, a portion of which you can see in Figure 3. The second compute node, CopyDataLineByLineWithRef, executed the code that was changed to use reference variables (see Figure 5). The third compute node, CopyDataRoot, executed the code in Figure 6.
I deployed each of the three setups and executed my test three times for every setup. Each run executed the loop code 1 000 000 times in order to give me a workable timespan. The average number of seconds per run, the transactions per second (TPS) and the difference from the original are noted in Table 1.
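As a sketch of the arithmetic, assuming TPS is taken as the number of loop iterations divided by the time between the start and end timestamp messages, and the difference is expressed relative to the original code (the symbols are my own notation):

```latex
\mathrm{TPS} = \frac{N}{t_{\mathrm{end}} - t_{\mathrm{start}}}, \qquad N = 1\,000\,000

\mathrm{gain}\,(\%) = \frac{\mathrm{TPS}_{\mathrm{new}} - \mathrm{TPS}_{\mathrm{original}}}{\mathrm{TPS}_{\mathrm{original}}} \times 100
```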

Table 1: Results
1.6 Conclusion
Even for a relatively short correlation name like “OutputRoot.DFDL.ISO8583_1987”, reference variables gave a 27% gain in transactions per second. So be on the lookout for any situation where reference variables are the obvious choice. I do not have any real numbers for deeper correlation names, but maybe one day I’ll do that test.
The real reason I started this test was to see what the advantage was of replacing a few hundred “SET” statements with a root copy, “SET OutputRoot = InputRoot”, and a few “IF” statements. From my tests in this example, opting for the root copy is 7.5 times faster. And yes, maybe this was a rookie programmer error, but if you do not have your code reviewed by a senior developer, you risk running very inefficient code in production. In the end, inefficient code results in unnecessary spend on software licenses.
I would love to receive questions and comments on this article, so please do not hesitate to contact me.