Cloud Pak for Integration

IBM Integration Bus (IIB) Development Good Practice – Avoid “field by field” copying

By Francois van der Merwe posted Thu August 18, 2016 05:06 AM

In a recent quality assurance exercise against a body of code for a customer, I came across a brute-force method of moving data from the InputRoot to the OutputRoot that looked very inefficient to me. The input data was an ISO8583-1993 dataset, and the message flow read it in via a TCPIP Server Input node.

The input data structure consists of about 260 fields, and apart from a few exceptions, the output data structure consists of the same set of fields.

The compute node was copying these fields one by one from the InputRoot to the OutputRoot. In the rest of this document I’m going to show you how to get a really good performance improvement by copying the whole root instead, but first I’m going to show you the smaller improvement you can get just by making use of reference variables.

1.1 Hardware and Software versions

My trusty laptop, a ThinkPad W540 with 16 GB of memory, running IBM Integration Bus V10.0.0.4 and IBM MQ V8.

1.2 Input data

The input data, as can be seen in Figure 1, has about 260 fields in it, with 128 of them in the PrimaryBitmap and SecondaryBitmap fields. I’ve blanked out some fields to protect the innocent.

Figure 1: Input Data

The PrimaryBitmap and SecondaryBitmap fields look as in Figure 2: just a list of bits and terribly boring, 128 fields altogether.

Figure 2: Bitmap Fields

1.3 The Original code

The original code had a very big section in it that looked like the extract in Figure 3.

Figure 3: Original code

From Figure 3 you can see many references to correlation names “OutputRoot.DFDL.ISO8583_1987” and “InputRoot.DFDL.ISO8583_1987”. This is not a very deep path, but the IIB documentation says that it should be faster to use a reference variable instead of parsing the path every time, see http://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/bj28653_.htm

I changed the code by declaring two reference variables, see Figure 4, and used them in the assignments in place of the full paths, see Figure 5.

Figure 4: Reference variables

Figure 5: Code with reference variables

Compare Figure 3 and Figure 5 with each other.
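As a rough illustration of the difference between the two styles, here is a minimal ESQL sketch. It is not the customer's code: the field names (ProcessingCode_003, AmountTransaction_004, SystemsTraceAuditNumber_011) are hypothetical stand-ins for the roughly 260 real fields, and only the two correlation names come from the discussion above.

```esql
CREATE COMPUTE MODULE CopyDataLineByLineWithRef
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    SET OutputRoot.Properties = InputRoot.Properties;

    -- Original style: the full path is navigated on both sides of
    -- every assignment, for example:
    --   SET OutputRoot.DFDL.ISO8583_1987.ProcessingCode_003 =
    --       InputRoot.DFDL.ISO8583_1987.ProcessingCode_003;

    -- Reference style: navigate each path once, then reuse the reference.
    DECLARE inRef REFERENCE TO InputRoot.DFDL.ISO8583_1987;

    -- The output body does not exist yet, so create it before
    -- pointing a reference at it.
    CREATE LASTCHILD OF OutputRoot DOMAIN('DFDL');
    CREATE FIELD OutputRoot.DFDL.ISO8583_1987;
    DECLARE outRef REFERENCE TO OutputRoot.DFDL.ISO8583_1987;

    -- Hypothetical field names; the real module repeats this pattern
    -- for every field it copies.
    SET outRef.ProcessingCode_003          = inRef.ProcessingCode_003;
    SET outRef.AmountTransaction_004       = inRef.AmountTransaction_004;
    SET outRef.SystemsTraceAuditNumber_011 = inRef.SystemsTraceAuditNumber_011;

    RETURN TRUE;
  END;
END MODULE;
```

Creating the output element before declaring outRef matters: a reference declared against a path that does not exist yet will not point at the intended element.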

1.4 Replace line by line copy with root copy

All of the changes above are just to verify that using reference variables is faster than using long correlation names. The real solution for this specific example was to first copy the entire input data to the output data (SET OutputRoot = InputRoot;) and then to make the specific changes to the output data.

This solution was practical in this case because most of the data stayed exactly the same in the output as in the input, so only a few changes to the output data were necessary. For the optimized code see Figure 6. Again, I’ve blanked out some text to protect the innocent, and I’ve removed some temp variable manipulations that are not important for this example. For interest’s sake, this module is only 53 lines of code, compared to 348 lines in the original.

Figure 6: Optimized code
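The shape of the root-copy approach can be sketched as follows. This is an assumption-laden outline, not the module from Figure 6: the field names, the values and the condition are all hypothetical, and only the SET OutputRoot = InputRoot statement and the correlation name are taken from the text.

```esql
CREATE COMPUTE MODULE CopyDataRoot
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Copy the headers and the complete message body in one statement.
    SET OutputRoot = InputRoot;

    -- The output body now exists, so a reference can point straight at it.
    DECLARE outRef REFERENCE TO OutputRoot.DFDL.ISO8583_1987;

    -- Apply only the exceptions. Field names and values are hypothetical.
    SET outRef.ProcessingCode_003 = '000000';

    IF outRef.AmountTransaction_004 IS NULL THEN
      SET outRef.AmountTransaction_004 = '000000000000';
    END IF;

    RETURN TRUE;
  END;
END MODULE;
```

The few remaining assignments still benefit from the reference variable, but the bulk of the saving comes from replacing hundreds of individual SET statements with a single tree copy.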

1.5 Test Message Flow

In order to test the performance of the original compute node versus the changed ones, I created the message flow shown in Figure 7. This flow consumes input via the TCPIP Server Input node. The Flow Order node first follows the top path, where I terminate the TCPIP input just to avoid timeouts. I then create a message with only the current time in it and output it to an MQ queue. This message signals the start of my tests.

In the second path of the Flow Order node, the LoopAroundXTimes compute node executes one of the three compute nodes (OriginalCopyDataByLine, CopyDataLineByLineWithRef or CopyDataRoot, depending on which one is wired) 1 000 000 times. After that the flow continues to the Add Timestamp compute node, which puts another message on the output queue. This timestamp signals the end of the test and allows me to calculate the number of messages executed per second.

Figure 7: Test message flow
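The article does not show the code behind LoopAroundXTimes. One plausible way to drive a downstream node repeatedly is the ESQL PROPAGATE statement, sketched below. The terminal wiring is an assumption made for this sketch: the compute node under test is taken to be wired to the out1 terminal and the Add Timestamp node to the out terminal.

```esql
CREATE COMPUTE MODULE LoopAroundXTimes
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    DECLARE i INTEGER 1;

    -- Build the message once; DELETE NONE on PROPAGATE keeps the
    -- output trees intact so they can be sent again on the next pass.
    SET OutputRoot = InputRoot;

    WHILE i <= 1000000 DO
      -- Assumed wiring: the node under test hangs off 'out1'.
      PROPAGATE TO TERMINAL 'out1' DELETE NONE;
      SET i = i + 1;
    END WHILE;

    -- Returning TRUE propagates the message to the 'out' terminal,
    -- assumed here to lead to the Add Timestamp node.
    RETURN TRUE;
  END;
END MODULE;
```

Looping inside a single flow invocation keeps the TCPIP input and MQ output outside the timed section, so the measurement is dominated by the compute node being compared.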

For my comparison I wired up OriginalCopyDataByLine, CopyDataLineByLineWithRef and CopyDataRoot in turn. The OriginalCopyDataByLine compute node executed the original code, of which you can see a portion in Figure 3. The second compute node, CopyDataLineByLineWithRef, executed the code that was changed to use reference variables, see Figure 5. The third compute node, CopyDataRoot, executed the code in Figure 6.

I deployed each of the three setups and executed my test 3 times for every setup. Each setup executed the loop code 1 000 000 times in order to give me a workable timespan. The average number of seconds to execute each round, the TPS and the difference from the original are noted in Table 1.

Table 1: Results

1.6 Conclusion

Even for a relatively short correlation name like “OutputRoot.DFDL.ISO8583_1987”, using reference variables gives a 27% gain in transactions per second. So be on the lookout for any situation where reference variables are the obvious choice. I do not have any real numbers for deeper correlation names, but maybe one day I’ll do that test.

The real reason I started this test was to see what the advantage was of replacing a few hundred “SET” statements with a root copy, “SET OutputRoot = InputRoot”, and a few “IF” statements. In my tests for this example, the root copy was 7.5 times faster. And yes, maybe this was a rookie programmer error, but if you do not have a senior developer QA your code, you risk running very inefficient code in production. In the end, inefficient code results in unnecessary spend on software licenses.

I would love to receive questions and comments on this article, please do not hesitate to contact me.

Comments

Glen Brumbaugh

Mon April 02, 2018 08:35 PM

Best productivity practices

Great article!

All other things being equal (and they never are) it's better to write efficient code than inefficient code. With IIB, using Reference variables hits both the efficient code button and the easier to read, write, and understand buttons. A win-win-win.

Keep in mind, however, that IIB is a programming language with only one purpose, to make you a more productive programmer. If not, we might as well be writing in something else, such as Java. Furthermore, the productivity gap between IIB and other options should be very large!

IIB Message Flows should be completed and ready for QA testing in a time frame of hours to days. So, don't obsess about performance unless it's critical or you have time on your hands. It's the Broker's job to make you perform well, not vice versa!

As a historical note, in the first ten years or so of NEON/WebSphere Business Integration Message Broker/WebSphere Message Broker we used to routinely deploy Message Flows into Production at the rate of 2-3 per week. This assumes, of course, that the requisite analysis had been performed and that coding was ready to begin. Today, alas, productivity is often much lower. If we are not careful as a community, Node-RED and other rapid deployment platforms will become the new targets for rapid development.

So keep writing good code and keep developing it quickly.
