Performance Gain with Newer XLC Compiler

By Archive User posted Sun August 14, 2011 08:54 PM


Originally posted by: Anh Tuyen Tran

Programmers put a lot of effort into optimizing their applications for performance. Still, a performance gain can also be achieved simply by switching to a newer version of the IBM compiler. Although the improvement varies with the design and intent of each application, the difference becomes more visible for programs that handle large amounts of data.

The following simple program was designed to work on a lot of data: it multiplies two matrices with large dimensions. (It was not written to multiply the matrices as fast as possible.) Because of the size of the data, the CPU does not have enough registers to hold the program's data effectively, so values spill from registers to memory. The source code of the program, matMult.c, can be found here.
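The linked matMult.c is not reproduced in this post, so the following is only a minimal sketch of the kind of program described: a naive triple-loop multiplication of two square matrices, timed without any I/O. The names mat_mult and time_mat_mult_ms, the fill values, and the use of clock() are assumptions for illustration, not the original source. Unlike the original test case, which a comment below says needs a large stack, this sketch allocates the matrices on the heap.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: c = a * b for n x n matrices stored row-major. */
void mat_mult(size_t n, const double *a, const double *b, double *c)
{
    size_t i, j, k;

    assert(n > 0 && a != NULL && b != NULL && c != NULL);
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: times one n x n multiplication.
 * Returns CPU time in milliseconds as measured by clock(),
 * or -1.0 if memory allocation fails. */
double time_mat_mult_ms(size_t n)
{
    size_t i;
    clock_t start, end;
    double *a = malloc(n * n * sizeof *a);
    double *b = malloc(n * n * sizeof *b);
    double *c = malloc(n * n * sizeof *c);

    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }
    /* Arbitrary fill values; the original input data is unknown. */
    for (i = 0; i < n * n; i++) {
        a[i] = (double)(i % 7) + 1.0;
        b[i] = (double)(i % 5) + 1.0;
    }

    start = clock();
    mat_mult(n, a, b, c);
    end = clock();

    free(a);
    free(b);
    free(c);
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would print the returned value in the same style as the output shown in this post, for example printf("Elapsed Time Without I/O: %.2f ms.\n", time_mat_mult_ms(n)), with n chosen large enough for the run to take several seconds. Note that clock() reports CPU time rather than wall-clock time; for a single-threaded run like this one the two are normally close.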

Let’s have a look at the performance of this program when it is built with the V9, V10.1 and V11.1 XLC compilers on AIX. The time needed for the computation drops from about 28.9 seconds to about 9.2 seconds, roughly a 3.2x speedup, just by switching from V9 to V11.1. V10.1 is about 3% faster than V9.

$ time ./v09MatMult
Elapsed Time Without I/O: 28918.17 ms.   <== V9 XLC compiler
real    0m29.00s
user    0m28.93s
sys     0m0.00s

$ time ./v10MatMult
Elapsed Time Without I/O: 27978.91 ms.   <== V10.1 XLC compiler
real    0m28.10s
user    0m28.03s
sys     0m0.00s

$ time ./v11MatMult
Elapsed Time Without I/O: 9162.79 ms.    <== V11.1 XLC compiler
real    0m9.20s
user    0m9.17s
sys     0m0.01s


The same compiler command was used for all three builds; only the output name changes:

xlc -qnostrict -qhot=simd -qarch=pwr6 -qtune=pwr6 -o [v09|v10|v11]MatMult  ./matMult.c


Please note that the above results come from a regular POWER6 machine with the following specifications:

System Model: IBM,7998-61X
Machine Serial Number: (deleted)
Processor Type: PowerPC_POWER6
Processor Implementation Mode: POWER 6
Processor Version: PV_6
Number Of Processors: 4
Processor Clock Speed: 4005 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
Memory Size: 15744 MB
Good Memory Size: 15744 MB
Platform Firmware level: EA320_030
Firmware Version: IBM,EA320_030

To verify these results on a performance machine, please find information about how to access the IBM Sandbox demo at the following page.

Best regards,
Anh Tuyen Tran, IBM

Thu September 29, 2011 04:48 PM

Originally posted by: ThinkOpenly

Note that the test case used in this entry requires a bigger stack than what is permitted by default. You may see something like this:
---
$ xlc -o matMult matMult.c
$ ./matMult
Segmentation fault (core dumped)
---
To increase the stack limit for the current shell (to "unlimited"):
---
$ ulimit -s 10240
$ ulimit -s unlimited
---
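As a complement to the ulimit commands above, a program can also check its own stack limit before it runs into the segmentation fault. The following is a minimal sketch using the POSIX getrlimit() call; the function name stack_limit_allows and the idea of calling it from the test case are assumptions, not part of the original matMult.c.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: reports whether the soft stack limit of the
 * current process can hold `needed` bytes.
 * Returns 1 if yes, 0 if no, -1 if the limit cannot be queried. */
int stack_limit_allows(unsigned long needed)
{
    struct rlimit rl;

    assert(needed > 0);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;   /* corresponds to "ulimit -s unlimited" */
    return rl.rlim_cur >= (rlim_t)needed ? 1 : 0;
}
```

For a program that keeps three N x N matrices of double as local variables, a caller could pass 3 * N * N * sizeof(double) plus some headroom, and print a hint to raise the limit with ulimit -s when the result is 0. N here is a placeholder; the dimensions used by the original test case are not given in this thread.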

Thu September 22, 2011 03:01 AM

Originally posted by: YinLinZhang

Chinese translation of this article: https://www.ibm.com/developerworks/mydeveloperworks/blogs/12bb75c9-dfec-42f5-8b55-b669cc56ad76/entry/_e6_83_b3_e8_8e_b7_e5_be_97_e6_80_a7_e8_83_bd_e6_8f_90_e5_8d_87__e8_af_b7_e4_bd_bf_e7_94_a8_e6_96_b0_e7_89_88_e6_9c_acxlc_e7_bc_96_e8_af_91_e5_99_a8?lang=zh