How to enable auto-SIMD optimization in XL compilers

By Archive User posted Fri September 18, 2015 03:43 AM

  

Originally posted by: FangLu


Good news! A more focused community for XL compilers on POWER is now available at http://ibm.biz/xl-power-compilers.

If you are interested in the XL compilers on POWER, you may want to join the new community and subscribe to updates there. See you there!

This site remains the Cafe for C/C++ compilers for IBM Z.

SIMD (Single Instruction, Multiple Data) is a way to parallelize loops. Unlike thread-level parallelism, which runs different loop iterations on different threads, SIMD uses vector instructions, on processors that support them, to operate on several data elements packed into one wide register at the same time. These instructions can offer higher performance for computation-intensive tasks, such as multimedia or image processing applications.
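
To make the idea concrete, here is a minimal sketch in C that contrasts a scalar loop with a hand-blocked version that handles four adjacent elements per iteration, which is roughly the shape of work one vector instruction performs. The function names and the lane count of 4 are assumptions made for illustration; the real vector width depends on the processor and element type, and the compiler emits actual vector instructions rather than this unrolled form.

```c
#include <stddef.h>

/* Scalar form: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* LANES = 4 is an illustrative assumption, not a property of any
 * particular processor. */
#define LANES 4

/* Hypothetical illustration of SIMD-style processing: each pass of the
 * main loop handles LANES adjacent elements, as one vector add would. */
void add_blocked(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    /* Main loop: whole groups of LANES elements. */
    for (; i + LANES <= n; i += LANES) {
        out[i + 0] = a[i + 0] + b[i + 0];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Remainder loop: leftover elements when n is not a multiple of LANES. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions produce the same results; only the grouping of the work differs.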

XL C/C++ for Linux V13.1.2 and XL Fortran for Linux V15.1.2 support auto-SIMD functionality. You can use the -qsimd option to enable auto-SIMD.

Defaults

Whether -qsimd is specified or not, -qsimd=auto is implied at the -O3 or higher optimization level.

-qsimd=noauto is implied at the -O2 or lower optimization level.

Usage

The -qsimd=auto option enables automatic generation of vector instructions for processors that support them. When -qsimd=auto is in effect, the compiler converts certain operations that are performed in a loop on successive elements of an array into vector instructions. These instructions calculate several results at one time, which is faster than calculating each result sequentially.
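
As a rough sketch of the kind of loop this describes, the following C function performs the same operation on successive array elements. The function name saxpy and its signature are illustrative assumptions, and whether the compiler actually vectorizes the loop is not guaranteed.

```c
#include <stddef.h>

/* Computes y[i] = alpha * x[i] + y[i] for successive elements.
 * Each iteration is independent and touches adjacent array elements,
 * which is the pattern auto-SIMD is designed to look for. */
void saxpy(float alpha, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Assuming the function is saved in a file named saxpy.c (a hypothetical name), it could be compiled with xlc -O3 -qsimd=auto -c saxpy.c.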

The -qsimd=noauto option disables the conversion of loop array operations into vector instructions. Finer control can be achieved by using -qstrict=ieeefp, -qstrict=operationprecision, and -qstrict=vectorprecision.

Notes:

  • Specifying -qsimd without any suboption is equivalent to -qsimd=auto.
  • Specifying -qsimd=auto does not guarantee code changes.
  • Using vector instructions to calculate several results at one time might delay or even miss detection of floating-point exceptions on some architectures. If detecting exceptions is important, do not use -qsimd=auto.

Rules

On its own, -qsimd=auto takes effect only when the optimization level is -O3 or higher. At -O2 the compiler ignores the specified -qsimd=auto option unless it is combined with -qhot, as in the second command below.

For example, for XL C/C++ compilers, you can enable autosimdization by using either one of the following commands:

xlc -O3 -qsimd

xlc -O2 -qhot=level=0 -qsimd=auto

However, the following command cannot enable autosimdization:

xlc -O2 -qsimd=auto

If you enable IPA and specify -qsimd=auto at the IPA compile step, but specify -qsimd=noauto at the IPA link step, the compiler automatically sets -qsimd=auto at the IPA link step. Similarly, if you enable IPA and specify -qsimd=noauto at the IPA compile step, but specify -qsimd=auto at the IPA link step, the compiler automatically sets -qsimd=auto at the compile step.

Pragma equivalent (XL C/C++ only)

For XL C/C++ compilers, the effect of -qsimd=noauto can be applied to an individual loop with #pragma nosimd. The following example uses #pragma nosimd to disable auto-SIMD for a specific for loop:

#pragma nosimd
for (i=1; i<1000; i++) {
    /* program code */
}
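
For context, here is a slightly fuller sketch that places the pragma inside a complete function. The function bump_row and its parameters are hypothetical names chosen for illustration. Compilers other than XL C/C++ are expected to ignore the unrecognized pragma, possibly with a warning.

```c
#include <stddef.h>

/* Hypothetical helper: adds a constant to every element of a row. */
void bump_row(int *row, size_t n, int delta)
{
    size_t i;

    /* XL C/C++ only: ask the compiler not to vectorize the loop that
     * immediately follows, even when -qsimd=auto is in effect. */
#pragma nosimd
    for (i = 0; i < n; i++) {
        row[i] += delta;
    }
}
```

The pragma changes only how the loop is compiled, not what it computes.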

Loop transformation listing

You can use -qreport along with -qsimd=auto to generate a loop transformation listing. The listing file identifies how loops are optimized in a section named LOOP TRANSFORMATION SECTION. Based on this information, you might want to adjust your code so that the compiler can transform loops more effectively. The report also includes diagnostic information to show why specific loops cannot be vectorized. For example, when -qreport is used with -qsimd, messages are provided to identify non-stride-one references that prevent loop vectorization.
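
To illustrate what a non-stride-one reference looks like, the sketch below contrasts a loop that walks adjacent elements with one that skips through the array. The function names are assumptions made for illustration, and the exact wording of the report messages is not reproduced here.

```c
#include <stddef.h>

/* Stride-one: consecutive iterations touch adjacent elements,
 * so several of them can be loaded into one vector register. */
void scale_contiguous(double *data, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++) {
        data[i] *= factor;
    }
}

/* Non-stride-one: consecutive iterations are `stride` elements apart.
 * This access pattern is the kind the report may flag as preventing
 * vectorization. The caller must ensure that data holds at least
 * (count - 1) * stride + 1 elements. */
void scale_strided(double *data, size_t count, size_t stride, double factor)
{
    for (size_t i = 0; i < count; i++) {
        data[i * stride] *= factor;
    }
}
```

Assuming the code is saved as stride.c (a hypothetical file name), compiling it with xlc -O3 -qsimd=auto -qreport -c stride.c should produce a listing whose LOOP TRANSFORMATION SECTION can be checked for each loop.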
