Power Programming Languages


Use OpenMP Environment Variables to get the best performance

By Archive User posted Mon March 13, 2017 10:13 PM

  

Originally posted by: GuoJiufu


This article was written by Jiufu Guo and has been copied over from the C/C++ Cafe. The original article can be found here.
Disclaimer: The content of this article has not been modified from the original version and may be out of date.

 

OpenMP is a widely used parallel programming model for shared-memory multiprocessing. As the previous blog (OpenMP support in XL C/C++ and XL Fortran compilers for Linux on Power little endian) explained, on little endian Linux on Power you can use OpenMP starting with IBM XL C/C++ 13.1.2 to boost application performance.

The OpenMP specification is available from the OpenMP website: http://openmp.org/specifications/. For more information about the OpenMP features in the XL compilers, refer to the XL compiler manual: http://www-01.ibm.com/support/knowledgecenter/SSXVZZ_13.1.2/com.ibm.xlcpp1312.lelinux.doc/proguide/cuppovrv.html?locale=en

You can parallelize your code with OpenMP directives and link with the OpenMP libraries. In addition to the directives, a few environment variables affect the runtime behavior of an OpenMP application. The previous blog (OpenMP support in XL C/C++ and XL Fortran compilers for Linux on Power little endian) shows how the OMP_DISPLAY_ENV environment variable works. This blog focuses on some other commonly used OpenMP environment variables and how they affect runtime behavior. For the remaining environment variables, refer to the XL compiler manual: http://www-01.ibm.com/support/knowledgecenter/SSXVZZ_13.1.2/com.ibm.xlcpp1312.lelinux.doc/compiler_ref/ruomprun.html?locale=en and to the OpenMP specification. The XL compiler manual also describes XLSMPOPTS, which is used to control loop parallelization.

Below is a brief description of some commonly used environment variables.

  • OMP_WAIT_POLICY: can be PASSIVE or ACTIVE. If the application runs on a dedicated machine, you can set OMP_WAIT_POLICY to ACTIVE. This setting lets waiting threads keep looking for work instead of being put to sleep or yielding to other processes.
  • OMP_STACKSIZE: controls the stack size of each OpenMP thread; it does not affect the stack size of the initial thread. Setting the value too high may hurt performance. Setting it too low may cause the program to crash.
  • OMP_PROC_BIND: controls whether OpenMP threads can be moved between CPUs. Moving threads between CPUs may introduce overhead. If you run your application on dedicated CPUs, you can set OMP_PROC_BIND to TRUE and choose OMP_NUM_THREADS based on the number of CPUs.
  • OMP_SCHEDULE: specifies the schedule type and chunk size for loops that have the schedule(runtime) clause specified. The chunk size is the number of loop iterations in each chunk that is handed to a thread. The schedule type can be auto, dynamic, guided, or static. For example, with OMP_SCHEDULE="dynamic,4", the runtime sets the schedule type to dynamic and divides the iterations into chunks of four.
  • OMP_DYNAMIC: controls whether the runtime may adjust the number of OpenMP threads in a thread team. Adjusting the number of threads may save system resources, but it may also affect performance.
  • OMP_NUM_THREADS: controls the maximum number of threads that can be created for parallel regions that do not have a num_threads clause specified. The value can be larger than the number of CPUs, but that may hurt the performance of your application.

 

Different environment variable settings give an OpenMP application different runtime performance, and different use cases need different settings. Because the settings directly affect runtime performance, you may want to tune them to get better results.

To get the best performance, you need an appropriate number of threads. Too many threads can degrade performance because of the time the OpenMP runtime spends managing threads. Too few threads cannot take full advantage of the available parallelism. The best number of threads differs from one application to another, so you need to test to find it. Here is an example of finding an appropriate number of threads.

 

Source code example:

ex.cpp:

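#include <iostream>
#include <cmath>
#include <sys/time.h>

double expsum()
{
  double sum = 0.0;
  // reduction(+:sum) gives each thread a private partial sum and adds the
  // partial sums together at the end, so the threads do not race on sum.
  #pragma omp parallel for reduction(+:sum)
  for (int j = 1; j < 160; j++) {
    // Nested parallel loop; it typically runs on a single thread unless
    // nested parallelism is enabled.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000000; i++) {
      sum += std::exp(0.00001 * (double)(i + j));
    }
  }
  return sum;
}

int main()
{
  struct timeval t1, t2;

  // Measure the wall-clock time spent in expsum().
  gettimeofday(&t1, NULL);
  double sume = expsum();
  gettimeofday(&t2, NULL);

  // Print the elapsed time in microseconds.
  std::cout << (1000000.0 * (t2.tv_sec - t1.tv_sec) + t2.tv_usec - t1.tv_usec) << " us" << std::endl;
  return 0;
}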

 

You can use xlC to compile this code:

xlC ex.cpp -qsmp=omp -o testomp

 

The XL option -qsmp=omp enables OpenMP support.

 

Use the script below to find the best number of threads for a given set of environment variable settings:

export OMP_WAIT_POLICY=ACTIVE
export OMP_STACKSIZE=8000000B
export OMP_SCHEDULE=static
export OMP_PROC_BIND=TRUE
export OMP_THREAD_LIMIT=192

xlC ex.cpp -qsmp=omp -o testomp

for num in $(seq 8 8 192); do
  export OMP_NUM_THREADS=$num
  echo -n "thread num ${num} : "
  ./testomp
done

 

The result could look like:

thread num 8 : 711816 us
thread num 16 : 367802 us
thread num 24 : 260911 us
thread num 32 : 397592 us
thread num 40 : 308478 us
thread num 48 : 346548 us
thread num 56 : 376139 us
thread num 64 : 421499 us
thread num 72 : 567410 us
thread num 80 : 474831 us
thread num 88 : 520806 us
thread num 96 : 507987 us
thread num 104 : 520821 us
thread num 112 : 539584 us
thread num 120 : 741001 us
thread num 128 : 914421 us
thread num 136 : 820430 us
thread num 144 : 723640 us
thread num 152 : 771098 us
thread num 160 : 747458 us
thread num 168 : 679829 us
thread num 176 : 856364 us
thread num 184 : 696850 us
thread num 192 : 949714 us

 

From this result, we can see that for this use case the best number of threads is 24. You can use this method to find the best number of threads for your OpenMP application.

