Originally posted by: GuoJiufu
This article was written by Jiufu Guo and has been copied over from the C/C++ Cafe. The original article can be found here.
Disclaimer: The content of this article has not been modified from the original version, and it may be out of date.
OpenMP is a widely used parallel programming method for shared-memory multi-processing. As a previous blog (OpenMP support in XL C/C++ and XL Fortran compilers for Linux on Power little endian) explained, on little endian Linux on Power you can start using OpenMP with IBM XL C/C++ 13.1.2 to boost application performance.
You can get the OpenMP specification from the OpenMP website: http://openmp.org/specifications/. For more information about the OpenMP features in the XL compilers, see the XL compiler manual: http://www-01.ibm.com/support/knowledgecenter/SSXVZZ_13.1.2/com.ibm.xlcpp1312.lelinux.doc/proguide/cuppovrv.html?locale=en
You can parallelize your code with OpenMP directives and link with the OpenMP libraries. In addition to the directives, a number of environment variables affect an OpenMP application's runtime behavior. This blog focuses on the commonly used OpenMP environment variables and how they affect runtime behavior. The previous blog (OpenMP support in XL C/C++ and XL Fortran compilers for Linux on Power little endian) shows how the OMP_DISPLAY_ENV environment variable works. For other environment variables, see the XL compiler manual: http://www-01.ibm.com/support/knowledgecenter/SSXVZZ_13.1.2/com.ibm.xlcpp1312.lelinux.doc/compiler_ref/ruomprun.html?locale=en and the OpenMP specification. The XL compiler manual also documents XLSMPOPTS, which is used for loop parallelization control.
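As a quick reminder of how OMP_DISPLAY_ENV helps here: setting it to TRUE makes the OpenMP runtime print its internal control variables at program start-up, which is an easy way to verify that the variables discussed below actually took effect.

export OMP_DISPLAY_ENV=TRUE
# Any OpenMP binary run from this shell will now print its settings
# (thread count, schedule, wait policy, ...) when it starts.
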
Below is a brief description of some commonly used environment variables.
- OMP_WAIT_POLICY: can be PASSIVE or ACTIVE. Set it to ACTIVE if the application runs on a dedicated machine; this lets the threads keep searching for work without being put to sleep or yielding to other processes.
- OMP_STACKSIZE: controls the stack size of each OpenMP worker thread; it does not affect the stack size of the initial thread. Setting the value too large may impact performance; setting it too low may cause the program to crash.
- OMP_PROC_BIND: controls whether OpenMP threads can be moved between CPUs. Moving threads between CPUs introduces overhead, so if you run your application on dedicated CPUs, you could set OMP_PROC_BIND to TRUE and set OMP_NUM_THREADS based on the number of CPUs.
- OMP_SCHEDULE: specifies the schedule type and chunk size for loops that have the schedule(runtime) clause specified. The chunk size is the number of iterations of the associated loops that go into each chunk; the schedule type can be auto, dynamic, guided, or static. For example, with OMP_SCHEDULE="dynamic,4", the runtime divides the iterations into chunks of four and uses dynamic scheduling.
- OMP_DYNAMIC: controls whether the number of OpenMP threads in a team can be adjusted dynamically. Adjusting the thread count may save system resources, but it can also impact performance.
- OMP_NUM_THREADS: controls the maximum number of threads created for parallel regions that do not specify a num_threads clause. The value can be larger than the number of CPUs, but oversubscribing may hurt your application's performance.
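As an illustration, a dedicated machine with 24 CPUs might be configured as follows. The values are examples, not recommendations; tune them for your own workload.

# Example settings for a dedicated 24-CPU machine (illustrative values):
export OMP_WAIT_POLICY=ACTIVE     # idle threads spin instead of sleeping
export OMP_STACKSIZE=16M          # 16 MB stack per worker thread
export OMP_PROC_BIND=TRUE         # do not migrate threads between CPUs
export OMP_DYNAMIC=FALSE          # keep the team size fixed
export OMP_NUM_THREADS=24         # match the number of dedicated CPUs
export OMP_SCHEDULE="dynamic,4"   # applies to loops with schedule(runtime)
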
Different environment variable settings can lead to very different runtime performance, and different use cases need different settings, so it is often worth tuning them.
To get the best performance, you need an appropriate number of threads. Too many threads can degrade performance because of the time the OpenMP runtime spends managing them; too few cannot take full advantage of the available parallelism. The best thread count varies by application, so it pays to measure. Here is an example of finding an appropriate thread count.
Source code example:
ex.cpp:
#include <iostream>
#include <math.h>
#include <sys/time.h>

double expsum()
{
    double sum = 0.0;
    // reduction(+:sum) gives each thread a private partial sum and
    // combines them at the end, avoiding a data race on sum
    #pragma omp parallel for reduction(+:sum)
    for (int j = 1; j < 160; j++) {
        for (int i = 1; i <= 1000000; i++) {
            sum += exp(0.00001 * (double)(i + j));
        }
    }
    return sum;
}

int main()
{
    struct timeval t1, t2;
    gettimeofday(&t1, NULL);
    double sume = expsum();
    gettimeofday(&t2, NULL);
    std::cout << "sum: " << sume << std::endl;
    std::cout << (1000000.0 * (t2.tv_sec - t1.tv_sec) + t2.tv_usec - t1.tv_usec)
              << " us" << std::endl;
    return 0;
}
We can use xlC to compile this code:
xlC ex.cpp -qsmp=omp -o testomp
The XL compiler option -qsmp=omp enables OpenMP support.
Use the script below to find the best thread count under a given set of environment variables:
export OMP_WAIT_POLICY=ACTIVE
export OMP_STACKSIZE=8000000B
export OMP_SCHEDULE=static
export OMP_PROC_BIND=TRUE
export OMP_THREAD_LIMIT=192
xlC ex.cpp -qsmp=omp -o testomp
for num in `seq 8 8 192`; do
export OMP_NUM_THREADS=$num;
echo -n "thread num ${num} : "
./testomp
done
The result could look like:
thread num 8 : 711816 us
thread num 16 : 367802 us
thread num 24 : 260911 us
thread num 32 : 397592 us
thread num 40 : 308478 us
thread num 48 : 346548 us
thread num 56 : 376139 us
thread num 64 : 421499 us
thread num 72 : 567410 us
thread num 80 : 474831 us
thread num 88 : 520806 us
thread num 96 : 507987 us
thread num 104 : 520821 us
thread num 112 : 539584 us
thread num 120 : 741001 us
thread num 128 : 914421 us
thread num 136 : 820430 us
thread num 144 : 723640 us
thread num 152 : 771098 us
thread num 160 : 747458 us
thread num 168 : 679829 us
thread num 176 : 856364 us
thread num 184 : 696850 us
thread num 192 : 949714 us
From these results, we can see that for this workload the best thread count is 24. You can use the same method to find the best thread count for your own OpenMP application.
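Picking the minimum by eye works for a short sweep; for longer sweeps, a small awk command can do it. This is a sketch: the sample results are inlined here for illustration, but in practice you would pipe the test script's output into the awk command.

# After splitting on ':' and spaces, the fields of each result line are:
# thread(1) num(2) <count>(3) <time>(4) us(5)
best=$(printf '%s\n' \
  'thread num 8 : 711816 us' \
  'thread num 16 : 367802 us' \
  'thread num 24 : 260911 us' \
  'thread num 32 : 397592 us' |
  awk -F'[: ]+' '{ if (min == "" || $4 + 0 < min + 0) { min = $4; best = $3 } }
                 END { print best }')
echo "best thread num: $best"
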