
tprof listing annotation with IBM Open XL compilers for AIX

By JINSONG JI posted Tue November 30, 2021 10:49 PM

  

Introduction

Identifying and resolving performance issues is routine work when you develop or port open source applications. You are probably familiar with using perf annotate[1] on Linux to find bottlenecks at particular instructions. On AIX, similar annotated instruction-level profiling is available through tprof[2] with the -L objectlist option[3]. However, tprof was designed differently from perf on Linux: it relies on compiler listing files instead of generating the annotated listing on its own. This limitation poses a unique challenge for users of compilers other than the legacy XL compiler.

With the introduction of the new IBM Open XL C/C++ and Fortran for AIX[4], we can take advantage of Clang/LLVM technology combined with industry-leading optimizations built into IBM Power10 to deliver the best possible performance for your applications on AIX. But because Open XL is a different compiler from the legacy XL compiler, it does not generate the same listing files, so by default users cannot run annotated profiling with tprof on AIX.

This blog introduces the ibm-gen-list tool, which helps work around this limitation of tprof.
The ibm-gen-list tool is provided in the tools directory of Open XL 17.1.0.1 (2021 Dec PTF).

Example:

To make this easy to follow, we use the single-source benchmark ackermann.cpp from llvm-test-suite [5] as the example. We assume you have Open XL C/C++ installed and the PATH variable set so that the Open XL compilers are picked up.

#include <iostream>
#include <stdlib.h>

using namespace std;

int Ack(int M, int N) { return(M ? (Ack(M-1,N ? Ack(M,(N-1)) : 1)) : N+1); }

int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 11
#else
#define LENGTH 12
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);

    cout << "Ack(3," << n << "): " << Ack(3, n) << endl;
    return(0);
}

Step 1:  Compile and link the binary using Open XLC.


This is the normal build step you would use for your applications. You don't need any additional options to generate the listings. However, if you want source line numbers in the listing files, add -g to the compile command.

$ ibm-clang++_r ackermann.cpp -o ack


Step 2:  Generate the listing files from binary using ibm-gen-list.

With legacy XL, tprof listing annotation requires you to recompile and link the source with -qlist. That costs build time, and it is a problem if you don't have the source code and build commands handy, for example when you run into a performance problem with a prebuilt binary.
With ibm-gen-list, you can generate the listing files without recompiling. It can be used directly on any binary or object file.

$ ibm-gen-list --objdump=objdump ./ack


Note:
1. ibm-gen-list depends on an objdump tool to disassemble the binary or objects. You can either install GNU objdump from AIX Toolbox for Linux Applications [6], or use llvm-objdump[7]. The command line above assumes that the corresponding objdump is on your PATH.
2. ibm-gen-list also optionally depends on the llvm-dwarfdump[8] tool to support line numbers in the listing. Without it, the line numbers in the listing files will all be 0, and you will get a warning: llvm-dwarfdump not found, line no will be all 0.
3. Run ibm-gen-list -h to see the options for choosing a different objdump or dwarfdump.
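
The tool dependencies in the notes above can be checked before running ibm-gen-list. The following is a minimal sketch, assuming a POSIX shell; check_tool is a hypothetical helper name used only for illustration and is not part of ibm-gen-list or Open XL.

```shell
#!/bin/sh
# check_tool (hypothetical helper): report whether a command is on PATH.
# Prints "found: NAME" and returns 0, or prints "missing: NAME" and returns 1.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# A disassembler is required: either GNU objdump or llvm-objdump.
check_tool objdump || check_tool llvm-objdump || echo "no objdump found on PATH"

# llvm-dwarfdump is optional; without it the listing line numbers are all 0.
check_tool llvm-dwarfdump || echo "line numbers in the listing will be 0"
```

If only llvm-objdump is reported as found, pass it through the --objdump option shown in the command above.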

Step 3:  Run tprof to annotate the listing file

Once you have the listing files from ibm-gen-list, pass them to tprof with -L as usual.

$ tprof -r ack -u -l -s -Z -L a.lst -x ./ack
...
Starting Command ./ack 
Ack(3,12): 32765
stopping trace collection.
Generating ack.prof
Generating ack.a._Z3Ackii.alst
Note:
You should get at least one annotated listing file (.alst) and see a log line similar to "Generating ack.a._Z3Ackii.alst". If not, double-check that the tprof command line is correct.

Step 4:  Inspect the annotated listing file

You should now have an annotated listing file similar to the one below. The ticks are shown next to the corresponding instructions, so you can inspect performance issues at the instruction level.
        Total Ticks for _Z3Ackii = 1342

      -        0| 000000 PDEF ._Z3Ackii
      -        0|        PROC
      -        0|        PROC
    121        0| 000000 mflr         7c0802a6   0     mflr r0
      -        0| 000004 std          fbe1fff8   0     std r31,-8(r1)
      -        0| 000008 std          f8010010   0     std r0,16(r1)
      -        0| 00000C stdu         f821ff61   0     stdu r1,-160(r1)
      -        0| 000010 mr           7c3f0b78   0     mr r31,r1
      -        0| 000014 std          f89f0088   0     std r4,136(r31)
    286        0| 000018 mr           7c641b78   0     mr r4,r3
      2        0| 00001C ld           e87f0088   0     ld r3,136(r31)
     29        0| 000020 stw          909f0094   0     stw r4,148(r31)
      -        0| 000024 stw          907f0090   0     stw r3,144(r31)
      -        0| 000028 lwz          807f0094   0     lwz r3,148(r31)
      -        0| 00002C cmpwi        2c030000   0     cmpwi r3,0
    136        0| 000030 beq          4182006c   0     beq 10000075c <._Z3Ackii+0x9c>
      -        0| 000034 lwz          807f0094   0     lwz r3,148(r31)
      -        0| 000038 addi         3863ffff   0     addi r3,r3,-1
      -        0| 00003C stw          907f0084   0     stw r3,132(r31)
      1        0| 000040 lwz          807f0090   0     lwz r3,144(r31)
      -        0| 000044 cmpwi        2c030000   0     cmpwi r3,0
      -        0| 000048 beq          41820028   0     beq 100000730 <._Z3Ackii+0x70>
     30        0| 00004C lwz          807f0094   0     lwz r3,148(r31)
      -        0| 000050 lwz          809f0090   0     lwz r4,144(r31)
      -        0| 000054 addi         3884ffff   0     addi r4,r4,-1
      -        0| 000058 extsw        7c6307b4   0     extsw r3,r3
      -        0| 00005C extsw        7c8407b4   0     extsw r4,r4
      8        0| 000060 bl           4bffffa1   0     bl 1000006c0 <._Z3Ackii>
     49        0| 000064 nop          60000000   0     nop
      -        0| 000068 stw          907f0080   0     stw r3,128(r31)
      -        0| 00006C b            48000010   0     b 10000073c <._Z3Ackii+0x7c>
(  662)        0|                              ._Z3Ackii+0x70:
      -        0| 000070 li           38600001   0     li r3,1
      -        0| 000074 stw          907f0080   0     stw r3,128(r31)
      -        0| 000078 b            48000004   0     b 10000073c <._Z3Ackii+0x7c>
      -        0|                              ._Z3Ackii+0x7c:
    116        0| 00007C lwz          807f0084   0     lwz r3,132(r31)
      -        0| 000080 lwz          809f0080   0     lwz r4,128(r31)
      -        0| 000084 extsw        7c6307b4   0     extsw r3,r3
      -        0| 000088 extsw        7c8407b4   0     extsw r4,r4
      -        0| 00008C bl           4bffff75   0     bl 1000006c0 <._Z3Ackii>
     59        0| 000090 nop          60000000   0     nop
      1        0| 000094 stw          907f007c   0     stw r3,124(r31)
      -        0| 000098 b            48000010   0     b 100000768 <._Z3Ackii+0xa8>
(  176)        0|                              ._Z3Ackii+0x9c:
    324        0| 00009C lwz          807f0090   0     lwz r3,144(r31)
      -        0| 0000A0 addi         38630001   0     addi r3,r3,1
      -        0| 0000A4 stw          907f007c   0     stw r3,124(r31)
(  324)        0|                              ._Z3Ackii+0xa8:
    138        0| 0000A8 lwz          807f007c   0     lwz r3,124(r31)
      -        0| 0000AC extsw        7c6307b4   0     extsw r3,r3
     10        0| 0000B0 addi         382100a0   0     addi r1,r1,160
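
For longer functions it can help to rank the instructions by ticks instead of reading the whole listing. The following is a minimal sketch, assuming the column layout of the sample above (ticks in the first column, the offset in the third, the mnemonic in the fourth); hot_instructions is a hypothetical helper name and is not part of tprof or ibm-gen-list.

```shell
#!/bin/sh
# hot_instructions (hypothetical helper): read an annotated listing on stdin
# and print "ticks offset mnemonic" for the N instruction lines with the most
# ticks (N defaults to 5). Lines whose first column is not a plain number
# (the "-" rows, the "(  662)" label rows, the "Total Ticks" header) are skipped.
hot_instructions() {
    awk '$1 ~ /^[0-9]+$/ { print $1, $3, $4 }' | sort -k1,1nr | head -n "${1:-5}"
}

# Demo on a few lines copied from the listing above.
hot_instructions 3 <<'EOF'
        Total Ticks for _Z3Ackii = 1342
    121        0| 000000 mflr         7c0802a6   0     mflr r0
      -        0| 000004 std          fbe1fff8   0     std r31,-8(r1)
    286        0| 000018 mr           7c641b78   0     mr r4,r3
(  662)        0|                              ._Z3Ackii+0x70:
    324        0| 00009C lwz          807f0090   0     lwz r3,144(r31)
EOF
```

On the demo input this prints the lines for offsets 00009C, 000018 and 000000, in that order. To use it on a real listing, redirect the .alst file into the function, for example hot_instructions 10 < ack.a._Z3Ackii.alst.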

References:

  1. Analyzing performance with perf annotate https://developer.ibm.com/tutorials/l-analyzing-performance-perf-annotate-trs/
  2. Annotation listing in tprof command https://developer.ibm.com/tutorials/annotation-listing-in-tprof/
  3. tprof Command: -L objectlist  https://www.ibm.com/docs/en/aix/7.2?topic=t-tprof-command#tprof__row-d3e109712
  4. Next generation of IBM C/C++ and Fortran compilers are now available on IBM AIX https://developer.ibm.com/blogs/next-gen-of-c-and-fortran-compilers-available-on-aix/
  5. ackermann.cpp https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout-C++/ackermann.cpp
  6. AIX Toolbox for Linux Applications https://www.ibm.com/support/pages/aix-toolbox-linux-applications-overview
  7. llvm-objdump https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-objdump
  8. llvm-dwarfdump https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-dwarfdump