Introduction
Identifying and resolving performance issues is routine work when you develop or port open source applications. You may already be familiar with using perf annotate [1] on Linux to pinpoint bottlenecks at the instruction level. On AIX, you can do similar annotated, instruction-level profiling with tprof [2] by using the -L objectlist option [3]. However, tprof is designed differently from perf: instead of generating annotated listings on its own, it relies on listing files produced by the compiler. This limitation poses a unique challenge for users of compilers other than the legacy XL compilers.
With the introduction of the new IBM Open XL C/C++ and Fortran for AIX [4], you can take advantage of Clang/LLVM technology combined with industry-leading optimizations for IBM Power10 to deliver the best possible performance for your applications on AIX. However, because Open XL is a different compiler from legacy XL, it does not generate the same listing files, so by default you cannot use tprof for annotated profiling of applications built with Open XL.
This blog introduces the ibm-gen-list tool, which helps resolve this limitation of tprof.
The ibm-gen-list tool is provided in the tools directory of Open XL 17.1.0.1 (December 2021 PTF).
Example:
To make it easier to follow, we use the single-source benchmark ackermann.cpp from the llvm-test-suite [5] as the example. We assume that you have Open XL C/C++ installed and your PATH variable updated to pick up the Open XL compilers.
#include <iostream>
#include <stdlib.h>
using namespace std;
int Ack(int M, int N) { return(M ? (Ack(M-1,N ? Ack(M,(N-1)) : 1)) : N+1); }
int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 11
#else
#define LENGTH 12
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
cout << "Ack(3," << n << "): " << Ack(3, n) << endl;
return(0);
}
Step 1: Compile and link the binary using Open XL C/C++.
This is the normal build step for your application; no additional options are needed to generate the listings. However, if you want source line numbers in the listing files, also add -g to the compile command (for example, ibm-clang++_r -g ackermann.cpp -o ack).
$ ibm-clang++_r ackermann.cpp -o ack
Step 2: Generate the listing files from the binary using ibm-gen-list.
With legacy XL, tprof listing annotation requires you to recompile and relink the source with -qlist, which costs extra build time. It is also a problem when you do not have the source code and build commands at hand, for example, when you investigate a performance problem in a prebuilt binary.
With ibm-gen-list, you can generate the listing files without recompiling: it works directly on any binary (or object file).
$ ibm-gen-list --objdump=objdump ./ack
Note:
1. ibm-gen-list depends on an objdump tool to disassemble binaries or object files. You can either install GNU objdump from the AIX Toolbox for Linux Applications [6] or use llvm-objdump [7]. The command line above assumes that the objdump you pass with --objdump can be found through your PATH.
2. ibm-gen-list can optionally use the llvm-dwarfdump [8] tool to add line numbers to the listing. Without it, all line numbers in the listing files are 0, and you get the warning: llvm-dwarfdump not found, line no will be all 0.
3. Run ibm-gen-list -h to see the options for using a different objdump or dwarfdump tool.
Step 3: Run tprof to annotate the listing file
Once you have the listing files from ibm-gen-list, you can pass them to tprof with -L as usual.
$ tprof -r ack -u -l -s -Z -L a.lst -x ./ack
...
Starting Command ./ack
Ack(3,12): 32765
stopping trace collection.
Generating ack.prof
Generating ack.a._Z3Ackii.alst
Note:
You should get at least one annotated listing file (.alst) and see a log line similar to "Generating ack.a._Z3Ackii.alst". If not, double-check your tprof command line.
Step 4: Inspect the annotated listing file
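You should now have an annotated listing file similar to the one below. Ticks are shown next to the corresponding instructions, so you can inspect performance issues at the instruction level. In this excerpt, the number in parentheses on a label line matches the total ticks of the block that precedes it; for example, ( 662) is the sum of the ticks from offset 000000 through 00006C.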
Total Ticks for _Z3Ackii = 1342
- 0| 000000 PDEF ._Z3Ackii
- 0| PROC
- 0| PROC
121 0| 000000 mflr 7c0802a6 0 mflr r0
- 0| 000004 std fbe1fff8 0 std r31,-8(r1)
- 0| 000008 std f8010010 0 std r0,16(r1)
- 0| 00000C stdu f821ff61 0 stdu r1,-160(r1)
- 0| 000010 mr 7c3f0b78 0 mr r31,r1
- 0| 000014 std f89f0088 0 std r4,136(r31)
286 0| 000018 mr 7c641b78 0 mr r4,r3
2 0| 00001C ld e87f0088 0 ld r3,136(r31)
29 0| 000020 stw 909f0094 0 stw r4,148(r31)
- 0| 000024 stw 907f0090 0 stw r3,144(r31)
- 0| 000028 lwz 807f0094 0 lwz r3,148(r31)
- 0| 00002C cmpwi 2c030000 0 cmpwi r3,0
136 0| 000030 beq 4182006c 0 beq 10000075c <._Z3Ackii+0x9c>
- 0| 000034 lwz 807f0094 0 lwz r3,148(r31)
- 0| 000038 addi 3863ffff 0 addi r3,r3,-1
- 0| 00003C stw 907f0084 0 stw r3,132(r31)
1 0| 000040 lwz 807f0090 0 lwz r3,144(r31)
- 0| 000044 cmpwi 2c030000 0 cmpwi r3,0
- 0| 000048 beq 41820028 0 beq 100000730 <._Z3Ackii+0x70>
30 0| 00004C lwz 807f0094 0 lwz r3,148(r31)
- 0| 000050 lwz 809f0090 0 lwz r4,144(r31)
- 0| 000054 addi 3884ffff 0 addi r4,r4,-1
- 0| 000058 extsw 7c6307b4 0 extsw r3,r3
- 0| 00005C extsw 7c8407b4 0 extsw r4,r4
8 0| 000060 bl 4bffffa1 0 bl 1000006c0 <._Z3Ackii>
49 0| 000064 nop 60000000 0 nop
- 0| 000068 stw 907f0080 0 stw r3,128(r31)
- 0| 00006C b 48000010 0 b 10000073c <._Z3Ackii+0x7c>
( 662) 0| ._Z3Ackii+0x70:
- 0| 000070 li 38600001 0 li r3,1
- 0| 000074 stw 907f0080 0 stw r3,128(r31)
- 0| 000078 b 48000004 0 b 10000073c <._Z3Ackii+0x7c>
- 0| ._Z3Ackii+0x7c:
116 0| 00007C lwz 807f0084 0 lwz r3,132(r31)
- 0| 000080 lwz 809f0080 0 lwz r4,128(r31)
- 0| 000084 extsw 7c6307b4 0 extsw r3,r3
- 0| 000088 extsw 7c8407b4 0 extsw r4,r4
- 0| 00008C bl 4bffff75 0 bl 1000006c0 <._Z3Ackii>
59 0| 000090 nop 60000000 0 nop
1 0| 000094 stw 907f007c 0 stw r3,124(r31)
- 0| 000098 b 48000010 0 b 100000768 <._Z3Ackii+0xa8>
( 176) 0| ._Z3Ackii+0x9c:
324 0| 00009C lwz 807f0090 0 lwz r3,144(r31)
- 0| 0000A0 addi 38630001 0 addi r3,r3,1
- 0| 0000A4 stw 907f007c 0 stw r3,124(r31)
( 324) 0| ._Z3Ackii+0xa8:
138 0| 0000A8 lwz 807f007c 0 lwz r3,124(r31)
- 0| 0000AC extsw 7c6307b4 0 extsw r3,r3
10 0| 0000B0 addi 382100a0 0 addi r1,r1,160
References:
[1] Analyzing performance with perf annotate: https://developer.ibm.com/tutorials/l-analyzing-performance-perf-annotate-trs/
[2] Annotation listing in tprof command: https://developer.ibm.com/tutorials/annotation-listing-in-tprof/
[3] tprof Command: -L objectlist: https://www.ibm.com/docs/en/aix/7.2?topic=t-tprof-command#tprof__row-d3e109712
[4] Next generation of IBM C/C++ and Fortran compilers are now available on IBM AIX: https://developer.ibm.com/blogs/next-gen-of-c-and-fortran-compilers-available-on-aix/
[5] ackermann.cpp: https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout-C++/ackermann.cpp
[6] AIX Toolbox for Linux Applications: https://www.ibm.com/support/pages/aix-toolbox-linux-applications-overview
[7] llvm-objdump: https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-objdump
[8] llvm-dwarfdump: https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-dwarfdump