Originally posted by: AliceYing
My previous blog post explained why TOC overflow occurs and outlined the general ways to solve it. Now it is time to describe those solutions in detail.
Two general solutions
As a matter of fact, the best way to handle TOC overflow is to reduce the number of global symbols, which reduces the number of TOC entries required. But how do you do that?
Modify the source code so that it uses fewer TOC entries. You might remove unnecessary global variables and functions, mark them as static, group global symbols into structures, and so on. This approach assumes that you are very familiar with the code, or that you don't mind time-consuming work. A further disadvantage is that manual changes are error-prone.
If you don't want to modify the code manually, let the compiler do the work. The compiler can produce results similar to those of source changes without widespread manual editing. This happens when optimization level -O4 or -O5 is used at link time, or when the -qipa option is specified at both compile time and link time. The -qipa option enables or customizes a class of optimizations known as interprocedural analysis (IPA). Refer to the compiler documentation to learn more about this powerful option.
Apply the -qminimaltoc option. This option makes the compiler create, for each source file, a separate table that stores the addresses of that file's global symbols. As a result, each compilation unit uses only one TOC entry. Note, however, that the option may not help if the number of global symbols in a single source file exceeds the number of addresses the TOC can store. A heads-up: -qminimaltoc introduces an indirect reference, which increases the time needed to access a global symbol, and the memory requirements of the application might also increase. Use the option carefully, because performance can suffer; see the performance comparison of the options later in this blog post.
As the previous blog post states, enlarging the TOC access range is the other way to handle TOC overflow. A 16-bit offset on IBM PowerPC addresses a 64 KB TOC region, and a large TOC can contain up to 64 K such regions, for a total of 4 GB. Because each TOC entry holds one address, this allows up to 1 G global symbols in a 32-bit environment (4-byte entries) and 512 M in a 64-bit environment (8-byte entries).
Here are the options you can apply. Choose the most appropriate one carefully, keeping performance in mind.
Specify the -bbigtoc option. This linker option increases the total TOC capacity by creating one or more extended TOC regions of 64 KB each. If a global symbol resides in the extended TOC, the linker computes the symbol's location in two steps: it first locates the extended TOC region and then computes the location within that region.
Specify the -qpic=large option. This option is strongly recommended because it generates more efficient code than the -bbigtoc option. Whether or not TOC overflow occurs, every symbol, in the base TOC or in the extended TOC, needs one extra instruction to compute its address.
Now you have multiple ways to solve the TOC overflow problem, and your next question is probably how to decide which one to apply. The answer is that there is no single best solution: pick the one that fits your performance requirements while you solve the TOC overflow problem.
The considerations for each option below are described from a performance perspective. Remember to minimize negative effects on runtime performance when you apply them.
Interprocedural analysis, enabled by -qipa or implied by -O4 and -O5, is generally a powerful approach: by reducing the number of global symbols, it can greatly reduce TOC pressure and often eliminates TOC overflow altogether. Keep in mind that higher optimization levels reduce TOC requirements more aggressively, but at the cost of longer processing time.
The -qminimaltoc option works best when the global symbols in your source code are not performance sensitive. Specify it with discretion if your program contains frequently executed, performance-sensitive code: the program could become larger and slower than before, although it would still be faster than with the -bbigtoc option.
With the -bbigtoc option, accessing global symbols stored in the base TOC adds no performance cost; however, if the symbols are in the extended TOC, more instructions are required to compute and locate their addresses. Furthermore, the compiler can miss optimization opportunities because the address calculation is generated at link time. This option used to be quite useful for handling TOC overflow, but the -qpic=large option is now recommended instead.
The -qpic=large option provides the best balance for accessing symbols in both the base and the extended TOC. Its only performance cost is that two instructions are generated to access each symbol, regardless of whether TOC overflow occurs. Because this adds only a short latency, the option generally outperforms -bbigtoc.