System resource estimation is a crucial step in planning the requirements for running workloads of any nature, ranging from databases to scientific modeling. Whether you are studying workload characteristics or analyzing the system response under varying load, CPU utilization is the first surface-level metric that explains how effectively the CPUs are used. In other words, it helps in understanding which tasks are using the most CPU resource and how much CPU capacity is unused. Every system administrator has some favorite monitoring tools (such as top, vmstat, perf, eBPF, and many others) to visualize the resources used by the active tasks in the system. Over the years, Linux® kernel developers have contributed various optimizations and improvements in the area of tracking resource utilization for accurate reporting. Performance engineers and resource planners have always demanded more detail and improvements in resource tracking, especially for CPU consumption. One reason for the lack of finer-grained accounting is the lack of hardware support that enables operating systems to perform such accounting. IBM® Power Systems supports finer per-thread utilization accounting using two CPU registers dedicated to CPU resource accounting. This blog introduces you to the Processor Utilization Resource Register (PURR) and Scaled Processor Utilization Resource Register (SPURR) based CPU utilization metrics on a Linux on IBM Power® logical partition (LPAR).
The Linux kernel assumes that all the threads (CPUs) in a core with simultaneous multithreading (SMT) support consume CPU cycles at the same rate when all of them are running a task. For example, if each CPU core has a weight of 100, then each CPU in an SMT8 setup has a weight of 100/8 (12.5%), regardless of whether the CPU is running a program or is idle. To visualize this, I am taking the example of a dual-core IBM POWER9™ LPAR with SMT enabled, each core having eight CPUs, resulting in 16 CPUs/threads in the LPAR.
As an experiment, the first four CPUs of each core are loaded with a simple CPU-intensive, single-threaded program that burns CPU cycles in a loop (see Example 1).
int main(int argc, char **argv)
{
        for (;;) ;      /* spin forever, burning CPU cycles */
}
$ gcc -o cpu-loop cpu-loop.c
Example 1. A simple single-thread CPU-intensive program
Note that the other four CPUs in each core are idle. Now, when the system is monitored for total resource utilization using any of the monitoring tools, you would see that the system is 50% busy, and logically that is right. But the results may vary on an IBM Power LPAR if the monitoring tool uses the PURR values to compute CPU utilization. The PURR registers on Power Systems provide an estimate based on how large a share a CPU has consumed in comparison to the sibling CPUs in the same core. This might sound similar to how the Linux kernel estimates utilization based on the Time-Stamp Counter (TSC) metrics. The difference between them is the technique used to estimate the CPU share, which changes dynamically based on the number of CPUs idling at a given point in time in the system being monitored. Let us now run two instances of the program from Example 1 on each core, so that only two CPUs per core are loaded with the same program used in Figure 1.
Monitoring tools that don’t read the PURR registers for estimating system utilization might display the current system load as 4/16, or 25% used. This is not true on a Power LPAR, where the system usage could be close to 55%, so such tools under-estimate the consumption (see Figure 2). Under-estimating the actual consumption gives a false sense of resource availability and might lead to an unresponsive system if the user adds more tasks based on the reported values. Let’s try guessing the usage percentage if a third instance of the example program is added to both of the cores. PURR-based tools would report up to 75% utilization, whereas monitoring tools based on the TSC metric would display an estimate close to 37%, which is about 38 percentage points lower than the real utilization.
If you are wondering what utilization PURR would report if all the CPUs were running the sample program used in Example 1, the answer is simple: the reported CPU utilization is 100%. Let’s dive into a more detailed explanation of why running two tasks reports 55% utilization. Recall that the estimate changes dynamically based on the number of idle CPUs in the core, that is, the free cycles available on the core. The busy CPUs in the core compete with each other for the unused cycles of their idle siblings. But, as you may have observed, the usage of the busy CPUs gets capped at a certain percentage, instead of allowing them to use all the unused CPU cycles. This assumption breaks the very idea that every thread can go beyond its 1/8th share of the resource (CPU core) in an SMT8 setup when not all the threads are busy.
Effective system utilization doesn't always mean driving the system capacity to the maximum, but rather striking a good balance between performance and efficiency in terms of power management. Power management allows the CPU either to conserve energy by operating at a lower frequency or to run at a higher frequency to drive more performance. This dynamic change in CPU frequency translates to either fewer or more execution cycles available on the CPU. The PURR does not account for this variation in frequency and accumulates the CPU utilization numbers monotonically, at the rate at which the timebase register ticks. This again brings back the question: how do we track the usage accurately, with modern processors performing power management? The Power processor's Scaled Processor Utilization Resource Register (SPURR) comes to the rescue. SPURR accumulates its value by scaling the timebase to the current operating frequency. In other words, it varies the accumulation in proportion to the frequency at which the CPU is operating, accumulating more than the timebase register at a higher frequency (performance) and less at a lower frequency (efficiency). Because SPURR accounts for such variation, it provides an estimate closer to the current system utilization than the PURR metrics. It also means that if the system frequency is not altered for power saving or performance, the PURR and SPURR estimates remain the same. Let’s compare the SPURR utilization estimate to the values reported by PURR when the CPU is programmed for performance.
Now, let's repeat the experiment in Figure 2 with a more elaborate setup. One instance of the test program in Example 1 is launched for every CPU in the core, with an interval of 10 seconds between two instances. This means that, counting from the launch of the first instance, the eighth instance gets launched at the 70th second, loading the system to its capacity. From the 80th second, we go backwards, killing one instance every 10 seconds. So, by the 150th second, the system is back to the idle state. The results of the experiment are plotted in Figure 3, where the CPU utilization metrics are collected from vmstat and from lparstat with support for PURR and SPURR based metrics.
If you look at the line plotted for vmstat, it is a staircase that rises by exactly the same utilization percentage with every new instance of the program loaded onto the core. The PURR line, in contrast, diverges from the utilization reported by vmstat, but converges with it when running the eighth instance (that is, when the system is loaded to 100% of its capacity by running one instance per CPU). vmstat relies on the /proc interface for capturing the CPU utilization, whereas the PURR metric is a summation of the utilization values read from the per-CPU PURR registers. Let’s look at the line plotted for the SPURR metrics. The utilization is about 60% with two active CPUs, 85% with three active CPUs in the core, and 110% with all the CPUs running the test program. There is about 10% more consumption in comparison to PURR when every CPU is running an instance of the test program. Yes, you guessed it right! It is the turbo mode, where the CPUs are programmed for performance and reach their maximum possible operating frequency. IBM Power Systems can provide best-in-class power management support to drive efficiency across a broad spectrum of workloads.
The next question that arises is: which Linux monitoring tools can understand the values reported by the PURR/SPURR registers on the LPAR and present them in a user-friendly format? Currently, the lparstat tool packaged with powerpc-utils version 1.3.8 has been enhanced to display the CPU utilization metrics based on the PURR/SPURR registers. It samples the total system load for a given time interval and displays both the PURR and SPURR estimates, along with the traditional lparstat information, on Linux kernels that export the PURR/SPURR values.
I would like to thank Subhathra Srinivasaraghavan, Nilesh Joshi, and Suresh Kodati for their continuous support and encouragement while I was working on SPURR utilization accounting, and Naveen N Rao and Madhavan Srinivasan for reviewing this blog.
1. IBM Power Systems Performance Guide - Implementing and Optimizing https://www.redbooks.ibm.com/redbooks/pdfs/sg248080.pdf
2. Power Systems Enterprise Servers with PowerVM Virtualization and RAS https://www.redbooks.ibm.com/redbooks/pdfs/sg247965.pdf
3. Linux Load Averages: Solving the Mystery – Brendan Gregg’s Blog http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html