
Idle-gating for latency-sensitive workloads

By Parth Shah posted Fri March 05, 2021 05:21 AM

  


Background

Today's processors incorporate idle states to save power when a CPU has no work to run. These idle states power-gate parts of the processor's circuitry, saving power for that particular CPU or core. However, entering and exiting these states takes time, and those latencies show up in workload performance. Various methods have been devised and deployed in the Linux® kernel to maintain a better trade-off between power consumption and workload performance. One such technique is the CPUIdle driver, a framework in the Linux kernel that predicts the idle duration of a particular CPU at any point in time and selects an appropriate idle state. These methods have proven effective for many workloads; however, this approach falls short for certain classes of workloads, such as latency-sensitive workloads.
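The idle states a platform exposes, and their wake-up costs, can be inspected from user space through the standard cpuidle sysfs files. The following minimal C sketch lists the idle states of CPU 0 along with their exit latencies; it assumes only the usual /sys/devices/system/cpu/cpuN/cpuidle/stateM layout:

    #include <stdio.h>

    int main(void)
    {
        char path[128], name[64];
        long latency_us;

        for (int state = 0; ; state++) {
            /* Each idle state of CPU 0 is exposed as
             * /sys/devices/system/cpu/cpu0/cpuidle/state<N>/ */
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", state);
            FILE *f = fopen(path, "r");
            if (!f)
                break;                          /* no more idle states */
            if (fscanf(f, "%63s", name) != 1)
                name[0] = '\0';
            fclose(f);

            /* "latency" holds the state's exit latency in microseconds. */
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cpuidle/state%d/latency", state);
            f = fopen(path, "r");
            if (!f || fscanf(f, "%ld", &latency_us) != 1)
                latency_us = -1;
            if (f)
                fclose(f);

            printf("state%d: %-16s exit latency: %ld us\n", state, name, latency_us);
        }
        return 0;
    }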

Today, to get full performance for latency-sensitive workloads, system administrators usually turn off all the idle states of a processor, which results in no power savings. Although turning the idle states off removes the performance penalty, it also consumes power budget that the processor could otherwise spend on Turbo frequencies, so the system loses both power savings and Turbo headroom. To resolve this long-standing tension over when to use idle states and when not to, the idle-gating technique is proposed: it disables the idle states of only those CPUs where latency-sensitive workloads are being, or will be, executed. This work is compared against two baselines and indicates a better-suited approach for saving power while still maintaining peak performance. Two methodologies are commonly used today to achieve the trade-off between power consumption and performance: the Linux CPUIdle menu governor and performance mode.

Menu governor

The menu governor is the out-of-the-box CPUIdle governor used by Linux on IBM® POWER® processors to predict the idle duration of a CPU at any point in time. It is built on a heuristic that uses a sliding-window technique to keep track of a CPU's recent idle periods, predicts the next idle duration from this history, combines the prediction with the entry/exit latency of every available idle state, and selects the best-suited idle state.
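As a rough illustration of this idea (a simplified sketch, not the kernel's actual menu governor code), the following keeps a small sliding window of recent idle durations, predicts the next idle period as their average, and picks the deepest idle state whose target residency and exit latency still fit. The state names and numbers in main() are made up for the example:

    #include <stdint.h>
    #include <stdio.h>

    #define HISTORY 8

    struct idle_state {
        const char *name;
        uint64_t exit_latency_us;       /* cost of waking up from this state */
        uint64_t target_residency_us;   /* minimum stay for a net power win */
    };

    /* Predict the next idle duration as the average of the last HISTORY
     * observed idle periods (a deliberately simplified heuristic). */
    static uint64_t predict_idle_us(const uint64_t window[HISTORY])
    {
        uint64_t sum = 0;
        for (int i = 0; i < HISTORY; i++)
            sum += window[i];
        return sum / HISTORY;
    }

    /* Pick the deepest state whose target residency fits the prediction and
     * whose exit latency respects the latency requirement; -1 means "stay awake". */
    static int select_state(const struct idle_state *states, int nr,
                            uint64_t predicted_us, uint64_t latency_req_us)
    {
        int best = -1;
        for (int i = 0; i < nr; i++)
            if (states[i].target_residency_us <= predicted_us &&
                states[i].exit_latency_us <= latency_req_us)
                best = i;
        return best;
    }

    int main(void)
    {
        /* Example state table and idle history (made-up numbers). */
        struct idle_state states[] = {
            { "snooze",     0,  10 },
            { "stop0lite",  2,  20 },
            { "stop2",     20, 200 },
        };
        uint64_t window[HISTORY] = { 150, 300, 240, 180, 90, 400, 220, 310 };

        int s = select_state(states, 3, predict_idle_us(window), 100);
        printf("selected state: %s\n", s >= 0 ? states[s].name : "none");
        return 0;
    }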

Though this is a good trade-off mechanism, a wrong idle-duration prediction can cause performance regressions for latency-sensitive workloads.

Performance mode

System administrators often disable all the idle states of every CPU in a processor to avoid the penalty incurred when entering and exiting idle states. This removes the performance penalty, but it also removes all power savings, and hence any Turbo frequency benefit to the system.
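In practice this is typically done through the cpuidle sysfs interface by writing 1 to each idle state's per-CPU disable file. A minimal sketch of that approach is shown below; it needs root privileges and simply probes CPUs and states until the corresponding sysfs entries run out (re-enabling is the same walk, writing 0 instead):

    #include <stdio.h>

    /* Write "1" to the per-CPU "disable" file of one idle state.
     * Returns -1 if the CPU or state does not exist (or we lack permission). */
    static int disable_state(int cpu, int state)
    {
        char path[128];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/disable",
                 cpu, state);
        f = fopen(path, "w");
        if (!f)
            return -1;
        fputs("1", f);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        /* Walk CPUs and states until the sysfs entries run out. */
        for (int cpu = 0; disable_state(cpu, 0) == 0; cpu++)
            for (int state = 1; disable_state(cpu, state) == 0; state++)
                ;
        return 0;
    }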


Idle-gating

Unlike the menu governor, idle-gating uses the Linux kernel scheduler to decide when a latency-sensitive task will be scheduled on a CPU, and uses that information to disable idle states on those specific CPUs only.
Latency-nice is a user-space to kernel-space interface for conveying a process's latency requirement to the Linux scheduler, which can use this information to better optimize the system and thereby reduce latency for the given workload. The proposed work uses the latency-nice interface to mark latency-sensitive workloads; the Linux scheduler then learns the scheduling pattern of the workload, figures out where the latency-sensitive workload will be scheduled next, and disables the idle states on that CPU. Because the scheduler is responsible for task placement in the system, it can determine exactly which CPU should have its idle states disabled so that the executing workload faces no latency penalty.
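For illustration, the sketch below marks the calling task as latency-sensitive through the sched_setattr() system call, assuming the sched_attr extension proposed in the latency_nice patch series; the sched_latency_nice field, the SCHED_FLAG_LATENCY_NICE flag, and the flag value are taken from the posted patches and are not part of mainline Linux at the time of writing:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* struct sched_attr as extended by the latency_nice patch series: the
     * mainline fields plus a trailing sched_latency_nice. The extra field,
     * the flag name, and the flag value below follow the posted patches and
     * may differ from whatever is eventually merged upstream. */
    struct sched_attr_ln {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
        uint32_t sched_util_min;
        uint32_t sched_util_max;
        int32_t  sched_latency_nice;    /* proposed: -20 (most sensitive) .. 19 */
    };

    #define SCHED_FLAG_LATENCY_NICE 0x80    /* assumed value from the patches */

    int main(void)
    {
        struct sched_attr_ln attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_flags = SCHED_FLAG_LATENCY_NICE;
        attr.sched_latency_nice = -20;      /* mark as most latency-sensitive */

        /* pid 0 means "the calling task"; the final flags argument is 0. */
        if (syscall(SYS_sched_setattr, 0, &attr, 0))
            perror("sched_setattr");
        return 0;
    }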

Working principle

Linux offers various ways for a user to communicate information to the kernel at runtime. Latency-nice currently supports a system call to change the property of a single process, or control groups (cgroups) to change the property of multiple processes. Latency-sensitive workloads are first marked from user space through the latency-nice interface (using either the system call or a cgroup). After a task is marked latency-sensitive, the Linux scheduler identifies the CPU that will be used by such tasks through the following scheduling decisions:

  • During a process wakeup from sleep, or when a new process is created, the Linux scheduler decides on the best CPU where that process can be scheduled. After deciding, the scheduler can disable all idle states of this CPU.
  • To balance load uniformly across the system, the scheduler can migrate a task from one CPU to another. During the migration, if the process being migrated is latency-sensitive, the idle states are disabled on the target CPU.
  • In both of the above scheduling activities, the scheduler keeps a count of the latency-sensitive processes scheduled on each CPU and decrements the count when such a process is migrated away or exits. Once all the latency-sensitive processes have left a CPU or exited, the scheduler invokes the CPUIdle governor to enable all the idle states again, allowing the corresponding CPU to enter idle states and save power (a conceptual sketch of this bookkeeping follows this list). The full set of patches was posted on the Linux kernel mailing list (see Further reading).
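The following is a conceptual sketch of that per-CPU bookkeeping, not the actual kernel patches: cpu_disable_idle_states() and cpu_enable_idle_states() are hypothetical stand-ins for the real cpuidle hooks, and the placement/removal functions represent the scheduler events described above:

    #include <stdatomic.h>
    #include <stdio.h>

    #define NR_CPUS 176     /* e.g. a 2-socket, 22-core, SMT4 POWER9 system */

    /* Per-CPU count of latency-sensitive tasks currently placed there. */
    static atomic_int nr_lat_sensitive[NR_CPUS];

    /* Hypothetical stand-ins for the real cpuidle hooks. */
    static void cpu_disable_idle_states(int cpu)
    {
        printf("cpu%d: idle states gated\n", cpu);
    }

    static void cpu_enable_idle_states(int cpu)
    {
        printf("cpu%d: idle states re-enabled\n", cpu);
    }

    /* Called when the scheduler places a latency-sensitive task on @cpu
     * (task wakeup, fork, or a load-balance migration onto @cpu). */
    void lat_sensitive_task_placed(int cpu)
    {
        if (atomic_fetch_add(&nr_lat_sensitive[cpu], 1) == 0)
            cpu_disable_idle_states(cpu);
    }

    /* Called when such a task is migrated away from @cpu or exits. */
    void lat_sensitive_task_removed(int cpu)
    {
        if (atomic_fetch_sub(&nr_lat_sensitive[cpu], 1) == 1)
            cpu_enable_idle_states(cpu);
    }

    int main(void)
    {
        lat_sensitive_task_placed(4);   /* first sensitive task lands on CPU 4  */
        lat_sensitive_task_placed(4);   /* a second one: already gated, no-op   */
        lat_sensitive_task_removed(4);  /* one leaves: still one task remaining */
        lat_sensitive_task_removed(4);  /* last one exits: idle states restored */
        return 0;
    }

The key design point is that gating is tied to the scheduler's own placement decisions, so only the CPUs that actually host latency-sensitive tasks lose their idle states; every other CPU keeps saving power as usual.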


Results

The proposed methodology is evaluated on two metrics: latency and power consumption. It is compared against the CPUIdle menu governor, the state-of-the-art governor for saving power while still delivering good performance, and against performance mode, where all idle states are disabled so that the observed latency is minimal at the cost of maximum power consumption.

The methodology is tested on an IBM POWER9™ processor-based system with two sockets, where only the first socket is used for the benchmarks and the second socket is dedicated to the PostgreSQL client traffic generator. The evaluation reports the system's power consumption and the performance of two latency-sensitive benchmarks: schbench and PostgreSQL's pgbench.


Schbench

Schbench is a CPU latency-measurement benchmark that measures scheduling latency, that is, the time between a task becoming runnable and it actually getting a CPU to run on. The benchmark can mimic various scheduling patterns, including those of in-memory databases such as SAP HANA. Schbench is flexible enough to run in single-thread or multi-thread mode. Figure 1 shows, for both variants, the latency observed by the benchmark and the average power consumed during the run.

Figure 1: Latency observed with schbench

In multi-thread mode (88 threads in total), the proposed idle-gating methodology exhibits a 94% decrease in observed latency compared to the state-of-the-art menu governor, with just a 20% increase in overall power consumption. Performance mode shows a 95% decrease in latency compared to the menu governor, but its power consumption is 31% higher.

[Figure: schbench_power.png]

In single-thread mode, idle-gating shows a 98% decrease in latency with a 17% increase in power, which is much better than performance mode: the latency there is exactly the same, but the overall power consumption of the system is 30% higher.


PostgreSQL

PostgreSQL is an open-source database widely used in cloud computing, and it ships with a benchmarking tool, pgbench, that mimics a TPC-B-like workload and measures the average latency observed for multiple queries issued against the PostgreSQL server. pgbench allows parameterizing the number of clients making requests to the server in parallel. In this work, the PostgreSQL server runs on the first socket of the system, and the single-thread or multi-thread (44 parallel clients) pgbench clients run on the second socket.

Figure 3 shows that idle-gating in single-thread mode can deliver up to an 81% reduction in average latency compared to the menu governor, with only a 17.9 W increase in power consumption, whereas performance mode shows an 85% reduction in latency at the cost of 21 W more power.



The multi-thread variant shows around a 40% decrease in average latency with idle-gating compared to the menu governor, with only 3 W of extra power consumed, which is a much better trade-off than performance mode's 80% latency reduction at 19 W of additional power.
[Figure: pgbench_power.png]

Further reading

Ware, Malcolm, et al. "Architecting for power management: The IBM POWER7® approach." HPCA-16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture. IEEE, 2010
https://ieeexplore.ieee.org/abstract/document/5416627/

Pallipadi, Venkatesh, Shaohua Li, and Adam Belay. "CPUIdle: Do nothing, efficiently." Proceedings of the Linux Symposium. Vol. 2. Citeseer, 2007, https://www.kernel.org/doc/ols/2007/ols2007v2-pages-119-126.pdf

Michael Larabel, "IBM Working On More Linux CPU Power Usage Optimizations For Latency-Sensitive Workloads," Phoronix, 2020, https://www.phoronix.com/scan.php?page=news_item&px=IBM-Latency-Sensitive-Idle

Parth Shah, "Introduce per-task latency_nice for scheduler hints," LWN.net, 2020, https://lwn.net/Articles/813593/

Parth Shah, "IDLE gating in presence of latency-sensitive tasks," LWN.net, https://lwn.net/Articles/819784/


Contacting the Enterprise Linux on Power Team
Have questions for the Enterprise Linux on Power team or want to learn more? Follow our discussion group on IBM Community Discussions.
