Understanding Processor Utilization on PowerVM

By Pete Heyrman posted Tue January 25, 2022 05:23 PM

There are various tools that can be used to measure processor utilization: operating system (OS) tools, management console tools, third-party tools and so on.  Frequently the PowerVM team is asked why the reporting of utilization differs across the different components (PowerVM Hypervisor, HMC, OS).  This blog gives some insight into the various tools that exist and how each of them measures utilization.  Because the concept of utilization can be difficult to explain, throughout this article we will compare measuring utilization to taking a ride in a taxi.

PowerVM Hypervisor measure of utilization

The hypervisor’s view of utilization is quite simple and is based on whether or not a physical processor is running work on behalf of the partition.  When a partition is dispatched to run on a physical core, the hypervisor records a starting timestamp.  When the partition gives up control of the physical core, either voluntarily or because it is forced to, the hypervisor records an ending timestamp.  The amount of time the partition ran on a physical core is what the hypervisor considers utilization.  To the hypervisor, the utilization is the same whether the OS is running various tasks or is completely idle; as long as the partition is running on a physical processor, the hypervisor counts this as physical processor utilization.

For our taxi ride example, to the taxi, it doesn’t matter if you’re waiting for a light to change or waiting in traffic, if the taxi is charging based on time, the meter is running throughout the ride.  Since the taxi is unavailable to other passengers, the taxi is utilized whenever you are in the taxi.

PowerVM is a paravirtualized hypervisor: the operating systems (OSes) are aware they are running on a hypervisor and cooperatively share resources.  When an OS has nothing useful to run, it cedes control of the physical threads in a core back to the hypervisor.  Since POWER8, each physical core is capable of running up to eight simultaneous threads of execution (SMT8).  When all eight threads cede control to the hypervisor, the hypervisor searches for other partitions that can utilize the hardware core.  At any point in time, one and only one partition can be running on a single core, independent of SMT mode.


Power Hypervisor JSON Long Term Monitoring (LTM), HMC lslparutil command and PCM Data collection

Enabling data collection

Before the HMC can collect utilization data, it must first be configured to do so.  There are two ways to do this, depending on how you want to use the utilization data.  If you want to use the LTM data or the HMC lslparutil command, or run a tool that depends on this data such as lpar2rrd, you enable “Utilization Data” on the GUI or with the HMC chlparutil command.  When enabling, you can specify the interval at which you want the data reported: 0 (disabled), 30 seconds, 60 seconds (1 minute), 300 seconds (5 minutes), 1800 seconds (30 minutes) or 3600 seconds (1 hour).  If you want to use HMC Performance and Capacity Monitoring (PCM), or if you have IBM’s Power Systems Private Cloud with Shared Utility Capacity (Power Enterprise Pools 2.0), you enable PCM “Data Collection” on the GUI.  The LTM, the HMC lslparutil command and PCM all use the same utilization data collected from the hypervisor.

 
The following is some information about interpreting the utilization data collected from the hypervisor.  Output from the lslparutil command is shown here.  Note that LTM and PCM generate their output from this same data.

Sample record

time=12/31/2021 12:54:00,event_type=sample,resource_type=lpar,sys_time=12/31/2021 12:56:01,time_cycles=10483701761333357,lpar_name=myLpar,lpar_id=5,curr_proc_mode=shared,curr_proc_units=0.1,curr_procs=1,curr_sharing_mode=uncap,curr_uncap_weight=128,curr_shared_proc_pool_name=sharedPool1,curr_shared_proc_pool_id=1,curr_5250_cpw_percent=0.0,mem_mode=ded,curr_mem=8192,entitled_cycles=903512532587178,capped_cycles=544774496832097,uncapped_cycles=508841215260085,shared_cycles_while_active=0,idle_cycles=577605656191032,total_instructions=5667939524378026,total_instructions_execution_time=1037526851164269

As you can see in these results, the cycles values can be quite large as these are maintained as incrementing units since the creation of the partition.  To analyze the data, you need to subtract the previous interval from the current interval to determine what was consumed over an interval of time.  Also, the hypervisor reports time in timebase units.  There are 512,000,000 timebase (TB) units per second.
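As an illustration of working with these records, the key=value pairs can be parsed into a dictionary for later delta calculations.  This is only a sketch (the parser is not an IBM-provided tool); the field names match the sample record above:

```python
# Minimal parser for an lslparutil-style record: comma-separated key=value
# pairs, with the numeric counters converted to int for delta arithmetic.
TB_UNITS_PER_SEC = 512_000_000  # timebase units per second

def parse_sample(record: str) -> dict:
    fields = {}
    for pair in record.split(","):
        key, _, value = pair.partition("=")
        # Pure-digit values (the cycle counters) become ints; everything
        # else (names, dates, fractional processor units) stays a string.
        fields[key] = int(value) if value.isdigit() else value
    return fields

sample = parse_sample(
    "event_type=sample,lpar_name=myLpar,time_cycles=10483701761333357,"
    "capped_cycles=544774496832097,idle_cycles=577605656191032"
)
# Convert a counter from timebase units to seconds:
print(sample["capped_cycles"] / TB_UNITS_PER_SEC)
```

Note that the simple comma split assumes no value contains a comma, which holds for the sample record shown above.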

Additional information about fields

time_cycles – A time stamp that indicates when the sample was collected.  The value increments from the power-on of the server, so the elapsed time between any two samples can be calculated by subtracting the previous sample’s time_cycles from the current sample’s value.  The result is in timebase units, so dividing by 512,000,000 converts it to seconds.

entitled_cycles – The number of cycles the partition is entitled to consume.  For a dedicated partition with 1 processor and a 30-second collection interval, you would expect this to grow by approximately 30 seconds * 512,000,000 timebase units per second * 1 processor.  For a shared processor partition with 0.3 processing units of entitlement, the growth would be approximately 30 seconds * 512,000,000 timebase units per second * 0.3 processors.
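That arithmetic is easy to sanity-check in a couple of lines (a sketch reproducing the numbers above):

```python
TB_UNITS_PER_SEC = 512_000_000  # timebase units per second

def expected_entitled_cycles(interval_secs: float, proc_units: float) -> float:
    """Approximate entitled_cycles growth over one collection interval."""
    return interval_secs * TB_UNITS_PER_SEC * proc_units

# Dedicated partition, 1 processor, 30-second interval:
print(expected_entitled_cycles(30, 1.0))  # 15,360,000,000 timebase units
# Shared partition with 0.3 processing units of entitlement:
print(expected_entitled_cycles(30, 0.3))
```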

capped_cycles – This is the capped cycles the partition actually consumed.  For a dedicated partition or a shared partition where the sharing mode is capped, all the cycles will be reported as capped_cycles.

uncapped_cycles – For a shared uncapped partition, the partition can consume more than its entitled cycles when there are cycles available to it; that excess consumption is reported here.

idle_cycles – Idle time is not calculated by the hypervisor but is reported by the operating system.  It is reported in timebase units and represents the time the OS was running on a physical processor just searching for tasks that are ready to run; processor cycles are being consumed, but nothing productive is being done by the partition with those cycles.  For a dedicated partition without sharing when active (dedicated without donation), on an idle partition the idle_cycles reported can be very close to 100% of the capped_cycles.  For dedicated with sharing when active (dedicated donate) or for shared partitions, the OS will usually cede the physical CPU time back to the hypervisor so other partitions can run on the physical cores; because these partitions cede their cycles, idle_cycles will normally be a small fraction of the consumed cycles.  Note that when a partition is rebooted, idle_cycles restarts at zero, unlike the hypervisor-maintained values.

 

Calculations of utilization based on LTM or lslparutil output

Various calculations of utilization can be made from the raw data collected by the hypervisor and the partition and reported in the LTM or lslparutil output.  Remember, the reported values are continuously incrementing, so measuring utilization requires subtracting one sample from another.  For example, the amount of time between samples is calculated by subtracting a previous sample’s time_cycles from the current sample’s time_cycles, giving delta time_cycles.

 

If you want to know the amount of physical processor time consumed by a partition, the formula is:

(delta capped_cycles + delta uncapped_cycles) / (delta time_cycles)

This is the calculation that is used by many service providers and IBM’s Power Enterprise Pools 2.0.  This can also be used as a measure of how utilized a server is from a physical CPU standpoint.  There is no subtraction of idle_cycles because these cycles are consumed by the running partition and are not available to other partitions.

If you want to know how much CPU capacity/throughput might be available for a given partition, the formula is:

(delta capped_cycles + delta uncapped_cycles – delta idle_cycles) / (delta time_cycles)

Idle cycles are subtracted in this situation because they represent additional productive capacity that remains available to the partition: those cycles are being consumed on the physical processors but are not contributing to the completion of work within the partition.
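Both formulas can be captured in one small function.  This is a sketch, not an IBM tool; the field names follow the lslparutil output above, and the two samples use made-up numbers:

```python
def utilization(prev: dict, curr: dict, deduct_idle: bool = False) -> float:
    """Utilization between two samples of continuously incrementing counters.

    deduct_idle=False: physical processor time consumed (hypervisor view).
    deduct_idle=True:  consumed time minus idle (available-capacity view).
    Caveat: idle_cycles restarts at zero on a partition reboot, so a delta
    spanning a reboot can be misleading.
    """
    def d(key):
        return curr[key] - prev[key]

    busy = d("capped_cycles") + d("uncapped_cycles")
    if deduct_idle:
        busy -= d("idle_cycles")
    return busy / d("time_cycles")

# Made-up samples: 800 cycles elapsed, all consumed as capped, 200 idle.
prev = {"time_cycles": 0, "capped_cycles": 0, "uncapped_cycles": 0, "idle_cycles": 0}
curr = {"time_cycles": 800, "capped_cycles": 800, "uncapped_cycles": 0, "idle_cycles": 200}
print(utilization(prev, curr))                    # 1.0  -> 100%
print(utilization(prev, curr, deduct_idle=True))  # 0.75 -> 75%
```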

Examples of lslparutil data

Scenario | Entitled Cycles | Capped Cycles | Uncapped Cycles | Idle Cycles | Hypervisor Utilization: (Capped + Uncapped) / Time_cycles | HMC PCM Utilization: (Capped + Uncapped – Idle) / Time_cycles
1 | 800 | 800 | 0 | 200 | 100% | 75%
2 | 800 | 600 | 0 | Small | 75% | 75%
3 | 800 | 400 | 0 | Small | 50% | 50%
4 | 800 | 800 | 800 | Small | 200% | 200%

Description of scenarios:

  1. This data would be typical of what would be seen for a dedicated processor partition. The Hypervisor utilization would be 100% as the processor(s) are dedicated to one and only one partition so entitled is equal to capped.  Since there are idle cycles, this additional capacity is available to the partition if additional work needs to run within the partition.
  2. This scenario is the same as the first scenario except the partition has been changed to share the processors when active (dedicated donate). In this case, when the OS has nothing to run, the processor is given back to the hypervisor to run work for other partitions.  So, the hypervisor recorded 600 cycles of time when the partition was running on a physical processor.  The additional 200 cycles could be consumed by other partitions on the server.  Note, the value for idle cycles is small since the OS gives the processor back to the hypervisor when there is nothing to run within the partition.
  3. Scenario 3 is a capped partition example that looks very similar to dedicated with donation.
  4. Scenario 4 is an uncapped partition that has consumed all of its entitlement and was given an additional 800 cycles of uncapped consumption. Because capped + uncapped is greater than entitled, the reported utilization exceeds 100%.
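The two utilization columns in the table can be reproduced with a few lines of arithmetic (a sketch: delta time_cycles is assumed to be 800 per interval, and the “Small” idle values are treated as 0):

```python
# (capped, uncapped, idle) cycle deltas for the four scenarios in the table.
scenarios = [
    (800, 0, 200),  # 1: dedicated processor partition
    (600, 0, 0),    # 2: dedicated donate ("Small" idle treated as 0)
    (400, 0, 0),    # 3: capped shared partition
    (800, 800, 0),  # 4: uncapped shared partition
]
TIME_CYCLES = 800  # assumed delta time_cycles per interval

for n, (capped, uncapped, idle) in enumerate(scenarios, start=1):
    hypervisor = (capped + uncapped) / TIME_CYCLES
    pcm = (capped + uncapped - idle) / TIME_CYCLES
    print(f"Scenario {n}: hypervisor {hypervisor:.0%}, PCM {pcm:.0%}")
```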

For our taxi ride example, some taxi companies charge by mileage (actual distance not time), so this is a different way to measure utilization of the taxi ride.

HMC Performance and Capacity Monitoring (PCM) behavior prior to V10R2M1040

In HMC versions before V10R2M1040, the HMC automatically subtracted idle cycles when displaying the utilization data on the HMC.

 
V10R2M1040 HMC Performance and Capacity Monitoring (PCM) enhancement

The HMC chhmc command has been enhanced to indicate if processor usage calculations should include or exclude idle cycles in the view that is displayed by the HMC.  The format of the command is:
chhmc -c pcmprocusage -s modify --deductidle {on | off}
Specifying deductidle off does not subtract idle cycles, giving a measure of the time a physical processor is running work on behalf of the partition (the PowerVM Hypervisor measure of utilization).  Specifying deductidle on subtracts idle cycles, showing a view of the partition’s available capacity.  The default is deductidle on, and the current setting can be viewed with the lshmc command.

V10R3M1050 HMC Performance and Capacity Monitoring (PCM) enhancement

With the introduction of HMC 1050, the option to toggle between including and excluding idle cycles is built into the GUI instead of requiring the chhmc command.

Operating System Reported utilization

Operating systems (OSes) also report CPU utilization via various tools.  In general, OS tools tend to report how much throughput or capacity is available within a given partition, which is different from the physical CPU consumption reported by the hypervisor.  Many OSes consider things like idle cycles, simultaneous multithreading (SMT), variable processor frequencies and so on when calculating capacity.

Consideration of SMT effects in OS utilization

The OS, with the help of the hardware, keeps track of how busy each SMT thread is and uses this information when reporting CPU utilization.  If only one of the eight threads of execution is busy, you might see processor utilization reported by the OS around 50%, indicating that for the given physical processor there is additional capacity available for applications to consume.  With 2 of the 8 threads active, the utilization might be reported as 70%.  At first glance it might seem that with 1 of 8 threads active the utilization should be reported as 12.5%, but it is more complicated than simply dividing active threads by 8.  The reported utilization considers cache effects, internal processor resources and the like to give a more accurate view of thread strength and remaining capacity.  As thread strength has improved with different generations of POWER processors, the hardware/software algorithms have been adjusted.
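To make the non-linearity concrete, here is a purely illustrative sketch.  The anchor points come from the approximate figures in this section (1 of 8 threads busy reporting around 50%, 2 of 8 around 70%); the remaining values and the linear interpolation are assumptions, not the actual hardware/OS algorithm, which is proprietary and generation-specific:

```python
# Hypothetical SMT8 curve: busy threads -> reported utilization fraction.
# Only the 1-thread (~50%) and 2-thread (~70%) points come from the text;
# the 0-, 4- and 8-thread values are illustrative assumptions.
ILLUSTRATIVE_SMT8_UTIL = {0: 0.0, 1: 0.50, 2: 0.70, 4: 0.88, 8: 1.0}

def reported_utilization(busy_threads: int) -> float:
    """Linearly interpolate between the assumed anchor points (0..8)."""
    if busy_threads in ILLUSTRATIVE_SMT8_UTIL:
        return ILLUSTRATIVE_SMT8_UTIL[busy_threads]
    known = sorted(ILLUSTRATIVE_SMT8_UTIL)
    lo = max(k for k in known if k < busy_threads)
    hi = min(k for k in known if k > busy_threads)
    frac = (busy_threads - lo) / (hi - lo)
    return ILLUSTRATIVE_SMT8_UTIL[lo] + frac * (
        ILLUSTRATIVE_SMT8_UTIL[hi] - ILLUSTRATIVE_SMT8_UTIL[lo]
    )

print(reported_utilization(1))  # 0.5, far above the naive 1/8 = 0.125
```

The only point of the sketch is that reported utilization grows much faster than busy_threads / 8 for the first few threads, reflecting how strong a single thread is when it has the core’s resources to itself.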

For our taxi ride example, SMT is like the number of passengers in the taxi.  If you share a ride with a friend, the overall time or distance is the same, but you are better utilizing the service provided by the taxi by getting more done over the same amount of time or distance.

 

Consideration of processor frequency in OS utilization

POWER systems can be configured to run at higher frequencies than the nominal frequency of the processors, given the right conditions.  With sufficient cooling and power resources, the OS can report lower utilization because the processors are running at higher frequencies.  Similarly, processor utilization can rise if the frequency decreases.  Different OSes and different tools treat these effects differently when reporting utilization.

 

Summary

Processor utilization is not a single measurement that is consistent across all views of consumption (Hypervisor, HMC and Operating System).  It is normal for the utilization to differ across these views, as each has a different perspective on what is considered processor utilization.

Contacting the PowerVM Team

Have questions for the PowerVM team or want to learn more?  Follow our discussion group on LinkedIn IBM PowerVM or IBM Community Discussions
