
How does Affinity affect Server Performance

By Pete Heyrman posted Mon June 08, 2020 09:48 AM

  


The processor and memory resources that are assigned to a partition can have a significant effect on the overall application performance on a system.  PowerVM selects the best possible resources for a partition within constraints that you define for each partition.  The following information should help you to make informed choices when defining and optimizing the performance of your Power server. 

Basic Partition Placement Information

When you define a partition, the partition is defined with a desired number of processors and desired amount of memory.  The PowerVM hypervisor tries to find resources to assign to that partition that provide the best possible performance.  When choosing the resources, the hypervisor tries to contain the resources to the same physical hardware boundaries (contain within a processor chip, contain within a module or contain within a CEC drawer).

For dedicated processor partitions there is a direct mapping of the virtual processor to the physical core assigned to the partition.  For a shared processor partition, the entitled capacity is used by the hypervisor to determine the expected consumption of processor resources.  The number of virtual processors is ignored with respect to placement for shared processor partitions.

To decide on how to optimize memory placement, the hypervisor calculates the total amount of memory that needs to be allocated by adding together the following values:

  1. Desired memory specified by the user when defining the partition
  2. Memory required for the hardware page table (HPT)
  3. Memory required for other partition related objects.

 

The HPT is a table that is used by the operating system to translate from effective addresses to physical real addresses in the hardware.  The amount of memory for the HPT is based on the maximum memory size of the partition.  The default HPT ratio is either 1/64th of the maximum (for IBM i partitions) or 1/128th (for AIX, VIOS and Linux partitions) of the maximum memory size of the partition.  AIX, VIOS and Linux make heavy use of larger page sizes (16K, 64K and such) instead of 4K pages.  Larger pages reduce the number of pages that need to be tracked, so the overall size of the HPT can be reduced.

AIX Example

For an AIX partition that was configured with 5.5 processor units, a desired size of 128GB and a maximum memory size of 256GB, the hypervisor, when creating the partition, would look for a single processor chip with 5.5 processor units available and a little more than 130GB (128GB + 1/128 of 256GB + space for some related partition objects).  If there is a processor chip with this amount of free resources, the hypervisor would allocate the partition on that processor chip.  If there is no single chip with the required amount of free resources, the hypervisor would try to spread the partition across multiple processor chips in the same module or CEC drawer.  Failing this placement, the hypervisor would end up spreading the partition over multiple modules/drawers.
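The arithmetic in this example can be sketched in a few lines of Python.  This is only an illustration of the sum described above, not how the hypervisor is implemented.  The function name and the `other_objects_gb` parameter are made up for this sketch, and the size of the "other partition related objects" is not given in this blog, so it defaults to 0.

```python
from fractions import Fraction

# Default HPT ratios quoted in this blog: 1/64 of maximum memory for IBM i,
# 1/128 of maximum memory for AIX, VIOS and Linux.
DEFAULT_HPT_RATIO = {
    "ibmi": Fraction(1, 64),
    "aix": Fraction(1, 128),
    "vios": Fraction(1, 128),
    "linux": Fraction(1, 128),
}


def placement_memory_gb(desired_gb, maximum_gb, os_type, other_objects_gb=0):
    """Estimate the memory (in GB) considered when placing a partition.

    other_objects_gb is a placeholder for the partition related objects;
    its real size is not stated in the blog, so it is an assumption here.
    """
    hpt_gb = maximum_gb * DEFAULT_HPT_RATIO[os_type]
    return desired_gb + hpt_gb + other_objects_gb


# The AIX example: 128GB desired, 256GB maximum.
aix_total = placement_memory_gb(128, 256, "aix")
print(f"AIX: {float(aix_total)} GB plus partition related objects")

# The same sizes on IBM i need a larger HPT (1/64 instead of 1/128).
ibmi_total = placement_memory_gb(128, 256, "ibmi")
print(f"IBM i: {float(ibmi_total)} GB plus partition related objects")
```

Because the HPT is sized from the maximum memory rather than the desired memory, setting the maximum much higher than you expect to need increases the amount of memory the hypervisor has to place.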

How to properly set the processor entitlement

For a dedicated processor partition, the number of processor cores assigned is defined by the value specified as the desired number of processors.  The dedicated processor partition is bound to these cores and uses these cores for dispatching of the applications.  For a shared processor partition, the specified desired entitlement defines how many cores are reserved for the partition from a partition placement point of view.  The entitlement should be set to a value that accurately represents the amount of processing resources that is normally being consumed when the workload is running.

For example, during normal business times, if the workload is consuming 5.5 cores of processing resources, you want to ensure the entitlement is set to at least 5.5.  In most cases it is fine if the utilization spikes over the normal usage as there may be sufficient unused resources available within the chip or chips in the same module/drawer to handle the spike.  It is not a good practice to undersize the entitlement as this can cause affinity issues that result in degraded performance, as illustrated in the following examples.

Assume a Power server with 8 cores per processor chip and two total processor chips in the system.  Also assume you have 4 partitions (represented by different colors in the figure below), each with an average processor demand during business hours of 3.0 processor units.  If these are configured with an entitlement of 3.0 the hypervisor would place the partitions as follows:

[Figure: Resource assignment with correct entitlement]

From the layout, each of the colored partitions is placed entirely within a processor chip.  Also note that because the entitlement was 3.0, the hypervisor could only place two partitions per chip.  This left some free cores available to absorb some spikes in the demand for processing resources.  Having some cores free within the chip improves affinity because spikes in demand can be handled with local resources.  For example, if the red partition was using 3 virtual processors, which is in line with the 3.0 entitlement that was configured, the green partition could dispatch 5 virtual processors without having to dispatch a virtual processor on another processor chip.  Because of this behavior, some customers oversize the entitlement for critical partitions to ensure there is additional capacity to handle spikes in demand and ensure good affinity.

If the partition entitlement is undersized instead of properly sized, for example with 2.0 processors even though 3.0 is the normal demand, the placement of these same four partitions could be as follows:

[Figure: Resource assignment with minimum entitlement]

Because the capacity was under-configured, the hypervisor may have packed all four partitions into a single processor chip.  In this situation, if there really is demand for 3 virtual processors for each of the red, green, blue and purple partitions, the chip will be oversubscribed (i.e. there are 12 virtual processors to run on 8 processor cores simultaneously).  Since the chip is oversubscribed, some virtual processors are dispatched on other chips, which reduces the affinity to the memory and results in a loss in performance.  Not only are the four original partitions affected but other partitions on other chips can also be affected.  These off-chip dispatches could force other chips to be busier than expected, which could affect partitions with correctly sized entitlement.  This example illustrates why it is important to correctly size the entitlement of shared processor partitions.
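The two layouts can be reproduced with a small simulation.  The sketch below uses a simple first-fit packing by entitlement, which is only a stand-in chosen for illustration; the real hypervisor placement algorithm is more sophisticated and is not described in this blog.  The function names are made up for this example.

```python
def first_fit(entitlements, chips, cores_per_chip):
    """Place each partition on the first chip with enough unreserved cores.

    Returns one chip index per partition (None if no single chip fits).
    This is a simplified model, not the PowerVM placement algorithm.
    """
    reserved = [0.0] * chips
    placement = []
    for ent in entitlements:
        for chip in range(chips):
            if reserved[chip] + ent <= cores_per_chip:
                reserved[chip] += ent
                placement.append(chip)
                break
        else:
            placement.append(None)
    return placement


def demand_per_chip(placement, demands, chips):
    """Sum the real processor demand of the partitions homed on each chip."""
    totals = [0.0] * chips
    for chip, demand in zip(placement, demands):
        if chip is not None:
            totals[chip] += demand
    return totals


CHIPS, CORES = 2, 8
demand = [3.0] * 4  # four partitions, each really needing about 3.0 cores

for entitlement in (3.0, 2.0):
    placement = first_fit([entitlement] * 4, CHIPS, CORES)
    load = demand_per_chip(placement, demand, CHIPS)
    # Demand beyond the cores on the home chip must be dispatched elsewhere.
    spill = [max(0.0, d - CORES) for d in load]
    print(f"entitlement={entitlement} placement={placement} "
          f"demand={load} off_chip={spill}")
```

With an entitlement of 3.0 the model puts two partitions on each chip and no demand spills off chip.  With an entitlement of 2.0 all four partitions land on the first chip, the demand there is 12 cores against 8 available, and 4 cores worth of work has to run away from its memory.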

Unlicensed cores, unlicensed memory and affinity

Individual processor cores are either licensed or unlicensed.  The hypervisor will only dispatch virtual processors to licensed cores; unlicensed cores are put into a state that conserves energy.  When doing partition placement, the hypervisor, in many situations, is able to swap which cores are licensed and which cores are unlicensed to optimize the placement.


For example, if you have 2 processor chips, each chip has 4 cores licensed and 4 cores unlicensed and you create a 6 core partition, the hypervisor can switch the licensing of the cores.  This change would result in 6 licensed on one chip and only 2 licensed on the other chip so the 6 core partition can be contained in a single processor chip for better affinity.
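As a rough illustration of this swap, the sketch below moves licenses between chips so that one chip has enough licensed cores for the partition, while the total number of licensed cores stays the same.  The function and its selection rule (prefer the chip that already has the most licensed cores) are assumptions made for this example and do not describe the actual hypervisor logic.

```python
def rebalance_licenses(licensed_per_chip, cores_per_chip, partition_cores):
    """Shift licenses so a single chip can hold the whole partition.

    Returns a new list of licensed-core counts per chip, or None when the
    partition cannot fit on one chip. The total licensed count is unchanged.
    """
    total = sum(licensed_per_chip)
    if partition_cores > cores_per_chip or partition_cores > total:
        return None

    # Assumption for this sketch: use the chip with the most licensed cores.
    target = max(range(len(licensed_per_chip)),
                 key=licensed_per_chip.__getitem__)
    result = list(licensed_per_chip)
    needed = partition_cores - result[target]

    for chip in range(len(result)):
        if needed <= 0:
            break
        if chip == target:
            continue
        moved = min(needed, result[chip])
        result[chip] -= moved      # this chip gives up licenses
        result[target] += moved    # the target chip gains them
        needed -= moved
    return result


# Two chips with 4 licensed and 4 unlicensed cores each, 6 core partition.
print(rebalance_licenses([4, 4], 8, 6))
```

For the case in the text, the result is 6 licensed cores on one chip and 2 on the other, so the 6 core partition fits on a single chip.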

The more cores that are licensed, the greater the probability that there is extra capacity available in the chip, module or drawer to handle spikes in CPU demand locally with better performance.  These extra cores can be permanent, elastic or utility cores.  Utility cores may be a good option as you can license extra cores but only pay for what is actually utilized by the applications.  Several customers have licensed all the dark cores using utility Capacity-On-Demand (COD) and have seen overall server-wide CPU utilization decrease due to the improvements in affinity.  If you want to control the actual amount of utility processing that is consumed, you can set caps on shared processor pools to limit the amount of CPU consumed by a group of partitions.

Unlicensed memory is handled as an upper bound on the amount of memory that can be assigned to partitions (i.e. all installed memory can be used for partition placement and there is no specific licensing/unlicensing on a DIMM basis).  The more memory that is installed in a server, the more likely that the hypervisor will be able to optimize the placement of the partitions.  When ordering a new system, to achieve the best possible affinity, it is a good practice to ensure there is memory installed behind each processor chip.  If you have processor chips without memory and the hypervisor is forced to use these processor chips for dispatching, the applications running on these processors will see only remote memory, which can cause performance issues.

Displaying placement information

There are a couple of different methods to determine how optimized the processor and memory assignments are for a partition.  One method is to use the HMC command line interface to issue the lsmemopt command.  The following is an example of the HMC output:

[Figure: HMC lsmemopt example output]

 

The placement is scored on a scale of 0-100 with 100 being optimal placement.  The placement of lpar3 has a score of only 50, which would mean that around half of the processors or memory are spread across chips, modules or drawers but could be contained within a chip, module or drawer.  Note that the per partition affinity scoring is available for Power7 servers in firmware level FW780 and on all follow-on Power firmware levels.
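If you collect the scores regularly, a short script can flag partitions that may be worth re-optimizing.  The sketch below assumes the scores were gathered with something like `lsmemopt -m <managed system> -r lpar -o currscore` and that each output line is a comma separated list of `name=value` pairs containing `lpar_name` and `curr_lpar_score`; check the HMC command reference for the exact options and field names on your HMC level.  The sample text is made up for illustration (only the lpar3 score of 50 comes from the example above), and the threshold of 90 is an arbitrary choice.

```python
# Hypothetical sample in the assumed name=value format. Only the lpar3
# score of 50 is taken from the blog text; the other lines are invented.
SAMPLE_OUTPUT = """\
lpar_name=lpar1,lpar_id=1,curr_lpar_score=100
lpar_name=lpar2,lpar_id=2,curr_lpar_score=100
lpar_name=lpar3,lpar_id=3,curr_lpar_score=50
"""


def parse_scores(text):
    """Return {partition name: score} for lines that carry a numeric score."""
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = dict(item.split("=", 1) for item in line.split(","))
        score = fields.get("curr_lpar_score", "")
        # Skip partitions that report no numeric score.
        if score.isdigit():
            scores[fields["lpar_name"]] = int(score)
    return scores


def needs_attention(scores, threshold=90):
    """List partitions scoring below the (arbitrary) threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


scores = parse_scores(SAMPLE_OUTPUT)
print(needs_attention(scores))
```

With the sample above, only lpar3 is reported.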

For an AIX LPAR, the lssrad command will report the virtualized resources assigned to the partition.  The following is an example of the lssrad output:

[Figure: lssrad example output]

 

The REF1 entries represent physical drawers or modules in the server.  Within the REF1 domains are the individual processor chips with memory.  In this example, logical drawer 0 contains chips 0 and 1.  The logical CPUs represent the SMT hardware threads that are available to the partition for dispatching processes.

In this example, the partition is configured to run in SMT4 mode so virtual processors 0, 1 and 2 are assigned to logical chip 0.  In this case, there are a total of 64 logical CPUs and since this is SMT4 this implies 16 desired virtual processors.  Note, if this is a shared processor partition, what is displayed by the operating system is where the hypervisor will prefer to dispatch the virtual processors.

As a result of processor over-commitment, the hypervisor may dispatch a shared virtual processor on any eligible physical core in the server on any given dispatch.  In the output, you can see that each processor has memory assigned so there is local memory that can be used for processes running on each hardware thread. 
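The same reasoning (count the logical CPUs, divide by the SMT mode, and check that every chip has memory) can be scripted.  The sketch below assumes the usual `lssrad -av` layout with REF1, SRAD, MEM and CPU columns.  The sample text is hypothetical, chosen only to match the counts mentioned above (64 logical CPUs, SMT4, virtual processors 0 to 2 on chip 0); it is not the output from the original figure.

```python
# Hypothetical lssrad -av style output; memory values are invented.
SAMPLE_LSSRAD = """\
REF1   SRAD        MEM      CPU
0
          0   31806.31      0-11
          1   31553.75      12-63
"""


def parse_lssrad(text):
    """Return a list of dicts, one per SRAD (chip), from lssrad -av text."""
    srads = []
    ref1 = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "REF1":
            continue  # blank line or header
        if len(tokens) == 1:
            ref1 = int(tokens[0])  # a REF1 (drawer or module) line
            continue
        cpus = 0
        for cpu_range in tokens[2:]:  # e.g. "0-11" or a single CPU "5"
            first, _, last = cpu_range.partition("-")
            cpus += int(last or first) - int(first) + 1
        srads.append({"ref1": ref1, "srad": int(tokens[0]),
                      "mem_mb": float(tokens[1]), "cpus": cpus})
    return srads


SMT_MODE = 4  # SMT4, as in the example in the text

srads = parse_lssrad(SAMPLE_LSSRAD)
logical_cpus = sum(s["cpus"] for s in srads)
virtual_processors = logical_cpus // SMT_MODE
without_memory = [s["srad"] for s in srads if s["cpus"] and s["mem_mb"] == 0]

print(f"{logical_cpus} logical CPUs -> {virtual_processors} virtual processors")
print(f"SRADs with CPUs but no local memory: {without_memory}")
```

An empty list on the last line means every chip that has logical CPUs also has local memory, which is the situation described above.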

For Linux partitions, the numactl --hardware command will show similar information to lssrad.

Dynamic LPAR and Dynamic Platform Optimizer (DPO)

If you are making dynamic changes or rebooting partitions with changes to entitlement and memory, the assignment of processors and memory may not be optimal.

For example, if the partition was originally created with 8 cores, the hypervisor may have optimized the placement such that all the cores and memory were assigned from a single processor chip.  Due to an increase in CPU requirements, a change may have been made to the partition to add 2 processor cores.  The addition of these cores may have created a situation where there are 8 cores and memory on one chip and 2 cores without any memory on another chip.  This can create a situation where there is no local memory for some of the cores assigned to the partition.

Applications, in general, perform better when there is some amount of local memory.  To re-optimize the placement of partitions, PowerVM provides a feature called the Dynamic Platform Optimizer which can dynamically re-assign the memory and processors assigned to active partitions.  The partitions continue to run applications while this optimization is in progress.  The Dynamic Platform Optimizer is initiated from the HMC by issuing the optmem command.  More information about the Dynamic Platform Optimizer can be found in the IBM PowerVM Virtualization Managing and Monitoring Redbook http://www.redbooks.ibm.com/redbooks/pdfs/sg247590.pdf .

Summary

Power servers by default have excellent performance out of the box.  Understanding and utilizing the concepts in this blog will help ensure you are utilizing the Power server as optimally as possible.

Contacting the PowerVM Team

Have questions for the PowerVM team or want to learn more?  Follow our discussion group on LinkedIn (IBM PowerVM) or in IBM Community Discussions.

