Introduction
Optimal logical partition (LPAR) placement can be important for workload performance because it promotes efficient use of the memory and CPU resources on the system. However, depending on the configuration and settings, such as the I/O devices allocated to a partition, the amount of memory allocated, and the partition's CPU entitlement, the hypervisor might not produce the desired LPAR placement. In such situations, the technique described in this blog enables you to place the LPARs in a desired optimal configuration.
This technique can be used before or after the installation of a new on-premises Red Hat OpenShift Container Platform cluster. The LPARs in this example scenario are used as the OpenShift Container Platform worker nodes, and the main goal is to place these worker nodes on the desired socket and NUMA boundaries.
Prerequisites
To get started, the following prerequisites are required:
- Hardware Management Console (HMC) GUI access to create LPAR profiles.
- HMC command line access to modify LPAR memory allocation.
- Baseboard management controller (BMC)/IBM Power Hypervisor (PHYP) command-line access to determine the LPAR placements.
- Validation of all the prerequisites before the user-provisioned infrastructure-based OpenShift Container Platform cluster installation
For this OpenShift Container Platform cluster installation, multiple physical IBM Power systems are used. To illustrate the technique, consider an OpenShift Container Platform configuration where three LPARs act as OpenShift Container Platform master nodes on three different Power systems and four LPARs act as OpenShift Container Platform worker nodes on a single IBM Power10 processor-based system. We are going to focus only on the worker nodes. In our example, the Power system used for the worker nodes has a total of 40 physical cores across two sockets with four chips (10 cores per chip) and a total of 1 TB of memory. So, each worker node will use 10 dedicated physical cores and 180 GB of memory. To get more details about the Power10 scale-out system design, refer to Announcing IBM Power10 Scale-Out and Midrange Servers.
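For reference, the target layout on this system is one worker node per chip, which accounts for all 40 cores (4 x 10) and 720 GB (4 x 180 GB) of the 1 TB of installed memory:
Socket 1, chip 1: worker1 (10 dedicated cores, 180 GB)
Socket 1, chip 2: worker2 (10 dedicated cores, 180 GB)
Socket 2, chip 3: worker3 (10 dedicated cores, 180 GB)
Socket 2, chip 4: worker4 (10 dedicated cores, 180 GB)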
Placing LPARs on the required socket in IBM Power system
With the above prerequisites in place, the steps outlined below will place the LPARs (OpenShift Container Platform worker nodes) on the desired sockets and chips of the Power system:
- Log in to the HMC console and select the Power10 processor-based system used for the four worker nodes.
- Create four LPAR profiles as OpenShift Container Platform worker nodes, where each LPAR is assigned 10 dedicated physical cores and 180 GB of memory, and then save the profiles.
- Create four LPAR profiles as placeholder LPARs, where each LPAR is assigned 10 dedicated physical cores and 64 GB of memory, and then save the profiles. (A command-line sketch for creating both profile types is shown below.)
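The profiles can also be created from the HMC command line with the mksyscfg command. The following is a minimal sketch for one worker and one placeholder LPAR; the profile name (default_profile), the lpar_env value, and the identical minimum/maximum memory values are illustrative assumptions, and the I/O and SR-IOV virtual function assignments used in this scenario are omitted (replace x1 with the Power system name):
~> mksyscfg -r lpar -m x1 -i "name=worker1, profile_name=default_profile, lpar_env=aixlinux, proc_mode=ded, sharing_mode=keep_idle_procs, min_procs=10, desired_procs=10, max_procs=10, min_mem=184320, desired_mem=184320, max_mem=184320"
~> mksyscfg -r lpar -m x1 -i "name=placeholder1, profile_name=default_profile, lpar_env=aixlinux, proc_mode=ded, sharing_mode=keep_idle_procs, min_procs=10, desired_procs=10, max_procs=10, min_mem=65536, desired_mem=65536, max_mem=65536"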
- Log in to the PHYP command line and run the command shown below to confirm the current LPAR placement (replace xx.xx.xx with the BMC IP address):
phyp # hvlpconfigdata -affinity -domain
The above PHYP command invokes a built-in macro to display the current LPAR placement and system details. There should not be any LPAR placement because the LPARs have not been activated yet.
- Log in to the HMC command-line interface and run the following command to see the allocated memory of the LPARs (replace x1 with the Power system name):
~> lshwres -r mem -m x1 --level lpar -F lpar_name,lpar_id,curr_mem
The above HMC command will display no memory allocation at this point (see below) because the LPARs are not activated yet. However, after the LPARs are activated (in the following steps), the PHYP will determine their placement and the output will show the amount of memory per LPAR. The output format is as follows: the first field is the name of the LPAR, followed by the LPAR ID, and the last field is the amount of allocated memory in MB.
placeholder1,5,0
placeholder2,6,0
placeholder3,7,0
placeholder4,8,0
worker4,4,0
worker3,3,0
worker2,2,0
worker1,1,0
- Log in to the HMC GUI interface and activate all the placeholder LPARs. Note that because no OS is installed on these LPARs, the expected behavior is that they will be in the Open Firmware state on the GUI interface.
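The placeholder LPARs can also be activated from the HMC command line instead of the GUI. A minimal sketch, assuming the illustrative profile name default_profile from the earlier sketch (replace x1 with the Power system name); the -b of option boots the LPAR to Open Firmware, matching the state described above:
~> chsysstate -r lpar -m x1 -o on -n placeholder1 -f default_profile -b of
Repeat the command for placeholder2, placeholder3, and placeholder4, and then verify the state of each LPAR:
~> lssyscfg -r lpar -m x1 -F name,state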
- From the PHYP command line, run the same command as in step 4. This time, the output should display the placement of the four placeholder LPARs on the sockets and chips. Because each placeholder LPAR has 10 cores and 64 GB of memory, and the system has two sockets with four chips of 10 cores each, the PHYP is expected to place one placeholder LPAR on each of the four chips.
- From the HMC command line, run the same command as in step 5 to see the output for the allocated memory per LPAR.
- Using the information from steps 7 and 8 on the LPAR placement and allocated memory of the four placeholder LPARs, modify the memory allocation of the placeholder LPARs in a specific order before activating the four OpenShift Container Platform worker node LPARs, so that the placeholder LPARs are replaced by the worker node LPARs. In our example, we want to place the OpenShift Container Platform worker nodes based on their network connections, which use virtual functions (VFs) of SR-IOV capable adapters in slots C11 and C4. The first two worker nodes (worker1 and worker2) are configured to use slot C11, which is associated with the first socket, and the last two worker nodes (worker3 and worker4) are configured to use slot C4, which is associated with the second socket.
- From the HMC GUI interface, switch off the placeholder1 LPAR on the first chip of the first socket. Switch to the HMC command line and run the following command to clear its memory allocation. Replace x1 with the system name, x2 with the LPAR ID, and x3 with the allocated memory (see the results captured earlier in step 5):
~> chhwres -r mem -m x1 -o r --id x2 -q x3
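If the command line is preferred over the GUI for the shutdown in this step, a hedged equivalent of the GUI switch-off is:
~> chsysstate -r lpar -m x1 -o shutdown --immed -n placeholder1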
Note that the other three placeholder LPARs should still be running and should not be powered off. From the PHYP command line, run the same command as shown in step 4 to confirm that the placeholder1 LPAR is no longer placed on the first chip of the first socket. This is because its memory assignment has just been cleared.
- Given that the placeholder1 LPAR has just been removed from the first chip of the first socket, and the other three placeholder LPARs are still active and placed across the other chips, we are now ready to activate the first OpenShift Container Platform worker node (worker1). The PHYP should place this worker node on the first chip of the first socket. After activating worker1, from the PHYP command line, run the same command as shown in step 4 to confirm that the worker1 LPAR is now placed on the first chip of the first socket.
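The worker1 activation can likewise be done from the HMC command line; a minimal sketch, again assuming the illustrative profile name default_profile:
~> chsysstate -r lpar -m x1 -o on -n worker1 -f default_profile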
- We now have three placeholder LPARs (placeholder2, placeholder3, and placeholder4) placed across three chips and the OpenShift Container Platform worker node (worker1) on the first chip of the first socket. Note that based on the network adapter slot locations, we want worker2 placed on the second chip of the first socket, because both worker1 and worker2 use slot C11, which is associated with the first socket. To place worker2 on the second chip of the first socket, repeat the same steps as before, but this time switch off the placeholder2 LPAR, clear its memory allocation, and then activate worker2. For the worker3 and worker4 placements, use the placeholder3 and placeholder4 LPARs in the same way.
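For reference, the repeat sequence for worker2 looks as follows from the HMC command line (a hedged sketch; per the earlier output, placeholder2 has LPAR ID 6 and holds 64 GB, that is 65536 MB, and default_profile remains an illustrative profile name). Confirm the placement from the PHYP command line after each activation before moving on:
~> chsysstate -r lpar -m x1 -o shutdown --immed -n placeholder2
~> chhwres -r mem -m x1 -o r --id 6 -q 65536
~> chsysstate -r lpar -m x1 -o on -n worker2 -f default_profile
The same three commands, substituting placeholder3 (LPAR ID 7) with worker3 and placeholder4 (LPAR ID 8) with worker4, complete the remaining two placements.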
With all four OpenShift Container Platform worker node LPARs placed as described above, the PHYP command shown in step 4 should show their placement in the following order:
- worker1 on chip1 of socket1
- worker2 on chip2 of socket1
- worker3 on chip3 of socket2
- worker4 on chip4 of socket2
Summary
Given the optimal placement described above, this technique can be used before or after the OpenShift Container Platform installation. The Power hypervisor also has a unique feature, Dynamic Platform Optimizer (DPO), which can help minimize NUMA effects when LPARs are created, modified, or replaced. To learn more about optimal LPAR placement, refer to the IBM Power Virtualization Best Practices guide.
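For completeness, DPO can also be driven from the HMC command line. The following is a hedged sketch (the exact options can vary by HMC level): the first command reports the current affinity score of the system and the second starts an affinity optimization.
~> lsmemopt -m x1 -o currscore
~> optmem -m x1 -o start -t affinity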