
KVM in a PowerVM LPAR: A Power user guide Part I

By Vaibhav Jain

  

 

 Fig 1. The KVM in an LPAR stack

Target Audience

This document is aimed at system administrators and developers who run KVM guests in a PowerVM LPAR and want to peek under the hood to better understand the environment. I believe that understanding the runtime model of KVM in a PowerVM LPAR will empower users to make better decisions when planning capacity or provisioning workloads that run in KVM guests.

Furthermore, it is recommended to go over the KVM in a PowerVM LPAR Knowledge Center document [7] before reading this blog, if you haven't already done so.

Abstract

The ability to run KVM guests in an LPAR is a new feature in the PowerVM firmware FW1060.10 release [1], enabling users to run KVM guests as illustrated in Fig-1. KVM in a PowerVM LPAR brings the industry-standard Linux KVM virtualization stack to IBM Power and integrates easily with an existing Linux virtualization ecosystem, enabling many interesting use cases that were previously difficult to realize in a PowerVM LPAR [7].

The runtime architecture of these KVM guests differs from that of other virtualization mechanisms available for IBM Power systems, because the KVM guests run inside an LPAR on PowerVM. This document covers how to set up the Linux LPAR and KVM guests, how they work together with help from the underlying hypervisor (PowerVM), and how the various execution resources are virtualized.

Table of Contents

How to set up KVM in an LPAR

Enabling KVM for an HMC-Managed PowerVM LPAR

Enabling KVM for a Non-HMC-Managed PowerVM LPAR (MDC Mode)

Linux KVM Host setup

Linux KVM Guest Bring-up

Organization of KVM guests running in a PowerVM LPAR

The New v2 API Hcalls

Guest State Buffers

KVM Guest Life Cycle

How to set up KVM in an LPAR


KVM in a PowerVM LPAR is a new type of LPAR (logical partition) that allows KVM to run inside an LPAR on PowerVM. A KVM-enabled LPAR allows standard Linux KVM tools to create and manage lightweight Linux virtual machines (VMs). A KVM Linux LPAR uses dedicated cores, which gives Linux full control over when Linux VMs are scheduled to run, just like KVM on other platforms.

Pre-requisites

  • Hardware: IBM Power10 systems (S1014, S1022s, S1022, S1024, E1050 and E1080)
  • Firmware: IBM FW1060.10 and above
  • HMC Levels: V10 R3 SP1060 and above
  • LPAR:
    • Configuration: KVM Enabled LPAR (See Below for steps to enable KVM for an LPAR)
    • Operating Systems: One of the supported Linux distributions. Namely:
      • Ubuntu 24.04.1 or Later (Canonical Supported)
      • Fedora 40 or Later (Community Supported)
      • OpenSUSE Tumbleweed (Community Supported)
      • Any Linux distribution that is based on upstream kernel >=6.11 and Qemu version >=8.2 (Community/Distribution Supported)

Enabling KVM for an HMC-Managed PowerVM LPAR


Using HMC GUI

Note: The following operation is allowed only on a deactivated LPAR; also ensure that the LPAR console on the HMC is closed.

Please refer to Fig-2 below:

→ Select the Managed System

→ Select the LPAR

→ Click on “Advanced Settings”

→ Select the “KVM Capable” check box to enable KVM virtualization.

 

Fig 2. HMC Management page for enabling KVM in a PowerVM LPAR

Using HMC CLI

  1. Login to HMC via SSH
  2. To convert a regular LPAR to a KVM capable LPAR run the following command:

chsyscfg -r lpar -m <Managed System> -i  "name=<lparname>,kvm_capable=1"

  3. To convert an LPAR from KVM capable back to a regular LPAR:

chsyscfg -r lpar -m <Managed System> -i "name=<lparname>,kvm_capable=0"

  4. To check the LPAR mode:

lssyscfg -r lpar -F kvm_capable -m <Managed System> --filter "lpar_names=<lpar-name>"

Sample Output:

# lssyscfg -r lpar -m ltcden6 --filter "lpar_names=ltcden6-lp1" -F kvm_capable

1

Enabling KVM for a Non-HMC-Managed PowerVM LPAR (MDC Mode)


Note: KVM in MDC mode is only supported on eBMC-based systems: S1014, S1022s, S1022 and S1024.

Please refer to Fig-3 below:

  • Connect to the eBMC ASMI GUI at https://<eBMC IP>.
  • Navigate to Server power operations.
  • From the Default partition environment menu, select Linux KVM.
  • Click Save and reboot the system.

 

Fig 3. eBMC page for enabling KVM in a PowerVM LPAR

Linux KVM Host setup


Please refer to the IBM PowerVM documentation [2] for how to install various Linux distributions in a PowerVM LPAR that is managed by an HMC or on an unmanaged eBMC-based system. Once your LPAR is installed with a supported distribution, follow the relevant steps below:

Ubuntu 24.04 or later

Install KVM/Qemu libvirt

$ sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils -y

Install virt-install

$ sudo apt install -y virtinst

Install virt-customize, virt-sysprep and other tools

$ sudo apt install -y guestfs-tools

Start libvirtd

$ sudo systemctl start libvirtd

$ sudo systemctl enable libvirtd

Fedora 40 or later

Install KVM/Qemu libvirt

$ sudo dnf install -y qemu-kvm libvirt

Install virt-install

$ sudo dnf install -y virt-install

Install virt-customize, virt-sysprep and other tools

$ sudo dnf install -y guestfs-tools

Start libvirtd

$ sudo systemctl start libvirtd

$ sudo systemctl enable libvirtd

OpenSUSE Tumbleweed

Install KVM/Qemu libvirt

$ zypper install qemu-kvm libvirt

Install virt-install

$ zypper install virt-install

Install virt-customize, virt-sysprep and other tools

$ zypper install guestfs-tools

Start libvirtd

$ systemctl start libvirtd

$ systemctl enable libvirtd

Once the necessary packages are installed, verify that the libvirt daemon setup is working:

$ virsh list --all

Id Name State

--------------------

A non-error output indicates that the libvirt services are active and ready to spawn KVM guests.

Linux KVM Host – Default Network Bridge Setup

Before we create a KVM guest, we will set up networking for it. Here we will create and use the default NAT-based bridge network for our KVM guests, which is the simplest to set up.

Before proceeding with the Linux KVM guest setup, make sure the default network bridge is active.

$ virsh net-list

Name State Autostart Persistent

--------------------------------------------

default active no yes

If the default network is not active, use the command below to start it:

$ virsh net-start default

Network default started

In rare circumstances, if the default network is not defined at all, use the following steps to define it.

Create a default.xml file with the following contents:

<network>

<name>default</name>

<forward mode='nat'/>

<bridge name='virbr3' stp='on' delay='0'/>

<ip address='192.168.122.1' netmask='255.255.255.0'/>

</network>

Configure the default network bridge

$ virsh net-define default.xml

Start the default network bridge

$ virsh net-start default

Linux KVM Guest Bring-up


You can use one of the following ways to set up a Linux KVM guest.

Create a KVM Guest using a cloud (qcow2) image

This is one of the simplest ways to set up a KVM guest: it uses pre-built cloud (qcow2) guest images published by the Linux distributions. To use it, simply download the required guest qcow2 image for your distribution (for example, Fedora [3] or Ubuntu [4]).

Once downloaded, the guest qcow2 image needs to be configured/provisioned, which can be done before booting the guest via virt-customize, or while the guest boots for the first time via Ignition [5] or cloud-init [6]. The example below uses virt-customize to configure the root credentials of a Fedora 40 Server guest image downloaded from [3]. In case you want to use Ignition [5] or cloud-init [6], skip the step below:

# Prepare the cloud image and set the root password to ‘passw0rd’

$ virt-customize --add Fedora-Server-KVM-40.ppc64le.qcow2 --root-password password:passw0rd

Once configured, a KVM guest can be created as a libvirt domain with the needed resources. The example below creates a domain named ‘Fedora-40’ with 4 GiB RAM and 4 vCPUs:

$ virt-install --name Fedora-40 --memory 4096 --disk path=/home/test/Fedora-Server-KVM-40.ppc64le.qcow2 --vcpus 4 --os-variant generic --network bridge=virbr0 --graphics none --console pty,target_type=serial --import

Create a KVM Guest using install image (ISO)

Another method to set up a KVM guest is the traditional ISO-based installation. To use this method, first download a DVD ISO image for the IBM Power ppc64le architecture of the relevant distro. The examples below create a libvirt domain with 4 GiB memory and 4 vCPUs, installed on a virtual disk (40 GiB in the Ubuntu and Fedora examples, 8 GiB in the openSUSE example):

$ sudo virt-install --name guest --memory 4096 --vcpus 4 --os-variant ubuntu24.04 --network bridge=virbr0 --disk path=/home/guest.qcow2,size=40 --graphics none --cdrom /var/lib/libvirt/images/ubuntu-24.04-live-server-ppc64el.iso

  • For Fedora download installer image from https://fedoraproject.org/server/download

$ virt-install --name guest --memory 4096 --vcpus 4 --os-variant fedora40 --network bridge=virbr0 --disk path=/home/guest.qcow2,size=40 --graphics none --cdrom /home/Fedora-Server-dvd-ppc64le-40.iso

$ virt-install --name guest --memory 4096 --vcpus 4 --os-variant opensusetumbleweed --network bridge=virbr0 --disk path=/home/guest.qcow2,size=8 --graphics none --cdrom /home/openSUSE-Tumbleweed-DVD-ppc64le-Current.iso

Follow the installation screens to complete the Linux distro installation. Once it completes successfully, your Linux KVM guest is ready for use. You can verify the guest creation by listing the available domains via virsh:

$ virsh list

Id Name State

------------------------------

1 guest running

To connect to the guest console use the following virsh command:

$ virsh console <guest_name>

Connected to domain 'guest'

Linux KVM Guest – other useful commands

Start a Guest

$ virsh start <domain-name> --console

Stop a Guest

$ virsh shutdown <domain-name>

Delete a Guest

$ virsh undefine <domain name>

Organization of KVM guests running in a PowerVM LPAR


Now that we have set up a KVM guest running in a PowerVM LPAR, let's peek under the hood to understand how it works.

 

Fig 4. KVM Guest in a PowerVM LPAR Organization

The illustration above describes the various components and their runtime organization that enable a KVM guest to run in a PowerVM LPAR. The KVM guest (L2) runs at a privilege level similar to that of the LPAR (L1). The Power10 processor has at least three privilege modes, controlled by the Machine State Register (MSR) bits HV and PR. These modes are:

* Hypervisor Privileged: Indicated by MSR[HV] = 1, this is the highest privilege level at which code runs. This level is used by the PowerVM hypervisor code (L0).

* Partition Privileged: Indicated by MSR[HV] = 0 and MSR[PR] = 0. This mode is used by the LPAR OS kernel.

* Problem State: Indicated by MSR[HV] = 0 and MSR[PR] = 1. Unprivileged userspace processes run at this level.
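
To make these modes concrete, here is a small illustrative C snippet that classifies an MSR value into the three modes listed above. The bit positions (HV at bit 60, PR at bit 14) are the ones used by the Linux kernel's powerpc register definitions and are assumed here purely for illustration.

#include <stdio.h>
#include <stdint.h>

/* MSR bit positions, assumed from the Linux kernel's powerpc headers
 * (asm/reg.h): HV is bit 60, PR is bit 14. For illustration only. */
#define MSR_HV (1ULL << 60)
#define MSR_PR (1ULL << 14)

static const char *msr_mode(uint64_t msr)
{
    if ((msr & MSR_HV) && !(msr & MSR_PR))
        return "Hypervisor Privileged (PowerVM, L0)";
    if (msr & MSR_PR)
        return "Problem State (userspace)";
    return "Partition Privileged (L1 LPAR kernel or L2 guest kernel)";
}

int main(void)
{
    /* HV=0, PR=0: the mode the L1 kernel and the L2 guest kernel run in. */
    printf("%s\n", msr_mode(0));
    return 0;
}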

As illustrated in Fig-4, even though a KVM guest (L2) runs as a userspace process in the LPAR (L1) [8], hosted inside a QEMU process, its kernel runs at the partition privileged level. This virtualization model, named KVM-HV [9], forms the basis of running virtualized KVM guests in an LPAR. The L2 KVM guest is a PAPR [10] compliant pseries guest operating system which gets its runtime resources (CPU, memory, disks) from the L1 host, which acts as its hypervisor. The PowerVM hypervisor (L0) maintains the isolation between L2 and L1 even though they run at similar privilege levels.

In KVM-HV, L1 services the hcalls [11] made by the L2 kernel, which may result in further hcalls being made to PowerVM (L0). Invocation of these hcalls by L2 is done using privileged instructions and results in a trap to L1 containing information on which privileged operation was attempted, along with other details.

Users should be aware that invocation of an hcall by L2 is not the only possible trap to L1. Other privileged operations, such as handling page faults (HDSI) or the Hypervisor Decrementer Interrupt (HDEC), can also cause a trap to L1. To manage the life cycle of the L2 guest, the L1 LPAR uses certain new hcalls (documented later).

The PAPR extension that specifies how L2 privileged operations are handled via traps to L1, together with the new set of hcalls to manage L2 guests, is what is called the v2 API specification for KVM guests and is illustrated in Fig-1.

The New v2 API Hcalls

As mentioned in the previous section, the v2 API introduces new hcalls to manage the life cycle and state of an L2 KVM guest. These hcalls are made by L1 and serviced by the L0 PowerVM hypervisor. They broadly fall under four categories:

1. Querying and updating PowerVM hypervisor capabilities:

  • H_GUEST_GET_CAPABILITIES
  • H_GUEST_SET_CAPABILITIES

2. Creating/Deleting L2 Guests and its VCPUs

  • H_GUEST_CREATE
  • H_GUEST_CREATE_VCPU
  • H_GUEST_DELETE

3. Querying and Setting L2 VCPU state

  • H_GUEST_GET_STATE
  • H_GUEST_SET_STATE

4. Switch to L2-VCPU run context.

  • H_GUEST_RUN_VCPU

The names of these hcalls are largely self-explanatory; their finer semantics are described in detail in the Linux kernel documentation [12].

Guest State Buffers

The exchange of L2 vCPU state between L0 and L1 is done using a data structure called the ‘Guest State Buffer’ (GSB). It is the main method of communicating L2 state between L1 and L0 via the H_GUEST_{G,S}ET_STATE and H_GUEST_RUN_VCPU hcalls. State may be associated with the whole L2 (e.g. the timebase offset) or with a specific L2 vCPU (e.g. GPR state). The H_GUEST_RUN_VCPU hcall only lets you set thread-wide vCPU state; in contrast, the H_GUEST_{G,S}ET_STATE hcalls let you get/set both guest-wide and thread-wide state of the L2.

Below is the layout of the Guest State Buffer: a header indicating the number of elements, followed by the GSB elements themselves. Each GSB element is 4 or more bytes long, consisting of a 2-byte state element ID (e.g. GPR0) and a 2-byte size, followed by the value itself. The GSB IDs and their corresponding lengths are described in the kernel documentation [12].

GSB header:

 Offset (bytes)   Size (bytes)   Purpose
 --------------   ------------   ---------------------------
 0x0              0x4            Number of elements
 0x4              variable       Guest state buffer elements

GSB element:

 Offset (bytes)   Size (bytes)   Purpose
 --------------   ------------   ---------------------------
 0x0              0x2            ID
 0x2              0x2            Size of value
 0x4              As above       Value

Guest Buffer Example

Below is an example of a guest state buffer that is passed as an input to the H_GUEST_SET_STATE hcall to set the state of L2 vCPU GPR0 to all bits set and GPR1 to 0:

 Offset   Size (bytes)   Value                Comments
 ------   ------------   ------------------   -------------------------------------
 0x0      0x4            0x2                  The number of elements following this
 0x4      0x2            0x1000               GSB-ID for GPR0
 0x6      0x2            0x8                  Size of value in bytes
 0x8      0x8            0xFFFFFFFFFFFFFFFF   New value for GPR0
 0x10     0x2            0x1001               GSB-ID for GPR1
 0x12     0x2            0x8                  Size of value in bytes
 0x14     0x8            0x0                  New value for GPR1

When used to retrieve the L2 vCPU state, the value fields of the Guest State Buffer elements are overwritten by the L0 hypervisor servicing the hcall. So for H_GUEST_GET_STATE, the value fields of the GSB elements act as output variables.
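
To make the byte layout concrete, below is a minimal, self-contained C sketch that packs exactly the H_GUEST_SET_STATE example buffer shown above (GPR0 set to all ones, GPR1 set to zero). It assumes the GSB fields are big-endian, as is conventional for PAPR hcall data, and reuses the GSB IDs from the table (0x1000 for GPR0, 0x1001 for GPR1); it is only an illustration of the format, not code taken from the KVM-HV implementation.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <endian.h>   /* htobe16/htobe32/htobe64 (glibc) */

/*
 * Minimal sketch of the Guest State Buffer layout described above.
 * Header: 4-byte element count. Element: 2-byte ID, 2-byte value size,
 * followed by the value itself. All fields are assumed big-endian.
 */
#define GSB_ID_GPR0 0x1000   /* GSB-ID for GPR0 (from the example table) */
#define GSB_ID_GPR1 0x1001   /* GSB-ID for GPR1 (from the example table) */

/* Append one element carrying a 64-bit value; returns the new offset. */
static size_t gsb_put_u64(uint8_t *buf, size_t off, uint16_t id, uint64_t val)
{
    uint16_t be_id  = htobe16(id);
    uint16_t be_len = htobe16(sizeof(uint64_t));
    uint64_t be_val = htobe64(val);

    memcpy(buf + off, &be_id,  sizeof(be_id));  off += sizeof(be_id);
    memcpy(buf + off, &be_len, sizeof(be_len)); off += sizeof(be_len);
    memcpy(buf + off, &be_val, sizeof(be_val)); off += sizeof(be_val);
    return off;
}

int main(void)
{
    uint8_t buf[64] = { 0 };
    size_t off = 0;

    /* Header: number of elements that follow (2 in this example). */
    uint32_t be_count = htobe32(2);
    memcpy(buf + off, &be_count, sizeof(be_count));
    off += sizeof(be_count);

    /* Element 1: GPR0 with all bits set (value lands at offset 0x8). */
    off = gsb_put_u64(buf, off, GSB_ID_GPR0, 0xFFFFFFFFFFFFFFFFULL);
    /* Element 2: GPR1 set to zero (value lands at offset 0x14). */
    off = gsb_put_u64(buf, off, GSB_ID_GPR1, 0x0);

    /* Dump the buffer; the byte offsets match the example table above. */
    for (size_t i = 0; i < off; i++)
        printf("%02x%s", buf[i], (i + 1) % 8 ? " " : "\n");
    printf("\n");
    return 0;
}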

KVM Guest Life Cycle

 


Fig 5. KVM guest lifecycle

The life cycle of an L2 KVM guest running in an L1 LPAR starts with the H_GUEST_CREATE hcall to L0, which returns a unique L2 guest ID. This ID is then used to allocate the needed number of L2 guest vCPUs using the H_GUEST_CREATE_VCPU hcall. Once the L2 guest and its vCPUs are created, their state can be set via the H_GUEST_SET_STATE hcall. This hcall lets L1 set guest-wide state (e.g. the logical PVR) as well as thread-wide state (e.g. GPRs).

Switching to the L2 vCPU run context is done via the H_GUEST_RUN_VCPU hcall, as illustrated in Fig-5. The underlying PowerVM hypervisor then populates the L2 vCPU state (including the MMU context and MSR) and jumps to the instruction pointed to by the NIP register using a context-synchronizing mechanism such as HRFID [13].

Once the L2 vCPU is running, it can attempt a privileged operation requiring a trap to L1, as described earlier. When this happens, the L0 hypervisor receives an exception and switches to the L1 context, causing a return from the H_GUEST_RUN_VCPU hcall. The return value of the hcall indicates the trap/exit reason due to which the L2 vCPU stopped running. L1 can then query the new state of the L2 vCPU using the H_GUEST_GET_STATE hcall, which returns guest-wide or thread-wide state of the L2 vCPU.

Based on the trap reason and the current state of the L2 vCPU, L1 performs the actions necessary to handle the trap (e.g. performing MMIO emulation for a certain h/w register). It then updates the L2 vCPU state based on the results of handling the trap using the H_GUEST_SET_STATE hcall and restarts the L2 vCPU using the H_GUEST_RUN_VCPU hcall.
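
Putting it all together, the pseudo-C sketch below shows the overall shape of this loop from the L1 host's perspective. The do_h_guest_*() wrapper names are hypothetical stand-ins for the actual hcall plumbing in the Linux KVM-HV code (the L1 kernel issues these through the PAPR hcall interface); the sketch is only meant to illustrate the ordering of the v2 API hcalls across an L2 guest's life cycle.

#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical wrappers around the v2 API hcalls (H_GUEST_*). In the real
 * L1 kernel these are issued via the PAPR hcall interface; they are only
 * declared here to show the call ordering of the KVM guest life cycle.
 */
uint64_t do_h_guest_create(void);                          /* H_GUEST_CREATE      */
void do_h_guest_create_vcpu(uint64_t guest_id, int vcpu);  /* H_GUEST_CREATE_VCPU */
void do_h_guest_set_state(uint64_t guest_id, int vcpu,
                          const void *gsb);                /* H_GUEST_SET_STATE   */
void do_h_guest_get_state(uint64_t guest_id, int vcpu,
                          void *gsb);                      /* H_GUEST_GET_STATE   */
uint64_t do_h_guest_run_vcpu(uint64_t guest_id, int vcpu); /* H_GUEST_RUN_VCPU    */
void do_h_guest_delete(uint64_t guest_id);                 /* H_GUEST_DELETE      */

bool handle_trap(uint64_t trap_reason, void *gsb);         /* e.g. MMIO emulation */

void run_l2_guest(int nr_vcpus)
{
    uint8_t gsb[4096] = { 0 };             /* Guest State Buffer scratch space */

    /* 1. Create the L2 guest and its vCPUs, then seed their initial state. */
    uint64_t guest_id = do_h_guest_create();
    for (int v = 0; v < nr_vcpus; v++) {
        do_h_guest_create_vcpu(guest_id, v);
        do_h_guest_set_state(guest_id, v, gsb);   /* GPRs, logical PVR, ... */
    }

    /* 2. Run loop for vCPU 0 (each vCPU runs a loop of this shape). */
    for (;;) {
        /* Switch to the L2 vCPU context; returns when a trap/exit occurs. */
        uint64_t trap_reason = do_h_guest_run_vcpu(guest_id, 0);

        /* Fetch the vCPU state needed to handle the trap ... */
        do_h_guest_get_state(guest_id, 0, gsb);

        /* ... handle it (MMIO emulation, hcall passthrough, HDEC, ...) ... */
        if (!handle_trap(trap_reason, gsb))
            break;                          /* guest shut down or fatal error */

        /* ... and push the updated state back before re-entering the vCPU. */
        do_h_guest_set_state(guest_id, 0, gsb);
    }

    /* 3. Tear the guest down once it is no longer needed. */
    do_h_guest_delete(guest_id);
}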

Summary

This document covered the basics of KVM guests in a PowerVM LPAR, including how to configure and set up a new LPAR that is capable of running KVM guests. We also covered how to quickly bootstrap a KVM guest using various mechanisms, including pre-built distro qcow2 images. Finally, we covered how these KVM guests run under the hood using the newly introduced PAPR API v2 and how the LPAR maintains and exchanges KVM guest state with the hypervisor.

There is much more to this exciting technology, and going forward we will cover more details around various use cases and functionality, such as setting up advanced networking/storage devices for KVM guests or performing various resource hotplug operations.

Stay Tuned...

Acknowledgments

Thanks to Meghana Prakash, who manages the LTC KVM team and has been the primary sponsor and impetus behind this blog series. Special thanks to the co-authors of this article, including but not limited to:

  • Gautam Meghani
  • Amit Machhiwal
  • Hariharan T S

Thanks to the following for spending their time reviewing this series and coming up with great insights and review comments.

  • Dave Stanton
  • Vaidyanathan Srinivasan
  • Ritesh Harjani

References/Footnotes:

[1] https://www.ibm.com/docs/en/announcements/extends-hardware-capabilities-ddr5-memory-other-enhancements-selected-power10-technology-based-servers?region=US#topic_bzt_zgq_2bc__section_mxr_yws_rbc

[2] https://www.ibm.com/docs/en/linux-on-systems?topic=systems-installing-linux-powervm-lpar-by-using-hmc

[3] https://download.fedoraproject.org/pub/fedora-secondary/releases/40/Server/ppc64le/images/

[4] https://cloud-images.ubuntu.com

[5] https://coreos.github.io/ignition/

[6] https://cloud-init.io/

[7] KVM in PowerVM LPAR IBM Knowledge Center Documentation: https://ibmdocs-test.dcs.ibm.com/docs/en/sflos?topic=linuxonibm_test/liabx/kvm_in_powervm_lpar.htm

[8] Assuming Qemu/KVM is used to host the KVM Guest in L1.

[9] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kvm/book3s_hv.c

[10] https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference

[11] With the exception of H_RPT_INVALIDATE which is handled by PowerVM

[12] https://www.kernel.org/doc/html/latest/arch/powerpc/kvm-nested.html

[13] Power ISA 3.1C: https://files.openpower.foundation/s/9izgC5Rogi5Ywmm Section 4.3
