KVM in a PowerVM LPAR: A Power user guide Part III

By Amit Machhiwal posted Mon November 25, 2024 07:13 AM

  
Figure 1. KVM in a PowerVM I/O virtualization stack
 

In the previous installments of this series, we explored the architecture and foundational setup for running KVM Guests within a PowerVM LPAR environment. We examined resource allocation basics and initial performance tuning strategies. 

In part 3, we delve into the unique runtime architecture of KVM guests on IBM Power. This part highlights how resources, including I/O, are assigned to KVM guests from the LPAR resource pool, initially configured by the PowerVM Hypervisor. You’ll find detailed guidance on network configuration and storage, alongside performance benchmarks and tunable options to maximize workload efficiency.

As we progress through the intricacies of KVM in PowerVM, this blog provides practical insights into optimizing resource allocation and improving overall performance. Let’s continue to unlock the full potential of KVM in your PowerVM environment!

Target audience

This document is aimed at system administrators and developers who run KVM guests in a PowerVM LPAR and want to explore the various networking and storage configuration options available to a KVM guest running in an LPAR. We believe that understanding these options will empower users to make better decisions when performing capacity planning or provisioning workloads that run on KVM guests.

Abstract

The ability to run KVM guests in an LPAR is a new feature in the PowerVM firmware FW1060.10 release [1]. This feature brings support for the industry-standard Linux KVM virtualization stack to IBM Power and integrates seamlessly with the existing Linux virtualization ecosystem.

The runtime architecture of these KVM guests differs from other virtualization mechanisms available for IBM Power. The allocation of resources [3] to a KVM guest (L2) is done from the pool of resources that are assigned to the LPAR (L1) by the PowerVM Hypervisor (L0). This includes the I/O resources available to the LPAR, which can be assigned to KVM guests as either dedicated or shared resources. This blog describes how the various resources are assigned to KVM guests and how they can be optimized to maximize the performance of the workloads running within them.

This part covers the following topics:

  • Configuration options: Networking
  • Configuration options: Storage
  • Performance benchmarks and tunable options

Network configuration for KVM guests

KVM guests support multiple networking modes, each offering a different level of connectivity. Some modes enable direct communication between the guest and the host while restricting access to external systems, whereas others allow guests to connect directly to external systems using a public IP address. The following networking modes are available for KVM guests:

  • User mode networking
  • Bridge mode networking
  • MacVtap mode networking

User mode networking

User mode networking (virtio-net) is the default networking configuration for a KVM guest, allowing network access through network address translation (NAT).

Following are the key features and use cases:

  1. Libvirt default: Uses Libvirt API’s default network, requiring no additional configuration.
  2. Simple internet access: Easiest way to access the internet or local network resources.
  3. Private IP allocation: The guest is assigned an IP address in the 192.168.122.0/24 range (see the lookup example after this list).
  4. Restricted access: The guest is not directly accessible from external networks by default. However, it can be made accessible by setting up port forwarding with the iptables utility through a libvirt-qemu hook [4] in the file /etc/libvirt/hooks/qemu. This configuration enables external access (for example, SSH) to the guest. The same logic can be adapted to forward additional services, such as HTTP (port 80) and HTTPS (port 443), by mapping the necessary host ports to the corresponding guest ports.

    The following sample setup demonstrates port forwarding for SSH:

    #!/bin/bash

    # IMPORTANT: Change the "VM NAME" string to match your actual VM name.
    # To create rules for other VMs, duplicate the block below and
    # configure it accordingly.

    if [ "${1}" = "VM NAME" ]; then

        # Update the following variables to fit your setup.
        # This forwards the SSH connection through host port 10022.
        GUEST_IP=192.168.122.216
        GUEST_PORT=22
        HOST_PORT=10022

        if [ "${2}" = "stopped" ] || [ "${2}" = "reconnect" ]; then
            /sbin/iptables -D FORWARD -o virbr0 -p tcp -d $GUEST_IP --dport $GUEST_PORT -j ACCEPT
            /sbin/iptables -t nat -D PREROUTING -p tcp --dport $HOST_PORT -j DNAT --to $GUEST_IP:$GUEST_PORT
        fi
        if [ "${2}" = "start" ] || [ "${2}" = "reconnect" ]; then
            /sbin/iptables -I FORWARD -o virbr0 -p tcp -d $GUEST_IP --dport $GUEST_PORT -j ACCEPT
            /sbin/iptables -t nat -I PREROUTING -p tcp --dport $HOST_PORT -j DNAT --to $GUEST_IP:$GUEST_PORT
        fi
    fi
    
  5. Limitations: May lack certain networking features, such as Internet Control Message Protocol (ICMP) support, and has lower performance compared to other modes.
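
The NAT-assigned guest IP address can be looked up from the host. The following is a minimal sketch using standard virsh commands; the guest name is a placeholder:

virsh net-dhcp-leases default      # list DHCP leases on libvirt's default network
virsh domifaddr <guest_name>       # show the addresses of a specific guest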

Configuration XML

The following XML snippet can be used to configure virtio-net in guest XML:

<interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
</interface>

Alternatively, an equivalent configuration can reference the default bridge (virbr0) directly:

<interface type='bridge'>
    <source bridge='virbr0'/>
    <model type='virtio'/>
</interface>

The following figure illustrates the user mode networking configuration.

Figure 2. User mode networking

Bridge networking

Bridge networking allows virtual interfaces (virtio) on the guest to connect directly to external networks through a physical interface, making the guest appear as a standalone host on the network. To configure bridge networking on a guest, a bridge must be set up on the host as a prerequisite.

Prerequisites

  • Ensure that the host's physical network interface is not managed by NetworkManager. Delete the interface's connection profile if necessary.
  • Disable Spanning Tree Protocol (STP) on the bridge (see the check after this list). If STP is enabled, it sends bridge protocol data units (BPDUs) over the connected switch port, which can cause the switch to disable the port.
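
A minimal sketch for checking and disabling STP with the ip utility, assuming the bridge is named br0 (a placeholder); bridges created with ip link have STP disabled by default:

ip -d link show br0 | grep -o 'stp_state [01]'   # stp_state 0 means STP is disabled
ip link set br0 type bridge stp_state 0          # explicitly disable STP if needed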

Configuration steps

  1. Create and configure the bridge using the following commands:
    ip link add <bridge_name> type bridge
    ip link set <phy_iface> master <bridge_name>
    ip link set <phy_iface> up
    ip link set <bridge_name> up
    
  2. Use the bridge created in the previous step in the guest's XML:
    <interface type='bridge'> 
        <source bridge='<bridge_name>'/> 
        <model type='virtio'/> 
    </interface> 
    

Note: Bridging to VETH network interfaces is unsupported due to their fixed, burned-in source MAC address. Any Ethernet packet with an incorrect source MAC address will be dropped, resulting in an error on the host OS.

The following figure illustrates the bridge mode networking configuration.

Figure 3. Bridge networking

MacVtap mode networking

MacVtap provides an alternative to bridging, allowing KVM guests to communicate with external hosts using the Linux MacVtap driver. This approach replaces the combination of the tun/tap and bridge drivers with a unified macvlan driver module.

The following figure illustrates the MacVtap mode networking:

Figure 4. MacVtap networking

Key features:

  • A MacVtap endpoint is a character device that follows the tun/tap ioctl interface, enabling direct use with a KVM guest.
  • Typically, this mode allows both the guest and host to appear directly on the network switch to which the host is connected.
  • Unlike bridge networking, MacVtap connects directly to the network interface on the KVM host, reducing the path length and improving performance.

MacVtap modes:

  • VEPA
  • Bridge (commonly used)
  • Private mode

Configuration steps:

  1. Create a MacVtap network configuration XML:
    <network> 
        <name>macvtapnet</name> 
        <forward dev='enP32769p1s0' mode='bridge'> 
               <interface dev='enP32769p1s0'/> 
        </forward> 
    </network> 
    
  2. Use the network XML created in the previous step to define the MacVtap network:
    virsh net-define macvtapnet.xml 
  3. Add the network configuration to the guest's XML and start the guest:
    <interface type='network'>
        <source network='macvtapnet'/>
        <model type='virtio'/>
    </interface>
    
  4. After the guest boots, configure a public IP address and other IP parameters on the guest's network interface to bring networking up (a consolidated host-side example follows this list).
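
The following is a minimal sketch of the host-side workflow, assuming the network XML shown above is saved as macvtapnet.xml:

virsh net-define macvtapnet.xml     # define the MacVtap network
virsh net-start macvtapnet          # activate the network
virsh net-autostart macvtapnet      # optional: start it automatically on host boot
virsh net-list --all                # 'macvtapnet' should be listed as active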

Recommended configuration for networking

The following recommendations can help users choose the most suitable networking configuration based on their requirements:

Simplicity of configuration

If ease of use is a priority and minimal network configuration is desired, user mode networking with the default network is the best option. The network interface is automatically configured at guest boot through the default network's DHCP server, eliminating the need for manual IP configuration.

More control over IP address space

If users require greater control over the guest network and prefer explicit IP configuration, bridge networking or MacVtap networking should be considered. These modes do not rely on the default network's DHCP service, allowing users to manage IP settings according to their preferences.

Performance

While user mode networking offers simplicity, it does not provide the best performance. For optimal network performance, MacVtap with vhost is the recommended configuration. This setup offers higher efficiency than the other modes discussed here, excluding VFIO passthrough, which is outside the scope of this discussion.

Networking benchmark

The following are the networking modes used for the performance tests:

  1. Virtio-net
  2. Bridge networking
  3. MacVtap with vhost
    <interface type='network'>
        <source network='macvtapnet'/>
        <model type='virtio'/>
        <driver name='vhost'/>
    </interface>
    

Network performance testing tools and tunable options

The following tools were used for network performance tests:

  1. iPerf3
    1. Start the iperf3 server using the iperf3 -s command.
    2. The number of parallel connections from the client can be adjusted to achieve the maximum throughput.

      Sample commands:

      iperf3 -B ${CLIENTIP} -c ${SERVERIP}
      # Run the traffic in the reverse direction
      iperf3 -B ${CLIENTIP} -c ${SERVERIP} -R
      # Run the traffic with 2 and 4 parallel connections
      iperf3 -B ${CLIENTIP} -c ${SERVERIP} -P 2
      iperf3 -B ${CLIENTIP} -c ${SERVERIP} -P 4
      
  2. Nginx

    For setting up nginx, an SSL certificate is required. For more information on generating and configuring a self-signed SSL certificate, refer to the HowTo: Create a Self-Signed SSL Certificate on Nginx For CentOS / RHEL documentation.

    The resource limits of the nginx service can be tweaked through systemd resource-control settings, for example:

    AllowedCPUs=0-31 
    CPUWeight=100 
    CPUQuota=90% 
    IOWeight=20 
    MemorySwapMax=0 
    MemoryMax=1566572544 

    siege is used as the client, and the tests can be conducted with varying file sizes such as 1 KB, 10 KB, 100 KB, 1 MB, 10 MB, 100 MB, and more (see the client-side examples after this list).

    The number of concurrent connections can be varied with values such as 32, 64, 128, 255, and more.

  3. Netperf (TCP_RR)

    Install netperf and start the server (netserver).

    For the tests, the scripts in the /usr/share/doc/netperf/examples/ directory can be adjusted as needed.
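
The following client-side invocations are a minimal, illustrative sketch; the server address, file name, concurrency, and duration are placeholders rather than values from the tests above:

# siege against the nginx server: 128 concurrent users for 60 seconds
siege -c 128 -t 60S https://<server_ip>/file_1k.bin

# netperf TCP_RR request/response test against a running netserver
netperf -H <server_ip> -t TCP_RR -l 60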

Storage configuration for KVM guests

Storage devices such as hard disks, SSDs, USB drives, and network-attached storage are used to persistently store data. These storage technologies differ in performance, reliability, and cost, but all follow the same block device model: a block of data is the smallest logical unit that Linux uses to communicate with block devices. QEMU, the emulator commonly used for KVM guests, is capable of emulating hard disks, SSDs, USB drives, and other storage devices within a KVM guest.

Storage virtualization options in QEMU

QEMU provides emulated storage controllers, such as virtio-blk, virtio-scsi, SATA, and others, to enable virtual machines to access block devices. These controllers expose storage interfaces such as virtio-blk, NVMe, or SCSI, and the guests communicate with them through registers to transfer data between memory buffers and block devices for read and write operations.

Figure 5. Virtio architecture

In QEMU guests, storage devices are typically based on virtio (Virtual I/O), an open standard for I/O virtualization. The two most commonly used emulated storage controllers are virtio-blk and virtio-scsi.

  1. Virtio-blk storage controller

    The virtio-blk device provides the guest with a virtual block device. It is the older of the two controllers and is preferred for high-performance use cases.

    To use virtio-blk in a QEMU guest, the following configuration can be added to the virsh guest XML configuration:

    <devices>
         <emulator>/usr/bin/qemu-system-ppc64</emulator>
            <disk type='file' device='disk'>
                 <driver name='qemu' type='qcow2'/>
                 <source file='/home/user/images/fedora.qcow2'/>
                 <blockio logical_block_size='512' physical_block_size='4096'/>
                 <target dev='vda' bus='virtio'/>
            </disk>
    </devices>
    
  2. Virtio-SCSI

    The virtio-scsi device emulates a SCSI host bus adapter for the guest. It offers more features and flexibility compared to virtio-blk and can scale to support hundreds of devices. It uses standard SCSI command sets, making it easier to add new features.

    To use virtio-scsi in a QEMU guest, the following configuration can be added to the virsh guest XML configuration:

    <disk type='file' device='disk'>
        <driver name='qemu' type='raw'/>
        <source file='/mnt1/disk.img'/>
        <target dev='sdb' bus='scsi'/>
        <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    
    <controller type='scsi' index='0' model='virtio-scsi'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    

Both virtio-blk and virtio-scsi offer a wide range of tunable options, such as the number of I/O threads, physical and logical block sizes, transient disks, and so on (a brief I/O-thread sketch follows). Refer to the libvirt documentation for the complete list of available tunable options.
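
As an illustrative sketch only (the guest name is a placeholder), virsh can add and inspect I/O threads on a running guest; binding a particular disk to an I/O thread additionally requires the iothread attribute on that disk's <driver> element in the guest XML:

virsh iothreadadd <guest_name> 1   # add an I/O thread with ID 1 to the guest
virsh iothreadinfo <guest_name>    # list the guest's I/O threads and their CPU affinity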

Recommended configuration for improved storage performance

Vhost

  1. QEMU storage daemon

    The QEMU storage daemon (qemu-storage-daemon) allows disk-image functionality to be exposed independently of a running QEMU process. Users communicate with the daemon through QEMU monitor protocol commands over a UNIX socket, with data being transferred through the vhost-user-blk protocol. This protocol connects KVM guests to software-defined storage solutions such as SPDK or the qemu-storage-daemon itself. The key advantage of vhost-user-blk is its use of shared memory, which enables the daemon to read from and write to the guest's RAM and significantly boosts performance.

  2. Performance benefits

    Using vhost-user-blk reduces syscall overhead in the data path by allowing direct communication with the block layer, which accelerates storage access.

  3. Configuration example

    The following XML configuration snippet sets up a KVM domain with vhost-user-blk for improved storage performance. For more information on tunable options refer to the libvirt documentation.

    <domain type='kvm'>
    .
    .
      <memoryBacking>
        <nosharepages/>
        <source type='memfd'/>
        <access mode='shared'/>
        <allocation mode='immediate'/>
        <discard/>
      </memoryBacking>

      <devices>
        <disk type='vhostuser' device='disk' snapshot='no'>
          <driver name='qemu' type='raw'/>
          <source type='unix' path='/home/user/vhost-user-blk.sock'>
            <reconnect enabled='yes' timeout='10'/>
          </source>
          <target dev='vdf' bus='virtio'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
        </disk>
      </devices>
    
  4. Disk setup with Vhost protocol

    To use a disk with the Vhost protocol, the disk must be made available to the guest over a UNIX socket. Use the following command to perform the disk setup:

    qemu-storage-daemon \
        --blockdev file,filename=/mnt/disk.img,node-name=file0,read-only=off \
        --export vhost-user-blk,node-name=file0,id=test,addr.type=unix,addr.path=vhost-user-blk.sock,writable=on,logical-block-size=4096,num-queues=16
  5. Performance testing results

    In internal performance tests using the pgbench database benchmark, a performance improvement of over 15% was observed when using a disk with the Vhost protocol compared to virtio-blk.

Performance benchmarks and tunable options

Benchmark results

When running CI/CD workloads, such as the kernel compile benchmark (kernbench), the performance overhead observed in KVM guests is less than 5%. This makes KVM on LPAR suitable for workloads like KubeVirt, Kata containers, and others that rely on high-performance virtualization.

Tunable options

There are some tunable options available in Linux that can help improve performance. They are described as follows:

  1. Irqbalance service

    Irqbalance is a service that should be enabled to distribute interrupts across multiple processors, preventing any single CPU from being overwhelmed with a large number of interrupts. Irqbalance can be started and enabled using the following commands:

    $ systemctl start irqbalance 
    $ systemctl enable irqbalance 
    
  2. XICS interrupt mode

    For better performance in KVM on LPAR guests, use the emulated XICS interrupt mode instead of the default XIVE interrupt mode. To enable XICS, add the following parameter to the kernel command line inside the KVM guest (see the example after this list for one way to persist the setting):

    xive=off
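
As a minimal sketch, on distributions that ship the grubby utility, the parameter can be appended persistently to all installed kernels inside the guest; a reboot is required for it to take effect:

grubby --update-kernel=ALL --args="xive=off"   # append xive=off to every installed kernel
grubby --info=DEFAULT | grep args              # verify the argument before rebooting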

Summary

Earlier entries in this blog series, [2] and [3], discussed the fundamentals of setting up a KVM guest in an LPAR, including the internal mechanics and the allocation of CPU and memory resources. Building on that foundation, this blog explored the networking and storage configuration options available to a KVM guest in a PowerVM environment.

Acknowledgments

Thanks to Meghana Prakash, who manages the LTC - KVM team and has been the primary sponsor and impetus behind this blog series. Special thanks to the co-authors of this blog, including but not limited to:

  1. Gautam Menghani
  2. Vaibhav Jain

References/Footnotes:

[1] https://www.ibm.com/docs/en/announcements/extends-hardware-capabilities-ddr5-memory-other-

[2] https://community.ibm.com/community/user/power/blogs/vaibhav-jain/2024/10/18/kvm-in-a-powervm-lpar-a-power-user-guide-part-i

[3] https://community.ibm.com/community/user/power/blogs/vaibhav-jain/2024/10/29/kvm-in-a-powervm-lpar-poweruser-guide-part-ii

[4] https://wiki.libvirt.org/Networking.html#forwarding-incoming-connections
