Authors: Anjana A R K, Savitri Hunasheekatti, Saripalli Lavanya.
Secure Execution Image Creation and Troubleshooting for Linux on IBM Z and LinuxONE
This blog is a guide to building Secure Execution for Linux (SEL) images for the s390x architecture. It focuses on configuring Secure Execution environments, managing encryption, building a secure custom RHEL image, and bringing up a secure workload.
Secure Execution
LinuxONE systems have a security technique called Secure Execution for Linux (SEL) that makes it possible to create isolated and encrypted confidential virtual machines (VMs) to guarantee the confidentiality and integrity of data. SEL ensures that VMs running on LinuxONE cannot be accessed or modified by unauthorised parties, including system administrators, KVM hypervisors, or other VMs. Data in use or in process is protected and inaccessible to anybody other than the task owner when a virtual machine is installed using a SEL image.
How Do We Build a Secure Image?
To build a SEL image on the LinuxONE architecture, create root, boot, and data partitions; use Linux Unified Key Setup (LUKS) to encrypt the root partition; copy the root filesystem and configure the necessary files; add the necessary drivers and modules; update the kernel file and initial RAM disk; create a Secure Execution boot image; and prepare the boot partition. This ensures that the system is prepared to boot securely.
Creating a boot image for Secure Execution involves:
- Partitioning the disk.
- Encrypting the root partition.
- Copying the root file system.
- Configuring boot files (zipl, fstab, crypttab).
- Generating the Secure Execution boot image.
Partitioning the disk into Root, Boot, and Data Partitions:
Data is arranged into many partitions on Linux systems. The most important ones are:
Root partition (/):
The complete file system hierarchy, including important binaries, libraries, configuration files, and user data, is contained here. Sensitive system files are protected when the root partition is secured or encrypted.
The /boot partition:
The boot partition holds the files needed to initialise the system at startup: the bootloader (such as GRUB or zipl) and the kernel images. For the system to access the boot files, this partition must be left unmounted and unencrypted at boot.
Data partition (/data):
In cloud systems, where data must expand independently of system files, a data partition is needed. Isolating data in a separate partition makes performance tuning, management, and backup strategies more efficient.
Encrypting the root partition:
Encrypting the root partition with LUKS2 ensures that the system data is confidential and unaltered. Even if someone manages to get access to the hardware, the right decryption key is needed to read the data. This is essential for Secure Execution environments, where data integrity and security are crucial.
What is Cryptsetup?
Cryptsetup is a tool for creating and managing encrypted partitions on Linux using the LUKS (Linux Unified Key Setup) standard. It serves as a bridge between users and the kernel-level encryption framework known as dm-crypt. Cryptsetup helps with:
- Initialising encrypted disk partitions
- Opening or unlocking encrypted devices using a passphrase or key
- Managing encryption keys and settings
In a Secure Execution for Linux (SEL) context, cryptsetup ensures that the system’s root filesystem is only accessible with a valid decryption key, protecting sensitive data even if the infrastructure or disk is compromised.
LUKS Encryption
The Linux standard for disk encryption is called LUKS (Linux Unified Key Setup). It offers a standardised method for managing disk partitions and encrypting data, ensuring the privacy of sensitive information.
Why should the root partition be encrypted?
When the root partition is encrypted, all the data on it is safeguarded, and without the correct decryption key the machine won't start up or reveal data. This guarantees that the data in an SE environment is safe and not readable by unauthorised users, even in shared environments.
How to encrypt?
A 100G virtual disk is created using the qemu-img tool. Inside a virtual machine, such disks typically appear as virtual block devices such as /dev/vda or /dev/vdb.
qemu-img create -f qcow2 "se-${IMAGE_FILE}" 100G
Assuming workdir=$(pwd), disksize=100G, tmp_nbd="/dev/$device", dst_mnt=$workdir/dst_mnt, and src_mnt=$workdir/src_mnt, the command below filters the output of lsblk using `jq` to match the target size (100G) and to ensure the device is neither mounted nor partitioned. This ensures that the correct uninitialised device is selected for partitioning and encryption.
device=$(sudo lsblk --json | jq -r --arg disksize "$disksize" '.blockdevices[] | select(.size == $disksize and .children == null and .mountpoint == null) | .name')
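Because every later step writes to the selected device, it can be worth stopping early when the filter matches zero or several devices. The following is a minimal sketch of such a guard; the helper name pick_single_device and the sample device names are hypothetical and not part of the original script.

```shell
# Hypothetical helper: succeed only when exactly one candidate name was found.
# Input: newline-separated device names, as printed by the jq filter.
pick_single_device() {
    candidates=$1
    # Count non-empty lines; grep -c exits non-zero on zero matches, hence "|| true".
    count=$(printf '%s\n' "$candidates" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly 1 matching device, found $count" >&2
        return 1
    fi
    printf '%s\n' "$candidates"
}

# Examples with made-up names; in the real flow the argument would be "$device".
pick_single_device "nbd0"            # prints nbd0
pick_single_device "nbd0
nbd1" || echo "refusing to continue"
```

In the real flow, a call such as device=$(pick_single_device "$device") || exit 1 could follow the lsblk line.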
The disk is partitioned using parted to create boot, root, and data partitions:
sudo parted -a optimal ${tmp_nbd} mklabel gpt \
mkpart boot-se ext4 1MiB 256MiB \
mkpart root 256MiB 6400MiB \
mkpart data 6400MiB ${disksize} \
set 1 boot on
After partitioning, the boot partition is formatted with the ext4 file system to hold the necessary boot files:
sudo mke2fs -t ext4 -L boot-se ${tmp_nbd}1
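The commands in this guide append the partition number directly to ${tmp_nbd}, which matches vda-style names. Linux inserts a "p" before the partition number when the disk name itself ends in a digit (for example nbd0 or loop0), so a small helper can make the path construction explicit. This is a sketch only; the function name part_path is hypothetical.

```shell
# Hypothetical helper: build the path of partition NUM on a given disk.
part_path() {
    disk=$1
    num=$2
    case "$disk" in
        *[0-9]) printf '%sp%s\n' "$disk" "$num" ;;  # e.g. /dev/nbd0 -> /dev/nbd0p1
        *)      printf '%s%s\n' "$disk" "$num" ;;   # e.g. /dev/vda  -> /dev/vda1
    esac
}

part_path /dev/vda 1    # prints /dev/vda1
part_path /dev/nbd0 2   # prints /dev/nbd0p2
```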
A random encryption key is generated and stored temporarily in rootkey.bin. The file is placed on a tmpfs (a temporary file system held in memory), which is used to store the encryption key securely during the setup process. This key will be used for encrypting the partition.
sudo mkdir ${workdir}/rootkeys
sudo mount -t tmpfs rootkeys ${workdir}/rootkeys
sudo dd if=/dev/random of=${workdir}/rootkeys/rootkey.bin bs=1 count=64 &> /dev/null
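Before the key is used, its size and permissions can be checked. The sketch below reproduces the key generation without root privileges: a private temporary directory stands in for the tmpfs mount, and /dev/urandom is swapped in for /dev/random. Both substitutions, and the variable names, are assumptions made for illustration.

```shell
# Sketch: create a 64-byte key file that only its owner can read.
# A private temporary directory stands in for the tmpfs mount.
keydir=$(mktemp -d)
keyfile="$keydir/rootkey.bin"

(
    umask 077                                              # new files get mode 0600
    dd if=/dev/urandom of="$keyfile" bs=1 count=64 2>/dev/null
)

keysize=$(wc -c < "$keyfile" | tr -d ' ')
keymode=$(stat -c %a "$keyfile")                           # GNU stat, as found on Linux
echo "key size: $keysize bytes, mode: $keymode"

# Clean up the sample directory.
rm -rf "$keydir"
```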
The root partition is encrypted using LUKS2, ensuring that data cannot be accessed without the key.
echo YES | sudo cryptsetup luksFormat --type luks2 ${tmp_nbd}2 --key-file ${workdir}/rootkeys/rootkey.bin
The UUID (unique identifier) of the encrypted partition is used to create a LUKS name to refer to the encrypted device. Unlock the encrypted partition and make it available at /dev/mapper/$LUKS_NAME. Any operation on this partition (like copying data) can now be performed because it is decrypted.
LUKS_NAME="luks-$(sudo blkid -s UUID -o value ${tmp_nbd}2)"
sudo cryptsetup open ${tmp_nbd}2 $LUKS_NAME --key-file ${workdir}/rootkeys/rootkey.bin
Copying the Root Filesystem:
This process is needed for creating or restoring system images and ensuring that the encrypted root partition contains everything needed for a functional system.
mkfs.ext4: Creates an ext4 filesystem on the encrypted partition. ext4 is a journaling file system that is commonly used in Linux distributions. The tool is used for setting up new drives, repurposing old drives, or preparing a partition for data storage.
tar: Copies all files from the source filesystem while preserving metadata.
mount/umount: Mounts and unmounts filesystems to allow seamless copying.
By copying the root filesystem to the encrypted partition, we ensure the system can boot securely while protecting the data with encryption. We set up an encrypted root filesystem, mount it, copy the contents of the source root filesystem to the encrypted target, and then unmount the source. This process is essential for creating or restoring a system image while ensuring data integrity and security with encryption.
How to copy?
Create a filesystem on the encrypted root partition, then create a destination mount point where the encrypted root partition will be mounted and a source mount point for the current root filesystem (mounted in read-only mode using --bind).
Why --bind?: A bind mount exposes the running root filesystem at a second path, so an exact copy can be made, including symbolic links and mount-point directories.
The norecovery option ensures that the boot partition is mounted without journal recovery to avoid delays (useful when working with fresh boot partitions).
sudo mkfs.ext4 -L "root" /dev/mapper/${LUKS_NAME}
sudo mkdir -p ${dst_mnt}
sudo mkdir -p ${src_mnt}
sudo mount /dev/mapper/$LUKS_NAME ${dst_mnt}
sudo mkdir ${dst_mnt}/boot-se
sudo mount -o norecovery ${tmp_nbd}1 ${dst_mnt}/boot-se
sudo mount --bind -o ro / ${src_mnt}
tar is a tool to archive and transfer files while preserving all attributes such as ownership, permissions, and security labels. The --one-file-system option ensures that only the current filesystem is copied (avoiding other mounted filesystems like /proc or /sys).
Copying Process:
- Create a tar archive of the source filesystem.
- Extract the tar archive directly into the mounted destination partition, preserving the original file order and metadata.
- After the filesystem copy is complete, the source mount is unmounted to free up resources.
tar_opts=(--numeric-owner --preserve-permissions --acl --selinux --xattrs --xattrs-include='*' --sparse --one-file-system)
sudo tar -cf - "${tar_opts[@]}" --sort=none -C ${src_mnt} . | sudo tar -xf - "${tar_opts[@]}" --preserve-order -C "$dst_mnt"
sudo umount ${src_mnt}
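The tar pipe can be tried safely on a small directory tree before it is pointed at a real root filesystem. The sketch below runs without root and uses a reduced option set (the ACL, SELinux, and extended-attribute options are left out); GNU tar is assumed, and the file names are made up for illustration.

```shell
# Sketch: copy a small directory tree through a tar pipe, without root.
src=$(mktemp -d)
dst=$(mktemp -d)

# Build a tiny source tree: a file with a restrictive mode and a symlink.
echo "secret" > "$src/secret.txt"
chmod 640 "$src/secret.txt"
ln -s secret.txt "$src/link"

# Same pipe shape as the full copy, with a reduced option set (GNU tar assumed).
tar -cf - --numeric-owner --preserve-permissions --one-file-system -C "$src" . |
    tar -xf - --numeric-owner --preserve-permissions -C "$dst"

# Inspect the copy: the mode of secret.txt and the symlink should be unchanged.
ls -l "$dst"
```

Comparing the listing of the destination with the source is a quick way to confirm that modes and links survive the transfer.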
Create a Secure Execution Boot Image:
Mounting essential filesystems like sysfs, proc, and /dev inside the chroot environment is needed because the environment where the root filesystem is being prepared for a Secure Execution for Linux (SEL) image is isolated from the actual host system.
- sysfs is a virtual filesystem that exports information about various kernel subsystems, devices, and modules. It’s mounted at /sys.
- proc is a virtual filesystem mounted at /proc, and it provides information about processes, system resources, and kernel settings. For example, when creating the initial ramdisk (initramfs), the system might need access to /proc/cpuinfo to gather hardware information.
- The /dev directory contains special files representing devices on the system (like block devices, terminals, etc.). These are essential for interacting with the hardware. When working with disk partitions and encryption (e.g., using cryptsetup or blkid), access to device files (like /dev/sda, /dev/mapper/…) is necessary. Mounting /dev ensures that the chroot environment has access to these device nodes, so the script can interact with the disk, partitions, and encryption keys properly.
- The SEL image is prepared in a chroot environment, meaning the filesystem at dst_mnt is treated as the new root (/). However, a clean root filesystem does not contain any of the special virtual filesystems that provide information about the system’s hardware or processes. By mounting these virtual filesystems, the isolated environment is provided with the necessary system information, making it possible for tools (like cryptsetup, dracut, etc.) to function correctly.
- mount --bind /dev: Performs a bind mount, effectively making the existing /dev directory available within the destination filesystem.
sudo mount -t sysfs sysfs ${dst_mnt}/sys
sudo mount -t proc proc ${dst_mnt}/proc
sudo mount --bind /dev ${dst_mnt}/dev
Configuring Boot Files (zipl, fstab, crypttab):
When dealing with encrypted partitions, special configurations and files must be added to ensure the system can unlock encrypted volumes before the root filesystem becomes available. This preparation ensures smooth and automated decryption during boot.
- The necessary drivers and modules are included in the initramfs to handle encrypted partitions.
- Keyfiles are available at boot time, ensuring that LUKS-encrypted partitions can be unlocked automatically.
- fstab and crypttab files ensure that the correct filesystems are mounted after decryption.
This is essential for enabling encrypted systems to boot securely and seamlessly, without manual intervention.
What is fstab?
The fstab (File System Table) is a critical configuration file in Unix-like operating systems (like Linux). Located at /etc/fstab, it tells how disk partitions, devices, and filesystems should be automatically mounted when the system boots. It also ensures that the system knows where to mount specific filesystems and with what mount options, even without user intervention. When creating or restoring a Secure Execution for Linux image, the system must know:
- How to mount the root filesystem (including encrypted partitions).
- Where to mount other partitions, such as /boot-se or data partitions.
- Which options to use for mounting (e.g., read-only, defaults, etc.).
Without proper entries in fstab, the system may fail to boot or miss mounting critical filesystems.
Example:
cat <<END > ${dst_mnt}/etc/fstab
/dev/mapper/$LUKS_NAME / ext4 defaults 1 1
PARTUUID=$boot_uuid /boot-se ext4 norecovery 1 2
END
Here, LUKS_NAME is the LUKS name of the encrypted root partition, and boot_uuid is expected to hold the PARTUUID of the boot partition.
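A malformed fstab line is an easy way to end up with an unbootable image, so the generated file can be sanity-checked before moving on. The sketch below writes the same two entries to a scratch file and counts lines that do not have the six fields used in this guide. The UUID values are made-up samples, and the variable names fstab_file and bad_lines are hypothetical.

```shell
# Sketch: write the two fstab entries to a scratch file and check their shape.
# LUKS_NAME and boot_uuid hold made-up sample values here.
LUKS_NAME="luks-00000000-0000-0000-0000-000000000000"
boot_uuid="11111111-1111-1111-1111-111111111111"
fstab_file=$(mktemp)

cat > "$fstab_file" <<END
/dev/mapper/$LUKS_NAME / ext4 defaults 1 1
PARTUUID=$boot_uuid /boot-se ext4 norecovery 1 2
END

# The entries in this guide use all six whitespace-separated fields.
bad_lines=$(awk '!/^#/ && NF > 0 && NF != 6' "$fstab_file" | wc -l | tr -d ' ')
echo "malformed fstab lines: $bad_lines"    # prints malformed fstab lines: 0
```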
Adding the LUKS key file
A LUKS key file contains the key or passphrase used to unlock a LUKS-encrypted partition. Instead of requiring a user to enter a password manually every time the system boots or the partition is accessed, the key file allows automated decryption. This is mainly useful for automated environments, headless systems (like servers), or SE (Secure Execution) images, where user interaction is limited or undesirable. Here we use the UUID to uniquely identify the LUKS-encrypted device. This ensures the system knows which key file corresponds to which encrypted partition, especially if multiple encrypted partitions exist. The key file must be available inside the encrypted filesystem so that it can be read during boot to unlock the partition.
The following commands create and store the key file securely:
dev_uuid=$(sudo blkid -s UUID -o value "/dev/mapper/$LUKS_NAME")
sudo cp "${workdir}/rootkeys/rootkey.bin" "${dst_mnt}/etc/keys/luks-${dev_uuid}.key"
Crypttab
/etc/crypttab is a configuration file that describes the encrypted partitions to be unlocked automatically at boot. It defines:
- Which partition to unlock,
- How to unlock it (using key files), and
- Which options to apply.
This ensures that encrypted partitions like the root filesystem or data volumes are accessible immediately after the system starts, without requiring manual intervention. Once the partition is decrypted, it can be mounted based on the rules in /etc/fstab.
Each line in the crypttab file follows this format:
<name> <device> <keyfile> <options>
- name: The name by which the encrypted partition will be referenced (same as $LUKS_NAME in this context).
- device: The encrypted partition to unlock, specified by UUID for reliability.
- keyfile: Path to the key file used to decrypt the partition.
- luks: Indicates the partition uses LUKS encryption.
- discard: Enables discard/TRIM support, which helps with SSD performance.
- initramfs: Ensures the crypt device is set up during the initramfs stage of boot.
cat <<END > ${dst_mnt}/etc/crypttab
$LUKS_NAME UUID=$(sudo blkid -s UUID -o value ${tmp_nbd}2) /etc/keys/luks-$(sudo blkid -s UUID -o value /dev/mapper/$LUKS_NAME).key luks,discard,initramfs
END
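The key file path written to crypttab has to match the file that was copied into /etc/keys, otherwise the volume cannot be unlocked at boot. A check along the following lines can catch a mismatch while the image is still mounted. It is a sketch under stated assumptions: the helper name check_crypttab_keys is hypothetical, and the fixture names are made up.

```shell
# Hypothetical helper: confirm every key file named in <root>/etc/crypttab
# exists below <root>. Returns non-zero if any key file is missing.
check_crypttab_keys() {
    root=$1
    status=0
    while read -r name device keyfile options; do
        case "$name" in ''|'#'*) continue ;; esac        # skip blanks and comments
        case "$keyfile" in ''|none|-) continue ;; esac   # no key file configured
        if [ ! -f "$root$keyfile" ]; then
            echo "missing key file for $name: $keyfile" >&2
            status=1
        fi
    done < "$root/etc/crypttab"
    return $status
}

# Fixture: a scratch root with one crypttab entry and, at first, no key file.
root=$(mktemp -d)
mkdir -p "$root/etc/keys"
echo "luks-sample UUID=sample-uuid /etc/keys/luks-sample.key luks,discard,initramfs" > "$root/etc/crypttab"
check_crypttab_keys "$root" || echo "key file not copied yet"
: > "$root/etc/keys/luks-sample.key"
check_crypttab_keys "$root" && echo "crypttab key files present"
```

In the real flow the argument would be ${dst_mnt}.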
Disable virtio_rng
In a Secure Execution environment, it is important to ensure that all components, mainly those related to security and cryptography, are trusted and isolated.
The virtio_rng module is blacklisted. virtio_rng is a virtual random number generator that can be used in virtualised environments to provide entropy for cryptographic operations. However, since it may rely on the hypervisor for its randomness, it could potentially be compromised or untrusted. By blacklisting this module, the system ensures that it does not use the RNG provided by the hypervisor.
cat <<END > ${dst_mnt}/etc/modprobe.d/blacklist-virtio.conf
blacklist virtio_rng
END
The s390_trng module is a kernel module that provides access to the true random number generator (TRNG) hardware available on LinuxONE systems. This hardware RNG provides high-quality entropy for cryptographic operations and other security-related tasks. Loading it is part of configuring a confidential environment: it ensures that a reliable source of randomness is available for cryptographic purposes in a Secure Execution (SEL) context on LinuxONE systems. The following command adds the module to /etc/modules:
echo s390_trng >> ${dst_mnt}/etc/modules
Preparing Files for mkinitrd / initramfs:
In Linux systems, the initramfs (initial RAM filesystem) is a minimal filesystem that is loaded into memory during the early stages of boot. It is generated by tools such as dracut or mkinitrd.
dracut: A tool used on RHEL-based systems to generate the initramfs. Configuration files under /etc/dracut.conf.d/ allow customisation of which modules, drivers, and files are included.
On systems that use initramfs-tools, such as Ubuntu, cryptsetup-initramfs configures the initramfs to include key files for unlocking LUKS partitions and ensures secure permissions for the key files.
dm_crypt and crypt: dm_crypt is the kernel driver that handles encrypted block devices; crypt is the dracut module that unlocks LUKS-encrypted partitions in the initramfs.
KEYFILE_PATTERN: Specifies the pattern for key files (e.g., /etc/keys/*.key) that will be included in the initramfs to unlock encrypted partitions during boot.
UMASK: Controls the default permissions of new files. Setting UMASK=0077 ensures that key files are only accessible to the root user, enhancing security.
install_items: Ensures the key files are available during the initramfs stage.
cat <<END > ${dst_mnt}/etc/dracut.conf.d/crypt.conf
UMASK=0077
add_drivers+=" dm_crypt "
add_dracutmodules+=" crypt "
KEYFILE_PATTERN=" /etc/keys/*.key "
install_items+=" /etc/keys/*.key "
install_items+=" /etc/fstab "
install_items+=" /etc/crypttab "
END
What is zipl.conf?
In s390x architecture, zipl (z/Architecture Initial Program Loader) is the bootloader responsible for loading the kernel and initial RAM filesystem (initramfs) during system startup.
Similar to how GRUB works on other architectures, zipl needs configuration details to know which kernel image to load, where to find the boot target, and how to access the boot partition. These settings are defined in the /etc/zipl.conf file.
In a Secure Execution (SE) environment on s390x, the zipl.conf file ensures that the bootloader knows:
- Which kernel image to load (se.img).
- Where the boot files are stored (/boot-se).
- How to access the target disk (using SCSI with a 512-byte block size and a 2048-block offset).
This configuration is essential to ensure the system boots correctly and securely, especially in an automated environment where encrypted partitions and special boot options are used.
cat <<END > ${dst_mnt}/etc/zipl.conf
[defaultboot]
default=linux
target=/boot-se
targetbase=${tmp_nbd}
targettype=scsi
targetblocksize=512
targetoffset=2048
[linux]
image = /boot-se/se.img
END
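The targetoffset value is tied to the partition layout: the boot partition was created starting at 1MiB, and with 512-byte blocks that start position corresponds to block 2048. The arithmetic can be sketched as follows; the variable names are chosen for illustration only.

```shell
# The boot partition starts at 1 MiB (see the parted call) and zipl is told
# that the disk uses 512-byte blocks, so the offset in blocks is:
start_bytes=$((1 * 1024 * 1024))
block_size=512
target_offset=$((start_bytes / block_size))
echo "targetoffset=$target_offset"    # prints targetoffset=2048
```

If the boot partition is moved to a different start position, the targetoffset value would need to be recalculated in the same way.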
Generating the SEL boot Image:
Creating the se.img (Secure Execution for Linux boot image) is a critical step in enabling a confidential VM. This image encapsulates the kernel, initial RAM disk (initramfs), and boot parameters needed to launch the virtual machine (VM) securely on IBM s390x systems.
Copy the kernel and initramfs files to the boot partition. The kernel file (vmlinuz) is the Linux kernel, which is loaded into memory during boot to initialise the operating system. initramfs.img is a compressed cpio archive that contains the early user space tools needed for the system to mount the root partition and continue the boot process.
sudo cp "/boot/vmlinuz-$(uname -r)" "${dst_mnt}/boot/vmlinuz-$(uname -r)"
sudo cp "/boot/initramfs-$(uname -r).img" "${dst_mnt}/boot/initramfs-$(uname -r).img"
After modifying the configuration, it is essential to update the initial RAM disk (initramfs) so that it contains the drivers, modules, and key files required to decrypt the root partition during boot.
sudo chroot ${dst_mnt} dracut -f -v
On Ubuntu, update-initramfs is used instead of dracut.
The SE_PARMLINE defines the parameters needed for the VM to boot properly in a secure execution environment:
- root=/dev/mapper/$LUKS_NAME: Tells the kernel where the encrypted root partition is located.
- rd.auto=1: Automatically scan for storage devices.
- rd.retry=30: Retry mounting the root partition for 30 seconds in case of failure.
- console=ttysclp0: Specifies the console device for system output.
- Blacklist of specific modules (e.g., virtio_rng): Disables unnecessary components to avoid security issues.
- swiotlb=262144: Allocates a large I/O buffer to handle encrypted I/O traffic efficiently.
export SE_PARMLINE="root=/dev/mapper/$LUKS_NAME rd.auto=1 rd.retry=30 console=ttysclp0 quiet panic=0 rd.shell=0 blacklist=virtio_rng swiotlb=262144"
sudo -E bash -c 'echo "${SE_PARMLINE}" > ${dst_mnt}/boot/parmfile'
Genprotimg:
Genprotimg is a tool used to generate a protected image for Secure Execution for Linux. It prepares the kernel, initramfs, and other necessary files into a bootable image.
sudo -E /usr/bin/genprotimg \
-i ${dst_mnt}/boot/${KERNEL_FILE} \
-r ${dst_mnt}/boot/${INITRD_FILE} \
-p ${dst_mnt}/boot/parmfile \
--no-verify \
${host_keys} \
-o ${dst_mnt}/boot-se/se.img
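The host_keys variable is not defined in the commands shown here. genprotimg accepts one -k option per host key document, so one possible way to build the value is sketched below. The helper name host_key_args, the directory layout, and the .crt suffix are all assumptions, and the result relies on word splitting, so it only suits paths without spaces.

```shell
# Hypothetical helper: print one "-k <file>" pair per host key document
# found in a directory (directory layout and .crt suffix are assumptions).
host_key_args() {
    dir=$1
    args=""
    for doc in "$dir"/*.crt; do
        [ -e "$doc" ] || continue          # directory had no matching files
        args="$args -k $doc"
    done
    printf '%s\n' "${args# }"
}

# Demonstration with empty placeholder files in a scratch directory.
keys_dir=$(mktemp -d)
: > "$keys_dir/machine-a.crt"
: > "$keys_dir/machine-b.crt"
host_keys=$(host_key_args "$keys_dir")
echo "$host_keys"
```

Passing several host key documents lets the same image start on each of the corresponding machines.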
zipl is used to prepare the system for booting by configuring the boot partition with the SEL image. Clear the contents of /boot, then run zipl:
sudo rm -rf ${dst_mnt}/boot/*
sudo chroot ${dst_mnt} zipl --targetbase ${tmp_nbd} \
--targettype scsi \
--targetblocksize 512 \
--targetoffset 2048 \
--target /boot-se \
--image /boot-se/se.img
Unmount all partitions and clean up any temporary keys or files to ensure security and close the LUKS partition to prevent unauthorised access.
sudo cryptsetup close $LUKS_NAME
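Only the final cryptsetup close is shown above; the unmount steps are left implicit. A possible teardown order, the reverse of the setup order used in this guide, is sketched below. The run and cleanup helpers are hypothetical, and with DRY_RUN=1 the commands are only printed, so the sketch touches nothing on the host.

```shell
# Sketch: tear down in the reverse order of setup.
DRY_RUN=1    # set to 0 to execute the commands for real

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Arguments: mounted target root, LUKS mapping name, tmpfs key directory.
cleanup() {
    mnt=$1
    mapping=$2
    keys=$3
    run sudo umount "$mnt/dev"
    run sudo umount "$mnt/proc"
    run sudo umount "$mnt/sys"
    run sudo umount "$mnt/boot-se"
    run sudo umount "$mnt"
    run sudo cryptsetup close "$mapping"
    run sudo umount "$keys"            # drops the tmpfs holding rootkey.bin
}

# Sample values; in the real flow these would be ${dst_mnt}, $LUKS_NAME,
# and ${workdir}/rootkeys.
cleanup /tmp/work/dst_mnt luks-sample /tmp/work/rootkeys
```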
Convert and compress a disk image into QCOW2 format.
qemu-img convert -O qcow2 -c “se-${IMAGE_FILE}” “${output_img_name}”
The above image shows the output of commands that check the disk partitions and their types, and that verify the presence of the Secure Execution for Linux (SEL) image (se.img).
- /dev/mapper/luks-<UUID> is labeled "root" and has a type crypto_LUKS, indicating that it is a LUKS-encrypted partition.
- /boot-se/ partition is being checked for essential files, confirming that the SEL image (se.img) is correctly placed in the boot partition.
Troubleshooting:
The parameters rd.shell=1 and rd.debug=1 in the kernel parameter line (SE_PARMLINE) are commonly used for troubleshooting during the boot process. Here's what each parameter does:
export SE_PARMLINE="root=/dev/mapper/$LUKS_NAME rd.auto=1 rd.retry=30 console=ttysclp0 quiet panic=0 rd.shell=1 rd.debug=1 blacklist=virtio_rng swiotlb=262144"
rd.shell=1:
If an error occurs during the root mounting or early boot, instead of halting with no output, the system will drop to an interactive shell, allowing you to troubleshoot and inspect logs.
rd.debug=1:
This enables verbose debugging output from the initramfs stage, providing detailed logs about each step in the initial boot process. It’s useful for identifying exactly where in the initramfs the system might be failing.
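Since the production and troubleshooting parameter lines differ only in the rd.shell and rd.debug settings, a build script could derive both from one place. The sketch below shows one way to do that; the function name build_parmline and the mode argument are hypothetical.

```shell
# Hypothetical helper: assemble the kernel parameter line, switching the
# dracut shell/debug settings on only when the first argument is "debug".
build_parmline() {
    mode=$1
    mapping=$2
    if [ "$mode" = "debug" ]; then
        rd_opts="rd.shell=1 rd.debug=1"
    else
        rd_opts="rd.shell=0"
    fi
    echo "root=/dev/mapper/$mapping rd.auto=1 rd.retry=30 console=ttysclp0 quiet panic=0 $rd_opts blacklist=virtio_rng swiotlb=262144"
}

# Sample mapping name; in the real flow this would be $LUKS_NAME.
build_parmline normal luks-sample
build_parmline debug luks-sample
```

The debug variant is meant for diagnosis only, since rd.shell=1 allows an interactive shell during early boot.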
Verify the SE Image:
To ensure that the image is SEL (Secure Execution for Linux) enabled, several verification steps can be performed.
/proc/cpuinfo contains information about the CPU's features and capabilities. The facilities field lists specific hardware features supported by the CPU. Facility 158 is critical because it indicates IBM Secure Execution Facility support, a hardware feature required to run confidential VMs on IBM Z or LinuxONE.
grep facilities /proc/cpuinfo | grep 158
/sys/firmware/uv/prot_virt_guest is a file that indicates whether the guest VM is running in a Secure Execution environment.
1: Secure Execution for Linux is enabled.
0: Secure Execution for Linux is not enabled.
cat /sys/firmware/uv/prot_virt_guest
1
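The two checks can be wrapped in small functions for use in scripts. One detail worth handling is that a plain grep for 158 would also match a longer number that merely contains those digits, so the sketch below matches whole words only. The helper names has_facility and is_se_guest are hypothetical, and the fixture files stand in for the live /proc and /sys entries.

```shell
# Hypothetical helper: succeed if facility NUM appears as a whole word on the
# "facilities" line of a cpuinfo-style file.
has_facility() {
    num=$1
    cpuinfo=${2:-/proc/cpuinfo}
    grep '^facilities' "$cpuinfo" | grep -qw "$num"
}

# Hypothetical helper: succeed if the given sysfs-style file contains "1".
is_se_guest() {
    flag_file=${1:-/sys/firmware/uv/prot_virt_guest}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Try both helpers against small fixture files instead of the live system.
fixture=$(mktemp -d)
echo "facilities      : 0 1 2 3 1158" > "$fixture/cpuinfo_no"
echo "facilities      : 0 1 2 3 158 161" > "$fixture/cpuinfo_yes"
echo 1 > "$fixture/prot_virt_guest"

has_facility 158 "$fixture/cpuinfo_no"  || echo "no facility 158"
has_facility 158 "$fixture/cpuinfo_yes" && echo "facility 158 present"
is_se_guest "$fixture/prot_virt_guest"  && echo "running as SE guest"
```

Called without a file argument, both helpers fall back to the live paths used in the commands above.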
pvextract-hdr: This command extracts the Secure Execution header from a SEL-enabled image (se.img).
pvextract-hdr -o output.bin /mnt/qcow2/se.img
SE header found at offset 0x014000
SE header written to 'output.bin' (640 bytes)
References:
https://www.ibm.com/docs/en/linux-on-systems?topic=management-secure-execution
https://unix.stackexchange.com/questions/384296/how-to-add-cryptsetup-to-dracut
https://www.freedesktop.org/software/systemd/man/latest/systemd-cryptsetup.html
https://forums.linuxmint.com/viewtopic.php?t=401325
https://access.redhat.com/solutions/447423
https://askubuntu.com/questions/862181/how-to-create-partition-for-data
https://www.ibm.com/docs/en/linux-on-systems?topic=execution-commands