Booting an IBM Cloud Bare Metal Server for VPC from an NFS mount

By Samuel Matzek posted Tue October 24, 2023 09:53 AM

Intro

IBM Cloud Bare Metal Servers satisfy the requirements of many workloads with very high CPU and I/O demands and can be very cost-efficient if you are able to fully utilize the server. However, there are cases where a customer needs boot drive encryption for bare metal servers. Operationally, such encryption means that the operating system, the root file system, and the files it generates are all transparently encrypted by the underlying encryption layer. This article shows how IBM Cloud VPC block storage and an NFS virtual server, in conjunction with a customized bootloader on the bare metal server, can be used to achieve that goal.

Solution Overview

During a “normal” system boot, the initial bootloader code is read from a special “boot” partition on a disk attached to the server. This bootloader then loads the operating system kernel and an initial RAM disk image and file system (initramfs) into memory. The executables in the initial RAM file system find the root file system partition on a disk attached to the server and mount it. The system boot then proceeds by launching the operating system initialization processes on the root file system.

In his Medium post, Amartey Pearson describes how to network boot an IBM Cloud Bare Metal Server using a custom image with an iPXE EFI bootloader executable that loads an iPXE script from an HTTP(S) server. We extend this method by using iPXE to load a kernel and an NFS-enabled initramfs from the HTTP server. The kernel is then booted and mounts the root file system from an NFS share exported by an NFS server running on a VSI with an attached block volume.

The new flow during boot becomes:

  1. The server loads the iPXE bootloader from the boot partition of its physical disk.
  2. The bootloader loads the kernel and the initramfs from the HTTP(S) server.
  3. The modules in the initramfs mount the root file system from an NFS server that hosts the share from an encrypted VPC block volume. The boot initialization scripts of the operating system in the NFS share are invoked and the system boots up.

Setting up the NFS server

We use a VSI for the NFS server and choose a profile that has 16 Gbps of network bandwidth. After instance creation, we adjust the bandwidth allocation to use 8 Gbps each for volumes and networking, giving the VSI equal bandwidth for communication with the block volume and the bare metal server. We also provision a VPC block volume to hold the bare metal server’s root file system. At provision time we can choose IBM-managed or customer-managed encryption for the volume. The IBM Cloud documentation contains more information on creating customer-managed encryption volumes.
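
As a minimal sketch, the encrypted volume can also be provisioned from the ibmcloud CLI; the volume name, capacity, and key CRN below are hypothetical placeholders, and flag names may vary with the CLI version:

# Create a block volume with customer-managed encryption.
# The name, capacity, and key CRN are placeholders; the key CRN
# references a Key Protect or Hyper Protect Crypto Services root key.
ibmcloud is volume-create bm-root-vol general-purpose us-south-1 \
    --capacity 250 \
    --encryption-key crn:v1:bluemix:public:kms:us-south:a/...:key:...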

We configure the NFS service by specifying a domain in /etc/idmapd.conf and configuring ID mapping:

[General]
#Verbosity = 0
# The following should be set to the local NFSv4 domain name
# The default is the host's DNS domain name.
#Domain = local.domain.edu
Domain = ourdomain.com
#...
[Mapping]
Nobody-User = nobody
Nobody-Group = nobody
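
If the NFS server packages are not already present on the VSI, they can be installed before the service is configured and started; on RHEL, nfs-utils provides both the NFS server and the ID mapping daemon:

# Install the NFS server and ID mapping utilities
dnf install nfs-utils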

Preparing the root mount point

We mount the VPC block volume at /nfsvol and make a copy of the NFS server's root in the /nfsvol/sysimage directory. Note that this makes the root mount point contain the same OS version as the NFS server; in our case this is Red Hat Enterprise Linux 9.0.
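
Before the copy, the block volume needs a file system and a mount point; a minimal sketch, assuming the volume appears as /dev/vdd (verify the actual device name with lsblk):

# Create a file system on the attached block volume
# (/dev/vdd is a placeholder; check lsblk for the real device)
mkfs.ext4 /dev/vdd

# Mount the volume at /nfsvol
mkdir -p /nfsvol
mount /dev/vdd /nfsvol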

mkdir /nfsvol/sysimage
rsync -a --exclude='/proc/*' --exclude='/sys/*' --exclude='/nfsvol' / /nfsvol/sysimage

We update the root image fstab (/nfsvol/sysimage/etc/fstab) to have only the following contents:

none      /tmp      tmpfs     defaults  0 0
tmpfs     /dev/shm  tmpfs     defaults  0 0
sysfs     /sys      sysfs     defaults  0 0
proc      /proc     proc      defaults  0 0

Lastly, the network configuration scripts are removed from the root image because the bare metal server will configure its own networking on first boot.

rm /nfsvol/sysimage/etc/sysconfig/network-scripts/*

Creating the initramfs

We need to build an initramfs that has NFS support, using the dracut tool. We create the initramfs on the NFS server itself because the creation pulls in the /etc/idmapd.conf file containing the NFS configuration.

# Install dracut-network
dnf install dracut-network

# Add the NFS to the list of dracut modules
echo "add_dracutmodules+=\" nfs \"" >> /etc/dracut.conf.d/network.conf

# Generate a new initramfs
dracut -f --add nfs "initramfs-$(uname -r).img" "$(uname -r)"

# Make the initramfs image world-readable
chmod 0644 "initramfs-$(uname -r).img"

# Copy the initramfs to the root image
cp initramfs-*.img /nfsvol/sysimage/boot/
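
To confirm that the NFS module actually made it into the image, the initramfs contents can be inspected with lsinitrd, which ships with dracut:

# Verify that the nfs dracut module is present in the image
lsinitrd "initramfs-$(uname -r).img" | grep -i nfs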

Copy the resulting initramfs-*.img file and the kernel file /boot/vmlinuz-<kernel-version> to the iPXE HTTP server VSI and place them in the directory that httpd serves files from.
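
For example, assuming the HTTP server is the VSI at 10.240.64.4 referenced later in the iPXE scripts and that httpd serves files from /var/www/html (the document root is an assumption; adjust for your setup):

# Copy the initramfs and kernel to the HTTP server's document root
# (/var/www/html is a placeholder for your httpd file directory)
scp initramfs-$(uname -r).img /boot/vmlinuz-$(uname -r) \
    root@10.240.64.4:/var/www/html/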

Exporting the root mount point

Now that the bare metal system image is ready, we export it by adding this line to the /etc/exports file:

/nfsvol/sysimage 10.240.128.0/24(rw,no_root_squash)

and then start the NFS server with this command:

systemctl enable --now nfs-server
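
The export can then be verified on the NFS server:

# List the active exports and their options
exportfs -v

# Query the export list as a client would see it
showmount -e localhost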

Create the bare metal server

Create an IBM Cloud Bare Metal Server for VPC using the custom image created by following Amartey’s article. Use the IBM Cloud console to find the MAC/hardware address of the server's network adapter. The server will not boot into an operating system at this point, but the MAC address is needed for setting up the iPXE scripts.
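
The MAC address can also be retrieved with the CLI; a sketch assuming current VPC CLI command names and a hypothetical server name (check ibmcloud is help if the commands differ in your CLI version):

# List the server's network interfaces; the output includes the
# MAC address of each interface (the server name is a placeholder)
ibmcloud is bare-metal-server-network-interfaces my-bare-metal-server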

Preparing the iPXE scripts

The custom image created by following Amartey Pearson’s steps results in an iPXE EFI executable that loads a script named script.ipxe from the HTTP server. The script.ipxe we load from the HTTP server has these contents:

#!ipxe

chain http://10.240.64.4:8080/${net0/mac:hexhyp}.script.ipxe || shell

This script simply instructs the bare metal server to retrieve another file from the HTTP server, using its MAC address as part of the file name. This allows us to have separate iPXE scripts for different bare metal servers, which in turn allows each bare metal server to have a unique mount point for its root file system. Without a unique root file system per server, multiple bare metal servers would share the same filesystem as their boot drive, with disastrous consequences. Since the MAC address of our bare metal server is 02:00:0e:5d:91:74, we name the iPXE script 02-00-0e-5d-91-74.script.ipxe (a quick way to derive this hexhyp name is shown after the script below), and give it these contents:

#!ipxe

kernel vmlinuz-5.14.0-70.50.2.el9_0.x86_64
initrd initramfs-5.14.0-70.50.2.el9_0.x86_64.img

# boot and set the root file system to the share on
# the NFS server (10.240.128.12)

boot vmlinuz-5.14.0-70.50.2.el9_0.x86_64 root=10.240.128.12:/nfsvol/sysimage:vers=4.2,sec=sys,rw panic=60 selinux=0 ipv6.disable=1 console=tty0 console=ttyS0,115200n8
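
The hexhyp file name can be derived from a colon-separated MAC address with a simple substitution:

# Convert the MAC address to the hyphen-separated form that iPXE
# expands ${net0/mac:hexhyp} to, e.g. 02-00-0e-5d-91-74
echo "02:00:0e:5d:91:74" | tr ':' '-'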

Reboot the bare metal server

Now that the NFS server and HTTP server iPXE scripts are ready, you can reboot the bare metal server.

When the server reboots and moves to the Starting state, you can follow the iPXE boot process on the server console.

Once the server is up, the output of the mount command looks like this, with the root file system appearing as a mount from the NFS server:

$ mount
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
...
10.240.128.12:/nfsvol/sysimage on / type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.240.128.10,local_lock=none,addr=10.240.128.12)

Conclusion

Being able to boot your bare metal server from an NFS-shared block volume gives you the flexibility of managing the encryption of the boot disk with customer-managed keys and a key management system.

Related Links

Network Booting an IBM Cloud Bare Metal Server for VPC

iPXE Open Source Boot Firmware


#Highlights
#Highlights-home

0 comments
37 views

Permalink