The new flow during boot becomes:
- The server loads the iPXE bootloader from the boot partition of its physical disk
- The bootloader loads the kernel and the initramfs from the HTTP(S) server
- The modules in the initramfs mount the root file system from an NFS server that hosts the share on an encrypted VPC block volume. The operating system's boot initialization scripts in the NFS share are then invoked, and the system boots.
Setting up the NFS server
We use a VSI for the NFS server and choose a profile that has 16 Gbps of network bandwidth. After creating the instance, we adjust the bandwidth allocation to use 8 Gbps each for volumes and networking. This gives the VSI equal bandwidth for communicating with the block volume and with the bare metal server. We also provision a VPC block volume to hold the bare metal server’s root file system. At provision time we can choose IBM-managed or customer-managed encryption for the volume. The IBM Cloud documentation contains more information on creating volumes with customer-managed encryption.
We configure the NFS service by specifying a domain in /etc/idmapd.conf and configuring ID mapping:
[General]
#Verbosity = 0
# The following should be set to the local NFSv4 domain name
# The default is the host's DNS domain name.
#Domain = local.domain.edu
Domain = ourdomain.com
#...
[Mapping]
Nobody-User = nobody
Nobody-Group = nobody
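NFSv4 ID mapping only works when client and server agree on the domain, so it can be worth confirming which Domain value is actually in effect before building the initramfs. The helper below is a minimal sketch: the function name idmapd_domain is hypothetical, it ignores section headers, and it simply prints the last uncommented Domain line it finds.

```shell
# Print the Domain value from an idmapd.conf-style file.
# Lines starting with '#' are skipped because the pattern is anchored
# to the start of the line; if several Domain lines exist, the last
# one is printed (a simplification, not the real parser's behavior).
idmapd_domain() {
    conf_file=$1
    sed -n 's/^[[:space:]]*Domain[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf_file" | tail -n 1
}

# Example (on the NFS server):
#   idmapd_domain /etc/idmapd.conf
```

Running the same check against the copy of the file inside the root image is a quick way to spot a mismatch.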
Preparing the root mount point
We mount the VPC block volume at /nfsvol and copy the NFS server's root file system into the /nfsvol/sysimage directory. Note that this gives the root image the same OS version as the NFS server; in our case, Red Hat Enterprise Linux 9.0.
mkdir /nfsvol/sysimage
rsync -a --exclude='/proc/*' --exclude='/sys/*' --exclude='/nfsvol' / /nfsvol/sysimage
We then replace the contents of the root image fstab (/nfsvol/sysimage/etc/fstab) with the following:
none /tmp tmpfs defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
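The fstab above lists only virtual file systems; the root file system itself is not listed because it is mounted from NFS by the initramfs. If the image is rebuilt often, this step can be scripted. The following is a sketch only, and the function name write_root_fstab is hypothetical:

```shell
# Overwrite the given fstab file with the four virtual file system
# entries used by the NFS-rooted image.
write_root_fstab() {
    fstab_path=$1
    cat > "$fstab_path" <<'EOF'
none /tmp tmpfs defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
EOF
}

# Example (run as root on the NFS server):
#   write_root_fstab /nfsvol/sysimage/etc/fstab
```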
Lastly, the network configuration scripts are removed from the root image because the bare metal server will configure its own networking on first boot.
rm /nfsvol/sysimage/etc/sysconfig/network-scripts/*
Creating the initramfs
We need to build an initramfs that has NFS support. The dracut tool is used to create the initramfs on the NFS server. We create the initramfs on the NFS server because the creation pulls in the /etc/idmapd.conf file containing the NFS configuration.
# Install dracut-network
dnf install dracut-network
# Add NFS to the list of dracut modules
echo "add_dracutmodules+=\" nfs \"" >> /etc/dracut.conf.d/network.conf
# Generate a new initramfs
dracut -f --add nfs "initramfs-$(uname -r).img" "$(uname -r)"
# Make the initramfs image readable by all users
chmod 0644 "initramfs-$(uname -r).img"
# Copy the initramfs to the root image
cp initramfs-*.img /nfsvol/sysimage/boot/
Copy the resulting initramfs-*.img file and the kernel file /boot/vmlinuz-<kernel-version> to the iPXE HTTP server VSI and place them in the directory that httpd serves files from.
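Before copying the image anywhere, it may be worth confirming that the nfs dracut module really made it into the initramfs. The lsinitrd tool that ships with dracut can list the modules in an image with its -m option. The sketch below wraps the check in a small filter; the function name has_nfs_module is hypothetical.

```shell
# Read a list of dracut module names on stdin (one per line) and
# succeed only if one of the lines is exactly "nfs".
has_nfs_module() {
    grep -qx 'nfs'
}

# Example (on the NFS server, after running dracut):
#   lsinitrd -m "initramfs-$(uname -r).img" | has_nfs_module \
#       && echo "nfs module present"
```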
Exporting the root mount point
Now that the bare metal system image is ready, we export it by adding this line to the /etc/exports file:
/nfsvol/sysimage 10.240.128.0/24(rw,no_root_squash)
and then start the NFS server with this command:
systemctl enable --now nfs-server
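The no_root_squash option matters here because the client mounts the share as its root file system and must be able to act as root on it. If the /etc/exports edit is scripted, it helps to make it idempotent so that rerunning the setup does not duplicate the entry. A minimal sketch, with the hypothetical function name add_export:

```shell
# Append an export entry to the given file unless an identical line
# is already present.
add_export() {
    exports_file=$1
    entry=$2
    # -x matches whole lines only; -F treats the entry as a fixed string
    grep -qxF -- "$entry" "$exports_file" 2>/dev/null \
        || printf '%s\n' "$entry" >> "$exports_file"
}

# Example (run as root on the NFS server):
#   add_export /etc/exports '/nfsvol/sysimage 10.240.128.0/24(rw,no_root_squash)'
#   exportfs -ra   # re-read /etc/exports if nfs-server is already running
```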
Preparing the iPXE scripts
The custom image created by following Amartey Pearson’s steps results in an iPXE EFI binary that loads a script named script.ipxe from the HTTP server. The script.ipxe we load from the HTTP server has these contents:
#!ipxe
chain http://10.240.64.4:8080/${net0/mac:hexhyp}.script.ipxe || shell
This script simply instructs the bare metal server to retrieve another file from the HTTP server, using its MAC address as part of the file name. This lets us keep a separate iPXE script for each bare metal server, which in turn lets each server have a unique mount point for its root file system. Without a unique root file system for each bare metal server, the servers would all boot from and write to the same file system, with disastrous consequences. Since the MAC address of our bare metal server is 02:00:0e:5d:91:74, we name the iPXE script 02-00-0e-5d-91-74.script.ipxe, and give it these contents:
#!ipxe
kernel vmlinuz-5.14.0-70.50.2.el9_0.x86_64
initrd initramfs-5.14.0-70.50.2.el9_0.x86_64.img
# boot and set the root file system to the share on
# the NFS server (10.240.128.12)
boot vmlinuz-5.14.0-70.50.2.el9_0.x86_64 root=10.240.128.12:/nfsvol/sysimage:vers=4.2,sec=sys,rw panic=60 selinux=0 ipv6.disable=1 console=tty0 console=ttyS0,115200n8
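With more than one bare metal server, the per-MAC scripts can be generated rather than written by hand. The sketch below mirrors the script above; the function names mac_to_hexhyp and render_ipxe_script are hypothetical, and the kernel arguments are copied unchanged from this article's example.

```shell
# Convert a colon-separated MAC address to the lowercase,
# hyphen-separated form used in the per-server script file name.
mac_to_hexhyp() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | tr ':' '-'
}

# Print a per-server iPXE script to stdout.
# Arguments: kernel version, NFS server IP, exported root path.
render_ipxe_script() {
    kver=$1
    nfs_ip=$2
    nfs_path=$3
    cat <<EOF
#!ipxe
kernel vmlinuz-${kver}
initrd initramfs-${kver}.img
boot vmlinuz-${kver} root=${nfs_ip}:${nfs_path}:vers=4.2,sec=sys,rw panic=60 selinux=0 ipv6.disable=1 console=tty0 console=ttyS0,115200n8
EOF
}

# Example (in the directory that httpd serves files from):
#   render_ipxe_script 5.14.0-70.50.2.el9_0.x86_64 10.240.128.12 /nfsvol/sysimage \
#       > "$(mac_to_hexhyp 02:00:0E:5D:91:74).script.ipxe"
```

Passing a different exported path for each server is what keeps their root file systems separate.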
Creating the bare metal server
Now that the NFS server and the iPXE scripts on the HTTP server are ready, you can create an IBM Cloud Bare Metal Server for VPC using the custom image created by following Amartey’s article.
Once the server is provisioned and moves to the Starting state, you can follow the boot process on the server console. The PXE boot process will look like this: