Build and Install the Kata Containers Runtime with MicroShift on Raspberry Pi 4 with Manjaro
Introduction
MicroShift is a research project that is exploring how the OpenShift OKD Kubernetes distribution can be optimized for small form factor devices and edge computing. In Part 23, we installed and used Kata Containers with MicroShift on multiple editions of Fedora 36. In this Part 24, we will compile Kata v2 from source code on Manjaro. We already installed MicroShift on Manjaro in Part 18, and we will work off that same setup. We will install and test Kata Containers with pods for the alpine, busybox and nginx images. We will also run the InfluxDB sample with a deployment containing multiple Kata containers and show SenseHat metrics in Grafana.
Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers but provide the workload isolation and security advantages of VMs (Source: Kata Containers website). Traditional containers use crun/runc as the container runtime, relying on kernel features such as cgroups and namespaces to provide isolation on a shared kernel, whereas Kata Containers isolates containers in their own lightweight VMs. With Kata Containers, each container or container pod is launched into a lightweight VM with its own unique kernel instance. Since each container/pod now runs in its own VM, malicious code can no longer exploit the shared kernel to access neighboring containers. Kubernetes CRI (Container Runtime Interface) implementations allow using any OCI-compatible runtime with Kubernetes, such as the Kata Containers runtime. Kata Containers supports both the CRI-O and CRI-containerd CRI implementations. RuntimeClass is a feature for selecting the container runtime configuration, which is used to run a Pod's containers. The picture below shows the Kubernetes integration with containerd-shim-kata-v2.
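To make the RuntimeClass concept concrete, here is a minimal sketch of a RuntimeClass manifest for Kata. The handler name kata is an assumption for illustration; it must match the runtime name registered with the CRI implementation (for CRI-O, the [crio.runtime.runtimes.kata] table we create later in this article):

```shell
# Hypothetical minimal RuntimeClass manifest selecting the Kata runtime.
# The handler value must match the runtime name known to the CRI.
cat > /tmp/kata-runtimeclass-demo.yaml << 'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
EOF
cat /tmp/kata-runtimeclass-demo.yaml
```

A pod then opts into this runtime by setting runtimeClassName: kata in its spec, which is exactly what the sample yamls later in this article do.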
In the following sections, we start by building the Kata Containers runtime, followed by the initrd and rootfs images, and finally the Kata Containers kernel.
Building Kata v2 on Manjaro
Download and install updates
pacman --noconfirm -Syu
pacman --noconfirm -S inetutils # For hostname
Install golang
wget https://golang.org/dl/go1.18.3.linux-arm64.tar.gz
rm -rf /usr/local/go && tar -C /usr/local -xzf go1.18.3.linux-arm64.tar.gz
rm -f go1.18.3.linux-arm64.tar.gz
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
cat << EOF >> /root/.bash_profile
export PATH=\$PATH:/usr/local/go/bin
export GOPATH=/root/go
EOF
#export GO111MODULE=off
go env -w GO111MODULE=auto
Build and install the Kata Containers runtime
go get -d -u github.com/kata-containers/kata-containers
cd /root/go/src/github.com/kata-containers/kata-containers/src/runtime/
make
make install
Output:
[root@rpi runtime]# make
GENERATE data/kata-collect-data.sh
GENERATE pkg/katautils/config-settings.go
kata-runtime - version 2.5.0-rc0 (commit 0b4a91ec1a508604d687ec55da9a6732144e7917)
• architecture:
Host:
golang:
Build: arm64
• golang:
go version go1.18.3 linux/arm64
• hypervisors:
Default: qemu
Known: acrn cloud-hypervisor firecracker qemu
Available for this architecture: cloud-hypervisor firecracker qemu
• Summary:
destination install path (DESTDIR) : /
binary installation path (BINDIR) : /usr/local/bin
binaries to install :
- /usr/local/bin/kata-runtime
- /usr/local/bin/containerd-shim-kata-v2
- /usr/local/bin/kata-monitor
- /usr/local/bin/data/kata-collect-data.sh
configs to install (CONFIGS) :
- config/configuration-clh.toml
- config/configuration-fc.toml
- config/configuration-qemu.toml
install paths (CONFIG_PATHS) :
- /usr/share/defaults/kata-containers/configuration-clh.toml
- /usr/share/defaults/kata-containers/configuration-fc.toml
- /usr/share/defaults/kata-containers/configuration-qemu.toml
alternate config paths (SYSCONFIG_PATHS) :
- /etc/kata-containers/configuration-clh.toml
- /etc/kata-containers/configuration-fc.toml
- /etc/kata-containers/configuration-qemu.toml
default install path for qemu (CONFIG_PATH) : /usr/share/defaults/kata-containers/configuration.toml
default alternate config path (SYSCONFIG) : /etc/kata-containers/configuration.toml
qemu hypervisor path (QEMUPATH) : /usr/bin/qemu-system-aarch64
cloud-hypervisor hypervisor path (CLHPATH) : /usr/bin/cloud-hypervisor
firecracker hypervisor path (FCPATH) : /usr/bin/firecracker
assets path (PKGDATADIR) : /usr/share/kata-containers
shim path (PKGLIBEXECDIR) : /usr/libexec/kata-containers
BUILD /root/go/src/github.com/kata-containers/kata-containers/src/runtime/kata-runtime
GENERATE config/configuration-qemu.toml
GENERATE config/configuration-clh.toml
GENERATE config/configuration-fc.toml
BUILD /root/go/src/github.com/kata-containers/kata-containers/src/runtime/containerd-shim-kata-v2
BUILD /root/go/src/github.com/kata-containers/kata-containers/src/runtime/kata-monitor
[root@rpi runtime]# make install
kata-runtime - version 2.5.0-rc0 (commit 0b4a91ec1a508604d687ec55da9a6732144e7917)
• architecture:
Host:
golang:
Build: arm64
• golang:
go version go1.18.3 linux/arm64
• hypervisors:
Default: qemu
Known: acrn cloud-hypervisor firecracker qemu
Available for this architecture: cloud-hypervisor firecracker qemu
• Summary:
destination install path (DESTDIR) : /
binary installation path (BINDIR) : /usr/local/bin
binaries to install :
- /usr/local/bin/kata-runtime
- /usr/local/bin/containerd-shim-kata-v2
- /usr/local/bin/kata-monitor
- /usr/local/bin/data/kata-collect-data.sh
configs to install (CONFIGS) :
- config/configuration-clh.toml
- config/configuration-fc.toml
- config/configuration-qemu.toml
install paths (CONFIG_PATHS) :
- /usr/share/defaults/kata-containers/configuration-clh.toml
- /usr/share/defaults/kata-containers/configuration-fc.toml
- /usr/share/defaults/kata-containers/configuration-qemu.toml
alternate config paths (SYSCONFIG_PATHS) :
- /etc/kata-containers/configuration-clh.toml
- /etc/kata-containers/configuration-fc.toml
- /etc/kata-containers/configuration-qemu.toml
default install path for qemu (CONFIG_PATH) : /usr/share/defaults/kata-containers/configuration.toml
default alternate config path (SYSCONFIG) : /etc/kata-containers/configuration.toml
qemu hypervisor path (QEMUPATH) : /usr/bin/qemu-system-aarch64
cloud-hypervisor hypervisor path (CLHPATH) : /usr/bin/cloud-hypervisor
firecracker hypervisor path (FCPATH) : /usr/bin/firecracker
assets path (PKGDATADIR) : /usr/share/kata-containers
shim path (PKGLIBEXECDIR) : /usr/libexec/kata-containers
INSTALL install-scripts
INSTALL install-completions
INSTALL install-configs
INSTALL install-configs
INSTALL install-bin
INSTALL install-containerd-shim-v2
INSTALL install-monitor
The build creates the following:
- runtime binary: /usr/local/bin/kata-runtime and /usr/local/bin/containerd-shim-kata-v2
- configuration file: /usr/share/defaults/kata-containers/configuration.toml
Output:
[root@rpi runtime]# ls /usr/local/bin/kata-runtime /usr/local/bin/containerd-shim-kata-v2 /usr/share/defaults/kata-containers/configuration.toml
/usr/local/bin/containerd-shim-kata-v2 /usr/local/bin/kata-runtime /usr/share/defaults/kata-containers/configuration.toml
Check hardware requirements
sudo kata-runtime check --verbose # This will return an error because vmlinux.container does not exist yet
pacman --noconfirm -S which
`which kata-runtime` --version
`which containerd-shim-kata-v2` --version
Output:
[root@microshift runtime]# sudo kata-runtime check --verbose
ERRO[0000] /usr/share/defaults/kata-containers/configuration-qemu.toml: file /usr/share/kata-containers/vmlinux.container does not exist arch=arm64 name=kata-runtime pid=9296 source=runtime
/usr/share/defaults/kata-containers/configuration-qemu.toml: file /usr/share/kata-containers/vmlinux.container does not exist
[root@rpi src]# which kata-runtime
/usr/local/bin/kata-runtime
[root@rpi src]# `which kata-runtime` --version
kata-runtime : 2.5.0-rc0
commit : 0b4a91ec1a508604d687ec55da9a6732144e7917
OCI specs: 1.0.2-dev
[root@rpi src]# which containerd-shim-kata-v2
/usr/local/bin/containerd-shim-kata-v2
[root@rpi src]# `which containerd-shim-kata-v2` --version
Kata Containers containerd shim: id: "io.containerd.kata.v2", version: 2.5.0-rc0, commit: 0b4a91ec1a508604d687ec55da9a6732144e7917
Kata creates a VM in which to run one or more containers by launching a hypervisor. Kata supports multiple hypervisors. We use QEMU. The hypervisor needs two assets for this task: a Linux kernel and a small root filesystem image to boot the VM. The guest kernel is passed to the hypervisor and used to boot the VM. The default kernel provided in Kata Containers is highly optimized for kernel boot time and minimal memory footprint, providing only those services required by a container workload. The hypervisor uses an image file which provides a minimal root filesystem used by the guest kernel to boot the VM and host the Kata Container. Kata Containers supports both initrd and rootfs based minimal guest images. The default packages provide both an image and an initrd, both of which are created using the osbuilder tool.
The initrd image is a compressed cpio(1) archive, created from a rootfs which is loaded into memory and used as part of the Linux startup process. During startup, the kernel unpacks it into a special instance of a tmpfs mount that becomes the initial root filesystem.
- The runtime will launch the configured hypervisor.
- The hypervisor will boot the mini-OS image using the guest kernel.
- The kernel will start the init daemon as PID 1 (the agent) inside the VM root environment.
- The agent will create a new container environment, setting its root filesystem to that requested by the user.
- The agent will then execute the command inside the new container.
The rootfs image, sometimes referred to as the mini-OS, is a highly optimized container bootstrap system. With it:
- The runtime will launch the configured hypervisor.
- The hypervisor will boot the mini-OS image using the guest kernel.
- The kernel will start the init daemon as PID 1 (systemd) inside the VM root environment.
- systemd, running inside the mini-OS context, will launch the agent in the root context of the VM.
- The agent will create a new container environment, setting its root filesystem to that requested by the user.
- The agent will then execute the command inside the new container.
We will see the output of the qemu commands later.
Configure to use initrd image
Kata Containers can run with either an initrd image or a rootfs image. We will build both images but initially use the initrd, switching to the rootfs image in a later section. So, make sure initrd = /usr/share/kata-containers/kata-containers-initrd.img is set in the configuration file and the default image line is commented out. Copy the default configuration and comment out the image line with the following:
sudo mkdir -p /etc/kata-containers/
sudo install -o root -g root -m 0640 /usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers
sudo sed -i 's/^\(image =.*\)/# \1/g' /etc/kata-containers/configuration.toml
The initrd line is not added by default, so add the initrd line in /etc/kata-containers/configuration.toml so that it looks as follows:
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
# image = "/usr/share/kata-containers/kata-containers.img"
Next, we create the initrd image and the rootfs image. Exactly one of the initrd and image options in the Kata runtime config file must be set, not both. The main difference between the options is that the initrd (10MB+) is significantly smaller than the rootfs image (100MB+).
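Since it is easy to end up with both or neither option active, a quick sanity check can count the uncommented options. This is a sketch run against a scratch copy rather than the live /etc/kata-containers/configuration.toml:

```shell
# Count uncommented initrd/image lines; exactly one should be active.
cat > /tmp/demo-kata-config.toml << 'EOF'
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
# image = "/usr/share/kata-containers/kata-containers.img"
EOF
active=$(grep -Ec '^(initrd|image)[[:space:]]*=' /tmp/demo-kata-config.toml)
if [ "$active" -eq 1 ]; then
  echo "config OK: one guest image option set"
else
  echo "config BAD: $active active guest image options"
fi
```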
Create an initrd image
Create a local rootfs for initrd image
# yes Y | pacman -S podman buildah skopeo # If not already installed
sudo ln -s `which podman` /usr/bin/docker # or run later commands with USE_PODMAN instead of USE_DOCKER
export ROOTFS_DIR="${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs"
sudo rm -rf ${ROOTFS_DIR}
cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
./rootfs.sh -l
Let’s use the distro=ubuntu
export distro=ubuntu
script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh ${distro}'
This will download and compile numerous crates and build the kata-agent (in Rust). When complete, you will see the rootfs directory.
[root@rpi rootfs-builder]# ls rootfs
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
If you get errors such as the following, just run the above command again after some time.
Could not connect to ports.ubuntu.com:80 (185.125.190.39), connection timed out
You may alternatively use distro=debian:
export distro=debian
script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh ${distro}'
When I tried with Debian, it resulted in:
/kata-containers/tools/osbuilder/rootfs-builder/ubuntu/rootfs_lib.sh: line 22: multistrap: command not found
Failed at 22: multistrap -a "$DEB_ARCH" -d "$rootfs_dir" -f "$multistrap_conf"
This was easily fixed by editing the debian/Dockerfile.in to install the missing multistrap:
wget \
multistrap
# aarch64 requires this name -- link for all
RUN ln -s /usr/bin/musl-gcc "/usr/bin/$(uname -m)-linux-musl-gcc"
and running the script command again
script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh ${distro}'
Build an initrd image
cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/initrd-builder
script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true ./initrd_builder.sh ${ROOTFS_DIR}'
Output:
[root@rpi rootfs-builder]# cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/initrd-builder
[root@rpi initrd-builder]# script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true ./initrd_builder.sh ${ROOTFS_DIR}'
Script started, output log file is 'typescript'.
[OK] init is installed
[OK] Agent is installed
INFO: Creating /root/go/src/github.com/kata-containers/kata-containers/tools/osbuilder/initrd-builder/kata-containers-initrd.img based on rootfs at /root/go/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs
135141 blocks
Script done.
Install the initrd image
commit=$(git log --format=%h -1 HEAD)
date=$(date +%Y-%m-%d-%T.%N%z)
image="kata-containers-initrd-${date}-${commit}"
sudo install -o root -g root -m 0640 -D kata-containers-initrd.img "/usr/share/kata-containers/${image}"
(cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers-initrd.img)
Output:
[root@rpi initrd-builder]# commit=$(git log --format=%h -1 HEAD)
[root@rpi initrd-builder]# date=$(date +%Y-%m-%d-%T.%N%z)
[root@rpi initrd-builder]# image="kata-containers-initrd-${date}-${commit}"
[root@rpi initrd-builder]# sudo install -o root -g root -m 0640 -D kata-containers-initrd.img "/usr/share/kata-containers/${image}"
[root@rpi initrd-builder]# (cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers-initrd.img)
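The install-then-symlink pattern above keeps every dated initrd build around while kata-containers-initrd.img always points at the latest one. The same pattern can be exercised in a scratch directory (the paths and fake artifact below are made up for illustration):

```shell
# Demonstrate the versioned-install-plus-symlink pattern in scratch dirs.
mkdir -p /tmp/kata-demo-src /tmp/kata-demo-dest
echo fake-initrd > /tmp/kata-demo-src/kata-containers-initrd.img  # stand-in build artifact
image="kata-containers-initrd-$(date +%Y-%m-%d)-demo"
# install the artifact under a unique, dated name ...
install -m 0640 -D /tmp/kata-demo-src/kata-containers-initrd.img "/tmp/kata-demo-dest/$image"
# ... and point the stable name at it
(cd /tmp/kata-demo-dest && ln -sf "$image" kata-containers-initrd.img)
readlink /tmp/kata-demo-dest/kata-containers-initrd.img
```

Re-running after a new build simply retargets the symlink, so rolling back is just relinking to an older dated file.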
Create a rootfs image
Create a local rootfs for rootfs image
export ROOTFS_DIR=${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs
sudo rm -rf ${ROOTFS_DIR}
cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true ./rootfs.sh ${distro}'
Build a rootfs image
cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/image-builder
script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
Install the rootfs image
commit=$(git log --format=%h -1 HEAD)
date=$(date +%Y-%m-%d-%T.%N%z)
image="kata-containers-${date}-${commit}"
sudo install -o root -g root -m 0640 -D kata-containers.img "/usr/share/kata-containers/${image}"
(cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers.img)
Build Kata Containers Kernel
The process to build a kernel for Kata Containers is automated.
pacman -S flex bison bc
go env -w GO111MODULE=auto
go get github.com/kata-containers/packaging
cd $GOPATH/src/github.com/kata-containers/packaging/kernel
The script ./build-kernel.sh tries to apply the patches from ${GOPATH}/src/github.com/kata-containers/packaging/kernel/patches/ when it sets up a kernel. If you want to add a source modification, add a patch on this directory. The script also copies or generates a kernel config file from ${GOPATH}/src/github.com/kata-containers/packaging/kernel/configs/ to .config in the kernel source code. You can modify it as needed. We will use the defaults.
./build-kernel.sh setup
After the kernel source code is ready, we build the kernel
cp /root/go/src/github.com/kata-containers/packaging/kernel/configs/fragments/arm64/.config kata-linux-5.4.60-92/.config
./build-kernel.sh build
Install the kernel to the default Kata containers path (/usr/share/kata-containers/)
./build-kernel.sh install
The /etc/kata-containers/configuration.toml has the following:
# Path to vhost-user-fs daemon.
virtio_fs_daemon = "/usr/libexec/virtiofsd"
So, create a symbolic link as follows:
ln -s /usr/lib/qemu/virtiofsd /usr/libexec/virtiofsd
Cgroup v2 on the host is not yet supported, so we need to switch to cgroup v1. Concatenate the following onto the end of the existing line (do not add a new line) in /boot/cmdline.txt:
systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller
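Since cmdline.txt must stay a single line, appending with sed is less error-prone than hand-editing. Here is a sketch practiced on a scratch copy (the example boot arguments below are made up, not your real ones):

```shell
# Append the cgroup v1 parameters to the single kernel command line
# without introducing a newline; practiced on a scratch copy here.
printf 'console=serial0,115200 root=/dev/mmcblk0p2 rw rootwait\n' > /tmp/cmdline.txt
sed -i 's/$/ systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller/' /tmp/cmdline.txt
cat /tmp/cmdline.txt
wc -l /tmp/cmdline.txt   # still exactly one line
```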
Then, reboot the Raspberry Pi 4 and log back in as root.
mount | grep cgroup
Output:
[root@rpi ~]# mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
Alternatively, instead of appending systemd.unified_cgroup_hierarchy=0 to the kernel arguments, you may run the following after every reboot:
mkdir /sys/fs/cgroup/systemd
mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd
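To confirm which cgroup mode the host is actually in after a reboot, check the filesystem type mounted at /sys/fs/cgroup: cgroup2fs indicates the unified v2 hierarchy, while the hybrid v1 setup above mounts a tmpfs there. A small sketch:

```shell
# Report the cgroup mode from the filesystem type at /sys/fs/cgroup.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
case "$fstype" in
  cgroup2fs) echo "cgroup v2 (unified) - switch to v1 for this Kata setup" ;;
  tmpfs)     echo "cgroup v1 (legacy/hybrid)" ;;
  *)         echo "undetermined: $fstype" ;;
esac
```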
Check the output kata-runtime:
[root@rpi src]# sudo kata-runtime check --verbose
WARN[0000] Not running network checks as super user arch=arm64 name=kata-runtime pid=3952 source=runtime
INFO[0000] Unable to know if the system is running inside a VM arch=arm64 source=virtcontainers/hypervisor
INFO[0000] kernel property found arch=arm64 description="Kernel-based Virtual Machine" name=kvm pid=3952 source=runtime type=module
INFO[0000] kernel property found arch=arm64 description="Host kernel accelerator for virtio" name=vhost pid=3952 source=runtime type=module
INFO[0000] kernel property found arch=arm64 description="Host kernel accelerator for virtio network" name=vhost_net pid=3952 source=runtime type=module
INFO[0000] kernel property found arch=arm64 description="Host Support for Linux VM Sockets" name=vhost_vsock pid=3952 source=runtime type=module
System is capable of running Kata Containers
INFO[0000] device available arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=3952 source=runtime
INFO[0000] feature available arch=arm64 check-type=full feature=create-vm name=kata-runtime pid=3952 source=runtime
INFO[0000] kvm extension is supported arch=arm64 description="Maximum IPA shift supported by the host" id=165 name=KVM_CAP_ARM_VM_IPA_SIZE pid=3952 source=runtime type="kvm extension"
INFO[0000] IPA limit size: 44 bits. arch=arm64 name=KVM_CAP_ARM_VM_IPA_SIZE pid=3952 source=runtime type="kvm extension"
System can currently create Kata Containers
Check the hypervisor.qemu section in configuration.toml:
[root@rpi kata]# cat /etc/kata-containers/configuration.toml | awk -v RS= '/\[hypervisor.qemu\]/'
[hypervisor.qemu]
path = "/usr/bin/qemu-system-aarch64"
kernel = "/usr/share/kata-containers/vmlinuz.container"
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
# image = "/usr/share/kata-containers/kata-containers.img"
machine_type = "virt"
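The awk -v RS= invocation above puts awk into paragraph mode: with an empty record separator, each blank-line-separated block becomes one record, so matching the section header prints the whole [hypervisor.qemu] section. A self-contained illustration on a made-up config file:

```shell
# Paragraph mode: with RS empty, each blank-line-separated block is one
# record, so matching a TOML section header prints the entire section.
cat > /tmp/demo.toml << 'EOF'
[hypervisor.qemu]
path = "/usr/bin/qemu-system-aarch64"
machine_type = "virt"

[agent.kata]
enable_tracing = false
EOF
awk -v RS= '/\[hypervisor.qemu\]/' /tmp/demo.toml
```

Only the first block is printed; the [agent.kata] section is skipped because its record does not match the pattern.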
Check the initrd image (kata-containers-initrd.img), the rootfs image (kata-containers.img), and the kernel in the /usr/share/kata-containers directory:
[root@rpi kata]# ls -las /usr/share/kata-containers
total 339588
4 drwxr-xr-x 2 root root 4096 Jul 23 19:41 .
4 drwxr-xr-x 122 root root 4096 Jul 12 16:07 ..
68 -rw-r--r-- 1 root root 68526 Jul 12 16:31 config-5.4.60
131076 -rw-r----- 1 root root 134217728 Jul 12 15:47 kata-containers-2022-07-12-15:47:42.838238693-0400-d8d6998
131072 -rw-r----- 1 root root 134217728 Jul 23 19:41 kata-containers-2022-07-23-19:41:18.888912761-0400-0b4a91ec1
4 lrwxrwxrwx 1 root root 60 Jul 23 19:41 kata-containers.img -> kata-containers-2022-07-23-19:41:18.888912761-0400-0b4a91ec1
37124 -rw-r----- 1 root root 38013194 Jul 12 16:46 kata-containers-initrd-2022-07-12-16:46:00.687387798-0400-d8d6998
25992 -rw-r----- 1 root root 26612417 Jul 23 18:50 kata-containers-initrd-2022-07-23-18:50:58.452932320-0400-0b4a91ec1
4 lrwxrwxrwx 1 root root 67 Jul 23 18:50 kata-containers-initrd.img -> kata-containers-initrd-2022-07-23-18:50:58.452932320-0400-0b4a91ec1
9708 -rw-r--r-- 1 root root 10179072 Jul 12 16:31 vmlinux-5.4.60-92
0 lrwxrwxrwx 1 root root 17 Jul 12 16:31 vmlinux.container -> vmlinux-5.4.60-92
4532 -rw-r--r-- 1 root root 4638904 Jul 12 16:31 vmlinuz-5.4.60-92
0 lrwxrwxrwx 1 root root 17 Jul 12 16:31 vmlinuz.container -> vmlinuz-5.4.60-92
The kernel file is called vmlinuz-<version>. vmlinuz is the name of the Linux kernel executable: a compressed Linux kernel image that loads the operating system into memory so that the computer becomes usable and application programs can be run. When virtual memory was developed for easier multitasking, "vm" was put at the front of the file name to show that the kernel supports virtual memory. For a while the Linux kernel was called vmlinux, but the kernel grew too large to fit in the available boot memory, so the image was compressed, and the ending x was changed to a z to show it was compressed with zlib compression. The same compression isn't always used; it is often replaced with LZMA or BZIP2, and some kernels are simply called zImage.
At the head of this kernel image (vmlinuz) is a routine that does a minimal amount of hardware setup, then decompresses the kernel contained within the image and places it into high memory. If an initial RAM disk image (initrd) is present, this routine moves it into memory (that is, extracts the compressed ramdisk image into real memory) and notes it for later use. The routine then calls the kernel, and the kernel boot begins. The initial RAM disk (initrd) is an initial root file system that is mounted before the real root file system is available. The initrd is bound to the kernel and loaded as part of the kernel boot procedure. The kernel then mounts this initrd as part of a two-stage boot process to load the modules that make the real file systems available and get at the real root file system. The initrd contains a minimal set of directories and executables to achieve this, such as the insmod tool to install kernel modules into the kernel.
Many Linux distributions ship a single, generic Linux kernel image – one that the distribution's developers create specifically to boot on a wide variety of hardware. The device drivers for this generic kernel image are included as loadable kernel modules because statically compiling many drivers into one kernel causes the kernel image to be much larger, perhaps too large to boot on computers with limited memory.
Create the file /etc/crio/crio.conf.d/50-kata
cat > /etc/crio/crio.conf.d/50-kata << EOF
[crio.runtime.runtimes.kata]
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
runtime_root = "/run/vc"
runtime_type = "vm"
privileged_without_host_devices = true
EOF
Restart crio and start microshift
systemctl restart crio
systemctl start microshift
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
Running some Kata samples
After MicroShift is started, you can apply the kata runtimeclass and run the samples.
cd ~
git clone https://github.com/thinkahead/microshift.git
cd ~/microshift/raspberry-pi/kata/
oc apply -f kata-runtimeclass.yaml
# Start three kata pods
oc apply -f kata-nginx.yaml -f kata-alpine.yaml -f kata-busybox.yaml
watch "oc get nodes;oc get pods -A;crictl stats -a"
Output:
NAME STATUS ROLES AGE VERSION
rpi.example.com Ready <none> 6m30s v1.21.0
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox-1 1/1 Running 0 87s
default kata-alpine 1/1 Running 0 87s
default kata-nginx 1/1 Running 0 4m13s
kube-system kube-flannel-ds-q72px 1/1 Running 0 6m29s
kubevirt-hostpath-provisioner kubevirt-hostpath-provisioner-zllj5 1/1 Running 0 6m29s
openshift-dns dns-default-rgcgl 2/2 Running 0 6m30s
openshift-dns node-resolver-wbxjv 1/1 Running 0 6m30s
openshift-ingress router-default-85bcfdd948-cpbsf 1/1 Running 0 6m34s
openshift-service-ca service-ca-7764c85869-tppsk 1/1 Running 0 6m35s
CONTAINER CPU % MEM DISK INODES
22e6c795dc826 0.00 1.372MB 89B 10
3e98d8dcf5892 0.00 0B 6.961kB 15
62b79de9bf9c8 0.00 335.9kB 12B 15
66296acecd982 0.00 0B 12B 18
90f1c9544f85d 0.00 0B 0B 7
9cefcde983a44 0.00 0B 138B 17
a1dd48420b598 0.00 0B 12.3kB 25
a41c53578f43d 0.00 0B 12B 17
b3fa22e8384b7 0.00 0B 0B 8
e31b7a624d532 0.00 2.343MB 1.225kB 23
We can see that the kernel used in the Kata containers (5.4.60, the kernel we built earlier) is different from the host kernel 5.15.52-1-MANJARO-ARM-RPI (5.15.56-1-MANJARO-ARM-RPI on the latest install):
[root@rpi kata]# kata-runtime kata-env
[Kernel]
Path = "/usr/share/kata-containers/vmlinuz-5.4.60-92"
Parameters = "scsi_mod.scan=none"
[root@rpi kata]# oc exec -it kata-nginx -- uname -a
Linux kata-nginx 5.4.60 #1 SMP Tue Jul 12 16:09:33 EDT 2022 aarch64 GNU/Linux
[root@rpi kata]# oc exec -it kata-alpine -- uname -a
Linux kata-alpine 5.4.60 #1 SMP Tue Jul 12 16:09:33 EDT 2022 aarch64 Linux
[root@rpi kata]# oc exec -it busybox-1 -- uname -a
Linux busybox-1 5.4.60 #1 SMP Tue Jul 12 16:09:33 EDT 2022 aarch64 GNU/Linux
[root@rpi kata]# uname -a
Linux rpi.example.com 5.15.52-1-MANJARO-ARM-RPI #1 SMP PREEMPT Tue Jul 5 12:49:52 UTC 2022 aarch64 GNU/Linux
Check that we can ping from the kata-alpine container; this works if you set the default_capabilities "NET_RAW" in /etc/crio/crio.conf as shown in Part 23:
[root@rpi /]# oc exec -it kata-alpine -- ping -c2 google.com
PING google.com (142.250.80.110): 56 data bytes
64 bytes from 142.250.80.110: seq=0 ttl=117 time=3.702 ms
64 bytes from 142.250.80.110: seq=1 ttl=117 time=3.977 ms
--- google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 3.702/3.839/3.977 ms
When done, we can delete the sample pods:
[root@rpi kata]# oc delete -f kata-alpine.yaml -f kata-busybox.yaml -f kata-nginx.yaml
pod "kata-alpine" deleted
pod "busybox-1" deleted
pod "kata-nginx" deleted
Influxdb Sample
For Manjaro, we execute the runall-balena-dynamic.sh (instead of the runall-fedora-dynamic.sh, because the Fedora workaround with sense_hat.py.new is not required) after updating the deployment yamls to use the runtimeClassName: kata.
cd ~
git clone https://github.com/thinkahead/microshift.git
cd ~/microshift/raspberry-pi/influxdb/
Update the influxdb-deployment.yaml, telegraf-deployment.yaml and grafana/grafana-deployment.yaml to use the runtimeClassName: kata. With Kata containers, we do not get direct access to the host devices, so we run the measure container as a runc pod. In runc, '--privileged' for a container means all the /dev/* block devices from the host are mounted into the container, which allows the privileged container to gain access to and mount any block device from the host.
# sed -i '/^ spec:/a \ \ \ \ \ \ runtimeClassName: kata' influxdb-deployment.yaml telegraf-deployment.yaml grafana/grafana-deployment.yaml
sed -i '/^ spec:/a \ \ \ \ \ \ runtimeClassName: kata' influxdb-deployment.yaml
sed -i '/^ spec:/a \ \ \ \ \ \ runtimeClassName: kata' telegraf-deployment.yaml
sed -i '/^ spec:/a \ \ \ \ \ \ runtimeClassName: kata' grafana/grafana-deployment.yaml
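To see exactly what these sed commands do: the expression matches the pod template's spec: line and appends runtimeClassName: kata beneath it, using escaped spaces to preserve the six-space indentation. A self-contained check on a throwaway manifest (assuming the template spec: is indented four spaces, as in these deployments):

```shell
# Demonstrate the runtimeClassName injection on a minimal deployment yaml.
cat > /tmp/demo-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  template:
    spec:
      containers:
      - name: demo
        image: nginx
EOF
# '^    spec:' only matches the indented pod template spec, not the
# top-level Deployment spec; '\ ' escapes keep the leading spaces.
sed -i '/^    spec:/a \ \ \ \ \ \ runtimeClassName: kata' /tmp/demo-deployment.yaml
grep -n 'runtimeClassName' /tmp/demo-deployment.yaml
```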
Now, get the nodename
[root@rpi influxdb]# oc get nodes
NAME STATUS ROLES AGE VERSION
rpi.example.com Ready <none> 12h v1.21.0
Replace the annotation kubevirt.io/provisionOnNode with the above nodename rpi.example.com and execute the runall-balena-dynamic.sh. This will create a new project influxdb.
nodename=rpi.example.com
sed -i "s|kubevirt.io/provisionOnNode:.*| kubevirt.io/provisionOnNode: $nodename|" influxdb-data-dynamic.yaml
sed -i "s| kubevirt.io/provisionOnNode:.*| kubevirt.io/provisionOnNode: $nodename|" grafana/grafana-data-dynamic.yaml
./runall-balena-dynamic.sh
Let’s watch the stats (CPU%, Memory, Disk and Inodes) of the kata container pods:
watch "oc get nodes;oc get pods;crictl stats"
Output:
NAME STATUS ROLES AGE VERSION
rpi.example.com Ready <none> 4h33m v1.21.0
NAME READY STATUS RESTARTS AGE
grafana-855ffb48d8-bxrbb 1/1 Running 0 4m22s
influxdb-deployment-6d898b7b7b-vtnrt 1/1 Running 0 4m59s
measure-deployment-58cddb5745-dzpqd 1/1 Running 0 4m48s
telegraf-deployment-d746f5c6-f6h55 1/1 Running 0 4m38s
CONTAINER CPU % MEM DISK INODES
67dea04a753ab 1.50 24.48MB 265B 13
89bf5e9fbe2ab 4.07 12.06MB 186kB 13
89e0ed1ad3557 0.05 26.06MB 4.021MB 77
We can look at the RUNTIME_CLASS using custom columns:
[root@rpi influxdb]# oc get pods -o custom-columns=NAME:metadata.name,STATUS:.status.phase,RUNTIME_CLASS:.spec.runtimeClassName,IP:.status.podIP,IMAGE:.status.containerStatuses[].image -A
NAME STATUS RUNTIME_CLASS IP IMAGE
busybox-1 Running kata 10.85.0.117 docker.io/library/busybox:latest
kata-alpine Running kata 10.85.0.115 docker.io/karve/alpine-sshclient:arm64
kata-nginx Running kata 10.85.0.118 docker.io/library/nginx:latest
grafana-855ffb48d8-f5cp7 Running kata 10.85.0.122 docker.io/grafana/grafana:5.4.3
influxdb-deployment-6d898b7b7b-wf4rx Running kata 10.85.0.119 docker.io/library/influxdb:1.7.4
measure-deployment-58cddb5745-p57k7 Running <none> 10.85.0.120 docker.io/karve/measure:latest
telegraf-deployment-d746f5c6-2nnlv Running kata 10.85.0.121 docker.io/library/telegraf:1.10.0
kube-flannel-ds-6spzn Running <none> 192.168.1.228 quay.io/microshift/flannel:4.8.0-0.okd-2021-10-10-030117
kubevirt-hostpath-provisioner-7mp5f Running <none> 10.85.0.2 quay.io/microshift/hostpath-provisioner:4.8.0-0.okd-2021-10-10-030117
dns-default-ll4jc Running <none> 10.85.0.4 quay.io/microshift/coredns:4.8.0-0.okd-2021-10-10-030117
node-resolver-84z6x Running <none> 192.168.1.228 quay.io/microshift/cli:4.8.0-0.okd-2021-10-10-030117
router-default-85bcfdd948-wkg9r Running <none> 192.168.1.228 quay.io/microshift/haproxy-router:4.8.0-0.okd-2021-10-10-030117
service-ca-7764c85869-sgzs5 Running <none> 10.85.0.3 quay.io/microshift/service-ca-operator:4.8.0-0.okd-2021-10-10-030117
Check the qemu process. We used the initrd image and we can see that in the parameters:
ps -ef | grep qemu
Output:
root 94205 1 12 13:54 ? 00:00:02 /usr/bin/qemu-system-aarch64 -name sandbox-a9f5ae9072018e67e56f9a8582319664d34df5dab6ec45f84e499c4cc369e4a4 -uuid a074427c-40ba-4291-8a4c-787ab4c7e4ac -machine virt,usb=off,accel=kvm,gic-version=host -cpu host,pmu=off -qmp unix:/run/vc/vm/a9f5ae9072018e67e56f9a8582319664d34df5dab6ec45f84e499c4cc369e4a4/qmp.sock,server=on,wait=off -m 2048M,slots=10,maxmem=7810M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=off,addr=2,io-reserve=4k,mem-reserve=1m,pref64-reserve=1m -device virtio-serial-pci,disable-modern=false,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/a9f5ae9072018e67e56f9a8582319664d34df5dab6ec45f84e499c4cc369e4a4/console.sock,server=on,wait=off -device virtio-scsi-pci,id=scsi0,disable-modern=false -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -device vhost-vsock-pci,disable-modern=false,vhostfd=3,id=vsock-3352093150,guest-cid=3352093150 -chardev socket,id=char-e2e46bed4a955485,path=/run/vc/vm/a9f5ae9072018e67e56f9a8582319664d34df5dab6ec45f84e499c4cc369e4a4/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-e2e46bed4a955485,tag=kataShared -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=3e:07:15:1d:a5:f9,disable-modern=false,mq=on,vectors=4 -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/share/kata-containers/vmlinux-5.4.60-92 -initrd /usr/share/kata-containers/kata-containers-initrd-2022-08-04-19:44:07.258170129-0400-587c0c5e5 -append iommu.passthrough=0 console=hvc0 console=hvc1 quiet panic=1 nr_cpus=4 scsi_mod.scan=none -pidfile /run/vc/vm/a9f5ae9072018e67e56f9a8582319664d34df5dab6ec45f84e499c4cc369e4a4/pid -smp 1,cores=1,threads=1,sockets=4,maxcpus=4
Add the line "<RaspberryPiIPAddress> grafana-service-influxdb.cluster.local" to /etc/hosts on your laptop and log in to http://grafana-service-influxdb.cluster.local/login using admin/admin. You will be prompted to change the password on first login. Go to the Dashboards list (left menu > Dashboards > Manage). Open the Analysis Server dashboard to display monitoring information for MicroShift. Open the Balena Sense dashboard to show the temperature, pressure, and humidity readings from the SenseHat.
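As a quick sketch, the hosts entry can be composed like this (the IP address below is a placeholder; substitute your Raspberry Pi's actual address):

```shell
# Placeholder IP address: replace with your Raspberry Pi's actual address
PI_IP="192.168.1.209"
HOSTS_LINE="$PI_IP grafana-service-influxdb.cluster.local"
echo "$HOSTS_LINE"
# On the laptop, append it to /etc/hosts with:
#   echo "$HOSTS_LINE" | sudo tee -a /etc/hosts
```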
Finally, after you are done working with this sample, you can run the deleteall-balena-dynamic.sh script:
./deleteall-balena-dynamic.sh
Deleting the persistent volume claims automatically deletes the persistent volumes.
Configure to use the rootfs image
We have been using the initrd image when running the samples above. Now let's switch to the rootfs image instead of the initrd by changing the following lines in /etc/kata-containers/configuration.toml:
#initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
image = "/usr/share/kata-containers/kata-containers.img"
Also disable the image nvdimm by setting the following:
disable_image_nvdimm = true # Default is false
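A scripted way to make both changes is a couple of sed substitutions. This is a sketch that assumes the stock key names and default values shown above; it is demonstrated here on a local copy, and on the device you would point it at /etc/kata-containers/configuration.toml and run the sed with sudo:

```shell
# Demonstrate on a local copy; on the device, set
# CONF=/etc/kata-containers/configuration.toml and run the sed with sudo.
CONF=configuration.toml
cat > "$CONF" <<'EOF'
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
#image = "/usr/share/kata-containers/kata-containers.img"
disable_image_nvdimm = false
EOF
sed -i \
  -e 's|^initrd =|#initrd =|' \
  -e 's|^#image =|image =|' \
  -e 's|^disable_image_nvdimm = false|disable_image_nvdimm = true|' \
  "$CONF"
cat "$CONF"
```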
Restart crio and test with the kata-alpine sample
systemctl restart crio
cd ~/microshift/raspberry-pi/kata/
oc apply -f kata-alpine.yaml
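To confirm that the sandbox VM now boots from the rootfs image rather than an initrd, we can check the qemu command line for an -initrd argument. A small sketch (it simply reports which boot path the running qemu-system process is using):

```shell
# Look for an -initrd argument on any running qemu-system process;
# with the rootfs image configured, there should be none.
if ps -ef | grep '[q]emu-system' | grep -q -- '-initrd'; then
  echo "sandbox still boots from an initrd"
else
  echo "no -initrd argument: sandbox boots from the rootfs image"
fi
```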
Output of the qemu process when we use the rootfs image with disable_image_nvdimm=true:
root 90939 1 13 13:37 ? 00:00:02 /usr/bin/qemu-system-aarch64 -name sandbox-892c3bd11a5e478d7760765a5fe384a722c40d8fbf8e9f9b89b8ef41467e498b -uuid 6abac702-3b7b-4268-97b7-9a7ef14cec1b -machine virt,usb=off,accel=kvm,gic-version=host -cpu host,pmu=off -qmp unix:/run/vc/vm/892c3bd11a5e478d7760765a5fe384a722c40d8fbf8e9f9b89b8ef41467e498b/qmp.sock,server=on,wait=off -m 2048M,slots=10,maxmem=7810M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=off,addr=2,io-reserve=4k,mem-reserve=1m,pref64-reserve=1m -device virtio-serial-pci,disable-modern=false,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/892c3bd11a5e478d7760765a5fe384a722c40d8fbf8e9f9b89b8ef41467e498b/console.sock,server=on,wait=off -device virtio-blk-pci,disable-modern=false,drive=image-4c9c03c6b770408f,scsi=off,config-wce=off,share-rw=on,serial=image-4c9c03c6b770408f -drive id=image-4c9c03c6b770408f,file=/usr/share/kata-containers/kata-containers-2022-08-05-12:53:58.236244980-0400-587c0c5e5,aio=threads,format=raw,if=none,readonly=on -device virtio-scsi-pci,id=scsi0,disable-modern=false -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -device vhost-vsock-pci,disable-modern=false,vhostfd=3,id=vsock-4175752710,guest-cid=4175752710 -chardev socket,id=char-f64cd1bfb913eecc,path=/run/vc/vm/892c3bd11a5e478d7760765a5fe384a722c40d8fbf8e9f9b89b8ef41467e498b/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-f64cd1bfb913eecc,tag=kataShared -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=da:2c:9f:7a:2e:21,disable-modern=false,mq=on,vectors=4 -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/share/kata-containers/vmlinux-5.4.60-92 -append iommu.passthrough=0 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 console=hvc0 console=hvc1 quiet systemd.show_status=false panic=1 nr_cpus=4 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none -pidfile /run/vc/vm/892c3bd11a5e478d7760765a5fe384a722c40d8fbf8e9f9b89b8ef41467e498b/pid -smp 1,cores=1,threads=1,sockets=4,maxcpus=4
If we try to use the rootfs image with disable_image_nvdimm=false, we get the following error and the Kata container does not start:
0s Warning FailedCreatePodSandBox pod/kata-alpine Failed to create pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: failed to launch qemu: exit status 1, error messages from qemu log: qemu-system-aarch64: -device nvdimm,id=nv0,memdev=mem0: memory hotplug is not enabled: missing acpi-ged device...
Output of the qemu process when we use the rootfs image with disable_image_nvdimm=false:
root 77890 1 0 13:13 ? 00:00:00 /usr/bin/qemu-system-aarch64 -name sandbox-e43561d381112254278ae2d1cd604a7a2b8d03c1065ab54b3bbdbc6af48cf9a9 -uuid c27a22de-4421-437e-ac2a-10decd17ecd2 -machine virt,usb=off,accel=kvm,gic-version=host,nvdimm=on -cpu host,pmu=off -qmp unix:/run/vc/vm/e43561d381112254278ae2d1cd604a7a2b8d03c1065ab54b3bbdbc6af48cf9a9/qmp.sock,server=on,wait=off -m 2048M,slots=10,maxmem=7810M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=off,addr=2,io-reserve=4k,mem-reserve=1m,pref64-reserve=1m -device virtio-serial-pci,disable-modern=false,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/e43561d381112254278ae2d1cd604a7a2b8d03c1065ab54b3bbdbc6af48cf9a9/console.sock,server=on,wait=off -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/kata-containers/kata-containers-2022-08-05-12:53:58.236244980-0400-587c0c5e5,size=134217728 -device virtio-scsi-pci,id=scsi0,disable-modern=false -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -device vhost-vsock-pci,disable-modern=false,vhostfd=3,id=vsock-19022696,guest-cid=19022696 -chardev socket,id=char-68ffbbd4f7c62079,path=/run/vc/vm/e43561d381112254278ae2d1cd604a7a2b8d03c1065ab54b3bbdbc6af48cf9a9/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-68ffbbd4f7c62079,tag=kataShared -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=ce:6a:b5:ee:c4:6e,disable-modern=false,mq=on,vectors=4 -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/share/kata-containers/vmlinux-5.4.60-92 -append iommu.passthrough=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 console=hvc0 console=hvc1 quiet systemd.show_status=false panic=1 nr_cpus=4 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none -pidfile /run/vc/vm/e43561d381112254278ae2d1cd604a7a2b8d03c1065ab54b3bbdbc6af48cf9a9/pid -smp 1,cores=1,threads=1,sockets=4,maxcpus=4
We can also run MicroShift Containerized as shown in Part 18 and execute the Jupyter Notebook samples for Digit Recognition, Object Detection and License Plate Recognition with Kata containers as shown in Part 23.
Errors
QMP command failed: The feature 'query-hotpluggable-cpus' is not enabled
This error occurs if you add resource limits to the containers. The dedicated legacy cpu-add QMP command was removed in QEMU v5.2; the device_add interface needs to be used instead.
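For reference, it is a CPU limit in the pod spec that makes the Kata runtime attempt to hot-plug vCPUs into the sandbox VM and hit this QMP path. A hypothetical container spec fragment of the kind that triggers it (not one of the samples above):

```
# Hypothetical fragment: the cpu limit causes the Kata runtime to try to
# hot-add vCPUs via QMP, which fails on QEMU builds where the legacy
# cpu-add command has been removed.
resources:
  limits:
    cpu: "2"
    memory: 512Mi
```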
Conclusion
In this Part 24, we looked at building Kata Containers from source and running MicroShift with Kata Containers on Manjaro. We ran multiple samples using the Kata Containers runtime from MicroShift and viewed the metrics from the containers. In Part 25 we will install and use MicroShift with KubeVirt and Kata Containers on Raspberry Pi 4 with Pop!_OS.
Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about your use of MicroShift, KubeVirt and Kata Containers on ARM devices and if you would like to see something covered in more detail.
References