Setting up the Jetson Nano and using CRI-O
Introduction
MicroShift is a research project that is exploring how the OpenShift/OKD Kubernetes distribution can be optimized for small form factor devices and edge computing. In Part 1 of this series, we looked at the different edge computing requirements and where MicroShift fits in. We built and deployed MicroShift in a virtual machine in VirtualBox and in a VM using Multipass on a MacBook Pro.
In this Part 2, we will set up a Jetson Nano Developer Kit with Ubuntu 18.04 and install the dependencies for MicroShift. When we ultimately run containers in MicroShift, we will use the CRI-O container engine, an implementation of the Kubernetes Container Runtime Interface (CRI) that can use Open Container Initiative (OCI) compatible runtimes. We will set up CRI-O to use the Nvidia container runtime with GPU support. Further, we will directly use crun and crictl, a command-line interface for CRI-compatible container runtimes, to manage pods and containers. In Part 3, we will build and deploy MicroShift on the Jetson Nano.
Setting up the Jetson Nano
As of this writing, JetPack 4.6 is available for the Jetson Nano. JetPack 4.6 includes the L4T 32.6.1 Linux Driver Package. You should use at least a 64GB microSDXC card so that you have sufficient space for experimentation. Download the 6.53GB SD card image for the Jetson Nano Developer Kit and write it to the microSDXC card using balenaEtcher. After your microSDXC card is flashed and validated, proceed to set up your developer kit with an ethernet cable, USB keyboard, mouse, and display attached. During System Configuration, accept the license, enter the computer's name (for example microshift) and a username/password (for example dlinano), use the suggested "Maximum accepted size" for the APP partition, and use the default Nvpmodel Mode. The device will automatically reboot after the setup. Note down the IP address of eth0 (jetsonnano-ipaddress); we will now ssh to the Jetson Nano from the laptop.
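For example, with the username chosen above:
ssh dlinano@jetsonnano-ipaddress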
# If your hostname is without a domain, you may want to add the ".example.com"
sudo su -
hostnamectl set-hostname microshift.example.com
dpkg-reconfigure tzdata # Select your timezone if not already done
# We will not need the ubuntu desktop, let's save on memory usage
sudo apt-get -y remove --purge chromium-browser chromium-browser-l10n
sudo apt-get -y purge ubuntu-desktop unity gnome-shell lightdm
sudo apt-get -y autoremove
sudo apt-get -y autoclean
sudo apt-get -y clean
sudo apt-get -f install
sudo reboot
Attaching the Fan
Under most conditions, the large heatsink on the Jetson Nano keeps the system running within its design thermal limits. When running very GPU intensive loads, or when the Jetson is in a very warm environment, a 5V cooling fan should be attached. The Waveshare Fan-4020-PWM-5V attaches to the Jetson Nano's heatsink with countersunk screws at 4 mounting points; the 4 holes are drilled into the heatsink at the factory. The provided screws go in one way only, so that the fan fits snugly with the sticker label at the middle of the fan facing downwards towards the heatsink. If a screw doesn't seem long enough, the fan is upside down. The 4 pins of the reverse-proof connector are: GND, 5V, Tachometer, PWM. The fan draws power from the Jetson Nano through the GND and 5V pins. The Tachometer signal tells the Jetson Nano the current speed of the fan, and the PWM signal allows the Jetson to control the fan speed.
Let’s try out the fan.
sudo /usr/bin/jetson_clocks
sudo sh -c 'echo 255 > /sys/devices/pwm-fan/target_pwm'
Instead of 255, we can set a lower number to slow down the fan and 0 to stop it.
sudo sh -c 'echo 0 > /sys/devices/pwm-fan/target_pwm'
We can get the temperature (in millidegrees Celsius, so 42000 means 42°C) with
cat /sys/devices/virtual/thermal/thermal_zone0/temp
The Jetson Nano fan control daemon controls the fan speed based on the temperature.
git clone https://github.com/Pyrestone/jetson-fan-ctl.git
cd jetson-fan-ctl/
sudo ./install.sh
# Customize /etc/automagic-fan/config.json
watch "cat /sys/devices/virtual/thermal/thermal_zone0/temp;echo ---;cat /sys/devices/pwm-fan/target_pwm"
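Under the hood, the daemon is just a control loop mapping temperature to PWM. Here is a minimal bash sketch of the idea; the thresholds are illustrative (they mirror the defaults in /etc/automagic-fan/config.json) and the loop must run as root:
#!/bin/bash
# Map the SoC temperature (millidegrees C) linearly onto the 0-255 PWM range
FAN_OFF_TEMP=20000   # below 20 C the fan stays off
FAN_MAX_TEMP=50000   # at 50 C and above the fan runs flat out
while true; do
  temp=$(cat /sys/devices/virtual/thermal/thermal_zone0/temp)
  if [ "$temp" -le "$FAN_OFF_TEMP" ]; then
    pwm=0
  elif [ "$temp" -ge "$FAN_MAX_TEMP" ]; then
    pwm=255
  else
    # Scale linearly between the two thresholds
    pwm=$(( (temp - FAN_OFF_TEMP) * 255 / (FAN_MAX_TEMP - FAN_OFF_TEMP) ))
  fi
  echo "$pwm" > /sys/devices/pwm-fan/target_pwm
  sleep 2
done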
Testing the Jetson Nano
Install jetson-stats
jetson-stats is a package for monitoring and controlling your NVIDIA Jetson.
sudo su -
apt-get update
apt-get install -y python3 python3-pip
pip3 install -U jetson-stats
jetson_release -v
Output:
root@microshift:~# jetson_release -v
- NVIDIA Jetson Nano (Developer Kit Version)
* Jetpack 4.6 [L4T 32.6.1]
* NV Power Mode: MAXN - Type: 0
* jetson_stats.service: active
- Board info:
* Type: Nano (Developer Kit Version)
* SOC Family: tegra210 - ID:33
* Module: P3448-0000 - Board: P3449-0000
* Code Name: porg
* Boardids: 3448
* CUDA GPU architecture (ARCH_BIN): 5.3
* Serial Number: 1421421054431
- Libraries:
* CUDA: 10.2.300
* cuDNN: 8.2.1.32
* TensorRT: 8.0.1.6
* Visionworks: 1.6.0.501
* OpenCV: 4.1.1 compiled CUDA: NO
* VPI: ii libnvvpi1 1.1.15 arm64 NVIDIA Vision Programming Interface library
* Vulkan: 1.2.70
- jetson-stats:
* Version 3.1.1
* Works on Python 3.6.9
Watch the CPU, GPU and disk activity using jtop
jtop
cat /etc/nv_tegra_release
Output:
root@microshift:~# cat /etc/nv_tegra_release
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t210ref, EABI: aarch64, DATE: Mon Jul 26 19:20:30 UTC 2021
Testing the Jupyter Lab container in docker
Try out the "Getting Started with AI on Jetson Nano" course environment container from the Deep Learning Institute (DLI) with a USB camera attached:
docker run --runtime nvidia -it --rm --network host --volume ~/nvdli-data:/nvdli-nano/data --device /dev/video0 nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1
If you see the “error adding seccomp filter rule for syscall clone3: permission denied: unknown”, look at the Errors section below to fix it.
From your laptop, connect to the Jetson Nano's IP address at http://jetsonnano-ipaddress:8888/lab? and log in with the password dlinano.
We can run the notebook /hello_camera/usb_camera.ipynb and test the camera. After testing, release the camera resource and shut down the kernel.
The tutorial shows how to use a USB or a CSI camera. We can attach and try out the Raspberry Pi NoIR Camera Module V2. Support for the Raspberry Pi camera v2 (IMX219) is built into JetPack. The v2 Pi NoIR has a Sony IMX219 8-megapixel sensor (compared to the 5-megapixel OmniVision OV5647 sensor of the original camera). The earlier v1 camera (OV5647) does not work out of the box with the Jetson Nano because the driver is not included. If you have the CSI-2 camera attached, you can run
docker run --runtime nvidia -it --rm --network host --volume ~/nvdli-data:/nvdli-nano/data --volume /tmp/argus_socket:/tmp/argus_socket --device /dev/video0 nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1
This blog will assume that a USB camera is attached.
Testing the USB camera attached to the Jetson Nano with gstreamer on Mac
These steps will allow you to view the video stream on the Mac using gstreamer. Install jetson-inference on the Jetson Nano so that you can run video-viewer.
On Mac
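Install gstreamer if you don't already have it, for example via Homebrew:
brew install gstreamer gst-plugins-base gst-plugins-good gst-plugins-bad gst-plugins-ugly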
On Jetson Nano
video-viewer --bitrate=1000000 /dev/video0 rtp://mac-ipaddress:1234
On Mac
gst-launch-1.0 -v udpsrc port=1234 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264, payload=(int)96" ! rtph264depay ! decodebin ! videoconvert ! autovideosink
Updating your Jetson Nano to new Minor Release
Jetson Nano downloads are at https://developer.nvidia.com/embedded/downloads and OS information at https://developer.nvidia.com/embedded/jetpack
If your SD card has an older JetPack, you can update L4T as follows. The latest is JetPack 4.6, which includes L4T 32.6.1. The platform is t210 for NVIDIA® Jetson Nano™ devices. The version is set to r32.6 (do not set it to r32.6.1).
sudo su -
vi /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
# Replace the contents with the following two lines:
deb https://repo.download.nvidia.com/jetson/common r32.6 main
deb https://repo.download.nvidia.com/jetson/t210 r32.6 main
apt update
apt dist-upgrade
Install the dependencies for MicroShift
Set up the repositories and install the required libraries and firewalld
apt-get install -y curl wget
OS_VERSION=18.04
CRIOVERSION=1.22
OS=xUbuntu_$OS_VERSION
KEYRINGS_DIR=/usr/share/keyrings
# Required for containers-common
echo "deb [signed-by=$KEYRINGS_DIR/libcontainers-archive-keyring.gpg] https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list > /dev/null
echo "deb [signed-by=$KEYRINGS_DIR/libcontainers-crio-archive-keyring.gpg] http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$CRIOVERSION/$OS/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$CRIOVERSION.list > /dev/null
mkdir -p $KEYRINGS_DIR
rm -f /usr/share/keyrings/libcontainers-archive-keyring.gpg
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo gpg --dearmor -o $KEYRINGS_DIR/libcontainers-archive-keyring.gpg
rm -f /usr/share/keyrings/libcontainers-crio-archive-keyring.gpg
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$CRIOVERSION/$OS/Release.key | sudo gpg --dearmor -o $KEYRINGS_DIR/libcontainers-crio-archive-keyring.gpg
wget -qO - https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo apt-key add -
apt-get install -y ca-certificates # https://github.com/cri-o/cri-o/issues/5375#issuecomment-933608364
apt-get -y update
apt-get install -y btrfs-tools containers-common libassuan-dev libdevmapper-dev libglib2.0-dev libc6-dev libgpgme-dev libgpg-error-dev libseccomp-dev libsystemd-dev libselinux1-dev pkg-config go-md2man libudev-dev software-properties-common gcc make
ls /usr/include/gpgme.h
apt-get install -y policycoreutils-python-utils conntrack firewalld
Install cri-o, crictl and plugins
apt-get install -y curl jq tar
curl https://raw.githubusercontent.com/cri-o/cri-o/main/scripts/get | bash -s -- -a arm64
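The script installs the binaries by default under /usr/local. A quick sanity check, and (optionally) pinning crictl to the CRI-O socket so it does not probe other runtimes (the socket path below is CRI-O's default):
crio version
crictl --version
crun --version
cat << EOF > /etc/crictl.yaml
runtime-endpoint: unix:///var/run/crio/crio.sock
EOF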
Set up firewalld
systemctl enable firewalld --now
firewall-cmd --zone=public --permanent --add-port=6443/tcp
firewall-cmd --zone=public --permanent --add-port=30000-32767/tcp
firewall-cmd --zone=public --permanent --add-port=2379-2380/tcp
firewall-cmd --zone=public --add-masquerade --permanent
firewall-cmd --zone=public --add-port=10250/tcp --permanent
firewall-cmd --zone=public --add-port=10251/tcp --permanent
firewall-cmd --zone=public --add-port=80/tcp --permanent # For Ingress
firewall-cmd --zone=public --add-port=443/tcp --permanent # For Ingress
firewall-cmd --zone=public --add-port=8888/tcp --permanent # For Jupyterlab Course
firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16
firewall-cmd --reload
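Verify that the rules took effect:
firewall-cmd --zone=public --list-ports
firewall-cmd --zone=trusted --list-sources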
Set up the CRI-O config to match the MicroShift networking values
sh -c 'cat << EOF > /etc/cni/net.d/100-crio-bridge.conf
{
    "cniVersion": "0.4.0",
    "name": "crio",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "hairpinMode": true,
    "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ],
        "ranges": [
            [{ "subnet": "10.42.0.0/24" }]
        ]
    }
}
EOF'
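The 10.42.0.0/24 pod subnet lines up with the 10.42.0.0/16 range we added as a trusted firewalld source above. Since jq is installed, a quick check that the file parses cleanly:
jq . /etc/cni/net.d/100-crio-bridge.conf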
Verify the Nvidia device. The lspci command displays information about devices connected through Peripheral Component Interconnect (PCI) buses.
root@jetsonnano:~# lspci -nnv |grep -i nvidia
00:02.0 PCI bridge [0604]: NVIDIA Corporation Device [10de:0faf] (rev a1) (prog-if 00 [Normal decode])
Capabilities: [40] Subsystem: NVIDIA Corporation Device [10de:0000]
NVML (and therefore nvidia-smi) is not currently supported on Jetson. The k8s-device-plugin does not work with Jetson, and the nvidia/k8s-device-plugin container image is not available for arm64. So, let's set up CRI-O to directly use the Nvidia container runtime hook.
mkdir -p /usr/share/containers/oci/hooks.d/
cat << EOF > /usr/share/containers/oci/hooks.d/nvidia.json
{
    "version": "1.0.0",
    "hook": {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        ]
    },
    "when": {
        "always": true,
        "commands": [".*"]
    },
    "stages": ["prestart"]
}
EOF
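A malformed hook file will lead to container creation errors, so validate the JSON and confirm that the hook binary exists:
jq . /usr/share/containers/oci/hooks.d/nvidia.json
ls -l /usr/bin/nvidia-container-runtime-hook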
We remove the mountopt options from /etc/containers/storage.conf and restart CRI-O; metacopy=on is not supported on Ubuntu 18.04.
sed -i "s/^\(mountopt.*\)/#\\1/" /etc/containers/storage.conf
#sed -i 's/,metacopy=on//g' /etc/containers/storage.conf
systemctl enable crio --now
#systemctl restart crio
systemctl status crio
journalctl -u crio -f # Ctrl-C to stop the logs
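With crio active and crictl pointed at the CRI-O socket as configured earlier, the runtime should report its status:
crictl info | jq .status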
This completes the installation of the dependencies for MicroShift. If you want to understand and work with crun and CRI-O directly, continue with the next section. To install MicroShift, go to Part 3 of this series. Do not drive CRI-O directly while MicroShift is running: if you use crictl to create pod sandboxes or containers on a running MicroShift cluster, the Kubelet will eventually delete them.
Samples using CRI-O
1. Nginx sample
Create the nginx.json and net-pod.json files, then create the sandbox pod.
cat << EOF > nginx.json
{
    "metadata": {
        "name": "nginx-container",
        "attempt": 1
    },
    "image": {
        "image": "nginx"
    },
    "log_path": "nginx.log",
    "linux": {
        "security_context": {
            "namespace_options": {}
        }
    }
}
EOF
cat << EOF > net-pod.json
{
    "metadata": {
        "name": "networking",
        "uid": "networking-pod-uid",
        "namespace": "default",
        "attempt": 1
    },
    "hostname": "networking",
    "port_mappings": [
        {
            "container_port": 80
        }
    ],
    "log_directory": "/tmp/net-pod",
    "linux": {}
}
EOF
podid=$(crictl runp net-pod.json)
If you see an error as follows:
FATA[0010] run pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_networking_default_networking-pod-uid_1 in pod sandbox k8s_networking_default_networking-pod-uid_1(061add45d3d9c3b2177b1e30280dec4ee4c40b53bec817d8e7cf6c0b376b5d40): error creating overlay mount to /var/lib/containers/storage/overlay/24dbf68bab872f1c4b556e2e5854d33c97783a2b69871bb2832c469c54fd80b6/merged, mount_data="nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/MXTIIQJAQN3SPYCB5PJVNCPIXP,upperdir=/var/lib/containers/storage/overlay/24dbf68bab872f1c4b556e2e5854d33c97783a2b69871bb2832c469c54fd80b6/diff,workdir=/var/lib/containers/storage/overlay/24dbf68bab872f1c4b556e2e5854d33c97783a2b69871bb2832c469c54fd80b6/work": invalid argument
Then you might have forgotten to remove the mountopt from storage.conf. Run the following to fix it:
sed -i "s/^\(mountopt.*\)/#\\1/" /etc/containers/storage.conf
systemctl restart crio
podid=`crictl runp net-pod.json` # This should be successful now
crictl pods # Check the pods
Let’s check what’s inside our sandbox net-pod
crun --root=/run/runc list
Output:
NAME PID STATUS BUNDLE PATH
64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4 14554 running /run/containers/storage/overlay-containers/64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4/userdata
crun --root=/run/runc ps $podid
Output:
PID
14554
ps -ef | grep 14554 # Process id from above
Output:
root 14554 14519 0 05:59 ? 00:00:00 /pause
There is only one process running inside the sandbox, called pause. The main task of this process is to keep the environment running and react to incoming signals. Before we create our workload within that sandbox, we have to pre-pull the image we want to run.
crictl pull nginx # Pull the image
crictl images # This will show the nginx image
This shows the nginx image along with the k8s.gcr.io/pause image
IMAGE TAG IMAGE ID SIZE
docker.io/library/nginx latest 2d25c92337fcd 139MB
k8s.gcr.io/pause 3.5 f7ff3c4042631 491kB
We use the container definition nginx.json to kick off the container:
containerid=`crictl create $podid nginx.json net-pod.json` # The container for nginx will go into Created state
crictl ps -a # List containers to see nginx in Created state
Output:
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
2dbe69290807d nginx Less than a second ago Created nginx-container 1 64519cb57bea2
crun --root=/run/runc list
Output:
NAME PID STATUS BUNDLE PATH
64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4 14554 running /run/containers/storage/overlay-containers/64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4/userdata
2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148 15076 created /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata
We now have another process in the created state
ps -ef | grep 15076
Output:
root 15076 15042 0 06:05 ? 00:00:00 /usr/local/bin/crun --systemd-cgroup --root=/run/runc create --bundle /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata --pid-file /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata/pidfile 2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148
Let’s start the container
crictl start $containerid # Go to Running state
crictl logs $containerid
crun shows that the process STATUS is now "running"
crun --root=/run/runc list # List the containers
Output:
NAME PID STATUS BUNDLE PATH
64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4 14554 running /run/containers/storage/overlay-containers/64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4/userdata
2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148 15076 running /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata
crun --root=/run/runc ps $containerid # Shows the processes in the container
crun --root=/run/runc ps $containerid | grep -v PID | xargs ps ww
Output:
PID TTY STAT TIME COMMAND
15076 ? Ss 0:00 nginx: master process nginx -g daemon off;
15487 ? S 0:00 nginx: worker process
15488 ? S 0:00 nginx: worker process
15489 ? S 0:00 nginx: worker process
15492 ? S 0:00 nginx: worker process
We can access the container using its network address
crictl inspectp $podid | grep io.kubernetes.cri-o.IP.0 # Get the ipaddr of pod
ipaddr=`crictl inspectp $podid | jq -r .status.network.ip`
curl $ipaddr # Will return the "Welcome to nginx!" html
We can exec into the container
crictl exec $containerid cat /etc/os-release
Output:
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
We stop and remove the container
crictl stop $containerid # Go to Exited state
crictl ps -a
crictl rm $containerid
We are back to the single pause process
crun --root=/run/runc list | grep -v PID | awk '{print $2}' | xargs ps ww
Output:
PID TTY STAT TIME COMMAND
14554 ? Ss 0:00 /pause
Finally, we stop and remove the pod
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod
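Both lists should now be empty:
crictl pods
crun --root=/run/runc list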
2. Vector-add CUDA sample to test the GPU
Copy the CUDA samples (we use vectorAdd):
mkdir vectoradd
cd vectoradd
cp -r /usr/local/cuda/samples .
Create the following Dockerfile:
cat << EOF > Dockerfile
FROM nvcr.io/nvidia/l4t-base:r32.6.1
RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples
WORKDIR /tmp/samples/0_Simple/vectorAdd/
RUN make clean && make
CMD ["./vectorAdd"]
EOF
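Note that l4t-base is a thin image that gets CUDA mounted in from the host by the Nvidia container runtime. If the make step of the build fails because /usr/local/cuda is not visible inside the build, one common fix (an assumption about your Docker setup, not a step every JetPack install needs) is to make nvidia the default Docker runtime and restart Docker:
# /etc/docker/daemon.json - JetPack already ships the "runtimes" entry;
# "default-runtime" is the addition
cat << EOF > /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
systemctl restart docker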
Create the vectoradd.json
cat << EOF > vectoradd.json
{
    "metadata": {
        "name": "vectoradd-container",
        "attempt": 1
    },
    "image": {
        "image": "docker.io/karve/vector-add-sample:arm64-jetsonnano"
    },
    "log_path": "vectoradd.log",
    "linux": {
        "security_context": {
            "namespace_options": {}
        }
    }
}
EOF
Create the net-pod.json
cat << EOF > net-pod.json
{
    "metadata": {
        "name": "networking",
        "uid": "networking-pod-uid",
        "namespace": "default",
        "attempt": 1
    },
    "hostname": "networking",
    "port_mappings": [
        {
            "container_port": 80
        }
    ],
    "log_directory": "/tmp/net-pod",
    "linux": {}
}
EOF
Build and push the vector-add-sample image
docker build -t karve/vector-add-sample:arm64-jetsonnano .
docker push karve/vector-add-sample:arm64-jetsonnano
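The push assumes access to the karve namespace on Docker Hub; to push to your own account, log in and retag first (replace <your-namespace>):
docker login
docker tag karve/vector-add-sample:arm64-jetsonnano <your-namespace>/vector-add-sample:arm64-jetsonnano
docker push <your-namespace>/vector-add-sample:arm64-jetsonnano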
Run the vector-add-sample in CRI-O
podid=`crictl runp net-pod.json`
crictl pods # Get the podid
crictl pull docker.io/karve/vector-add-sample:arm64-jetsonnano
crictl images # This will show the vector-add-sample image
containerid=`crictl create $podid vectoradd.json net-pod.json` # The container for vectoradd will go into Created state
crictl ps -a # List containers
crictl start $containerid # Go to Running and Exited state
crictl logs $containerid -f
The output shows: Test PASSED
root@jetsonnano:~/samples/vectoradd# crictl logs $containerid -f
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
If you are missing the Nvidia runtime hook (/usr/share/containers/oci/hooks.d/nvidia.json), the sample will fail with the following message:
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]
You will need to fix the nvidia.json and run again.
Finally, delete the container and pod
crictl ps -a
crictl rm $containerid
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod
3. Device-query sample
Copy the CUDA samples (we use deviceQuery):
mkdir devicequery
cd devicequery
cp -r /usr/local/cuda/samples .
Create the following Dockerfile in the devicequery folder:
cat << EOF > Dockerfile
FROM nvcr.io/nvidia/l4t-base:r32.6.1
RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples
WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make
CMD ["./deviceQuery"]
EOF
Create the devicequery.json:
cat << EOF > devicequery.json
{
    "metadata": {
        "name": "devicequery-container",
        "attempt": 1
    },
    "image": {
        "image": "docker.io/karve/devicequery:arm64-jetsonnano"
    },
    "log_path": "devicequery.log",
    "linux": {
        "security_context": {
            "namespace_options": {}
        }
    }
}
EOF
Create the net-pod.json:
cat << EOF > net-pod.json
{
    "metadata": {
        "name": "networking",
        "uid": "networking-pod-uid",
        "namespace": "default",
        "attempt": 1
    },
    "hostname": "networking",
    "port_mappings": [
        {
            "container_port": 80
        }
    ],
    "log_directory": "/tmp/net-pod",
    "linux": {}
}
EOF
Build and push the devicequery-sample image
docker build -t karve/devicequery:arm64-jetsonnano .
docker push karve/devicequery:arm64-jetsonnano
Run the devicequery sample in CRI-O
podid=`crictl runp net-pod.json`
crictl pods # Get the podid
crictl pull docker.io/karve/devicequery:arm64-jetsonnano
crictl images # This will show the devicequery image
containerid=`crictl create $podid devicequery.json net-pod.json` # The container for devicequery will go into Created state
crictl ps -a # List containers
crictl start $containerid # Go to Running and Exited state
crictl logs $containerid -f
Output:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3956 MBytes (4148273152 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Delete the container and pod
crictl ps -a
crictl rm $containerid
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod
4. Pytorch sample
NOTE: This will stress test the GPU. Warning: attach the fan to the Jetson Nano.
Create the pytorchsample.json:
cat << EOF > pytorchsample.json
{
    "metadata": {
        "name": "pytorchsample-container",
        "attempt": 1
    },
    "image": {
        "image": "nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3"
    },
    "log_path": "pytorchsample.log",
    "linux": {
        "security_context": {
            "namespace_options": {}
        }
    }
}
EOF
Run the pytorch sample. We will exec into the container and run the samples separately.
podid=`crictl runp net-pod.json`
crictl pods # Get the podid
crictl pull nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3
crictl images # This will show the l4t-pytorch image
containerid=`crictl create $podid pytorchsample.json net-pod.json` # The container for pytorch will go into Created state
crictl ps -a
crictl exec -it $containerid bash
echo "nameserver 8.8.8.8" > /etc/resolv.conf
DATA_URL="https://nvidia.box.com/shared/static/y1ygiahv8h75yiyh0pt50jqdqt7pohgx.gz"
DATA_NAME="ILSVRC2012_img_val_subset_5k"
DATA_PATH="test/data/$DATA_NAME"
if [ ! -d "$DATA_PATH" ]; then
  echo 'downloading data for testing torchvision...'
  if [ ! -d "test/data" ]; then
    mkdir -p test/data
  fi
  wget --quiet --show-progress --progress=bar:force:noscroll --no-check-certificate $DATA_URL -O test/data/$DATA_NAME.tar.gz
  tar -xzf test/data/$DATA_NAME.tar.gz -C test/data/
fi
wget https://raw.githubusercontent.com/dusty-nv/jetson-containers/master/test/test_pytorch.py -O test/test_pytorch.py
python3 test/test_pytorch.py
wget https://raw.githubusercontent.com/dusty-nv/jetson-containers/master/test/test_torchvision.py -O test/test_torchvision.py
python3 test/test_torchvision.py --data=$DATA_PATH --use-cuda
wget https://raw.githubusercontent.com/dusty-nv/jetson-containers/master/test/test_torchaudio.py -O test/test_torchaudio.py
python3 test/test_torchaudio.py
exit
Output:
testing PyTorch...
PyTorch version: 1.9.0
CUDA available: True
cuDNN version: 8201
Tensor a = tensor([0., 0.], device='cuda:0')
Tensor b = tensor([-0.5832, 0.2567], device='cuda:0')
Tensor c = tensor([-0.5832, 0.2567], device='cuda:0')
testing LAPACK (OpenBLAS)...
test/test_pytorch.py:25: UserWarning: torch.solve is deprecated in favor of torch.linalg.solve
and will be removed in a future PyTorch release.
torch.linalg.solve has its arguments reversed and does not return the LU factorization.
To get the LU factorization see torch.lu, which can be used with torch.lu_solve or torch.lu_unpack.
X = torch.solve(B, A).solution
should be replaced with
X = torch.linalg.solve(A, B) (Triggered internally at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/native/BatchLinearAlgebra.cpp:760.)
x, lu = torch.solve(b, a)
done testing LAPACK (OpenBLAS)
testing torch.nn (cuDNN)...
done testing torch.nn (cuDNN)
testing CPU tensor vector operations...
test/test_pytorch.py:49: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
cpu_y = F.softmax(cpu_x)
Tensor cpu_x = tensor([12.3450])
Tensor softmax = tensor([1.])
Tensor exp (float32) = tensor([[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183]])
Tensor exp (float64) = tensor([[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183]], dtype=torch.float64)
Tensor exp (diff) = 7.429356050359104e-07
PyTorch OK
…
Delete the container and pod for pytorch
crictl rm $containerid
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod
Errors
1. Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown
Docker 20.10.7-0ubuntu5~18.04.3 needs to be downgraded, or the experimental nvidia-docker2 needs to be installed; see the discussion. Select one of the options below:
- Downgrading docker.io:
wget http://launchpadlibrarian.net/551655684/docker.io_20.10.7-0ubuntu1~18.04.1_arm64.deb
dpkg -i docker.io_20.10.7-0ubuntu1~18.04.1_arm64.deb
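You may also want to hold the package so a later apt upgrade does not reintroduce the problem:
apt-mark hold docker.io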
- Installing the Experimental nvidia-docker2:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
apt-get update
apt-get install -y nvidia-docker2
systemctl restart docker
2. Error: container create failed: error executing hook `/usr/bin/nvidia-container-runtime-hook` (exit code: 1)
You might need to install an older version of nvidia-container-toolkit:
apt-get remove libnvidia-container-tools
apt list --installed | grep nvidia-container
apt install nvidia-container-toolkit=1.8.0~rc.1-1
Output:
apt list --installed | grep nvidia-container
libnvidia-container-tools/bionic,now 1.11.0~rc.2-1 arm64 [installed,automatic]
libnvidia-container0/stable,now 0.10.0+jetpack arm64 [installed]
libnvidia-container1/bionic,now 1.11.0~rc.2-1 arm64 [installed,automatic]
nvidia-container-csv-cuda/stable,now 10.2.460-1 arm64 [installed]
nvidia-container-csv-cudnn/stable,now 8.2.1.32-1+cuda10.2 arm64 [installed]
nvidia-container-csv-tensorrt/stable,now 8.0.1.6-1+cuda10.2 arm64 [installed]
nvidia-container-csv-visionworks/stable,now 1.6.0.501 arm64 [installed]
nvidia-container-toolkit/bionic,now 1.8.0~rc.1-1 arm64 [installed,upgradable to: 1.11.0~rc.2-1]
3. If you get an error with flannel, create the file /run/flannel/subnet.env with the following contents:
cat << EOF > /run/flannel/subnet.env
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
Conclusion
In this Part 2, we set up the Jetson Nano and installed the dependencies for MicroShift. We set up CRI-O to use the Nvidia container runtime and worked directly with CRI-O using the CLI. We are now armed with enough knowledge to play with MicroShift. In Part 3, we will see the multiple options to build and deploy MicroShift on the Jetson Nano. Further, in Parts 4 and 5, we will look at the multiple options to build and deploy MicroShift on the Raspberry Pi 4 with the Raspberry Pi OS and CentOS 8 Stream, respectively.
Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about your use of MicroShift on ARM devices and whether you would like to see something covered in more detail.
References