MicroShift – Part 2: Setting up the Jetson Nano with Ubuntu 18.04 and CRI-O

By Alexei Karve posted Mon November 29, 2021 04:52 AM

  

Setting up the Jetson Nano and using CRI-O

Introduction

MicroShift is a research project exploring how the OpenShift/OKD Kubernetes distribution can be optimized for small-form-factor devices and edge computing. In Part 1 of this series, we looked at the different edge computing requirements and where MicroShift fits in. We built and deployed MicroShift in a VirtualBox virtual machine and in a Multipass VM on a MacBook Pro.

In this Part 2, we will set up the Jetson Nano Developer Kit with Ubuntu 18.04 and install the dependencies for MicroShift. When we ultimately run containers in MicroShift, we will use the CRI-O container engine, an implementation of the Kubernetes Container Runtime Interface (CRI) that runs Open Container Initiative (OCI) compatible runtimes. We will set up CRI-O to use the Nvidia container runtime so that containers can use the GPU. Further, we will work directly with crun and with crictl, a command-line interface for CRI-compatible container runtimes, to manage pods and containers. In Part 3, we will build and deploy MicroShift on the Jetson Nano.

Setting up the Jetson Nano

Jetson Nano with Camera


As of this writing, JetPack 4.6 is available for the Jetson Nano and includes the L4T 32.6.1 Linux Driver Package. Use at least a 64GB microSDXC card so that you have sufficient space for experimentation. Download the 6.53GB SD card image for the Jetson Nano Developer Kit and write it to the microSDXC card using balenaEtcher. After your microSDXC card is flashed and validated, set up your developer kit with the Ethernet cable, USB keyboard, and mouse connected and a display attached. During System Configuration, accept the license, enter the computer's name (for example, microshift) and a username/password (for example, dlinano), use the suggested "Maximum accepted size" for the APP partition, and use the default Nvpmodel mode. The device will automatically reboot after the setup. Note down the IP address of eth0 (jetsonnano-ipaddress); we will now ssh to the Jetson Nano from the laptop.

# If your hostname is without a domain, you may want to add the ".example.com"
sudo su -
hostnamectl set-hostname microshift.example.com
dpkg-reconfigure tzdata # Select your timezone if not already done

# We will not need the Ubuntu desktop, let's save on memory usage
sudo apt-get -y remove --purge chromium-browser chromium-browser-l10n
sudo apt-get -y purge ubuntu-desktop unity gnome-shell lightdm
sudo apt-get -y autoremove
sudo apt-get -y autoclean
sudo apt-get -y clean
sudo apt-get -f install
sudo reboot
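
After the reboot, you can optionally confirm the memory savings (a quick check of my own, not part of the original setup):

free -h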

Attaching the Fan

Under most conditions, the large heatsink on the Jetson Nano keeps the system within its design thermal limits. When running very GPU-intensive loads, or when the Jetson is in a very warm environment, a 5V cooling fan should be attached. The Waveshare Fan-4020-PWM-5V attaches to the Jetson Nano's heatsink via four countersunk mounting holes drilled into the heatsink at the factory. The provided screws go in one way only, so that the fan fits snugly with the sticker label at the middle of the fan facing down towards the heatsink; if a screw doesn't seem long enough, the fan is upside down. The four pins of the reverse-proof connector are GND, 5V, Tachometer, and PWM. The fan draws power from the Jetson Nano through the GND and 5V pins, the Tachometer signal reports the current fan speed to the Jetson Nano, and the PWM signal allows the Jetson to control the fan speed.

Let’s try out the fan.

sudo /usr/bin/jetson_clocks
sudo sh -c 'echo 255 > /sys/devices/pwm-fan/target_pwm'

Instead of 255, we can set a lower number to slow the fan down, or 0 to stop it.

sudo sh -c 'echo 0 > /sys/devices/pwm-fan/target_pwm'

We can get the temperature with

cat /sys/devices/virtual/thermal/thermal_zone0/temp
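
The raw value is in millidegrees Celsius; a one-liner like the following (my own convenience snippet, not part of the original steps) converts it to degrees:

awk '{ printf "%.1f C\n", $1/1000 }' /sys/devices/virtual/thermal/thermal_zone0/temp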

The Jetson Nano fan control daemon controls the fan speed based on the temperature.

git clone https://github.com/Pyrestone/jetson-fan-ctl.git
cd jetson-fan-ctl/
./install.sh
# Customize /etc/automagic-fan/config.json (see the sample below)

watch "cat /sys/devices/virtual/thermal/thermal_zone0/temp;echo ---;cat /sys/devices/pwm-fan/target_pwm"
Jetson Nano with Fan and CSI-2 Camera

Testing the Jetson Nano

Install the jetson stats

jetson-stats is a package for monitoring and controlling your NVIDIA Jetson.

sudo su -
apt-get update
apt-get install -y python3 python3-pip
pip3 install -U jetson-stats
jetson_release -v


Output:

root@microshift:~# jetson_release -v
- NVIDIA Jetson Nano (Developer Kit Version)
* Jetpack 4.6 [L4T 32.6.1]
* NV Power Mode: MAXN - Type: 0
* jetson_stats.service: active
- Board info:
* Type: Nano (Developer Kit Version)
* SOC Family: tegra210 - ID:33
* Module: P3448-0000 - Board: P3449-0000
* Code Name: porg
* Boardids: 3448
* CUDA GPU architecture (ARCH_BIN): 5.3
* Serial Number: 1421421054431
- Libraries:
* CUDA: 10.2.300
* cuDNN: 8.2.1.32
* TensorRT: 8.0.1.6
* Visionworks: 1.6.0.501
* OpenCV: 4.1.1 compiled CUDA: NO
* VPI: ii libnvvpi1 1.1.15 arm64 NVIDIA Vision Programming Interface library
* Vulkan: 1.2.70
- jetson-stats:
* Version 3.1.1
* Works on Python 3.6.9

Watch the CPU, GPU, and disk activity using jtop:

jtop

cat /etc/nv_tegra_release


Output:

root@microshift:~# cat /etc/nv_tegra_release
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t210ref, EABI: aarch64, DATE: Mon Jul 26 19:20:30 UTC 2021


Testing the Jupyter Lab container in docker


Try out the Deep Learning Institute (DLI) "Getting Started with AI on Jetson Nano" course environment container with a USB camera attached:

docker run --runtime nvidia -it --rm --network host --volume ~/nvdli-data:/nvdli-nano/data --device /dev/video0 nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1


If you see the “error adding seccomp filter rule for syscall clone3: permission denied: unknown”, look at the Errors section below to fix it.

Connect to your Jetson Nano IP address from your laptop at http://jetsonnano-ipaddress:8888/lab? and log in with the password dlinano.

We can run the notebook /hello_camera/usb_camera.ipynb and test the camera. After testing, release the camera resource and shut down the kernel.

The tutorial shows how to use a USB or a CSI camera. We can attach and try out the Raspberry Pi NoIR Camera Module V2. Support for the Raspberry Pi camera v2 (IMX219) is built into JetPack. The v2 Pi NoIR has a Sony IMX219 8-megapixel sensor (compared to the 5-megapixel OmniVision OV5647 sensor of the original camera). The earlier v1 camera (OV5647) does not work out of the box with the Jetson Nano because the driver is not included. If you have a CSI-2 camera attached, you can run:

docker run --runtime nvidia -it --rm --network host --volume ~/nvdli-data:/nvdli-nano/data --volume /tmp/argus_socket:/tmp/argus_socket --device /dev/video0 nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1

This blog assumes that a USB camera is attached.


Testing the USB camera attached to the Jetson Nano with gstreamer on Mac

These steps allow you to view your video stream on the Mac using gstreamer. On the Jetson Nano, install jetson-inference so that you can run video-viewer.

On Mac
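
gstreamer is not installed by default on macOS; one way to get gst-launch-1.0 is via Homebrew (the formula names below are my suggestion, verify them for your setup):

brew install gstreamer gst-plugins-base gst-plugins-good gst-plugins-bad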

On Jetson Nano

video-viewer --bitrate=1000000 /dev/video0 rtp://mac-ipaddress:1234


On Mac

gst-launch-1.0 -v udpsrc port=1234 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264, payload=(int)96" ! rtph264depay ! decodebin ! videoconvert ! autovideosink


Updating your Jetson Nano to new Minor Release

Jetson Nano downloads are at https://developer.nvidia.com/embedded/downloads and OS information is at https://developer.nvidia.com/embedded/jetpack.

If your SD card has an older JetPack, you can update L4T as follows. The latest is JetPack 4.6, which includes L4T 32.6.1. The platform is t210 for NVIDIA Jetson Nano devices. Set the version to r32.6 (do not set it to r32.6.1).

sudo su -

vi /etc/apt/sources.list.d/nvidia-l4t-apt-source.list

# Replace with following:

deb https://repo.download.nvidia.com/jetson/common r32.6 main
deb https://repo.download.nvidia.com/jetson/t210 r32.6 main

apt update
apt dist-upgrade


Install the dependencies for MicroShift

Set up the repositories and install the libraries and firewalld:

apt-get install -y curl wget
OS_VERSION=18.04
CRIOVERSION=1.22
OS=xUbuntu_$OS_VERSION
KEYRINGS_DIR=/usr/share/keyrings

# Required for containers-common
echo "deb [signed-by=$KEYRINGS_DIR/libcontainers-archive-keyring.gpg] https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list > /dev/null
echo "deb [signed-by=$KEYRINGS_DIR/libcontainers-crio-archive-keyring.gpg] http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$CRIOVERSION/$OS/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$CRIOVERSION.list > /dev/null
mkdir -p $KEYRINGS_DIR
rm -f /usr/share/keyrings/libcontainers-archive-keyring.gpg
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo gpg --dearmor -o $KEYRINGS_DIR/libcontainers-archive-keyring.gpg
rm -f /usr/share/keyrings/libcontainers-crio-archive-keyring.gpg
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$CRIOVERSION/$OS/Release.key | sudo gpg --dearmor -o $KEYRINGS_DIR/libcontainers-crio-archive-keyring.gpg
wget -qO - https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo apt-key add -
apt-get install -y ca-certificates # https://github.com/cri-o/cri-o/issues/5375#issuecomment-933608364
apt-get -y update
apt-get install -y btrfs-tools containers-common libassuan-dev libdevmapper-dev libglib2.0-dev libc6-dev libgpgme-dev libgpg-error-dev libseccomp-dev libsystemd-dev libselinux1-dev pkg-config go-md2man libudev-dev software-properties-common gcc make
ls /usr/include/gpgme.h
apt-get install -y policycoreutils-python-utils conntrack firewalld

Install cri-o, crictl and plugins

apt-get install -y curl jq tar
curl https://raw.githubusercontent.com/cri-o/cri-o/main/scripts/get | bash -s -- -a arm64
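
As a quick sanity check (assuming the install script placed the binaries on your PATH), confirm the versions:

crio version
crictl --version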

Set up firewalld

systemctl enable firewalld --now
firewall-cmd --zone=public --permanent --add-port=6443/tcp
firewall-cmd --zone=public --permanent --add-port=30000-32767/tcp
firewall-cmd --zone=public --permanent --add-port=2379-2380/tcp
firewall-cmd --zone=public --add-masquerade --permanent
firewall-cmd --zone=public --add-port=10250/tcp --permanent
firewall-cmd --zone=public --add-port=10251/tcp --permanent
firewall-cmd --zone=public --add-port=80/tcp --permanent # For Ingress
firewall-cmd --zone=public --add-port=443/tcp --permanent # For Ingress
firewall-cmd --zone=public --add-port=8888/tcp --permanent # For Jupyterlab Course
firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16
firewall-cmd --reload
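
You can verify that the rules took effect with:

firewall-cmd --list-all
firewall-cmd --zone=trusted --list-sources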


Set up the CRI-O config to match the MicroShift networking values

sh -c 'cat << EOF > /etc/cni/net.d/100-crio-bridge.conf
{
    "cniVersion": "0.4.0",
    "name": "crio",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "hairpinMode": true,
    "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ],
        "ranges": [
            [{ "subnet": "10.42.0.0/24" }]
        ]
    }
}
EOF'
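
A malformed CNI config will keep pods from starting, so it is worth validating the JSON (jq was installed earlier):

jq . /etc/cni/net.d/100-crio-bridge.conf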

Verify the Nvidia device. The lspci command displays information about devices connected through Peripheral Component Interconnect (PCI) buses.

root@jetsonnano:~# lspci -nnv | grep -i nvidia
00:02.0 PCI bridge [0604]: NVIDIA Corporation Device [10de:0faf] (rev a1) (prog-if 00 [Normal decode])
Capabilities: [40] Subsystem: NVIDIA Corporation Device [10de:0000]

NVML (and therefore nvidia-smi) is not currently supported on Jetson. The k8s-device-plugin does not work with the Jetson, and the nvidia/k8s-device-plugin container image is not available for arm64. So, let's set up CRI-O to use the Nvidia container runtime hook directly.

mkdir -p /usr/share/containers/oci/hooks.d/
cat << EOF > /usr/share/containers/oci/hooks.d/nvidia.json
  {
      "version": "1.0.0",
      "hook": {
          "path": "/usr/bin/nvidia-container-runtime-hook",
          "args": ["nvidia-container-runtime-hook", "prestart"],
          "env": [
              "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin              "
          ]
      },
      "when": {
          "always": true,
          "commands": [".*"]
      },
      "stages": ["prestart"]
  }
EOF
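
Confirm that the hook file parses and that the hook binary it points to exists (a quick check of my own, not from the original steps):

jq . /usr/share/containers/oci/hooks.d/nvidia.json
ls -l /usr/bin/nvidia-container-runtime-hook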

We remove the mountopt options from /etc/containers/storage.conf and restart CRI-O; metacopy=on is not supported on Ubuntu 18.04.
sed -i "s/^\(mountopt.*\)/#\\1/" /etc/containers/storage.conf
#sed -i 's/,metacopy=on//g' /etc/containers/storage.conf
systemctl enable crio --now
#systemctl restart crio
systemctl status crio
journalctl -u crio -f # Ctrl-C to stop the logs


This completes the installation of the dependencies for MicroShift. If you want to understand and work with crun and CRI-O directly, continue with the next section. To install MicroShift, go to Part 3 of this series. Do not use CRI-O directly while MicroShift is running: if you use crictl to create pod sandboxes or containers on a running MicroShift cluster, the Kubelet will eventually delete them.

Samples using cri-o

1. Nginx sample

Create the nginx.json and net-pod.json, then create the sandbox pod.

cat << EOF > nginx.json
{
  "metadata": {
    "name": "nginx-container",
    "attempt": 1
  },
  "image": {
    "image": "nginx"
  },
  "log_path": "nginx.log",
  "linux": {
    "security_context": {
      "namespace_options": {}
    }
  }
}
EOF
cat << EOF > net-pod.json
{
  "metadata": {
    "name": "networking",
    "uid": "networking-pod-uid",
    "namespace": "default",
    "attempt": 1
  },
  "hostname": "networking",
  "port_mappings": [
    {
      "container_port": 80
    }
  ],
  "log_directory": "/tmp/net-pod",
  "linux": {}
}
EOF

podid=$(crictl runp net-pod.json)

If you see an error like the following:
FATA[0010] run pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_networking_default_networking-pod-uid_1 in pod sandbox k8s_networking_default_networking-pod-uid_1(061add45d3d9c3b2177b1e30280dec4ee4c40b53bec817d8e7cf6c0b376b5d40): error creating overlay mount to /var/lib/containers/storage/overlay/24dbf68bab872f1c4b556e2e5854d33c97783a2b69871bb2832c469c54fd80b6/merged, mount_data="nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/MXTIIQJAQN3SPYCB5PJVNCPIXP,upperdir=/var/lib/containers/storage/overlay/24dbf68bab872f1c4b556e2e5854d33c97783a2b69871bb2832c469c54fd80b6/diff,workdir=/var/lib/containers/storage/overlay/24dbf68bab872f1c4b556e2e5854d33c97783a2b69871bb2832c469c54fd80b6/work": invalid argument
 
Then you might have forgotten to remove the mountopt from storage.conf. Run the following to fix it:
sed -i "s/^\(mountopt.*\)/#\\1/" /etc/containers/storage.conf
systemctl restart crio
podid=`crictl runp net-pod.json` # This should be successful now
crictl pods # Check the pods

Let’s check what’s inside our sandbox net-pod
crun --root=/run/runc list
Output:
NAME                                                             PID       STATUS   BUNDLE PATH
64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4 14554     running  /run/containers/storage/overlay-containers/64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4/userdata
crun --root=/run/runc ps $podid
Output:
PID
14554
 
ps -ef | grep 14554 # Process id from above
Output:
root     14554 14519  0 05:59 ?        00:00:00 /pause
 
There is only one process running inside the sandbox, called pause. The main task of this process is to keep the environment running and react to incoming signals. Before we create our workload within that sandbox, we have to pre-pull the image we want to run.
 
crictl pull nginx # Pull the image
crictl images # This will show the nginx image

This shows the nginx image along with the k8s.gcr.io/pause image
IMAGE                     TAG                 IMAGE ID            SIZE
docker.io/library/nginx   latest              2d25c92337fcd       139MB
k8s.gcr.io/pause          3.5                 f7ff3c4042631       491kB
 
We use the container definition nginx.json to kick off the container:
containerid=`crictl create $podid nginx.json net-pod.json` # The container for nginx will go into Created state
crictl ps -a # List containers to see nginx in Created state
Output:
CONTAINER           IMAGE               CREATED                  STATE               NAME                ATTEMPT             POD ID
2dbe69290807d       nginx               Less than a second ago   Created             nginx-container     1                   64519cb57bea2
 
crun --root=/run/runc list
Output:
NAME                                                             PID       STATUS   BUNDLE PATH
64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4 14554     running  /run/containers/storage/overlay-containers/64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4/userdata
2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148 15076     created  /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata
 
We now have another process in created state
ps -ef | grep 15076
Output:
root     15076 15042  0 06:05 ?        00:00:00 /usr/local/bin/crun --systemd-cgroup --root=/run/runc create --bundle /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata --pid-file /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata/pidfile 2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148
 
Let’s start the container
crictl start $containerid # Go to Running state
crictl logs $containerid
 
crun shows that the process STATUS is now "running":
crun --root=/run/runc list # List the containers
Output:
NAME                                                             PID       STATUS   BUNDLE PATH
64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4 14554     running  /run/containers/storage/overlay-containers/64519cb57bea26531a046ac19605f8aa1f602f420792ed37fc8a03a4b81d66d4/userdata
2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148 15076     running  /run/containers/storage/overlay-containers/2dbe69290807dbbb33bbaf40e9e1df6e760222dbd16052549b50f56368740148/userdata
 
crun --root=/run/runc ps $containerid # Shows the processes in the container
crun --root=/run/runc ps $containerid | grep -v PID | xargs ps ww
Output:
  PID TTY      STAT   TIME COMMAND
15076 ?        Ss     0:00 nginx: master process nginx -g daemon off;
15487 ?        S      0:00 nginx: worker process
15488 ?        S      0:00 nginx: worker process
15489 ?        S      0:00 nginx: worker process
15492 ?        S      0:00 nginx: worker process
 
We can access the container using the pod's network address:
crictl inspectp $podid | grep io.kubernetes.cri-o.IP.0 # Get the ipaddr of pod
ipaddr=`crictl inspectp $podid | jq -r .status.network.ip`
curl $ipaddr # Will return the "Welcome to nginx!" html
 
We can exec into the container
crictl exec $containerid cat /etc/os-release
Output:
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
 
We stop and remove the container
crictl stop $containerid # Go to Exited state
crictl ps -a
crictl rm $containerid
 
We are back to the single pause process
crun --root=/run/runc list | grep -v PID | awk '{print $2}' | xargs ps ww
Output:
  PID TTY      STAT   TIME COMMAND
14554 ?        Ss     0:00 /pause
 
Finally, we stop and remove the pod
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod
 

2. Vector-add cuda sample to test the GPU

Copy the samples (we use vectorAdd from the samples):
mkdir vectoradd
cd vectoradd
cp -r /usr/local/cuda/samples .
Create the following Dockerfile
cat << EOF > Dockerfile
FROM nvcr.io/nvidia/l4t-base:r32.6.1

RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples

WORKDIR /tmp/samples/0_Simple/vectorAdd/
RUN make clean && make

CMD ["./vectorAdd"]
EOF

Create the vectoradd.json

cat << EOF > vectoradd.json
{
  "metadata": {
    "name": "vectoradd-container",
    "attempt": 1
  },
  "image": {
    "image": "docker.io/karve/vector-add-sample:arm64-jetsonnano"
  },
  "log_path": "vectoradd.log",
  "linux": {
    "security_context": {
      "namespace_options": {}
    }
  }
}
EOF

Create the net-pod.json

cat << EOF > net-pod.json
{
  "metadata": {
    "name": "networking",
    "uid": "networking-pod-uid",
    "namespace": "default",
    "attempt": 1
  },
  "hostname": "networking",
  "port_mappings": [
    {
      "container_port": 80
    }
  ],
  "log_directory": "/tmp/net-pod",
  "linux": {}
}
EOF

Build and push the vector-add-sample image

docker build -t karve/vector-add-sample:arm64-jetsonnano .
docker push karve/vector-add-sample:arm64-jetsonnano

Run the vector-add-sample in crio

podid=`crictl runp net-pod.json`
crictl pods # Get the podid
crictl pull docker.io/karve/vector-add-sample:arm64-jetsonnano
crictl images # This will show the vector-add-sample image
containerid=`crictl create $podid vectoradd.json net-pod.json` # The container for vectoradd will go into Created state
crictl ps -a # List containers
crictl start $containerid # Go to Running and Exited state
crictl logs $containerid -f

The output shows: Test PASSED

root@jetsonnano:~/samples/vectoradd# crictl logs $containerid -f
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

If the Nvidia runtime hook /usr/share/containers/oci/hooks.d/nvidia.json is missing, the sample will fail with the following message:

Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]

You will need to fix the nvidia.json and run again.

Finally, delete the container and pod

crictl ps -a
crictl rm $containerid
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod

3. Device-query sample

Copy the samples (we use deviceQuery from the samples):
mkdir devicequery
cd devicequery
cp -r /usr/local/cuda/samples .
Create the following Dockerfile in the devicequery folder:
cat << EOF > Dockerfile
FROM nvcr.io/nvidia/l4t-base:r32.6.1

RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]
EOF
Create the devicequery.json
cat << EOF > devicequery.json
{
  "metadata": {
    "name": " devicequery-container",
    "attempt": 1
  },
  "image": {
    "image": "docker.io/karve/devicequery:arm64-jetsonnano"
  },
  "log_path": "devicequery.log",
  "linux": {
    "security_context": {
      "namespace_options": {}
    }
  }
}
EOF
Create the net-pod.json
cat << EOF > net-pod.json
{
  "metadata": {
    "name": "networking",
    "uid": "networking-pod-uid",
    "namespace": "default",
    "attempt": 1
  },
  "hostname": "networking",
  "port_mappings": [
    {
      "container_port": 80
    }
  ],
  "log_directory": "/tmp/net-pod",
  "linux": {}
}
EOF
Build and push the devicequery-sample image
docker build -t karve/devicequery:arm64-jetsonnano .
docker push karve/devicequery:arm64-jetsonnano
Run the devicequery in crio
podid=`crictl runp net-pod.json`
crictl pods # Get the podid
crictl pull docker.io/karve/devicequery:arm64-jetsonnano
crictl images # This will show the devicequery image
containerid=`crictl create $podid devicequery.json net-pod.json` # The container for devicequery will go into Created state
crictl ps -a # List containers
crictl start $containerid # Go to Running and Exited state
crictl logs $containerid -f
Output:
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3956 MBytes (4148273152 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

Delete the container and pod:
crictl ps -a
crictl rm $containerid
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod


4. Pytorch sample

NOTE: This will stress test the GPU. Warning: attach the fan to the Jetson Nano first.

Create the pytorchsample.json
cat << EOF > pytorchsample.json
 {
  "metadata": {
    "name": "pytorchsample-container",
    "attempt": 1
  },
  "image": {
    "image": "nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3"
  },
  "log_path": "pytorchsample.log",
  "linux": {
    "security_context": {
      "namespace_options": {}
    }
  }
}
EOF
Run the pytorch sample. We will exec into the container and run the samples separately.
podid=`crictl runp net-pod.json`
crictl pods # Get the podid
crictl pull nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3
crictl images # This will show the l4t-pytorch image
containerid=`crictl create $podid pytorchsample.json net-pod.json` # The container for pytorch will go into Created state
crictl ps -a
crictl exec -it $containerid bash
 
echo "nameserver 8.8.8.8" > /etc/resolv.conf
DATA_URL="https://nvidia.box.com/shared/static/y1ygiahv8h75yiyh0pt50jqdqt7pohgx.gz"
DATA_NAME="ILSVRC2012_img_val_subset_5k"
DATA_PATH="test/data/$DATA_NAME"
if [ ! -d "$DATA_PATH" ]; then
 echo 'downloading data for testing torchvision...'
 if [ ! -d "test/data" ]; then
  mkdir -p test/data
 fi
 wget --quiet --show-progress --progress=bar:force:noscroll --no-check-certificate $DATA_URL -O test/data/$DATA_NAME.tar.gz
 tar -xzf test/data/$DATA_NAME.tar.gz -C test/data/
fi
wget https://raw.githubusercontent.com/dusty-nv/jetson-containers/master/test/test_pytorch.py -O test/test_pytorch.py
python3 test/test_pytorch.py
python3 test/test_torchvision.py --data=$DATA_PATH --use-cuda
wget https://raw.githubusercontent.com/dusty-nv/jetson-containers/master/test/test_torchaudio.py -O test/test_torchaudio.py
python3 test/test_torchaudio.py
exit
Output:
testing PyTorch...
PyTorch version: 1.9.0
CUDA available:  True
cuDNN version:   8201
Tensor a = tensor([0., 0.], device='cuda:0')
Tensor b = tensor([-0.5832,  0.2567], device='cuda:0')
Tensor c = tensor([-0.5832,  0.2567], device='cuda:0')
testing LAPACK (OpenBLAS)...
test/test_pytorch.py:25: UserWarning: torch.solve is deprecated in favor of torch.linalg.solve
and will be removed in a future PyTorch release.
torch.linalg.solve has its arguments reversed and does not return the LU factorization.
To get the LU factorization see torch.lu, which can be used with torch.lu_solve or torch.lu_unpack.
X = torch.solve(B, A).solution
should be replaced with
X = torch.linalg.solve(A, B) (Triggered internally at  /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/native/BatchLinearAlgebra.cpp:760.)
  x, lu = torch.solve(b, a)
done testing LAPACK (OpenBLAS)
testing torch.nn (cuDNN)...
done testing torch.nn (cuDNN)
testing CPU tensor vector operations...
est/test_pytorch.py:49: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cpu_y = F.softmax(cpu_x)
Tensor cpu_x = tensor([12.3450])
Tensor softmax = tensor([1.])
Tensor exp (float32) = tensor([[2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183]])
Tensor exp (float64) = tensor([[2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183]], dtype=torch.float64)
Tensor exp (diff) = 7.429356050359104e-07
PyTorch OK
…

Delete the container and pod for pytorch
crictl rm $containerid
crictl stopp $podid # Stop the pod
crictl rmp $podid # Remove the pod

Errors

1. Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown

Docker 20.10.7-0ubuntu5~18.04.3 needs to be downgraded, or the experimental nvidia-docker2 needs to be installed. See the discussion. Select one of the options below:
  • Downgrading Docker:
wget http://launchpadlibrarian.net/551655684/docker.io_20.10.7-0ubuntu1~18.04.1_arm64.deb
dpkg -i docker.io_20.10.7-0ubuntu1~18.04.1_arm64.deb
 
  • Installing the Experimental nvidia-docker2:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey |   sudo apt-key add -
apt-get update
apt-get install -y nvidia-docker2
systemctl restart docker
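
After either fix, you can confirm that Docker sees the nvidia runtime (a quick check of my own):

docker info | grep -i runtimes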

2. Error: container create failed: error executing hook `/usr/bin/nvidia-container-runtime-hook` (exit code: 1)

You might need to install an older version of nvidia-container-toolkit:

apt-get remove libnvidia-container-tools
apt list --installed | grep nvidia-container
apt install nvidia-container-toolkit=1.8.0~rc.1-1

Output

apt list --installed | grep nvidia-container
libnvidia-container-tools/bionic,now 1.11.0~rc.2-1 arm64 [installed,automatic]
libnvidia-container0/stable,now 0.10.0+jetpack arm64 [installed]
libnvidia-container1/bionic,now 1.11.0~rc.2-1 arm64 [installed,automatic]
nvidia-container-csv-cuda/stable,now 10.2.460-1 arm64 [installed]
nvidia-container-csv-cudnn/stable,now 8.2.1.32-1+cuda10.2 arm64 [installed]
nvidia-container-csv-tensorrt/stable,now 8.0.1.6-1+cuda10.2 arm64 [installed]
nvidia-container-csv-visionworks/stable,now 1.6.0.501 arm64 [installed]
nvidia-container-toolkit/bionic,now 1.8.0~rc.1-1 arm64 [installed,upgradable to: 1.11.0~rc.2-1]
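
Since apt reports nvidia-container-toolkit as upgradable, you may want to pin it so that a later upgrade does not replace the working version (my suggestion, not from the original steps):

apt-mark hold nvidia-container-toolkit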

3. If you get an error with flannel, create the file /run/flannel/subnet.env with the following contents:

FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Conclusion

In this Part 2, we set up the Jetson Nano and installed the dependencies for MicroShift. We set up CRI-O to use the Nvidia container runtime and worked directly with CRI-O using the CLI. We are now armed with enough knowledge to play with MicroShift. In Part 3, we will see the multiple options to build and deploy MicroShift on the Jetson Nano. Further, in Parts 4 and 5, we will look at the multiple options to build and deploy MicroShift on the Raspberry Pi 4 on Raspberry Pi OS and CentOS 8 Stream, respectively.

Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about your use of MicroShift on ARM devices and if you would like to see something covered in more detail.
