MicroShift – Part 7: Jetson Nano with Ubuntu 20.04

By Alexei Karve posted Tue December 21, 2021 07:10 PM

MicroShift on a Jetson Nano with Ubuntu 20.04

Introduction

MicroShift is a research project that explores how the OpenShift OKD Kubernetes distribution can be optimized for small-form-factor devices and edge computing. In Part 2 and Part 3 of this series, we set up the Jetson Nano with Ubuntu 18.04 and built and deployed MicroShift using a Jetson Nano Developer Kit. In Part 4, Part 5, and Part 6 we worked with MicroShift on a Raspberry Pi 4. In this Part 7, we switch back to the Jetson Nano. Specifically, we will deploy MicroShift on Ubuntu 20.04 on the Jetson Nano. The Jetson Software Roadmap shows that JetPack 5.0 Developer Preview, with Ubuntu 20.04, is planned for 1Q-2022. Meanwhile, we can follow the instructions from Q-engineering for the standard release-upgrade mechanism, or download Q-engineering's complete, pre-installed 10.3 GB Ubuntu 20.04 image for the Jetson Nano with OpenCV, TensorFlow, and PyTorch.

Setting up the Jetson Nano with Ubuntu 20.04 (64 bit)

For this blog, we download the image from the Q-engineering GitHub site and write it to a microSDXC card.

  1. Flash the image using balenaEtcher or the Raspberry Pi Imager
  2. Connect a keyboard, monitor, and Ethernet cable to the Jetson Nano
  3. Insert the microSDXC card into the Jetson Nano and power it on
  4. Log in with jetson as both the user and the password
  5. Get the jetsonnano-ipaddress so that you can ssh to the Jetson Nano from your laptop
    ip a

Let’s install the latest updates, set the hostname with a fully qualified domain name, configure the timezone, and set the locale to en_US (the Q-engineering image defaults to nl_NL):
ssh jetson@jetsonnano-ipaddress
sudo su -
apt-get update
apt-get -y upgrade
hostnamectl set-hostname nano.example.com
dpkg-reconfigure tzdata
sed -i "s/nl_NL/en_US/g" /etc/default/locale
#locale-gen "en_US.UTF-8"
locale-gen en_US en_US.UTF-8
dpkg-reconfigure locales
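
We can verify that the timezone and locale changes took effect; the exact output depends on the values you selected:

timedatectl | grep "Time zone"
locale | grep LANG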

The default image is sized for a 32GB card. Fix the partition if you installed to a larger microSDXC card:

parted -l # Answer F (Fix) when parted warns that not all of the space on /dev/mmcblk0 is in use

apt-get install -y cloud-guest-utils
growpart /dev/mmcblk0 1
resize2fs /dev/mmcblk0p1
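
Confirm that the root filesystem now spans the whole card; the sizes shown will depend on your card:

df -h /
lsblk /dev/mmcblk0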

Check the JetPack version; /etc/nv_tegra_release shows release R32, revision 6.1, which corresponds to L4T 32.6.1. The latest JetPack 4.6 includes L4T 32.6.1. The BOARD parameter t210ref indicates a Jetson Nano Developer Kit; the platform is thus t210, used for NVIDIA® Jetson Nano™ devices.

root@nano:~# cat /etc/nv_tegra_release
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t210ref, EABI: aarch64, DATE: Mon Jul 26 19:20:30 UTC 2021

Let’s remove the Ubuntu desktop to free up additional memory

cat << EOF > removedesktop.sh
sudo apt-get -y purge ubuntu-desktop unity gnome-shell lightdm
sudo apt-get -y autoremove
sudo apt-get -y autoclean
sudo apt-get -y clean
sudo apt-get -f install
sudo reboot
EOF
chmod +x removedesktop.sh
./removedesktop.sh # System will reboot
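
When the system comes back up, a quick check shows the memory freed by removing the desktop (the numbers will vary):

free -h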

Testing the Jupyter Lab container in Docker

After the reboot, ssh to the Jetson Nano as jetson/jetson and install nvidia-docker2 to avoid the “error adding seccomp filter rule for syscall clone3: permission denied: unknown” error:

sudo su -
apt-get install -y curl
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey|sudo apt-key add -
apt-get update
apt-get install -y nvidia-docker2 # This will also install docker.io
systemctl restart docker

Try out the course environment container for the Deep Learning Institute (DLI) course "Getting Started with AI on Jetson Nano", with a USB camera attached.

docker run --runtime nvidia -it --rm --network host --volume ~/nvdli-data:/nvdli-nano/data --device /dev/video0 nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1

From your laptop, connect to the URL shown, http://jetsonnano-ipaddress:8888/lab?, and log in with the password dlinano.

We can run the notebook /hello_camera/usb_camera.ipynb to test the camera. After testing, release the camera resource, shut down the kernel, and exit the container.

Output

root@nano:~# docker run --runtime nvidia -it --rm --network host --volume ~/nvdli-data:/nvdli-nano/data --device /dev/video0 nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1
allow 10 sec for JupyterLab to start @ http://192.168.1.208:8888 (password dlinano)
JupterLab logging location:  /var/log/jupyter.log  (inside the container)
root@nano:/nvdli-nano# exit
exit

Additional details about attaching a fan and using a CSI-2 IMX219 camera were provided in Part 2. In this blog, we will use a USB Camera.

Jetson Nano with CSI 2 Camera and Fan

Installing MicroShift

Clone the microshift GitHub repo and run the install.sh script:

git clone https://github.com/thinkahead/microshift.git
cd microshift
./install.sh

You will get the error:

Error: COMMAND_FAILED: '/usr/sbin/ip6tables-restore -w -n' failed: ip6tables-restore v1.8.4 (legacy): Couldn't load match `rpfilter':No such file or directory

Set IPv6_rpfilter=no in /etc/firewalld/firewalld.conf to fix this:

sed -i "s|^IPv6_rpfilter=yes|IPv6_rpfilter=no|" /etc/firewalld/firewalld.conf
systemctl restart firewalld

Add the ssh port 22 so that we are not locked out:

firewall-cmd --zone=public --permanent --add-port=22/tcp
firewall-cmd --reload
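
We can verify what the firewall now allows; the exact list depends on what install.sh configured:

firewall-cmd --zone=public --list-ports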

NVML (and therefore nvidia-smi) is not currently supported on Jetson. The k8s-device-plugin does not work with Jetson as-is, and the nvidia/k8s-device-plugin container image is not available for arm64. So, let’s set up CRI-O to use the NVIDIA container runtime hook directly:

mkdir -p /usr/share/containers/oci/hooks.d/
cat << EOF > /usr/share/containers/oci/hooks.d/nvidia.json
  {
      "version": "1.0.0",
      "hook": {
          "path": "/usr/bin/nvidia-container-runtime-hook",
          "args": ["nvidia-container-runtime-hook", "prestart"],
          "env": [
              "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin              "
          ]
      },
      "when": {
          "always": true,
          "commands": [".*"]
      },
      "stages": ["prestart"]
  }
EOF
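
Optionally, verify that the hook file is well-formed JSON before going further:

python3 -m json.tool /usr/share/containers/oci/hooks.d/nvidia.json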

The metacopy=on mount option is not supported here. Remove it from /etc/containers/storage.conf and restart crio:

#mountopt = "nodev,metacopy=on"
sed -i "s/,metacopy=on//" /etc/containers/storage.conf
systemctl restart crio

Now rerun the install script:

./install.sh

Install the oc client. We can download the required version of the oc client for arm64:

wget https://mirror.openshift.com/pub/openshift-v4/arm64/clients/ocp/candidate/openshift-client-linux.tar.gz
mkdir tmp;cd tmp
tar -zxvf ../openshift-client-linux.tar.gz
mv -f oc /usr/local/bin
cd ..;rm -rf tmp
rm -f openshift-client-linux.tar.gz
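
Verify the client; the version reported depends on the candidate release you downloaded:

oc version --client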

It will take around 3 minutes for all pods to start. Check the status of the node and pods using the kubectl or oc client:

export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
watch "kubectl get nodes;kubectl get pods -A;crictl pods;crictl images"
#watch "oc get nodes;oc get pods -A;crictl pods;crictl images"

That completes the installation of MicroShift. You can skip the next section if you want to use this installation of MicroShift. If you want to build your own microshift binary, let's push on.

Build the MicroShift binary for arm64 on Ubuntu 20.04 (64 bit)

We can replace the microshift binary that was downloaded by the install.sh script with our own. Let’s build the microshift binary from scratch: clone the microshift repository from GitHub, install golang, run make, and finally move the microshift binary to /usr/local/bin.

sudo su -

apt -y install build-essential curl libgpgme-dev pkg-config libseccomp-dev

# Install golang
wget https://golang.org/dl/go1.17.2.linux-arm64.tar.gz
rm -rf /usr/local/go && tar -C /usr/local -xzf go1.17.2.linux-arm64.tar.gz
rm -f go1.17.2.linux-arm64.tar.gz
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
# Quote 'EOF' so that $PATH is written literally instead of being expanded now
cat << 'EOF' >> /root/.bashrc
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
EOF
mkdir $GOPATH

git clone https://github.com/thinkahead/microshift.git
cd microshift
make
./microshift version
ls -las microshift # binary in current directory /root/microshift
mv microshift /usr/local/bin/microshift
systemctl restart microshift

Alternatively, we may download the latest prebuilt microshift binary from GitHub, as install.sh does, as follows:

ARCH=arm64
export VERSION=$(curl -s https://api.github.com/repos/redhat-et/microshift/releases | grep tag_name | head -n 1 | cut -d '"' -f 4) && \
curl -LO https://github.com/redhat-et/microshift/releases/download/$VERSION/microshift-linux-${ARCH}
chmod +x microshift-linux-${ARCH}
ls -las microshift-linux*
mv microshift-linux-${ARCH} /usr/local/bin/microshift
systemctl restart microshift

Samples to run on MicroShift

We will run a few samples that will show the use of persistent volume, GPU, and the USB camera.

1. InfluxDB/Telegraf/Grafana

We reuse the InfluxDB sample from Part 6 on the Raspberry Pi 4. You can also follow the line-by-line instructions there as a reference.

cd ~
git clone https://github.com/thinkahead/microshift.git
cd microshift/raspberry-pi/influxdb

We can install all the components with the single script runall.sh

./runall.sh

Alternatively, run the steps separately and check details at each step

Create a new project influxdb

oc new-project influxdb
oc project influxdb # if it already exists

Install InfluxDB

oc create configmap influxdb-config --from-file=influxdb.conf
oc get configmap influxdb-config -o yaml
oc apply -f influxdb-secrets.yaml
oc describe secret influxdb-secrets
mkdir -p /var/hpvolumes/influxdb
oc apply -f influxdb-pv.yaml
oc apply -f influxdb-data.yaml
oc apply -f influxdb-deployment.yaml
oc get -f influxdb-deployment.yaml # check that the Deployment is created and ready
oc wait -f influxdb-deployment.yaml --for condition=available
oc logs deployment/influxdb-deployment -f
oc apply -f influxdb-service.yaml

oc rsh deployment/influxdb-deployment # connect to InfluxDB and display the databases

Output

root@nano:~/microshift/raspberry-pi/influxdb# oc rsh deployment/influxdb-deployment
# influx --username admin --password admin
Connected to http://localhost:8086 version 1.7.4
InfluxDB shell version: 1.7.4
Enter an InfluxQL query
> show databases
name: databases
name
----
test
_internal
> exit
# exit

Install Telegraf and check the measurements for the telegraf database in InfluxDB

oc apply -f telegraf-config.yaml 
oc apply -f telegraf-secrets.yaml 
oc apply -f telegraf-deployment.yaml
oc wait -f telegraf-deployment.yaml --for condition=available

Output

root@ubuntu:~/microshift/raspberry-pi/influxdb# oc rsh deployment/influxdb-deployment
# influx --username admin --password admin
Connected to http://localhost:8086 version 1.7.4
InfluxDB shell version: 1.7.4
Enter an InfluxQL query
> show databases
name: databases
name
----
test
_internal
telegraf
> use telegraf
Using database telegraf
> show measurements
name: measurements
name
----
cpu
disk
diskio
kernel
mem
net
netstat
processes
swap
system
> select * from cpu;
...
> exit
# exit

Install Grafana

cd grafana
mkdir -p /var/hpvolumes/grafana
cp -r config/* /var/hpvolumes/grafana/.
oc apply -f grafana-pv.yaml
oc apply -f grafana-data.yaml
oc apply -f grafana-deployment.yaml
oc apply -f grafana-service.yaml
oc expose svc grafana-service # Create the route
oc wait -f grafana-deployment.yaml --for condition=available
oc get route grafana-service

Add the " jetsonnano-ipaddress grafana-service-influxdb.cluster.local" to /etc/hosts on your laptop and login to http://grafana-service-influxdb.cluster.local (or to http://grafana-service-default.cluster.local if you deployed to default namespace) using admin/admin. You will need to change the password on first login. Go to the Dashboards list (left menu > Dashboards > Manage). The Analysis Server dashboard should be visible. Open it to display monitoring information for MicroShift.

Finally, after you are done working with this sample, delete the Grafana, Telegraf, and InfluxDB resources.

oc delete route grafana-service
oc delete -f grafana-data.yaml -f grafana-deployment.yaml -f grafana-pv.yaml -f grafana-service.yaml 
cd ..
oc delete -f telegraf-config.yaml -f telegraf-secrets.yaml -f telegraf-deployment.yaml
oc delete -f influxdb-data.yaml -f influxdb-pv.yaml -f influxdb-service.yaml -f influxdb-deployment.yaml -f influxdb-secrets.yaml
oc project default
oc delete project influxdb
rm -rf /var/hpvolumes/grafana
rm -rf /var/hpvolumes/influxdb

2. Devicequery

Create devicequery.yaml. The Dockerfile used to create the devicequery:arm64-jetsonnano image was shown earlier in the Part 2 CRI-O samples.

cat << EOF > devicequery.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: devicequery-job
spec:
  parallelism: 1
  completions: 1
  activeDeadlineSeconds: 1800
  backoffLimit: 6
  template:
    metadata:
      labels:
        app: devicequery
    spec:
      containers:
      - name: devicequery
        image: docker.io/karve/devicequery:arm64-jetsonnano
      restartPolicy: OnFailure
EOF
oc apply -f devicequery.yaml
oc get job/devicequery-job
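
Instead of polling, we can block until the job finishes with oc wait; the 300s timeout here is an arbitrary choice:

oc wait --for=condition=complete job/devicequery-job --timeout=300s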

Wait for the job to complete; the output shows that the CUDA device was detected within the container:

oc logs job/devicequery-job

Output

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3956 MBytes (4148273152 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

Delete the devicequery job

oc delete -f devicequery.yaml

3. VectorAdd

Create vectoradd.yaml. The Dockerfile for the vector-add-sample:arm64-jetsonnano image was shown earlier in the Part 2 CRI-O samples.

cat << EOF > vectoradd.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: vectoradd-job
spec:
  parallelism: 1
  completions: 1
  activeDeadlineSeconds: 1800
  backoffLimit: 6
  template:
    metadata:
      labels:
        app: vectoradd
    spec:
      containers:
      - name: vectoradd
        image: docker.io/karve/vector-add-sample:arm64-jetsonnano
      restartPolicy: OnFailure
EOF
oc apply -f vectoradd.yaml
oc get job/vectoradd-job

Wait for the job to complete; the output shows the vector addition of 50000 elements on the CUDA device:

oc logs job/vectoradd-job

Output

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Delete the vectoradd job

oc delete -f vectoradd.yaml

4. Jupyter Lab to access USB camera on /dev/video0

Create the following jupyter.yaml, then create the deployment, service, and route:

cat << EOF > jupyter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deployment
spec:
  selector:
    matchLabels:
      app: jupyter
  replicas: 1
  template:
    metadata:
      labels:
        app: jupyter
    spec:
      containers:
      - name: jupyter
        image: nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash", "-c", "jupyter lab --LabApp.token='' --LabApp.password='' --ip 0.0.0.0 --port 8888 --allow-root &> /var/log/jupyter.log && sleep infinity"]
        securityContext:
          privileged: true
          #allowPrivilegeEscalation: false
          #capabilities:
          #  drop: ["ALL"]
        ports:
        - containerPort: 8888
        # resource required for hpa
        resources:
          requests:
            memory: 128M
            cpu: 125m
          limits:
            memory: 2048M
            cpu: 1000m
        volumeMounts:
          - name: dev-video0
            mountPath: /dev/video0
      volumes:
        - name: dev-video0
          hostPath:
            path: /dev/video0

---
apiVersion: v1
kind: Service
metadata:
 name: jupyter-svc
 labels:
   app: jupyter
spec:
 type: NodePort
 ports:
 - port: 8888
   nodePort: 30080
 selector:
   app: jupyter
EOF
oc apply -f jupyter.yaml
oc expose svc jupyter-svc
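
Check that the service and route were created; the route host shown is what we add to /etc/hosts next:

oc get svc jupyter-svc
oc get route jupyter-svc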

Now add a line with the IP address of the Jetson Nano and jupyter-svc-default.cluster.local to /etc/hosts on your laptop/MacBook Pro and access JupyterLab at http://jupyter-svc-default.cluster.local/lab?
Navigate to hello_camera/usb_camera.ipynb and run the notebook.

We can delete the jupyterlab with:

oc delete route jupyter-svc
oc delete -f jupyter.yaml

5. Install Metrics Server

This will enable us to run the “kubectl top” and “oc adm top” commands.

kubectl apply -f https://raw.githubusercontent.com/thinkahead/microshift/main/jetson-nano/tests/metrics/metrics-components.yaml

If the metrics-server keeps restarting and the pod logs show a “no route to host” error, you may need to add hostNetwork: true. When a pod is configured with hostNetwork: true, the applications running in it can directly see the network interfaces of the host machine where the pod was started.

E1220 19:36:20.224466       1 server.go:132] unable to fully scrape metrics: unable to fully scrape metrics from node nano.example.com: unable to fetch metrics from node nano.example.com: Get "https://192.168.1.208:10250/stats/summary?only_cpu_and_memory=true": dial tcp 192.168.1.208:10250: connect: no route to host

Edit the deployment and add the line “hostNetwork: true” within spec.template.spec:

oc edit deployments -n kube-system metrics-server
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      hostNetwork: true
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-use-node-status-port
        - --v=6
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.0
# Wait for the metrics-server to start in the kube-system namespace
kubectl get deployment metrics-server -n kube-system
kubectl get events -n kube-system
kubectl logs deployment/metrics-server -n kube-system -f # Wait until the “metric-storage-ready failed: not metrics to serve” error stops

# Wait for a couple of minutes for metrics to be collected
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
apt-get install -y jq
kubectl get --raw /api/v1/nodes/$(kubectl get nodes -o json | jq -r '.items[0].metadata.name')/proxy/stats/summary

# Wait for a couple of minutes for metrics to be collected
kubectl top nodes;kubectl top pods -A
oc adm top nodes;oc adm top pods -A

watch "kubectl top nodes;kubectl top pods -A"

Output

NAME               CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
nano.example.com   902m         22%    2220Mi          57%

NAMESPACE                       NAME                                  CPU(cores)   MEMORY(bytes)
kube-system                     metrics-server-dbf765b9b-8p6wr        15m          17Mi
kubevirt-hostpath-provisioner   kubevirt-hostpath-provisioner-fsmkm   2m           9Mi
openshift-dns                   dns-default-lqktl                     10m          25Mi
openshift-dns                   node-resolver-d95pz                   0m           7Mi
openshift-ingress               router-default-85bcfdd948-khkpd       6m           38Mi
openshift-service-ca            service-ca-76674bfb58-rkcm5           16m          43Mi

6. Object Detection demo with GPU to send pictures and web socket messages to Node Red

The Object Detection sample detects objects in the camera feed. When a person is detected, it sends a Web Socket message with the bounding box information, and a picture, to Node Red.

Let’s install Node Red on IBM Cloud. We will use Node Red to show pictures and chat messages sent from the Jetson Nano. Alternatively, we can use the Node Red that we deployed as an application in MicroShift on the MacBook Pro in VirtualBox in Part 1.

  1. Create an IBM Cloud free tier account at https://www.ibm.com/cloud/free and login to Console (top right).
  2. Create an API Key and save it, Manage->Access->IAM->API Key->Create an IBM Cloud API Key
  3. Click on Catalog and Search for "Node-Red App", select it and click on "Get Started"
  4. Give a unique App name, for example xxxxx-node-red and select the region nearest to you
  5. Select the Pricing Plan Lite, if you already have an existing instance of Cloudant, you may select it in Pricing Plan
  6. Click Create
  7. Under Deployment Automation -> Configure Continuous Delivery, click on "Deploy your app"
  8. Select the deployment target Cloud Foundry that provides a Free-Tier of 256 MB cost-free or Code Engine. The latter has monthly limits and takes more time to deploy. [ Note: Cloud Foundry is deprecated, use the IBM Cloud Code Engine. Any IBM Cloud Foundry application runtime instances running IBM Cloud Foundry applications will be permanently disabled and deprovisioned ]
  9. Enter the IBM Cloud API Key from Step 2, or click on "New" to create one
  10. The rest of the fields Region, Organization, Space will automatically get filled up. Use the default 256MB Memory and click "Next"
  11. In "Configure the DevOps toolchain", click Create
  12. Wait for 10 minutes for the Node Red instance to start
  13. Click on the "Visit App URL"
  14. On the Node Red page, create a new userid and password
  15. In Manage Palette, install the node-red-contrib-image-tools, node-red-contrib-image-output, and node-red-node-base64
  16. Import the Chat flow and the Picture (Image) display flow. On the Chat flow, you will need to edit the template node line 35 to use wss:// (on IBM Cloud) instead of ws:// (on your Laptop)
  17. On another browser tab, open https://mynodered.mybluemix.net/chat (replace mynodered with your IBM Cloud Node Red URL)
  18. On the Image flow, click on the square box to the right of the image preview or viewer node to deactivate and activate the node. You will be able to see the picture when you activate the node

Back on the Jetson Nano, clone the repository and change to the sample directory:

cd ~
git clone https://github.com/thinkahead/microshift.git
cd ~/microshift/jetson-nano/tests/object-detection

Build and push the image

docker build -t docker.io/karve/jetson-inference:r32.6.1 .
docker push docker.io/karve/jetson-inference:r32.6.1
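
Optionally, smoke-test the image in Docker before deploying it to MicroShift. The environment variable names match those referenced in inference.yaml below; the Node Red URLs are placeholders for your own instance:

docker run --rm -it --runtime nvidia --privileged -e WebSocketURL=wss://yournodered.mybluemix.net/ws/chat -e ImageUploadURL=http://yournodered.mybluemix.net/upload docker.io/karve/jetson-inference:r32.6.1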

Update the WebSocketURL, ImageUploadURL, and VideoSource in inference.yaml to point to your video source and to the URLs of Node Red on IBM Cloud, or of the Node Red you installed in MicroShift on your laptop. For the latter, you will need to add hostAliases with the IP address of your laptop. Then create the deployment with the oc apply command and look at the Chat application and the Picture flow started in Node Red. It will take a couple of minutes to initially load the model.

crictl pull docker.io/karve/jetson-inference:r32.6.1  # Optional
oc apply -f inference.yaml

To stop this object-detection sample, we can delete the deployment

oc delete -f inference.yaml

7. Object Detection demo with TensorFlow Lite (no GPU) to send pictures and web socket messages to Node Red

cd ~
git clone https://github.com/thinkahead/microshift.git
cd ~/microshift/jetson-nano/tests/object-detection-no-gpu

This example uses TensorFlow Lite with Python to perform real-time object detection on images streamed from the USB camera. It draws a bounding box around each detected object when the object score is above a given threshold.

a. Use a container

Build the object-detection-jetsonnano image, check that we can access the camera, and run the TensorFlow Lite sample from a container in Docker.

cp ~/microshift/raspberry-pi/object-detection/efficientdet_lite0.tflite .
docker build -t docker.io/karve/object-detection-jetsonnano:latest .
docker push docker.io/karve/object-detection-jetsonnano:latest
docker run --rm -d --privileged -e ImageUploadURL=http://yournodered.mybluemix.net/upload -e WebSocketURL=wss://yournodered.mybluemix.net/ws/chat docker.io/karve/object-detection-jetsonnano:latest

You should see the camera feed appear in the Node Red image viewer when a person is in the frame. Put some objects in front of the camera, like a coffee mug or keyboard, and you'll see boxes drawn around those that the model recognizes, including the label and score for each. It also prints the number of frames per second (FPS) at the top-left corner of the screen.

b. Use MicroShift

sed -i "s|mynodered.mybluemix.net|yournodered.mybluemix.net|" *.yaml
oc apply -f object-detection.yaml

We will see pictures being sent to the Node Red image viewer when a person is detected. When we are done testing, we can delete the deployment:

oc delete -f object-detection.yaml

Smarter-Device-Manager

Applications running inside a container do not have access to device drivers unless explicitly given access. Smarter-device-manager enables containers deployed using Kubernetes to access devices (device drivers) available on the node. In the object detection sample above, we used a deployment with a privileged securityContext; we want to avoid privileged. With docker, we can use --device /dev/video0:/dev/video0, but Kubernetes has no --device equivalent. Instead of using the securityContext with privileged: true, we can use the smarter-device-manager without privileged in the Object Detection demo from above. The inference-sdm.yaml shows the modified deployment; a sketch of the relevant change follows below. The daemonset and configmap for the smarter-device-manager need to be created in some namespace (we use sdm).
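
A minimal sketch of the relevant change in inference-sdm.yaml; the container name here is illustrative, and the key point is that the pod requests the smarter-devices/video0 resource advertised by the daemonset instead of setting privileged: true:

    spec:
      containers:
      - name: inference   # name is illustrative; see inference-sdm.yaml for the actual spec
        image: docker.io/karve/jetson-inference:r32.6.1
        resources:
          limits:
            smarter-devices/video0: 1
          requests:
            smarter-devices/video0: 1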

We first install the smarter device manager and label the node to enable it.

cd ~/microshift/jetson-nano/tests/object-detection
oc apply -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
oc label node nano.example.com smarter-device-manager=enabled
oc get ds,pods -n sdm

Output:

root@nano:~/microshift/jetson-nano/tests/object-detection# oc apply -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
namespace/sdm created
daemonset.apps/smarter-device-manager created
configmap/smarter-device-manager created
root@nano:~/microshift/jetson-nano/tests/object-detection# oc label node nano.example.com smarter-device-manager=enabled --overwrite
node/nano.example.com labeled
root@nano:~/microshift/jetson-nano/tests/object-detection# oc logs -n sdm ds/smarter-device-manager
root@nano:~/microshift/jetson-nano/tests/object-detection# oc get ds,pods -n sdm
NAME                                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                    AGE
daemonset.apps/smarter-device-manager   1         1         1       1            1           smarter-device-manager=enabled   25s

NAME                               READY   STATUS    RESTARTS   AGE
pod/smarter-device-manager-jm9dh   1/1     Running   0          25s

We can see the Capacity (20), Allocatable (20), and Allocated (0) for smarter-devices/video0 (along with other devices).

root@nano:~/microshift/jetson-nano/tests/object-detection# oc describe nodes
Name:               nano.example.com
…
Capacity:
  cpu:                                           4
  ephemeral-storage:                             59964524Ki
  hugepages-2Mi:                                 0
  memory:                                        4051048Ki
…
  smarter-devices/video0:                        20
Allocatable:
  cpu:                                           4
  ephemeral-storage:                             55263305227
  hugepages-2Mi:                                 0
  memory:                                        3948648Ki
…
  smarter-devices/video0:                        20
Allocated resources:
…
  smarter-devices/video0                        0                0

Now we can create the new deployment:

root@nano:~/microshift/jetson-nano/tests/object-detection# oc apply -f inference-sdm.yaml
deployment.apps/inference-deployment created

If we describe the node again, we will see that video0 has been allocated under “Allocated resources”, and we will see the pictures and web socket messages being sent to Node Red.

root@nano:~/microshift/jetson-nano/tests/object-detection# oc describe nodes
Name:               microshift.example.com
…
Allocated resources:
…
  smarter-devices/video0                        1                1

Let’s delete the deployment. After it is deleted, the “Allocated resources” Requests and Limits go back to 0.

root@nano:~/microshift/jetson-nano/tests/object-detection# oc delete -f inference-sdm.yaml
deployment.apps "inference-deployment" deleted

root@nano:~/microshift/jetson-nano/tests/object-detection# oc describe nodes
Name:               microshift.example.com
…
Allocated resources:
…
  smarter-devices/video0                        0                0

If we disable the smarter-device-manager on the node and try the deployment again, the pod will remain in STATUS=Pending

root@nano:~/microshift/jetson-nano/tests/object-detection# oc label node nano.example.com smarter-device-manager=disabled --overwrite
node/microshift.example.com labeled
root@nano:~/microshift/jetson-nano/tests/object-detection# oc apply -f inference-sdm.yaml
deployment.apps/inference-deployment created
root@nano:~/microshift/jetson-nano/tests/object-detection# oc get pods,deploy
NAME                                        READY   STATUS    RESTARTS   AGE
pod/inference-deployment-757d7c848c-nb5bt   0/1     Pending   0          69s

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/inference-deployment   0/1     1            0           69s

We need to enable the label again for the deployment to get to Ready state.

root@nano:~/microshift/jetson-nano/tests/object-detection# oc label node microshift.example.com smarter-device-manager=enabled --overwrite
node/microshift.example.com labeled
root@nano:~/microshift/jetson-nano/tests/object-detection# oc get pods,deploy
NAME                                        READY   STATUS    RESTARTS   AGE
pod/inference-deployment-757d7c848c-nb5bt   1/1     Running   0          3m4s

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/inference-deployment   1/1     1            1           3m4s

Finally, we can delete the sample and the daemonset smarter-device-manager.

root@nano:~/microshift/jetson-nano/tests/object-detection# oc delete -f inference-sdm.yaml
deployment.apps "inference-deployment" deleted
root@nano:~/microshift/jetson-nano/tests/object-detection# oc delete -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
namespace "sdm" deleted
daemonset.apps "smarter-device-manager" deleted
configmap "smarter-device-manager" deleted

Using the NVIDIA/k8s-device-plugin

You may download the preconfigured nvidia-device-plugin.yml that points to a precreated image and skip to “Apply” below, or build the plugin yourself. To build, we can use the instructions from NVIDIA K8s Device Plugin for Wind River Linux to create a custom device plugin that allows the cluster to expose the number of GPUs on NVIDIA Jetson devices. The patch checks for the file /sys/module/tegra_fuse/parameters/tegra_chip_id and does not perform health checks on Jetson.

Build

git clone -b 1.0.0-beta6 https://github.com/NVIDIA/k8s-device-plugin.git
cd k8s-device-plugin/
wget https://labs.windriver.com/downloads/0001-arm64-add-support-for-arm64-architectures.patch
wget https://labs.windriver.com/downloads/0002-nvidia-Add-support-for-tegra-boards.patch
wget https://labs.windriver.com/downloads/0003-main-Add-support-for-tegra-boards.patch
git am 000*.patch
sed "s/ubuntu:16.04/ubuntu:18.04/" docker/arm64/Dockerfile.ubuntu16.04 > docker/arm64/Dockerfile.ubuntu18.04
docker build -t karve/k8s-device-plugin:1.0.0-beta6 -f docker/arm64/Dockerfile.ubuntu18.04 .
docker push karve/k8s-device-plugin:1.0.0-beta6
sed -i "s|image: .*|image: karve/k8s-device-plugin:1.0.0-beta6|" nvidia-device-plugin.yml # Change the image to karve/k8s-device-plugin:1.0.0-beta6

Apply

oc apply -f nvidia-device-plugin.yml
oc get ds -n kube-system nvidia-device-plugin-daemonset

Output

root@nano:~/k8s-device-plugin# oc get ds -n kube-system nvidia-device-plugin-daemonset
NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
nvidia-device-plugin-daemonset   1         1         1       1            1           <none>          7h20m

With the daemonset deployed, NVIDIA GPUs can now be requested by a container using the nvidia.com/gpu resource type. “oc describe nodes” now shows nvidia.com/gpu under Capacity, Allocatable, and Allocated resources. If we deploy the vector-add job with the resource limit (a sketch of the job spec follows the apply command below), we will see in the events that only one job pod gets scheduled at a time even though parallelism was set to 5. When one pod finishes, the next one runs.

cd ~/microshift/jetson-nano/jobs
oc apply -f vectoradd-gpu-limit.yaml
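
For reference, a sketch of what vectoradd-gpu-limit.yaml contains, assuming the earlier vectoradd job spec with parallelism raised to 5 and a one-GPU limit per pod; the nvidia.com/gpu limit is what serializes the pods on this single-GPU node:

apiVersion: batch/v1
kind: Job
metadata:
  name: vectoradd-job
spec:
  parallelism: 5
  completions: 5
  template:
    metadata:
      labels:
        app: vectoradd
    spec:
      containers:
      - name: vectoradd
        image: docker.io/karve/vector-add-sample:arm64-jetsonnano
        resources:
          limits:
            nvidia.com/gpu: 1    # one GPU per pod; only one pod can run at a time
      restartPolicy: OnFailure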

Output

root@nano:~/microshift/jetson-nano/jobs# oc apply -f vectoradd-gpu-limit.yaml
job.batch/vectoradd-job created
root@nano:~/microshift/jetson-nano/jobs# oc get events -n default
LAST SEEN   TYPE      REASON             OBJECT                    MESSAGE
33s         Warning   FailedScheduling   pod/vectoradd-job-7n2xz   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Warning   FailedScheduling   pod/vectoradd-job-7n2xz   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
34s         Normal    Scheduled          pod/vectoradd-job-l9cjw   Successfully assigned default/vectoradd-job-l9cjw to microshift.example.com
24s         Normal    Pulled             pod/vectoradd-job-l9cjw   Container image "docker.io/karve/vector-add-sample:arm64-jetsonnano" already present on machine
21s         Normal    Created            pod/vectoradd-job-l9cjw   Created container vectoradd
21s         Normal    Started            pod/vectoradd-job-l9cjw   Started container vectoradd
33s         Warning   FailedScheduling   pod/vectoradd-job-tnmvs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Warning   FailedScheduling   pod/vectoradd-job-tnmvs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
33s         Warning   FailedScheduling   pod/vectoradd-job-wtgnn   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Warning   FailedScheduling   pod/vectoradd-job-wtgnn   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
7s          Normal    Scheduled          pod/vectoradd-job-wtgnn   Successfully assigned default/vectoradd-job-wtgnn to microshift.example.com
34s         Warning   FailedScheduling   pod/vectoradd-job-zwjfs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
32s         Warning   FailedScheduling   pod/vectoradd-job-zwjfs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Normal    Scheduled          pod/vectoradd-job-zwjfs   Successfully assigned default/vectoradd-job-zwjfs to microshift.example.com
9s          Normal    Pulled             pod/vectoradd-job-zwjfs   Container image "docker.io/karve/vector-add-sample:arm64-jetsonnano" already present on machine
9s          Normal    Created            pod/vectoradd-job-zwjfs   Created container vectoradd
8s          Normal    Started            pod/vectoradd-job-zwjfs   Started container vectoradd
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-l9cjw
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-wtgnn
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-zwjfs
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-7n2xz
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-tnmvs

Cleanup MicroShift

We can use the cleanup script available on GitHub to clean up the pods and images. If you already cloned the microshift repo from GitHub, you have the script in the ~/microshift/hack directory.

wget https://raw.githubusercontent.com/thinkahead/microshift/main/hack/cleanup.sh
bash ./cleanup.sh

If MicroShift is not stopped cleanly, we are left with mounted volumes and subPaths from pods in the /var/lib/kubelet/pods directory, as follows:

tmpfs on /var/lib/kubelet/pods/c6b82b4a-0047-493b-9cb7-58fc0e1aa57b/volumes/kubernetes.io~projected/kube-api-access-7mwm6 type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/9d0482f5-574a-4f35-9078-37b0bb093606/volumes/kubernetes.io~projected/kube-api-access-z2t58 type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/ead85661-1036-49f9-adb0-8d8a13192d33/volumes/kubernetes.io~projected/kube-api-access-49vp6 type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/cc30d05e-34e8-4a5b-9727-48a044bbe7e9/volumes/kubernetes.io~projected/kube-api-access-q4n7k type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/0886f052-6bfb-47d5-83fc-c3d2bf9178d7/volumes/kubernetes.io~projected/kube-api-access-gftv9 type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/16e47221-6360-4788-a40e-efe289ef19c1/volumes/kubernetes.io~secret/signing-key type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/16e47221-6360-4788-a40e-efe289ef19c1/volumes/kubernetes.io~projected/kube-api-access-d8s86 type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/ead85661-1036-49f9-adb0-8d8a13192d33/volumes/kubernetes.io~secret/metrics-tls type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/0886f052-6bfb-47d5-83fc-c3d2bf9178d7/volumes/kubernetes.io~secret/default-certificate type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/3ecb8db3-0a07-4c4c-a2c3-bd50ee9a403e/volumes/kubernetes.io~projected/kube-api-access-dg6kr type tmpfs (rw,relatime)
/dev/mmcblk0p1 on /var/lib/kubelet/pods/3ecb8db3-0a07-4c4c-a2c3-bd50ee9a403e/volume-subpaths/influxdb-config/influxdb/1 type ext4 (rw,relatime,data=ordered)
tmpfs on /var/lib/kubelet/pods/e5fd0e09-154a-4f2f-aee3-c78ab113f116/volumes/kubernetes.io~projected/kube-api-access-2ghr4 type tmpfs (rw,relatime)
/dev/mmcblk0p1 on /var/lib/kubelet/pods/e5fd0e09-154a-4f2f-aee3-c78ab113f116/volume-subpaths/telegraf-config/telegraf/0 type ext4 (rw,relatime,data=ordered)
tmpfs on /var/lib/kubelet/pods/94bea181-015e-4e28-86ed-9748e311a8d9/volumes/kubernetes.io~projected/kube-api-access-hzzfg type tmpfs (rw,relatime)
/dev/mmcblk0p1 on /var/lib/kubelet/pods/94bea181-015e-4e28-86ed-9748e311a8d9/volume-subpaths/grafana-volume/grafana/0 type ext4 (rw,relatime,data=ordered)
/dev/mmcblk0p1 on /var/lib/kubelet/pods/94bea181-015e-4e28-86ed-9748e311a8d9/volume-subpaths/grafana-volume/grafana/1 type ext4 (rw,relatime,data=ordered)
/dev/mmcblk0p1 on /var/lib/kubelet/pods/94bea181-015e-4e28-86ed-9748e311a8d9/volume-subpaths/grafana-volume/grafana/2 type ext4 (rw,relatime,data=ordered)

The cleanup.sh also unmounts these volumes and deletes the pod directories to prevent errors with orphaned pods when you restart MicroShift.

    echo "Unmounting /var/lib/kubelet/pods/..."
    mount | grep "^tmpfs.* on /var/lib/kubelet/pods/" | awk "{print \$3}" | xargs -n1 -r umount
    mount | grep "^/dev/.* on /var/lib/kubelet/pods/" | awk "{print \$3}" | xargs -n1 -r umount
    rm -rf /var/lib/kubelet/pods/*

It also deletes the directories in /var/hpvolumes used for persistent volumes by kubevirt-hostpath-provisioner.

    rm -rf /var/hpvolumes/*

Containerized MicroShift

We can run MicroShift within containers in two ways:
  1. MicroShift Containerized – The MicroShift binary runs in a Docker container, CRI-O Systemd service runs directly on the host and data is stored at /var/lib/microshift and /var/lib/kubelet on the host VM.
  2. MicroShift Containerized All-In-One – The MicroShift binary and CRI-O service run within a Docker container and data is stored in a docker volume, microshift-data. This should be used for “Testing and Development” only. The image available in the registry is not setup to use the GPU within the container with cri-o.

Since we currently cannot use the GPU in the latter, we do not use the All-In-One image. For the first approach, CRI-O runs on the host. We already set up CRI-O on the host to use the NVIDIA container runtime, and will therefore use the first approach, which allows the GPU. We will build the image using docker with the Dockerfile.jetsonnano.containerized (from registry.access.redhat.com/ubi8/ubi-init:8.4) and Dockerfile.jetsonnano.containerized2 (from registry.access.redhat.com/ubi8/ubi-minimal:8.4). Note that we use iptables-1.6.2, which is compatible with the iptables setup on the Jetson Nano, instead of iptables v1.8.7, which causes the error “iptables v1.8.7 (nf_tables) Could not fetch rule set generation id: Invalid argument”. Copy the microshift binary that we built earlier to the local directory and run the docker build command as shown below:

# Quote 'EOF' so that ${IMAGE_NAME} is resolved by docker build, not expanded by the shell
cat << 'EOF' > Dockerfile.jetsonnano.containerized
ARG IMAGE_NAME=registry.access.redhat.com/ubi8/ubi-init:8.4
ARG ARCH
FROM ${IMAGE_NAME}

COPY microshift /usr/bin/microshift
RUN chmod +x /usr/bin/microshift

RUN dnf install -y libnetfilter_conntrack libnfnetlink && \
      rpm -v -i --force https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/28/Everything/aarch64/os/Packages/i/iptables-libs-1.6.2-2.fc28.aarch64.rpm \
      https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/28/Everything/aarch64/os/Packages/i/iptables-1.6.2-2.fc28.aarch64.rpm

ENTRYPOINT ["/usr/bin/microshift"]
CMD ["run"]
EOF

cp `which microshift` .
docker build -t docker.io/karve/microshift:jetson-nano-containerized -f Dockerfile.jetsonnano.containerized .
docker build -t docker.io/karve/microshift:jetson-nano-containerized2 -f Dockerfile.jetsonnano.containerized2 .

Check the sizes of the images produced for both:

root@nano:~/microshift/hack/all-in-one# docker images
REPOSITORY             TAG                          IMAGE ID       CREATED         SIZE
karve/microshift       jetson-nano-containerized2   a57e64b63a9a   4 seconds ago   562MB
karve/microshift       jetson-nano-containerized    1e152d79ff65   3 minutes ago   616MB

Run the microshift container

IMAGE=docker.io/karve/microshift:jetson-nano-containerized

docker run --rm --ipc=host --network=host --privileged -d --name microshift -v /var/run:/var/run -v /sys:/sys:ro -v /var/lib:/var/lib:rw,rshared -v /lib/modules:/lib/modules -v /etc:/etc -v /var/hpvolumes:/var/hpvolumes -v /run/containers:/run/containers -v /var/log:/var/log -e KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig $IMAGE

export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig

We can see the microshift container running within docker:

root@nano:~# docker ps -a
CONTAINER ID   IMAGE                                        COMMAND                  CREATED          STATUS          PORTS     NAMES
8c924bf44174   karve/microshift:jetson-nano-containerized   "/usr/bin/microshift…"   3 minutes ago   Up 3 minutes             microshift

The microshift process is running within the container:

root@nano:~# docker top microshift -o pid,cmd
PID                 CMD
19997               /usr/bin/microshift run

The rest of the containers run within cri-o on the host:

root@nano:~# crictl pods
POD ID              CREATED             STATE               NAME                                  NAMESPACE                       ATTEMPT             RUNTIME
b678938b7a6a2       3 minutes ago       Ready               dns-default-j7lgj                     openshift-dns                   0                   (default)
01cc8ddd857f8       3 minutes ago       Ready               router-default-85bcfdd948-5x6vf       openshift-ingress               0                   (default)
09a5cce9af718       4 minutes ago       Ready               kube-flannel-ds-8qn5h                 kube-system                     0                   (default)
94809dd53ee44       4 minutes ago       Ready               node-resolver-57xzk                   openshift-dns                   0                   (default)
4616c0c2b7151       4 minutes ago       Ready               service-ca-76674bfb58-bqcf8           openshift-service-ca            0                   (default)
8cdd245d69c96       4 minutes ago       Ready               kubevirt-hostpath-provisioner-jg5pc   kubevirt-hostpath-provisioner   0                   (default)

Now, we can run the samples shown earlier.

After we are done, we can stop the microshift container; the --rm flag we used in docker run will delete the container when it stops.

docker stop microshift

After it is stopped, we can run the cleanup.sh as in the previous section.

Conclusion

In this Part 7, we saw how to build and run MicroShift directly on the Jetson Nano with Ubuntu 20.04. We ran samples that used a persistent volume for the time series database InfluxDB, the GPU for inferencing, and the USB camera. We worked with samples (with and without GPU) that sent pictures and web socket messages to Node Red on IBM Cloud when a person was detected. In Part 8, we will work with MicroShift on the Raspberry Pi 4 with balenaOS.

Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about your use of MicroShift on ARM devices, and whether you would like to see something covered in more detail.
