MicroShift – Part 3: Installing on a Jetson Nano with Ubuntu 18.04

By Alexei Karve posted Mon November 29, 2021 07:47 AM

  

Building and Running MicroShift on a Jetson Nano with Ubuntu 18.04

Introduction

MicroShift is a research project that is exploring how the OpenShift OKD Kubernetes distribution can be optimized for small form factor devices and edge computing. In Part 1 of this series, we built and deployed MicroShift on a Fedora 35 Virtual Machine in VirtualBox and in an Ubuntu 20.04 VM using Multipass on a MacBook Pro. We saw MicroShift run directly on the host VM and the two options to run it containerized. In Part 2 of this series, we set up the Jetson Nano with Ubuntu 18.04 and installed the dependencies for MicroShift. We worked directly with CRI-O using the crictl CLI. In this Part 3, we will build and deploy MicroShift on a Jetson Nano Developer Kit.

MicroShift is still in its early days and moving fast. Features are missing. Things break. But it sure is fun to get your hands dirty.

Build the MicroShift binary for arm64 on Ubuntu 18.04

We can use the prebuilt microshift binary for arm64, pull a prebuilt MicroShift image that can be run using docker, or build the binary ourselves on the Jetson Nano. Let’s build the microshift binary directly on the Jetson Nano. We already installed the dependencies in Part 2. Log in to the Jetson Nano, clone the microshift repository from GitHub, install golang, and run make.

ssh dlinano@$ipaddress
sudo su -

apt -y install build-essential curl libgpgme-dev pkg-config libseccomp-dev

# Install golang
wget https://golang.org/dl/go1.17.2.linux-arm64.tar.gz
rm -rf /usr/local/go && tar -C /usr/local -xzf go1.17.2.linux-arm64.tar.gz
rm -f go1.17.2.linux-arm64.tar.gz
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
cat << 'EOF' >> /root/.bashrc
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
EOF
mkdir -p $GOPATH

git clone https://github.com/thinkahead/microshift.git
cd microshift
make
ls -las microshift # binary in current directory /root/microshift

Output:

root@jetsonnano:~/microshift# make
fatal: No names found, cannot describe anything.
go build -mod=vendor -tags 'include_gcs include_oss containers_image_openpgp gssapi providerless netgo osusergo' -ldflags "-X k8s.io/component-base/version.gitMajor=1 -X k8s.io/component-base/version.gitMajor=1 -X k8s.io/component-base/version.gitMinor=21 -X k8s.io/component-base/version.gitVersion=v1.21.0 -X k8s.io/component-base/version.gitCommit=c3b9e07a -X k8s.io/component-base/version.gitTreeState=clean -X k8s.io/component-base/version.buildDate=2021-11-25T13:23:20Z -X k8s.io/client-go/pkg/version.gitMajor=1 -X k8s.io/client-go/pkg/version.gitMinor=21 -X k8s.io/client-go/pkg/version.gitVersion=v1.21.1 -X k8s.io/client-go/pkg/version.gitCommit=b09a9ce3 -X k8s.io/client-go/pkg/version.gitTreeState=clean -X k8s.io/client-go/pkg/version.buildDate=2021-11-25T13:23:20Z -X github.com/openshift/microshift/pkg/version.versionFromGit=4.8.0-0.microshift-unknown -X github.com/openshift/microshift/pkg/version.commitFromGit=d99aa256 -X github.com/openshift/microshift/pkg/version.gitTreeState=clean -X github.com/openshift/microshift/pkg/version.buildDate=2021-11-25T13:23:21Z -s -w" github.com/openshift/microshift/cmd/microshift
root@jetsonnano:~/microshift# ./microshift version
MicroShift Version: 4.8.0-0.microshift-unknown
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117
root@jetsonnano:~/microshift# ls -las microshift
146772 -rwxr-xr-x 1 root root 150291381 Nov 25 13:23 microshift

Move the microshift binary to /usr/local/bin

mv -f microshift /usr/local/bin/.
rm -rf /root/.cache/go-build # Optional Cleanup
cd ..
#rm -rf microshift
/usr/local/bin/microshift version

Output:

root@jetsonnano:~# /usr/local/bin/microshift version
MicroShift Version: 4.8.0-0.microshift-unknown
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117


We may also download the microshift binary from github as follows:

ARCH=arm64
export VERSION=$(curl -sL https://api.github.com/repos/redhat-et/microshift/releases | grep tag_name | head -n 1 | cut -d '"' -f 4) && \
curl -LO https://github.com/redhat-et/microshift/releases/download/$VERSION/microshift-linux-${ARCH}
chmod +x microshift-linux-${ARCH}
mv microshift-linux-${ARCH} /usr/local/bin/microshift


Alternatively, pull the prebuilt microshift image from quay.io and extract the microshift binary to /usr/local/bin.

docker pull quay.io/microshift/microshift:4.8.0-0.microshift-2021-11-19-115908-linux-arm64
id=$(docker create quay.io/microshift/microshift:4.8.0-0.microshift-2021-11-19-115908-linux-arm64)
docker cp $id:/usr/bin/microshift /usr/local/bin/microshift
docker rm -v $id
/usr/local/bin/microshift version

Output:

root@jetsonnano:~# /usr/local/bin/microshift version
MicroShift Version: 4.8.0-0.microshift-2021-11-19-115908
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117

Run MicroShift directly on the Jetson Nano

We have already set up CRI-O in Part 2. Now, we will download kubectl and oc, create the microshift systemd service, and start MicroShift.

# Get kubectl
ARCH=arm64
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/$ARCH/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin

# Get oc
wget https://mirror.openshift.com/pub/openshift-v4/arm64/clients/ocp/candidate/openshift-client-linux.tar.gz
mkdir tmp;cd tmp
tar -zxvf ../openshift-client-linux.tar.gz
mv -f oc /usr/local/bin
cd ..
rm -rf tmp
rm -f openshift-client-linux.tar.gz

mkdir /var/hpvolumes # used by hostpath-provisioner
cp /root/microshift/microshift /usr/local/bin/. # If not already done

mkdir -p /usr/lib/systemd/system
cat << EOF | sudo tee /usr/lib/systemd/system/microshift.service
[Unit]
Description=Microshift
After=crio.service

[Service]
WorkingDirectory=/usr/local/bin/
ExecStart=/usr/local/bin/microshift run
Restart=always
User=root

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable microshift --now
#systemctl start microshift
systemctl status microshift
journalctl -u microshift -f # Ctrl-C to break

mkdir -p $HOME/.kube
if [ -f $HOME/.kube/config ]; then
    mv $HOME/.kube/config $HOME/.kube/config.orig
fi
KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig:$HOME/.kube/config.orig /usr/local/bin/kubectl config view --flatten > $HOME/.kube/config
watch "kubectl get nodes;kubectl get pods -A;crictl pods;crictl images"
#watch "oc get nodes;oc get pods -A;crictl pods;crictl images"

Output when MicroShift is started properly:

watch "kubectl get nodes;kubectl get pods -A;kubectl get pv,pvc -n default;crictl images;crictl pods"

NAME                      STATUS   ROLES    AGE    VERSION
jetson-nano.example.com   Ready    <none>   6m3s   v1.20.1

NAMESPACE                       NAME                                  READY   STATUS    RESTARTS   AGE
kube-system                     kube-flannel-ds-fzfn6                 1/1     Running   0          5m24s
kubevirt-hostpath-provisioner   kubevirt-hostpath-provisioner-98frt   1/1     Running   0          5m26s
openshift-dns                   dns-default-bd92w                     3/3     Running   0          5m25s
openshift-ingress               router-default-79f7dc4c6b-2p4nb       1/1     Running   0          5m25s
openshift-service-ca            service-ca-58798776fb-b7dkb           1/1     Running   0          5m26s

IMAGE                                     TAG       IMAGE ID        SIZE
k8s.gcr.io/pause                          3.6       7d46a07936af9   492kB
quay.io/microshift/coredns                1.6.9     2e234fad5a864   264MB
quay.io/microshift/flannel                v0.14.0   996759f548df5   149MB
quay.io/microshift/hostpath-provisioner   v0.9.0    e96859fbded4f   39.2MB
quay.io/microshift/kube-rbac-proxy        v0.11.0   03509ac20d4d7   41.5MB
quay.io/microshift/openshift-router       4.5       2ade343656684   123MB
quay.io/microshift/service-ca-operator    latest    0fedc7575c705   152MB

POD ID          CREATED         STATE   NAME                                  NAMESPACE                       ATTEMPT   RUNTIME
d1dcfec0e2bc2   4 minutes ago   Ready   dns-default-bd92w                     openshift-dns                   0         (default)
e02d8d7847572   4 minutes ago   Ready   router-default-79f7dc4c6b-2p4nb       openshift-ingress               0         (default)
a9ae0fe8b5a14   5 minutes ago   Ready   kube-flannel-ds-fzfn6                 kube-system                     0         (default)
bd375903e4d38   5 minutes ago   Ready   kubevirt-hostpath-provisioner-98frt   kubevirt-hostpath-provisioner   0         (default)
b1331d994b844   5 minutes ago   Ready   service-ca-58798776fb-b7dkb           openshift-service-ca            0         (default)

Samples to run on MicroShift

We will run a few samples that show the use of Helm, a persistent volume, the GPU, and the USB camera.

1. MySQL database server

Download Helm and run the MySQL server in a container with a hostpath persistent volume, along with a MySQL client container.

# Install helm
curl -o helm-v3.5.2-linux-arm64.tar.gz  https://get.helm.sh/helm-v3.5.2-linux-arm64.tar.gz
tar -zxvf helm-v3.5.2-linux-arm64.tar.gz
cp linux-arm64/helm /usr/local/bin
rm -rf linux-arm64
rm -f helm-v3.5.2-linux-arm64.tar.gz
chmod 600 /var/lib/microshift/resources/kubeadmin/kubeconfig
chmod 600 /root/.kube/config

# Add the repo for mysql helm chart
helm repo add stable https://charts.helm.sh/stable

oc project default
# Install mysql with provided image tag (hacky way to use the sha256 tag for the arm64 image) and my-user as the userid with custom passwords for root and my-user
helm install mysql stable/mysql --set mysqlRootPassword=secretpassword,mysqlUser=my-user,mysqlPassword=my-password,mysqlDatabase=my-database --set persistence.enabled=true --set storageClass=kubevirt-hostpath-provisioner --set image=mysql/mysql-server@sha256 --set imageTag=5e373bcea878b3657937c68cdefa8a1504f53e356ac19a3e51bf515e41e0c48c
helm list

# Remember to delete the /var/hpvolumes/mysql if it already exists (otherwise it will use old password from previous run)
rm -rf /var/hpvolumes/mysql

cat << EOF > hostpathpv.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hostpath-provisioner
spec:
  #storageClassName: "kubevirt-hostpath-provisioner"
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/var/hpvolumes/mysql"
...
EOF

# Create the persistent volume
kubectl apply -f hostpathpv.yaml

# Wait for the pod to be Running
kubectl get pods -n default -w

kubectl get svc mysql # Note down the CLUSTER-IP
export ipofmysqlserver=<from above>
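# Optionally, capture the CLUSTER-IP automatically instead of copying it by hand
# (a convenience not in the original steps)
export ipofmysqlserver=$(kubectl get svc mysql -n default -o jsonpath='{.spec.clusterIP}')
echo $ipofmysqlserver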

# Start a client container and install the mysql client within it and login using the my-user userid
kubectl run -i --tty ubuntu --image=ubuntu:18.04 --restart=Never -- bash -il
apt-get update && apt-get install mysql-client -y
# replace $ipofmysqlserver with the CLUSTER-IP noted above
mysql -h$ipofmysqlserver -umy-user -pmy-password
quit
exit
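# Optionally confirm on the Jetson Nano that MySQL wrote its data to the hostpath volume
ls -la /var/hpvolumes/mysql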
oc delete pod ubuntu # Delete the client pod
helm delete mysql # Delete the deployment
oc delete -f hostpathpv.yaml # Delete the persistent volume

2. Nginx web server

Create the file nginx.yaml with the deployment and service

cat << EOF > nginx.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginxinc/nginx-unprivileged:alpine # arm64 image
        ports:
        - containerPort: 8080
        # resource required for hpa
        resources:
          requests:
            memory: 128M
            cpu: 125m
          limits:
            memory: 1024M
            cpu: 1000m
---
apiVersion: v1
kind: Service
metadata:
 name: nginx-svc
 labels:
   app: nginx
spec:
 type: NodePort
 ports:
 - port: 8080
   nodePort: 30080
 selector:
   app: nginx
...
EOF

# Create the deployment and service. Test it.
kubectl apply -f nginx.yaml
kubectl get svc nginx-svc # see the port 8080:30080

Now we can access nginx on the NodePort 30080

curl localhost:30080

If we add the following line to /etc/hosts on the Jetson Nano

127.0.0.1       localhost nginx-svc-default.cluster.local

and run

oc expose svc nginx-svc

we can access nginx on port 80 using the route on the Jetson Nano

curl localhost

Finally, on your Laptop/MacBook Pro, you can add a line to /etc/hosts mapping the IP address of the Jetson Nano to nginx-svc-default.cluster.local, and then access nginx at http://nginx-svc-default.cluster.local/
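
For example, if the Jetson Nano is at 192.168.1.208 (replace with the IP address of your Jetson Nano), the commands on the Laptop might look like this:

echo "192.168.1.208 nginx-svc-default.cluster.local" | sudo tee -a /etc/hosts
curl http://nginx-svc-default.cluster.local/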

We can delete nginx deployment and route with:

oc delete -f nginx.yaml
oc delete route nginx-svc

3. Devicequery

Create a Job using devicequery.yaml. The Dockerfile used to create the devicequery:arm64-jetsonnano image was shown earlier in the CRI-O samples in Part 2.

cat << EOF > devicequery.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: devicequery-job
spec:
  parallelism: 1
  completions: 1
  activeDeadlineSeconds: 1800
  backoffLimit: 6
  template:
    metadata:
      labels:
        app: devicequery
    spec:
      containers:
      - name: devicequery
        image: docker.io/karve/devicequery:arm64-jetsonnano
      restartPolicy: OnFailure
EOF
oc apply -f devicequery.yaml
oc get job/devicequery-job

Wait for the job to complete. The output shows that the CUDA device was detected within the container:

oc logs job/devicequery-job

Output:

oc logs job/devicequery-job
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3956 MBytes (4148273152 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

Delete the devicequery job

oc delete -f devicequery.yaml

4. VectorAdd

Create vectoradd.yaml and run the job. The Dockerfile for the vector-add-sample:arm64-jetsonnano image was shown earlier in Part 2 under the CRI-O samples.

cat << EOF > vectoradd.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: vectoradd-job
spec:
  parallelism: 1
  completions: 1
  activeDeadlineSeconds: 1800
  backoffLimit: 6
  template:
    metadata:
      labels:
        app: vectoradd
    spec:
      containers:
      - name: vectoradd
        image: docker.io/karve/vector-add-sample:arm64-jetsonnano
      restartPolicy: OnFailure
EOF
oc apply -f vectoradd.yaml
oc get job/vectoradd-job

Wait for the job to be completed

oc logs job/vectoradd-job -f

The output shows the vector addition of 50000 elements on the CUDA device:

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Delete the vectoradd job

oc delete -f vectoradd.yaml

5. Jupyter Lab to access USB camera on /dev/video0

Create the file jupyter.yaml with the deployment and service

cat << EOF > jupyter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deployment
spec:
  selector:
    matchLabels:
      app: jupyter
  replicas: 1
  template:
    metadata:
      labels:
        app: jupyter
    spec:
      containers:
      - name: jupyter
        image: nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash", "-c", "jupyter lab --LabApp.token='' --LabApp.password='' --ip 0.0.0.0 --port 8888 --allow-root &> /var/log/jupyter.log && sleep infinity"]
        securityContext:
          privileged: true
          #allowPrivilegeEscalation: false
          #capabilities:
          #  drop: ["ALL"]
        ports:
        - containerPort: 8888
        # resource required for hpa
        resources:
          requests:
            memory: 128M
            cpu: 125m
          limits:
            memory: 2048M
            cpu: 1000m
        volumeMounts:
          - name: dev-video0
            mountPath: /dev/video0
      volumes:
        - name: dev-video0
          hostPath:
            path: /dev/video0

---
apiVersion: v1
kind: Service
metadata:
 name: jupyter-svc
 labels:
   app: jupyter
spec:
 type: NodePort
 ports:
 - port: 8888
   nodePort: 30080
 selector:
   app: jupyter
EOF
oc apply -f jupyter.yaml
oc expose svc jupyter-svc

Now add a line to /etc/hosts on your Laptop/MacBook Pro mapping the IP address of the Jetson Nano to jupyter-svc-default.cluster.local, and access JupyterLab at http://jupyter-svc-default.cluster.local/lab?

Navigate to the hello_camera/usb_camera.ipynb and run the notebook.

You can delete the jupyterlab with:

oc delete -f jupyter.yaml

6. Install Metrics Server

Installing the Metrics Server from metrics-server-components.yaml enables us to run the “kubectl top” and “oc adm top” commands.

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server-components.yaml
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
kubectl apply -f metrics-server-components.yaml

# Wait for the metrics-server to start in the kube-system namespace
kubectl get deployment metrics-server -n kube-system
kubectl get events -n kube-system

# Wait for a couple of minutes for metrics to be collected
kubectl top nodes;kubectl top pods -A
oc adm top nodes;oc adm top pods -A

Output:

watch "kubectl top nodes;kubectl top pods -A"

NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
jetsonnano   902m         22%    2220Mi          57%

NAMESPACE                       NAME                                  CPU(cores)   MEMORY(bytes)
kube-system                     metrics-server-dbf765b9b-8p6wr        15m          17Mi
kubevirt-hostpath-provisioner   kubevirt-hostpath-provisioner-fsmkm   2m           9Mi
openshift-dns                   dns-default-lqktl                     10m          25Mi
openshift-dns                   node-resolver-d95pz                   0m           7Mi
openshift-ingress               router-default-85bcfdd948-khkpd       6m           38Mi
openshift-service-ca            service-ca-76674bfb58-rkcm5           16m          43Mi

7. Object Detection demo to send pictures and web socket messages to Node Red


The object detection sample detects objects in the camera feed. When a person is detected, it sends a WebSocket message with the bounding box information and a picture to Node Red.

Let’s install Node Red on IBM Cloud. We will use Node Red to show pictures and chat messages sent from the Jetson Nano. Alternatively, we can use the Node Red that we deployed as an application in MicroShift on the MacBook Pro in VirtualBox in Part 1.

  1. Create an IBM Cloud free tier account at https://www.ibm.com/cloud/free and login to Console (top right).
  2. Create an API Key and save it, Manage->Access->IAM->API Key->Create an IBM Cloud API Key
  3. Click on Catalog and Search for "Node-Red App", select it and click on "Get Started"
  4. Give a unique App name, for example xxxxx-node-red and select the region nearest to you
  5. Select the Lite pricing plan; if you already have an existing instance of Cloudant, you may select it under Pricing Plan
  6. Click Create
  7. Under Deployment Automation -> Configure Continuous Delivery, click on "Deploy your app"
  8. Select the deployment target Cloud Foundry that provides a Free-Tier of 256 MB cost-free or Code Engine. The latter has monthly limits and takes more time to deploy. [ Note: Cloud Foundry is deprecated, use the IBM Cloud Code Engine. Any IBM Cloud Foundry application runtime instances running IBM Cloud Foundry applications will be permanently disabled and deprovisioned ]
  9. Enter the IBM Cloud API Key from Step 2, or click on "New" to create one
  10. The rest of the fields Region, Organization, Space will automatically get filled up. Use the default 256MB Memory and click "Next"
  11. In "Configure the DevOps toolchain", click Create
  12. Wait for 10 minutes for the Node Red instance to start
  13. Click on the "Visit App URL"
  14. On the Node Red page, create a new userid and password
  15. In Manage Palette, install the node-red-contrib-image-tools, node-red-contrib-image-output, and node-red-node-base64
  16. Import the Chat flow and the Picture (Image) display flow. On the Chat flow, you will need to edit the template node line 35 to use wss:// (on IBM Cloud) instead of ws:// (on your Laptop)
  17. In another browser tab, open https://mynodered.mybluemix.net/chat (replace mynodered with your IBM Cloud Node Red URL)
  18. On the Image flow, click on the square box to the right of image preview or viewer to Deactivate and Activate the Node. You will be able to see the picture when you Activate the Node


Running the sample directly on Jetson Nano

Set up jetson-inference on the Jetson Nano

sudo su -
apt-get update
apt-get install git cmake libpython3-dev python3-numpy
cd /
git clone --recursive https://github.com/dusty-nv/jetson-inference
cd jetson-inference
mkdir build
cd build
cmake ../
make -j$(nproc)
make install
ldconfig

Clone the object-detection sample

cd ~
git clone https://github.com/thinkahead/microshift.git
cd microshift/jetson-nano/tests/object-detection
pip3 install websocket-client

Edit the URLs in my-detection.py and/or my-detection2.py to point to your Node Red instance. The sample loads the model from /usr/local/bin/networks/SSD-Mobilenet-v2

python3 my-detection.py # Ctrl-C multiple times to stop
python3 my-detection2.py # Ctrl-C multiple times to stop

Look at the Chat application and the Picture flow started in Node Red.


Running the demo in Docker

Now, let’s try this demo from Docker. The docker/run.sh script can download the models into the container. However, to run the container without docker/run.sh, or to run it in MicroShift without attaching the data directory as a volume, we copy the SSD-Mobilenet-v2 model to the local folder and have the Dockerfile copy it into the image.
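
The Dockerfile for this image is in the object-detection folder of the repository. As a rough, hypothetical sketch only (the base image name and the file list below are assumptions, not copied from the repository), it needs to do something like the following:

FROM dustynv/jetson-inference:r32.6.1
# Copy the model into the image so that no volume mount is needed at runtime
COPY SSD-Mobilenet-v2 /usr/local/bin/networks/SSD-Mobilenet-v2
# Copy the detection scripts and the runme.sh entrypoint script
COPY my-detection.py my-detection2.py runme.sh /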

cp -r /jetson-inference/data/networks/SSD-Mobilenet-v2 . # if not present in github
docker build -t docker.io/karve/jetson-inference:r32.6.1 .
docker push docker.io/karve/jetson-inference:r32.6.1

cd ~/jetson-inference
docker/run.sh --container karve/jetson-inference:r32.6.1

or

docker run --runtime nvidia --rm --privileged -it karve/jetson-inference:r32.6.1 bash
./runme.sh


Running the demo in MicroShift

You can update the WebSocketURL, ImageUploadURL and VideoSource in inference.yaml to point to your video source and to the URLs of the Node Red on IBM Cloud, or to the Node Red you installed in MicroShift on your Laptop. For the latter, you will need to add hostAliases with the IP address of your Laptop, as sketched below. Then, create the deployment with the oc apply command and watch the Chat application and the Picture flow in Node Red. It takes a couple of minutes to load the model the first time.
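
For the Laptop case, the hostAliases addition to the pod spec in inference.yaml might look roughly like this (the IP address and hostname below are placeholders for your Laptop and the Node Red route, not values from the repository):

    spec:
      hostAliases:
      - ip: "192.168.1.209"
        hostnames:
        - "nodered-svc-default.cluster.local"
      containers:
      - name: inference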

crictl pull docker.io/karve/jetson-inference:r32.6.1  # Optional
oc apply -f inference.yaml

To stop this object-detection sample, we can delete the deployment

oc delete -f inference.yaml

We can also run the object-detection demo without the GPU for comparison.

Smarter-Device-Manager

Applications running inside a container do not have access to device drivers unless explicitly given access. Smarter-device-manager enables controlled access to devices (device drivers) on the node for containers deployed using Kubernetes. In the object detection sample above, we used a deployment with a privileged securityContext, which we would like to avoid. With docker we can use --device /dev/video0:/dev/video0, but Kubernetes has no --device equivalent. Instead of setting privileged: true in the securityContext, we can use smarter-device-manager to run the Object Detection demo from above without privileged mode. The inference-sdm.yaml shows the modified deployment, sketched below. The daemonset and a custom configmap for smarter-device-manager need to be created in some namespace (we use sdm).
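
In place of the privileged securityContext, the container in inference-sdm.yaml requests the camera as an extended resource. The relevant stanza looks roughly like this (a sketch based on the smarter-devices/video0 resource advertised by smarter-device-manager, not copied verbatim from the file):

      containers:
      - name: inference
        resources:
          limits:
            smarter-devices/video0: 1
          requests:
            smarter-devices/video0: 1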

We first install the smarter device manager and label the node to enable it.

cd ~/microshift/jetson-nano/tests/object-detection
oc apply -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
oc label node microshift.example.com smarter-device-manager=enabled
oc get ds,pods -n sdm

Output

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc apply -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
namespace/sdm created
daemonset.apps/smarter-device-manager created
configmap/smarter-device-manager created
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc label node microshift.example.com smarter-device-manager=enabled --overwrite
node/microshift.example.com labeled
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc logs -n sdm ds/smarter-device-manager
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc get ds,pods -n sdm
NAME                                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                    AGE
daemonset.apps/smarter-device-manager   1         1         1       1            1           smarter-device-manager=enabled   25s

NAME                               READY   STATUS    RESTARTS   AGE
pod/smarter-device-manager-jm9dh   1/1     Running   0          25s

We can see the Capacity (20), Allocatable (20), and Allocated (0) for smarter-devices/video0 (along with other devices).

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc describe nodes
Name:               microshift.example.com
…
Capacity:
  cpu:                                           4
  ephemeral-storage:                             59964524Ki
  hugepages-2Mi:                                 0
  memory:                                        4051048Ki
…
  smarter-devices/video0:                        20
Allocatable:
  cpu:                                           4
  ephemeral-storage:                             55263305227
  hugepages-2Mi:                                 0
  memory:                                        3948648Ki
…
  smarter-devices/video0:                        20
Allocated resources:
…
  smarter-devices/video0                        0                0

Now we can create the new deployment

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc apply -f inference-sdm.yaml
deployment.apps/inference-deployment created

If we describe the node again, we will see that the video0 has been allocated in “Allocated resources” and we will see the pictures and web socket messages being sent to Node Red.

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc describe nodes
Name:               microshift.example.com
…
Allocated resources:
…
  smarter-devices/video0                        1                1

Let’s delete the deployment

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc delete -f inference-sdm.yaml
deployment.apps "inference-deployment" deleted

If we disable the smarter-device-manager on the node and try the deployment again, the pod will remain in Pending state

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc label node microshift.example.com smarter-device-manager=disabled --overwrite
node/microshift.example.com labeled
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc apply -f inference-sdm.yaml
deployment.apps/inference-deployment created
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc get pods,deploy
NAME                                        READY   STATUS    RESTARTS   AGE
pod/inference-deployment-757d7c848c-nb5bt   0/1     Pending   0          69s

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/inference-deployment   0/1     1            0           69s

We need to enable the label again for the pod to show Running status and the deployment to get to Ready state.

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc label node microshift.example.com smarter-device-manager=enabled --overwrite
node/microshift.example.com labeled
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc get pods,deploy
NAME                                        READY   STATUS    RESTARTS   AGE
pod/inference-deployment-757d7c848c-nb5bt   1/1     Running   0          3m4s

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/inference-deployment   1/1     1            1           3m4s

Finally, we can delete the sample and the daemonset smarter-device-manager.

root@microshift:~/microshift/jetson-nano/tests/object-detection# oc delete -f inference-sdm.yaml
deployment.apps "inference-deployment" deleted
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc delete -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
namespace "sdm" deleted
daemonset.apps "smarter-device-manager" deleted
configmap "smarter-device-manager" deleted

Using the NVIDIA/k8s-device-plugin

You may download the preconfigured nvidia-device-plugin.yml that points to a prebuilt image and skip to “Apply” below, or build the plugin yourself. To build, we can use the instructions from NVIDIA K8s Device Plugin for Wind River Linux to create a custom device plugin that allows the cluster to expose the number of GPUs on NVIDIA Jetson devices. The patch checks for the file /sys/module/tegra_fuse/parameters/tegra_chip_id and does not perform health checks on Jetson.

git clone -b 1.0.0-beta6 https://github.com/NVIDIA/k8s-device-plugin.git
cd k8s-device-plugin/
wget https://labs.windriver.com/downloads/0001-arm64-add-support-for-arm64-architectures.patch
wget https://labs.windriver.com/downloads/0002-nvidia-Add-support-for-tegra-boards.patch
wget https://labs.windriver.com/downloads/0003-main-Add-support-for-tegra-boards.patch
git am 000*.patch
sed "s/ubuntu:16.04/ubuntu:18.04/" docker/arm64/Dockerfile.ubuntu16.04 > docker/arm64/Dockerfile.ubuntu18.04
docker build -t karve/k8s-device-plugin:1.0.0-beta6 -f docker/arm64/Dockerfile.ubuntu18.04 .
docker push karve/k8s-device-plugin:1.0.0-beta6
sed -i "s|image: .*|image: karve/k8s-device-plugin:1.0.0-beta6|" nvidia-device-plugin.yml # Change the image to karve/k8s-device-plugin:1.0.0-beta6

Apply

oc apply -f nvidia-device-plugin.yml
oc get ds -n kube-system nvidia-device-plugin-daemonset

Output

NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
nvidia-device-plugin-daemonset   1         1         1       1            1           

With the daemonset deployed, NVIDIA GPUs can now be requested by a container using the nvidia.com/gpu resource type. The “oc describe nodes” output now shows nvidia.com/gpu under Capacity, Allocatable, and Allocated resources. If we deploy the vector-add job with the resource limit (see the sketch below), we will see in the events that only one pod of the job gets scheduled at a time even though parallelism was set to 5. When one pod finishes, the next one runs.
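
The relevant parts of vectoradd-gpu-limit.yaml look roughly like the following (a sketch assembled from the vectoradd job shown earlier with the GPU limit and parallelism added; not copied verbatim from the file):

spec:
  parallelism: 5
  template:
    spec:
      containers:
      - name: vectoradd
        image: docker.io/karve/vector-add-sample:arm64-jetsonnano
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure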

root@microshift:~/microshift/jetson-nano/jobs# oc apply -f vectoradd-gpu-limit.yaml
job.batch/vectoradd-job created
root@microshift:~/microshift/jetson-nano/jobs# oc get events -n default
LAST SEEN   TYPE      REASON             OBJECT                    MESSAGE
33s         Warning   FailedScheduling   pod/vectoradd-job-7n2xz   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Warning   FailedScheduling   pod/vectoradd-job-7n2xz   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
34s         Normal    Scheduled          pod/vectoradd-job-l9cjw   Successfully assigned default/vectoradd-job-l9cjw to microshift.example.com
24s         Normal    Pulled             pod/vectoradd-job-l9cjw   Container image "docker.io/karve/vector-add-sample:arm64-jetsonnano" already present on machine
21s         Normal    Created            pod/vectoradd-job-l9cjw   Created container vectoradd
21s         Normal    Started            pod/vectoradd-job-l9cjw   Started container vectoradd
33s         Warning   FailedScheduling   pod/vectoradd-job-tnmvs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Warning   FailedScheduling   pod/vectoradd-job-tnmvs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
33s         Warning   FailedScheduling   pod/vectoradd-job-wtgnn   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Warning   FailedScheduling   pod/vectoradd-job-wtgnn   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
7s          Normal    Scheduled          pod/vectoradd-job-wtgnn   Successfully assigned default/vectoradd-job-wtgnn to microshift.example.com
34s         Warning   FailedScheduling   pod/vectoradd-job-zwjfs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
32s         Warning   FailedScheduling   pod/vectoradd-job-zwjfs   0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s         Normal    Scheduled          pod/vectoradd-job-zwjfs   Successfully assigned default/vectoradd-job-zwjfs to microshift.example.com
9s          Normal    Pulled             pod/vectoradd-job-zwjfs   Container image "docker.io/karve/vector-add-sample:arm64-jetsonnano" already present on machine
9s          Normal    Created            pod/vectoradd-job-zwjfs   Created container vectoradd
8s          Normal    Started            pod/vectoradd-job-zwjfs   Started container vectoradd
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-l9cjw
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-wtgnn
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-zwjfs
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-7n2xz
34s         Normal    SuccessfulCreate   job/vectoradd-job         Created pod: vectoradd-job-tnmvs

Cleanup MicroShift

We can delete MicroShift and the images in CRI-O with the cleanup.sh script:

wget https://raw.githubusercontent.com/thinkahead/microshift/main/hack/cleanup.sh
bash ./cleanup.sh

Containerized MicroShift

We can run MicroShift within containers in two ways:

  1. MicroShift Containerized – The MicroShift binary runs in a Docker container, CRI-O Systemd service runs directly on the host and data is stored at /var/lib/microshift and /var/lib/kubelet on the host VM.
  2. MicroShift Containerized All-In-One – The MicroShift binary and CRI-O service run within a Docker container and data is stored in a docker volume, microshift-data. This should be used for “Testing and Development” only. The image available in the registry is not set up to use the GPU within the container with CRI-O.

Since we cannot use the GPU in the latter, we do not use the All-In-One image. For the first approach, CRI-O runs on the host. We already set up CRI-O on the host to use the NVIDIA container runtime in Part 2 of this series, so we will use the first approach, which allows the GPU. We will build the image with Dockerfile.jetsonnano.containerized (from registry.access.redhat.com/ubi8/ubi-init:8.4). Note that we use iptables-1.6.2, which is compatible with iptables on the Jetson Nano with Ubuntu 18.04, instead of iptables v1.8.7, which causes the error “iptables v1.8.7 (nf_tables) Could not fetch rule set generation id: Invalid argument”. We do not use the default Dockerfile for building the image because of this iptables problem. Copy the microshift binary that we built earlier to the local directory and run the docker build command as shown below:

cat << EOF > Dockerfile.jetsonnano.containerized
ARG IMAGE_NAME=registry.access.redhat.com/ubi8/ubi-init:8.4
ARG ARCH
FROM ${IMAGE_NAME}

COPY microshift /usr/bin/microshift
RUN chmod +x /usr/bin/microshift

RUN dnf install -y libnetfilter_conntrack libnfnetlink && \
      rpm -v -i --force https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/28/Everything/aarch64/os/Packages/i/iptables-libs-1.6.2-2.fc28.aarch64.rpm \
      https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/28/Everything/aarch64/os/Packages/i/iptables-1.6.2-2.fc28.aarch64.rpm

ENTRYPOINT ["/usr/bin/microshift"]
CMD ["run"]
EOF

cp `which microshift` .
docker build -t docker.io/karve/microshift:jetson-nano-containerized -f Dockerfile.jetsonnano.containerized .


Similarly build using the Dockerfile.jetsonnano.containerized2 (from registry.access.redhat.com/ubi8/ubi-minimal:8.4). Check the sizes of the images produced from the ubi-init:8.4 and the ubi-minimal:8.4. We save 54 MB with the ubi-minimal image.

root@microshift:~/microshift/hack/all-in-one# docker images
REPOSITORY                TAG                          IMAGE ID       CREATED          SIZE
karve/microshift          jetson-nano-containerized2   024b990fd269   9 minutes ago    560MB
karve/microshift          jetson-nano-containerized    ed85c5153b65   4 hours ago      614MB

Run the microshift container

IMAGE=docker.io/karve/microshift:jetson-nano-containerized

docker run --rm --ipc=host --network=host --privileged -d --name microshift -v /var/run:/var/run -v /sys:/sys:ro -v /var/lib:/var/lib:rw,rshared -v /lib/modules:/lib/modules -v /etc:/etc -v /run/containers:/run/containers -v /var/log:/var/log -e KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig $IMAGE

export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig

We can see the microshift container running within docker:

root@jetsonnano:~# docker ps -a
CONTAINER ID   IMAGE                                        COMMAND                  CREATED          STATUS          PORTS     NAMES
8c924bf44174   karve/microshift:jetson-nano-containerized   "/usr/bin/microshift…"   3 minutes ago   Up 3 minutes             microshift

The microshift process is running within the container:

root@jetsonnano:~# docker top microshift -o pid,cmd
PID                 CMD
19997               /usr/bin/microshift run

The rest of the containers run within cri-o on the host:

root@jetsonnano:~# crictl pods
POD ID            CREATED           STATE   NAME                                  NAMESPACE                       ATTEMPT  RUNTIME
b678938b7a6a2     3 minutes ago     Ready   dns-default-j7lgj                     openshift-dns                   0        (default)
01cc8ddd857f8     3 minutes ago     Ready   router-default-85bcfdd948-5x6vf       openshift-ingress               0        (default)
09a5cce9af718     4 minutes ago     Ready   kube-flannel-ds-8qn5h                 kube-system                     0        (default)
94809dd53ee44     4 minutes ago     Ready   node-resolver-57xzk                   openshift-dns                   0        (default)
4616c0c2b7151     4 minutes ago     Ready   service-ca-76674bfb58-bqcf8           openshift-service-ca            0        (default)
8cdd245d69c96     4 minutes ago     Ready   kubevirt-hostpath-provisioner-jg5pc   kubevirt-hostpath-provisioner   0        (default)

Now, we can run the samples shown earlier.

After we are done, we can stop the microshift container. The --rm we used in the docker run will delete the container when we stop it.

docker stop microshift

After it is stopped, we can run cleanup.sh as in the previous section.

Errors

1. The node was low on resource: [DiskPressure]

If you have less than 10% free disk space on the microSDXC card, the kubevirt-hostpath-provisioner pod may get evicted. This can happen on a 32GB microSDXC card if enough disk space cannot be reclaimed by deleting unused images. You will need to create space, for example by deleting some of the GitHub sources we downloaded during installation.

rm -rf /root/.cache/go-build # Cleanup to get space on microSDXC card
# You can check the eviction events as follows
kubectl describe nodes
kubectl get events --field-selector involvedObject.kind=Node
kubectl delete events --field-selector involvedObject.kind=Node
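
You can also check how much space is left on the card and which images CRI-O is storing, and remove unused images individually:

df -h /
crictl images
crictl rmi <unused-image-id>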

2. ImageInspectError

If a pod shows the ImageInspectError state, you may be missing /etc/containers/registries.conf. You can add that file, or fully qualify the image with "docker.io/" or the correct registry.
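
A minimal /etc/containers/registries.conf that lets unqualified image names resolve against common registries might look like this (adjust the registry list to your needs):

cat << EOF > /etc/containers/registries.conf
unqualified-search-registries = ["registry.access.redhat.com", "quay.io", "docker.io"]
EOF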

3. Name resolution problems from the container 

Whatever you specify in dnsConfig is mapped to /etc/resolv.conf in the containers. For example, in inference.yaml you may add the following:

    spec:
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 8.8.8.8
      containers:
      - name: inference

4. Error: failed to initialize NVML: could not load NVML library

The nvidia-device-plugin does not work on the Jetson Nano, so directly add the nvidia-container-runtime-hook to CRI-O instead. The failure looks like this:

root@jetson-nano:~/k8s-device-plugin# docker run -it --privileged --network=none -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins docker.io/karve/k8s-device-plugin:arm64-jetsonnano --pass-device-specs
2021/10/13 16:34:03 Loading NVML
2021/10/13 16:34:03 Failed to initialize NVML: could not load NVML library.
2021/10/13 16:34:03 If this is a GPU node, did you set the docker default runtime to `nvidia`?
2021/10/13 16:34:03 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2021/10/13 16:34:03 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2021/10/13 16:34:03 If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
2021/10/13 16:34:03 Error: failed to initialize NVML: could not load NVML library

5. Docker microshift container crashes

The container may crash with “leader election lost”, “etcdserver apply request took too long”, and “http: Handler timeout” in the log messages:

E1130 13:33:24.199060       1 available_controller.go:508] v1.image.openshift.io failed with: failing or missing response from https://192.168.1.208:8444/apis/image.openshift.io/v1: Get "https://192.168.1.208:8444/apis/image.openshift.io/v1": dial tcp 192.168.1.208:8444: connect: connection refused
{"level":"warn","ts":"2021-11-30T13:33:25.921-0500","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"379.944154ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/secrets/kube-system/resourcequota-controller-token-hfhbg\" ","response":"range_response_count:1 size:691"}
E1130 13:59:15.189827       1 writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
F1130 13:59:16.434857       1 controller_manager.go:105] leaderelection lost

This can happen when an application pod is consuming too much memory: the Jetson Nano runs low on memory and starts swapping with high “cpu wait”. You will need to restart the container if it gets killed. It helps to reduce memory usage by removing the Ubuntu desktop as shown in Part 2.
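
You can confirm the memory pressure on the Jetson Nano before restarting the container, for example:

free -h     # check available memory and swap usage
vmstat 5 3  # a high wa (I/O wait) column indicates heavy swapping
tegrastats  # Jetson-specific utility showing RAM, swap and GPU usage; Ctrl-C to stop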

Conclusion

In this Part 3, we saw how to build and run MicroShift directly and containerized on the Jetson Nano. We ran samples that used Helm, a persistent volume for MySQL, the GPU for inferencing, and the USB camera. We saw a sample that sent pictures and WebSocket messages to Node Red when a person was detected. In Part 4, Part 5 and Part 6, we will look at the multiple options to build and deploy MicroShift on the Raspberry Pi 4.

The Jetson Software Roadmap shows that JetPack 5.0 Developer Preview is planned for 1Q-2022 with Ubuntu 20.04. We will work with MicroShift on the Jetson Nano with Ubuntu 20.04 in Part 7 of this series.

Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about your use of MicroShift on ARM devices and if you would like to see something covered in more detail.
