Building and Running MicroShift on a Jetson Nano with Ubuntu 18.04
Introduction
MicroShift is a research project that is exploring how the OpenShift OKD Kubernetes distribution can be optimized for small form factor devices and edge computing. In Part 1 of this series, we built and deployed MicroShift on a Fedora 35 virtual machine in VirtualBox and in an Ubuntu 20.04 VM using Multipass on a MacBook Pro. We saw MicroShift run directly on the host VM and looked at the two options for running it containerized. In Part 2 of this series, we set up the Jetson Nano with Ubuntu 18.04, installed the dependencies for MicroShift, and worked directly with CRI-O using the crictl CLI. In this Part 3, we will build and deploy MicroShift on the Jetson Nano Developer Kit.
MicroShift is still in its early days and moving fast. Features are missing. Things break. But it sure is fun to get your hands dirty.
Build the MicroShift binary for arm64 on Ubuntu 18.04
We can use the prebuilt microshift binary for arm64 or build it ourselves on the Jetson Nano; a prebuilt MicroShift image that can be run using Docker is also available. Let’s build the microshift binary directly on the Jetson Nano. We already installed the dependencies in Part 2. Log in to the Jetson Nano, clone the microshift repository from GitHub, install golang, and run make.
ssh dlinano@$ipaddress
sudo su -
apt -y install build-essential curl libgpgme-dev pkg-config libseccomp-dev
# Install golang
wget https://golang.org/dl/go1.17.2.linux-arm64.tar.gz
rm -rf /usr/local/go && tar -C /usr/local -xzf go1.17.2.linux-arm64.tar.gz
rm -f go1.17.2.linux-arm64.tar.gz
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
cat << 'EOF' >> /root/.bashrc
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
EOF
mkdir $GOPATH
git clone https://github.com/thinkahead/microshift.git
cd microshift
make
ls -las microshift # binary in current directory /root/microshift
Output:
root@jetsonnano:~/microshift# make
fatal: No names found, cannot describe anything.
go build -mod=vendor -tags 'include_gcs include_oss containers_image_openpgp gssapi providerless netgo osusergo' -ldflags "-X k8s.io/component-base/version.gitMajor=1 -X k8s.io/component-base/version.gitMajor=1 -X k8s.io/component-base/version.gitMinor=21 -X k8s.io/component-base/version.gitVersion=v1.21.0 -X k8s.io/component-base/version.gitCommit=c3b9e07a -X k8s.io/component-base/version.gitTreeState=clean -X k8s.io/component-base/version.buildDate=2021-11-25T13:23:20Z -X k8s.io/client-go/pkg/version.gitMajor=1 -X k8s.io/client-go/pkg/version.gitMinor=21 -X k8s.io/client-go/pkg/version.gitVersion=v1.21.1 -X k8s.io/client-go/pkg/version.gitCommit=b09a9ce3 -X k8s.io/client-go/pkg/version.gitTreeState=clean -X k8s.io/client-go/pkg/version.buildDate=2021-11-25T13:23:20Z -X github.com/openshift/microshift/pkg/version.versionFromGit=4.8.0-0.microshift-unknown -X github.com/openshift/microshift/pkg/version.commitFromGit=d99aa256 -X github.com/openshift/microshift/pkg/version.gitTreeState=clean -X github.com/openshift/microshift/pkg/version.buildDate=2021-11-25T13:23:21Z -s -w" github.com/openshift/microshift/cmd/microshift
root@jetsonnano:~/microshift# ./microshift version
MicroShift Version: 4.8.0-0.microshift-unknown
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117
root@jetsonnano:~/microshift# ls -las microshift
146772 -rwxr-xr-x 1 root root 150291381 Nov 25 13:23 microshift
Move the microshift binary to /usr/local/bin
mv -f microshift /usr/local/bin/.
rm -rf /root/.cache/go-build # Optional Cleanup
cd ..
#rm -rf microshift
/usr/local/bin/microshift version
Output:
root@jetsonnano:~# /usr/local/bin/microshift version
MicroShift Version: 4.8.0-0.microshift-unknown
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117
We may also download the microshift binary from the GitHub releases as follows:
ARCH=arm64
export VERSION=$(curl -sL https://api.github.com/repos/redhat-et/microshift/releases | grep tag_name | head -n 1 | cut -d '"' -f 4) && \
curl -LO https://github.com/redhat-et/microshift/releases/download/$VERSION/microshift-linux-${ARCH}
chmod +x microshift-linux-${ARCH}
mv microshift-linux-${ARCH} /usr/local/bin/microshift
Alternatively, pull the prebuilt microshift image from quay.io and extract the microshift binary to /usr/local/bin.
docker pull quay.io/microshift/microshift:4.8.0-0.microshift-2021-11-19-115908-linux-arm64
id=$(docker create quay.io/microshift/microshift:4.8.0-0.microshift-2021-11-19-115908-linux-arm64)
docker cp $id:/usr/bin/microshift /usr/local/bin/microshift
docker rm -v $id
/usr/local/bin/microshift version
Output:
root@jetsonnano:~# /usr/local/bin/microshift version
MicroShift Version: 4.8.0-0.microshift-2021-11-19-115908
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117
Run MicroShift directly on the Jetson Nano
We have already set up CRI-O in Part 2. Now we will download kubectl and oc, create the microshift systemd service, and start MicroShift.
# Get kubectl
ARCH=arm64
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/$ARCH/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin
# Get oc
wget https://mirror.openshift.com/pub/openshift-v4/arm64/clients/ocp/candidate/openshift-client-linux.tar.gz
mkdir tmp;cd tmp
tar -zxvf ../openshift-client-linux.tar.gz
mv -f oc /usr/local/bin
cd ..
rm -rf tmp
rm -f openshift-client-linux.tar.gz
mkdir /var/hpvolumes # used by hostpath-provisioner
cp /root/microshift/microshift /usr/local/bin/. # If not already done
mkdir -p /usr/lib/systemd/system
cat << EOF | sudo tee /usr/lib/systemd/system/microshift.service
[Unit]
Description=Microshift
After=crio.service
[Service]
WorkingDirectory=/usr/local/bin/
ExecStart=/usr/local/bin/microshift run
Restart=always
User=root
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable microshift --now
#systemctl start microshift
systemctl status microshift
journalctl -u microshift -f # Ctrl-C to break
mkdir -p $HOME/.kube
if [ -f $HOME/.kube/config ]; then
mv $HOME/.kube/config $HOME/.kube/config.orig
fi
KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig:$HOME/.kube/config.orig /usr/local/bin/kubectl config view --flatten > $HOME/.kube/config
watch "kubectl get nodes;kubectl get pods -A;crictl pods;crictl images"
#watch "oc get nodes;oc get pods -A;crictl pods;crictl images"
Output when MicroShift is started properly:
watch "kubectl get nodes;kubectl get pods -A;kubectl get pv,pvc -n default;crictl images;crictl pods"
NAME STATUS ROLES AGE VERSION
jetson-nano.example.com Ready <none> 6m3s v1.20.1
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-flannel-ds-fzfn6 1/1 Running 0 5m24s
kubevirt-hostpath-provisioner kubevirt-hostpath-provisioner-98frt 1/1 Running 0 5m26s
openshift-dns dns-default-bd92w 3/3 Running 0 5m25s
openshift-ingress router-default-79f7dc4c6b-2p4nb 1/1 Running 0 5m25s
openshift-service-ca service-ca-58798776fb-b7dkb 1/1 Running 0 5m26s
IMAGE TAG IMAGE ID SIZE
k8s.gcr.io/pause 3.6 7d46a07936af9 492kB
quay.io/microshift/coredns 1.6.9 2e234fad5a864 264MB
quay.io/microshift/flannel v0.14.0 996759f548df5 149MB
quay.io/microshift/hostpath-provisioner v0.9.0 e96859fbded4f 39.2MB
quay.io/microshift/kube-rbac-proxy v0.11.0 03509ac20d4d7 41.5MB
quay.io/microshift/openshift-router 4.5 2ade343656684 123MB
quay.io/microshift/service-ca-operator latest 0fedc7575c705 152MB
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
d1dcfec0e2bc2 4 minutes ago Ready dns-default-bd92w openshift-dns 0 (default)
e02d8d7847572 4 minutes ago Ready router-default-79f7dc4c6b-2p4nb openshift-ingress 0 (default)
a9ae0fe8b5a14 5 minutes ago Ready kube-flannel-ds-fzfn6 kube-system 0 (default)
bd375903e4d38 5 minutes ago Ready kubevirt-hostpath-provisioner-98frt kubevirt-hostpath-provisioner 0 (default)
b1331d994b844 5 minutes ago Ready service-ca-58798776fb-b7dkb openshift-service-ca 0 (default)
Samples to run on MicroShift
We will run a few samples that demonstrate the use of helm, persistent volumes, the GPU, and the USB camera.
1. MySQL database server
Download helm, then run the MySQL server in a container with a hostpath persistent volume, along with a MySQL client container.
# Install helm
curl -o helm-v3.5.2-linux-arm64.tar.gz https://get.helm.sh/helm-v3.5.2-linux-arm64.tar.gz
tar -zxvf helm-v3.5.2-linux-arm64.tar.gz
cp linux-arm64/helm /usr/local/bin
rm -rf linux-arm64
rm -f helm-v3.5.2-linux-arm64.tar.gz
chmod 600 /var/lib/microshift/resources/kubeadmin/kubeconfig
chmod 600 /root/.kube/config
# Add the repo for mysql helm chart
helm repo add stable https://charts.helm.sh/stable
oc project default
# Install mysql with provided image tag (hacky way to use the sha256 tag for the arm64 image) and my-user as the userid with custom passwords for root and my-user
helm install mysql stable/mysql --set mysqlRootPassword=secretpassword,mysqlUser=my-user,mysqlPassword=my-password,mysqlDatabase=my-database --set persistence.enabled=true --set storageClass=kubevirt-hostpath-provisioner --set image=mysql/mysql-server@sha256 --set imageTag=5e373bcea878b3657937c68cdefa8a1504f53e356ac19a3e51bf515e41e0c48c
helm list
# Remember to delete the /var/hpvolumes/mysql if it already exists (otherwise it will use old password from previous run)
rm -rf /var/hpvolumes/mysql
cat << EOF > hostpathpv.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hostpath-provisioner
spec:
  #storageClassName: "kubevirt-hostpath-provisioner"
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/var/hpvolumes/mysql"
...
EOF
# Create the persistent volume
kubectl apply -f hostpathpv.yaml
# Wait for the pod to be Running
kubectl get pods -n default -w
kubectl get svc mysql # Note down the CLUSTER-IP
export ipofmysqlserver=<from above>
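# Alternatively, capture the CLUSTER-IP without copying it by hand (a convenience, not in the original steps)
export ipofmysqlserver=$(kubectl get svc mysql -o jsonpath='{.spec.clusterIP}')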
# Start a client container and install the mysql client within it and login using the my-user userid
kubectl run -i --tty ubuntu --image=ubuntu:18.04 --restart=Never -- bash -il
apt-get update && apt-get install mysql-client -y
# Replace $ipofmysqlserver with the CLUSTER-IP noted above (the shell variable from the host is not set inside the client pod)
mysql -h$ipofmysqlserver -umy-user -pmy-password
quit
exit
oc delete pod ubuntu # Delete the client pod
helm delete mysql # Delete the deployment
oc delete -f hostpathpv.yaml # Delete the persistent volume
2. Nginx web server
Create the file nginx.yaml with the deployment and service
cat << EOF > nginx.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginxinc/nginx-unprivileged:alpine # arm64 image
        ports:
        - containerPort: 8080
        # resource required for hpa
        resources:
          requests:
            memory: 128M
            cpu: 125m
          limits:
            memory: 1024M
            cpu: 1000m
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
    - port: 8080
      nodePort: 30080
  selector:
    app: nginx
...
EOF
# Create the deployment and service. Test it.
kubectl apply -f nginx.yaml
kubectl get svc nginx-svc # see the port 8080:30080
Now we can access nginx on the NodePort 30080
curl localhost:30080
If we add the following line to /etc/hosts on the Jetson Nano
127.0.0.1 localhost nginx-svc-default.cluster.local
and run
oc expose svc nginx-svc
we can access nginx on port 80 using the route on the Jetson Nano:
curl localhost
Finally, on your laptop/MacBook Pro, you can add a line to /etc/hosts mapping the IP address of the Jetson Nano to nginx-svc-default.cluster.local. Then access nginx at http://nginx-svc-default.cluster.local/
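If you prefer not to edit /etc/hosts on the laptop, you can also exercise the route by passing the Host header explicitly (assuming $ipaddress holds the IP address of the Jetson Nano):
curl -H "Host: nginx-svc-default.cluster.local" http://$ipaddress/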
We can delete nginx deployment and route with:
oc delete -f nginx.yaml
oc delete route nginx-svc
3. Devicequery
Create a Job using devicequery.yaml. The Dockerfile used to create the devicequery:arm64-jetsonnano image was shown earlier in the CRI-O samples in Part 2.
cat << EOF > devicequery.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: devicequery-job
spec:
  parallelism: 1
  completions: 1
  activeDeadlineSeconds: 1800
  backoffLimit: 6
  template:
    metadata:
      labels:
        app: devicequery
    spec:
      containers:
      - name: devicequery
        image: docker.io/karve/devicequery:arm64-jetsonnano
      restartPolicy: OnFailure
EOF
oc apply -f devicequery.yaml
oc get job/devicequery-job
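# Optionally block until the job finishes (a convenience, not in the original steps)
oc wait --for=condition=complete job/devicequery-job --timeout=600s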
Wait for the job to complete. The output shows that the CUDA device was detected within the container:
oc logs job/devicequery-job
Output:
oc logs job/devicequery-job
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3956 MBytes (4148273152 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Delete the devicequery job
oc delete -f devicequery.yaml
4. VectorAdd
Create vectoradd.yaml and run the job. The Dockerfile for the vector-add-sample:arm64-jetsonnano image was shown earlier in Part 2 under the CRI-O samples.
cat << EOF > vectoradd.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: vectoradd-job
spec:
  parallelism: 1
  completions: 1
  activeDeadlineSeconds: 1800
  backoffLimit: 6
  template:
    metadata:
      labels:
        app: vectoradd
    spec:
      containers:
      - name: vectoradd
        image: docker.io/karve/vector-add-sample:arm64-jetsonnano
      restartPolicy: OnFailure
EOF
oc apply -f vectoradd.yaml
oc get job/vectoradd-job
Wait for the job to be completed
oc logs job/vectoradd-job -f
The output shows the vector addition of 50000 elements on the CUDA device:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Delete the vectoradd job
oc delete -f vectoradd.yaml
5. Jupyter Lab to access USB camera on /dev/video0
Create the file jupyter.yaml with the deployment and service
cat << EOF > jupyter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deployment
spec:
  selector:
    matchLabels:
      app: jupyter
  replicas: 1
  template:
    metadata:
      labels:
        app: jupyter
    spec:
      containers:
      - name: jupyter
        image: nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash", "-c", "jupyter lab --LabApp.token='' --LabApp.password='' --ip 0.0.0.0 --port 8888 --allow-root &> /var/log/jupyter.log && sleep infinity"]
        securityContext:
          privileged: true
          #allowPrivilegeEscalation: false
          #capabilities:
          #  drop: ["ALL"]
        ports:
        - containerPort: 8888
        # resource required for hpa
        resources:
          requests:
            memory: 128M
            cpu: 125m
          limits:
            memory: 2048M
            cpu: 1000m
        volumeMounts:
        - name: dev-video0
          mountPath: /dev/video0
      volumes:
      - name: dev-video0
        hostPath:
          path: /dev/video0
---
apiVersion: v1
kind: Service
metadata:
  name: jupyter-svc
  labels:
    app: jupyter
spec:
  type: NodePort
  ports:
    - port: 8888
      nodePort: 30080
  selector:
    app: jupyter
EOF
oc apply -f jupyter.yaml
oc expose svc jupyter-svc
Now we can add a line mapping the IP address of the Jetson Nano to jupyter-svc-default.cluster.local in /etc/hosts on your laptop/MacBook Pro and access JupyterLab at http://jupyter-svc-default.cluster.local/lab?
Navigate to the hello_camera/usb_camera.ipynb and run the notebook.
You can delete the jupyterlab with:
oc delete -f jupyter.yaml
6. Install Metrics Server
Installing the Metrics Server enables the “kubectl top” and “oc adm top” commands. Download the upstream manifest as metrics-server-components.yaml and apply it:
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server-components.yaml
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
kubectl apply -f metrics-server-components.yaml
# Wait for the metrics-server to start in the kube-system namespace
kubectl get deployment metrics-server -n kube-system
kubectl get events -n kube-system
# Wait for a couple of minutes for metrics to be collected
kubectl top nodes;kubectl top pods -A
oc adm top nodes;oc adm top pods -A
Output:
watch "kubectl top nodes;kubectl top pods -A"
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
jetsonnano 902m 22% 2220Mi 57%
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system metrics-server-dbf765b9b-8p6wr 15m 17Mi
kubevirt-hostpath-provisioner kubevirt-hostpath-provisioner-fsmkm 2m 9Mi
openshift-dns dns-default-lqktl 10m 25Mi
openshift-dns node-resolver-d95pz 0m 7Mi
openshift-ingress router-default-85bcfdd948-khkpd 6m 38Mi
openshift-service-ca service-ca-76674bfb58-rkcm5 16m 43Mi
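The CPU and memory requests we set on the nginx deployment in sample 2 (note the "# resource required for hpa" comment) exist so that a HorizontalPodAutoscaler can work against these metrics. A minimal sketch, assuming the nginx deployment from sample 2 is deployed:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=3
kubectl get hpa nginx-deployment # TARGETS shows current/target CPU once metrics are available
kubectl delete hpa nginx-deployment # cleanup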
7. Object Detection demo to send pictures and web socket messages to Node Red
The object detection sample detects objects in the camera feed. When a person is detected, it sends a WebSocket message with the bounding box information and a picture to Node Red.
Let’s install Node Red on IBM Cloud. We will use Node Red to display the pictures and chat messages sent from the Jetson Nano. Alternatively, we can use the Node Red that we deployed as an application in MicroShift on the MacBook Pro in VirtualBox in Part 1.
- Create an IBM Cloud free tier account at https://www.ibm.com/cloud/free and login to Console (top right).
- Create an API Key and save it, Manage->Access->IAM->API Key->Create an IBM Cloud API Key
- Click on Catalog and Search for "Node-Red App", select it and click on "Get Started"
- Give a unique App name, for example xxxxx-node-red and select the region nearest to you
- Select the Lite pricing plan; if you already have an existing instance of Cloudant, you may select it under Pricing Plan
- Click Create
- Under Deployment Automation -> Configure Continuous Delivery, click on "Deploy your app"
- Select the deployment target Cloud Foundry that provides a Free-Tier of 256 MB cost-free or Code Engine. The latter has monthly limits and takes more time to deploy. [ Note: Cloud Foundry is deprecated, use the IBM Cloud Code Engine. Any IBM Cloud Foundry application runtime instances running IBM Cloud Foundry applications will be permanently disabled and deprovisioned ]
- Enter the IBM Cloud API Key from Step 2, or click on "New" to create one
- The rest of the fields Region, Organization, Space will automatically get filled up. Use the default 256MB Memory and click "Next"
- In "Configure the DevOps toolchain", click Create
- Wait for 10 minutes for the Node Red instance to start
- Click on the "Visit App URL"
- On the Node Red page, create a new userid and password
- In Manage Palette, install the node-red-contrib-image-tools, node-red-contrib-image-output, and node-red-node-base64
- Import the Chat flow and the Picture (Image) display flow. In the Chat flow, you will need to edit line 35 of the template node to use wss:// (on IBM Cloud) instead of ws:// (on your laptop)
- In another browser tab, open https://mynodered.mybluemix.net/chat (replace mynodered with your IBM Cloud Node Red URL)
- In the Image flow, click on the square box to the right of the image preview or viewer node to deactivate and activate the node. You will be able to see the picture when you activate the node
Running the sample directly on Jetson Nano
Set up the Jetson Inference on the Jetson Nano
sudo su -
apt-get update
apt-get install -y git cmake libpython3-dev python3-numpy
cd /
git clone --recursive https://github.com/dusty-nv/jetson-inference
cd jetson-inference
mkdir build
cd build
cmake ../
make -j$(nproc)
make install
ldconfig
Clone the object-detection sample
cd ~
git clone https://github.com/thinkahead/microshift.git
cd microshift/jetson-nano/tests/object-detection
pip3 install websocket-client
Edit the URLs in my-detection.py and/or my-detection2.py to point to your Node Red instance. The script loads the model from /usr/local/bin/networks/SSD-Mobilenet-v2.
python3 my-detection.py # Ctrl-C multiple times to stop
python3 my-detection2.py # Ctrl-C multiple times to stop
Look at the Chat application and the Picture flow started in Node Red.
Running the demo in Docker
Now, let’s try this demo with Docker. The docker/run.sh script can download the models into the container. However, to run the container without docker/run.sh, or to run it in MicroShift without attaching the data directory as a volume, we copy the SSD-Mobilenet-v2 model to the local folder and have the Dockerfile copy it into the image.
cp -r /jetson-inference/data/networks/SSD-Mobilenet-v2 . # if not present in github
docker build -t docker.io/karve/jetson-inference:r32.6.1 .
docker push docker.io/karve/jetson-inference:r32.6.1
cd ~/jetson-inference
docker/run.sh --container karve/jetson-inference:r32.6.1
or
docker run --runtime nvidia --rm --privileged -it karve/jetson-inference:r32.6.1 bash
./runme.sh
Running the demo in MicroShift
You can update the WebSocketURL, ImageUploadURL, and VideoSource in inference.yaml to point to your video source and to the Node Red URLs on IBM Cloud, or to the Node Red you installed in MicroShift on your laptop. For the latter, you will need to add hostAliases with the IP address of your laptop. Then create the deployment with the oc apply command and look at the Chat application and the Picture flow in Node Red. It takes a couple of minutes to load the model initially.
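For the hostAliases case, a hypothetical snippet of the pod spec in inference.yaml (the IP address and hostname below are placeholders, not values from the actual file):
    spec:
      hostAliases:
      - ip: "192.168.1.100" # placeholder: your laptop's IP address
        hostnames:
        - "nodered-svc-default.cluster.local" # placeholder: your Node Red hostname
      containers:
      - name: inference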
crictl pull docker.io/karve/jetson-inference:r32.6.1 # Optional
oc apply -f inference.yaml
To stop this object-detection sample, we can delete the deployment
oc delete -f inference.yaml
We can also run the object-detection demo without the GPU for comparison.
Smarter-Device-Manager
Applications running inside a container do not have access to device drivers unless explicitly given access. Smarter-device-manager enables controlled access to devices (device drivers) available on the node for containers deployed with Kubernetes. In the object detection sample above, we used a deployment with a privileged securityContext; we want to avoid privileged containers. With Docker we can use --device /dev/video0:/dev/video0, but Kubernetes has no --device equivalent. Instead of using securityContext with privileged: true, we can use smarter-device-manager to run the object detection demo without privileged. The inference-sdm.yaml shows the modified deployment. The daemonset and a custom configmap for smarter-device-manager need to be created in some namespace (we use sdm).
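The exact contents of inference-sdm.yaml are in the repository; as a rough sketch of the idea, the container spec drops privileged: true and instead requests the video0 device advertised by smarter-device-manager (the field values below are illustrative, not copied from the file):
        resources:
          limits:
            smarter-devices/video0: 1 # replaces the privileged securityContext
          requests:
            smarter-devices/video0: 1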
We first install the smarter device manager and label the node to enable it.
cd ~/microshift/jetson-nano/tests/object-detection
oc apply -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
oc label node microshift.example.com smarter-device-manager=enabled
oc get ds,pods -n sdm
Output
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc apply -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
namespace/sdm created
daemonset.apps/smarter-device-manager created
configmap/smarter-device-manager created
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc label node microshift.example.com smarter-device-manager=enabled --overwrite
node/microshift.example.com labeled
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc logs -n sdm ds/smarter-device-manager
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc get ds,pods -n sdm
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/smarter-device-manager 1 1 1 1 1 smarter-device-manager=enabled 25s
NAME READY STATUS RESTARTS AGE
pod/smarter-device-manager-jm9dh 1/1 Running 0 25s
We can see the Capacity (20), Allocatable (20), and Allocated (0) values for smarter-devices/video0 (along with the other devices).
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc describe nodes
Name: microshift.example.com
…
Capacity:
cpu: 4
ephemeral-storage: 59964524Ki
hugepages-2Mi: 0
memory: 4051048Ki
…
smarter-devices/video0: 20
Allocatable:
cpu: 4
ephemeral-storage: 55263305227
hugepages-2Mi: 0
memory: 3948648Ki
…
smarter-devices/video0: 20
Allocated resources:
…
smarter-devices/video0 0 0
Now we can create the new deployment
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc apply -f inference-sdm.yaml
deployment.apps/inference-deployment created
If we describe the node again, we will see that video0 now appears under “Allocated resources”, and the pictures and WebSocket messages are being sent to Node Red.
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc describe nodes
Name: microshift.example.com
…
Allocated resources:
…
smarter-devices/video0 1 1
Let’s delete the deployment
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc delete -f inference-sdm.yaml
deployment.apps "inference-deployment" deleted
If we disable the smarter-device-manager on the node and try the deployment again, the pod will remain in the Pending state:
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc label node microshift.example.com smarter-device-manager=disabled --overwrite
node/microshift.example.com labeled
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc apply -f inference-sdm.yaml
deployment.apps/inference-deployment created
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc get pods,deploy
NAME READY STATUS RESTARTS AGE
pod/inference-deployment-757d7c848c-nb5bt 0/1 Pending 0 69s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/inference-deployment 0/1 1 0 69s
We need to enable the label again for the pod to reach the Running status and for the deployment to become Ready.
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc label node microshift.example.com smarter-device-manager=enabled --overwrite
node/microshift.example.com labeled
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc get pods,deploy
NAME READY STATUS RESTARTS AGE
pod/inference-deployment-757d7c848c-nb5bt 1/1 Running 0 3m4s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/inference-deployment 1/1 1 1 3m4s
Finally, we can delete the sample and the daemonset smarter-device-manager.
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc delete -f inference-sdm.yaml
deployment.apps "inference-deployment" deleted
root@microshift:~/microshift/jetson-nano/tests/object-detection# oc delete -f smarter-device-manager-ds.yaml -f video0-configmap.yaml
namespace "sdm" deleted
daemonset.apps "smarter-device-manager" deleted
configmap "smarter-device-manager" deleted
Using the NVIDIA/k8s-device-plugin
You may download the preconfigured nvidia-device-plugin.yml that points to a prebuilt image and skip to “Apply” below, or build the plugin yourself. To build it, we can use the instructions from NVIDIA K8s Device Plugin for Wind River Linux to create a custom device plugin that allows the cluster to expose the number of GPUs on NVIDIA Jetson devices. The patches check for the file /sys/module/tegra_fuse/parameters/tegra_chip_id and skip the health checks on Jetson.
git clone -b 1.0.0-beta6 https://github.com/NVIDIA/k8s-device-plugin.git
cd k8s-device-plugin/
wget https://labs.windriver.com/downloads/0001-arm64-add-support-for-arm64-architectures.patch
wget https://labs.windriver.com/downloads/0002-nvidia-Add-support-for-tegra-boards.patch
wget https://labs.windriver.com/downloads/0003-main-Add-support-for-tegra-boards.patch
git am 000*.patch
sed "s/ubuntu:16.04/ubuntu:18.04/" docker/arm64/Dockerfile.ubuntu16.04 > docker/arm64/Dockerfile.ubuntu18.04
docker build -t karve/k8s-device-plugin:1.0.0-beta6 -f docker/arm64/Dockerfile.ubuntu18.04 .
docker push karve/k8s-device-plugin:1.0.0-beta6
sed -i "s|image: .*|image: karve/k8s-device-plugin:1.0.0-beta6|" nvidia-device-plugin.yml # Change the image to karve/k8s-device-plugin:1.0.0-beta6
Apply
oc apply -f nvidia-device-plugin.yml
oc get ds -n kube-system nvidia-device-plugin-daemonset
Output
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
nvidia-device-plugin-daemonset 1 1 1 1 1
With the daemonset deployed, NVIDIA GPUs can now be requested by a container using the nvidia.com/gpu resource type. “oc describe nodes” now shows the nvidia.com/gpu Capacity, Allocatable, and Allocated resources. If we deploy the vector-add job with the resource limit, the events show that only one of the job's pods gets scheduled at a time even though parallelism was set to 5; when one pod finishes, the next one runs.
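A sketch of what vectoradd-gpu-limit.yaml likely contains, inferred from the behavior described above (the actual file in the repository may differ):
apiVersion: batch/v1
kind: Job
metadata:
  name: vectoradd-job
spec:
  parallelism: 5 # five pods requested, but only one GPU is available
  completions: 5
  template:
    spec:
      containers:
      - name: vectoradd
        image: docker.io/karve/vector-add-sample:arm64-jetsonnano
        resources:
          limits:
            nvidia.com/gpu: 1 # each pod requests the single Jetson GPU
      restartPolicy: OnFailure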
root@microshift:~/microshift/jetson-nano/jobs# oc apply -f vectoradd-gpu-limit.yaml
job.batch/vectoradd-job created
root@microshift:~/microshift/jetson-nano/jobs# oc get events -n default
LAST SEEN TYPE REASON OBJECT MESSAGE
33s Warning FailedScheduling pod/vectoradd-job-7n2xz 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s Warning FailedScheduling pod/vectoradd-job-7n2xz 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
34s Normal Scheduled pod/vectoradd-job-l9cjw Successfully assigned default/vectoradd-job-l9cjw to microshift.example.com
24s Normal Pulled pod/vectoradd-job-l9cjw Container image "docker.io/karve/vector-add-sample:arm64-jetsonnano" already present on machine
21s Normal Created pod/vectoradd-job-l9cjw Created container vectoradd
21s Normal Started pod/vectoradd-job-l9cjw Started container vectoradd
33s Warning FailedScheduling pod/vectoradd-job-tnmvs 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s Warning FailedScheduling pod/vectoradd-job-tnmvs 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
33s Warning FailedScheduling pod/vectoradd-job-wtgnn 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s Warning FailedScheduling pod/vectoradd-job-wtgnn 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
7s Normal Scheduled pod/vectoradd-job-wtgnn Successfully assigned default/vectoradd-job-wtgnn to microshift.example.com
34s Warning FailedScheduling pod/vectoradd-job-zwjfs 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
32s Warning FailedScheduling pod/vectoradd-job-zwjfs 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
19s Normal Scheduled pod/vectoradd-job-zwjfs Successfully assigned default/vectoradd-job-zwjfs to microshift.example.com
9s Normal Pulled pod/vectoradd-job-zwjfs Container image "docker.io/karve/vector-add-sample:arm64-jetsonnano" already present on machine
9s Normal Created pod/vectoradd-job-zwjfs Created container vectoradd
8s Normal Started pod/vectoradd-job-zwjfs Started container vectoradd
34s Normal SuccessfulCreate job/vectoradd-job Created pod: vectoradd-job-l9cjw
34s Normal SuccessfulCreate job/vectoradd-job Created pod: vectoradd-job-wtgnn
34s Normal SuccessfulCreate job/vectoradd-job Created pod: vectoradd-job-zwjfs
34s Normal SuccessfulCreate job/vectoradd-job Created pod: vectoradd-job-7n2xz
34s Normal SuccessfulCreate job/vectoradd-job Created pod: vectoradd-job-tnmvs
Cleanup MicroShift
We can delete MicroShift and the images in CRI-O with the cleanup.sh script:
wget https://raw.githubusercontent.com/thinkahead/microshift/main/hack/cleanup.sh
bash ./cleanup.sh
Containerized MicroShift
We can run MicroShift within containers in two ways:
- MicroShift Containerized – The MicroShift binary runs in a Docker container, CRI-O Systemd service runs directly on the host and data is stored at /var/lib/microshift and /var/lib/kubelet on the host VM.
- MicroShift Containerized All-In-One – The MicroShift binary and CRI-O service run within a Docker container and data is stored in a docker volume, microshift-data. This should be used for “Testing and Development” only. The image available in the registry is not setup to use the GPU within the container with cri-o.
Since we cannot use the GPU with the All-In-One image, we use the first approach, where CRI-O runs on the host; we already set up CRI-O on the host to use the NVIDIA container runtime in Part 2 of this series, so the GPU remains available. We will build the image with Dockerfile.jetsonnano.containerized (based on registry.access.redhat.com/ubi8/ubi-init:8.4). Note that we install iptables-1.6.2, which is compatible with iptables on the Jetson Nano with Ubuntu 18.04, instead of iptables v1.8.7, which causes the error “iptables v1.8.7 (nf_tables) Could not fetch rule set generation id: Invalid argument”; we do not use the default Dockerfile for building the image because of this iptables problem. Copy the microshift binary that we built earlier to the local directory and run the docker build command as shown below:
cat << EOF > Dockerfile.jetsonnano.containerized
ARG IMAGE_NAME=registry.access.redhat.com/ubi8/ubi-init:8.4
ARG ARCH
FROM ${IMAGE_NAME}
COPY microshift /usr/bin/microshift
RUN chmod +x /usr/bin/microshift
RUN dnf install -y libnetfilter_conntrack libnfnetlink && \
rpm -v -i --force https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/28/Everything/aarch64/os/Packages/i/iptables-libs-1.6.2-2.fc28.aarch64.rpm \
https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/28/Everything/aarch64/os/Packages/i/iptables-1.6.2-2.fc28.aarch64.rpm
ENTRYPOINT ["/usr/bin/microshift"]
CMD ["run"]
EOF
cp `which microshift` .
docker build -t docker.io/karve/microshift:jetson-nano-containerized -f Dockerfile.jetsonnano.containerized .
Similarly, build using Dockerfile.jetsonnano.containerized2 (based on registry.access.redhat.com/ubi8/ubi-minimal:8.4). Comparing the sizes of the images produced from ubi-init:8.4 and ubi-minimal:8.4, we save 54 MB with the ubi-minimal image.
root@microshift:~/microshift/hack/all-in-one# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
karve/microshift jetson-nano-containerized2 024b990fd269 9 minutes ago 560MB
karve/microshift jetson-nano-containerized ed85c5153b65 4 hours ago 614MB
Run the microshift container
IMAGE=docker.io/karve/microshift:jetson-nano-containerized
docker run --rm --ipc=host --network=host --privileged -d --name microshift -v /var/run:/var/run -v /sys:/sys:ro -v /var/lib:/var/lib:rw,rshared -v /lib/modules:/lib/modules -v /etc:/etc -v /run/containers:/run/containers -v /var/log:/var/log -e KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig $IMAGE
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
We can see the microshift container running within docker:
root@jetsonnano:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8c924bf44174 karve/microshift:jetson-nano-containerized "/usr/bin/microshift…" 3 minutes ago Up 3 minutes microshift
The microshift process is running within the container:
root@jetsonnano:~# docker top microshift -o pid,cmd
PID CMD
19997 /usr/bin/microshift run
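We can follow the MicroShift logs from the container with docker logs:
docker logs -f microshift # Ctrl-C to stop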
The rest of the containers run within cri-o on the host:
root@jetsonnano:~# crictl pods
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
b678938b7a6a2 3 minutes ago Ready dns-default-j7lgj openshift-dns 0 (default)
01cc8ddd857f8 3 minutes ago Ready router-default-85bcfdd948-5x6vf openshift-ingress 0 (default)
09a5cce9af718 4 minutes ago Ready kube-flannel-ds-8qn5h kube-system 0 (default)
94809dd53ee44 4 minutes ago Ready node-resolver-57xzk openshift-dns 0 (default)
4616c0c2b7151 4 minutes ago Ready service-ca-76674bfb58-bqcf8 openshift-service-ca 0 (default)
8cdd245d69c96 4 minutes ago Ready kubevirt-hostpath-provisioner-jg5pc kubevirt-hostpath-provisioner 0 (default)
Now, we can run the samples shown earlier.
After we are done, we can stop the microshift container. The --rm we used in the docker run will delete the container when we stop it.
docker stop microshift
After it is stopped, we can run cleanup.sh as in the previous section.
Errors
1. The node was low on resource: [DiskPressure]
If you have less than 10% free disk space on the microSDXC card, the kubevirt-hostpath-provisioner pod may get evicted. This will happen on a 32GB microSDXC card if disk space cannot be reclaimed after deleting unused images. You will need to create space, for example by deleting some of the GitHub sources we downloaded during installation.
rm -rf /root/.cache/go-build # Cleanup to get space on microSDXC card
# You can check the eviction events as follows
kubectl describe nodes
kubectl get events --field-selector involvedObject.kind=Node
kubectl delete events --field-selector involvedObject.kind=Node
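# Check the free space on the microSDXC card and remove specific unused images if needed (image IDs from "crictl images")
df -h /
# crictl rmi <image-id>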
2. ImageInspectError
If a pod shows the ImageInspectError state, you may be missing /etc/containers/registries.conf. Add that file, or fully qualify the image with "docker.io/" or the correct registry.
3. Name resolution problems from the container
The dnsConfig entries are written into /etc/resolv.conf of the containers, so you can point the pod at specific nameservers. For example, in inference.yaml, you may add the following:
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 8.8.8.8
  containers:
  - name: inference
4. Error: failed to initialize NVML: could not load NVML library
The nvidia-device-plugin does not work on the Jetson Nano in this case, so directly add the nvidia-container-runtime-hook to CRI-O instead:
root@jetson-nano:~/k8s-device-plugin# docker run -it --privileged --network=none -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins docker.io/karve/k8s-device-plugin:arm64-jetsonnano --pass-device-specs
2021/10/13 16:34:03 Loading NVML
2021/10/13 16:34:03 Failed to initialize NVML: could not load NVML library.
2021/10/13 16:34:03 If this is a GPU node, did you set the docker default runtime to `nvidia`?
2021/10/13 16:34:03 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2021/10/13 16:34:03 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2021/10/13 16:34:03 If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
2021/10/13 16:34:03 Error: failed to initialize NVML: could not load NVML library
5. Docker microshift container crashes
The container may crash with “leader election lost”, “etcdserver apply request took too long”, and “http: Handler timeout” in the log messages:
E1130 13:33:24.199060 1 available_controller.go:508] v1.image.openshift.io failed with: failing or missing response from https://192.168.1.208:8444/apis/image.openshift.io/v1: Get "https://192.168.1.208:8444/apis/image.openshift.io/v1": dial tcp 192.168.1.208:8444: connect: connection refused
{"level":"warn","ts":"2021-11-30T13:33:25.921-0500","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"379.944154ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/secrets/kube-system/resourcequota-controller-token-hfhbg\" ","response":"range_response_count:1 size:691"}
E1130 13:59:15.189827 1 writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
F1130 13:59:16.434857 1 controller_manager.go:105] leaderelection lost
This can happen when an application pod consumes too much memory and the Jetson Nano runs low on memory, swapping with high “cpu wait”. You will need to restart the container if it gets killed. It helps to reduce memory usage by removing the Ubuntu desktop as shown in Part 2.
Conclusion
In this Part 3, we saw how to build and run MicroShift on the Jetson Nano, both directly and containerized. We ran samples that used helm, a persistent volume for MySQL, the GPU for inferencing, and the USB camera, including a sample that sent pictures and WebSocket messages to Node Red when a person was detected. In Part 4, Part 5, and Part 6, we will look at the multiple options to build and deploy MicroShift on the Raspberry Pi 4.
The Jetson Software Roadmap shows that JetPack 5.0 Developer Preview is planned for 1Q-2022 with Ubuntu 20.04. We will work with MicroShift on the Jetson Nano with Ubuntu 20.04 in Part 7 of this series.
Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about your use of MicroShift on ARM devices and whether you would like to see something covered in more detail.
References