There are many reasons, when using OpenShift, that you might want certain pods to run on a specific worker node or in a specific zone: disaster recovery (DR) testing, application requirements, latency simulations, and so on. Using labels and annotations, this blog walks through a real-world use case and the commands issued to achieve deterministic pod placement.
In this scenario here is my configuration and objective:
I have three worker nodes in a single zone. I want to simulate having two zones and force the placement of the pods into a specific zone.
So, let's get started. First, here is my deployment file:
fabriccli.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fabriccli-deployment
  labels:
    app: fabriccli
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fabriccli
  template:
    metadata:
      labels:
        app: fabriccli
    spec:
      containers:
        - name: fabricclidep
          image: hyperledger/fabric-tools:1.4.4
          command:
            - sleep
            - "3600"
This will deploy three pods whose containers sleep for 3600 seconds and then exit (at which point Kubernetes restarts them). You can change this value in the deployment file.
Now, let's deploy fabriccli.yaml:
oc apply -f fabriccli.yaml
You should see three running pods. Also note which worker nodes the pods are running on. By default, the Kubernetes scheduler spreads the replicas of a deployment across the available nodes; the scoring algorithm is more involved than a simple round-robin, but the effect is similar. So in this case, let's verify that each pod is on a different worker node. I am issuing both commands sequentially below.
oc get pod
NAME READY STATUS RESTARTS AGE
fabriccli-deployment-cdcdfd59d-6hkkj 1/1 Running 0 2m44s
fabriccli-deployment-cdcdfd59d-bxt2d 1/1 Running 0 2m43s
fabriccli-deployment-cdcdfd59d-fhfzp 1/1 Running 0 2m45s
oc describe pod fabriccli-deployment-cdcdfd59d-6hkkj | grep worker
Node: worker1.mzrsample.cp.fyre.ibm.com/10.17.23.83
Normal Scheduled <unknown> Successfully assigned mzr/fabriccli-deployment-cdcdfd59d-6hkkj to worker1.mzrsample.cp.fyre.ibm.com
Normal Pulled 3m4s kubelet, worker1.mzrsample.cp.fyre.ibm.com Container image "hyperledger/fabric-tools:1.
You can describe the other pods and verify they are on worker0 and worker2.
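Rather than describing each pod individually, a quicker check with the standard oc client is to list the pods in wide output, which adds a NODE column:

```shell
# Show each pod together with the node it was scheduled on
oc get pods -o wide
```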
oc get nodes
NAME STATUS ROLES AGE VERSION
master0.mzrsample.cp.fyre.ibm.com Ready master 24h v1.19.0+7070803
master1.mzrsample.cp.fyre.ibm.com Ready master 24h v1.19.0+7070803
master2.mzrsample.cp.fyre.ibm.com Ready master 24h v1.19.0+7070803
worker0.mzrsample.cp.fyre.ibm.com Ready worker 24h v1.19.0+7070803
worker1.mzrsample.cp.fyre.ibm.com Ready worker 24h v1.19.0+7070803
worker2.mzrsample.cp.fyre.ibm.com Ready worker 24h v1.19.0+7070803
As you can see, there are three worker nodes: worker0, worker1, and worker2. In this cluster, they are all in a single zone. However, we can easily simulate a multi-zone cluster by labeling the worker nodes. The Kubernetes documentation on labels, annotations, and taints provides details on the APIs and describes the syntax needed, with examples.
I will create two zones, zone-1 and zone-2: worker0 and worker1 go into zone-1, and worker2 goes into zone-2.
I need to add the following label to worker nodes worker0 and worker1 in the labels section of the node resource:
topology.kubernetes.io/zone: zone-1
I need to add the following label to worker node worker2 in the labels section of the node resource:
topology.kubernetes.io/zone: zone-2
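Instead of editing each node's YAML by hand, the same labels can be applied from the command line with oc label (the node names below match my cluster; substitute your own):

```shell
# Put worker0 and worker1 in zone-1
oc label node worker0.mzrsample.cp.fyre.ibm.com topology.kubernetes.io/zone=zone-1
oc label node worker1.mzrsample.cp.fyre.ibm.com topology.kubernetes.io/zone=zone-1

# Put worker2 in zone-2
oc label node worker2.mzrsample.cp.fyre.ibm.com topology.kubernetes.io/zone=zone-2
```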
Your worker node should look similar to this:
kind: Node
apiVersion: v1
metadata:
  name: worker2.mzrsample.cp.fyre.ibm.com
  selfLink: /api/v1/nodes/worker2.mzrsample.cp.fyre.ibm.com
  uid: 9610666e-4bc7-41e0-b8a2-b0dc73ea4390
  resourceVersion: '431421'
  creationTimestamp: '2021-03-18T18:10:37Z'
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: worker2.mzrsample.cp.fyre.ibm.com
    kubernetes.io/os: linux
    node-role.kubernetes.io/worker: ''
    node.openshift.io/os_id: rhcos
    topology.kubernetes.io/zone: zone-2
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-worker-
...
After making changes to the worker nodes, issue the oc get nodes command repeatedly and wait until all the nodes show a STATUS of Ready.
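Rather than re-running the command manually, you can stream node status updates with the watch flag available in standard oc/kubectl clients:

```shell
# Stream node status changes until every node shows Ready, then Ctrl-C
oc get nodes -w
```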
Ok, now let's get to the fun part and force the pod into zone-2.
Let's replace the project resource and add an annotation with the node-selector, so that all pods in the project are assigned to a specific zone. I am currently working in the project mzr. Issue the following to save the project resource into a YAML file:
oc get ns mzr -o yaml > mzr.yaml
Now, edit the file mzr.yaml and add the following annotation:
openshift.io/node-selector: topology.kubernetes.io/zone=zone-2
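After the edit, the metadata section of mzr.yaml should contain the new annotation alongside the existing ones. A trimmed sketch (your other annotations and fields will differ):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mzr
  annotations:
    openshift.io/node-selector: topology.kubernetes.io/zone=zone-2
```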
I cannot edit the project resource itself, as it is immutable, so I'll replace it, with the annotation added, via an oc replace command:
oc replace ns mzr -f mzr.yaml
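Depending on your permissions and OpenShift version, you may also be able to set the annotation in a single step with oc annotate against the namespace, avoiding the round trip through a file:

```shell
# Set (or overwrite) the node-selector annotation directly on the namespace
oc annotate namespace mzr \
  openshift.io/node-selector=topology.kubernetes.io/zone=zone-2 --overwrite
```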
Now, to prove our theory! Let's delete all three pods; when the replacement pods are scheduled, they should all land in zone-2, which means on worker node worker2. You can use the same commands as above to confirm that the pods are now scoped to zone-2!
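Since every pod in the deployment carries the app: fabriccli label, all three can be deleted with one label-selector command, and the ReplicaSet recreates them under the new node selector:

```shell
# Delete the current pods; the ReplicaSet immediately schedules replacements
oc delete pod -l app=fabriccli

# Verify the new pods all landed on worker2 (zone-2)
oc get pods -o wide
```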
Did it work? Now, let's do an outage simulation.
Let's stop and restart the worker node worker2 and see its impact on the pods.
So, assuming all three pods are in a running state, let's halt worker2 with the oc adm drain command:
oc adm drain worker2.mzrsample.cp.fyre.ibm.com --force --ignore-daemonsets=true
Notice the pods are evicted. Because replicas is set to 3 in the deployment file, Kubernetes tries to redeploy the pods, but since the only node in zone-2 is now unschedulable, they remain in a Pending state, as shown:
oc get pods
NAME READY STATUS RESTARTS AGE
fabriccli-deployment-cdcdfd59d-2hlgn 0/1 Pending 0 2m7s
fabriccli-deployment-cdcdfd59d-hj952 0/1 Pending 0 2m7s
fabriccli-deployment-cdcdfd59d-jcbqz 0/1 Pending 0 2m7s
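If you describe one of the Pending pods, the Events section explains why it cannot be scheduled: the only node matching the project's node selector is cordoned. The exact wording varies by Kubernetes version, but you should see something like a FailedScheduling event:

```shell
# Look for a FailedScheduling event mentioning that no nodes match the
# pod's node selector, or that the matching node is unschedulable
oc describe pod fabriccli-deployment-cdcdfd59d-2hlgn | grep -A3 Events
```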
The last step is to re-enable the worker node:
oc adm uncordon worker2.mzrsample.cp.fyre.ibm.com
Magically, the worker node is now active and all pods are running:
oc get pods
NAME READY STATUS RESTARTS AGE
fabriccli-deployment-cdcdfd59d-2hlgn 1/1 Running 0 3m48s
fabriccli-deployment-cdcdfd59d-hj952 1/1 Running 0 3m48s
fabriccli-deployment-cdcdfd59d-jcbqz 1/1 Running 0 3m48s
So, to recap, we used OpenShift labels and annotations to simulate the isolation of pods in a project to a specific zone. We then added the scenario of taking the worker node down in the preferred zone and watching the pods go into a pending state. Finally, when the worker node was re-enabled, all the pods returned to the running state. I hope you find this useful!
This blog is based on education provided by @Andre Tost.
#openshift #multizone #markbarry #bringuplab #bringup