
Using OpenShift Projects to Influence Pod Placements

By Barry Mosakowski posted Mon March 22, 2021 09:24 AM

  
There are many reasons why, when using OpenShift, you may want certain pods to run on a specific worker node or in a specific zone.  This may be for disaster recovery (DR) testing, application requirements, latency simulations, etc.  Through the use of labels and annotations, this blog provides a real-world use case and the commands issued to achieve deterministic pod placement.

In this scenario, here are my configuration and objective:

I have three worker nodes in a single zone.  I want to simulate having two zones and force the placement of the pods into a specific zone.

So, let's get started.   First, here is my deployment file: fabriccli.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fabriccli-deployment
  labels:
    app: fabriccli
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fabriccli
  template:
    metadata:
      labels:
        app: fabriccli
    spec:
      containers:
      - name: fabricclidep
        image: hyperledger/fabric-tools:1.4.4
        command:
          - sleep
          - "3600"​

This will deploy three pods whose containers sleep for 3600 seconds and then exit; this value can be changed in the deployment file.  We will also take a look at the worker nodes in our cluster shortly.

Now, let's deploy fabriccli.yaml:

oc apply -f fabriccli.yaml
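
Optionally, you can confirm the rollout completed before moving on (the deployment name comes from the yaml above):

oc rollout status deployment/fabriccli-deployment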


You should see three running pods.  You also need to look at which worker nodes the pods are running on.  By default, the Kubernetes scheduler spreads the replicas across the worker nodes, somewhat like a round-robin.  It isn't quite that simple, but you get the point.  Thus, in this case let's verify that each pod is on a different worker node.  I am issuing both commands sequentially below.

oc get pod
NAME                                   READY   STATUS    RESTARTS   AGE
fabriccli-deployment-cdcdfd59d-6hkkj   1/1     Running   0          2m44s
fabriccli-deployment-cdcdfd59d-bxt2d   1/1     Running   0          2m43s
fabriccli-deployment-cdcdfd59d-fhfzp   1/1     Running   0          2m45s

oc describe pod fabriccli-deployment-cdcdfd59d-6hkkj | grep worker
Node: worker1.mzrsample.cp.fyre.ibm.com/10.17.23.83
Normal Scheduled <unknown> Successfully assigned mzr/fabriccli-deployment-cdcdfd59d-6hkkj to worker1.mzrsample.cp.fyre.ibm.com
Normal Pulled 3m4s kubelet, worker1.mzrsample.cp.fyre.ibm.com Container image "hyperledger/fabric-tools:1.

You can describe the other pods and verify they are on worker0 and worker2.  
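
Alternatively, oc get pods -o wide lists the node each pod is scheduled on in a single command, which saves describing each pod individually:

oc get pods -o wide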

oc get nodes
NAME                                STATUS   ROLES    AGE   VERSION
master0.mzrsample.cp.fyre.ibm.com   Ready    master   24h   v1.19.0+7070803
master1.mzrsample.cp.fyre.ibm.com   Ready    master   24h   v1.19.0+7070803
master2.mzrsample.cp.fyre.ibm.com   Ready    master   24h   v1.19.0+7070803
worker0.mzrsample.cp.fyre.ibm.com   Ready    worker   24h   v1.19.0+7070803
worker1.mzrsample.cp.fyre.ibm.com   Ready    worker   24h   v1.19.0+7070803
worker2.mzrsample.cp.fyre.ibm.com   Ready    worker   24h   v1.19.0+7070803


As you can see, there are three worker nodes: worker0, worker1, and worker2.   In this task, we are running in a single zone.  However, we can easily simulate a multi-zone cluster through a label on the worker nodes.  The Kubernetes documentation on labels, annotations, and taints provides details on the APIs as well as the syntax needed, with examples.

I will create two zones, zone-1 and zone-2.  Worker0 and worker1 are in zone-1 and worker2 is in zone-2.

I need to add the following label to worker nodes worker0 and worker1 in the labels section of the node resource:

topology.kubernetes.io/zone: zone-1


I need to add the following label to worker node worker2 in the labels section of the node resource:

topology.kubernetes.io/zone: zone-2
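
If you prefer the CLI to editing each node's yaml, the same labels can also be applied with oc label, using the node names from the oc get nodes output above:

oc label node worker0.mzrsample.cp.fyre.ibm.com worker1.mzrsample.cp.fyre.ibm.com topology.kubernetes.io/zone=zone-1
oc label node worker2.mzrsample.cp.fyre.ibm.com topology.kubernetes.io/zone=zone-2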


Your worker node should look similar to this:

kind: Node
apiVersion: v1
metadata:
  name: worker2.mzrsample.cp.fyre.ibm.com
  selfLink: /api/v1/nodes/worker2.mzrsample.cp.fyre.ibm.com
  uid: 9610666e-4bc7-41e0-b8a2-b0dc73ea4390
  resourceVersion: '431421'
  creationTimestamp: '2021-03-18T18:10:37Z'
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: worker2.mzrsample.cp.fyre.ibm.com
    kubernetes.io/os: linux
    node-role.kubernetes.io/worker: ''
    node.openshift.io/os_id: rhcos
    topology.kubernetes.io/zone: zone-2
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-worker-
.
.
.


After making changes to the worker nodes, repeatedly issue the oc get nodes command and wait until all the nodes are in a STATUS of Ready.
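
If you'd rather not rerun the command by hand, oc get can watch for changes:

oc get nodes -w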

Ok, now let's get to the fun part and force the pods into zone-2.

Let's replace the project resource and add the node-selector annotation to create a project where all pods are assigned to a specific zone.  I am currently working in the project mzr.  Issue the following to save the project resource into a yaml file:

oc get ns mzr -o yaml > mzr.yaml


Now, edit the file mzr.yaml and add the following annotation:

openshift.io/node-selector: topology.kubernetes.io/zone=zone-2
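
With the annotation added, the metadata section of mzr.yaml should look something like this (existing labels and other annotations omitted for brevity):

apiVersion: v1
kind: Namespace
metadata:
  name: mzr
  annotations:
    openshift.io/node-selector: topology.kubernetes.io/zone=zone-2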


I cannot edit the project resource itself, as it is immutable, so I'll replace it with the updated file via an oc replace command:

oc replace -f mzr.yaml
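
To confirm the annotation took effect, you can check the namespace again:

oc get ns mzr -o yaml | grep node-selector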

 
Now, to prove our theory!  Let's delete all three pods; when the deployment recreates them, they should all be running in zone-2, which is worker node worker2.  You can use the same commands as above to prove how you have scoped the pods to zone-2!
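
One quick way to delete all three pods at once is by the app label from the deployment:

oc delete pod -l app=fabriccli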

Did it work?   Now, let's do an outage simulation.

Let's take worker node worker2 out of service and then bring it back, and observe the impact on the pods.

So, assuming all three pods are in a running state, let's drain worker2, which cordons the node and evicts its pods.  Here is the oc adm command.

oc adm drain worker2.mzrsample.cp.fyre.ibm.com --force --ignore-daemonsets=true 
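
If you run oc get nodes at this point, worker2 should report a STATUS of Ready,SchedulingDisabled, confirming it is cordoned:

oc get nodes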

Notice the pods are evicted.  Because replicas is set to 3 in the deployment file, replacement pods are created right away, but the project's node selector restricts them to zone-2 and the only zone-2 node is cordoned, so they remain in a Pending state as shown:

oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
fabriccli-deployment-cdcdfd59d-2hlgn   0/1     Pending   0          2m7s
fabriccli-deployment-cdcdfd59d-hj952   0/1     Pending   0          2m7s
fabriccli-deployment-cdcdfd59d-jcbqz   0/1     Pending   0          2m7s


The last step is to re-enable the worker node:

oc adm uncordon worker2.mzrsample.cp.fyre.ibm.com


Magically, the worker node is now active and all pods are running:

oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
fabriccli-deployment-cdcdfd59d-2hlgn   1/1     Running   0          3m48s
fabriccli-deployment-cdcdfd59d-hj952   1/1     Running   0          3m48s
fabriccli-deployment-cdcdfd59d-jcbqz   1/1     Running   0          3m48s


So, to recap, we used OpenShift labels and annotations to simulate the isolation of a project's pods to a specific zone.  We then added the scenario of taking down the worker node in the preferred zone and watching the pods go into a Pending state.  Finally, when the worker node was re-enabled, all the pods returned to the Running state.   I hope you find this useful!

This blog is based on education provided by @Andre Tost.

#openshift #multizone #markbarry #bringuplab #bringup
