Containers, Kubernetes, OpenShift on Power


Controlling Pod placement based on weighted node-affinity with your Multi-Arch Compute cluster

By Punith Kenchappa posted Tue February 27, 2024 11:51 PM

  

Controlling pod placement on nodes using node affinity rules

This gives developers and administrators fine-grained control over where and how their workloads run.

This post teaches techniques for controlling where your Multi-Architecture Compute workload runs.

There are two types of nodeAffinity rules for placing Pods on Nodes: required rules and preferred rules.

Required rules

A required rule is an invariant scheduling directive set using the key spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution. The invariant cannot be violated during scheduling.

To use a required rule, specify nodeSelectorTerms whose matchExpressions are matched against the Node's metadata.labels. When, and only when, these values agree is the Pod scheduled. Consider the following example, where kubernetes.io/os: linux is used in the matchExpressions, so the Pod is scheduled only on a matching Linux Node.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-w1-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
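
For a hard requirement on a single label such as this one, the simpler spec.nodeSelector field expresses the same constraint. Below is a minimal sketch; the Pod name is illustrative and the container image is reused from the examples later in this post.

apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-linux        # illustrative name
  labels:
    app: httpd
  namespace: punith-project
spec:
  nodeSelector:
    kubernetes.io/os: linux       # same hard constraint: schedule only on Linux Nodes
  containers:
  - name: carts
    image: quay.io/powercloud/sock-shop-carts:latest

nodeAffinity becomes necessary when you need operators such as In with multiple values or NotIn, or when you want the preferred (soft) rules described later in this post.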

To direct a Pod to a specific architecture, you can use kubernetes.io/arch: ppc64le, so the Pod is scheduled only on a ppc64le Node.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
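
Multiple matchExpressions inside a single nodeSelectorTerms entry are ANDed, so the two required rules above can be combined into one directive that pins the Pod to Linux on ppc64le. A minimal sketch, with an illustrative Pod name:

apiVersion: v1
kind: Pod
metadata:
  name: affinity-linux-ppc64le    # illustrative name
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:       # expressions within one term must all match (AND)
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
  containers:
  - name: carts
    image: quay.io/powercloud/sock-shop-carts:latest

By contrast, listing several entries under nodeSelectorTerms ORs them: a Node only needs to satisfy one of the terms for the Pod to be schedulable on it.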

Preferred rules

A preferred rule is a variable scheduling directive set using spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution. The scheduler attempts to satisfy the directive, but if the preferred Nodes have no capacity, the preference is ignored and the Pod is scheduled elsewhere.

Consider the following Pod definition: the Pod must be deployed to a Linux Node and is preferably scheduled on a ppc64le Node. If the ppc64le Nodes are fully committed, the Pod is scheduled on another architecture.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le

These matchExpressions support logical operators: use In for affinity and NotIn for anti-affinity. Consider the following In example, where the affinity is set with a scheduling invariant that deploys all of the Pods on ppc64le Nodes.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: carts
      image: quay.io/powercloud/sock-shop-carts:latest
      imagePullPolicy: Always
      command: [ "/usr/bin/java" ]
      args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
      resources:
        limits:
          cpu: 300m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 200Mi
      ports:
        - containerPort: 8080
      securityContext:
        runAsNonRoot: true
        capabilities:
          drop:
            - all
        readOnlyRootFilesystem: true
      volumeMounts:
        - mountPath: /tmp
          name: carts-vol
  volumes:
    - name: carts-vol
      emptyDir:
        medium: Memory

Consider the following NotIn example, where the anti-affinity is set with a scheduling invariant that deploys the Pods only on Nodes that are not ppc64le. In a cluster with ppc64le and amd64 workers, the Pods are therefore scheduled on the amd64 architecture.

apiVersion: v1
kind: Pod
metadata:
  name: antiaffinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: NotIn
            values:
            - ppc64le
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: carts
      image: quay.io/powercloud/sock-shop-carts:latest
      imagePullPolicy: Always
      command: [ "/usr/bin/java" ]
      args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
      resources:
        limits:
          cpu: 300m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 200Mi
      ports:
        - containerPort: 8080
      securityContext:
        runAsNonRoot: true
        capabilities:
          drop:
            - all
        readOnlyRootFilesystem: true
      volumeMounts:
        - mountPath: /tmp
          name: carts-vol
  volumes:
    - name: carts-vol
      emptyDir:
        medium: Memory

To prioritize the variable scheduling directive preferredDuringSchedulingIgnoredDuringExecution, assign a weight between 1 and 100 to each instance of the matchExpressions. When the scheduler finds Nodes that meet all of the Pod's other scheduling requirements, the kube-scheduler iterates through every preferred rule and prioritizes each Node based on the sum of the weights of its matching expressions.

Consider the following affinity with nodeAffinity weights, which causes fewer Pods to be scheduled on amd64 because the ppc64le rule is weighted higher: a ppc64le Node scores 100 from the first preference, while an amd64 Node scores only 1 from the second, so ppc64le Nodes are ranked higher whenever they have capacity.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: carts
      image: quay.io/powercloud/sock-shop-carts:latest
      imagePullPolicy: Always
      command: [ "/usr/bin/java" ]
      args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
      resources:
        limits:
          cpu: 300m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 200Mi
      ports:
        - containerPort: 8080
      securityContext:
        runAsNonRoot: true
        capabilities:
          drop:
            - all
        readOnlyRootFilesystem: true
      volumeMounts:
        - mountPath: /tmp
          name: carts-vol
  volumes:
    - name: carts-vol
      emptyDir:
        medium: Memory
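
The weighted preference really shows its effect when a workload runs many replicas, because each Pod is scored independently and most of them are placed on the higher-weighted architecture. Below is a minimal sketch of the same weighted rules in a Deployment; the Deployment name, label, and replica count are illustrative, and the container image is reused from the Pod examples above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: carts-weighted            # illustrative name
  namespace: punith-project
spec:
  replicas: 6                     # illustrative replica count
  selector:
    matchLabels:
      app: carts-weighted
  template:
    metadata:
      labels:
        app: carts-weighted
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100           # strongly prefer ppc64le Nodes
            preference:
              matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - ppc64le
          - weight: 1             # weak preference for amd64 Nodes
            preference:
              matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
      containers:
      - name: carts
        image: quay.io/powercloud/sock-shop-carts:latest

With capacity on both architectures, most replicas are expected to land on ppc64le Nodes; the affinity weight is only one of the scores the kube-scheduler combines, so the split is a strong tendency rather than a guarantee. If the ppc64le Nodes fill up, the remaining replicas still schedule on amd64 because the preference is not a hard requirement.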

Consider the following anti-affinity example with nodeAffinity weights, which causes fewer Pods to be scheduled on ppc64le: an amd64 Node matches both the weight-100 In amd64 preference and the weight-1 NotIn ppc64le preference, so it is ranked higher than a ppc64le Node, which matches neither.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: NotIn
            values:
            - ppc64le
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: carts
      image: quay.io/powercloud/sock-shop-carts:latest
      imagePullPolicy: Always
      command: [ "/usr/bin/java" ]
      args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
      resources:
        limits:
          cpu: 300m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 200Mi
      ports:
        - containerPort: 8080
      securityContext:
        runAsNonRoot: true
        capabilities:
          drop:
            - all
        readOnlyRootFilesystem: true
      volumeMounts:
        - mountPath: /tmp
          name: carts-vol
  volumes:
    - name: carts-vol
      emptyDir:
        medium: Memory

Summary

You have seen how to schedule Pods on Nodes using affinity and anti-affinity with required and preferred rules, directing workloads onto the architecture your application needs.

Good luck with your Pod placement.

Authors

Punith Kenchappa, Software Engineer, ISDL, IBM
