This gives developers and administrators fine-grained control over where and how Pod workloads run.
This post teaches techniques for controlling where your Multi-Architecture Compute workload runs.
There are two types of nodeAffinity rules for placing Pods on Nodes: required rules and preferred rules.
Required rules
A required rule is an invariant scheduling directive set using the key spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution. The invariant cannot be violated during scheduling.
To use a required rule, use nodeSelectorTerms to match the Node's metadata.labels against the matchExpressions. The Pod is scheduled on a Node if, and only if, the Node's labels satisfy the expressions. Consider the following example, where kubernetes.io/os: linux is used in matchExpressions, so the Pod is only scheduled on a matching Node.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-w1-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
To direct a Pod to a specific architecture, you can use kubernetes.io/arch: ppc64le, so the Pod is only directed to a ppc64le Node.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
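The two examples above each use a single expression. As a hedged sketch of how they combine: expressions listed under one nodeSelectorTerms entry must all match the same Node (logical AND), whereas separate nodeSelectorTerms entries are alternatives (logical OR). The Pod name below is hypothetical; the namespace and container image are borrowed from the other examples in this post.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-linux-ppc64le   # hypothetical name, for illustration only
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # A single term: both expressions must match the same Node (AND).
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
  containers:
  - name: carts
    image: quay.io/powercloud/sock-shop-carts:latest
```

Adding a second entry under nodeSelectorTerms would instead let the Pod land on a Node that satisfies either term.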
Preferred rules
A preferred rule is a variable scheduling directive set using spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution. The scheduler attempts to satisfy the variable directive; if no matching Node has capacity, the preference is ignored and the Pod is still scheduled on a Node that satisfies the required rules.
Consider the following Pod definition, which must be deployed to a Linux Node and may be scheduled to a ppc64le Node. If the ppc64le Nodes are fully committed, then the Pod is scheduled on another architecture.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
Each matchExpressions entry takes an operator: use In for affinity and NotIn for anti-affinity (Exists, DoesNotExist, Gt, and Lt are also available). Consider the following In example, where the affinity is set with a scheduling invariant that deploys all the Pods on ppc64le Nodes.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: carts
    image: quay.io/powercloud/sock-shop-carts:latest
    imagePullPolicy: Always
    command: [ "/usr/bin/java" ]
    args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
    resources:
      limits:
        cpu: 300m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 200Mi
    ports:
    - containerPort: 8080
    securityContext:
      runAsNonRoot: true
      capabilities:
        drop:
        - all
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /tmp
      name: carts-vol
  volumes:
  - name: carts-vol
    emptyDir:
      medium: Memory
Consider the following NotIn example, where the anti-affinity is set with a scheduling invariant that deploys all the Pods on Nodes that are not ppc64le. In a cluster with only ppc64le and amd64 Nodes, the Pods are therefore scheduled on the amd64 architecture.
apiVersion: v1
kind: Pod
metadata:
  name: antiaffinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: NotIn
            values:
            - ppc64le
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: carts
    image: quay.io/powercloud/sock-shop-carts:latest
    imagePullPolicy: Always
    command: [ "/usr/bin/java" ]
    args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
    resources:
      limits:
        cpu: 300m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 200Mi
    ports:
    - containerPort: 8080
    securityContext:
      runAsNonRoot: true
      capabilities:
        drop:
        - all
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /tmp
      name: carts-vol
  volumes:
  - name: carts-vol
    emptyDir:
      medium: Memory
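NotIn excludes one architecture but admits any other architecture that joins the cluster later. Where an explicit allow-list is preferable, In accepts several values, and a Node matches if its label equals any one of them. The fragment below is only a sketch: it assumes the cluster also has arm64 Nodes, which the examples above do not mention, and it shows just the affinity stanza of the Pod spec.

```yaml
# Fragment of a Pod spec; arm64 is an assumed architecture, not one taken from the examples above.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:          # a Node matches if its label equals any listed value
          - amd64
          - arm64
```

With this form, a Node of an unlisted architecture is never selected, whereas the NotIn form would accept it.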
To prioritize the variable scheduling directive preferredDuringSchedulingIgnoredDuringExecution, assign a weight between 1 and 100 to each preference term. When the scheduler finds Nodes that meet all the other scheduling requirements of the Pod, the kube-scheduler iterates through every preferred rule and prioritizes each Node based on the sum of the weights of the rules that the Node matches.
Consider the following affinity with nodeAffinity weights, which causes fewer Pods to be scheduled on amd64 because the ppc64le rule is weighted higher.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - ppc64le
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: carts
    image: quay.io/powercloud/sock-shop-carts:latest
    imagePullPolicy: Always
    command: [ "/usr/bin/java" ]
    args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
    resources:
      limits:
        cpu: 300m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 200Mi
    ports:
    - containerPort: 8080
    securityContext:
      runAsNonRoot: true
      capabilities:
        drop:
        - all
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /tmp
      name: carts-vol
  volumes:
  - name: carts-vol
    emptyDir:
      medium: Memory
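Because a Node's affinity score is the sum of the weights of every preference it matches, preferences on different labels add up. The fragment below is a sketch of that arithmetic rather than a recommended configuration: the weights 80 and 20 and the zone value zone-a are hypothetical, and topology.kubernetes.io/zone is the well-known zone label, which the examples above do not use.

```yaml
# Fragment of spec.affinity.nodeAffinity; weights and the zone value "zone-a" are hypothetical.
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
  preference:
    matchExpressions:
    - key: kubernetes.io/arch
      operator: In
      values:
      - ppc64le
- weight: 20
  preference:
    matchExpressions:
    - key: topology.kubernetes.io/zone
      operator: In
      values:
      - zone-a
# Sum of matched weights per candidate Node:
#   ppc64le Node in zone-a       -> 80 + 20 = 100
#   ppc64le Node in another zone -> 80
#   amd64 Node in zone-a         -> 20
#   amd64 Node in another zone   -> 0
```

The affinity sum is one input among the scheduler's other scoring criteria, so a higher sum makes a Node more likely to be chosen but does not guarantee it.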
Consider the following anti-affinity with nodeAffinity weights, which causes fewer Pods to be scheduled on ppc64le: amd64 Nodes match both the In rule (weight 100) and the NotIn rule (weight 1), while ppc64le Nodes match neither.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-amd64-w1-amd-w100
  labels:
    app: httpd
  namespace: punith-project
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/arch
            operator: NotIn
            values:
            - ppc64le
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: carts
    image: quay.io/powercloud/sock-shop-carts:latest
    imagePullPolicy: Always
    command: [ "/usr/bin/java" ]
    args: [ "-cp", "/opt/app.jar", "-Xms64m", "-Xmx128m", "-XX:+UseG1GC", "-Djava.security.egd=file:/dev/urandom", "-Dspring.zipkin.enabled=false", "-Dloader.path=/opt/lib", "org.springframework.boot.loader.PropertiesLauncher", "--port=8080" ]
    resources:
      limits:
        cpu: 300m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 200Mi
    ports:
    - containerPort: 8080
    securityContext:
      runAsNonRoot: true
      capabilities:
        drop:
        - all
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /tmp
      name: carts-vol
  volumes:
  - name: carts-vol
    emptyDir:
      medium: Memory
Summary
You have seen how to schedule Pods on Nodes with affinity and anti-affinity, using required and preferred rules to direct workloads onto the architecture your application needs.
Good luck with your Pod placement.
Authors
Punith Kenchappa Software Engineer, ISDL IBM