Examples of both taints/tolerations and affinity in Kubernetes:
1. Applying a taint to a node (using kubectl):
# Add a taint to a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule
# Remove a taint from a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule-
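To confirm the taint is present (or gone), inspect the node; kubectl describe lists its taints:
# Show the node's current taints
kubectl describe node node1 | grep Taints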
2. Pod definition with tolerations and affinity:
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-pod
  labels:
    app: ml-training
spec:
  # TOLERATIONS: Allow this pod to schedule on GPU-tainted nodes
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  # NODE AFFINITY: Require a node labeled with key "gpu"; prefer a specific GPU type
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: Exists
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: gpu-type
            operator: In
            values:
            - nvidia-tesla-v100
    # POD AFFINITY: Try to schedule near other ML pods for data sharing
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - ml-data-service
          topologyKey: kubernetes.io/hostname
    # POD ANTI-AFFINITY: Avoid scheduling with other ML training pods (resource competition)
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - ml-training
          topologyKey: kubernetes.io/hostname
  containers:
  - name: ml-container
    image: ml-training:latest
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
        nvidia.com/gpu: 1
      limits:
        cpu: "8"
        memory: "32Gi"
        nvidia.com/gpu: 1
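For the required node-affinity rule above to be satisfiable, at least one node must carry a label with the key gpu (operator Exists checks only that the key is present, so any value works). A sketch, assuming node1 is the GPU node; the label values here are illustrative:
# Satisfy the required term (key "gpu" must exist; the value is arbitrary)
kubectl label nodes node1 gpu=true
# Satisfy the preferred term as well
kubectl label nodes node1 gpu-type=nvidia-tesla-v100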
Explanation:
Taints on node: The taint dedicated=gpu:NoSchedule means only pods with a matching toleration can schedule there.
Tolerations in pod: The pod declares a toleration that matches the taint, allowing it to schedule on the GPU node.
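A toleration can also match by key alone: with operator "Exists" and no value, it tolerates the dedicated taint regardless of its value. A minimal variant of the toleration above:
tolerations:
- key: "dedicated"
  operator: "Exists"    # matches any value of the "dedicated" taint
  effect: "NoSchedule"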
Node Affinity:
- requiredDuringSchedulingIgnoredDuringExecution: the pod MUST run on nodes with the label key "gpu"
- preferredDuringSchedulingIgnoredDuringExecution: the pod PREFERS nodes with GPU type "nvidia-tesla-v100"
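Because the first rule is required, the pod stays Pending if no node carries a "gpu" label; the scheduler records the reason in the pod's events:
# The Events section explains any scheduling failure
kubectl describe pod ml-training-pod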
Pod Affinity: The pod prefers to run on the same node as pods with label "app=ml-data-service" (for faster data access)
Pod Anti-Affinity: The pod avoids running on the same node as other "app=ml-training" pods (to avoid resource competition)
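The topologyKey defines what counts as "the same place": kubernetes.io/hostname means the same node, while the well-known zone label widens it to the same availability zone. As a sketch (assuming nodes carry the standard topology.kubernetes.io/zone label), the anti-affinity term could spread training pods across zones instead of just nodes:
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - ml-training
      # Zone-level spreading instead of node-level
      topologyKey: topology.kubernetes.io/zone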
The naming of requiredDuringSchedulingIgnoredDuringExecution is verbose but descriptive:
- "required" - this rule must be satisfied for scheduling
- "DuringScheduling" - applies when pod is first scheduled
- "IgnoredDuringExecution" - if node labels change after pod is running, pod won't be evicted