---
id: scheduling
title: Scheduling Pods
---

## Scheduling

### Kubernetes Scheduler

The kube-scheduler looks for newly created pods that do not have any assigned node. A node that meets the requirements to run a pod is called a feasible node.

The kube-scheduler is responsible for assigning a suitable node to the pod based on two criteria:

1. Filtering - Checks which nodes have the available resources to meet the requirements specified in the pod spec (see the sketch below).
2. Scoring - Ranks the feasible nodes found during filtering and chooses the highest-ranking one. If two or more nodes score equally, one is selected at random.
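
For illustration, here is a minimal pod sketch whose resource requests drive the filtering step (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod              # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.25         # illustrative image
    resources:
      requests:
        cpu: "500m"           # nodes without 500m of free CPU are filtered out
        memory: "256Mi"       # nodes without 256Mi of free memory are filtered out
```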

### Node Selector

You can schedule pods onto particular nodes using node labels together with the pod's `nodeSelector` field. For example, you can force a pod to run on a machine with a GPU by giving the node a label such as `gpuEnabled: "true"` and matching it in the pod spec.

Node config:

```yaml
labels:
  gpuEnabled: "true"   # label values must be strings
```

Pod config:

```yaml
nodeSelector:
  gpuEnabled: "true"
```
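
Putting it together, a minimal sketch of a complete pod manifest using this (hypothetical) `gpuEnabled` label could look like the following; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod               # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.25         # illustrative image
  nodeSelector:
    gpuEnabled: "true"        # must match the node label exactly
```

The node itself needs the label first, which can be applied with `kubectl label nodes <node-name> gpuEnabled=true`.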

## Affinity & antiAffinity

Affinity and anti-affinity are similar to `nodeSelector` but greatly expand the types of constraints you can express:

1. The affinity/anti-affinity language is more expressive and offers more matching rules.
2. Rules can be "preferred" rather than hard requirements, so if the scheduler can't satisfy them, the pod is still scheduled.
3. You can constrain against labels on other pods running on the node (or other topological domain), rather than against labels on the node itself.

### Node Affinity

There are currently two types of node affinity, called `requiredDuringSchedulingIgnoredDuringExecution` and `preferredDuringSchedulingIgnoredDuringExecution`. You can think of them as "hard" and "soft" requirements to schedule a pod. The `IgnoredDuringExecution` part of the names means that if labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod continues to run on the node.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/name
            operator: In
            values:
            - ABC
            - XYZ
      preferredDuringSchedulingIgnoredDuringExecution:  # soft preference
      - weight: 100
        preference:
          matchExpressions:
          - key: label-key
            operator: In
            values:
            - label-value
```

This example only allows the pod to be scheduled on nodes that have the key `kubernetes.io/name` with the value `ABC` or `XYZ`. Among the nodes matching this criterion, nodes with the key `label-key` and the value `label-value` are preferred.

The `weight` field ranges from 1 to 100. For each node that meets all the scheduling requirements, the kube-scheduler computes a score as mentioned earlier, then adds the weight of every matching preference to that score; the node with the highest total is chosen.
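
As a hypothetical illustration of this arithmetic (the label keys below are made up), suppose a pod declares two preferences:

```yaml
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
  preference:
    matchExpressions:
    - key: disktype           # hypothetical label
      operator: In
      values:
      - ssd
- weight: 20
  preference:
    matchExpressions:
    - key: region-tier        # hypothetical label
      operator: In
      values:
      - premium
```

A feasible node matching both terms gets 100 added to its base score, a node matching only `disktype: ssd` gets 80, and a node matching only `region-tier: premium` gets 20, so all else being equal the first node wins.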

### podAffinity and podAntiAffinity

podAffinity and podAntiAffinity let you constrain which nodes your pods are eligible to be scheduled on based on the labels of pods already running on a node, rather than the labels on the node itself.

Example using both rules:

```yaml
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: label1
            operator: In
            values:
            - label-value
        topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: label2
              operator: In
              values:
              - label-value-anti
          topologyKey: topology.kubernetes.io/zone
```

This example uses both affinity rules:

Affinity rule: the pod can only be scheduled onto a node if that node is in the same zone as at least one already-running pod that has a label with key `label1` and value `label-value`.

antiAffinity rule: the pod should not be scheduled onto a node if that node is in the same zone as a pod with a label having key `label2` and value `label-value-anti`.

```yaml
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - xks
          topologyKey: kubernetes.io/hostname       # spread across nodes
        weight: 100
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - xks
          topologyKey: topology.kubernetes.io/zone  # spread across zones
        weight: 100
```

This is an example podAntiAffinity configuration for Prometheus. It spreads the pods across `kubernetes.io/hostname` and `topology.kubernetes.io/zone`, preferring at most one pod per node and mitigating downtime in case an entire zone goes down: e.g., if a pod with the label key `prometheus` and value `xks` already runs in zone A, the new pod is steered towards zone B or C. Note that these rules are "preferred", not required.
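
After deployment, you could verify the spread with `kubectl get pods -o wide -l prometheus=xks` (assuming the pods carry the `prometheus: xks` label) and inspect the `NODE` column.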

We recommend using a configuration like this, as critical services should be distributed across multiple zones to minimize downtime.

You can read more about this [in the official documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).