All the pods are running on the node you need now.
- You have three Kubernetes nodes.
- Respectively in eu-west-1a, eu-west-1b, eu-west-1c.
- The last two are using little CPU and memory. The one in 1a, instead, is heavily committed: the resources requested on it are close to its allocatable capacity.
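- To check this, you can compare the resources already requested on each node (kubectl describe shows the requests the scheduler counts against capacity; kubectl top nodes needs metrics-server and shows actual usage instead):
$ kubectl describe node wrk-1-pool-1 | grep -A 5 "Allocated resources"
$ kubectl top nodes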
- You’re deploying a stateful set with three replicas, one per AZ.
- The last two replicas are easy to schedule, but the requests of the first one are larger than the capacity left on the node in 1a.
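- You can see why from the pending pod's events; the scheduler reports something like Insufficient cpu or Insufficient memory (db-0 here is just a made-up name for the stuck replica):
$ kubectl describe po db-0 | grep -A 10 Events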
- You think: unfortunately, I'll have to increase the max number of nodes in the ASG. But that won't guarantee the new node will pop up in the first AZ.
- What you really want is to pick the pods that gain nothing from running in the 1a AZ and schedule them on the other nodes, which, by the way, may have plenty of spare resources.
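- A quick way to spot candidates is to list the pods running on the 1a node along with any PersistentVolumeClaim they mount: pods with no claim are usually free to move to another AZ:
$ kubectl get po -A --field-selector spec.nodeName=wrk-1-pool-1 \
    -o custom-columns=NAMESPACE:.metadata.namespace,POD:.metadata.name,PVC:.spec.volumes[*].persistentVolumeClaim.claimName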
- First of all, you want to stop new pods from tormenting the node in 1a:
$ kubectl cordon wrk-1-pool-1
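- Cordoning only marks the node as unschedulable; the pods already running on it are left alone. You can double-check with:
$ kubectl get node wrk-1-pool-1   # STATUS reports Ready,SchedulingDisabled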
- Then, if you can afford downtime and/or are in a rush, remove all the pods from the node in 1a. All of them will be rescheduled, and the ones that can't be (e.g. because they mount a volume bound to 1a) will remain Pending.
$ kubectl delete po -A --field-selector spec.nodeName=wrk-1-pool-1
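- From another terminal you can watch which pods come back up elsewhere and which ones stay stuck:
$ kubectl get po -A --field-selector status.phase=Pending -w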
- Now, uncordon the node in 1a: the pods in a pending state will be scheduled there.
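$ kubectl uncordon wrk-1-pool-1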
- Delete all the pods to force a reschedule of all of them.
- Of course, instead of deleting pods you can request a rollout restart for deployments and stateful sets: this will virtually eliminate the risk of downtime for users.
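- For example (web and db are just placeholder names for a deployment and a stateful set):
$ kubectl rollout restart deployment web
$ kubectl rollout restart statefulset db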