K8s-shredder – a new way of parking in Kubernetes
As more and more teams running their workloads on Kubernetes deploy stateful applications (Kafka, ZooKeeper, RabbitMQ, Redis, etc.) on top of the platform, it can be challenging to keep alive the minion nodes (k8s worker nodes) where the pods belonging to a StatefulSet/Deployment run. In some cases, worker nodes need to keep running for an extended period during a full cluster upgrade in order to ensure no downtime at the application level while the worker nodes are rotated.
K8s-shredder introduces the concept of parked nodes, which aims to address some critical aspects of rotating worker nodes during a cluster upgrade:
- allows teams running stateful apps to move their workloads off parked nodes at will, independent of the cluster's upgrade lifecycle.
- optimises cloud costs by dynamically purging unschedulable worker nodes (parked nodes).
- notifies clients that they are running workloads on parked nodes so that they can take appropriate action.
To enable k8s-shredder on a Kubernetes cluster, use the manifests described in the k8s-shredder spec.
Then, during a cluster upgrade, while rotating the worker nodes, label the nodes that you want parked with:
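A minimal sketch of the labelling step, assuming the parked-node and TTL labels use the same `shredder.ethos.adobe.net` prefix as the eviction label (the exact label keys and the Unix-timestamp expiry format below are assumptions; check the label names configured for your controller):

```shell
# Hypothetical example: mark a node as parked and set its expiry.
# The label keys are assumptions based on the shredder.ethos.adobe.net prefix;
# substitute the parked-node and TTL labels from your k8s-shredder config.
kubectl label node <node-name> shredder.ethos.adobe.net/upgrade-status=parked
kubectl label node <node-name> \
  shredder.ethos.adobe.net/parked-node-expires-on="$(date -d '+7 days' +%s)"
```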
Additionally, if you want a pod to be exempted from the eviction loop until the parked node's TTL expires, you can label it with `shredder.ethos.adobe.net/allow-eviction=false` so that k8s-shredder knows to skip it.
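For instance, the exemption label can be applied directly with kubectl (the pod name is a placeholder):

```shell
# Exempt a pod from k8s-shredder's eviction loop until the parked node's
# TTL expires ("my-pod" is illustrative).
kubectl label pod my-pod shredder.ethos.adobe.net/allow-eviction=false
```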
The following options can be used to customise the k8s-shredder controller:
| Option | Description |
|---|---|
| `EvictionLoopInterval` | How often to run the eviction loop process |
| `ParkedNodeTTL` | Time a node can be parked before the force-eviction process starts |
| `RollingRestartThreshold` | How much time (as a percentage of `ParkedNodeTTL`) should pass before the rollout restart process starts |
| `UpgradeStatusLabel` | Label used for identifying parked nodes |
| `ExpiresOnLabel` | Label used for identifying the TTL for parked nodes |
| `NamespacePrefixSkipInitialEviction` | For pods in namespaces with this prefix, proceed directly with a rollout restart without waiting for `RollingRestartThreshold` |
| `RestartedAtAnnotation` | Annotation used to mark a controller object for rollout restart |
| `AllowEvictionLabel` | Label used to skip evicting pods that have explicitly set this label to `false` |
| `ToBeDeletedTaint` | Node taint used to skip a subset of parked nodes that are already handled by cluster-autoscaler |
How it works
K8s-shredder periodically runs eviction loops, based on the configured `EvictionLoopInterval`, trying to clean up all the pods from the parked nodes. Once all the pods are cleaned up, cluster-autoscaler should chime in and recycle the parked node.
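As a rough sketch of the timing involved (the values below are illustrative assumptions, not the controller's actual code), the rollout-restart phase begins once the configured percentage of the parked-node TTL has elapsed, and force eviction begins when the TTL itself expires:

```shell
#!/bin/sh
# Illustrative timing sketch: compute when the rollout restart and the
# force eviction would start for a node parked right now.
PARKED_AT=$(date +%s)                 # moment the node was parked
PARKED_NODE_TTL=$((7 * 24 * 3600))    # e.g. 7 days, in seconds
ROLLING_RESTART_THRESHOLD=50          # e.g. 50% of the TTL

ROLLOUT_RESTART_AT=$((PARKED_AT + PARKED_NODE_TTL * ROLLING_RESTART_THRESHOLD / 100))
FORCE_EVICTION_AT=$((PARKED_AT + PARKED_NODE_TTL))

echo "rollout restart starts at: $ROLLOUT_RESTART_AT"
echo "force eviction starts at:  $FORCE_EVICTION_AT"
```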
The diagram below describes a simple flow of how k8s-shredder handles StatefulSet applications: