Hey folks! I'm curious about your practices for scaling down to zero during non-business hours. Do you have a set process for doing this? Are you using tools like Cron or KEDA? Also, what areas do you manage – the entire test cluster or just some specific namespaces? If you're using Kubernetes, I'd love to know which flavor, particularly if it's ARO (Azure Red Hat OpenShift). Is it common to remove nodes at this time? What challenges have you faced, and have you seen any significant cost savings? Thanks!
5 Answers
Just unplug everything! 😂
We don't scale all the way down to zero, but we get pretty close. Using the cluster autoscaler or Karpenter makes it easy! KEDA can scale workloads down to zero based on metrics. But remember, you're paying for the nodes, not the containers, so after scaling the workloads down, check that the node count actually drops as well. Spot instances can offer additional savings if your workloads can tolerate interruptions!
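For the KEDA part, here is a minimal sketch of a schedule-based setup, assuming KEDA is installed and you want a Deployment up only during office hours. It builds a `ScaledObject` with a `cron` trigger and prints it as JSON, which `kubectl apply -f` accepts. The names (`api-office-hours`, `test`, `api`), the timezone, and the 08:00 to 18:00 weekday window are all made up for illustration. Outside the window the workload falls back to `minReplicaCount`, which is 0 here.

```python
import json


def business_hours_scaled_object(name, namespace, deployment,
                                 timezone="Etc/UTC",
                                 start="0 8 * * 1-5",   # cron: 08:00 Mon-Fri (assumed window)
                                 end="0 18 * * 1-5",    # cron: 18:00 Mon-Fri (assumed window)
                                 replicas=2):
    """Build a KEDA ScaledObject that keeps `deployment` running only inside the cron window."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": 0,          # allow scale to zero outside the window
            "maxReplicaCount": replicas,
            "triggers": [{
                "type": "cron",
                "metadata": {
                    "timezone": timezone,
                    "start": start,
                    "end": end,
                    # KEDA trigger metadata values are strings
                    "desiredReplicas": str(replicas),
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical names, replace with your own workload.
    manifest = business_hours_scaled_object("api-office-hours", "test", "api")
    print(json.dumps(manifest, indent=2))
```

This only empties the nodes. Removing them is still up to the cluster autoscaler or Karpenter, as mentioned above.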
We use EKS on AWS and scale down our testing environments over the weekend. Initially, we used scheduled Lambda functions, but we moved to Step Functions for better control. One job shuts everything down at the specified time, and another starts the services back up. We scale down both the worker nodes and the databases this way.
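The decision logic behind a pair of stop/start jobs like that can be sketched roughly as follows. This is not the poster's actual code: the Friday 19:00 stop, the Monday 07:00 start, and the node count of 3 are assumptions, and the AWS call itself is left out. The returned dict uses the `minSize`/`maxSize`/`desiredSize` field names of an EKS managed node group scaling config, so it could be handed to whatever performs the update.

```python
from datetime import datetime, timedelta


def environment_should_run(now, stop_hour=19, start_hour=7):
    """True unless `now` falls in the weekend window (Fri stop_hour to Mon start_hour)."""
    day = now.weekday()  # Monday=0 ... Sunday=6
    if day == 4:                 # Friday: up until the stop hour
        return now.hour < stop_hour
    if day in (5, 6):            # Saturday, Sunday: down
        return False
    if day == 0:                 # Monday: up from the start hour
        return now.hour >= start_hour
    return True                  # Tuesday to Thursday: always up


def node_group_scaling(now, normal_size=3):
    """Scaling config to apply at `now`; normal_size is a made-up example value."""
    size = normal_size if environment_should_run(now) else 0
    # maxSize stays at normal_size so the group can be scaled back up later.
    return {"minSize": size, "maxSize": normal_size, "desiredSize": size}


def off_hours_per_week():
    """Count the hours per week the environment is down under this schedule."""
    monday = datetime(2024, 1, 1)  # any Monday at 00:00 works as a starting point
    return sum(
        not environment_should_run(monday + timedelta(hours=h))
        for h in range(168)
    )


if __name__ == "__main__":
    hours = off_hours_per_week()
    print(f"off {hours} of 168 hours, about {hours / 168:.0%} of node-hours")
```

Under these assumed times the environment is off for 60 of 168 hours a week, so roughly a third of node-hours at best. Storage, load balancers, and anything else billed while stopped are not included in that figure.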
Autoscaling is great! If you combine Horizontal, Vertical, and Cluster Autoscaling, you can scale down to a single small instance running core services when nothing's happening.
It's interesting to consider scaling down to zero. If production resources can be scaled down after hours, then maybe pricing needs a rethink!
Could you clarify what you mean? I'm exploring scaling down non-prod cluster workloads when they're not in use. For production, I imagine event-driven scaling is the way to go, but I'm curious about your thoughts!
Haha! I should propose 'Unplug-as-a-Service' to Azure!