I've set up Horizontal Pod Autoscaler (HPA) on my cluster with a minimum of 1 pod and a maximum of 2 pods for regular usage. It works well to scale down to 1 pod at night when the load decreases, but it takes way too long to respond when there's a spike in traffic during the day. This delay is frustrating for users, so I've had to disable the autoscaling feature. I'm using a large cluster on VMware Tanzu primarily for internal users, so it's mostly idle during the night. I need suggestions on how to make the autoscaling respond more promptly to these traffic spikes!
4 Answers
If your load patterns are consistent, you may want to set up a cron job for time-based scaling. Upscaling can take a while on some platforms, so getting ahead of predictable spikes ensures you have capacity ready when needed. I usually let some capacity sit idle rather than risk being caught off guard.
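For what it's worth, here is a rough sketch of what such a cron-driven script could look like. Instead of scaling the Deployment directly (which the HPA would override), it raises the HPA's `minReplicas` during working hours and drops it back at night. The 7:55 start comes from this thread; the 18:00 end, the weekday-only rule, and the HPA name `internal-app` are assumptions for illustration, so adjust them to your setup.

```python
import json
from datetime import datetime, time
from typing import List

# Assumed working window; only the 07:55 start is taken from the thread.
BUSINESS_START = time(7, 55)
BUSINESS_END = time(18, 0)   # assumption
DAY_MIN_REPLICAS = 2         # keep capacity warm during the day
NIGHT_MIN_REPLICAS = 1       # let the HPA shrink to one pod at night


def desired_min_replicas(now: datetime) -> int:
    """Return the HPA floor that should be in effect at `now`."""
    is_weekday = now.weekday() < 5  # Monday=0 .. Friday=4
    in_hours = BUSINESS_START <= now.time() < BUSINESS_END
    return DAY_MIN_REPLICAS if (is_weekday and in_hours) else NIGHT_MIN_REPLICAS


def patch_command(hpa_name: str, namespace: str, now: datetime) -> List[str]:
    """Build a `kubectl patch` invocation that sets minReplicas on the HPA."""
    patch = json.dumps({"spec": {"minReplicas": desired_min_replicas(now)}})
    return ["kubectl", "patch", "hpa", hpa_name,
            "--namespace", namespace, "--patch", patch]


if __name__ == "__main__":
    # "internal-app" is a hypothetical HPA name.
    cmd = patch_command("internal-app", "default", datetime.now())
    print(" ".join(cmd))  # pass cmd to subprocess.run(...) to actually apply it
```

Run it from cron (or a Kubernetes CronJob) a few minutes before and after the window. Because only the floor changes, the HPA still handles anything unexpected in between.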
Have you considered using a predictive scaling approach? For example, if you know the load spikes during the day and drops at night, you could set a rule to automatically increase the number of pods before the spike starts. That way, you avoid the slow response times that lead to user complaints. It sounds like typical autoscaling might struggle to manage sudden loads created by workflows, especially if the load isn't predictable enough to plan ahead.
You could look into KEDA, or use a VPA combined with real-time stats-gathering plugins for your Envoy or API Gateway. Also consider predictive autoscaling, using forecasting models built with tools like TFT or PyTorch. These methods can help manage the workload better. Just keep in mind that proprietary solutions may come with extra costs.
Since your load patterns are fairly predictable, with clear start and cut-off times, you might want to schedule scaling just before the typical spikes, like at 7:55 am. That way the morning rush is less likely to hit the HPA like a DDoS attack. Also, can your autoscaler be tuned to respond more aggressively to these patterns?
It's crucial not to over-provision all the time. A carefully timed scaling strategy will allow for smooth transitions in and out of high-demand periods. Just make sure your current tools can implement these ideas!

Setting your autoscaler to react more quickly can definitely help. But you should also avoid having to over-provision all the time just to handle these spikes. A smoother scaling solution would be best, similar to what cloud providers offer.
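To make "react more quickly" concrete, here is a sketch that builds an HPA manifest with an explicit `behavior` section. It assumes your cluster serves the `autoscaling/v2` API and that you scale on CPU; the 50% target, the 300-second scale-down window, and the name `internal-app` are placeholders, not recommendations. Lowering the target utilization is what makes the HPA trigger earlier; the `scaleUp` settings only make sure nothing holds it back once it does. If the delay is really pod startup time, none of this will help much.

```python
import json


def build_hpa(name: str, target_cpu_percent: int = 50,
              min_replicas: int = 1, max_replicas: int = 2) -> dict:
    """Return an autoscaling/v2 HPA manifest tuned for fast scale-up."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,  # assumes the Deployment shares the HPA's name
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": target_cpu_percent},
                },
            }],
            "behavior": {
                # Scale up immediately, adding up to max_replicas pods per 15 s.
                "scaleUp": {
                    "stabilizationWindowSeconds": 0,
                    "policies": [{"type": "Pods", "value": max_replicas,
                                  "periodSeconds": 15}],
                },
                # Scale down cautiously to avoid flapping after a spike.
                "scaleDown": {"stabilizationWindowSeconds": 300},
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON: python hpa.py | kubectl apply -f -
    print(json.dumps(build_hpa("internal-app"), indent=2))
```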