I'm a developer with a couple of years' experience deploying applications on Kubernetes. I work with Horizontal Pod Autoscaler (HPA) and I'm wondering about the scale-up and scale-down behaviors I should expect. My site's traffic follows a sine wave pattern throughout the day, with peaks about three times the troughs. While the scale-up and scale-down generally correspond to this pattern, I'm noticing quite a bit of intermediate scaling which seems excessive. I can adjust the CPU and memory requests and limits in the helm chart I use. Should I consider raising the CPU limits to prevent the frequent scaling events? I'd like to smooth out these fluctuations, especially since deploying a new pod takes about 20-30 seconds. While I understand the ability to scale quickly is a benefit of Kubernetes, I'm unsure if the overhead of frequent scaling is worth it or if I should just let Kubernetes handle things as usual.
5 Answers
What specific issue are you trying to resolve with the scaling?
Consider removing CPU limits entirely and try using Kubernetes Resource Requests with something like KRR to help manage your resources effectively.
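For example, keeping a CPU request (so the scheduler and HPA have a baseline) while dropping the CPU limit (to avoid throttling) might look like the sketch below. The values are placeholders, not recommendations; a tool like KRR can suggest request sizes based on observed usage:

```yaml
# Fragment of a Deployment's container spec (values are hypothetical)
resources:
  requests:
    cpu: 500m        # HPA utilization % is computed against this request
    memory: 512Mi
  limits:
    memory: 512Mi    # keep a memory limit to guard against leaks
    # no cpu limit: the container can burst instead of being throttled
```

Note that HPA's CPU utilization target is a percentage of the request, so right-sizing the request directly changes how often scaling triggers.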
I'd suggest making your scale-up settings really responsive but extending the duration for scale-down actions. This way, you can minimize the back-and-forth scaling and reduce churn, especially if you're relying on the metric server for HPA. If possible, consider using Prometheus to base your scaling on more detailed metrics.
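With the `autoscaling/v2` API you can express exactly that asymmetry through the HPA's `behavior` field. A rough sketch (the name, replica counts, windows, and 70% target are all assumptions to adjust for your workload):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app          # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app        # hypothetical
  minReplicas: 2
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react immediately to load spikes
      policies:
        - type: Percent
          value: 100                   # allow doubling per period
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 600  # wait 10 minutes before shrinking
      policies:
        - type: Pods
          value: 1                     # remove at most one pod per period
          periodSeconds: 120
```

The long scale-down stabilization window is what smooths out the intermediate churn on a sine-wave traffic pattern: brief dips no longer trigger removals that have to be undone minutes later.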
Seriously, what’s with all the scaling? Do your job and optimize your settings.
Remember, you can scale using a variety of metrics, not just CPU.
True, but in this instance, CPU is triggering the scaling.
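If you do move beyond CPU, an `autoscaling/v2` HPA can target a per-pod custom metric instead. A minimal sketch, assuming something like the Prometheus Adapter exposes a request-rate metric (the metric name and target value here are hypothetical):

```yaml
# Fragment of an HPA spec using a custom pods metric
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # assumes an adapter exposes this metric
      target:
        type: AverageValue
        averageValue: "100"              # target requests/sec per pod
```

Scaling on request rate often tracks a traffic-driven sine wave more directly than CPU does.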

I'm finding that scaling actions happen quite often, and I'm not sure how much scaling is considered "normal."