I'm curious which metrics organizations typically rely on for autoscaling in production. I know the Kubernetes Metrics Server lets you scale on CPU and memory usage, but do companies use other metrics or tools alongside these? As a beginner, I'd like to understand how autoscaling actually works in the real world!
3 Answers
It really depends on the workload. In our case, the queue workers scale on the number of items waiting in the message queue, while our API services scale on the number of incoming HTTP requests. Those two metrics drive most of our scaling.
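To make the two approaches concrete, here is a minimal sketch of the proportional rule the Kubernetes Horizontal Pod Autoscaler documents, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), applied to request-based and queue-based scaling. The function names, the example targets, and the 0.1 tolerance (chosen to mirror the HPA's documented default) are assumptions for illustration, not this poster's actual configuration; a real autoscaler also applies min/max bounds and stabilization windows.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Proportional HPA-style rule: ceil(current_replicas * current_metric / target_metric)."""
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("current_replicas must be >= 1 and target_metric > 0")
    ratio = current_metric / target_metric
    # Ignore small deviations so the replica count doesn't flap around the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

def replicas_for_queue(queue_length: int, target_items_per_worker: int) -> int:
    """Queue-driven workers: one worker per target_items_per_worker pending items, rounded up."""
    if target_items_per_worker < 1:
        raise ValueError("target_items_per_worker must be >= 1")
    return math.ceil(queue_length / target_items_per_worker)
```

For example, 4 API pods averaging 150 requests/s each against a hypothetical target of 100 gives ceil(4 * 1.5) = 6 replicas, and a backlog of 1,250 messages at a hypothetical 100 items per worker gives 13 workers.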
There's no one-size-fits-all answer here; it's like asking what people typically eat. The best approach is to use a controlled load generator that closely mimics production traffic. I use k6 for load testing: I apply a steady, repeatable load up to the point where the service becomes overloaded, then check which metrics correlate best with performance. Just a heads-up: the results depend heavily on the test scripts you run, so make them as realistic as you can!
If you're on a pay-per-use model, consider scaling to zero when the application isn't in use. For example, say you run three replicas during peak hours but don't need that many at night. You could scale down to one, or even zero, during idle periods to save costs, but scaling to zero only works if your platform can start a replica again when a new request arrives. For high availability, we usually keep three replicas running around the clock regardless of the metrics.
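The trade-off between cost and availability mostly comes down to the floor and ceiling you put around whatever the metric-driven calculation asks for. Below is a minimal sketch of that idea; `ReplicaPolicy` and `bounded_replicas` are hypothetical names, not the API of any real autoscaler, and `allow_scale_to_zero` should only be enabled on a platform that can cold-start replicas on demand.

```python
from dataclasses import dataclass

@dataclass
class ReplicaPolicy:
    min_replicas: int                  # floor, e.g. 3 for high availability
    max_replicas: int                  # ceiling to cap cost
    allow_scale_to_zero: bool = False  # only if the platform can restart replicas on demand

    def __post_init__(self) -> None:
        if not 0 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 0 <= min_replicas <= max_replicas")

def bounded_replicas(desired: int, is_idle: bool, policy: ReplicaPolicy) -> int:
    """Clamp a metric-driven replica count to the policy's floor and ceiling."""
    if policy.allow_scale_to_zero and is_idle:
        # No traffic at all: release every replica to stop paying for idle capacity.
        return 0
    return max(policy.min_replicas, min(desired, policy.max_replicas))
```

Under a high-availability policy such as `ReplicaPolicy(min_replicas=3, max_replicas=10)`, the service never drops below three replicas even when idle, whereas a pay-per-use policy such as `ReplicaPolicy(min_replicas=1, max_replicas=3, allow_scale_to_zero=True)` runs at most three replicas at peak and none when there is no traffic.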
