I've been diving into Kubernetes' Horizontal Pod Autoscaler (HPA) for a bit and I keep running into the same frustrating issue when working with custom metrics from the Prometheus Adapter. My goal is to use a custom HTTP latency metric (specifically the 95th percentile) exposed via Prometheus for scaling decisions, but it feels like the HPA is either lagging behind or being overly aggressive in its scaling responses.
Here's my setup: I'm using `histogram_quantile(0.95, ...)` in my Prometheus Adapter rule, and I've set the HPA to scale between 3 and 15 replicas based on that latency threshold. When traffic spikes, though, the HPA just can't keep up: it scales up too late, after latency has already exceeded my SLO, and then quickly scales back down once things stabilize. I've tried adjusting the --horizontal-pod-autoscaler-sync-period flag and the downscale stabilization window, but those tweaks seem better suited to standard CPU/memory metrics.
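For context, my adapter rule looks roughly like this (simplified; the metric and label names are placeholders for my actual series):

```yaml
# prometheus-adapter rules config, simplified
rules:
  - seriesQuery: 'http_request_duration_seconds_bucket{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_bucket$"
      as: "${1}_p95"    # exposed to the HPA as http_request_duration_seconds_p95
    metricsQuery: |-
      histogram_quantile(0.95,
        sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))
```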
Am I pushing the HPA too far with custom latency metrics? Should I be looking at proxy or service mesh options like Envoy's adaptive concurrency filter or Linkerd, which might handle this better than Kubernetes' built-in scaling logic? I'd love to hear how others have tackled this without ditching the HPA for alternatives like KEDA or external event-driven scalers.
4 Answers
Check whether there are autoscaler options that scale in steps (a fixed number of pods per period); that might offer more control during your traffic spikes. Let me link you that info once I find it!
Have you thought about changing the configuration on the HPA object itself instead of just the controller flags? With the `behavior` field in autoscaling/v2 you can adjust the scaling policies directly on the HPA, e.g. removing at most 1 Pod per 5 minutes, or whatever fits your needs (see the sketch below). KEDA can help, but it sounds like your current setup just needs some tuning.
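A minimal sketch, assuming autoscaling/v2 and a Deployment target (names are placeholders):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder
  minReplicas: 3
  maxReplicas: 15
  behavior:
    scaleDown:
      policies:
        - type: Pods
          value: 1            # remove at most one pod...
          periodSeconds: 300  # ...per 5-minute window
```

This sketch omits the metrics block for brevity; keep your existing custom-metric entry under spec.metrics.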
KEDA doesn't abandon the HPA; it builds on top of it, registering itself as an external metrics provider so the HPA gets better inputs through the external metrics API. That could really help your situation; I'd definitely recommend giving KEDA a shot if you haven't yet!
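If you go that route, a minimal ScaledObject with a Prometheus trigger might look like this (the server address, query, and threshold are placeholders, not values from your setup):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-latency-scaler          # placeholder
spec:
  scaleTargetRef:
    name: my-app                       # placeholder Deployment name
  minReplicaCount: 3
  maxReplicaCount: 15
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # placeholder
        query: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[2m])) by (le))
        threshold: "0.25"              # example target: 0.25s p95
```

KEDA turns the trigger into an external metric and creates and manages the underlying HPA for you.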
KEDA might not be what you need here. Instead, try adjusting stabilizationWindowSeconds alongside periodSeconds in your HPA's behavior block; that should help reduce the flapping you're experiencing. A periodSeconds of about 15-60 seconds with a stabilizationWindowSeconds set to 3-5 times longer will generally make the HPA more conservative about scaling up or down. It's designed for exactly this situation (rough sketch below). Documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#stabilization-window
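A rough sketch with illustrative (not tuned) values; this drops into the spec of your existing HPA:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react quickly when latency spikes
    policies:
      - type: Percent
        value: 100                    # allow up to doubling per period
        periodSeconds: 30
  scaleDown:
    stabilizationWindowSeconds: 180   # ~3x the period, per the advice above
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60
```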
Totally agree! KEDA can simplify a lot of the scaling logic.