Hey everyone! I'm noticing that during traffic spikes, Kubernetes' Horizontal Pod Autoscaler (HPA) isn't scaling up my AI agents quickly enough. This leads to frustrating latency issues. Does anyone have tips or tricks on how to overcome this challenge? I'd really appreciate your help! Thanks! 🙏
5 Answers
Have you tried lowering the scaling thresholds? A lower target utilization, combined with a more aggressive scale-up policy, lets the HPA react earlier during those peak loads and can make a real difference in response times. See the sketch below.
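A minimal sketch of what that might look like, assuming a CPU-bound Deployment called `ai-agent` (the name, target, and policy numbers are placeholders to tune for your workload):

```yaml
# Hypothetical HPA: lower CPU target plus an aggressive scale-up policy.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-agent              # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # lower target = HPA reacts earlier in a spike
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # don't wait before scaling up
      policies:
        - type: Percent
          value: 100                  # allow doubling the replica count
          periodSeconds: 30
```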
If you know when traffic is going to spike, consider using KEDA. It allows you to schedule scaling proactively using a cron scaler, along with other options. Just check out their docs for more details!
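For illustration, a KEDA `ScaledObject` with a cron trigger could look roughly like this (the Deployment name, timezone, schedule, and replica counts are assumptions for the example, not something from your setup):

```yaml
# Hypothetical example: pre-scale "ai-agent" before a known weekday traffic window.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ai-agent-cron
spec:
  scaleTargetRef:
    name: ai-agent                   # placeholder Deployment name
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York   # any tz database name
        start: 0 8 * * 1-5           # scale up at 08:00 on weekdays
        end: 0 18 * * 1-5            # scale back down at 18:00
        desiredReplicas: "10"        # replicas to hold during the window
```

You can also combine the cron trigger with a CPU or queue-based trigger in the same ScaledObject, so scheduled pre-scaling and reactive scaling work together.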
Why not let the AI agents themselves analyze traffic data and predict when a scale-up is necessary? If you expose that prediction as a custom or external metric, the HPA (or KEDA) can act on it before the spike actually hits.
To make your HPA more effective, first make sure the Kubernetes metrics-server is installed and healthy; without it the HPA can't read CPU or memory usage at all. Then identify which resource (CPU or memory) spikes first and set its target around 65%, with a slightly higher target for the other. Also measure how long your AI agents take to start in a new pod and size your readiness/startup probes to match, so new replicas begin receiving traffic as soon as they're actually ready and downtime stays minimal. Sharing your HPA YAML would make it easier for others to spot what's going on. A rough example is below.
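As a sketch of that setup (assuming CPU is the resource that spikes first; all names, numbers, ports, and paths are placeholders):

```yaml
# Hypothetical HPA with the lower target on CPU and a slightly higher one on memory.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-agent
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65   # the resource that spikes first
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75   # slightly higher for the other resource
```

And in the Deployment's pod spec, a startup probe sized to how long the agent actually takes to boot (path, port, and timings are assumptions):

```yaml
        startupProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5
          failureThreshold: 24   # tolerate up to ~2 minutes of startup time
```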
First off, what do you mean by 'fast enough'? It would help to clarify your scaling goals and the metrics you're looking at.
We've implemented cron-based KEDA scaling for web workloads with predictable traffic, and it's improved our performance significantly!