Hey there! I need advice on how to scale our service to handle significant traffic spikes. We're running an EKS cluster of around 300-350 nodes, with Istio as the service mesh, Prometheus for metrics, and KEDA for autoscaling.

The service handles streaming data requests, usually around 50-60K requests per minute, but we recently onboarded an enterprise client whose batch jobs cause short spikes of over 200K requests per minute. Our reactive scaling takes 45-80 seconds to adjust, and we drop requests in the meantime.

We've implemented some temporary fixes and are looking for ways to scale faster, such as warm-up pools or shorter metric polling intervals. I've read about proactive scaling but am unclear on how to implement it. Any thoughts or suggestions on accommodating these unpredictable loads? Thanks!
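To make the question concrete, here's roughly the kind of ScaledObject I have in mind (a minimal sketch, not our actual config; the deployment name, Prometheus address, thresholds, and the cron batch window are all placeholder assumptions):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: streaming-api-scaler          # placeholder name
spec:
  scaleTargetRef:
    name: streaming-api               # placeholder Deployment
  pollingInterval: 5                  # shorter than the 30s default, to react faster
  cooldownPeriod: 120                 # avoid flapping right after a burst ends
  minReplicaCount: 10                 # warm baseline instead of scaling near zero
  maxReplicaCount: 200
  triggers:
    # Reactive: scale on the request rate Istio reports to Prometheus
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder address
        query: sum(rate(istio_requests_total{destination_workload="streaming-api"}[1m]))
        threshold: "1000"             # placeholder requests/sec target per replica
    # Proactive: pre-scale ahead of the client's batch window, if it's known
    - type: cron
      metadata:
        timezone: UTC
        start: 0 8 * * *              # placeholder window start
        end: 0 10 * * *               # placeholder window end
        desiredReplicas: "100"
```

The cron trigger is the part I'm least sure about, since the client's batch windows aren't entirely predictable.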
1 Answer
Have you considered implementing rate limits for your API? Instead of trying to scale instantly for every spike, you can cap the incoming request rate, which buys your backend time to scale up gradually. As you learn the client's load pattern, raise the limits to match demand.
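As a rough sketch, Istio can enforce this at the sidecar with Envoy's local rate limit filter. The token bucket below is applied per pod across all incoming connections, so a client can't sidestep it by opening more of them (the workload label and the numbers are placeholders to tune):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: streaming-api-local-ratelimit   # placeholder name
spec:
  workloadSelector:
    labels:
      app: streaming-api                 # placeholder workload label
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 5000        # burst allowance per pod
                tokens_per_fill: 5000   # refill back to 5000 tokens...
                fill_interval: 60s      # ...every minute, i.e. 5000 requests/min per pod
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
```

Requests over the budget are rejected at the sidecar with a 429, so the excess never reaches your backend while the autoscaler catches up.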
We do have a rate limit of 2,000 requests per connection, but the issue is that the client opens more than 50 connections at once. And since this is our first enterprise client, management is hesitant to ask them to change anything.