Hey everyone! Scaling APIs usually seems like a simple equation: multiply the request rate by the response time to estimate how busy each instance is. However, I'm dealing with an asynchronous API where the response time doesn't reflect the time the API is actually doing work. For example, in a case like this:
```
async def my_route():
    do_something_sync()         # ~100 ms of CPU-bound work
    await do_something_async()  # ~500 ms awaiting I/O
    return
```
Here, the API reports a response time of 600 ms but is actively busy for only about 100 ms. I'm looking for smart ways to scale this. Would custom metrics that disregard await time be a good approach, or do you have other suggestions that don't require changing the app? Thanks!
2 Answers
To scale an asynchronous API effectively, track metrics such as transaction times and the lag of any message queues or event buses the API relies on. On Kubernetes, KEDA can manage the scaling for you: expose these as Prometheus metrics and KEDA will scale your deployment based on those custom metrics, giving you dynamic scaling that matches your actual workload.
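The idea of a custom metric that disregards await time can be sketched with the standard library alone: `time.process_time()` counts CPU time and ignores time spent suspended on an `await`, so comparing it against wall-clock time separates "busy" from "waiting". The handler and timings below are hypothetical stand-ins for the question's example, not a real framework route; in practice you'd export the busy time as a Prometheus gauge for KEDA to consume.

```python
import asyncio
import time


def do_something_sync(duration: float = 0.1) -> None:
    """CPU-bound stand-in for the question's ~100 ms of sync work."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass  # busy-loop to burn CPU


async def my_route() -> tuple[float, float]:
    """Return (wall_time, busy_time) for one simulated request."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()  # excludes time suspended on await
    do_something_sync()              # ~100 ms of real work
    await asyncio.sleep(0.5)         # stand-in for the awaited 500 ms
    wall_time = time.monotonic() - wall_start
    busy_time = time.process_time() - cpu_start
    return wall_time, busy_time


if __name__ == "__main__":
    wall, busy = asyncio.run(my_route())
    print(f"response time: {wall:.2f}s, busy time: {busy:.2f}s")
```

Averaging a gauge like `busy_time` across instances and scaling on that, rather than on response time, avoids over-provisioning for time the workers spend idle on awaits.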
Why focus so much on response time? For users, the perceived speed is key. Even if your API responds quickly, if it’s bogged down with backend processes afterwards, it could lead to a poor user experience. For instance, if an async API shows a fast '202 Accepted' but fails to process requests due to backend issues, that's problematic. So, response time alone may not be the best metric once you step outside of straightforward synchronous tasks.
Exactly! It's critical to ensure that while the API responds quickly, the subsequent processes are also adequately resourced. Just tracking response time doesn't cut it when you're dealing with more complex logic happening in the background.