Clarifying AWS Lambda Scaling: What’s the Deal with Provisioned and On-Demand Concurrency?

May 2, 2025

Asked By CuriousCoder12 On May 2, 2025

Hey everyone! I'm diving deep into the scaling behavior of AWS Lambda, focusing on the interplay between provisioned concurrency and on-demand concurrency. I find some aspects in the AWS documentation confusing, especially regarding the requests per second (RPS) limit and how reserved concurrency fits into all this.

According to the AWS docs, for functions whose requests take less than 100ms, Lambda imposes a requests-per-second limit equal to 10 times the account concurrency, which can cap throughput well before concurrency runs out. So, if your concurrency limit is 1,000, you're looking at a max of 10,000 RPS. This applies to all synchronous functions and those with provisioned concurrency.

But here's where I get lost: when it comes to functions with reserved concurrency, do they still follow the account-wide limit?

Also, I found conflicting statements in the docs about spillover behavior. For instance, they say a function with provisioned concurrency spills over into on-demand once it reaches a concurrency of 10 or 100 RPS, but that doesn't explain how function duration factors in. What if it's a super-fast 10ms function?

I'm really looking for insights into how these limits work practically and any experiences you have had regarding these scaling behaviors. If anyone from AWS happens to catch this, some clarification on the documentation would be really appreciated! Thanks so much!

1 Answer

Answered By TechWhiz77 On May 3, 2025

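Great question! Basically, when the duration is under 100ms, you run into the TPS limit before you reach your concurrency cap. At that speed each concurrent execution can complete more than 10 requests per second, so the limit of 10 times concurrency becomes the binding constraint rather than concurrency itself. For example, with a 10ms duration, a single execution environment can serve roughly 100 requests per second, so you would hit the TPS limit while using only about a tenth of your concurrency. That's critical to understand when managing loads.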
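To make that arithmetic concrete, here is a rough Python sketch of which limit binds first. It assumes steady traffic, the 10x multiplier and the 1,000 concurrency figure quoted in the question, and a hypothetical helper name (`binding_limit`); it is back-of-the-envelope math, not anything Lambda exposes.

```python
def binding_limit(duration_ms, account_concurrency=1000, rps_multiplier=10):
    """Return which limit is hit first and the resulting max RPS.

    Assumptions (from the question, not verified against AWS):
    the RPS cap is rps_multiplier * account_concurrency, and traffic
    is steady so concurrency = RPS * duration.
    """
    rps_cap = rps_multiplier * account_concurrency
    # RPS that concurrency alone would allow: each slot finishes
    # 1000 / duration_ms requests per second.
    rps_from_concurrency = account_concurrency * 1000 / duration_ms
    if rps_from_concurrency > rps_cap:
        return "rps", rps_cap
    return "concurrency", rps_from_concurrency


for ms in (10, 100, 200):
    print(ms, binding_limit(ms))
```

With these numbers, anything faster than 100ms is limited by the RPS cap, and anything slower is limited by concurrency.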

CodeMasterX - May 5, 2025

That's an interesting take! I think the docs imply that the spillover at 100 TPS is based on the assumption of a 100ms duration, which can cause confusion. So, if the duration is shorter, like 10ms, it sounds like you’re arguing that spillover could actually happen at 1,000 TPS, right?
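A quick way to compare the two readings is to compute the spillover point both ways. This is only a sketch: `spillover_rps` is a made-up helper, and applying the 10x RPS multiplier to the provisioned concurrency value is my assumption based on the "10 concurrency or 100 RPS" example, not confirmed behavior.

```python
def spillover_rps(provisioned_concurrency, duration_ms, rps_multiplier=10):
    """Estimate the RPS at which provisioned concurrency is exhausted.

    Returns both readings discussed in this thread:
    - "concurrency_only": spillover happens purely when all
      provisioned environments are busy (RPS = concurrency / duration).
    - "with_rps_cap": additionally assumes an RPS cap of
      rps_multiplier * provisioned_concurrency (an assumption).
    """
    concurrency_bound = provisioned_concurrency * 1000 / duration_ms
    rps_bound = rps_multiplier * provisioned_concurrency
    return {
        "concurrency_only": concurrency_bound,
        "with_rps_cap": min(concurrency_bound, rps_bound),
    }


print(spillover_rps(10, 100))  # both readings agree at 100ms
print(spillover_rps(10, 10))   # the readings diverge at 10ms
```

At 100ms the two readings give the same number, which may be why the docs example doesn't distinguish them. At 10ms the concurrency-only reading gives 1,000 TPS while the capped reading stays at 100.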
