I'm dealing with a frustrating issue where my AWS Lambda function gets invoked twice whenever I upload files to an S3 bucket. Here's my setup: I've configured an S3 bucket to send event notifications to an SQS queue, which is then used as a trigger for the Lambda function. I have set the SQS batch size to 10k messages, with a batch window of 300 seconds. For example, if I upload 15 files to S3, I end up with two Lambda invocations: one processes 11 messages and the other processes 4. My expectation was to have a single Lambda invocation that handles all 15 messages at once.
I have a few questions:
1. Why is the Lambda function invoked twice even though the batch size and window would allow all messages to be processed together?
2. Is this normal behavior caused by Lambda's or SQS's scaling and polling mechanisms?
3. How can I tweak the Lambda or SQS settings so that there is only one invocation per batch, ideally with concurrency limited to 1?
4 Answers
SQS is a distributed system with at-least-once delivery, so a message can occasionally be delivered more than once, and on a standard queue the order is not guaranteed either. It's crucial to make your Lambda function idempotent. Also, think about whether a batch size of 10k is too high: if that many messages do arrive in one batch, can your function finish before it times out? If you reduce the batch size to 1, you get one invocation per message, but that also means more concurrent runs.
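To illustrate the idempotency point, here is a minimal sketch of a handler that skips S3 events it has already seen. The `_seen` set and `process_object` are hypothetical stand-ins, not part of the original setup; the in-memory set is only there to keep the example runnable, and a real function would need a durable store because execution environments do not share memory.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical in-memory record of handled events (assumption for this sketch).
# It only survives within one warm execution environment.
_seen = set()


def process_object(bucket, key):
    """Placeholder for the real per-file work (hypothetical)."""
    return f"s3://{bucket}/{key}"


def handler(event, context):
    processed = []
    for message in event.get("Records", []):
        body = json.loads(message["body"])
        # S3 test notifications (s3:TestEvent) carry no "Records" key, so
        # .get() lets them pass through without doing any work.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            obj = record["s3"]["object"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(obj["key"])
            dedup_key = (bucket, key, obj.get("sequencer"))
            if dedup_key in _seen:
                continue  # duplicate delivery, already handled
            _seen.add(dedup_key)
            processed.append(process_object(bucket, key))
    return processed
```

With a check like this, it no longer matters whether a duplicate shows up in the same batch or in a later invocation.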
Lambda polls the queue with multiple pollers, and incoming messages get distributed among them. This is why you're seeing your messages split across different invocations. You can limit the maximum concurrency on the event source mapping, but it's unlikely you'll ever get all messages into a single batch, because Lambda decides on its own how many pollers to run and when each one hands off its batch.
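As a rough sketch of the concurrency setting mentioned here, the helper below builds the arguments for boto3's `update_event_source_mapping`. The helper name and the UUID are made up for illustration, and the limits it checks (batch size up to 10,000 for a standard queue, window up to 300 seconds, maximum concurrency between 2 and 1000) are my understanding of the documented ranges, so verify them before relying on this.

```python
def build_mapping_update(mapping_uuid, batch_size=10000,
                         window_seconds=300, max_concurrency=2):
    """Build kwargs for lambda.update_event_source_mapping (hypothetical helper)."""
    if not 1 <= batch_size <= 10000:
        raise ValueError("batch size must be between 1 and 10000")
    if not 0 <= window_seconds <= 300:
        raise ValueError("batching window must be between 0 and 300 seconds")
    # Assumption: the event source mapping does not accept a value below 2.
    if not 2 <= max_concurrency <= 1000:
        raise ValueError("maximum concurrency must be between 2 and 1000")
    return {
        "UUID": mapping_uuid,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_seconds,
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }


# Usage (not executed here, requires AWS credentials and boto3):
#   import boto3
#   kwargs = build_mapping_update("your-mapping-uuid")
#   boto3.client("lambda").update_event_source_mapping(**kwargs)
```

Note that if the lower bound of 2 is right, this setting alone cannot give you a concurrency of exactly 1. Reserved concurrency on the function itself (`put_function_concurrency` with `ReservedConcurrentExecutions=1`) is the other knob, though it may cause throttled invocations, so test it with your workload first.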
Having such a large batch size (10k) is pretty unusual. What's the reasoning behind it? If you're building a logging pipeline for a large number of incoming JSON files, just make sure your function can handle the load without being overwhelmed.
Keep in mind that the batch size is a maximum, not a guaranteed count. If you want more control over when messages are processed, you could consider setting a delivery delay on the SQS queue. But first make sure you understand why getting all of these messages into one batch matters for your use case.
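Because a batch can hold anything from one message up to the configured maximum, it helps to write the handler so that it does not care how many messages it receives. The sketch below treats each message independently and reports only the failed ones back to Lambda. It assumes `ReportBatchItemFailures` is enabled in the event source mapping's function response types; `process_message` is a hypothetical placeholder.

```python
import json


def process_message(body):
    """Hypothetical per-message work; raises if the body is not valid JSON."""
    payload = json.loads(body)
    return len(payload.get("Records", []))


def handler(event, context):
    records = event.get("Records", [])
    # The count varies from invocation to invocation; log it, don't rely on it.
    print(f"received {len(records)} message(s) in this batch")
    failures = []
    for message in records:
        try:
            process_message(message["body"])
        except Exception:
            # Only this message is returned to the queue for a retry.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, a single failing message would send the whole batch back to the queue, which gets expensive as batches grow.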