How can I efficiently set up a Lambda function to scrape a rate-limited API?

Asked By TechieMaster88 On

I'm working on a CDK stack in which a Lambda function scrapes an API with a strict rate limit of 1,000 calls per hour. I need to make around 41,000 calls in total, one for each zip code in the US, and store the results in a DynamoDB table. I also have a tracking table to manage status and record any errors caused by rate limiting or other failures.

Previously, I ran a script that took nearly 100 hours to complete, mostly due to API issues, saving progress to a CSV file along the way. Currently, I have an EventBridge rule that triggers the initial scraping run, but I'm unsure how to continue it recursively without breaking the rate limit. I don't want to use setTimeout inside the Lambda because paying for idle wait time gets expensive, and a very slow ingestion rate would also be costly. I'm considering Step Functions but I'm not well-versed in them yet. I've also considered changing the initial trigger to scrape around 100+ zip codes first and only performing a full scan if I detect new entries. What methods or technologies might help here? For reference, my current setup looks roughly like the sketch below.
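This is a simplified version of the stack, not my exact code; table names, handler paths, and the schedule are placeholders:

```typescript
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

export class ScraperStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Results table keyed by zip code
    const resultsTable = new dynamodb.Table(this, 'ResultsTable', {
      partitionKey: { name: 'zipCode', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
    });

    // Tracking table for scrape status and errors
    const trackingTable = new dynamodb.Table(this, 'TrackingTable', {
      partitionKey: { name: 'zipCode', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
    });

    // Lambda that scrapes a batch of zip codes per invocation
    const scraperFn = new lambda.Function(this, 'ScraperFn', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/scraper'),
      timeout: Duration.minutes(5),
      environment: {
        RESULTS_TABLE: resultsTable.tableName,
        TRACKING_TABLE: trackingTable.tableName,
      },
    });
    resultsTable.grantWriteData(scraperFn);
    trackingTable.grantReadWriteData(scraperFn);

    // EventBridge rule that kicks off the initial scrape (schedule is a placeholder)
    new events.Rule(this, 'ScrapeSchedule', {
      schedule: events.Schedule.rate(Duration.days(1)),
      targets: [new targets.LambdaFunction(scraperFn)],
    });
  }
}
```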

1 Answer

Answered By TechieTribe99 On

Step Functions are perfect for this! They are built for long-running workflows like yours without resorting to setTimeout. You can set up a state machine that processes a batch of zip codes, records progress in your tracking table, and then loops to the next batch, with Wait states between batches to stay under the API's rate limit without paying for an idle Lambda. The built-in retry logic and execution visibility will also streamline error handling. Alternatively, you could use SQS with Lambda, where each batch of calls is represented as a message.
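Here's a minimal CDK sketch of that loop. It assumes your batch Lambda returns a `done` flag in its payload; the function names and the 6-minute wait are placeholders you'd tune so that your batch size times batches-per-hour stays under 1,000 calls:

```typescript
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

export class ScrapeWorkflowStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Lambda that scrapes one batch of zip codes and returns
    // { done: boolean, nextCursor: string } as its result
    const scrapeBatchFn = new lambda.Function(this, 'ScrapeBatchFn', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/scrape-batch'),
      timeout: Duration.minutes(5),
    });

    // Invoke the batch scraper and keep only its payload in the state
    const scrapeBatch = new tasks.LambdaInvoke(this, 'ScrapeBatch', {
      lambdaFunction: scrapeBatchFn,
      outputPath: '$.Payload',
      // Retry on throttling / transient service errors
      retryOnServiceExceptions: true,
    });

    // Wait between batches to stay under 1,000 calls/hour,
    // e.g. 100 calls per batch -> at most 10 batches/hour -> ~6 minutes apart
    const throttleWait = new sfn.Wait(this, 'ThrottleWait', {
      time: sfn.WaitTime.duration(Duration.minutes(6)),
    });

    const allDone = new sfn.Succeed(this, 'AllZipCodesScraped');

    // Loop: scrape a batch, check the done flag, wait, repeat
    const definition = scrapeBatch.next(
      new sfn.Choice(this, 'MoreZipCodes?')
        .when(sfn.Condition.booleanEquals('$.done', true), allDone)
        .otherwise(throttleWait.next(scrapeBatch)),
    );

    new sfn.StateMachine(this, 'ScrapeStateMachine', {
      definitionBody: sfn.DefinitionBody.fromChainable(definition),
      // Standard workflows run for up to a year and bill per state transition,
      // so the Wait states cost essentially nothing
      stateMachineType: sfn.StateMachineType.STANDARD,
    });
  }
}
```

With roughly 100 zip codes per batch you're looking at a few hundred loop iterations, which fits comfortably within a single Standard workflow execution.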

CodeWizard88 -

Absolutely, using SQS with Lambda will help manage your requests effectively as well. It keeps everything clean.
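Something like this, assuming each SQS message carries one batch of zip codes (queue and function names are placeholders). Note that you'd still enforce the 1,000-calls-per-hour pacing in the worker itself or by staggering message delays, since SQS caps per-message DelaySeconds at 15 minutes:

```typescript
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

export class SqsScrapeStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Dead-letter queue for batches that keep failing (e.g. hard API errors)
    const dlq = new sqs.Queue(this, 'ScrapeDlq');

    // Each message carries one batch of zip codes to scrape
    const scrapeQueue = new sqs.Queue(this, 'ScrapeQueue', {
      visibilityTimeout: Duration.minutes(15),
      deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
    });

    const workerFn = new lambda.Function(this, 'ScrapeWorkerFn', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/scrape-worker'),
      timeout: Duration.minutes(5),
    });

    // Poll one message at a time and cap concurrency so parallel workers
    // can't blow past the API limit; failed batches return to the queue
    // and eventually land in the DLQ for inspection
    workerFn.addEventSource(new SqsEventSource(scrapeQueue, {
      batchSize: 1,
      maxConcurrency: 2, // 2 is the minimum the SQS event source allows
    }));
  }
}
```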
