I'm developing a document extraction pipeline on AWS for a client where we upload PDFs to S3. The upload triggers a series of Lambda functions that concatenate PDFs, extract text with Textract and a Bedrock VLM, redact PII with Comprehend, and finally extract structured data with Gemini running on Fargate. It works well with about 10 documents, but we now need it to handle bulk uploads of 500+ documents. I'm looking for advice on what to consider when scaling, particularly API rate limits, Lambda concurrency, and whether spinning up Fargate for each file is efficient at that scale.
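For context, the text-extraction trigger is roughly shaped like the sketch below (bucket and function names are placeholders, and error handling is stripped out):

```python
# Simplified sketch of one stage: S3 upload event -> async Textract text-detection job.
import boto3

textract = boto3.client("textract")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Kick off an asynchronous text-detection job for the uploaded PDF.
        response = textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        print(f"Started Textract job {response['JobId']} for s3://{bucket}/{key}")
```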
5 Answers
This setup sounds quite similar to a project I came across! You might want to check this sample solution on GitHub for insights: https://github.com/aws-samples/aws-ai-intelligent-document-processing/tree/main/guidance/prompt-flow-orchestration. It could spark some ideas for scaling.
You shouldn't hit Lambda concurrency limits with 500 documents; the default quota is 1,000 concurrent executions per region, and API Gateway limits are similarly generous. The services more likely to bite are Textract, Bedrock, and Comprehend, which each have per-account throughput quotas, so look those up for your region. If you're worried about downstream bottlenecks, consider putting an SQS queue in front of the heavier stages so you control the rate at which documents flow through, along the lines of the sketch below.
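A minimal sketch of that buffering idea, assuming a thin Lambda that only enqueues each uploaded document and leaves the heavy API calls to a rate-limited consumer (the queue URL env var and message fields are made up for illustration):

```python
# Sketch: instead of fanning every S3 upload straight into the heavy Lambdas,
# enqueue a small message per document and let consumers pull at their own pace.
import json
import os
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["DOCUMENT_QUEUE_URL"]  # hypothetical env var

def handler(event, context):
    for record in event["Records"]:
        message = {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))
```

Reserved concurrency on the consumer Lambda (or the SQS event source's maximum-concurrency setting) then becomes the knob that keeps you under the Textract/Bedrock quotas.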
Have you thought about writing your Lambda functions in Rust? It could give you noticeably faster processing, though make sure your team is comfortable with Rust first.
I'd just add a queue so documents are processed in controlled batches rather than all at once; it's a straightforward way to absorb a bulk upload of 500+ files. A rough consumer-side sketch follows.
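Something along these lines, assuming SQS triggers the Lambda with batches and ReportBatchItemFailures is enabled on the event source mapping (the Textract call here stands in for whichever downstream service is your bottleneck):

```python
# Sketch of an SQS-triggered consumer processing documents in batches.
# Messages that hit throttling are reported back as failures so SQS redelivers them later.
import json
import boto3
from botocore.exceptions import ClientError

textract = boto3.client("textract")

def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            textract.start_document_text_detection(
                DocumentLocation={
                    "S3Object": {"Bucket": body["bucket"], "Name": body["key"]}
                }
            )
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code in ("ThrottlingException", "ProvisionedThroughputExceededException"):
                # Let SQS retry this message after the visibility timeout expires.
                failures.append({"itemIdentifier": record["messageId"]})
            else:
                raise
    return {"batchItemFailures": failures}
```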
How urgent are these jobs? If they can be queued and processed in batches, that would help a lot. Also, consider how much memory a single job might need. Lastly, think about whether your service really needs to scale down to zero, or if having a baseline compute capacity makes sense for you.

Not sure that's necessary unless your team is already experienced with Rust. It might add complexity for those unfamiliar with it.