I'm working on a document processing system that triggers an Azure Function when scanned PDFs are uploaded to Azure Blob Storage. This Function then calls an LLM (Azure AI Foundry) to extract structured data and saves the results in Cosmos DB.
Here's the catch: on Day 1 the client plans to bulk-upload over 40,000 PDFs, which means roughly 40,000 blob triggers firing within a short window and a real risk of exhausting the LLM endpoint's rate limits. Post-launch the load drops to only 10-50 PDFs per day, so this is essentially a one-time spike.
I have:
- Azure Blob Storage
- Azure Functions
- Azure AI Foundry
- Cosmos DB
I can't introduce Service Bus at this stage; the architecture is finalized and changes would have to go back through approval. Instead, I'm implementing Azure Storage Queues to decouple ingestion from processing: the blob trigger enqueues the blob path, and a separate queue-triggered function processes messages with controlled concurrency via the `batchSize` configuration.
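For reference, the queue-trigger concurrency knobs live in `host.json`. A minimal sketch (the numbers are illustrative, not recommendations; tune them against your Foundry quota):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 4,
      "newBatchThreshold": 0,
      "maxDequeueCount": 5,
      "visibilityTimeout": "00:02:00"
    }
  }
}
```

Keep in mind that `batchSize` is per instance: on a Consumption plan the runtime can still scale out to many instances, so the effective concurrency is `batchSize × instance count`. Cap the fan-out as well (e.g. via the `WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT` app setting or `functionAppScaleLimit`).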
I have a few questions:
1. Will using Storage Queues with a controlled `batchSize` be enough to protect the LLM endpoint from being overwhelmed?
2. Anyone experienced something similar on launch day? What concurrency levels worked for you?
3. Are there any pitfalls with using a poison queue for handling extraction failures?
4. If Storage Queues can't handle the load, what's the least complex way to justify switching to Service Bus without making it seem like a huge mistake?
I'd love to hear tips from anyone who has navigated a similar scaling issue!
1 Answer
You definitely need to throttle this. Azure AI Foundry deployments enforce requests-per-minute and tokens-per-minute quotas, and once you exceed them the endpoint starts returning 429 errors, so consider provisioning multiple Foundry resources and load-balancing the calls across them. Relying on the native blob trigger alone won't give you the concurrency control you need, even if it feels like the simplest path right now.
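To make that concrete, here is a minimal Python sketch of the two pieces this implies: round-robin rotation across multiple Foundry deployments, plus exponential backoff with jitter for retrying 429 responses. The endpoint URLs are placeholders, and the actual HTTP call to the deployment is omitted:

```python
import itertools
import random

# Hypothetical deployment pool -- substitute your real Foundry endpoints.
ENDPOINTS = [
    "https://foundry-eastus.example.com/openai",
    "https://foundry-westus.example.com/openai",
]
_pool = itertools.cycle(ENDPOINTS)

def next_endpoint() -> str:
    """Rotate across deployments so no single resource absorbs all the quota."""
    return next(_pool)

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with jitter, for retrying after a 429/503."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```

Each queue-triggered invocation would call `next_endpoint()` before the LLM request and sleep for `backoff_delay(attempt)` between retries; once a message has been dequeued `maxDequeueCount` times without success, the Storage Queue trigger moves it to the poison queue automatically.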

What would you suggest as a way to manage that? Should I ask Azure support for a quota increase on the LLM deployment, or push for Service Bus after all?