I'm working on a document processing system where scanned PDFs are uploaded to Azure Blob Storage. Each upload triggers an Azure Function that calls an LLM (Azure AI Foundry) to extract structured data and stores the results in Cosmos DB. The issue is that on Day 1, the client plans to send over 40,000 PDFs simultaneously, which means a huge spike in triggers and LLM calls. After the initial load, we expect only about 10-50 PDFs daily, so this is a one-time challenge.
I have access to Azure Blob Storage, Azure Functions, Azure AI Foundry, and Cosmos DB. While Service Bus would be an ideal fit for managing the load, integrating it now would require significant rework and approvals, which isn't feasible. So I'm thinking of using Azure Storage Queues instead to decouple ingestion from processing: the blob trigger would enqueue the blob path, and a separate queue-triggered function would process messages with concurrency controlled via a defined `batchSize`. Cosmos DB would track per-file status so failures can be retried.
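To make the flow concrete, here's a minimal sketch of the glue I have in mind. The blob-triggered function only builds a small queue message pointing at the PDF, and a status document goes to Cosmos DB for retry tracking. The helper names and the document shape are my own illustration, not a fixed API:

```python
import json
import time

def build_queue_message(blob_path: str) -> str:
    """Blob trigger stays cheap: it only enqueues a pointer to the PDF."""
    return json.dumps({"blobPath": blob_path, "enqueuedAt": time.time()})

def build_status_doc(blob_path: str) -> dict:
    """Cosmos DB document used to track per-file progress and retries."""
    return {
        "id": blob_path.replace("/", "_"),  # deterministic id -> idempotent upserts
        "blobPath": blob_path,
        "status": "queued",  # queued -> processing -> done | failed
        "attempts": 0,
    }

msg = build_queue_message("incoming/batch1/doc-0001.pdf")
doc = build_status_doc("incoming/batch1/doc-0001.pdf")
```

The deterministic `id` means re-deliveries of the same message upsert the same document instead of creating duplicates.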
My questions are:
1. Will using Storage Queues with controlled `batchSize` be sufficient to protect the LLM endpoint, or am I overlooking something?
2. Has anyone dealt with a similar Day 1 backlog? How did you manage concurrency?
3. Are there any pitfalls with the poison queue approach for failed extractions?
4. If I find Storage Queues aren't enough and need to resort to Service Bus, how can I justify it without it seeming like a major oversight? I'm hoping to hear from anyone who has managed a similar pipeline at scale!
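For reference, the concurrency knobs I'm planning to tune live in the queue extension settings of `host.json` (values below are illustrative, not final):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 8,
      "newBatchThreshold": 4,
      "maxDequeueCount": 5,
      "visibilityTimeout": "00:02:00"
    }
  }
}
```

My understanding is that effective parallelism per function instance is `batchSize + newBatchThreshold`, so the real ceiling also depends on how many instances scale out.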
5 Answers
This is inherently asynchronous work; the model calls might take 10-15 seconds each. Instead of relying on the blob trigger alone, I'd lean toward a queue trigger. Also consider Durable Functions for orchestration: they let you define how many concurrent operations run at once, reducing the risk of being rate-limited on your LLM calls.
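To illustrate the "define how many concurrent operations" point, here is a hedged sketch using a plain `asyncio.Semaphore` to cap in-flight calls; the `call_llm` stub stands in for your real AI Foundry extraction call:

```python
import asyncio

MAX_CONCURRENT = 5  # tune against your TPM quota

async def call_llm(blob_path: str) -> dict:
    """Stand-in for the real extraction call, which may take 10-15 s."""
    await asyncio.sleep(0.01)
    return {"blobPath": blob_path, "status": "done"}

async def process_all(blob_paths: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(path: str) -> dict:
        async with sem:  # at most MAX_CONCURRENT calls in flight
            return await call_llm(path)

    return await asyncio.gather(*(guarded(p) for p in blob_paths))

results = asyncio.run(process_all([f"doc-{i}.pdf" for i in range(20)]))
```

The same bounded fan-out shape is what a Durable Functions orchestrator would give you, with the added benefit of durable checkpoints between batches.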
Buffering function invocations is critical here. Storage queues provide a solid retry mechanism if something fails. You can absolutely make this work without using Service Bus—go with storage queues and carefully manage your retries and failures.
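One concrete pattern for the retry side: inspect the message's dequeue count inside the function and record the outcome in Cosmos DB before the runtime moves a repeatedly failing message to the poison queue. The threshold and status names here are illustrative:

```python
MAX_DEQUEUE = 5  # should mirror maxDequeueCount in host.json

def classify_attempt(dequeue_count: int, succeeded: bool) -> str:
    """Decide what to record in Cosmos DB for this processing attempt."""
    if succeeded:
        return "done"
    if dequeue_count >= MAX_DEQUEUE:
        return "failed"   # message will land in <queue>-poison; flag for manual review
    return "retrying"     # let the visibility timeout re-deliver it
```

Recording `failed` yourself means you can later re-enqueue just those blob paths from Cosmos DB instead of digging through the poison queue.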
You can request an increase in your tokens-per-minute (TPM) quota. I'd also suggest placing your messages on a storage queue and spreading calls across multiple model deployments in AI Foundry. This lets your functions scale with the load while still offering reprocessing options for any failed messages.
To handle this kind of load, you definitely need some throttling. The biggest bottleneck will likely be the LLM, since it can hit quota limits pretty easily. Consider using multiple Azure AI Foundry resources and load-balancing the calls across them so you can absorb all that traffic. The native blob trigger might seem simple, but it won't give you the flexibility required to scale appropriately.
Watch out: a single storage queue tops out at roughly 2,000 messages per second, and a storage account at around 20,000. If this batch is just historical backfill, what does your steady-state call rate look like? And are you using PTUs (provisioned throughput) on the Foundry side, or sticking to standard TPM?