How to Handle Huge Concurrent Azure Function Triggers Without Overloading My LLM?

Asked By TechieNinja42 On

I'm working on a document processing system where scanned PDFs are uploaded to Azure Blob Storage. Each upload triggers an Azure Function that calls an LLM (Azure AI Foundry) to extract structured data and stores the results in Cosmos DB. The issue is that on Day 1, the client plans to send over 40,000 PDFs simultaneously, which means a huge spike in triggers and LLM calls. After the initial load, we expect only about 10-50 PDFs daily, so this is a one-time challenge.

I have access to Azure Blob Storage, Azure Functions, Azure AI Foundry, and Cosmos DB. While Service Bus would be an ideal solution for managing the load, integrating it now would require a lot of rework and approvals, which isn't feasible. So, I'm thinking of using Azure Storage Queues instead for decoupling the ingestion from processing. The blob trigger would enqueue the blob path, and a separate queue-triggered function would process these with controlled concurrency via a defined `batchSize`. Cosmos DB will help track statuses for retries on failures.
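For reference, the `batchSize` knob described above lives in `host.json`. A minimal sketch of the relevant queue-trigger settings (property names per the Azure Functions queues extension; the values here are hypothetical and should be tuned to your LLM quota):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 4,
      "newBatchThreshold": 2,
      "maxDequeueCount": 5,
      "visibilityTimeout": "00:02:00"
    }
  }
}
```

Note that per-instance concurrency is effectively `batchSize + newBatchThreshold`, and on a Consumption plan the runtime will also scale out instances, so you may additionally want to cap scale-out (e.g. via `functionAppScaleLimit`) to bound total concurrent LLM calls.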

My questions are:
1. Will using Storage Queues with controlled `batchSize` be sufficient to protect the LLM endpoint, or am I overlooking something?
2. Has anyone dealt with a similar Day 1 backlog? How did you manage concurrency?
3. Are there any pitfalls with the poison queue approach for failed extractions?
4. If Storage Queues turn out not to be enough and I have to fall back to Service Bus, how can I justify that without it looking like a major oversight?

I'm hoping to hear from anyone who has managed a similar pipeline at scale!

5 Answers

Answered By AsyncExplorer On

It sounds like you're dealing with a long-running asynchronous process — the model calls might take 10-15 seconds each. Instead of relying on blob triggers alone, I'd lean toward a queue trigger. Also consider Durable Functions for better orchestration: you can define how many concurrent operations run at once, reducing the risk of being rate-limited on your LLM calls.

Answered By QueueMaster5 On

Buffering function invocations is critical here. Storage queues provide a solid retry mechanism if something fails. You can absolutely make this work without using Service Bus—go with storage queues and carefully manage your retries and failures.
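To make the retry/poison behavior concrete: the Functions runtime re-delivers a queue message (incrementing its dequeue count) until `maxDequeueCount` is exceeded, then moves it to `<queue>-poison`. A sketch of the bookkeeping you'd pair with that, with a plain dict standing in for the Cosmos DB status container and all names hypothetical:

```python
MAX_DEQUEUE_COUNT = 5  # mirrors host.json maxDequeueCount (hypothetical value)

def handle_message(message: dict, statuses: dict) -> str:
    """Process one queue message, recording its status for later reporting."""
    blob_path = message["blob_path"]
    if message["dequeue_count"] > MAX_DEQUEUE_COUNT:
        # The runtime moves such messages to the poison queue; record the
        # terminal state so the Day-1 backlog report stays accurate.
        statuses[blob_path] = "poisoned"
        return "poisoned"
    try:
        extract(blob_path)  # the real LLM extraction call goes here
        statuses[blob_path] = "done"
        return "done"
    except Exception:
        # Re-raising leaves the message on the queue; it becomes visible
        # again after the visibility timeout with dequeue_count + 1.
        statuses[blob_path] = "retrying"
        raise

def extract(blob_path: str) -> None:
    # Stand-in that fails for one known-bad document.
    if "bad" in blob_path:
        raise ValueError(f"unreadable scan: {blob_path}")
```

One pitfall worth tracking explicitly: a 429 from the LLM and a genuinely unreadable PDF both land in the poison queue after `maxDequeueCount` attempts unless you distinguish them, so transient throttling errors deserve a longer visibility timeout rather than burning retry attempts.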

Answered By ThrottleExpert On

You can request an increase in your tokens-per-minute (TPM) quota. I'd suggest placing your messages on a storage queue and standing up multiple model deployments on the AI Foundry side. That lets your functions scale with the load, while the queue gives you reprocessing options for any failed messages.

Answered By CloudGuru99 On

To handle this kind of load, you definitely need some throttling. The biggest bottleneck will be the LLM, since it can hit quota limits easily. Consider provisioning multiple Azure AI Foundry resources and load-balancing calls across them so you can absorb the traffic. The native blob trigger might seem simple, but on its own it won't give you the flexibility to control scaling.

Answered By TokenTitan On

Watch out that a single storage queue targets roughly 2,000 messages per second (around 20,000 per second across the whole storage account) — plenty for this scenario, but worth knowing. If this batch is just historical data, what's your normal call rate like? And are you using PTUs (provisioned throughput) on the Foundry side or sticking to standard TPM?
