I've been exploring Azure OpenAI's rate limiting, and it seems there's a big gap between what's documented and what's actually happening. After running some tests and tracking my usage closely, I noticed that the token-per-minute (TPM) limits don't match up with what Azure claims. Here's what I've got:
### My Setup
- **Token Management**: I have a system set to replenish 15,000 tokens every 250ms, which works out to 3.6M TPM (15,000 × 4 refills per second × 60 seconds).
- **API Calls**: I reserve 11,000 tokens for each call, but I usually only use about 9,000.
- **Buffer**: I keep a buffer of 1,500 tokens to avoid going over the limit.
- **Processing Method**: I'm processing documents one after the other while controlling token usage.
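A minimal sketch of a client-side limiter along these lines, for anyone who wants to reproduce the setup. The class name `TokenBucket`, its method names, and the bucket capacity of 15,000 (one refill) are assumptions for illustration; the post only gives the refill size, the interval, the reservation, and the buffer.

```python
import time


class TokenBucket:
    """Client-side token budget refilled in fixed steps (hypothetical helper)."""

    def __init__(self, refill_tokens=15_000, refill_interval_s=0.25,
                 capacity=15_000, buffer_tokens=1_500, clock=time.monotonic):
        self.refill_tokens = refill_tokens          # tokens added per refill step
        self.refill_interval_s = refill_interval_s  # 250 ms between refills
        self.capacity = capacity                    # ASSUMPTION: cap at one refill
        self.buffer_tokens = buffer_tokens          # never dip below this reserve
        self.clock = clock                          # injectable for testing
        self.available = capacity
        self.last_refill = clock()

    def _refill(self):
        # Credit one refill per whole interval elapsed since the last refill.
        elapsed = self.clock() - self.last_refill
        steps = int(elapsed // self.refill_interval_s)
        if steps > 0:
            self.available = min(self.capacity,
                                 self.available + steps * self.refill_tokens)
            self.last_refill += steps * self.refill_interval_s

    def try_reserve(self, tokens=11_000):
        """Reserve tokens only if the safety buffer stays intact afterwards."""
        self._refill()
        if self.available - tokens >= self.buffer_tokens:
            self.available -= tokens
            return True
        return False

    def release_unused(self, reserved, used):
        """Return the part of a reservation that the call did not consume."""
        refund = max(0, reserved - used)
        self.available = min(self.capacity, self.available + refund)
```

With these assumed defaults the bucket admits at most one 11,000-token reservation per 250 ms refill, so the limiter itself caps throughput at about 4 calls per second regardless of what the service allows.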
### What I Expected
According to Azure's docs, I should be able to manage:
- 4M tokens per minute
- About 4 requests per second
- A stable processing rate that fits within their service limits
I'm on the S0 tier; I'm wondering if the actual quota depends on the specific deployment.
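Before comparing against the documented figure, it may help to work out what the client-side numbers above can deliver on their own. The sketch below is plain arithmetic on the values from the setup. The function names are made up for illustration, `usable_tpm` assumes the unused part of a reservation is never returned to the budget, and the 1.0 second per call in the last line is a purely hypothetical latency, not a measurement.

```python
def refill_tpm(tokens_per_refill: int, refill_interval_ms: int) -> float:
    """Tokens per minute released by a fixed-step refill schedule."""
    return tokens_per_refill * 60_000 / refill_interval_ms


def usable_tpm(refill_rate_tpm: float, reserved_per_call: int,
               used_per_call: int) -> float:
    """Tokens consumed per minute if every call holds its full reservation."""
    calls_per_minute = refill_rate_tpm / reserved_per_call
    return calls_per_minute * used_per_call


def sequential_tpm(used_per_call: int, seconds_per_call: float) -> float:
    """Tokens per minute when calls run strictly one after another."""
    return used_per_call * 60 / seconds_per_call


configured = refill_tpm(15_000, 250)               # refill schedule from the setup
consumed = usable_tpm(configured, 11_000, 9_000)   # reserve 11,000, use 9,000
one_at_a_time = sequential_tpm(9_000, 1.0)         # HYPOTHETICAL 1.0 s per call

print(f"configured refill rate: {configured:,.0f} TPM")
print(f"consumed if reservations are not refunded: {consumed:,.0f} TPM")
print(f"sequential at 1.0 s per call: {one_at_a_time:,.0f} TPM")
```

The point of the last function is that strictly sequential processing is bounded by per-call latency as well as by quota, so it is worth ruling that out before attributing a low observed TPM to the service.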
### Issues I've Found
I've discovered a few troubling points:
1. The actual TPM seems much lower, maybe less than 20% of the documented figure.
2. There seem to be extra limitations that aren't linked to how many tokens I'm using.
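To put a firm number on the first point, a rolling one-minute counter is one way to measure observed TPM. This is a minimal sketch, not the tracking code used for the tests above; `TpmMeter` and its methods are hypothetical names, and it assumes each call's token count is taken from the `usage.total_tokens` field of the API response.

```python
import time
from collections import deque


class TpmMeter:
    """Rolling count of tokens consumed in the last window (hypothetical helper)."""

    def __init__(self, window_s=60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock        # injectable so the meter can be tested
        self.events = deque()     # (timestamp, tokens) pairs, oldest first
        self.total = 0

    def record(self, tokens):
        """Log the tokens one completed call consumed."""
        self.events.append((self.clock(), tokens))
        self.total += tokens

    def tokens_in_window(self):
        """Drop events older than the window and return the remaining total."""
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] <= cutoff:
            _, tokens = self.events.popleft()
            self.total -= tokens
        return self.total
```

Calling `record(...)` after every successful response and logging `tokens_in_window()` whenever a 429 comes back gives the observed TPM at the moment throttling starts, which can then be compared directly with the documented limit.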
### Seeking Help
I'm sharing this to:
1. Help others who might be facing the same issues.
2. Ask for clarity from Azure about their rate limiting.
3. Suggest they update their docs to reflect what's really happening.
My setup seems fine, so I think the problem lies with Azure's rate limiting rather than my code. Has anyone else noticed this mismatch? I'd love to hear from other devs or get official insight from Microsoft!
### 3 Answers
What model are you using for this deployment? It can affect how the rate limits are applied. Just curious if the location of the model impacts your experience too.
I’ve seen discrepancies like this before! My guess is that additional factors, like the model location and tier, could be playing a role in your case.
Yeah, I think the rate limits can vary based on regions and specific resources. Have you checked if any settings in your subscription might be affecting this? Sometimes there are underlying constraints that aren’t documented well.