I'm setting up an Agent chatbot for Microsoft Teams at my company, and I'm encountering a rate limit exceeded error when I ask 4 or 5 questions in quick succession. I believe we have a paid plan, but I'm unsure what that entails regarding limits. I see I have a 50k token limit, but I don't think I'm hitting that. Can someone explain what's going on? Have you experienced similar issues?
1 Answer
No matter what plan you're on, free or paid, there's still a requests-per-minute limit tied to your token limit. Generally, it's something like 1 request per 100 tokens, so a 1MM token limit would allow roughly 10,000 requests per minute. Depending on your model, you might be capped lower, for example at 500 requests per minute. In practice, throttling can feel worse than the advertised limits, especially if your setup makes multiple API requests for a single query. If you're just trying things out, consider a smaller model with a higher token quota, and monitor your costs.
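To make the arithmetic concrete, here is a rough Python sketch. It takes the "1 request per 100 tokens" rule of thumb at face value (that ratio is an estimate, not a documented figure, so check your own quota page), and the function names, the `model_cap` parameter, and the `calls_per_query` figure are all hypothetical, made up for illustration.

```python
from typing import Optional


def requests_per_minute(tokens_per_minute: int, tokens_per_request: int = 100) -> int:
    """Derive a request limit from a token limit using an assumed ratio."""
    return tokens_per_minute // tokens_per_request


def user_queries_per_minute(
    tokens_per_minute: int,
    calls_per_query: int,
    tokens_per_request: int = 100,
    model_cap: Optional[int] = None,
) -> int:
    """Estimate how many user questions fit in a minute when each
    question fans out into several API requests (assumed count)."""
    if calls_per_query <= 0:
        raise ValueError("calls_per_query must be positive")
    rpm = requests_per_minute(tokens_per_minute, tokens_per_request)
    if model_cap is not None:
        # A per-model cap, if one applies, overrides the derived figure.
        rpm = min(rpm, model_cap)
    return rpm // calls_per_query


# The 50k token limit from the question, under the assumed ratio:
print(requests_per_minute(50_000))                         # 500
# If each question triggered 5 API requests behind the scenes:
print(user_queries_per_minute(50_000, calls_per_query=5))  # 100
```

The useful part is the last division: the more API requests each chat message fans out into, the fewer questions you can ask before throttling kicks in, so it's worth checking how many calls your bot actually makes per message.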
That's really helpful! I didn't realize the request limit was linked to tokens. Any tips for optimizing API calls to avoid hitting these limits?