Hey everyone! At my company, we're running several projects that rely on GPT models in Azure, and we're looking to switch to GPT-5 because our tests suggest it's more accurate. However, I've noticed that the inference times are 3-10 times longer than what we experienced with GPT-4.1, which is causing some issues with integrations due to timeouts. The token limits are also quite low, at just 20k tokens per minute. I was really impressed with GPT-5 when it launched, but right now it just isn't feasible for us. Does anyone have insight on whether this will change soon?
4 Answers
Yeah, asking Microsoft is your best bet since they can give you definitive answers on future updates. The performance is a concern, but they might be able to help you with optimizations in the meantime.
You might want to consider reaching out to Microsoft directly about this. Performance-related questions can be tricky, and they would have the most accurate information on the status of GPT-5.
Also, a quick tip: try setting the reasoning effort to minimal in your requests so the model spends as few reasoning tokens as possible. It can speed up responses noticeably!
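If you're using the OpenAI Python SDK against Azure, the tip above looks roughly like this. A minimal sketch, assuming a GPT-5-style deployment that accepts the `reasoning_effort` parameter; the deployment name and prompt are placeholders, and the request kwargs are built separately so they're easy to inspect:

```python
# Sketch only: assemble request kwargs for a low-latency call.
# "my-gpt5-deployment" is a placeholder for your Azure deployment name.

def build_chat_request(deployment: str, user_prompt: str,
                       reasoning_effort: str = "minimal") -> dict:
    """Build kwargs for chat.completions.create() with a low
    reasoning-effort setting to reduce time spent on reasoning tokens."""
    return {
        "model": deployment,                   # Azure deployment name
        "reasoning_effort": reasoning_effort,  # "minimal" = least reasoning
        "messages": [
            {"role": "user", "content": user_prompt},
        ],
    }

params = build_chat_request("my-gpt5-deployment", "Summarize this ticket.")
# Then, with an AzureOpenAI client already configured:
# response = client.chat.completions.create(**params)
```

Worth benchmarking against your own workload, since lower reasoning effort can trade some answer quality for speed.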
Have you looked into a Global Provisioned Throughput (PTU) deployment? If you're on standard pay-as-you-go deployments, your regional capacity pools may be saturated, which would explain both the slow responses and the low token-per-minute quota.