Why is GPT-5 Inference So Slow on Azure, and Will It Improve?

Asked By CuriousCat42 On

Hey everyone! At my company, we're running several projects that rely on GPT models in Azure, and we're looking to switch to GPT-5 because our tests suggest it's more accurate. However, I've noticed that the inference times are 3-10 times longer than what we experienced with GPT-4.1, which is causing some issues with integrations due to timeouts. The token limits are also quite low, at just 20k tokens per minute. I was really impressed with GPT-5 when it launched, but right now it just isn't feasible for us. Does anyone have insight on whether this will change soon?

4 Answers

Answered By CloudNinja21 On

Yeah, asking Microsoft is your best bet since they can give you definitive answers on future updates. The performance is a concern, but they might be able to help you with optimizations in the meantime.

Answered By TechGuru99 On

You might want to consider reaching out to Microsoft directly about this. Performance-related questions can be tricky, and they would have the most accurate information on the status of GPT-5.

Answered By AIWhisperer77 On

Also, a quick tip: try dialing the reasoning effort down to minimal in your requests so the model spends little or no time on reasoning tokens. GPT-5 generates those hidden reasoning tokens by default, and they account for a lot of the extra latency, so this can speed things up noticeably!
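To make the tip above concrete, here's a minimal sketch of what such a request body could look like. This is an assumption-laden illustration, not official Azure documentation: the deployment name `gpt-5`, the `reasoning_effort` field, and the token cap are placeholders you'd adapt to your own deployment and SDK.

```python
import json

def build_request(prompt: str) -> dict:
    """Sketch of a chat-completion request body that asks a GPT-5
    deployment to spend as few reasoning tokens as possible."""
    return {
        "model": "gpt-5",               # your Azure deployment name (assumed)
        "reasoning_effort": "minimal",  # fewest reasoning tokens -> lower latency
        "messages": [{"role": "user", "content": prompt}],
        # Cap the output so a single call can't eat the whole
        # 20k tokens-per-minute quota mentioned in the question.
        "max_completion_tokens": 512,
    }

body = build_request("Summarize this ticket in one sentence.")
print(json.dumps(body, indent=2))
```

If your timeouts come from long single responses rather than queueing, lowering `max_completion_tokens` alongside the reasoning effort tends to help more than either change alone.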

Answered By DevTalker88 On

Have you looked into Global Provisioned Throughput (Global PTU)? If you're on Standard deployments, your regional capacity pools may be maxed out, which would explain both the slow responses and the tight tokens-per-minute limits.

