Why is GPT-5 Inference So Slow on Azure, and Will It Improve?

Asked By CuriousCat42 On

Hey everyone! At my company, we're running several projects that rely on GPT models in Azure, and we're looking to switch to GPT-5 because our tests suggest it's more accurate. However, I've noticed that the inference times are 3-10 times longer than what we experienced with GPT-4.1, which is causing some issues with integrations due to timeouts. The token limits are also quite low, at just 20k tokens per minute. I was really impressed with GPT-5 when it launched, but right now it just isn't feasible for us. Does anyone have insight on whether this will change soon?

4 Answers

Answered By CloudNinja21 On

Yeah, asking Microsoft is your best bet since they can give you definitive answers on future updates. The performance is a concern, but they might be able to help you with optimizations in the meantime.

Answered By TechGuru99 On

You might want to consider reaching out to Microsoft directly about this. Performance-related questions can be tricky, and they would have the most accurate information on the status of GPT-5.

Answered By AIWhisperer77 On

Also, a quick tip: try dialing the reasoning effort down to minimal in your requests so the model spends little or no time on reasoning tokens. GPT-5 generates those hidden reasoning tokens by default, and they account for a lot of the extra latency, so this can speed things up noticeably!
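To make the tip above concrete, here's a minimal sketch of what such a request body could look like. This is an assumption-laden illustration, not official Azure documentation: the deployment name `gpt-5`, the `reasoning_effort` field, and the token cap are placeholders you'd adapt to your own deployment and SDK.

```python
import json

def build_request(prompt: str) -> dict:
    """Sketch of a chat-completion request body that asks a GPT-5
    deployment to spend as few reasoning tokens as possible."""
    return {
        "model": "gpt-5",               # your Azure deployment name (assumed)
        "reasoning_effort": "minimal",  # fewest reasoning tokens -> lower latency
        "messages": [{"role": "user", "content": prompt}],
        # Cap the output so a single call can't eat the whole
        # 20k tokens-per-minute quota mentioned in the question.
        "max_completion_tokens": 512,
    }

body = build_request("Summarize this ticket in one sentence.")
print(json.dumps(body, indent=2))
```

If your timeouts come from long single responses rather than queueing, lowering `max_completion_tokens` alongside the reasoning effort tends to help more than either change alone.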

Answered By DevTalker88 On

Have you looked into Global Provisioned Throughput (Global PTU)? If you're on Standard deployments, your regional capacity pools may be maxed out, which would explain both the slow responses and the tight tokens-per-minute limits.

