I've noticed that Anthropic models like Sonnet and Opus perform significantly slower on Bedrock compared to Azure, Google Cloud, or even Anthropic's own API. In fact, they can be between 2 to 10 times slower, which makes them less suitable for many applications. Is there any documentation available regarding the expected performance of these models?
2 Answers
Actually, I've found that the Anthropic models can be faster than other platforms, including Anthropic's own API. Bedrock offers two types of APIs: streaming and non-streaming, and Anthropic defaults to streaming. If you switch your code to the streaming API, you might experience better speeds. Also, make sure you’re using global inference if you aren’t already!
What kind of latency and token throughput data are you seeing on Bedrock compared to other services? I've experienced moments where it feels congested too, but AWS doesn’t acknowledge these issues unless you're spending a lot. Just keep in mind, it's an on-demand service without any performance guarantees.

Related Questions
Neural Network Simulation Tool
xAI Grok Token Calculator
DeepSeek Token Calculator
Google Gemini Token Calculator
Meta LLaMA Token Calculator
OpenAI Token Calculator