I'm trying to make a simple request to Bedrock through LangChain, but it's taking a frustratingly long time: 27.62 seconds. My setup uses Boto3 to create the Bedrock runtime client and LangChain's ChatBedrockConverse model. The code is straightforward: I invoke the model with a short prompt and measure the wall-clock response time.

Interestingly, the response metadata reports latencyMs as 988, so the model-side latency doesn't seem to be the culprit. I've read that client-side factors such as retries can affect timing, but I haven't found any configuration that helps. When I run the same request with raw Boto3 (no LangChain), it still takes 20+ seconds, so LangChain itself doesn't appear to be the problem.

Does anyone have insights or tips on what might be causing this?
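Here's a minimal sketch of the setup (the model ID, region, and prompt are placeholders; the botocore retry config is one of the settings I tried tweaking, with no effect):

```python
import time

import boto3
from botocore.config import Config
from langchain_aws import ChatBedrockConverse

# Bedrock runtime client; retries dialed down to rule them out
client = boto3.client(
    "bedrock-runtime",
    region_name="eu-central-1",  # placeholder region
    config=Config(retries={"max_attempts": 1, "mode": "standard"}),
)

llm = ChatBedrockConverse(
    model="eu.anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder EU profile ID
    client=client,
)

start = time.perf_counter()
response = llm.invoke("Say hello in one word.")
print(f"wall clock: {time.perf_counter() - start:.2f}s")   # ~27.62s
print(response.response_metadata)                          # includes metrics.latencyMs (~988)
```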
5 Answers
Definitely check your CloudWatch metrics. Bedrock publishes detailed metrics on invocation latency, throttles, and failed requests, and that data can really help pinpoint where the time is going.
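For example, you can pull the server-side invocation latency for your model straight from the AWS/Bedrock namespace with boto3 (model ID and region below are placeholders):

```python
from datetime import datetime, timedelta, timezone

import boto3

cw = boto3.client("cloudwatch", region_name="eu-central-1")  # placeholder region

# Average/max invocation latency (ms) for one model over the last hour
resp = cw.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="InvocationLatency",
    Dimensions=[{"Name": "ModelId", "Value": "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average", "Maximum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```

If the server-side numbers here stay low (consistent with your latencyMs of 988), the extra 20+ seconds are being spent on the client or network side.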
CloudWatch is key here. It's also worth creating an application inference profile, which gives you better observability and per-application diagnostics. Sharing the full output response, rather than just latencyMs, could shed more light too. And how's your network setup? That can also affect response times.
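Creating an application inference profile is a one-off call against the Bedrock control plane; roughly something like this sketch (the profile name and source ARN are placeholders, so substitute your own model or system profile ARN):

```python
import boto3

# Control-plane client ("bedrock", not "bedrock-runtime"); region is a placeholder
bedrock = boto3.client("bedrock", region_name="eu-central-1")

# Application inference profiles get their own CloudWatch dimension,
# which makes your app's invocations easy to isolate in the metrics.
profile = bedrock.create_inference_profile(
    inferenceProfileName="latency-debugging",  # placeholder name
    modelSource={
        "copyFrom": "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
    },
)
print(profile["inferenceProfileArn"])
```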
Just a heads up: your call won't return until the final token from the LLM has been received. If you aren't streaming the tokens, the request can look like it hangs even though the model started responding almost immediately. Consider streaming, and ask for fewer tokens to speed things up.
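A sketch of both ideas with langchain-aws, streaming the tokens and capping the output length (the model ID is a placeholder):

```python
from langchain_aws import ChatBedrockConverse

llm = ChatBedrockConverse(
    model="eu.anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model/profile ID
    max_tokens=100,  # cap the response length
)

for chunk in llm.stream("Say hello in one word."):
    # ChatBedrockConverse yields content as a list of blocks; print the text ones
    for block in chunk.content:
        if isinstance(block, dict) and block.get("type") == "text":
            print(block["text"], end="", flush=True)
print()
```

Even if the total generation time is unchanged, streaming gets the first token in front of you quickly, which tells you whether the delay is in generation or somewhere before it.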
Honestly, Claude 3.5 can be slow, and it may be running on older hardware on AWS's side. I'd recommend trying a newer model to see if that improves the response time. Also remember that end-to-end latency scales with the number of output tokens you ask for. Lastly, since you're using the EU cross-region inference profile, your request may be routed to a different EU region, which can add network latency.
I've set up something similar with Bedrock behind Lambdas and got much better performance, around 5-6 seconds per response. I switched to Claude 3.5 Sonnet v2 and it worked great for me. I suggest trying a different model to see if that speeds things up.
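Swapping the model is basically a one-line change in ChatBedrockConverse; roughly (the profile ID and region are placeholders, so check what's actually available in your region):

```python
from langchain_aws import ChatBedrockConverse

llm = ChatBedrockConverse(
    # Claude 3.5 Sonnet v2; placeholder profile ID, verify regional availability
    model="eu.anthropic.claude-3-5-sonnet-20241022-v2:0",
    region_name="eu-central-1",  # placeholder region
)
print(llm.invoke("Say hello in one word.").content)
```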

Great point! If trying another model doesn't help, it might be worth reaching out to AWS Support to see if they can diagnose the issue further.