I'm experiencing high load on my API Management setup, which is currently on the Basic Tier. We're developing an event API that handles requests from 2,000 to 3,000 clients, with each request being fairly small (around 15 KB). While the load is typically light and evenly distributed, there are peak times when many clients send requests simultaneously. Unfortunately, I can't modify the client configurations. If they receive a 200 OK response, that's fine; if not, they just try again later.
Behind the API Management, I have a Service Bus on the Standard Tier, where requests are published as messages to a topic. This setup works fine under normal load, but during load testing (about 10,000 requests in 5 minutes, peaking at 2,000 simultaneous clients) I'm seeing a 60% error rate, including 500 responses and java.net.SocketException errors. I initially suspected the APIM tier, since I was testing on the Developer SKU, but I get the same failures on APIM Standard. Can anyone suggest recommendations or documentation that could help? Is there something wrong with my architecture?
1 Answer
It sounds like you're hitting capacity boundaries rather than a fundamental architectural flaw. The combination of API Management and Service Bus can absorb bursts, but you need to be deliberate about throttling and backpressure. From what you've described, API Management is acting as a synchronous gateway in front of an asynchronous system, which isn't ideal during spikes and is a likely source of those socket errors. I've seen teams stabilize setups like this by buffering early, applying policies that smooth bursts, or decoupling the write path entirely so that APIM can return 200 quickly without waiting on downstream confirmation. What matters is how the system behaves under saturation: if those failure and throttling behaviors aren't defined explicitly, your load tests will surface them for you.
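To make the "burst smoothing" and "decoupled write path" ideas concrete, here is a rough sketch of an APIM inbound policy. The Service Bus namespace (`mybus`), topic name (`events`), rate-limit numbers, and the `sas-token` named value are all hypothetical placeholders; treat this as an illustration of the pattern, not a drop-in configuration.

```xml
<inbound>
    <base />
    <!-- Smooth bursts: cap each client (keyed by source IP) to 100 calls per 60 s.
         Tune these numbers to your Service Bus tier's throughput limits. -->
    <rate-limit-by-key calls="100" renewal-period="60"
                       counter-key="@(context.Request.IpAddress)" />
    <!-- Decouple the write path: fire-and-forget the payload to the
         Service Bus REST endpoint instead of waiting for the broker. -->
    <send-one-way-request mode="new">
        <!-- Hypothetical namespace/topic; replace with your own. -->
        <set-url>https://mybus.servicebus.windows.net/events/messages</set-url>
        <set-method>POST</set-method>
        <set-header name="Authorization" exists-action="override">
            <!-- "sas-token" is an assumed named value holding a SAS token. -->
            <value>{{sas-token}}</value>
        </set-header>
        <set-body>@(context.Request.Body.As<string>(preserveContent: true))</set-body>
    </send-one-way-request>
    <!-- Respond immediately; clients only need a 200 OK and will retry otherwise. -->
    <return-response>
        <set-status code="200" reason="OK" />
    </return-response>
</inbound>
```

The trade-off is that a 200 no longer guarantees the message reached the topic, but since your clients already retry on failure, that contract may be acceptable during peaks.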

Thanks for your insights! I'll definitely dig into those topics. I ran a test against a separate API Management instance on the Standard v2 Tier, and the results were much better: the error rate dropped to just 2%, and I also realized I had been stress-testing at double the expected load. So simply upgrading the tier had a major impact. I'll still explore decoupling the write path and adding burst-smoothing policies.