I'm encountering bottleneck issues with an architecture involving API Management and a Service Bus. We're developing an event API that handles requests from 2,000 to 3,000 clients, each sending small requests of about 15 KB. Typically the load is manageable, but during peak times we see bursts of simultaneous requests. I'm unable to adjust the client configurations: if a client receives a 200 OK, it's done; if not, it retries later.

Behind the API Management instance, messages are sent to a Service Bus topic. It performs decently under normal load, but during a recent load test of around 10,000 requests in five minutes (peaking at 2,000 concurrent clients), I saw roughly a 60% error rate, including 500 errors and Java socket exceptions. I initially attributed this to the API tier, but testing with the Standard tier didn't resolve the issue, which leads me to believe it's linked to the Service Bus.

I'm looking for recommendations or documentation to help troubleshoot this. Is there something inherently wrong with my approach or architecture?
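For context, the client behavior described above (accept a 200 OK, otherwise retry later) can be sketched as follows. This is purely illustrative; `send_event` is a hypothetical stand-in for the real HTTP call to the event API:

```python
import random
import time

def send_with_retry(send_event, payload, max_attempts=5, base_delay=1.0):
    """Retry until a 200 OK, backing off exponentially with jitter.

    `send_event` is a stand-in for the real HTTP call; it should
    return the HTTP status code of the response.
    """
    for attempt in range(max_attempts):
        status = send_event(payload)
        if status == 200:
            return True
        # Non-200: wait and retry later, as the clients described above do.
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        time.sleep(delay)
    return False
```

Note that if thousands of clients all fail at once and retry on the same schedule without jitter, the retries themselves arrive as another synchronized spike, which is part of why burst handling matters here.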
3 Answers
Testing your capacity with 10,000 requests in five minutes isn't an extreme load. Check whether you're getting errors on the Service Bus side as well; there may be throttling happening that you aren't aware of. When dealing with large request bursts, how you send messages matters: review your configuration for queues, topics, and partitioning, and make sure the service is set to autoscale appropriately. Socket exceptions often occur when you push too much traffic without limiting the number of open connections.
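One common way to reduce per-message overhead (and the number of connections opened under burst load) is to batch small messages before sending. A minimal sketch of the idea, where `send_batch` is a hypothetical callable standing in for whatever actually writes to the Service Bus topic:

```python
class MessageBatcher:
    """Accumulate small messages and flush them downstream in batches.

    Batching amortizes connection and call overhead when thousands of
    ~15 KB requests arrive at once, instead of one send per request.
    """

    def __init__(self, send_batch, batch_size=100):
        self.send_batch = send_batch  # stand-in for the real topic sender
        self.batch_size = batch_size
        self._buffer = []

    def add(self, message):
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            self.send_batch(self._buffer)
            self._buffer = []
```

In a real deployment you'd also flush on a timer so messages don't sit in a partially filled batch, and keep batches under the broker's maximum message size.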
It sounds like you're running into a classic bottleneck that arises from how API Management and the Service Bus interact. Both can handle bursts, but you need to implement throttling and manage backpressure between the layers. Right now, API Management is being used as a synchronous entry point to an asynchronous system, so sudden spikes surface as socket errors instead of being absorbed smoothly. To tackle this, implement buffering or use policies designed to smooth out bursts; the goal is for API Management to return quickly without waiting on a slow downstream process. Load tests expose exactly these issues, so you may need to rethink how you handle failures and saturation during peak times.
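The buffering idea above can be sketched with a bounded in-process queue. This is only an illustration of the pattern; in your architecture the buffer would be the Service Bus itself or an APIM policy, not application code:

```python
import queue
import threading

# Bounded buffer between the fast "accept" path and the slow downstream.
# A full buffer means we're saturated and should reject cleanly, so the
# client retries later instead of hitting a timeout or socket error.
buffer = queue.Queue(maxsize=1000)

def accept_request(payload):
    """Fast path: enqueue and return immediately, analogous to APIM
    returning 200 without waiting on the Service Bus."""
    try:
        buffer.put_nowait(payload)
        return 200
    except queue.Full:
        return 503  # explicit "retry later" instead of a hard failure

def worker(process, stop):
    """Slow path: drain the buffer at whatever rate downstream absorbs."""
    while not stop.is_set() or not buffer.empty():
        try:
            item = buffer.get(timeout=0.1)
        except queue.Empty:
            continue
        process(item)
```

The key design point is that the accept path never blocks on downstream processing, and saturation produces a deliberate, retryable response rather than connection-level errors.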
What's handling the message writing to the Service Bus? Try bypassing API Management and calling your endpoint directly to see whether the issues persist. I'm assuming the flow is Client → API Management → Function App → Service Bus; testing the downstream flow directly would help isolate where the slowdown occurs.
Yeah, I found that switching to the Standard tier resolved most of my bottlenecks. The flow is indeed Client ➔ APIM ➔ Service Bus ➔ Functions. I'm currently using a preview policy from Microsoft to send Service Bus messages, which I'm hoping will help.

Glad to hear you found some relief with the tier adjustment! Consider experimenting with different policies to find what best suits your needs.