Our team runs Azure Front Door Standard with an origin group of 10 app services in different regions: 7 in East US2, 1 in UK South, 1 in Southeast Asia, and 1 in Germany. Health probes run over HTTPS using the HEAD method at a 100-second interval. The load-balancing settings use a sample size of 4 with 3 successful samples required, and a latency sensitivity of 50 milliseconds.
We've noticed a recurring issue where the last app service in the East US2 list, labeled USAPI1, gets heavily strained. When we check the DNS URL, we can see Azure Front Door rotating through the origins, but if we don't restart USAPI1 daily at noon EST, its response times can spike to over 25 seconds due to excessive load. The other US app services (US1 to US6) don't experience this issue. The daily restart is only a workaround, so we'd like to know if anyone can explain why this last origin is consistently hit so hard.
1 Answer
I hear you on that! If you don't need a CDN, Traffic Manager could be a better fit for pure load balancing across regions. But if your main goal is the best response times for users close to your services, Front Door sounds like a smart choice. Just keep an eye on that last origin: could it be that it simply receives heavier traffic and needs more resources?
That's definitely a possibility! The odd part is that we've set it up for latency-based routing, which should ideally distribute the load evenly, so it's baffling that USAPI1 remains the most loaded. We've also verified that session affinity isn't a factor, since we're keeping things stateless.
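For what it's worth, here is a minimal Python sketch of how latency-based selection is documented to behave with the settings from the question (sample size 4, 3 successful samples, 50 ms latency sensitivity). It is a simplified model, not Front Door's actual implementation: the latency figures, the labels UK, SEA and DE, and the function names are all hypothetical, and it assumes every origin has the same priority and weight.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Origin:
    name: str
    latency_ms: float     # hypothetical latency as measured from one edge location
    probe_results: list   # recent probe outcomes, True = success
    weight: int = 1       # assumed equal for all origins


def is_healthy(origin, sample_size=4, successful_samples=3):
    """Healthy if at least `successful_samples` of the last `sample_size` probes passed."""
    recent = origin.probe_results[-sample_size:]
    return sum(recent) >= successful_samples


def eligible_origins(origins, latency_sensitivity_ms=50):
    """Keep healthy origins whose latency is within the sensitivity band of the fastest."""
    healthy = [o for o in origins if is_healthy(o)]
    if not healthy:
        return []
    fastest = min(o.latency_ms for o in healthy)
    return [o for o in healthy if o.latency_ms - fastest <= latency_sensitivity_ms]


def distribute(origins, requests):
    """Weighted round robin across the eligible origins; returns request counts per origin."""
    counts = {o.name: 0 for o in origins}
    pool = eligible_origins(origins)
    if not pool:
        return counts
    rotation = cycle([o for o in pool for _ in range(o.weight)])
    for _ in range(requests):
        counts[next(rotation).name] += 1
    return counts


ok = [True, True, True, True]
origins = [Origin(f"US{i}", 20.0, ok) for i in range(1, 7)]
origins += [
    Origin("USAPI1", 22.0, [True, False, True, True]),  # one failed probe, still healthy
    Origin("UK", 95.0, ok),     # hypothetical label for the UK South origin
    Origin("SEA", 230.0, ok),   # hypothetical label for the Southeast Asia origin
    Origin("DE", 110.0, ok),    # hypothetical label for the Germany origin
]

counts = distribute(origins, requests=700)
print(counts)
```

In this model the seven East US2 origins each receive 100 of the 700 requests and the remote origins receive none, because they fall outside the 50 ms band. If USAPI1 still ends up overloaded, the cause is probably something the sketch leaves out, such as unequal weights or priorities on the origins, or latency measurements that differ from one Front Door edge location to another.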