Hey folks! I'm facing an issue with my App Service web app running on Linux. I've set up health check, but today I noticed that one of the instances was marked unhealthy. My load balancing threshold is set to 5 minutes, and I've set WEBSITE_HEALTHCHECK_MAXPINGFAILURES to 5. I followed the Azure documentation and waited half an hour, but App Service didn't restart the unhealthy instance, even though I have two instances running. My understanding is that it should replace an unhealthy instance after one hour, but I'm not sure it actually will. Has anyone dealt with this before? Any tips on what I might be missing?
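For what it's worth, here's a rough timeline model of the settings described above. It's a sketch under stated assumptions, not Azure's actual logic: it assumes one health check ping per minute, treats WEBSITE_HEALTHCHECK_MAXPINGFAILURES as the number of minutes of consecutive failed pings before the instance is pulled from the load balancer, and assumes the one-hour replacement window starts once the instance is marked unhealthy. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckTimeline:
    """Hypothetical model of the health check timeline described in the question."""

    ping_interval_min: int = 1   # assumption: one health check ping per minute
    max_ping_failures: int = 5   # WEBSITE_HEALTHCHECK_MAXPINGFAILURES from the question
    replace_after_min: int = 60  # assumption: replacement after one hour unhealthy

    def unhealthy_after(self) -> int:
        """Minutes of failed pings before the instance leaves load balancer rotation."""
        return self.ping_interval_min * self.max_ping_failures

    def replacement_due_after(self) -> int:
        """Minutes after pings start failing when replacement would be expected."""
        return self.unhealthy_after() + self.replace_after_min


timeline = HealthCheckTimeline()
print(timeline.unhealthy_after())        # 5
print(timeline.replacement_due_after())  # 65
```

Under those assumptions, replacement wouldn't be expected until roughly 65 minutes after the pings start failing, so a 30-minute wait is still well inside the window.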
Yeah, it seems the platform doesn't automatically restart the apps themselves. Instead, it typically replaces the underlying worker instance in the App Service plan, and only when all apps on that instance are marked unhealthy, which indicates the worker itself needs replacing. I recommend using the diagnostic tools (Diagnose and solve problems) in the App Service blade to get a clearer picture of past issues. It helped me a lot!
I was also shocked to learn that! What's worse, if one app in the plan deadlocks and shows as unhealthy, it won't be restarted as long as the other apps in the same App Service plan are still healthy. And waiting an hour just adds to the frustration. I'm seriously considering migrating to something like Azure Container Apps for more control.
Totally get your frustration! One thing we found is that it's crucial to keep an eye on the health check status. Azure does expose it as a metric, but honestly, their monitoring tools can be a hassle. We wanted to pull metrics like this into our own observability platform because the built-in tooling just doesn't cut it. Streaming the metrics to an Event Hub and processing them with an Azure Function certainly adds complexity, but it helps us keep track of those health checks, which otherwise seem to get ignored by default.
It's been 8 hours now, and while my app is healthy on two instances, the third one is stuck showing unhealthy despite all my attempts to restart it! I find it annoying that we have no direct control over availability zones when scaling; it seems like there could be an underlying Azure issue affecting the deployment. Really wish they'd improve this process!

That's such a letdown! I really expected better functionality from Azure.