I'm currently setting up a lab cluster for a client involving Citrix and VDI, but today two of my VMs, a Domain Controller and a Citrix Delivery instance, died on me in the middle of normal use. When I attempted to reconnect, they were cycling between "updating" and "starting" states in the Azure portal before finally settling on "failed" after about 15 minutes.

Azure's diagnostics suggested reapplying the VM configuration or deallocating and reallocating them, but neither step worked. The serial console showed nothing useful, and I'm still combing through the event logs for clues.

I've done enough troubleshooting to know that VMs can fail, but this one really has me puzzled. My current suspicion is a hard failure on the infrastructure hosting these VMs that corrupted the boot sequence. Has anyone experienced something similar in Azure? I'm rebuilding the VMs in the meantime, but I'd love to understand the potential root causes so I can prevent this in the future.
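For reference, the steps I tried scripted against the Python SDK look roughly like this. This is a sketch, assuming the azure-identity and azure-mgmt-compute packages; the subscription, resource group, and VM names are placeholders. The final redeploy call moves the VM to a different physical host, which is the usual mitigation when host hardware is suspected:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"  # placeholder
resource_group = "<resource-group>"    # placeholder
vm_name = "<vm-name>"                  # placeholder

client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Dump the provisioning/power statuses that the portal summarizes as
# "updating" / "starting" / "failed".
view = client.virtual_machines.instance_view(resource_group, vm_name)
for status in view.statuses:
    print(status.code, "-", status.display_status)

# The two mitigations Azure's diagnostics suggested: reapply the VM's
# state, then deallocate and start it again.
client.virtual_machines.begin_reapply(resource_group, vm_name).result()
client.virtual_machines.begin_deallocate(resource_group, vm_name).result()
client.virtual_machines.begin_start(resource_group, vm_name).result()

# Redeploy places the VM on a new host; worth trying before a rebuild
# if a hardware fault on the current host is suspected.
client.virtual_machines.begin_redeploy(resource_group, vm_name).result()
```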
3 Answers
What region are your VMs in? This sounds like it might be an Azure capacity issue, based on similar problems I've run into before.
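One way to do a first-pass check is to look for SKU restrictions on the size in your region. A sketch assuming the azure-mgmt-compute Python SDK; the subscription, region, and size are placeholders, and note that transient allocation failures won't always show up here:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
region = "<region>"            # placeholder, e.g. "westeurope"
size = "Standard_D2als_v6"     # the size to check

# An empty restrictions list means the size isn't blocked for this
# subscription in the region; capacity shortages are often transient
# and may not be reflected, so treat this as a first-pass check only.
for sku in client.resource_skus.list(filter=f"location eq '{region}'"):
    if sku.resource_type == "virtualMachines" and sku.name == size:
        for restriction in sku.restrictions:
            print(restriction.type, restriction.reason_code)
```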
I ran into a similar issue this week where my VMs kept cycling from starting to updating and then failing. Are you using a v6-family VM size with vTPM enabled? Microsoft recently investigated some start failures related to a hypervisor update that caused TPM issues, leading to VM failures.
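To check whether a VM falls into that bucket, here's a quick sketch assuming the azure-mgmt-compute Python SDK (subscription, group, and VM names are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
vm = client.virtual_machines.get("<resource-group>", "<vm-name>")

# Trusted launch VMs expose their secure boot and vTPM flags here.
profile = vm.security_profile
if profile and profile.uefi_settings:
    print("security type:", profile.security_type)
    print("secure boot:  ", profile.uefi_settings.secure_boot_enabled)
    print("vTPM:         ", profile.uefi_settings.v_tpm_enabled)
```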
Have you tried the Hyper-V rescue method? It sounds like your VMs might have encountered a bad update and gotten stuck in a boot loop.
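If you go that route, snapshot the OS disk first so nothing is destructive. A rough sketch with the azure-mgmt-compute Python SDK (names and region are placeholders); the snapshot can then be turned into a disk and attached to a healthy repair VM:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
vm = client.virtual_machines.get("<resource-group>", "<vm-name>")
os_disk_id = vm.storage_profile.os_disk.managed_disk.id

# Copy the failed VM's OS disk into a snapshot before any rescue attempt.
poller = client.snapshots.begin_create_or_update(
    "<resource-group>",
    "<vm-name>-osdisk-snapshot",
    {
        "location": "<region>",  # must match the disk's region
        "creation_data": {
            "create_option": "Copy",
            "source_resource_id": os_disk_id,
        },
    },
)
print(poller.result().provisioning_state)
```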

I'm using two different v6 machine types, Standard D2als v6 and Standard D4alds v6, both with trusted launch and secure boot enabled. Did you manage to recover your VMs, or did you end up having to rebuild them?