I'm renting a dedicated server from Hetzner with an Intel Core i9-9900K, 128GB of RAM, and 2x1TB NVMe disks, running ESXi 8.0U3. The server reboots on its own every 4 to 5 hours without showing a Purple Screen of Death (PSOD). The logs indicate an unexpected power loss, as if the power had been abruptly cut, and the server restarts normally afterwards. I suspect a failing power supply or a thermal issue, since the i9-9900K can run hot. I've also heard that disabling C-States in the BIOS might help with the reboots, but I'm unsure whether C-States could explain the power loss entries in the logs. Has anyone else dealt with this on Hetzner servers or with the i9-9900K? Should I request a PSU replacement from Hetzner or try disabling C-States first? Any thoughts or debugging steps would be greatly appreciated!
5 Answers
The symptoms do suggest a PSU issue, but if it were failing at boot, you'd likely see boot loops rather than hours of stable operation between reboots. It could also be a voltage problem with the CPU. Check that the voltage readings are within spec and that temperatures stay stable under load.
I've noticed that consumer-grade components like the i9-9900K can cause stability problems in servers, especially since the chip doesn't support ECC memory. Have you checked the IPMI logs, if the server has a BMC? That could provide insight into the reboots. Also, monitor your CPU and motherboard temperatures to rule out overheating.
Don't hesitate to contact Hetzner directly. They'll have a better idea of what could be causing those power loss logs and might swap out the PSU for you if it’s defective.
Run some stress tests like OCCT or Prime95 with all your VMs powered down. These tools don't run on the ESXi host itself, so you'd need to run them from a single test VM or from a separately booted Windows or Linux system. This will help you determine whether the hardware is unstable. If it crashes during the stress test, reach out to Hetzner support. Also, consider turning off Turbo Boost for the CPU; I've seen it cause instability over time on consumer chips.
It might also be a good idea to check your power setup. If you have a managed or switched PDU, it could be throttling under load. I had a similar issue once where Windows Defender was scheduled to actively scan at 4 AM, which overloaded the PDU and caused crashes. Just a thought!
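If you want to check whether the reboots line up with a schedule, as in the 4 AM scan example, one rough approach is to compare the gaps between boot times. Below is a minimal Python sketch. The timestamps are made up for illustration and would have to be replaced with the boot times you collect from your own logs; the function names `reboot_gaps` and `looks_scheduled` are hypothetical helpers, not part of any ESXi tooling.

```python
from datetime import datetime
from statistics import mean


def reboot_gaps(boot_times):
    """Return the gaps, in hours, between consecutive boot timestamps."""
    ordered = sorted(boot_times)
    return [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]


def looks_scheduled(boot_times, tolerance_minutes=10):
    """Return True if every boot happened at roughly the same time of day.

    Times of day are compared on a 24-hour circle, so 23:58 and 00:03
    count as 5 minutes apart.
    """
    minutes = [t.hour * 60 + t.minute for t in boot_times]
    reference = minutes[0]

    def distance(m):
        d = abs(m - reference) % 1440
        return min(d, 1440 - d)

    return all(distance(m) <= tolerance_minutes for m in minutes)


# Hypothetical boot times, NOT taken from the original post.
sample_boots = [
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 4, 40),
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 13, 50),
    datetime(2024, 1, 1, 18, 15),
]

gaps = reboot_gaps(sample_boots)
print("gaps (hours):", [round(g, 2) for g in gaps])
print("mean gap (hours):", round(mean(gaps), 2))
print("same time of day each time:", looks_scheduled(sample_boots))
```

Gaps that drift between 4 and 5 hours with no fixed time of day would point away from a scheduled job and back toward hardware (PSU, thermals, voltage), while reboots that cluster at one clock time would make a scheduled task or external power event worth a closer look.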
