Hey everyone,
I've been a sysadmin for quite a while and started a new job about 4 months ago. Things had been going smoothly until we began migrating our virtual machines from three standalone Hyper-V servers to a new Hyper-V cluster. We shut down the VMs, copied the VHDX and configuration files over to the new storage, imported them, and started them up again. We have around 80 VMs total, and given our solid 10G/25G backbone with flash storage and plenty of cores, I thought the process would be pretty straightforward.
Unfortunately, I keep running into issues with corrupt VHDX files. It seems like every batch I work on has at least one corrupted file, which shows up as SQL errors, NTFS errors, or VMs that won't boot at all. Initially, I simply used copy/paste for transfers, but on the advice of a database administrator, I switched to Robocopy for my second attempt.
When copying files, I usually transfer about 5 VHDX files simultaneously, hitting around 7 Gbps, which feels like the maximum the storage and NICs can do. My boss has been doing it differently, copying one VHDX file at a time and topping out at about 3 Gbps. I need your advice on what I can check or test to prove these corruptions aren't a result of something I'm doing wrong, and whether a hardware issue is more likely. The fact that these corruptions seem to occur only for me is concerning, and I suspect it might be related to network issues when I'm maxing out transfer speeds.
4 Answers
First off, check the switches and interfaces involved in the transfer for errors. Normally, TCP is pretty reliable, so if the network were the culprit, I'd expect it to be more obvious. You can also generate hashes of your VHDX files before and after copying; if they differ, it’s clear the file changed somewhere between the source disk and the destination disk.
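To make the hash suggestion concrete, here's a minimal sketch in Python. The function names (`sha256_of`, `copies_match`) and the 4 MiB chunk size are my own choices for illustration, not anything from the thread, and it assumes the VMs are shut down so the files aren't changing while you read them.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=4 * 1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so a large VHDX never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, destination):
    """Return True only if both files have the same size and the same digest."""
    source, destination = Path(source), Path(destination)
    # Cheap check first: a size difference already proves the copy is bad.
    if source.stat().st_size != destination.stat().st_size:
        return False
    return sha256_of(source) == sha256_of(destination)
```

Ideally compute each hash locally on the host that owns the disk and compare the hex strings, so the check itself doesn't pull the file back across the same network path. On Windows, PowerShell's built-in `Get-FileHash` cmdlet gives you the same SHA-256 value without needing Python.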
Have you considered checking your antivirus settings? VHD files are often targeted by attackers, so it’s possible your antivirus could be interfering by scanning files during the copy process. You might want to temporarily disable your AV on the receiving end to see if that resolves the issue. It’s a bit of a long shot, but worth trying if you haven’t yet!
Are both the source and destination in the same Active Directory domain? If they are, you might want to try a MOVE command instead, which would remove the VMs from the source Hyper-V Manager and re-add them through Failover Cluster Manager (FCM). It could simplify things.
We opted not to move the files for fear of losing originals if further issues arise. Keeping backups easily accessible is crucial!
It sounds like the issue might not be with the transit itself but with your storage or the methodology you’re using. Make sure you’re actually copying all associated files, such as checkpoints (snapshots). They can be overlooked depending on your VM settings. Also, have you considered using Hyper-V Replica? It automates the transfer and handles the replication for you, without the need for manual copying.
We did use Robocopy and had the same issue. We're definitely copying the right files, and when I recopy the same ones, they work. There are no snapshots or checkpoints involved, and management wanted to avoid connecting the new cluster directly to the old one. I do have doubts about the new storage’s integrity, so I’ll double-check that. Before moving on to the final VMs, I’ll make sure to compare file hashes and sizes.
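For checking a whole batch by size and hash before importing, something like the following rough Python sketch could work. The names `verify_batch` and `file_digest`, the `*.vhdx` pattern, and the status strings are assumptions made up for this example; it also assumes the destination mirrors the source folder layout and that both trees are reachable from the machine running it.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=4 * 1024 * 1024):
    """SHA-256 hex digest of one file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_batch(source_dir, dest_dir, pattern="*.vhdx"):
    """Compare every file matching `pattern` under source_dir with the file
    at the same relative path under dest_dir.

    Returns a dict mapping relative path -> one of
    "ok", "missing", "size mismatch", "hash mismatch".
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    report = {}
    for src in sorted(source_dir.rglob(pattern)):
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        if not dst.is_file():
            status = "missing"
        elif src.stat().st_size != dst.stat().st_size:
            status = "size mismatch"  # no need to hash, already known bad
        elif file_digest(src) != file_digest(dst):
            status = "hash mismatch"  # same size, different contents
        else:
            status = "ok"
        report[str(rel)] = status
    return report
```

A "hash mismatch" with identical sizes would point at silent corruption rather than a truncated copy. Since recopying the same file works, running the check a second time on a file that already passed can also show whether the destination storage returns the same data on every read.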
I've already asked my network guy to investigate any NIC errors. We're nearly done with migrations (99%) but I could run a couple of tests with some smaller files.