I'm looking for insights from sysadmins who've worked with ZFS and Proxmox (or similar stacks). Here's my situation: I use ZFS to replicate Proxmox VM datasets regularly, and I do this while the VMs are powered on. I don't use `fsfreeze` or any other guest-level consistency mechanism, because I only need a clean, restorable copy after the VM shuts down and I trigger a final replication. Essentially, I'm taking an "eventual consistency" approach. So my main question is: is this method acceptable for production backup and disaster recovery? What risks should I be aware of, such as corrupted snapshots or other problems caused by replicating running VMs on ZFS/Proxmox? Any real-world experience would be greatly appreciated!
3 Answers
It sounds like you want fast incremental replication so that something like a manual DR failover is quick. I haven't tested this exact setup, but if you shut the VMs down cleanly and replicate afterward, that final copy should be consistent. Be clear about what you're building it for, though: in my experience replica VMs are mostly needed for unplanned failures, and in that case you only have the last live replication, which this approach doesn't guarantee is clean.
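If your guest OSes and applications can recover cleanly from a hard power-off, this method can be fine even for the live replicas, because each one is crash-consistent: it's what the disk would look like if you had pulled the plug at the moment the snapshot was taken. Still, use a proper backup solution alongside it for application data. On the ZFS side, a snapshot is an atomic point-in-time image; writes that land after it's taken don't change it, so replicating a snapshot while the VM keeps running is safe as far as ZFS itself is concerned.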
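To make "crash-consistent" concrete, here is a toy Python model of a snapshot taken while a guest is in the middle of a transaction. It is not real ZFS: the `ToyDataset` class, block numbers, and journal strings are made up for illustration. The snapshot freezes the disk at that instant, so the replica contains the journal's "begin" record but neither the later data nor the commit, which is exactly the state a guest would find after a power loss and has to recover from on boot.

```python
class ToyDataset:
    """Hypothetical in-memory stand-in for a ZFS dataset/zvol (not real ZFS)."""

    def __init__(self):
        self.blocks = {}      # live block map: block number -> data
        self.snapshots = {}   # snapshot name -> frozen block map

    def write(self, block, data):
        # New writes only change the live map; existing snapshots are untouched.
        self.blocks[block] = data

    def snapshot(self, name):
        # Capture the block map as it is right now. Real ZFS does this in
        # constant time via copy-on-write; a dict copy is the toy equivalent.
        self.snapshots[name] = dict(self.blocks)

    def send(self, name):
        # Replication ships the frozen snapshot state, never the live state.
        return dict(self.snapshots[name])


vm_disk = ToyDataset()
vm_disk.write(0, "journal: begin txn")
vm_disk.snapshot("repl-1")               # taken while the "VM" is running
vm_disk.write(1, "data for txn")         # guest keeps writing afterwards
vm_disk.write(0, "journal: commit txn")

replica = vm_disk.send("repl-1")
print(replica)  # → {0: 'journal: begin txn'}
```

The replica is internally consistent (it is one exact instant), but that instant is mid-transaction, which is why the guest's own journaling or recovery has to cope with it.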
What do you mean by the "hard shutdown" part? I get that a snapshot gives a consistent point-in-time copy. I only treat the last replication, taken after the VM has shut down, as the "current" state.
In real production deployments, you'd typically use stretched clusters or a product like Veeam for VM replication to a DR site. This setup seems better suited to a home lab than to a professional environment.
That's exactly my goal! I'm focused on planned maintenance, not unplanned incidents. Basically, I want fast VM migrations so I can do maintenance on a host without worrying about its VMs, since they've already been moved.
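The planned-maintenance handover described in this thread boils down to three steps per disk: clean guest shutdown, final snapshot, incremental send of only what changed since the last live replication. Here is a minimal Python sketch that just builds those commands as strings (it doesn't execute anything). The VMID, dataset names, snapshot names, and target host are hypothetical placeholders, and it assumes the previous snapshot already exists on both source and target from an earlier live run.

```python
def final_replication_plan(vmid, dataset, prev_snap, new_snap,
                           target_host, target_dataset):
    """Return the ordered shell commands for a planned-maintenance handover.

    All arguments are hypothetical placeholders; adapt them to your layout.
    Assumes `prev_snap` already exists on both sides from a live replication.
    """
    return [
        # 1. Clean guest shutdown, so the disk is consistent at the OS/app level.
        f"qm shutdown {vmid}",
        # 2. Snapshot the now-quiescent disk.
        f"zfs snapshot {dataset}@{new_snap}",
        # 3. Send only the blocks changed since the last live replication.
        f"zfs send -i @{prev_snap} {dataset}@{new_snap} | "
        f"ssh {target_host} zfs receive -F {target_dataset}",
    ]


plan = final_replication_plan(
    vmid=100,
    dataset="rpool/data/vm-100-disk-0",
    prev_snap="repl-1",
    new_snap="final",
    target_host="dr-node",
    target_dataset="tank/replica/vm-100-disk-0",
)
for cmd in plan:
    print(cmd)
```

Before the snapshot step, it's worth confirming with `qm status <vmid>` that the VM actually reports `stopped`. A VM with several disks needs the snapshot and send repeated for each zvol (or a recursive `zfs snapshot -r` on a common parent). Also note that `zfs receive -F` rolls the target back to its most recent snapshot before applying the stream, so any changes made to the replica between runs are discarded.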