Hey everyone,
I'm working on replicating some KVM virtual machines from my main site to a disaster recovery site over WAN links. The VMs are stored as qcow2 images on a RAID array formatted with XFS. They matter a lot to me: they run my personal email and production systems, plus a few VMs I host for friends.
My main goal is to ensure that I have replicas of these VMs ready to go on a secondary KVM host, with a maximum synchronization gap of one hour between the original and the replica.
I've looked at DRBD and DRBD Proxy. DRBD itself is open source, but the Proxy component you really want for asynchronous WAN replication is a commercial product and a bit pricey for my budget. I'm not looking to spend thousands per year on licensing or support.
I'm trying to explore cheaper alternatives or strong open-source tools for this geo-replication task. Here are some options I've considered so far:
- **LizardFS**: Offers WAN replication, but the project seems inactive.
- **SaunaFS**: A fork of LizardFS, no WAN replication planned yet, but they seem promising.
- **GlusterFS**: It's being deprecated, so I'm hesitant.
In terms of filesystem replication, I looked at:
- **ZFS with send/receive**: Robust, but I've had performance problems with copy-on-write under VM workloads, and the out-of-tree kernel module tends to break on kernel updates.
- **xfsdump/xfsrestore**: A solid possibility, but limited on the snapshot side, since XFS has no native snapshots.
- **LVM snapshots under XFS, plus rsync**: More filesystem-agnostic, but I'm worried about rsync performance, since it has to read every image in full on both ends just to work out the deltas (see the sketch after this list).
- **qcow2 disk snapshots with restic backup**: It's filesystem-agnostic, but restoring takes time.
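For the LVM option, this is roughly what I had in mind; the volume group, LV, mount point and host names (vg0, vmstore, dr-host) are placeholders, not my real layout:

```bash
#!/usr/bin/env bash
# Rough sketch of the LVM-snapshot-plus-rsync idea; all names are placeholders.
set -euo pipefail

SNAP=vmstore-snap
MNT=/mnt/${SNAP}

# Point-in-time (crash-consistent) snapshot of the LV holding the images.
lvcreate --size 20G --snapshot --name "${SNAP}" /dev/vg0/vmstore

# XFS needs -o nouuid to mount a snapshot alongside the original filesystem.
mkdir -p "${MNT}"
mount -o ro,nouuid "/dev/vg0/${SNAP}" "${MNT}"

# Ship the images; --inplace updates the remote files block-wise instead of
# rewriting them, --partial lets an interrupted WAN transfer resume.
rsync -a --inplace --partial --compress \
    "${MNT}/images/" dr-host:/var/lib/libvirt/images/

# Drop the snapshot before it fills up and gets invalidated.
umount "${MNT}"
lvremove -y "/dev/vg0/${SNAP}"
```

The catch is still the read overhead: rsync has to checksum every multi-gigabyte image on both sides, every hour.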
If anyone has successfully implemented a geo-replication solution for VMs without needing advanced expertise or a massive budget, I'd love to hear your thoughts and suggestions! Thanks in advance!
5 Answers
Have you thought about running hourly Borg backups directly to the secondary host? If bandwidth isn't a major issue, it could simplify things a lot. You could lean on qcow2 external snapshots so the base image stays stable while Borg reads it, which also keeps the per-run delta small, though routing can get tricky if your IPs differ at each site.
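Roughly what I mean, assuming the qemu guest agent is running in the VM, the disk target is vda, and a Borg repo already exists on the secondary host; the VM name and paths are just examples:

```bash
#!/usr/bin/env bash
# Hourly sketch: freeze the disk into an overlay, back up the stable base
# image with Borg, then merge the overlay back. All names are examples.
set -euo pipefail

VM=mailserver
IMG=/var/lib/libvirt/images/${VM}.qcow2
REPO=backup@dr-host:/srv/borg/vms

# External disk-only snapshot so the base image stops changing while it is
# read (--quiesce needs the qemu guest agent inside the VM).
virsh snapshot-create-as "${VM}" backup-snap \
    --disk-only --atomic --quiesce --no-metadata

# Deduplicated, compressed backup straight into the repo on the DR host.
borg create --compression zstd "${REPO}::${VM}-{now}" "${IMG}"

# Merge the overlay back into the base image and resume writing to it;
# the leftover overlay file can be deleted afterwards.
virsh blockcommit "${VM}" vda --active --pivot

# Keep a day of hourlies and a week of dailies.
borg prune --keep-hourly 24 --keep-daily 7 "${REPO}"
```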
ZFS snapshots could be a game-changer for you: snapshot the dataset hourly, then ship only the changed blocks to the DR host with incremental send/receive. Have you tried incorporating them into your workflow?
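Something along these lines, assuming the images get moved onto a ZFS dataset on both hosts; the dataset name (tank/vms) and dr-host are placeholders:

```bash
#!/usr/bin/env bash
# Hourly incremental replication sketch; dataset and host names are made up.
set -euo pipefail

DATASET=tank/vms
NOW=$(date +%Y%m%d%H%M)

# Most recent existing snapshot (empty on the very first run).
PREV=$(zfs list -H -t snapshot -d 1 -o name -s creation "${DATASET}" | tail -n 1)

zfs snapshot "${DATASET}@${NOW}"

if [ -n "${PREV}" ]; then
    # Only blocks changed since the previous snapshot cross the WAN.
    zfs send -i "${PREV}" "${DATASET}@${NOW}" | ssh dr-host zfs receive -F "${DATASET}"
else
    # First run seeds the DR host with a full stream.
    zfs send "${DATASET}@${NOW}" | ssh dr-host zfs receive -F "${DATASET}"
fi
```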
As I mentioned earlier, ZFS performance for VMs isn't the best for my setup, though I’ve been using it a long time. I’ve faced kernel issues after updates too, which is frustrating.
I've heard some people say GlusterFS isn't entirely deprecated; it's the commercial product that has been discontinued. Still, if you're worried about its long-term support, it might not be the best pick. Do you have any experience with it?
You’re right that the project isn’t dead per se, but the commit activity has definitely slowed down. I would tread carefully if you're counting on it for something mission-critical.
You can achieve this without any shared storage. I frequently live migrate VMs across hosts without shared storage using 'virsh migrate' and it works quite well, as long as your bandwidth allows for it. If the data changes aren’t too frequent, the costs could be manageable too.
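Something like this is what I run, with placeholder names for the VM and the destination host. With --copy-storage-all the destination needs a pre-created disk file of the same size at the same path; --copy-storage-inc instead sends only the blocks that differ from an image already present there:

```bash
# Live migration with the storage streamed over the wire; "myvm" and
# "dr-host" are placeholders.
virsh migrate --live --persistent --verbose \
    --copy-storage-all myvm qemu+ssh://dr-host/system
```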
That sounds promising. How do you handle IP addressing and routing once a VM lands on the other host, so everything still maps correctly?
Consider Proxmox Backup Server if you have any inclination to switch. It's efficient and cost-effective, although I know you're not running Proxmox. Just throwing it out there because it works wonders for others.
I appreciate the suggestion, but I’m committed to my setup with vanilla KVM for now. Proxmox has some quirks I’m not a fan of, like their unique formats and APIs.
I prefer using restic over Borg based on my tests, but I get your point. The recovery time is my main concern—I want those VMs up and running quickly!
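For reference, the restic flow I've been testing looks roughly like this; the repository location, password file and image path are placeholders, and the slow part for me is pulling a full image back over the WAN at restore time:

```bash
#!/usr/bin/env bash
# Hourly restic backup of one image plus the matching restore; the repo,
# password file and paths are placeholders.
set -euo pipefail

export RESTIC_REPOSITORY=sftp:backup@dr-host:/srv/restic/vms
export RESTIC_PASSWORD_FILE=/root/.restic-pass

IMG=/var/lib/libvirt/images/mailserver.qcow2

# Deduplicated backup: unchanged chunks cost little space, but the whole
# image is still read and chunked on every run.
restic backup "${IMG}"

# On the DR host: restore the newest snapshot of that image in place.
restic restore latest --target / --include "${IMG}"
```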