I'm setting up a home server using Proxmox to host services for my family, like file and media storage. Currently, my backups go to a Proxmox Backup Server with local ZFS datasets, with offsite copies sent to a Raspberry Pi running a Restic server. Now I'm planning to move workloads to Kubernetes with Rook Ceph, including external Ceph OSDs. My main concern is proper disaster recovery and offsite backups, especially since I'd prefer to stick with Restic as my backend and keep the encryption I have now. I've been exploring Velero, which can back up Kubernetes manifests and create snapshots of persistent volumes (PVs), but I'm worried that if my Ceph cluster fails, I'll lose my PV data because Velero's snapshots are stored within that same cluster. I'm looking for insights on how others manage offsite PV backups with Rook Ceph, best practices for keeping offsite backups point-in-time consistent, and whether a workflow involving snapshots, temporary PVCs, and Restic would keep recovery straightforward while letting me restore workloads without excessive manual setup. Any suggestions would be greatly appreciated!
4 Answers
Using Velero alone won't be enough to protect against a complete Ceph failure, since the CSI snapshots it creates live inside the same Ceph cluster as the volumes they were taken from and are lost along with it. For disaster recovery, it's best to handle PV backups separately. A common approach is to use Velero for the cluster state (manifests and PVC definitions) and back up PV contents at the storage layer with Restic. A simple workflow is to snapshot the RBD volume, map the snapshot to a temporary read-only mount, and then run Restic against your Raspberry Pi backend. Because the backup reads from a snapshot rather than the live volume, every file in it comes from the same point in time. Recovery is straightforward: recreate your cluster, restore the manifests from Velero, then restore the data from Restic. It's not fully automated, but it works reliably for home setups, keeps your backups encrypted, and uses a system you already trust.
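The snapshot, map, mount, Restic sequence above could be scripted roughly as follows. This is only a sketch under several assumptions: it runs on a host that has the `rbd` and `restic` CLIs plus Ceph credentials, the volume holds an ext4 filesystem (hence the `noload` mount option), and the Restic password is supplied through the environment (for example `RESTIC_PASSWORD_FILE`). The function name and the pool, image, and repository values in the usage note are hypothetical placeholders.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def shell_runner(argv: List[str]) -> str:
    """Run a command, raise on failure, and return its trimmed stdout."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def backup_rbd_image(pool: str, image: str, snap: str, mountpoint: str,
                     repo: str, run: Runner = shell_runner) -> None:
    """Snapshot an RBD image, mount the snapshot read-only, back it up with Restic."""
    spec = f"{pool}/{image}@{snap}"
    run(["rbd", "snap", "create", spec])
    try:
        # `rbd map` prints the block device it created (e.g. /dev/rbd0).
        device = run(["rbd", "map", "--read-only", spec])
        try:
            # Assumption: ext4. `noload` skips journal replay on the read-only device.
            run(["mount", "-o", "ro,noload", device, mountpoint])
            try:
                run(["restic", "-r", repo, "backup", mountpoint,
                     "--tag", f"{pool}/{image}"])
            finally:
                run(["umount", mountpoint])
        finally:
            run(["rbd", "unmap", device])
    finally:
        # Remove the temporary snapshot whether or not the backup succeeded.
        run(["rbd", "snap", "rm", spec])
```

A call might look like `backup_rbd_image("replicapool", "csi-vol-example", "nightly", "/mnt/pv-snap", "rest:http://pi.example:8000/")`. The nested `try`/`finally` blocks are there so that a failed backup still unmounts, unmaps, and deletes the temporary snapshot. Note that an RBD snapshot is crash-consistent; for databases, a dump from the application's own backup tool is still the safer source.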
One way to handle offsite backups is Velero's CSI snapshot data movement feature, which takes a CSI snapshot, exposes it through a temporary volume, and copies the data out of the cluster to Velero's backup storage location. Check the Velero documentation on CSI snapshot data movement for details on how this works and which upload backends it supports. It could simplify your process, since the snapshot and copy steps are handled for you.
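Consider using VolSync, which can back up PVCs with Restic, combined with a GitOps tool like Flux or Argo CD so that restores happen with little manual work when you recreate the cluster. If your app or database has built-in backup and restore tools, prioritize those, as they usually give more consistent backups than a volume-level copy.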
I appreciate all the responses! Based on the suggestions, I’m leaning towards using VolSync to aid my backup and restoration process!
