I'm new to Kubernetes and I'm trying to figure out the best way to set up backups for my k3s cluster. I have a local k3s cluster running on VMs, with one master/control plane node and three worker nodes. I use Traefik as my Ingress Controller and MetalLB for virtual IPs. As I don't have centralized storage, all data is stored locally on the nodes. I opted for Longhorn as my storage solution because it's simple to set up and light on resources. I've looked at Rook/Ceph too, but it seems too complicated for my current hardware.
For backups, I need a solid disaster recovery plan that can restore either the whole cluster, just the Control Plane, or particular PersistentVolumes (PVs). I want to keep using snapshots like Longhorn offers. Initially, I thought about just relying on Longhorn's native backups, but I've heard it might not be the best strategy. I'm also uncertain about the immutability and consistency guarantees for backups stored on remote S3, along with how to manage encryption – it seems that the only feasible option is encrypting the volumes themselves. Additionally, I'm concerned about whether my database backups will be consistent; does Longhorn have any features for application-aware backups? Regarding my Control Plane, I'm planning to take etcd snapshots or just copy the k3s SQLite database.
As an alternative, I'm considering Velero, which appears to simplify the process, but I have questions:
- Should I go with File System Backups using Restic or Kopia, or use CSI support for Longhorn? The latter feels like it could lead to a messy configuration with too many dependencies, and I prefer to keep things straightforward.
- Does Velero support application-aware backups?
- I'm also worried about cluster-side encryption and ensuring S3 immutability for the backups.
I've considered using Veeam Kasten (K10), but I've seen mixed reviews. I'm aiming for a backup solution that is simple and reliable without involving any SaaS options. Any suggestions would be greatly appreciated!
2 Answers
To start, consider adopting GitOps practices with a tool like Flux or Argo CD. With your ConfigMaps, Secrets, and Deployments defined in Git, restoring the cluster configuration becomes straightforward. As for PV backups, Longhorn generally handles backups of application PVs well, including those holding Prometheus metrics.
Regarding your database concerns, are you talking about etcd, or do you also have Postgres databases? If it's the latter, some operators, like CloudNativePG (CNPG), have plugins that offer point-in-time recovery, which reduces potential data loss compared to relying on Longhorn backups for those volumes.
My setup is a bit more complex, but it works for me. I use a few similar-sized HDDs and SSDs organized into two ZFS pools, which I export over NFS to my k3s cluster via an NFS CSI driver. I don't back up the entire pool, only critical data. For individual backups, especially for Postgres, I have the operator dump backups to RustFS, and a few hours later I back up the RustFS data with sanoid/syncoid. For etcd, I simply dump backups daily to the ZFS pool. Again, I strongly recommend adopting GitOps; it might be tough now, but you'll be glad you did later!
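For a single-server k3s cluster like the one in the question, the datastore may be the default SQLite file rather than etcd. A daily dump along the lines described above could then be sketched as below. This is only a rough illustration, not a tested recovery procedure: the function name `dump_sqlite` and the retention count are made up for the example, and the database path in the usage note assumes a default k3s install. It uses SQLite's online backup API (exposed as `Connection.backup` in Python's `sqlite3` module) instead of a plain file copy, so the copy is consistent even while the database is being written to.

```python
import sqlite3
import time
from pathlib import Path


def dump_sqlite(src: Path, dest_dir: Path, keep: int = 7) -> Path:
    """Copy a live SQLite database consistently, then prune old copies.

    Hypothetical helper for illustration; `keep` is the number of
    dumps to retain (including the one just taken).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not src.is_file():
        # sqlite3.connect() would otherwise silently create an empty file.
        raise FileNotFoundError(src)

    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"state-{stamp}.db"

    source = sqlite3.connect(src)
    dest = sqlite3.connect(target)
    try:
        # Online backup: copies all pages as one consistent snapshot,
        # even if another process writes to the source meanwhile.
        source.backup(dest)
    finally:
        dest.close()
        source.close()

    # Timestamped names sort chronologically, so the oldest come first.
    dumps = sorted(dest_dir.glob("state-*.db"))
    for old in dumps[:-keep]:
        old.unlink()
    return target
```

On a default install this might be called from a daily cron job or systemd timer as `dump_sqlite(Path("/var/lib/rancher/k3s/server/db/state.db"), Path("/mnt/backup/k3s"))`, where the destination directory is a placeholder. If the cluster runs embedded etcd instead, this sketch does not apply and k3s's own etcd snapshot mechanism is the thing to use.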
