I'm managing a home Longhorn cluster and have implemented a routine to power it off and on daily. Despite my efforts to ensure a smooth startup and shutdown process for workloads relying on Longhorn, I'm still facing issues with random PVC corruption. Has anyone else dealt with this kind of problem? I'd love to hear about your experiences and any solutions you might suggest!
3 Answers
What error messages are you getting? Details about your disks, filesystems, and versions you're using could be important too. Also, don't forget to share your Longhorn and PV configurations; they could be the key to resolving the issue.
Honestly, what's the point of running Kubernetes if you're powering it off and on every day? Seems a bit counterproductive, doesn’t it?
I'm just trying to learn how these technologies work, that's all!
It might help to know how many replicas you have for your PVC. Also, are you making sure to detach the volumes safely before shutting the hosts down? A clear shutdown process can be crucial to avoid data corruption.
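For what it's worth, here is a rough Python sketch of how you could check both things (replica count and detach state) before powering off. It assumes Longhorn is installed in the `longhorn-system` namespace and that `kubectl` is on your PATH; the function names are just illustrative, and the fields read (`spec.numberOfReplicas`, `status.state`, `status.robustness`) are the ones Longhorn's Volume custom resource normally exposes, so double-check them against your version.

```python
import json
import subprocess


def fetch_volumes(namespace="longhorn-system"):
    """Return all Longhorn Volume objects as a dict (needs kubectl access).

    The namespace is an assumption; change it if Longhorn lives elsewhere.
    """
    result = subprocess.run(
        ["kubectl", "-n", namespace, "get", "volumes.longhorn.io", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def summarize_volumes(doc):
    """Map volume name -> (replica count, state, robustness)."""
    summary = {}
    for item in doc.get("items", []):
        name = item["metadata"]["name"]
        spec = item.get("spec", {})
        status = item.get("status", {})
        summary[name] = (
            spec.get("numberOfReplicas"),
            status.get("state"),
            status.get("robustness"),
        )
    return summary


def not_detached(summary):
    """Names of volumes whose state is anything other than 'detached'."""
    return sorted(
        name for name, (_, state, _) in summary.items() if state != "detached"
    )
```

Calling `not_detached(summarize_volumes(fetch_volumes()))` right before shutdown should give you an empty list; anything it returns is a volume that is still attached (or mid-transition) and worth waiting on.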
I have one replica on each node. My shutdown steps are:
1. Scale down to 0 in argocd
2. Scale all Longhorn-dependent workloads down to 0
3. Wait for all PVCs to detach
4. Cordon and drain the nodes
5. Stop the k3s service
6. Shut down the system
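In case it helps anyone reproduce this, the sequence could be scripted roughly like the Python sketch below. It is only an illustration of the steps listed, not a tested procedure: the namespace names, the workload kinds, the `drain` flags, and the helper names are all assumptions, and the detach check is passed in as a callable so you can plug in whatever test you trust.

```python
import subprocess
import time


def scale_down_commands(namespaces, kinds=("deployment", "statefulset")):
    """Build kubectl commands that scale every workload of each kind to 0.

    Namespaces are processed in order, so list the Argo CD namespace first
    (assumed here to be 'argocd') so it cannot scale things back up.
    Note: kubectl reports an error if a namespace has no object of a kind.
    """
    return [
        ["kubectl", "-n", ns, "scale", kind, "--all", "--replicas=0"]
        for ns in namespaces
        for kind in kinds
    ]


def node_commands(node):
    """Build the per-node commands: cordon, drain, stop k3s, power off."""
    return [
        ["kubectl", "cordon", node],
        # Drain flags are an assumption; adjust them for your workloads.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        ["sudo", "systemctl", "stop", "k3s"],
        ["sudo", "shutdown", "-h", "now"],
    ]


def wait_until(check, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def run_all(commands):
    """Run commands in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

The intended order would be `run_all(scale_down_commands([...]))`, then `wait_until(volumes_detached)` with your own detach check, then `run_all(node_commands(node))` on each host. One thing worth verifying on your side: stopping the k3s service does not by itself stop the containers it started (the k3s installer ships a `k3s-killall.sh` script for that), so Longhorn's own pods may still be running at the moment the machine powers off.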
I'll be offline for a few days, but I'll follow up as soon as I'm back!