I've been running a Longhorn cluster at home and have been powering it off and on every day. I've worked hard to set up a proper startup and shutdown process tailored to my workloads, but I'm still facing issues with random PVC corruption. Has anyone experienced this? Any tips or advice?
3 Answers
Honestly, why even use Kubernetes if you’re turning the cluster on and off every day? Seems counterproductive!
It sounds like you're encountering some common issues. How many replicas do you have for your PVCs? Also, are you making sure to detach the volumes properly before shutting down your hosts?
I have one replica on each node. My shutdown process is:
1. Scale Argo CD down to 0
2. Scale all Longhorn-dependent workloads down to 0
3. Wait until all PVCs are detached
4. Cordon and drain the nodes
5. Stop the k3s service
6. Finally, shut down the system.
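For anyone who wants to script steps 3 and 4, here is a rough sketch rather than a tested procedure. It assumes kubectl is on the PATH, that Longhorn is installed in the longhorn-system namespace, and that each Longhorn Volume resource reports "detached" in status.state once nothing is using it. The function names, the timeout values, and the drain flags are my own choices for illustration, not something taken from the setup described above.

```python
import json
import subprocess
import time


def undetached_volumes(volumes_json: str) -> list[str]:
    """Return the names of Longhorn volumes whose status.state is not 'detached'.

    Expects the JSON printed by:
        kubectl get volumes.longhorn.io -n longhorn-system -o json
    """
    items = json.loads(volumes_json).get("items", [])
    return [
        v["metadata"]["name"]
        for v in items
        if v.get("status", {}).get("state") != "detached"
    ]


def wait_until_detached(namespace: str = "longhorn-system",
                        timeout_s: int = 600, poll_s: int = 5) -> None:
    """Poll the cluster until every Longhorn volume is detached (step 3)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "volumes.longhorn.io", "-n", namespace, "-o", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        pending = undetached_volumes(out)
        if not pending:
            return
        print("still not detached:", ", ".join(pending))
        time.sleep(poll_s)
    raise TimeoutError("Longhorn volumes were not detached before the timeout")


def node_commands(node: str) -> list[list[str]]:
    """Build the cordon and drain commands for one node (step 4)."""
    return [
        ["kubectl", "cordon", node],
        # Assumed flags: adjust to whatever your workloads actually need.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]


def prepare_shutdown(nodes: list[str]) -> None:
    """Run steps 3 and 4; stopping k3s and powering off happen on each host."""
    wait_until_detached()
    for node in nodes:
        for cmd in node_commands(node):
            subprocess.run(cmd, check=True)
```

Calling prepare_shutdown(["node-1", "node-2"]) with your real node names would block until the volumes are detached and then drain each node. The point of raising on timeout is that the script refuses to continue to the power-off steps while a volume is still attached, which is the situation most likely to leave a replica in a bad state.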
Have you checked what error message you're getting? It could help to know the specifics about your disks and filesystems, the versions you're running, and how exactly your Longhorn and PVs are configured.
I'll be away for a few days but I promise to update you as soon as I'm back with the details.
I’m just learning the technologies, trying to figure things out!