We're dealing with a pretty old K8s cluster made up of five physical servers (one master and four worker nodes). This cluster, while labeled as a development setup, actually runs crucial applications like a password manager, Nextcloud, and helpdesk tools without any backup solutions in place. The persistent volumes (PVs) for these apps were configured using OpenEBS Hostpath, which means they are bound to the nodes where they were created.
To improve the situation, we're thinking about migrating these volumes to NFS so that data isn't lost if a node goes down. We also need to set up proper RAID (at least RAID-1) on these servers. However, we're constrained by resources: we can't afford any spare servers at the moment.
Our main goals are to:
- Migrate PVs to NFS
- Back up critical data using a tool like Velero
- Reinstall the servers one at a time with a proper RAID configuration, starting with the master node.
How should we kick things off with a system that currently doesn't have RAID-1? We're hoping to transition everything gradually while minimizing downtime for users of these internal applications. Any insights would be greatly appreciated!
3 Answers
The main risk you face is losing data because nothing is redundant. Since your pods are tied to their PVs, and those PVs are tied to specific nodes, you can't drain a node without handling its PVs first. For now, focus on moving the PVs to NFS so that no data is lost when you start reinstalling nodes. Your immediate priority should be keeping that data safe.
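To see which nodes are actually blocked, it can help to list the claims pinned to each one. Here is a minimal Python sketch, assuming your hostpath PVs carry a node affinity on the `kubernetes.io/hostname` label (worth checking on your own cluster) and that the input has the shape of `kubectl get pv -o json`. The sample PV, node, and claim names are made up for illustration.

```python
from collections import defaultdict

# Label assumed to be used for pinning local/hostpath PVs to a node.
HOSTNAME_KEY = "kubernetes.io/hostname"


def pinned_nodes(pv):
    """Return the node names a PV is pinned to through spec.nodeAffinity."""
    terms = (
        pv.get("spec", {})
        .get("nodeAffinity", {})
        .get("required", {})
        .get("nodeSelectorTerms", [])
    )
    nodes = []
    for term in terms:
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == HOSTNAME_KEY and expr.get("operator") == "In":
                nodes.extend(expr.get("values", []))
    return nodes


def claims_by_node(pv_list):
    """Group bound claims ("namespace/name") by the node their PV is pinned to."""
    result = defaultdict(list)
    for pv in pv_list.get("items", []):
        claim = pv.get("spec", {}).get("claimRef") or {}
        label = f"{claim.get('namespace', '?')}/{claim.get('name', '?')}"
        for node in pinned_nodes(pv):
            result[node].append(label)
    return dict(result)


def _local_pv(node, namespace, claim):
    """Build a tiny fake node-pinned PV object (illustration only)."""
    return {
        "spec": {
            "claimRef": {"namespace": namespace, "name": claim},
            "nodeAffinity": {"required": {"nodeSelectorTerms": [
                {"matchExpressions": [
                    {"key": HOSTNAME_KEY, "operator": "In", "values": [node]}
                ]}
            ]}},
        }
    }


# Hypothetical sample: two pinned PVs and one NFS PV without node affinity.
sample = {
    "items": [
        _local_pv("worker-1", "apps", "nextcloud-data"),
        _local_pv("worker-2", "apps", "vault-data"),
        {"spec": {"claimRef": {"namespace": "apps", "name": "helpdesk-data"},
                  "nfs": {"server": "nfs.example.internal", "path": "/export"}}},
    ]
}

print(claims_by_node(sample))
```

On a real cluster you would load the JSON from `kubectl get pv -o json` instead of the sample. Any node that still shows up in the result holds data that has to be moved before that node is drained.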
Start by taking one worker node out of the cluster and reinstalling it with proper RAID. Once it is back, make it the new master, then remove the old master, reinstall it, and rejoin it as a worker. After that, work through the remaining nodes the same way.
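The ordering can be written down as a small checklist generator. This is only a sketch of the sequence described here, not a tested procedure: the node names are placeholders, and `pinned_claims` is a hypothetical mapping from node name to the claims still stored locally on that node. It refuses to produce a plan while any node still holds local data.

```python
def reinstall_plan(master, workers, pinned_claims):
    """Return ordered reinstall steps, or raise if local data is still pinned.

    pinned_claims: dict mapping node name -> list of claims still on that node
    (hypothetical input; an empty dict means everything was migrated).
    """
    blocked = {node: claims for node, claims in pinned_claims.items() if claims}
    if blocked:
        raise ValueError(f"migrate these volumes first: {blocked}")
    if not workers:
        raise ValueError("need at least one worker to start with")

    first, *rest = workers
    steps = [
        f"{first}: drain, reinstall with RAID-1, rejoin as worker",
        f"{first}: take over the control plane from {master}",
        f"{master}: remove from cluster, reinstall with RAID-1, rejoin as worker",
    ]
    # Remaining workers are handled one at a time so capacity only drops by one node.
    steps += [f"{node}: drain, reinstall with RAID-1, rejoin as worker" for node in rest]
    return steps


# Placeholder node names for a one-master, four-worker cluster.
for number, step in enumerate(
    reinstall_plan("master", ["worker-1", "worker-2", "worker-3", "worker-4"], {}),
    start=1,
):
    print(number, step)
```

The check at the top is the important part: the plan only makes sense once no node is the sole holder of application data.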
Ideally, a fresh cluster using a solution like Talos could solve a lot of issues in the long run, but you first need to move the PVs to avoid any pod disruptions before you can drain nodes.
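For the PV move itself, a statically provisioned NFS volume is one possible target. Below is a minimal Python sketch that builds such a PersistentVolume manifest and prints it as JSON, which `kubectl apply -f` accepts. The volume name, NFS server address, export path, and size are all assumptions for illustration; copying the existing data onto the export is a separate step not shown here.

```python
import json


def nfs_pv(name, server, path, size, access_mode="ReadWriteMany"):
    """Build a PersistentVolume manifest backed by an NFS export."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": [access_mode],
            # Retain keeps the data on the export if the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": server, "path": path},
        },
    }


# Hypothetical values: replace with your own server, export path, and size.
manifest = nfs_pv(
    name="nextcloud-data-nfs",
    server="nfs.example.internal",
    path="/export/nextcloud",
    size="50Gi",
)
print(json.dumps(manifest, indent=2))
```

Unlike a hostpath volume, this PV has no node affinity, so a pod using it can be rescheduled to any node while its original node is drained and reinstalled.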
Your cluster situation looks tough! Just remember that with K8s, storage has to be set up in a way that allows for high availability (HA). You might not need RAID-1 on all drives, but it helps for boot drives. Rather than sticking to just NFS, I recommend looking into solutions like Rook Ceph or Longhorn, which distribute data across multiple nodes and offer better resilience and performance.