How to Fix My Kubernetes HA Cluster After a Reboot?

Asked By CuriousCoder92 On

Hey folks, I'm in the process of setting up a High Availability (HA) Kubernetes cluster and I ran into some trouble after rebooting my PC. Initially, I ran `kubeadm init` on my first master node with the command `kubeadm init --control-plane-endpoint "LOAD_BALANCER_IP:6443" --upload-certs --pod-network-cidr=192.168.0.0/16`. Everything went smoothly during the setup, but post-reboot, I can't connect via `kubectl`. I'm getting an error indicating that it couldn't get the API group list, specifically it times out trying to reach the HAProxy VM at `192.168.122.118:6443`.
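To narrow down where the timeout happens, it can help to separate "the load balancer is unreachable" from "the API servers behind it are down". A minimal triage sketch, assuming the HAProxy address from the question and a placeholder `<master1-ip>` for a control-plane node (substitute your own):

```shell
# Hypothetical addresses taken from the question; adjust to your environment.
LB=192.168.122.118

# Step 1: is the load balancer port reachable at all?
nc -vz -w 3 "$LB" 6443

# Step 2: bypass the LB and probe a control-plane node's API server directly.
# A timeout here too means the problem is on the masters, not HAProxy.
curl -k --connect-timeout 3 "https://<master1-ip>:6443/healthz"
```

If step 2 also times out, the HAProxy VM is likely fine and the failure is in the control plane itself, which matches the `CrashLoopBackOff` you found next.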

Upon investigation, I've found that the `kube-apiserver` pods are in a `CrashLoopBackOff` state. The logs show that it can't connect to `etcd` at `127.0.0.1:2379`. Checking the `etcd` health with `ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 endpoint health` confirms that it's unhealthy or timing out. Now I'm looking for advice on how to properly configure `etcd` for reboot resilience in a kubeadm HA setup, how to recover from this situation, and whether there's a safe way to restart `etcd` and `kube-apiserver` without losing data. Any help would be greatly appreciated! My setup includes 3 control plane nodes (master1-3) and 2 worker nodes, running Kubernetes v1.30.11 on Ubuntu 24.04.
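One thing worth double-checking: kubeadm's etcd requires TLS client certificates, so the bare `etcdctl` health check shown above can report a failure even when etcd itself is healthy. A sketch using the kubeadm default certificate paths on a control-plane node:

```shell
# kubeadm places etcd's certificates under /etc/kubernetes/pki/etcd/ by
# default; without them etcdctl cannot complete the TLS handshake.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
```

If the check still times out with the certificates supplied, the problem is genuinely inside etcd (for example, loss of quorum) rather than a TLS artifact.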

1 Answer

Answered By TechGuru88 On

First off, have you checked the status of your `etcd` pods? In a kubeadm setup they're static pods managed directly by the kubelet, so even if the containers show as running, their logs are the best clue to why they can't serve requests properly.
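Since the API server is down, `kubectl` won't help here; you have to go through the container runtime instead. A sketch of that inspection, run as root on `master1` (the `<etcd-container-id>` placeholder must be substituted with an ID from the first command):

```shell
# List all containers, including exited ones, for the control-plane pods.
crictl ps -a | grep -E 'etcd|kube-apiserver'

# Tail the etcd container's logs (substitute the container ID from above).
crictl logs --tail 50 <etcd-container-id>

# The kubelet journal often records why a static pod keeps restarting.
journalctl -u kubelet --since "10 min ago" | grep -i etcd
```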

CuriousCoder92 -

Yeah, I took a look at the `etcd` state on `master1`. It shows the container is running, but health checks fail. It reports that it can't reach quorum or establish leadership.
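For context on that quorum error: etcd needs a majority of voting members, i.e. floor(n/2) + 1, before it can elect a leader, so a single member that comes back after a reboot cannot serve on its own. A quick sketch of the arithmetic for your 3-node control plane:

```shell
# etcd quorum is floor(n/2) + 1 voting members.
n=3
quorum=$(( n / 2 + 1 ))
echo "A ${n}-member cluster needs ${quorum} healthy members for quorum."
```

So with 3 members, at least 2 must be up and able to reach each other on the peer port (2380). If only `master1` rebooted, it's worth confirming from `master1` that the etcd peers on the other two masters are reachable before assuming data loss.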
