Best Practices for Setting Up a Highly Available Kubernetes Cluster Across Two Data Centers?

Asked By TechWhiz42 On

Hey everyone! I'm setting up a production-grade, highly available Kubernetes cluster on-premises that spans two physical data centers. I've got hands-on experience with Kubernetes in the cloud, but my upper management is pushing for a specific plan that I'm not totally on board with. They want me to run both the Master and Worker roles on a single physical server in each data center, which gives us just two nodes for now. I'm concerned about quorum and overall reliability.

Here's what I'm working with:
- Two big bare metal servers (one in each DC)
- A dedicated 100 Gbps link connecting the two data centers
- In about 7 months, we're expecting to add a third data center and server
- The goal is to deploy an internal AI platform using Helm charts

I'm looking for some guidance on how to design for high availability right from the start with these resources:
1. What's the best approach to establishing HA with only two nodes?
2. How do I handle etcd quorum until the third node is in play? Could an Active-Passive setup be worth considering?
3. What are your thoughts on networking, load balancing, and the choice between overlay vs underlay for pod traffic?
4. Any tips for managing secrets safely for pulling Helm charts?
5. What tools or stacks do you recommend for bare-metal automation?

I'd really appreciate any insights you all might have before I present this to my team tomorrow!

4 Answers

Answered By ServerSage88 On

I agree with CloudHunter. Running active-passive with only two servers is risky: if the link between the sites goes down, neither side has quorum and you're stuck. I'd treat each data center as a separate cluster for now and use a primary/secondary model. If you need more reliability, you'll really want additional servers at each site.
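To make the network-split concern concrete, here is a minimal sketch of the majority rule that etcd-style consensus relies on. The function names are made up for illustration and this does not talk to a real cluster; it only counts members on each side of a split.

```python
from __future__ import annotations


def side_has_quorum(side_members: int, cluster_size: int) -> bool:
    """A side can keep accepting writes only if it holds a strict majority."""
    return side_members >= cluster_size // 2 + 1


def surviving_sides(partition: list[int]) -> list[int]:
    """Return the indexes of the sides that still hold quorum after a split.

    `partition` lists how many cluster members end up on each side.
    """
    total = sum(partition)
    return [i for i, n in enumerate(partition) if side_has_quorum(n, total)]


if __name__ == "__main__":
    # Two data centers, one member each, link between them down.
    print(surviving_sides([1, 1]))  # prints []
    # Three members, one data center isolated from the other two.
    print(surviving_sides([2, 1]))  # prints [0]
```

With one member per site, a link failure leaves no side with a majority, which is why a stretched two-node cluster stalls where two independent clusters would each keep running.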

TechWhiz42 -

Thanks for the insight! I was trying to convince my manager that we need more machines for a proper HA setup, but it seems upper management isn’t budging on the current plan.

Answered By K8sGuru88 On

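You definitely need at least three servers for a high-availability control plane. etcd only accepts writes while a majority of its members are up, and with two members the majority is both of them, so losing either server stops the control plane. With only two, you're not really achieving HA. If management insists on using these servers, maybe consider something simple like Kind (Kubernetes in Docker) on VMs instead.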
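The arithmetic behind the three-server minimum is short enough to write out. This is a rough sketch of the standard majority-quorum formula; the helper names are just for illustration.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: quorum={quorum(n)}, "
              f"tolerates {failures_tolerated(n)} failure(s)")
```

Two members tolerate zero failures, the same as one member, while three members tolerate one. Going from three to four adds nothing either, which is why odd member counts are the usual recommendation.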

Answered By NetWiz82 On

The latency between your control plane nodes is also a huge factor! Ideally, you want it under 30 ms for etcd operations. If you're really committed to a multi-regional setup, something like Cilium's Cluster Mesh could be beneficial, but keep in mind that it has its complexities. Let me know if you want to discuss it further!
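If you want to sanity-check how latency feeds into etcd's timing, here is a rough sketch. It assumes the commonly cited etcd tuning guidance (heartbeat interval around 0.5 to 1.5 times the round-trip time, election timeout at least 10 times the round-trip time, defaults of 100 ms and 1000 ms, and an upper limit of 50000 ms on the election timeout). The function name and the choice to never go below the defaults are my own assumptions, so check the etcd tuning docs for your version before relying on the numbers.

```python
DEFAULT_HEARTBEAT_MS = 100      # assumed etcd default for --heartbeat-interval
DEFAULT_ELECTION_MS = 1000      # assumed etcd default for --election-timeout
MAX_ELECTION_MS = 50000         # assumed upper limit for --election-timeout


def suggest_etcd_timing(rtt_ms: float) -> dict:
    """Suggest heartbeat/election values from a measured peer round-trip time.

    Hypothetical helper: uses 1.5x RTT for the heartbeat and 10x the
    heartbeat for the election timeout, never dropping below the defaults.
    """
    heartbeat = max(DEFAULT_HEARTBEAT_MS, round(rtt_ms * 1.5))
    election = min(MAX_ELECTION_MS, max(DEFAULT_ELECTION_MS, 10 * heartbeat))
    return {"heartbeat_interval_ms": heartbeat, "election_timeout_ms": election}


if __name__ == "__main__":
    for rtt in (2, 30, 200):
        print(rtt, suggest_etcd_timing(rtt))
```

Under these assumptions, a 30 ms round trip still fits the default timings, while something like 200 ms would call for raising both values on every member.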

Answered By CloudHunter99 On

Honestly, it might be more efficient to just consolidate both servers into one data center. Trying to maintain high availability with only two nodes spread out like this isn't ideal. True HA requires a minimum of three nodes for control plane redundancy, and network latency between the nodes is crucial.
