Setting Up Highly Available Kubernetes on Bare Metal: Need Your Insights!

Asked By CloudyHorizon68 On

Hey everyone! I'm gearing up for an interview where I'll be discussing the setup of highly available Kubernetes clusters on bare metal (no cloud services involved). The organization has a strong infrastructure background but is relatively new to Kubernetes, and they're planning to implement an AI orchestration tool on top of the cluster.

I'd love to hear from anyone who has experience with this kind of setup. Here are some specific points I'm curious about:

* What's your approach for high availability in terms of etcd, multi-master configurations, and load balancing?
* What networking and persistent storage solutions do you rely on in an on-prem Kubernetes setup?
* Are there any issues you've encountered while automating deployments with tools like Terraform or Ansible?
* How do you handle monitoring and logging on bare metal (e.g., using Prometheus, ELK stack)?
* What persistent storage solutions work well for Kubernetes on bare metal (like Rook, Ceph, NFS, or OpenEBS)?
* Any recommendations (or things to avoid) regarding tools for automating deployments?
* If you have experience connecting two different sites or clusters, how do you approach that?

Thanks in advance for any advice or insights you can share!

6 Answers

Answered By K8sMastery2023 On

For high availability, I recommend at least three control plane nodes, with kube-vip providing your virtual IPs. Run etcd with three members so the cluster keeps quorum if one member fails. Calico is a solid choice for the CNI, and MetalLB makes load balancing a breeze; installing it via Helm is straightforward. Remember to set up a DNS entry for the kube-vip address as well.
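To make the quorum point concrete, here is a minimal sketch of the majority arithmetic behind the three-member recommendation. The function names are illustrative only and are not part of etcd or any Kubernetes tooling.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare a few cluster sizes side by side.
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum={quorum(n)}, "
              f"tolerated failures={fault_tolerance(n)}")
```

Three members tolerate one failure, while four members still tolerate only one, which is why odd member counts are the usual choice.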

TechieTom -

Yeah, that's what worked best for us too! We're considering Talos in the future, though; we just haven't had the time to test it out.

CuriousDev -

That’s exactly what I did, worked like a charm!

CargoClyde -

Calico has been great for us as well. We faced some challenges with Flannel compatibility, so I'd definitely stick with Calico.

Answered By JohnDoeDev On

My setup is on GitHub, where you can see my entire process. I manage my bare metal with Proxmox, using Ansible to bootstrap and RKE2 for the Kubernetes setup. Be careful with automation tools: Terraform can work, but it often has to be applied twice to fully bootstrap a cluster. For storage, Ceph and NFS seem to work well, though I've had issues with NFS during restarts in production environments.

Answered By DeploymentDude On

Talos with at least three control plane nodes works well. If you go the Kubespray route, be aware that it has its share of pitfalls, and knowing them in advance helps. I'm a fan of Calico for networking, and for monitoring I recommend the Grafana Loki stack. Logging can be handled with Fluentd or Fluent Bit. For persistent storage, try to use NFS if you can, as it's reliable and straightforward.

Answered By ArchitectAnna On

Check out reference architectures for the tools you're using. They usually provide guidance that simplifies decision-making. Knowing the business needs, such as scale and SLAs, is crucial because they shape your setup. It's often easier to leverage existing infrastructure like load balancers and networking instead of trying to do everything within Kubernetes. Make sure your control plane has three nodes and is isolated to avoid single points of failure, but remember that those nodes can be virtualized, depending on your setup.
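As a rough illustration of how an SLA figure turns into a design constraint, the sketch below converts an availability percentage into a downtime budget. The function name and the 365-day default are assumptions made for this example, not something taken from a specific SLA.

```python
def downtime_budget_minutes(sla_percent: float, period_days: float = 365.0) -> float:
    """Minutes of allowed downtime in a period for a given availability target."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    # Print the yearly budget for a few common availability targets.
    for target in (99.0, 99.9, 99.99):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

The tighter the target, the less room there is for manual recovery, which is one reason the SLA should be known before choosing between simpler and more redundant setups.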

Answered By SysAdminSavvy On

When building your cluster, I suggest using three control plane (master) nodes; consider removing the control-plane taint, called the master taint in older releases, so they can also act as workers. For networking, try Cilium, and for storage, Rook or Longhorn. I find that local-path storage works fine for most use cases, especially for non-database applications. Avoid using Terraform for the bare metal aspect unless you're comfortable with the challenges it may bring.

DavidOps -

Local-path can get a bit tricky with databases though, so be cautious.

Answered By CheerfulCoder On

We used to go with Cluster API and a bare metal operator, but eventually wrote our own solution around kubeadm commands. For networking, make sure to explore Calico, as it has high compatibility. A potential pitfall with Terraform is managing the bare metal setup itself, so weigh your options carefully when planning your deployment strategy to avoid future headaches.
