Hey everyone! I recently took on a new role where I'm in charge of our Kubernetes clusters, and I'm facing some major challenges because of the scale of our operations and the mix of providers we use. A quick overview: we have about four on-prem bare-metal clusters that are poorly maintained and still running an outdated Kubernetes version, plus ten AKS clusters on Azure and a few small EKS clusters on AWS. My team is pretty small (only four of us), and we spend only half our time on Kubernetes tasks. The main challenges are maintaining Terraform modules, keeping the clusters updated, rotating certificates, and providing everyday support for different use cases. I'm exploring tools like Kubespray and RKE (or RKE2) to simplify and centralize some of these responsibilities. I have a few questions: Has anyone dealt with a similar setup? What have your experiences been with RKE or RKE2 at scale? Is Rancher effective for managing multiple clusters across different cloud providers? And are there any lessons or pitfalls I should be aware of? Thanks for any insights!
5 Answers
While I don't have direct experience with RKE2 at that scale, I've found it crucial to standardize workflows across clouds and on-prem environments. Consider a unified lifecycle management approach: many products use the Cluster API, but watch out for the assumptions it makes about your environments. We use a tool called Omni, which can manage various environments while keeping everything API-driven, and that makes the whole fleet easier to handle.
It sounds like your team might be a bit understaffed for such a large-scale operation. Managing thousands of nodes usually requires a few more dedicated Kubernetes engineers to handle the workload smoothly.
I've felt your pain too! The challenges you listed sound so familiar from my days as an SRE. One approach that worked for me was running the Kubernetes control plane as pods in a management cluster, which really flattens the differences between cloud providers and simplifies upgrades and management. Have you thought about using the Cluster API for provisioning nodes? It's super helpful for managing nodes across different infrastructures.
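Whatever tooling you end up with, it helps to have one provider-agnostic view of which clusters are falling behind on upgrades. Here is a rough Python sketch of that idea, not tied to any of the tools mentioned in this thread. The `Cluster` type, the inventory entries, and the version numbers are all made up for illustration; in practice you would fill the inventory from whatever source of truth you already have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    provider: str  # e.g. "on-prem", "aks", "eks"
    version: str   # Kubernetes version string such as "v1.27.4"


def minor_of(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version like 'v1.27.4' or '1.27'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def clusters_behind(clusters, target, max_skew=1):
    """List clusters more than `max_skew` minor versions behind `target`."""
    t_major, t_minor = minor_of(target)
    behind = []
    for c in clusters:
        major, minor = minor_of(c.version)
        if major < t_major or (major == t_major and t_minor - minor > max_skew):
            behind.append(c)
    return behind


# Hypothetical inventory; names and versions are invented for this example.
INVENTORY = [
    Cluster("dc1-bare-metal", "on-prem", "v1.24.9"),
    Cluster("aks-prod-01", "aks", "v1.29.2"),
    Cluster("eks-dev", "eks", "v1.30.1"),
]

if __name__ == "__main__":
    for c in clusters_behind(INVENTORY, target="v1.30.0"):
        print(f"{c.name} ({c.provider}) is on {c.version}")
```

The `max_skew` parameter is just a policy knob I picked for the sketch: it sets how many minor versions a cluster may lag before it gets flagged, so you can tighten it as the fleet catches up.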
That sounds like an interesting approach! I'll definitely look into the Cluster API for node provisioning.
Have you checked out solutions like Anthos or Azure Arc? They seem to handle multi-cloud scenarios better while also offering mature management tools for certificates. Just be aware of potential pricing shocks depending on your setup.
That does sound promising, but the costs could be a concern for our budget!
Seriously consider k0rdent! It's designed to address many of the concerns you've mentioned and uses plain YAML files for configuration. Plus, it's open source, so you can try it without worrying about licensing costs.
True, but let's be realistic—there aren't many with a decade of Kubernetes experience since it's still such a young technology.