I'm managing a Kubernetes cluster that will have roughly 1,000 pods per node and an expected total of around 10,000 pods. I'm looking for advice on how to properly size the control plane, including the number of nodes, etcd resources, and API server replicas, to maintain good responsiveness and availability. Any tips or best practices?
6 Answers
With that many pods per node, you might want to think about having more nodes in your cluster. Doubling the nodes to around 20 can really help with failover and overall stability. Also, keep in mind that adding more nodes does impact etcd memory usage, so plan accordingly.
High-density clusters seem great on paper, but they can become problematic depending on the workload type. If apps are spiky or memory-hungry, you'll need to leave some buffer for when nodes go down. Plus, network and storage I/O can become bottlenecks. High tenant density can also complicate microsegmentation. In the long run, I find smaller, less dense clusters easier to handle.
Honestly, a 10-node setup with 1,000 pods each might exhaust your local resources pretty quickly. I’d recommend scaling up to at least a 50-node cluster with 200 pods each before diving deeper into control plane requirements.
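To make the node-count trade-off concrete, here is a rough sketch of the arithmetic for the layouts mentioned in this thread (10, 20, and 50 nodes for about 10,000 pods). It assumes pods spread evenly and that one node fails at a time; the function name `pods_per_node` is only illustrative.

```python
def pods_per_node(total_pods: int, nodes: int, failed_nodes: int = 0) -> int:
    """Pods each surviving node must hold if pods spread evenly (rounded up)."""
    surviving = nodes - failed_nodes
    if surviving <= 0:
        raise ValueError("at least one node must survive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_pods // surviving)


TOTAL_PODS = 10_000  # figure from the question

for nodes in (10, 20, 50):
    normal = pods_per_node(TOTAL_PODS, nodes)
    degraded = pods_per_node(TOTAL_PODS, nodes, failed_nodes=1)
    print(f"{nodes} nodes: {normal} pods/node, {degraded} after losing one node")
```

With only 10 nodes, a single failure pushes over a hundred extra pods onto every survivor, while at 50 nodes the extra load per node is only a handful of pods.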
I thought Kubernetes generally recommends a max of 110 pods per node, even though that's not a strict limit. Are you planning to use a specific version of Kubernetes that supports higher pod densities? Or will this be on a cloud provider?
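For context, the 110 figure is the kubelet's default `maxPods` value, and the Kubernetes documentation on large clusters lists it alongside limits of 5,000 nodes and 150,000 total pods. A small sketch that checks a plan against those documented numbers, assuming they still apply to your version (the helper name `check_plan` is hypothetical):

```python
# Limits as listed in the Kubernetes "Considerations for large clusters" docs.
# Assumption: they are unchanged for the version you run.
MAX_PODS_PER_NODE = 110
MAX_NODES = 5_000
MAX_TOTAL_PODS = 150_000


def check_plan(nodes: int, pods_per_node: int) -> list[str]:
    """Return a warning for each documented limit the plan exceeds."""
    warnings = []
    if pods_per_node > MAX_PODS_PER_NODE:
        warnings.append(
            f"{pods_per_node} pods/node exceeds the documented {MAX_PODS_PER_NODE}"
        )
    if nodes > MAX_NODES:
        warnings.append(f"{nodes} nodes exceeds the documented {MAX_NODES}")
    total = nodes * pods_per_node
    if total > MAX_TOTAL_PODS:
        warnings.append(f"{total} total pods exceeds the documented {MAX_TOTAL_PODS}")
    return warnings


# The layout from the question: 10 nodes with 1,000 pods each.
for warning in check_plan(nodes=10, pods_per_node=1000):
    print(warning)
```

Going above 110 means raising `maxPods` in the kubelet configuration and making sure the pod CIDR per node has enough addresses, so a warning here is a prompt to check those, not a hard stop.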
Everything depends on how busy your cluster gets! Are you using any monitoring tools, like Alloy, to track API server performance? If you’re running a lot of operators and generating numerous events, factor that in, since both add load on the API server and etcd. For reference, our biggest cluster has 13,000 pods across 70 nodes, and we're doing fine with 3 control plane nodes (8 CPUs and 30 GB RAM each). Just make sure to isolate etcd on separate disks.
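If you don't have a monitoring stack yet, one rough way to look at API server latency is to dump its Prometheus metrics (for example with `kubectl get --raw /metrics`) and average the `apiserver_request_duration_seconds` histogram per verb. The sketch below only parses text; the sample values are made up for illustration and the function name is hypothetical.

```python
import re
from collections import defaultdict

METRIC = "apiserver_request_duration_seconds"
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
VERB_RE = re.compile(r'verb="([^"]*)"')


def mean_latency_by_verb(metrics_text: str) -> dict[str, float]:
    """Average request latency in seconds per verb, from _sum and _count series."""
    sums = defaultdict(float)
    counts = defaultdict(float)
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips comments, blank lines, and unlabeled series
        name, labels, value = match.groups()
        verb = VERB_RE.search(labels)
        if verb is None:
            continue
        if name == METRIC + "_sum":
            sums[verb.group(1)] += float(value)
        elif name == METRIC + "_count":
            counts[verb.group(1)] += float(value)
    return {v: sums[v] / counts[v] for v in counts if counts[v] > 0}


# Made-up sample in Prometheus text format, not real cluster output.
SAMPLE = """\
# TYPE apiserver_request_duration_seconds histogram
apiserver_request_duration_seconds_sum{resource="pods",verb="LIST"} 12.5
apiserver_request_duration_seconds_count{resource="pods",verb="LIST"} 50
apiserver_request_duration_seconds_sum{resource="nodes",verb="LIST"} 2.5
apiserver_request_duration_seconds_count{resource="nodes",verb="LIST"} 10
apiserver_request_duration_seconds_sum{resource="pods",verb="GET"} 1.0
apiserver_request_duration_seconds_count{resource="pods",verb="GET"} 100
"""

for verb, seconds in sorted(mean_latency_by_verb(SAMPLE).items()):
    print(f"{verb}: {seconds:.3f}s average")
```

Averages hide tail latency, so treat this as a quick sanity check; percentiles from the histogram buckets in a proper dashboard are the better long-term signal.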
It really varies based on your cluster's activity level. Monitor your control plane nodes closely to see whether they need to be scaled further. Maybe try testing things out in a smaller dev environment first to see how the control plane holds up. Running a half-size setup could give you useful insights.