I'm curious if anyone out there has successfully run close to 1,000 pods on a single Kubernetes node. If you have, I'd love to hear about the specific tuning you did to get there: CNI settings, iptables, disk IOPS, kernel parameters, and pod CIDR sizing. My colleague Paco and I recently talked about this at KubeCon, and I'm eager to dive deeper into the topic. Insights on the performance bottlenecks and tuning strategies for that kind of pod density would be greatly appreciated!
5 Answers
We tried going for 1,000 pods initially, but it didn't work out. We ended up virtualizing our bare metal nodes with Proxmox and now run 4 virtual nodes per host. It cuts down on issues during updates since we can update them individually.
I’m interested in your architecture—how did you manage the transition?
We're on EKS, and I doubt we could hit those numbers given the AWS VPC CNI's IP address limits. The highest I've seen mentioned for the default configuration is around 110 pods per node, and even that feels restrictive.
What’s the highest you’ve seen supported there, though? Are there known workarounds to push that limit?
I've read some docs about tweaking it, but they don't clearly define a max limit; it's frustrating!
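From what I've pieced together, the usual levers are prefix delegation on the VPC CNI plus recomputing the kubelet max-pods value. A rough sketch based on the AWS docs (the instance type, CNI version, and script path here are placeholders, so verify against your own setup):

```sh
# Enable prefix delegation so each ENI hands out /28 prefixes instead of
# individual secondary IPs (Nitro-based instance types only).
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true

# Recompute the node's pod limit; the amazon-eks-ami repo ships a helper
# script for this (the path may have moved in newer repo layouts).
curl -sO https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
bash max-pods-calculator.sh \
  --instance-type m5.8xlarge \
  --cni-version 1.12.0 \
  --cni-prefix-delegation-enabled
# Feed the result to the kubelet via --max-pods (or maxPods in the
# node group's kubelet configuration).
```

Even with prefix delegation, the guidance I've seen still recommends capping at 110 pods for instances with fewer than 30 vCPUs and 250 otherwise, so 1,000 on a single EKS node still looks out of reach.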
I've personally run around 500 pods on a node. Under heavy outgoing traffic we start seeing packet loss and CPU spikes, and I've traced it primarily to netfilter in the Linux kernel rather than to the CNI itself.
That’s interesting! Would love to hear more about how you monitored it!
Have you tried tuning the netfilter settings? I’ve heard that could help mitigate those spikes.
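The knobs I've seen suggested are mostly conntrack and neighbour-cache related; something along these lines (illustrative values only, the right numbers depend on RAM and traffic patterns):

```sh
# Check dmesg for "nf_conntrack: table full, dropping packet" first;
# if that's the symptom, these are the usual suspects.
cat <<'EOF' | sudo tee /etc/sysctl.d/90-k8s-netfilter.conf
# Bigger conntrack table (note: kube-proxy can also manage this through
# its own conntrack settings, so keep the two in agreement).
net.netfilter.nf_conntrack_max = 1048576
# Shorter established-flow timeout so dead entries free slots sooner.
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# Neighbour (ARP) cache thresholds; hundreds of pod veths blow past the defaults.
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 16384
# Larger input backlog for traffic bursts.
net.core.netdev_max_backlog = 16384
EOF
sudo sysctl --system
```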
Running that many pods on a single node seems risky to me. The Kubernetes large-cluster guidance recommends no more than 110 pods per node, largely because the kubelet struggles when a node is overloaded with pods. If that node goes down, all of those pods have to be rescheduled at once, which can really strain the control plane. Managing network, CPU, and memory resources also becomes a nightmare at that scale.
You're probably right, but I've seen people manage 500+ successfully in cloud environments. What configurations do you recommend for smaller scales?
I get your point, but have any of you successfully run 600 pods? It seems manageable with the right setup.
We've managed to get to around 600 pods per node, but hitting 1,000 has been tough. We're currently waiting on cAdvisor improvements that should reduce the kubelet's CPU usage at high pod counts. We run Calico with per-node IPAM blocks sized for that pod count, and kube-proxy in IPVS mode. Our disk I/O sits on NVMe drives, which might be overkill, but it's been beneficial, especially with high pod churn. We also switched from Fluentd to Vector for logging, which was a game changer!
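For the people asking about the configuration side, it's roughly this shape, though treat it as a sketch rather than our exact manifests (the pool CIDR, block size, and pod count below are illustrative):

```sh
# 1) kubelet: raise the per-node pod limit (the default is 110).
#    In the node's KubeletConfiguration (e.g. /var/lib/kubelet/config.yaml):
#        maxPods: 600
#    then restart the kubelet. Make sure the node's pod CIDR actually has
#    that many addresses (cluster-wide via kube-controller-manager's
#    --node-cidr-mask-size if you rely on per-node pod CIDRs).

# 2) Calico: use a larger per-node IPAM block than the /26 default.
#    blockSize can't be changed on an existing pool, so add a new pool.
cat <<'EOF' | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: dense-node-pool
spec:
  cidr: 10.48.0.0/14      # illustrative range
  blockSize: 22           # 1024 addresses per node block
  natOutgoing: true
EOF

# 3) kube-proxy: switch to IPVS mode in the KubeProxyConfiguration
#    (the kube-proxy ConfigMap on kubeadm-style clusters):
#        mode: "ipvs"
kubectl -n kube-system edit configmap kube-proxy
```

The IPVS switch matters most once you also have a lot of Services, since iptables-mode rule chains grow with the Service count.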
That's impressive! I'm curious, did you implement any specific sysctl tweaks to get those numbers?
Vector sounds interesting! I wasn’t aware it outperforms Fluentd. What made you decide to switch?
That sounds like a practical setup! Are you still at around 600 pods per node, or have you pushed past that since?