Has Anyone Managed to Run Nearly 1,000 Pods Per Node in Kubernetes?

Asked By TechnoWizard42 On

I'm curious whether anyone out there has successfully run close to 1,000 pods on a single Kubernetes node. If you have, I'd love to hear about the specific tuning involved: CNI choice and settings, iptables vs. IPVS, disk IOPS, kernel parameters, and pod CIDR sizing. My colleague Paco and I talked about this recently at KubeCon, and I'm eager to dig deeper. Insights on performance bottlenecks and tuning strategies for large clusters would be greatly appreciated!
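For reference, the sort of baseline I'm picturing is roughly the sketch below; the /21 mask is just an example, anything that leaves comfortable headroom over 1,000 IPs per node would do:

    # kubelet: raise the per-node pod cap (the default is 110)
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    maxPods: 1000

    # kube-controller-manager: hand each node a pod CIDR big enough to match,
    # e.g. --node-cidr-mask-size=21 (2,048 addresses per node)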

5 Answers

Answered By BareMetalBuff On

We tried going for 1,000 pods initially, but it didn't work out. We ended up virtualizing our bare-metal nodes with Proxmox and now run 4 virtual nodes per host. It also cuts down on disruption during updates, since we can update the nodes individually.

KubeConParticipant -

That sounds like a practical solution! How many pods are you running per node now?

CloudSkeptic -

I’m interested in your architecture—how did you manage the transition?

Answered By AWSFanatic On

Using EKS, I doubt we could hit those numbers given the AWS VPC CNI's limits. The 110 figure people usually quote is just the Kubernetes default max-pods; on EKS the effective ceiling is computed from the instance type's ENI and IP-address limits, and either way it feels restrictive.

CloudSkeptic -

What’s the highest you’ve seen supported there, though? Are there known workarounds to push that limit?

ECS_Explorer -

I've read some docs about tweaking it, but they don't clearly define a max limit; it's frustrating!
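The knobs I found look roughly like this; the cluster name and the 250 figure are placeholders from my notes, not a tested maximum:

    # Enable prefix delegation on the AWS VPC CNI so each ENI IP slot
    # hands out a /28 prefix, raising the per-node IP ceiling
    kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true

    # On self-managed node groups, override the computed max-pods at bootstrap
    /etc/eks/bootstrap.sh my-cluster \
      --use-max-pods false \
      --kubelet-extra-args '--max-pods=250'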

Answered By OptimisticSysAdmin On

I've personally run around 500 pods on a node. Under heavy outgoing traffic we start seeing packet loss and CPU spikes. I found that the issue sits primarily in netfilter in the Linux kernel rather than in the CNI itself.

BareMetalBuff -

That’s interesting! Would love to hear more about how you monitored it!

DataDynamo99 -

Have you tried tuning the netfilter settings? I’ve heard that could help mitigate those spikes.
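Something like this is where I'd start; the numbers are illustrative starting points meant to scale with pod count, not benchmarked values:

    # /etc/sysctl.d/90-k8s-density.conf
    net.netfilter.nf_conntrack_max = 1048576     # connection-tracking table size
    net.ipv4.neigh.default.gc_thresh1 = 4096     # neighbor (ARP) table thresholds;
    net.ipv4.neigh.default.gc_thresh2 = 8192     # expect at least one entry per pod veth
    net.ipv4.neigh.default.gc_thresh3 = 16384
    net.core.netdev_max_backlog = 16384          # deeper backlog for bursty egress
    fs.inotify.max_user_instances = 8192         # kubelet and log agents watch many files
    fs.inotify.max_user_watches = 1048576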

Answered By CloudSkeptic On

Running that many pods on a single node seems risky to me. The Kubernetes documentation's tested limit is 110 pods per node, precisely because kubelet and the container runtime degrade when a node is overloaded with pods. If the node goes down, all of those pods have to be rescheduled at once, which can seriously strain the control plane. Managing network, CPU, and memory resources also becomes a nightmare at that density.

DataDynamo99 -

You're probably right, but I've seen people manage 500+ successfully in cloud environments. What configurations do you recommend for smaller scales?

KubeConParticipant -

I get your point, but have any of you successfully run 600 pods? It seems manageable with the right setup.

Answered By DataDynamo99 On

We've managed around 600 pods per node, but hitting 1,000 has been tough. We're currently waiting on cAdvisor changes that should reduce the kubelet's CPU usage at high pod counts. We run Calico with properly sized subnets and kube-proxy in IPVS mode. Our disk I/O is backed by NVMe drives, which might be overkill, but it's paid off with high pod churn. We also switched from Fluentd to Vector for logging, which was a game changer!
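Roughly, the networking side looks like the sketch below; the pool CIDR and block size are representative, not our exact values (Calico can also allocate multiple blocks per node, so a bigger blockSize mainly cuts allocation churn):

    # kube-proxy switched to IPVS mode
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"
    ---
    # Calico IPPool sized so a single node can hold ~1,000 pod IPs
    # (blockSize 21 = 2,048 addresses; Calico's default is 26 = 64)
    apiVersion: projectcalico.org/v3
    kind: IPPool
    metadata:
      name: high-density-pool
    spec:
      cidr: 10.32.0.0/12
      blockSize: 21
      natOutgoing: true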

CNI_Expert22 -

That's impressive! I'm curious, did you implement any specific sysctl tweaks to get those numbers?

CalicoGeek77 -

Vector sounds interesting! I wasn’t aware it outperforms Fluentd. What made you decide to switch?

