I've set up Karpenter on my EKS cluster, which mainly runs BestEffort pods (those without resource requests or limits). Initially, everything functioned well with Karpenter automatically provisioning and terminating nodes. However, I've started encountering scheduling issues.
Here's the situation: Karpenter schedules the pods successfully, but after a while some of them get stuck in the ContainerCreating state. When I check the nodes, their CPU usage is alarmingly high (around 99%). I suspect this is CPU and memory pressure caused by over-scheduling: since my BestEffort pods define no resource requests or limits, Karpenter underestimates how much capacity they actually need.
I've attempted a couple of solutions:
1. I set minimal CPU/memory requests, converting the BestEffort pods to Burstable, hoping this would help Karpenter make better provisioning decisions. It didn't solve the problem: Karpenter now provisions more nodes than the Cluster Autoscaler used to, which raises costs without addressing the core issue.
2. I also deployed a DaemonSet that requests resources to create buffer capacity for CPU spikes (a sketch of the manifest is below), but that didn't work either. The pods still get stuck, and the nodes stay under high CPU pressure.
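For reference, this is roughly what the buffer DaemonSet looks like; the name, namespace, and request sizes are just illustrative, and the pause image does nothing except hold the reservation on every node:

```yaml
# Illustrative buffer DaemonSet; names and request sizes are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: capacity-buffer
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: capacity-buffer
  template:
    metadata:
      labels:
        app: capacity-buffer
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m        # headroom reserved per node
              memory: 512Mi
```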
I'm looking for suggestions on how to make Karpenter work more effectively with mainly BestEffort workloads. What can I do to avoid over-scheduling and manage CPU/memory pressure more efficiently?
1 Answer
Consider setting default resource requests with a LimitRange in your namespace. That way, even pods that don't declare resources get baseline requests that the scheduler and Karpenter can bin-pack against. I've dealt with Karpenter packing too many pods onto a single node, and this approach improved the situation for us.
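Something along these lines worked for us; the namespace and the default values are only examples, so tune them to the typical footprint of your pods. Note that only defaultRequest is set, so the pods still get no CPU/memory limits:

```yaml
# Example LimitRange; namespace and values are placeholders.
# Only defaultRequest is specified, so no limits are imposed on the pods.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests
  namespace: my-workloads   # replace with your namespace
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 250m
        memory: 256Mi
```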
I can't really set limits either: the pods my tasks create are all different, and limiting them causes OOM errors. It's frustrating!