I'm struggling to set up eviction thresholds and manage memory pressure in our Kubernetes cluster. Our nodes have 32GiB of RAM, of which 4GiB is reserved for system use. The hard eviction threshold is currently the default of 100MiB, which I understand applies to the whole node. The problem is that the kubepods.slice cgroup, which is capped at 28GiB, often runs up against that limit, leading to failed liveness probes and other disruptions.
I want to know whether raising the eviction thresholds would eat into the reserved system memory, which I'd prefer to keep intact. Ideally, the hard eviction threshold would trigger when usage in kubepods.slice reaches around 27.5GiB, regardless of how much memory the system side is using. I also know that setting proper resource requests and limits would help, but due to company policy this isn't enforced for our users. What are your thoughts on whether eviction thresholds should account for the total memory of the node?
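For concreteness, here is a sketch of the kubelet configuration I've been considering; the `memory.available` value is my own back-of-the-envelope arithmetic and assumes the signal is measured node-wide, which is exactly the part I'm unsure about:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: "4Gi"          # the reservation we already carve out for the OS
evictionHard:
  # If memory.available is node-wide: 32Gi capacity - 4Gi system use
  # - 27.5Gi of pod usage leaves ~0.5Gi, so this should fire at the
  # point I want (kubepods.slice around 27.5Gi).
  memory.available: "500Mi"
```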
3 Answers
We have enforceNodeAllocatable set to "pods", which makes the kubelet apply a hard memory limit to the kubepods.slice cgroup. This keeps the node itself stable, but it also means critical system pods can fail when memory inside that cgroup runs low. It's a tricky balance!
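For context, the relevant part of our kubelet config looks roughly like this (the reservation size is illustrative, not our exact value); my understanding is that with "pods" in the list, the kubelet derives the kubepods.slice memory limit from node capacity minus the reservations:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable: ["pods"]  # kubelet writes a hard memory limit onto kubepods.slice
systemReserved:
  memory: "4Gi"   # subtracted from node allocatable to protect OS daemons
```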
Our company policy prevents us from enforcing resource requests and limits on user workloads, so all we can do is encourage users to set them voluntarily. That's a real challenge: we have hundreds of workloads running without any requests or limits, and I'm exploring options like surfacing recommended settings from tools such as the Vertical Pod Autoscaler (VPA) or Goldilocks.
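As an example of the recommendation-only approach, a VPA can run with updates disabled so it just reports suggested requests without ever restarting pods (the target names here are placeholders):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app      # placeholder workload
  updatePolicy:
    updateMode: "Off"      # compute recommendations only; never evict or patch pods
```

`kubectl describe vpa example-app-vpa` then shows the recommended requests, which you can pass on to the workload owners.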
Consider putting a memory limit on the kubepods.slice cgroup if one isn't already in place. That can keep pod memory pressure from spilling over into the rest of the node, though bear in mind that Kubernetes QoS classes only influence, and don't strictly determine, which processes the kernel OOM killer terminates first when the limit is hit.
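Whether a pod survives that pressure depends largely on its QoS class, which is derived from its requests and limits; a minimal sketch with hypothetical pod specs (names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
  - name: app
    image: nginx   # placeholder image
    resources:
      requests: { memory: "512Mi", cpu: "250m" }
      limits: { memory: "512Mi", cpu: "250m" }   # requests == limits -> Guaranteed QoS
---
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-example
spec:
  containers:
  - name: app
    image: nginx   # no requests or limits -> BestEffort QoS, first in line under pressure
```

Guaranteed pods get the most favorable `oom_score_adj`, but under a cgroup-level OOM the kernel still makes the final call, which is why the ordering isn't guaranteed.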
