I'm running about 100 pods of five different Python web applications across multiple nodes, and I'm experiencing around 15 OOM (Out of Memory) kills on a typical day. I've checked resource limits and haven't found any obvious flaws yet, so I'm not entirely sure why these OOM kills are happening. To help manage resource usage more effectively, I was thinking about disabling memory overcommit. This would make it more likely for memory allocation to fail, but I'm concerned about any potential unforeseen negative consequences. Has anyone tried this approach, and what were your experiences?
3 Answers
Have you checked the resource quotas set up in your Kubernetes cluster? It could be that they're set too aggressively, which might be contributing to the OOM kills.
You might want to consider that limiting memory is more important than limiting CPU. Generally, it's better not to set strict CPU limits, as they can cause throttling and performance issues. Focus on tuning your actual memory request and limit values instead.
I wouldn't recommend disabling memory overcommit without digging deeper into the issue first. You might have a memory leak or perhaps you're not allocating enough memory for your containers to function properly. What have you tried in terms of debugging this?
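If you suspect a leak, one way to start is Python's built-in tracemalloc module, which can compare two snapshots and show which source lines grew the most. Below is a minimal sketch, not a drop-in tool: `top_allocation_growth`, `simulated_requests`, and `_leaky_cache` are hypothetical names made up for illustration, and the "leak" is just a list that never gets cleared. In a real app you would take the snapshots around a batch of real requests instead.

```python
import tracemalloc


def top_allocation_growth(workload, limit=5):
    """Run workload() and return the source lines whose allocations grew most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() returns StatisticDiff entries, largest change first.
    return after.compare_to(before, "lineno")[:limit]


# Hypothetical stand-in for a leak: a module-level cache that only grows.
_leaky_cache = []


def simulated_requests(n=1000):
    """Pretend to serve n requests, each leaving 1 KiB behind in the cache."""
    for _ in range(n):
        _leaky_cache.append(bytearray(1024))


if __name__ == "__main__":
    for stat in top_allocation_growth(simulated_requests):
        print(stat)
```

Each printed entry shows a file and line number along with how many bytes and blocks were added between the two snapshots. Keep in mind tracemalloc only sees allocations made through Python's allocator, so memory held by native libraries won't show up, and tracing adds overhead, so it's better suited to a staging pod than to all 100 in production.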
