I'm dealing with a constant headache over pod resource requests and limits in our EKS environment. Most of our services are either Java or Node, and I find that developers consistently ask for much more than they actually need—like 2 CPU and 4 GiB for apps that only use around 200m CPU and 500 MiB of memory. I get it; they want to be safe, but it's making our cloud bill skyrocket.
Our nodes often look underutilized, and our finance team is putting pressure on us to cut costs. I've experimented with Vertical Pod Autoscaler (VPA), but it doesn't really fit most of our workloads. Horizontal Pod Autoscaler (HPA) works for scaling, but it doesn't tackle the mismatch between requests and actual usage. Right now, we're stuck looking at Prometheus graphs, tweaking YAML files, rolling out pods over and over…and it feels like a total waste of time.
Has anyone found a solid solution to this? Are there scripts or tools out there that help manage this situation? I feel like I'm missing something obvious, but everything I try either disrupts workloads or requires constant monitoring. I'd love to hear what's been successful for you!
6 Answers
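Have you thought about using something like Karpenter? It provisions nodes based on what your pending pods request, so it can cut down on idle node capacity, though it won't fix inflated requests by itself.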
One approach is to tie the costs directly back to the developers. If finance communicates with them about the expenses, it creates an incentive to lower their resource requests. It might help ease some of your pressure too!
Consider creating a tier list for resource requests based on each app's value to the company. If an app's cost isn't justified by what it delivers, push for a cheaper solution or ask for a compelling case for why it should keep running as is. You could even leave limits unset and use multiple clusters to separate shared workloads from the ones that need tightly controlled resources for scaling.
I suggest basing requests and limits on historical monitoring data, plus load testing. Don’t let the developers decide the capacity on their own, but definitely involve them in the discussion!
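To make "base it on historical data" concrete, here is a rough Python sketch of one way to turn usage samples into a suggested request: take a high percentile and add headroom. The sample values, the percentile choices, the headroom factors, and the function names are all assumptions for illustration; in practice the samples would come from your Prometheus history and you would validate the result with load testing.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct is an int, 1-100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def recommend_request(samples, pct=95, headroom=0.2):
    """Suggested request: the chosen percentile of observed usage plus headroom."""
    return percentile(samples, pct) * (1 + headroom)


def round_up(value, step):
    """Round a value up to the next multiple of step (e.g. 50m CPU, 64 MiB)."""
    return math.ceil(value / step) * step


# Hypothetical usage samples for one container (made-up numbers).
cpu_millicores = [180, 190, 200, 210, 220, 205, 195, 400, 185, 200]
mem_mib = [480, 490, 500, 510, 505, 495, 520, 530, 500, 498]

cpu_request = round_up(recommend_request(cpu_millicores), 50)
# Memory is not compressible, so this sketch uses a higher percentile and more headroom.
mem_request = round_up(recommend_request(mem_mib, pct=99, headroom=0.3), 64)

print(f"suggested requests: cpu={cpu_request}m memory={mem_request}Mi")
```

With these made-up samples the sketch suggests 500m CPU and 704 MiB, well under a 2 CPU / 4 GiB request. Note that a single spike (the 400m sample) drives the CPU number, which is why it's worth looking at a long enough window and talking the result through with the developers.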
We mainly rely on limits for most of our apps and keep an eye on them over time. It's essential to watch for any "rogue" applications so you don't get surprised.
Check out tools like Kubecost or OpenCost. We use them alongside a chargeback model or efficiency reports to leadership to keep track of spending more effectively.
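If you want a feel for what a chargeback or efficiency report boils down to before adopting a tool, here is a minimal Python sketch. It is not how Kubecost or OpenCost compute costs; the prices, the `Workload` fields, and the team names are hypothetical, and the example workload borrows the numbers from the question (2 CPU / 4 GiB requested, roughly 0.2 CPU / 0.5 GiB used).

```python
from dataclasses import dataclass

# Hypothetical blended prices; substitute figures derived from your own bill.
CPU_PRICE_PER_CORE_HOUR = 0.04
MEM_PRICE_PER_GIB_HOUR = 0.005


@dataclass
class Workload:
    team: str
    replicas: int
    cpu_request: float  # cores requested per pod
    mem_request: float  # GiB requested per pod
    cpu_used: float     # average cores actually used per pod
    mem_used: float     # average GiB actually used per pod


def hourly_cost(cpu, mem, replicas):
    """Hourly cost of a given per-pod CPU/memory footprint across all replicas."""
    return replicas * (cpu * CPU_PRICE_PER_CORE_HOUR + mem * MEM_PRICE_PER_GIB_HOUR)


def efficiency_report(workloads):
    """Aggregate requested vs. used cost per team and derive idle cost and efficiency."""
    report = {}
    for w in workloads:
        entry = report.setdefault(w.team, {"requested": 0.0, "used": 0.0})
        entry["requested"] += hourly_cost(w.cpu_request, w.mem_request, w.replicas)
        entry["used"] += hourly_cost(w.cpu_used, w.mem_used, w.replicas)
    for entry in report.values():
        entry["idle"] = entry["requested"] - entry["used"]
        entry["efficiency"] = (
            entry["used"] / entry["requested"] if entry["requested"] else 0.0
        )
    return report


workloads = [
    Workload("payments", replicas=10, cpu_request=2.0, mem_request=4.0,
             cpu_used=0.2, mem_used=0.5),
    Workload("search", replicas=4, cpu_request=1.0, mem_request=2.0,
             cpu_used=0.6, mem_used=1.5),
]

for team, entry in efficiency_report(workloads).items():
    print(f"{team}: requested ${entry['requested']:.3f}/h, "
          f"idle ${entry['idle']:.3f}/h, efficiency {entry['efficiency']:.0%}")
```

Putting an "idle dollars per hour" figure next to each team's name tends to land better with leadership than raw utilization graphs.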
