I'm dealing with escalating Kubernetes costs, and it's becoming clear we're likely over-provisioned. Yet, every time I consider adjusting resource requests, I fear it might disrupt our production stability. Our engineering team is already stretched thin, and no one wants to take responsibility for any potential performance issues. I need to present tangible savings to leadership but feel trapped between budget constraints and reliability risks. How do you approach K8s optimization without risking the system? Are there any frameworks for rightsizing that won't place blame on me if something goes wrong?
5 Answers
Honestly, hitting 30-40% CPU utilization isn't a problem; it's a decent target. The real concern is where the spend is actually going. Have you broken down costs beyond just the Kubernetes compute bill? Network misconfigurations (cross-AZ traffic, NAT gateway egress, and the like) can drive massive cost increases that no amount of pod rightsizing will fix.
Consider setting up the Vertical Pod Autoscaler (VPA) in recommendation-only mode (updateMode: "Off") first. It will surface suggested requests without evicting or resizing anything, which lets you quantify the over-provisioning before you change a single manifest.
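A minimal sketch of that, assuming a Deployment named `my-app` (substitute your own workload) and that the VPA CRDs are already installed in the cluster:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only; never evict or resize pods
```

Then `kubectl describe vpa my-app-vpa` shows the recommended requests alongside what you currently allocate, which is exactly the kind of evidence leadership responds to.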
To start, how over-provisioned are we talking about? Are you utilizing any scaling options like Horizontal Pod Autoscaling (HPA)? If your average CPU utilization is around 30-40%, that’s not too bad since it provides some headroom for spikes. But you'll want to analyze usage further.
You should absolutely leverage monitoring tools like Grafana and Prometheus to get visibility into your resource usage. At my last job, we received pushback on CPU allocations, but metrics showed we were only using about 10% of the requested resources. Metrics provide confidence that you can safely cut back.
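To turn those metrics into a concrete proposal, a simple percentile-plus-headroom heuristic goes a long way. This is a hypothetical helper, not any official tool: feed it usage samples exported from Prometheus and it suggests a request value.

```python
def recommend_request(usage_samples, percentile=95, headroom=1.2):
    """Suggest a resource request from observed usage samples.

    usage_samples: raw usage measurements (e.g., CPU millicores
    sampled from Prometheus over a representative window).
    Returns the nearest-rank percentile value scaled by headroom.
    """
    if not usage_samples:
        raise ValueError("need at least one sample")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: pick the sample at the p-th rank.
    idx = max(0, int(round(percentile / 100 * len(ordered))) - 1)
    return ordered[idx] * headroom

# Example: a pod requesting 1000m whose real usage hovers ~100-140m.
samples = [100, 110, 115, 118, 120, 122, 125, 130, 135, 140]
print(round(recommend_request(samples)))  # -> 168 (millicores)
```

Presenting "p95 usage plus 20% headroom" as the rule makes the change defensible: if something goes wrong, the blame lands on an agreed-upon policy rather than on whoever edited the manifest.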
Yeah, we have Prometheus and Grafana configured already; that’s how I realized how over-provisioned we are.
AI tooling can actually help here. It's a complex mix of factors: CPU, memory, request latency, and so on. Observability platforms like Datadog can guide you through it. I've saved my company hundreds of thousands annually with just a few tweaks to resource allocations. Showcasing wins like that can help you advocate for more dedicated FinOps time.
That's encouraging to hear, thanks for the insights!

We're indeed at about 30-40% CPU utilization on most workloads, with some even lower. We’ve set up HPA, but it’s pretty basic and mainly just CPU-based thresholds.
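For what it's worth, extending a CPU-only HPA to also track memory is a small change under the autoscaling/v2 API. A sketch, with placeholder names and thresholds you'd tune to your own workloads:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% of requested CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # scale out above 80% of requested memory
```

Note that utilization targets are computed against *requests*, so rightsizing requests and tuning the HPA go hand in hand.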