I'm managing 12 EKS clusters across development, staging, and production, and we're spending around $200k a month. My team keeps pointing to shared infrastructure as the reason costs are tough to allocate, but I suspect there's a lot of waste happening. Recently, I found one cluster with 47% unused CPU because teams tend to over-provision. Another cluster still had stale workloads from Q2 running. Honestly, our resource requests bear little relation to actual usage.

Currently, we track monthly rollups by namespace without any real accountability, and teams just blame each other when there are issues. I really need to understand the unit economics for each service, but the shared clusters complicate this.

How do others manage cost attribution in shared Kubernetes environments? Are there any tools out there that can trace waste to specific teams or services? I'm really tired of hearing that it's too complicated.
5 Answers
Datadog Cloud Cost Management is a fantastic tool for this. It allocates costs down to the container and breaks them into usage, workload idle, and cluster idle categories, so you can see where the overspending is happening, and you can slice the reporting by your own tags. It takes a bit of setup, but it's totally worth it, especially if you're already using Datadog.
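To make the three categories concrete, here is a rough Python sketch of how one node's cost could be split into usage, workload idle, and cluster idle. This is not Datadog's implementation; the `Pod` fields, the `split_node_cost` helper, and the CPU-only pricing are all assumptions for illustration (a real tool would also weigh memory).

```python
from dataclasses import dataclass


@dataclass
class Pod:
    team: str           # hypothetical owner tag
    cpu_request: float  # cores requested
    cpu_used: float     # average cores actually used


def split_node_cost(node_cost, node_cpu, pods):
    """Split one node's cost into per-team usage / workload idle, plus cluster idle."""
    cost_per_core = node_cost / node_cpu
    breakdown = {}
    reserved_total = 0.0
    for pod in pods:
        # A pod that bursts above its request still occupies what it uses.
        reserved = max(pod.cpu_request, pod.cpu_used)
        entry = breakdown.setdefault(pod.team, {"usage": 0.0, "workload_idle": 0.0})
        entry["usage"] += pod.cpu_used * cost_per_core
        entry["workload_idle"] += (reserved - pod.cpu_used) * cost_per_core
        reserved_total += reserved
    # Capacity nobody reserved is cluster idle: a platform problem, not a team's.
    cluster_idle = max(node_cpu - reserved_total, 0.0) * cost_per_core
    return breakdown, cluster_idle


pods = [Pod("payments", 4.0, 2.0), Pod("search", 2.0, 1.0)]
per_team, cluster_idle = split_node_cost(100.0, 10.0, pods)
```

Workload idle is the part you can hand back to a team (they asked for it and didn't use it), while cluster idle points at node sizing and autoscaling.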
I recommend using split cost allocation data for EKS in AWS Billing. You can do chargebacks by namespace or workload name. At your level of spend, you should probably also have a FinOps tool that integrates with EKS for better cost management.
You’re in luck! AWS recently introduced a feature for split cost allocation with Kubernetes labels, which can really help in tracking costs according to usage. Check out their official page for more details.
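If you want a feel for what label-based chargeback looks like before turning anything on, here is a simplified Python sketch. It is not AWS's actual algorithm: the `chargeback_by_label` helper, the pod dictionaries, the `team` label, and the choice to weight each pod by the larger of its CPU request and usage are all assumptions for illustration.

```python
from collections import defaultdict


def chargeback_by_label(node_cost, pods, label="team"):
    """Spread a node's full cost across label values, idle capacity included."""
    weights = defaultdict(float)
    for pod in pods:
        # Unlabeled pods land in their own bucket so they stay visible.
        owner = pod["labels"].get(label, "unallocated")
        weights[owner] += max(pod["cpu_request"], pod["cpu_used"])
    total = sum(weights.values())
    if total == 0:
        return {"unallocated": node_cost}
    return {owner: node_cost * w / total for owner, w in weights.items()}


pods = [
    {"labels": {"team": "payments"}, "cpu_request": 2.0, "cpu_used": 1.0},
    {"labels": {"team": "search"}, "cpu_request": 1.0, "cpu_used": 3.0},
    {"labels": {}, "cpu_request": 1.0, "cpu_used": 0.5},
]
bill = chargeback_by_label(90.0, pods)
```

Because the whole node cost is distributed, over-provisioned requests show up directly on the owning team's bill, and the "unallocated" bucket tells you how much spend has no label yet.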
You might want to look into Karpenter as well; it provisions right-sized nodes for pending pods and consolidates underused ones, which helps keep cluster cost down.
But seriously, why do you have 12 EKS clusters? What kind of project is this? It sounds like a lot!
Multi-region and multiple environments for testing could be one reason!
Usually, it’s multi-region plus multiple environments for testing and production support.

Exactly! Also, the Cloud Intelligence Dashboards on AWS provide excellent insights for managing costs.