I recently found out that our DevOps team set up 15 AWS clusters for what was supposed to be a two-week performance testing sprint. That was eight months ago. We're now paying around $87K per month for these unused environments, which have no application traffic and hardly any metrics or recent activity. The resources were provisioned with no owner tags, no automated teardown, and no cost gates. Governance tools exist but aren't effectively enforced. I'm looking for advice or shared experiences on how to prevent situations like this in the future. What guardrails or practices do you have in place to avoid accumulating unwanted costs?
6 Answers
Here’s how we deal with cost issues:
- Default to read-only access for devs.
- All infrastructure must go through Infrastructure as Code (IaC).
- CI checks for tags, and deployments fail if they aren't correct.
- Implement budget alerts on accounts.
- Our finance team keeps a close eye on spending and escalates any abnormal costs immediately.
This way, if costs spiral, the right questions are asked quickly!
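To make the "CI checks for tags" point concrete, here is a rough sketch of what such a check might look like. The tag names and the shape of the resource dictionaries are assumptions for illustration, not the format of any particular IaC tool; in a real pipeline the list would come from your parsed plan output.

```python
# Hypothetical tag policy; adjust the keys to your organization's conventions.
REQUIRED_TAGS = {"owner", "cost-center", "ttl"}


def missing_tags(resource):
    """Return the required tag keys that are absent or empty on one resource."""
    tags = resource.get("tags") or {}
    return sorted(
        key for key in REQUIRED_TAGS if not str(tags.get(key) or "").strip()
    )


def check_plan(resources):
    """Map each non-compliant resource name to its list of missing tags."""
    violations = {}
    for resource in resources:
        missing = missing_tags(resource)
        if missing:
            violations[resource["name"]] = missing
    return violations


def exit_code(resources):
    """Return 1 when any resource violates the policy, else 0."""
    return 1 if check_plan(resources) else 0


# Example input (assumed shape): one compliant cluster, one missing two tags.
planned = [
    {"name": "perf-cluster-01",
     "tags": {"owner": "team-perf", "cost-center": "1234", "ttl": "14d"}},
    {"name": "perf-cluster-02", "tags": {"owner": "team-perf"}},
]
```

In the CI step you would call something like `sys.exit(exit_code(planned))`, since a non-zero exit code is what actually fails the deployment.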
Implementing FinOps practices is crucial. Consider making your deployment pipeline fail unless resources carry specific tags, particularly the ones used for cost monitoring. Clear SOPs for resource management also help; people are forgetful, and addressing human behavior is key. Finally, make sure no one can create resources without going through a proper pipeline. That accountability can help prevent future overspends.
We've started auto-terminating untagged resources after a certain period, and it's surprising how motivating that has been!
Implement GitOps to enforce that anything deployed is from a version-controlled source. Tools like Atlantis can help ensure tagging is enforced at creation. You can also set up budget alerts to notify the right people when spending thresholds are crossed, which ensures oversight and accountability.
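As a rough illustration of the budget-alert idea, the threshold logic could look like the sketch below. The 50/80/100% thresholds, the function names, and the $100,000 budget in the usage note are assumptions for the example; AWS Budgets or a similar service would normally do this for you.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions that current spend has reached or exceeded."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in thresholds if ratio >= t]


def alert_message(account, spend, budget):
    """Build a notification for the highest crossed threshold, or None."""
    crossed = crossed_thresholds(spend, budget)
    if not crossed:
        return None
    highest = max(crossed)
    return (
        f"{account}: spend ${spend:,.0f} has reached "
        f"{highest:.0%} of the ${budget:,.0f} budget"
    )
```

For example, `alert_message("perf-testing", 87_000, 100_000)` produces a message saying spend has reached 80% of the budget, which is the point at which the account owner should be asked what is still running.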
Changing the culture around cost awareness is just as important as technical measures. Everyone needs to feel responsible for the financial impact.
We stopped allowing developers to create static infrastructure directly. Now they can only create specific resources like S3 and serverless items through controlled pipelines. This prevents runaway costs and ensures that tech decisions align with budget discussions. Just like in any business, signing off on expenses before they happen prevents excessive spending!
Paying roughly $1 million a year for unused resources is crazy! I’d look into how costs are being generated without any traffic. Your organization might need to consider hiring more people to keep an eye on AWS resources, especially if costs can go unnoticed for so long!
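For reference, the yearly figure follows from the numbers in the question. The quick calculation below assumes the cost stayed at about $87K for each of the eight months, which the question does not state explicitly.

```python
MONTHLY_COST = 87_000   # USD per month, from the question
MONTHS_RUNNING = 8      # months since the sprint, from the question

# Assumption: the monthly cost was roughly constant over the whole period.
annual_run_rate = MONTHLY_COST * 12
spent_so_far = MONTHLY_COST * MONTHS_RUNNING

print(f"Annual run rate: ${annual_run_rate:,}")  # $1,044,000
print(f"Spent so far:    ${spent_so_far:,}")     # $696,000
```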
A lot of companies waste serious cash on cloud resources. The computing power alone for running idle services can rack up costs quickly.
We faced a similar dilemma, so we enforced mandatory tagging and automated cleanup scripts for all non-production environments. Anything without a TTL or owner tag gets deleted after 30 days. We also use a cloud cost optimization tool that connects resource costs with project codes, which helps identify abandoned resources quickly.
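The selection rule behind a cleanup script like this can be quite small. The sketch below is only an illustration: the tag keys, the `environment` and `created_at` fields, and the reading of the rule as "missing either tag" are assumptions, and the real script would pull the inventory from your cloud provider's API.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=30)   # the 30-day window described above
REQUIRED = ("owner", "ttl")         # hypothetical tag key names


def is_deletion_candidate(resource, now):
    """True for a non-production resource missing an owner or TTL tag for 30+ days."""
    if resource.get("environment") == "production":
        return False
    tags = resource.get("tags") or {}
    if all(tags.get(key) for key in REQUIRED):
        return False  # properly tagged, leave it alone
    return now - resource["created_at"] >= GRACE_PERIOD


def deletion_candidates(resources, now=None):
    """Return the names of resources the cleanup job should remove."""
    now = now or datetime.now(timezone.utc)
    return [r["name"] for r in resources if is_deletion_candidate(r, now)]
```

Running this in report-only mode first, and notifying owners before anything is deleted, is a reasonable way to introduce it without surprising anyone.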
We’re trying to shift the culture to make everyone aware that cost is everyone's responsibility, but it's a slow process.