I've recently taken on a role in financial operations at a sizable B2B company, focusing on managing our EC2 commitments and savings plans. While my team has achieved a good coverage rate of over 90%, I've noticed that a significant portion—roughly 60-70%—is essentially just covering idle capacity. The challenge lies in getting my DevOps/platform team to engage in meaningful discussions about rightsizing resources or modifying safety buffers. They seem to prefer the comfort of higher safety margins, which is understandable, but it's hindering our ability to optimize costs effectively. I genuinely appreciate their skills and contributions but am struggling to gain their cooperation on these matters. How can I approach this collaboration to ensure we're working as a team towards better cost management?
1 Answer
One thing to keep in mind is burstiness. Depending on the workload, the usage peaks (rather than the averages) may show that smaller instance sizes would be sufficient. However, if capacity is a real concern during high-demand periods, it makes sense for the platform team to want a larger buffer. It's key that they explain which case applies when you discuss resource allocation, because this is exactly where optimizing for cost conflicts with optimizing for safety.
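To make that conversation concrete, here is a minimal sketch of how you might compare peak and average usage for one service. It assumes you can export CPU utilization samples (0-100) from whatever monitoring you use. The function name `headroom_summary`, the 20% default buffer, and the sample numbers are all made up for illustration, not a recommendation.

```python
from statistics import mean


def headroom_summary(cpu_samples, buffer=0.2):
    """Summarize burstiness from CPU utilization samples (percent, 0-100).

    `buffer` is a hypothetical safety margin applied on top of the observed peak.
    """
    avg = mean(cpu_samples)
    peak = max(cpu_samples)
    # Peak-to-average ratio: near 1 means flat usage, large means bursty.
    burst_ratio = peak / avg if avg > 0 else float("inf")
    # Fraction of current capacity needed to serve the peak plus the buffer,
    # capped at 1.0 (meaning there is no room to shrink).
    needed_fraction = min(1.0, peak / 100 * (1 + buffer))
    return {
        "avg": avg,
        "peak": peak,
        "burst_ratio": burst_ratio,
        "needed_fraction": needed_fraction,
    }


# Illustrative samples only, not real data.
flat_service = headroom_summary([20, 22, 21, 19, 23])
bursty_service = headroom_summary([10, 12, 85, 11, 9])
print(flat_service)
print(bursty_service)
```

A service whose `needed_fraction` is well below 1.0 is a candidate for a smaller size, while one that hits 1.0 is a case where the larger buffer is doing real work. Sizing against the peak rather than the average is what keeps this a discussion the platform team can agree with.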

Definitely look into your post-mortem reports to verify which services actually scale with your current traffic and to confirm where there is room for right-sizing.