I'm dealing with a recurring Azure cost problem in our setup. We've invested in Reservations, but many teams don't benefit from them because the SKUs they deploy don't match the SKUs we reserved. This causes two problems: workloads run on non-reserved SKUs while the reservations sit unused, and we have no clear view of how much of our reserved capacity is actually utilized versus wasted. I'm curious how others are tackling this. Have you had success with standardizing SKUs, enforcing policies, or using tools that map workloads to reservations and catch mismatches early?
4 Answers
Embracing a FinOps culture and making engineers accountable for cost is key. Design architectural patterns that prioritize cost optimization and track them as operational KPIs. Policy guardrails help too, but the culture around optimization matters most. Once your FinOps processes mature, defining your infrastructure as code (IaC) keeps these issues more manageable in the long run.
One approach we've tested is to buy a minimal Savings Plan ($0.01/hour). If it shows 100% utilization, check which resources are consuming it and buy reservations for those as needed. Also, keep in mind that Savings Plans can offer more flexibility than Reserved Instances, especially if you run multiple instances across different regions.
Using Azure Policy to restrict which SKUs can be deployed is super helpful. Also, make it a habit to review your reservation utilization regularly. If the SKUs in use don't match your reservations, either resize the machines to the reserved SKUs or consider returning the mismatched reservations and purchasing ones that fit.
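For what it's worth, here's a rough sketch of what such a policy could look like, written as a small Python script that builds the definition body. It's modeled on the built-in "Allowed virtual machine size SKUs" policy; the function name, display name, and the two example SKUs are just placeholders I picked, so swap in the sizes you actually have reserved.

```python
import json


def allowed_vm_sku_policy(allowed_skus):
    """Build a policy definition body that denies VM sizes outside allowed_skus.

    Hypothetical helper: the structure follows the Azure Policy definition
    format, but the display name and default SKU list are illustrative only.
    """
    return {
        "properties": {
            "displayName": "Allowed VM SKUs (reservation-aligned)",
            "mode": "Indexed",
            "parameters": {
                "listOfAllowedSKUs": {
                    "type": "Array",
                    "defaultValue": sorted(allowed_skus),
                }
            },
            "policyRule": {
                "if": {
                    "allOf": [
                        # Only evaluate virtual machines.
                        {"field": "type",
                         "equals": "Microsoft.Compute/virtualMachines"},
                        # Match when the VM size is NOT in the allowed list.
                        {"not": {
                            "field": "Microsoft.Compute/virtualMachines/sku.name",
                            "in": "[parameters('listOfAllowedSKUs')]",
                        }},
                    ]
                },
                "then": {"effect": "Deny"},
            },
        }
    }


# Example SKUs are assumptions, not a recommendation.
policy = allowed_vm_sku_policy({"Standard_D4s_v5", "Standard_D2s_v5"})
print(json.dumps(policy, indent=2))
```

If a hard block is too strict for your teams, changing the effect from "Deny" to "Audit" would flag mismatched SKUs without stopping the deployment.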
Definitely implement a policy that defines the allowed SKUs based on what you want teams to run. To monitor your savings and how well your RIs are being used, check out this [FinOps toolkit](https://microsoft.github.io/finops-toolkit/). It provides insight into overall Azure costs and savings rates. We also set our reservations to shared or management group scope, so they cover all our subscriptions instead of being restricted to a single one. That way a matching SKU in any subscription can pick up the reservation.

That sounds good, but we can't always enforce strict SKU limits, because some applications genuinely need higher specs. What we really need is a way to check, before provisioning, whether an existing reservation already covers the required SKU, so we don't waste capacity, especially since we're managing 20-30 subscriptions. It’s a pain when engineers provision new SKUs without knowing reserved capacity is already available.
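To make the idea concrete, here's a minimal sketch of the kind of pre-provisioning lookup I have in mind. Everything in it is hypothetical: the `Reservation` record, the `find_spare_capacity` helper, and the sample inventory are made up for illustration, and in practice the data would have to come from a reservation utilization export. It also only matches on exact SKU, region, and scope, so it ignores instance size flexibility within a VM series.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical record for one reservation order (not an Azure SDK type)."""
    name: str
    sku: str
    region: str
    quantity: int   # instances purchased
    in_use: int     # instances currently consuming the reservation
    scope: str      # "shared" or a single subscription ID

    @property
    def spare(self) -> int:
        # Unused instances left on this reservation, never negative.
        return max(self.quantity - self.in_use, 0)


def find_spare_capacity(reservations, sku, region, subscription_id):
    """Return reservations that could cover a new VM, most spare capacity first."""
    matches = [
        r for r in reservations
        if r.sku == sku
        and r.region == region
        and r.scope in ("shared", subscription_id)
        and r.spare > 0
    ]
    return sorted(matches, key=lambda r: r.spare, reverse=True)


# Sample inventory; all names, SKUs, and numbers are invented.
inventory = [
    Reservation("ri-d4-weu", "Standard_D4s_v5", "westeurope", 10, 6, "shared"),
    Reservation("ri-d4-neu", "Standard_D4s_v5", "northeurope", 4, 4, "shared"),
    Reservation("ri-e8-weu", "Standard_E8s_v5", "westeurope", 5, 1, "sub-a"),
]

for r in find_spare_capacity(inventory, "Standard_D4s_v5", "westeurope", "sub-b"):
    print(f"{r.name}: {r.spare} of {r.quantity} instances free")
```

An engineer in subscription `sub-b` asking for a `Standard_D4s_v5` in `westeurope` would see the shared reservation with spare capacity, while the `Standard_E8s_v5` reservation stays hidden because it is scoped to `sub-a`.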