I'm curious about best practices for right-sizing Azure Kubernetes Service (AKS) node pools when the cluster is provisioned with Terraform. Since Terraform works from a declared desired state, how do people handle dynamic workloads while keeping the node pools appropriately sized?
Yes, I use cluster-autoscaler too! I pair it with py-kube-downscaler to scale my dev cluster down to a minimum outside working hours, which cuts costs without getting in the way during the day.
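To illustrate the "working hours" idea, here is a simplified, hypothetical check for a schedule string in the style of kube-downscaler's uptime setting (e.g. `Mon-Fri 07:30-20:30`). This is not py-kube-downscaler's own code: it assumes a single weekday range and a single time range, skips timezone handling (`now` is assumed to already be in local time), and does not support wrap-around ranges like `Fri-Mon`. Check the project's README for the exact syntax it accepts.

```python
from datetime import datetime, time

# Index matches datetime.weekday(): Monday == 0 ... Sunday == 6
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def parse_hhmm(value: str) -> time:
    """Turn 'HH:MM' into a datetime.time."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def is_uptime(schedule: str, now: datetime) -> bool:
    """Return True if `now` falls inside a schedule like 'Mon-Fri 07:30-20:30'.

    Hypothetical simplified parser: one weekday range, one time range,
    end time exclusive, no timezone handling.
    """
    day_range, time_range = schedule.split()
    first_day, last_day = day_range.split("-")
    start, end = (parse_hhmm(t) for t in time_range.split("-"))
    day_ok = DAYS.index(first_day) <= now.weekday() <= DAYS.index(last_day)
    return day_ok and start <= now.time() < end


schedule = "Mon-Fri 07:30-20:30"
print(is_uptime(schedule, datetime(2024, 5, 15, 10, 0)))  # Wednesday morning -> True
print(is_uptime(schedule, datetime(2024, 5, 18, 10, 0)))  # Saturday -> False
print(is_uptime(schedule, datetime(2024, 5, 15, 21, 0)))  # after hours -> False
```

Outside the uptime window, a downscaler of this kind scales workloads down, and cluster-autoscaler can then remove the nodes that become empty.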
I usually go with the cluster-autoscaler for this. I start with a node pool of three instances and let it scale up with demand.
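As a rough mental model of "start at three and scale with demand", the sketch below sums pending CPU requests, converts them to a node count, and clamps the result between a minimum and maximum, the same bounds you would set as `min_count`/`max_count` on an autoscaled node pool in Terraform. It is a deliberate simplification and an assumption on my part: the real cluster-autoscaler simulates scheduling pod by pod rather than summing requests. The `NodePool` class and the 1900m allocatable figure are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class NodePool:
    # Hypothetical model of an autoscaled pool; min_count/max_count mirror
    # the bounds you would set on the node pool in Terraform.
    min_count: int
    max_count: int
    allocatable_millicores: int  # CPU one node can offer to pods


def target_node_count(pool: NodePool, requested_millicores: list[int]) -> int:
    """Nodes needed for the summed CPU requests, clamped to [min_count, max_count]."""
    total = sum(requested_millicores)
    needed = math.ceil(total / pool.allocatable_millicores)
    return max(pool.min_count, min(pool.max_count, needed))


pool = NodePool(min_count=3, max_count=10, allocatable_millicores=1900)
print(target_node_count(pool, [500] * 4))   # 2000m -> 2 nodes, held at the floor: 3
print(target_node_count(pool, [500] * 20))  # 10000m -> 6
print(target_node_count(pool, [500] * 60))  # 30000m -> 16, capped at max_count: 10
```

Because the autoscaler changes the live node count, Terraform will see drift on `node_count`; a common approach is to add `lifecycle { ignore_changes = [node_count] }` to the node pool resource so Terraform manages only the bounds.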
Karpenter is a solid option for this, though I'm not sure if it's available for AKS yet. It's worth checking out if you're looking for something that simplifies node provisioning.

Actually, node auto provisioning for AKS just went GA! You can check out more details on it here: https://learn.microsoft.com/en-us/azure/aks/node-autoprovision?tabs=azure-cli. I'm just waiting for Terraform support before I try it out.