How to Ensure AKS Reliability During Scaling and Upgrades?

Asked By GamerGirl44 On

I've been hearing a mix of reviews about AKS when it comes to running in production. Some people say it's great, while others recount nightmare stories with issues cropping up during cluster upgrades, scaling events, or changes to node pools. For those of you using AKS for critical workloads, how reliable have you found it to be? Are there any best practices you've adopted to keep your clusters running smoothly?

1 Answer

Answered By TechWiz123 On

To keep AKS reliable during scaling and upgrades, configure your infrastructure policies to match how you actually operate. Let the Kubernetes APIs handle node evictions gracefully: draining a node through the eviction API respects pod disruption budgets. Set up affinities, topology spread constraints, and disruption budgets so that your pods are well distributed across nodes. Microsoft has specific recommendations:
1. Run your production clusters across multiple availability zones (AZs).
2. Create a separate node pool for each AZ, and keep similar workloads balanced across them.
3. Set pod affinities that cover all node pools.
4. Use topology spread constraints (max skew) keyed on the AZ labels of the node pools.
5. Set a pod disruption budget of 2 or more for your pods.
If you follow these principles, you should be able to avoid downtime short of a major platform failure.
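
As a rough illustration of points 4 and 5, here is a minimal sketch that builds a Deployment with a zone-based topology spread constraint and a matching PodDisruptionBudget, then prints them as JSON that kubectl can apply. The app label, workload name, and image are hypothetical placeholders, and reading "a disruption budget of 2+" as minAvailable: 2 is my assumption rather than something taken from Microsoft's guidance.

```python
import json

# Hypothetical label; replace with whatever your workload actually uses.
APP_LABELS = {"app": "web"}


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Deployment whose pods are spread evenly across availability zones."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": APP_LABELS},
            "template": {
                "metadata": {"labels": APP_LABELS},
                "spec": {
                    "topologySpreadConstraints": [
                        {
                            # Allow at most one pod of difference between zones.
                            "maxSkew": 1,
                            # Well-known zone label set on nodes.
                            "topologyKey": "topology.kubernetes.io/zone",
                            "whenUnsatisfiable": "DoNotSchedule",
                            "labelSelector": {"matchLabels": APP_LABELS},
                        }
                    ],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


def build_pdb(name: str, min_available: int) -> dict:
    """PodDisruptionBudget that limits voluntary evictions, e.g. node drains."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{name}-pdb"},
        "spec": {
            # Assumption: "2+" is interpreted as at least 2 pods kept running.
            "minAvailable": min_available,
            "selector": {"matchLabels": APP_LABELS},
        },
    }


def render(manifests: list) -> str:
    """Wrap manifests in a v1 List so they can be applied as one JSON file."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2
    )


if __name__ == "__main__":
    # Placeholder name and image, for illustration only.
    print(render([build_deployment("web", "nginx:stable", 3), build_pdb("web", 2)]))
```

You could save the script under any name and pipe its output into kubectl apply -f - to try it on a test cluster. Keep the replica count above the PDB's minAvailable; if they are equal, no pod can be evicted voluntarily and a node drain during an upgrade can stall.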

NodeGeek99 -

Are there any sources or links for these recommendations?
