Hey everyone! I'm gearing up to upgrade my EKS cluster from version 1.31 to 1.32 and, at the same time, move the node groups from Amazon Linux 2 (AL2) to Amazon Linux 2023 (AL2023). This is a large production setup with 12 m5.xlarge nodes, so I want to approach it carefully. If anyone has done this upgrade, I'd love to hear about it. Were there any issues or surprises? Specifically, what should I watch out for regarding AL2023 node quirks, networking problems, or daemonset compatibility? Are there any notable differences in the kernel, systemd, or containerd? Lastly, is there anything you wish you had known before starting? I'm hoping to minimize surprises during the rollout. Thanks in advance!
5 Answers
I made a similar transition recently on what was probably a larger cluster and hit no major issues, so it’s definitely doable! I recommend creating a new node group for AL2023, migrating workloads to it, then scaling down the old group. This strategy worked well for us. Just be careful with any EBS volumes tied to specific nodes (EBS volumes are bound to a single Availability Zone, so the new group needs nodes in the same zones), and check that your new instances have the correct security group permissions.
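For the "migrate workloads, then scale down" step, here is a rough sketch of how the drain order could be generated. It only builds the `kubectl` commands and does not run anything; the node names, the `drain_plan` helper, and the 10-minute timeout are my own placeholders, not something from our actual rollout.

```python
import shlex


def drain_plan(old_nodes, timeout="10m"):
    """Build kubectl commands: cordon every old node first, then drain one at a time.

    Cordoning everything up front keeps evicted pods from landing on another
    old node that is about to be drained anyway.
    """
    commands = [["kubectl", "cordon", node] for node in old_nodes]
    for node in old_nodes:
        commands.append([
            "kubectl", "drain", node,
            "--ignore-daemonsets",       # daemonset pods are not evicted by drain
            "--delete-emptydir-data",    # allow eviction of pods using emptyDir
            f"--timeout={timeout}",
        ])
    return commands


def render(commands):
    """Render the plan as shell lines for review before running them by hand."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Calling `print(render(drain_plan([...])))` with your old AL2 node names gives a reviewable script. Draining one node at a time and watching pods reschedule between steps is the cautious choice for production.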
Upgrading EKS can be tricky, but mostly because of Kubernetes API changes; the move from AL2 to AL2023 isn’t the primary concern. Definitely check your existing charts and deployments for deprecated or removed APIs; running a tool that scans for them is crucial. I also don't get why more companies don't set up a blue/green deployment strategy for these upgrades. It could save a lot of headaches.
Keep an eye on resource usage. I’ve heard some folks saw increased CPU consumption after moving to AL2023, so monitor performance closely after the switch.
Oh man, my ingress broke during my upgrade from 1.33 to 1.34. It's essential to test thoroughly! Make sure your services can ride out node replacements and don't depend on a single replica.
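A quick way to find the single-replica risks before draining anything is to scan the output of `kubectl get deployments -A -o json`. This is just a sketch; the function name is mine, and it only looks at Deployments, not StatefulSets or PodDisruptionBudgets.

```python
import json


def single_replica_deployments(kubectl_json):
    """Return 'namespace/name' for every Deployment with fewer than two replicas."""
    data = json.loads(kubectl_json)
    flagged = []
    for item in data.get("items", []):
        # Kubernetes defaults spec.replicas to 1 when it is omitted.
        replicas = item.get("spec", {}).get("replicas", 1)
        if replicas < 2:
            meta = item["metadata"]
            flagged.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
    return sorted(flagged)
```

Anything it lists will have downtime while its only pod is evicted and rescheduled, so those are the ones to scale up or schedule around.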
We had no problems moving from AL2 to AL2023, so fingers crossed you'll have an easy time too! Just be diligent with compatibility checks across your stack.

Exactly! Plus, keep in mind AL2023 has IMDSv1 turned off by default, so IMDSv2 is required. Make sure any pods relying on the instance role can still function; IRSA or a custom launch template may be necessary.
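If you want to see what "IMDSv2 only" means in practice, the flow is a `PUT` to get a session token followed by a `GET` that carries it. Here is a stdlib-only sketch; the helper names are my own, and `fetch` only works from somewhere that can actually reach the metadata endpoint. As far as I know, the default hop limit on AL2023 managed node groups can also keep pods off IMDS entirely, so treat a timeout from inside a pod as something to verify against your launch template rather than a bug in the script.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds=21600):
    """Build the IMDSv2 session-token request (PUT with a TTL header)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build a metadata GET that presents the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch(path, timeout=2):
    """Run the two-step IMDSv2 flow; raises URLError if IMDS is unreachable."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

Running `fetch("iam/security-credentials/")` from a test pod tells you whether that workload can still see the node role. If it can't and it needs AWS access, that's the workload to move to IRSA.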