I'm interested in hearing from others about their experiences with Kubernetes upgrades in a production environment. Upgrading always feels like a game of chance, so I'm eager to learn:
- What specific issues did you encounter during your last upgrade? Was it related to Kubernetes itself or any add-ons, custom resources, or policies?
- Did your staging environment catch these issues, or did they pop up in production first?
- What checks do you perform before an upgrade, and what do you wish you had checked?
Bonus: If you could know one critical fact before an upgrade, what would it be?
5 Answers
Generally, I haven't faced major issues during upgrades. I usually read the release notes and check for any deprecated APIs. Most upgrades have gone smoothly, especially in managed environments. Just be sure to test in your staging setup ahead of time to catch anything unusual.
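One lightweight way to do that deprecated-API check is to grep your manifests before the upgrade (dedicated tools like `kubent` or `pluto` do this more thoroughly). Here's a minimal sketch; the file path and the pattern list are illustrative, not exhaustive, so always confirm against the release notes for your target version:

```shell
# Create a sample manifest to scan (stand-in for your real repo of manifests)
mkdir -p /tmp/k8s-scan
cat > /tmp/k8s-scan/legacy-ingress.yaml <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: legacy-ingress
EOF

# A few API groups removed in past releases -- extend for your target version
deprecated='extensions/v1beta1|networking.k8s.io/v1beta1|policy/v1beta1'

# -R recurse, -E extended regex, -n show line numbers of each hit
grep -REn "apiVersion: *(${deprecated})" /tmp/k8s-scan
```

Any hits are manifests that will stop applying cleanly after the API removal, so they're worth fixing before, not after, the control plane moves.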
Yeah, it sounds like planning and early testing are key to avoiding mishaps.
I had an upgrade go wrong with Harvester when Longhorn crashed mid-update, and it ended in a complete reinstallation. Part of the problem was someone else making changes mid-upgrade without telling anyone, so always make sure everyone’s on the same page before starting an upgrade!
Yikes, that sounds stressful! Were there any signs or warnings beforehand that something might go wrong?
In my experience with EKS, the main issues arose from not having all add-ons updated before the cluster upgrade, which caused the control plane to become unresponsive. Always ensure your add-ons are ready to roll to prevent headaches later!
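For reference, the add-on readiness check can be done from the AWS CLI before touching the control plane. A rough sketch (the cluster name `prod` and the target version are placeholders for your own values):

```shell
CLUSTER=prod
TARGET_VERSION=1.29

# What add-ons are installed on the cluster right now?
aws eks list-addons --cluster-name "$CLUSTER"

# For each core add-on, which versions are compatible with the target
# Kubernetes version? Upgrade these first if they're behind.
for addon in vpc-cni coredns kube-proxy; do
  aws eks describe-addon-versions \
    --addon-name "$addon" \
    --kubernetes-version "$TARGET_VERSION" \
    --query 'addons[].addonVersions[].addonVersion'
done
```

Comparing the installed versions against the compatible list for the target release is exactly the kind of check that prevents the "control plane upgraded, add-ons broken" situation described above.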
Totally agree. I often run into issues with EKS add-ons and a patchy upgrade path if I'm not cautious.
Right! It’s all about version compatibility with add-ons and plugins.
I’ve had few major issues upgrading my AKS cluster. We use automatic upgrades, and they’ve been reliable. It’s still important to read the release notes, and keeping everything in infrastructure as code helps ensure no component gets missed during an upgrade.
Sounds smart! Keeping everything in IaC should help avoid surprises.
Exactly! Always back up everything before a major upgrade, just in case.
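Velero is one common option for that pre-upgrade backup (this assumes Velero is already installed and configured against your cluster; the backup name is just a convention):

```shell
# Take a full cluster backup and wait for it to finish before upgrading
velero backup create "pre-upgrade-$(date +%Y%m%d)" --wait

# Confirm the backup completed successfully
velero backup get
```

Having a named, dated backup on hand turns a failed upgrade from a crisis into a restore.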
For my last upgrade, I set up a new cluster running the latest version in parallel with the old one. This way, I could deploy all my resources and test everything without risking downtime. If anything went wrong with the new cluster, I could quickly switch back. This has worked well for us since we use external persistence, allowing me to migrate with minimal hassle.
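The parallel-cluster approach described above boils down to a blue/green switch at the cluster level. A minimal sketch, assuming two kubeconfig contexts named `old-cluster` and `new-cluster` (names are hypothetical) and externally hosted persistence:

```shell
# Deploy the full set of resources to the new cluster
kubectl config use-context new-cluster
kubectl apply -f manifests/

# Smoke-test workloads on the new cluster before any traffic moves
kubectl get pods -A

# Cut over by pointing traffic (DNS / load balancer) at the new cluster;
# rollback is simply pointing traffic back at old-cluster
```

Because state lives outside the cluster, the switch and the rollback are both just a traffic change, which is what keeps the downtime risk low.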

That makes sense! I've found that managed Kubernetes often makes upgrades feel less risky.