I'm trying to upgrade RabbitMQ from version 3.12.11 to 3.13.7 using Helm, and the diff is far bigger than I expected: about 1,700 additions and 750 deletions across more than 50 files, even though this is only a minor version upgrade. I mainly replaced the old chart and image references with the new versions and kept the values.yaml file close to the original. I deployed this to a Sandbox environment, and now my app is failing, possibly because of Rabbit's STOMP WebSocket plugin, among other things. How do others approach upgrades with this many changes? Do you go through every change line by line, or do you have a more systematic process? And how do you troubleshoot failures like these? It feels overwhelming, and I can't tell whether I'm overthinking it or missing a better approach.
5 Answers
For me, it usually depends on the service. I check the relevant changes in the diff and read the release notes for context. After that, I deploy everything to a dev cluster to verify that it works as expected, because even popular charts can have regressions that slip through.
One solid tip I can share is to build a test environment that mirrors the setup you're upgrading. Install the old version of the chart first, then practice the upgrade until you get it right. If something goes wrong, you can tear down the test environment and try again. It's also crucial to record the terminal commands you run, so the real upgrade is repeatable.
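To make that rehearsal loop concrete, here is a minimal sketch that only builds and prints the three helm commands (install old, upgrade to new, tear down) so they double as a written record. The release name, chart reference, namespace, and version strings are placeholders I made up, not values from this thread. Keep in mind that `--version` pins the chart version, which is usually not the same number as the RabbitMQ version.

```python
import shlex


def rehearsal_commands(release, chart, old_version, new_version,
                       namespace="rabbit-rehearsal", values_file="values.yaml"):
    """Return the helm commands for one rehearsal run, in order."""
    common = ["--namespace", namespace, "--values", values_file, "--wait"]
    return [
        # 1. Install the chart version you are running today.
        ["helm", "install", release, chart, "--version", old_version,
         "--create-namespace", *common],
        # 2. Upgrade in place to the target chart version.
        ["helm", "upgrade", release, chart, "--version", new_version, *common],
        # 3. Remove the release. PVCs created by a StatefulSet may remain,
        #    so delete them (or the namespace) separately before retrying.
        ["helm", "uninstall", release, "--namespace", namespace],
    ]


if __name__ == "__main__":
    # Placeholder arguments: substitute your own release, chart, and chart versions.
    for argv in rehearsal_commands("rabbitmq", "example-repo/rabbitmq",
                                   "OLD_CHART_VERSION", "NEW_CHART_VERSION"):
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try. Once the sequence looks right, you could paste the lines into a runbook or pass each list to `subprocess.run`.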
This happens to me all the time too. Mitigating risk is key, and practicing in lower environments helps a ton!
I often use helm-diff to see the actual changes between chart versions. It also helps me maintain my own charts, even when official versions exist.
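When the diff runs to thousands of lines, it can help to rank files by how much they changed before reading anything in detail. The sketch below assumes you have plain unified diff text, for example from rendering both chart versions with `helm template --output-dir` and running `diff -ru` on the two directories. It does not parse helm-diff's own output format. The sample diff and file names are invented for illustration, and the header detection is a simple heuristic.

```python
from collections import Counter


def summarize_unified_diff(diff_text):
    """Count added and removed lines per file in unified diff text."""
    added, removed = Counter(), Counter()
    current = None
    lines = diff_text.splitlines()
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A "--- " line directly followed by a "+++ " line is a file header pair.
        if line.startswith("--- ") and nxt.startswith("+++ "):
            continue
        if line.startswith("+++ ") and i > 0 and lines[i - 1].startswith("--- "):
            current = line[4:].split("\t")[0]  # drop the optional timestamp
            continue
        if current is None:
            continue
        if line.startswith("+"):
            added[current] += 1
        elif line.startswith("-"):
            removed[current] += 1
    files = set(added) | set(removed)
    # Most-changed files first; ties are broken by file name.
    return sorted(((f, added[f], removed[f]) for f in files),
                  key=lambda row: (-(row[1] + row[2]), row[0]))


# Invented sample input, not taken from a real chart.
SAMPLE = """\
--- old/statefulset.yaml
+++ new/statefulset.yaml
@@ -1,3 +1,4 @@
 kind: StatefulSet
-  image: example:old
+  image: example:new
+  extra: line
--- old/service.yaml
+++ new/service.yaml
@@ -1,2 +1,2 @@
-  port: 1
+  port: 2
"""

if __name__ == "__main__":
    for name, plus, minus in summarize_unified_diff(SAMPLE):
        print(f"{name}: +{plus} -{minus}")
```

A ranking like this won't tell you what broke, but it narrows a 50-file diff down to the few templates worth reading closely.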
My upgrade process usually starts with the software changelog and release notes. I also look at the Helm chart's changelog or GitHub release notes. Once I know about any discrepancies or breaking changes, I promote the upgrade from dev to test and finally to production, testing thoroughly at each stage before the final rollout.
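For long release notes, a rough first pass is to flag only the lines that mention risky terms and read those first. This is a minimal sketch: the keyword list is my own guess at useful terms, and the sample notes are invented, not quoted from any real RabbitMQ or chart changelog.

```python
import re

# Hypothetical keyword list; extend it for the project you are upgrading.
KEYWORDS = ("breaking", "deprecat", "removed", "renamed", "no longer", "migration")
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def flag_risky_lines(notes):
    """Return (line_number, text) for each line that mentions a risky keyword."""
    return [(number, line.strip())
            for number, line in enumerate(notes.splitlines(), start=1)
            if _PATTERN.search(line)]


# Invented sample notes for illustration only.
SAMPLE_NOTES = """\
Improved startup time.
BREAKING: the example.port value was renamed to example.listener.port.
Fixed a typo in the README.
The legacy example flag is deprecated.
"""

if __name__ == "__main__":
    for number, text in flag_risky_lines(SAMPLE_NOTES):
        print(f"line {number}: {text}")
```

A keyword scan misses breaking changes that are described without any of these words, so it only prioritizes your reading and does not replace the staged rollout.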
I do the same! I’d even add that asking AI to summarize the diffs can help highlight areas you might want to focus on.
Totally agree! That deployment roadmap is crucial to ensure smooth transitions.
I stick to checking the changelog and performing tests on a staging environment before I consider going live. If the environment is highly sensitive, I prefer blue-green deployments. But if there aren't critical components involved, I just ensure there are no 'breaking change' alerts and get to it. It’s always smart to communicate with your team before you start upgrading.

Exactly! And don’t forget to ensure your test environment has the same networking configurations as your production setup. I've seen so many silent failures due to network issues.