I'm managing a production environment with about 100 pods and need advice on how to update services regularly with nearly zero downtime. Ideally, I'd like to have a separate environment to test new releases and features before they go live. I'm considering creating a second namespace to deploy the updates there and then switch traffic over to this new namespace. Any suggestions on the best approach? Thanks!
7 Answers
Definitely embrace automation! I advocate for versioned URLs for each app, plus a 'latest' URL. Deploy the new version alongside the old one and shift traffic over gradually. Some versions may need to stay pinned for compatibility, so make sure your infrastructure can serve both old and new versions during the rollout.
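To make the gradual rerouting concrete, here is a minimal Python sketch of a traffic-shifting schedule. The function name `traffic_schedule` and the 25% step are assumptions for illustration, not part of any particular router or mesh. In practice each pair would be applied as weights in whatever ingress or service mesh you use, with a health check between steps.

```python
def traffic_schedule(step=25):
    """Return (old_weight, new_weight) pairs that shift traffic in `step` percent increments."""
    if not 0 < step <= 100:
        raise ValueError("step must be between 1 and 100")
    schedule = []
    new = 0
    while new < 100:
        # Never overshoot 100% on the final step.
        new = min(100, new + step)
        schedule.append((100 - new, new))
    return schedule


for old_weight, new_weight in traffic_schedule(25):
    print(f"old={old_weight}% new={new_weight}%")
```

Pausing between steps gives you a chance to stop and send everything back to the old version if error rates climb.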
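It sounds like you're describing a blue/green deployment. Create a second Deployment with a few pods running the new version, then gradually scale the old version down while scaling the new one up. Make sure the new pods carry the labels that the Service's selector matches, otherwise they won't receive any traffic. (Strictly speaking, blue/green cuts all traffic over at once; shifting it gradually like this is closer to a canary rollout.)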
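The point about matching labels can be illustrated with a small Python sketch of how an equality-based Service selector picks pods. The label keys and values (`app`, `version`, `web`, `v1`, `v2`) are made up for the example.

```python
def selects(selector, pod_labels):
    """Equality-based selection: every selector key/value must appear in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


old_pod = {"app": "web", "version": "v1"}
new_pod = {"app": "web", "version": "v2"}

# A selector on `app` alone sends traffic to both versions while you scale them.
shared_selector = {"app": "web"}
# Adding `version` pins the Service to one side, which is the blue/green style switch.
pinned_selector = {"app": "web", "version": "v2"}

print(selects(shared_selector, old_pod), selects(shared_selector, new_pod))  # True True
print(selects(pinned_selector, old_pod), selects(pinned_selector, new_pod))  # False True
```

With the shared selector, traffic splits roughly in proportion to the number of ready pods on each side. With the pinned selector, changing the `version` value on the Service moves all traffic in one step.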
To achieve real zero downtime, your app has to handle shutdown signals gracefully: on SIGTERM it should stop accepting new requests and finish the in-flight ones before exiting. For near zero downtime, rolling updates can be very effective, along with pod disruption budgets (PDBs) and pod affinity or anti-affinity rules to control where replicas are scheduled. There are plenty of resources online discussing these strategies.
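As a rough illustration of graceful shutdown, here is a minimal Python sketch. Kubernetes sends SIGTERM to a container when its pod is being terminated. The names `request_shutdown` and `drain`, and the list standing in for a request queue, are assumptions for the example rather than part of any framework.

```python
import signal
import threading

# Set once the process has been asked to stop.
stop = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: only record the request; the work loop reacts to it."""
    stop.set()


def drain(pending):
    """Work through `pending` in order, taking no new item once shutdown is requested.

    Returns the items that were completed; unfinished items stay in `pending`.
    """
    finished = []
    while pending and not stop.is_set():
        item = pending.pop(0)
        finished.append(item)  # placeholder for real request handling
    return finished


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, request_shutdown)
    done = drain(["req-1", "req-2", "req-3"])
    print(f"completed {len(done)} requests, exiting cleanly")
```

If the process has not exited by the end of the pod's termination grace period (30 seconds by default), it is killed with SIGKILL, so cleanup needs to fit inside that window.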
Rolling upgrades really cover most use cases. They were a game changer when Kubernetes was first introduced!
This seems a bit complicated! A simpler approach is to use multiple replicas and stick with rolling updates. Make sure to isolate your testing environment completely beforehand.
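To see what a rolling update means for capacity, here is a small Python sketch of how the Deployment settings `maxSurge` and `maxUnavailable` bound the pod count during a rollout. Both default to 25%; a percentage for `maxSurge` is rounded up and a percentage for `maxUnavailable` is rounded down. The function name `rollout_bounds` and the replica counts are assumptions for the example.

```python
def rollout_bounds(replicas, max_surge_pct=25, max_unavailable_pct=25):
    """Return (min_available, max_total) pods during a rolling update."""
    # maxSurge percentages round up; integer arithmetic avoids float error.
    surge = -(-replicas * max_surge_pct // 100)
    # maxUnavailable percentages round down.
    unavailable = replicas * max_unavailable_pct // 100
    return replicas - unavailable, replicas + surge


print(rollout_bounds(100))        # (75, 125)
print(rollout_bounds(10))         # (8, 13)
print(rollout_bounds(10, 25, 0))  # (10, 13)
```

Setting `maxUnavailable` to 0, as in the last call, keeps full capacity throughout the rollout at the cost of temporarily running extra pods.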
Argo Rollouts is a great tool for this! It can handle most of the deployment process for you. Just set up the metrics correctly and it will automatically roll back if it detects issues with the new version.
Duplicating the environment can work too. Once you’re satisfied, you can switch the DNS record to direct traffic to the new version. Just be aware, this approach can get pricey. I utilize Istio for ingress and canary deployments, which can simplify things.
You're right about testing in a parallel environment! What you're envisioning with the second namespace is a kind of Blue/Green deployment approach. Check out some resources on it; they give a nice breakdown.
Also, readiness probes are crucial for large rollouts: a new pod only starts receiving Service traffic once its readiness probe passes, so unhealthy versions are kept out of rotation.
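For reference, here is a hedged Python sketch that builds a container spec with an HTTP readiness probe and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The container name, image, port, `/healthz` path, and timing values are all placeholders, and the app is assumed to expose such an endpoint.

```python
import json


def container_with_readiness(name, image, port, path="/healthz"):
    """Build a container spec whose readiness probe polls an HTTP endpoint."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        "readinessProbe": {
            "httpGet": {"path": path, "port": port},
            "initialDelaySeconds": 5,  # wait before the first check
            "periodSeconds": 10,       # check interval
            "failureThreshold": 3,     # consecutive failures before marking unready
        },
    }


# Hypothetical values for illustration only.
container = container_with_readiness("web", "example.com/web:2.0.0", 8080)
print(json.dumps(container, indent=2))
```

The resulting object belongs in the `containers` list of a Deployment's pod template; tune the timings to how long the app really takes to start.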