What’s the Best Way to Set Up Kubernetes for Disaster Recovery?

Asked By RobustPenguin88 On

Hi everyone! I'm diving into disaster recovery for my Kubernetes setup and facing some network issues with my local provider. I'm considering two approaches: should I go for two separate clusters in different availability zones (AZs) or have a single cluster with master nodes distributed across AZs? I'm leaning towards the two-cluster option to avoid etcd quorum issues, but I'm concerned about the challenge of keeping resources synchronized and managing databases, Vault, and Harbor effectively. Any advice?

5 Answers

Answered By ConnectingChameleon55 On

I have a setup with 7 master nodes: three in one AZ, three in another, and one on an EC2 instance that doesn't run workloads. With 7 etcd members the quorum is 4, so losing either AZ still leaves 4 voting members. As long as two sites don't drop at once, you shouldn't hit any etcd quorum issues.
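To make the quorum reasoning concrete, here is a minimal sketch of the arithmetic. It assumes each master runs one etcd member; the site names and the helper functions are made up for illustration and are not part of any Kubernetes or etcd tooling.

```python
def quorum(members: int) -> int:
    """Majority needed for a cluster of the given size."""
    return members // 2 + 1


def survives_site_loss(topology: dict[str, int]) -> dict[str, bool]:
    """For each site, report whether quorum holds if that site alone is lost."""
    total = sum(topology.values())
    needed = quorum(total)
    return {site: (total - count) >= needed for site, count in topology.items()}


# Hypothetical layout matching the 3 + 3 + 1 setup described above.
three_site = {"az-a": 3, "az-b": 3, "tiebreaker": 1}
print(quorum(sum(three_site.values())))
print(survives_site_loss(three_site))

# For comparison: three members over only two sites cannot survive
# the loss of the site holding two of them.
two_site = {"az-a": 2, "az-b": 1}
print(survives_site_loss(two_site))
```

The third site is what makes this work: without a tiebreaker, one of the two AZs always holds the majority, and losing it takes the control plane down.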

Answered By EasyGoingCactus42 On

A two-cluster setup sounds safer to me. Stretching etcd over unreliable connections is way too complicated. You could use GitOps tools like Argo CD or Flux to keep both clusters in sync, and for container images, set up replication between Harbor instances. For databases, rather than going active-active, you might want to set up asynchronous replicas or a backup-and-restore plan, depending on your RPO and RTO. Velero is excellent for backing up cluster resources and persistent volumes too. Just handle failover at the DNS or load balancer level, keep it straightforward, and make sure to test your cutover regularly!
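As a rough illustration of keeping failover straightforward, the sketch below models only the decision logic: switch to the standby cluster after several consecutive failed health checks, and never fail back automatically. The names `FailoverState` and `next_state` and the threshold of 3 are assumptions for the example; the actual DNS or load balancer update is left out because it depends on your provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverState:
    active: str = "primary"        # which cluster currently receives traffic
    consecutive_failures: int = 0  # failed probes in a row against the primary


def next_state(state: FailoverState, primary_healthy: bool,
               threshold: int = 3) -> FailoverState:
    """Advance the failover state by one health probe result."""
    if state.active != "primary":
        # Already failed over: stay put, fail back manually after verification.
        return state
    if primary_healthy:
        return FailoverState("primary", 0)
    failures = state.consecutive_failures + 1
    if failures >= threshold:
        return FailoverState("standby", 0)
    return FailoverState("primary", failures)


def run(probes: list[bool]) -> FailoverState:
    """Feed a sequence of probe results through the state machine."""
    state = FailoverState()
    for healthy in probes:
        state = next_state(state, healthy)
    return state


print(run([True, False, True, False, False]).active)        # brief blips only
print(run([True, False, False, False, True]).active)        # sustained outage
```

Requiring several failures in a row avoids flapping on a single dropped probe, and the manual fail-back gives you time to confirm that databases and Vault on the primary side have caught up before traffic returns.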

Answered By InquisitiveLion12 On
Answered By PonderingPanda99 On
Answered By CuriousTurtle77 On

Distributing your control plane across multiple AZs works if those zones have good bandwidth and low latency, plus an odd number of masters for quorum. This setup's pretty common with various cloud providers. Just avoid spreading your control plane across different regions. Keep in mind that the workloads you're managing and their storage needs will impact your decision. Synchronizing between two clusters can be tricky, but some applications, like Harbor, can do some heavy lifting here. You might also consider a global load balancer for easier failover.
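The point about an odd number of masters follows from the majority rule. A small sketch, assuming one etcd member per master (the function name is just for illustration):

```python
def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority still remains."""
    majority = members // 2 + 1
    return members - majority


# Print cluster size, majority needed, and tolerated failures.
for size in range(1, 8):
    print(size, size // 2 + 1, fault_tolerance(size))
```

Going from 3 to 4 members, or from 5 to 6, raises the majority needed without raising the number of failures tolerated, so the extra even member adds cost and replication traffic but no resilience.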
