I'm considering an upgrade for my EKS cluster from version 1.28 to 1.29 and need some advice. My setup includes a ClickHouse cluster managed by the Altinity Operator, consisting of 3 shards with 2 replicas each. Each ClickHouse pod runs on an i4i.2xlarge instance with locally attached NVMe storage, not EBS volumes.

My main worry is that node upgrades, which replace the underlying EC2 instances, will wipe the local storage and I could lose data. I've added an extra replica per shard as a precaution, but I'm unclear whether this replication is sufficient, since it operates per table, not per node.

I'm looking for insights on whether my setup could lead to data loss when upgrading the nodes, and on best practices for performing these upgrades safely, especially concerning the Altinity Operator's capabilities. Any experiences or best practices would be greatly appreciated!
3 Answers
You'd better double-check whether you're actually using local instance storage rather than EBS. If you are on instance-local NVMe, that storage will disappear when the nodes are recreated during the upgrade. Adding extra replicas helps, but make sure they're scheduled on different nodes, ideally ones with persistent storage. Unless you've got a solid backup process, expect some manual work here. Do you have the specifics of your storage configuration handy?
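A quick way to answer that question is to inspect both the Kubernetes side and ClickHouse itself. This is a sketch only: the namespace `clickhouse`, the pod name `chi-demo-cluster-0-0-0`, and the `clickhouse.altinity.com/chi` label are placeholders for whatever your Altinity installation actually uses.

```shell
# 1. See whether the ClickHouse pods use PVCs at all; instance-local NVMe is
#    often mounted via hostPath/emptyDir, which shows up as no PVC here.
kubectl get pvc -n clickhouse -l clickhouse.altinity.com/chi

# 2. If PVCs exist, check which storage class backs them
#    (gp2/gp3/io2 = EBS; a local-volume provisioner = instance NVMe).
kubectl get pvc -n clickhouse \
  -o custom-columns=NAME:.metadata.name,SC:.spec.storageClassName

# 3. Ask ClickHouse itself where its data directories live.
kubectl exec -n clickhouse chi-demo-cluster-0-0-0 -- \
  clickhouse-client --query "SELECT name, path, type FROM system.disks"
```

If step 1 returns nothing and `system.disks` shows paths on the NVMe mount, you're in the ephemeral-storage scenario the answers below describe.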
Testing your setup ahead of time seems crucial. If you haven't done that yet, it might be worth simulating the upgrade in a test environment first. It could save you a lot of headaches later!
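When you do run that simulation, the thing to verify before and after replacing a node is that every replicated table is healthy and fully caught up. A minimal check, with the pod name as a placeholder:

```shell
# Returns no rows when all replicated tables are in sync; any row means a
# replica is read-only or lagging and the node is not yet safe to replace.
kubectl exec -n clickhouse chi-demo-cluster-0-0-0 -- clickhouse-client --query "
  SELECT database, table, is_readonly, absolute_delay, queue_size
  FROM system.replicas
  WHERE is_readonly OR absolute_delay > 0 OR queue_size > 0"
```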
Without persistent storage, your only option for preventing data loss is replicas on different nodes. For the upgrade itself, I recommend draining and replacing nodes one at a time, monitoring replica sync status before moving on to the next node. It sounds tedious, but that's the safest route given your current setup.
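The one-node-at-a-time process above could be sketched roughly like this. Node names, the namespace, and the pod label are all assumptions to adapt; the instance-replacement step depends on whether a managed node group rolls the node for you or you terminate it yourself.

```shell
for node in ip-10-0-1-10.ec2.internal ip-10-0-2-11.ec2.internal; do
  # Stop new pods landing on the node, then evict the existing ones.
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m

  # Replace the EC2 instance here (node group rollout or manual terminate),
  # then wait for the rescheduled ClickHouse pods to become Ready.
  kubectl wait --for=condition=Ready pod -n clickhouse \
    -l clickhouse.altinity.com/chi --timeout=15m

  # Before touching the next node, confirm the fresh replica has re-fetched
  # its parts and nothing in system.replicas is read-only or lagging.
done
```

The key point is the pause between iterations: a replica rebuilt onto empty NVMe has to re-fetch all its data from its sibling, and draining that sibling before the fetch completes is exactly how you lose a shard.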
