I'm gearing up to migrate a large Elasticsearch cluster that's currently handling over 100 million documents in production. The old setup is a single-node cluster with around 200 shards, and I'm moving to a new three-node setup to improve performance and reliability. I'm particularly keen on any real-world DevOps lessons that can help me avoid mistakes, especially around monitoring and observability, since the migration might take hours and I want to ensure no data loss. My high-level plan is to use snapshot and restore to minimize impact, reindex in the new cluster, and dual-write to both clusters as a sanity check. I'm looking for insights on operational risks I might be underestimating, how to monitor progress effectively, and any essential signals or tools you've found invaluable during such migrations. If you've gone through something similar, what would you do differently?
4 Answers
During the migration, treat cluster health as your primary signal. You can set up a reverse proxy and add your old node to the new cluster so that shards relocate onto the new nodes. Once the migration is underway, monitor shard distribution closely to keep pressure off the old node. If you're operating under heavy load, this can get tricky.
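The health and shard-distribution checks mentioned above could be sketched roughly as follows. This is a minimal sketch, not a complete monitoring setup: it assumes the standard `_cluster/health` and `_cat/shards?format=json` endpoints are reachable over plain HTTP without authentication, and the function names and the base URL passed to `poll_once` are placeholders.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_json(base_url, path):
    """GET a JSON document from an Elasticsearch HTTP endpoint (no auth assumed)."""
    with urlopen(base_url.rstrip("/") + path, timeout=10) as resp:
        return json.load(resp)


def shards_per_node(cat_shards):
    """Count STARTED shards per node from `_cat/shards?format=json` rows."""
    counts = Counter()
    for row in cat_shards:
        if row.get("state") == "STARTED":
            counts[row.get("node")] += 1
    return dict(counts)


def health_warnings(health):
    """Turn a `_cluster/health` response into a list of warning strings."""
    warnings = []
    if health.get("status") != "green":
        warnings.append(f"status is {health.get('status')}")
    for key in ("unassigned_shards", "relocating_shards", "initializing_shards"):
        if health.get(key, 0) > 0:
            warnings.append(f"{key}={health[key]}")
    return warnings


def poll_once(base_url):
    """Fetch both endpoints once; base_url is an assumption, e.g. the proxy address."""
    health = fetch_json(base_url, "/_cluster/health")
    shards = fetch_json(base_url, "/_cat/shards?format=json")
    return health_warnings(health), shards_per_node(shards)
```

Calling `poll_once` on a timer and logging the result gives a running picture of how many shards still sit on the old node and whether anything is unassigned.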
Keep in mind that version compatibility can throw a wrench into the works. Migrations can fail because of network issues or because your application is incompatible with the new Elasticsearch version. Check both before you start so you're ready for potential snags.
Before any migration, always take a complete backup. This gives you peace of mind should things go sideways. I’d also recommend stopping activity on the cluster, writes in particular, so nothing changes during the migration. The process can be smoother if you start with a full VM backup, just to ensure you don't lose anything.
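As one way to picture the "freeze, then back up" step, the sketch below only builds the HTTP requests and does not send them. It assumes a snapshot repository has already been registered; the repository, snapshot, and index names are hypothetical. The `index.blocks.write` setting and the `PUT /_snapshot/<repo>/<snapshot>` endpoint are standard Elasticsearch APIs.

```python
import json


def write_block_request(index, blocked=True):
    """Build the settings update that toggles a write block on one index."""
    return ("PUT", f"/{index}/_settings", {"index.blocks.write": blocked})


def snapshot_request(repo, snapshot, indices):
    """Build a snapshot request; the repository must already be registered."""
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=false"
    body = {"indices": ",".join(indices), "include_global_state": False}
    return ("PUT", path, body)


def describe(request):
    """Render a request tuple as text for a runbook or a dry-run log."""
    method, path, body = request
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


# Hypothetical names, for illustration only.
plan = [
    write_block_request("orders-2024"),
    snapshot_request("migration_repo", "pre_migration_1", ["orders-2024"]),
]
for step in plan:
    print(describe(step))
```

Printing the plan first makes it easy to review the exact calls before running them, and the same builder with `blocked=False` produces the request that lifts the write block afterwards.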
If you’re just scaling up to a 3-node cluster, it's pretty straightforward. Deploy your new cluster first, and set up a new ingester that pushes data to both clusters. Once that's running smoothly, you can gradually migrate old data over to the new setup. Just keep an eye on the load to avoid stressing your old node too much during the process.
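A dual-write ingester along these lines could look like the sketch below. It is illustrative only: the two sinks are plain callables standing in for whatever client indexes a document into each cluster, and `DualWriter` and `count_mismatches` are made-up names. The assumption here is that the old cluster stays the source of truth, so a failure there is raised, while a failure on the new cluster is recorded for later replay.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sink = Callable[[dict], None]


@dataclass
class DualWriter:
    primary: Sink      # old cluster: errors propagate to the caller
    secondary: Sink    # new cluster: errors are recorded, not raised
    failed: List[dict] = field(default_factory=list)

    def write(self, doc: dict) -> None:
        self.primary(doc)
        try:
            self.secondary(doc)
        except Exception:
            # Keep the document so it can be replayed into the new cluster.
            self.failed.append(doc)


def count_mismatches(old_counts: Dict[str, int],
                     new_counts: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {index: (old, new)} for every index whose doc counts differ."""
    result = {}
    for index in sorted(set(old_counts) | set(new_counts)):
        old, new = old_counts.get(index, 0), new_counts.get(index, 0)
        if old != new:
            result[index] = (old, new)
    return result
```

Comparing per-index document counts from both clusters at intervals, for example with `count_mismatches`, is a cheap first check that the backfill and the dual writes are converging; it does not prove the documents themselves are identical.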
