Looking for Advice on Migrating a Large Elasticsearch Cluster with 400 Million Documents

Asked By TechyNinja42

Hey everyone! I'm in the process of migrating a huge Elasticsearch cluster and could really use your insights. After taking a closer look at the data, I found I have about **400 million documents** spread across multiple indices, which is way more than I initially estimated.

Currently, I have everything set up on a single node running Elasticsearch **8.16.6** with about **200 shards**, and managing this has been quite a hassle. To improve things, I've set up a new cluster consisting of **3 nodes** (1 master and 2 data nodes). I'm thinking about switching to a setup with **3 master/data nodes** instead of having one dedicated master. Given the volume of data and anticipated growth, what do you think is best?

Here's my migration plan:
1. Take a snapshot from the old cluster.
2. Restore it on a temporary machine.
3. From that temporary setup, reindex into the new cluster with a new index design and adjusted shard and replica counts (roughly sketched below).
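Concretely, here's roughly what I'm planning to run for steps 1 and 3. The hosts, repository name, and index names are placeholders for my real ones, and auth/TLS is omitted:

```python
import requests

OLD = "http://old-host:9200"   # current single-node cluster (placeholder)
NEW = "http://new-host:9200"   # new 3-node cluster (placeholder)

# Step 1: snapshot everything on the old cluster into a pre-registered
# repository ("my_repo" is a placeholder; it has to be registered first
# with PUT _snapshot/my_repo).
requests.put(
    f"{OLD}/_snapshot/my_repo/migration_snap",
    params={"wait_for_completion": "false"},
    json={"indices": "*", "include_global_state": False},
).raise_for_status()

# Step 3: once the snapshot is restored on the temp machine, the new
# cluster pulls from it with a remote reindex. The temp host must be
# listed in reindex.remote.whitelist on the new cluster's nodes.
requests.post(
    f"{NEW}/_reindex",
    params={"wait_for_completion": "false"},
    json={
        "source": {
            "remote": {"host": "http://temp-host:9200"},  # temp restore box
            "index": "old_index",                         # placeholder name
            "size": 1000,                                 # scroll batch size
        },
        "dest": {"index": "new_index"},                   # placeholder name
    },
).raise_for_status()
```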

Does this strategy sound reasonable? Also, I'd love advice on:
- How to determine the ideal number of shards and replicas for such a large dataset.
- How to monitor a long migration like this effectively.
- Recommendations for making the cluster easier to scale in the future.

Thanks a lot for any tips you can share!

2 Answers

Answered By DataGuru88

It sounds like you're tackling a sizable challenge! For the migration, I wouldn't over-engineer around 'escaping old sharding decisions': Elasticsearch handles shard allocation on its own once the cluster is up and healthy. Focus instead on making sure your nodes have adequate CPU and RAM for this data volume. If performance and stability are the goals, I'd add two more nodes to the current setup and let Elasticsearch redistribute the shards rather than locking yourself into an overly complicated topology.
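If you go that route, it's easy to watch the rebalance from the outside. A quick sketch (host is a placeholder) using the allocation and cluster-health endpoints:

```python
import requests

NEW = "http://new-host:9200"  # placeholder

# Shard counts and disk use per node; handy for watching shards spread
# out after new nodes join.
print(requests.get(f"{NEW}/_cat/allocation", params={"v": "true"}).text)

# relocating_shards drops back to 0 once rebalancing is finished.
health = requests.get(f"{NEW}/_cluster/health").json()
print(health["status"], health["relocating_shards"])
```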

On the sharding front, consider around 3 to 5 primary shards per index; going much higher adds per-shard overhead without much benefit. For replicas, 1 replica is generally sufficient unless you have specific availability or read-throughput needs.
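For example, creating an index with those settings looks roughly like this (the index name and host are placeholders):

```python
import requests

# "events_v2" and the host are placeholders for your own names.
requests.put(
    "http://new-host:9200/events_v2",
    json={
        "settings": {
            "index": {
                "number_of_shards": 3,    # fixed at index creation
                "number_of_replicas": 1,  # adjustable later at any time
            }
        }
    },
).raise_for_status()
```

Note that the primary shard count can't be changed after creation (short of `_split` or `_shrink`), while replicas can be toggled at any time, so it's worth getting the primaries right up front. A common rule of thumb is to size individual shards into the tens-of-GB range.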

For monitoring a lengthy reindex, a small custom tool that polls the tasks API can really help you keep an eye on progress. Overall, I think simplifying roles and running all of your nodes as combined master/data nodes will be more efficient.
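As a sketch of what such a tool might look like (host is a placeholder), you can poll the tasks API for running reindex tasks and print their progress:

```python
import time

import requests

NEW = "http://new-host:9200"  # placeholder

# Poll for running reindex tasks until none are left.
while True:
    resp = requests.get(
        f"{NEW}/_tasks",
        params={"actions": "*reindex", "detailed": "true"},
    ).json()
    nodes = resp.get("nodes", {})
    if not nodes:
        break  # no reindex tasks running anymore
    for node in nodes.values():
        for task_id, task in node["tasks"].items():
            status = task["status"]
            print(f"{task_id}: {status['created']}/{status['total']} docs")
    time.sleep(30)
```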

CuriousDeveloper -

Thanks for your input! I appreciate the clarity on shards and replicas. So just to confirm, it makes sense to set the shards based on the index size and not stress too much about the exact number upfront? And your point about using custom tools for monitoring is super helpful. Any specifics you’d recommend for those tools?

Answered By CloudTraveler99

I actually went through a similar migration a while back! Here's the rundown:
1. I took a snapshot of the old cluster to an S3 repository.
2. I restored that snapshot into the new cluster with the restored indices set to read-only (one way to do that is sketched below).
3. Once everything was verified, I took another snapshot with full access.
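For step 2, the restore API lets you override index settings on the way in, which is one way to apply that write block. A rough sketch, with the host, repository, and snapshot names as placeholders:

```python
import requests

NEW = "http://new-host:9200"  # placeholder

# Restore while overriding index settings so the restored indices
# come up write-blocked ("my_repo"/"snap_1" are placeholders).
requests.post(
    f"{NEW}/_snapshot/my_repo/snap_1/_restore",
    json={
        "indices": "*",
        "index_settings": {"index.blocks.write": True},
    },
).raise_for_status()
```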

Just keep in mind that a restore preserves each index's original settings, including its primary shard count, so if you restore directly into the new cluster you'll carry the old sharding problems over. If the goal is a new index layout, restore under temporary names and then reindex into freshly created indices (see the sketch below).
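A hedged sketch of that pattern, with host, repository, snapshot, and index names as placeholders:

```python
import requests

NEW = "http://new-host:9200"  # placeholder

# Restore under temporary names so nothing collides with the
# redesigned indices.
requests.post(
    f"{NEW}/_snapshot/my_repo/snap_1/_restore",
    json={
        "indices": "*",
        "rename_pattern": "(.+)",
        "rename_replacement": "restored_$1",
    },
).raise_for_status()

# Then reindex locally into indices created with the new shard counts.
requests.post(
    f"{NEW}/_reindex",
    params={"wait_for_completion": "false"},
    json={
        "source": {"index": "restored_old_index"},  # placeholder
        "dest": {"index": "new_index"},             # placeholder
    },
).raise_for_status()
```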

TechyNinja42 -

That’s great to hear you had success! But I worry that if I restore to the new cluster, I could end up with the same old bad design, right? What’s your take on that?
