Cost-effective Data Handling with Elasticsearch and S3

Asked By CuriousCat2023 On

Hey folks! I'm facing a major challenge with our UI, which pulls data directly from Elasticsearch: the cluster is costing us about $110,000 per month! We have around 200TB of AWS storage allocated, of which 130TB is already in use.

We've realized that we've been indexing way too many fields, most of which we don't actually need. So, to cut costs, we're planning to index only the essential fields for UI filtering, which we estimate will reduce our data size by about 90%.
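As a rough illustration of indexing only the essential fields, here is a minimal Python sketch. The field names, their types, and the `s3_key` pointer field are all hypothetical, since the post does not list the actual fields. It builds an index body with `"dynamic": False`, so fields outside the mapping are not indexed, and trims each document before it is sent, so the unused fields are not stored in Elasticsearch either.

```python
from typing import Any, Dict, Iterable


def build_index_body(field_types: Dict[str, str]) -> Dict[str, Any]:
    """Build an index-creation body that maps only the given fields.

    With "dynamic": False, fields missing from the mapping are not indexed
    or searchable, although they would still be kept in _source if sent.
    """
    return {
        "mappings": {
            "dynamic": False,
            "properties": {
                name: {"type": es_type} for name, es_type in field_types.items()
            },
        }
    }


def slim_document(
    full_doc: Dict[str, Any], essential_fields: Iterable[str], s3_key: str
) -> Dict[str, Any]:
    """Keep only the filterable fields plus a pointer to the full JSON in S3.

    `s3_key` is a hypothetical field name; it is not in the mapping, so it is
    retrievable from _source but not searchable.
    """
    slim = {name: full_doc[name] for name in essential_fields if name in full_doc}
    slim["s3_key"] = s3_key
    return slim


# Hypothetical fields a UI might filter on.
ESSENTIAL = {"status": "keyword", "customer_id": "keyword", "created_at": "date"}

index_body = build_index_body(ESSENTIAL)
example = slim_document(
    {"status": "open", "customer_id": "c-1", "created_at": "2024-01-01", "payload": {"x": 1}},
    ESSENTIAL.keys(),
    "docs/c-1/0001.json",
)
```

Whether this reaches the estimated 90% reduction depends on how much of the current footprint comes from the dropped fields, so it is worth measuring on a sample index first.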

The new approach is to keep complete JSON documents in S3. The plan is as follows:
- When a user applies filters, we fetch the necessary data from Elasticsearch.
- When they want to see the full dataset, we retrieve it from S3.
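The two-step read path above could be sketched roughly as follows in Python. This is an illustration under assumptions, not the poster's implementation: each indexed document is assumed to carry a hypothetical `s3_key` field pointing at its full JSON, and the Elasticsearch and S3 clients are passed in as plain callables (`search` and `get_object` are hypothetical wrappers) so the logic does not depend on a particular client version.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical wrappers around the real clients:
#   search(index, body)     -> parsed Elasticsearch search response
#   get_object(bucket, key) -> raw bytes of the S3 object
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]
GetObjectFn = Callable[[str, str], bytes]


def build_filter_query(filters: Dict[str, Any], size: int = 50) -> Dict[str, Any]:
    """Exact-match filters in filter context, returning only the S3 pointer."""
    return {
        "size": size,
        "_source": ["s3_key"],
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
    }


def find_s3_keys(search: SearchFn, index: str, filters: Dict[str, Any]) -> List[str]:
    """Step 1: ask Elasticsearch which documents match the UI filters."""
    response = search(index, build_filter_query(filters))
    return [hit["_source"]["s3_key"] for hit in response["hits"]["hits"]]


def load_full_document(get_object: GetObjectFn, bucket: str, key: str) -> Dict[str, Any]:
    """Step 2: fetch the complete JSON document from S3 on demand."""
    return json.loads(get_object(bucket, key))
```

With boto3, `get_object` could wrap `s3.get_object(Bucket=bucket, Key=key)["Body"].read()`. Fetching one S3 object per document adds a round trip, so this fits a detail view better than a list that needs full documents for every row.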

Currently, we handle around 700,000 calls to Elasticsearch each month. I'm curious, does this approach sound reasonable? Any insights would be really helpful!

3 Answers

Answered By TechGuru91 On

Limiting your indexed fields to the essentials sounds like a solid plan, especially considering your high costs! Think about how the field reduction will change your cluster sizing, since a much smaller index may need fewer or smaller nodes. Using S3 is a budget-friendly move, but remember that it won't offer the same query performance. Is there a specific part of your data set that's more important for searches, or do you need to search through everything every time?

Answered By CloudWhiz77 On

If you're on AWS’s managed OpenSearch, check out the remote store feature. It allows primary data to remain on disk while keeping a copy on S3. It could be a great hybrid approach for you! Here's a detailed link: [AWS OpenSearch Remote Store Feature](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/or1.html).

Answered By DataDude88 On

With your setup, if you're mainly doing basic exact-match filtering in the UI, consider whether you really need Elasticsearch's full-text search. If your read throughput is low, a database might be cheaper, since you'd mostly be paying for storage. Is your write throughput also on the lower side?
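For a rough sense of the read throughput in question, the 700,000 calls per month from the original post can be converted to an average request rate. The sketch below assumes a 30-day month and evenly spread traffic, neither of which is stated in the post; real traffic will have peaks well above the average.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def average_requests_per_second(
    calls_per_month: float, seconds: int = SECONDS_PER_MONTH
) -> float:
    """Average request rate if calls were spread evenly over the month."""
    return calls_per_month / seconds


rate = average_requests_per_second(700_000)
print(f"{rate:.2f} requests/second on average")  # about 0.27
```

At well under one request per second on average, the read load itself is modest, which is why the cost question here comes down mostly to storage and cluster size.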
