Best Solutions for Storing and Searching Large Data Sets

Asked By DataDynamo99 On

Hey everyone! I'm looking for recommendations on how to store and efficiently search a very large data set. We have a flattened table of around 300 million records with nearly 50 columns. Our main requirement is fuzzy text search on specific fields while sustaining a high number of queries per second (QPS). Response times need to suit synchronous API calls, ideally between 200 ms and 1 s.

Initially, we considered loading the data into RDS Aurora (MySQL, r6g.xlarge), but I'm unsure how well it would handle that volume, especially with index maintenance. I also thought about DynamoDB, but the fuzzy search requirement seems to make it impractical. I'm now leaning towards OpenSearch Serverless. Has anyone dealt with a similar scenario? We expect updates to this table to be infrequent, maybe once a month at most.

4 Answers

Answered By CloudNavigator88 On

OpenSearch seems perfect for your situation! You have heavy read/query traffic and infrequent updates, and it handles fuzzy text search well. Aurora could work in theory, but you'd likely run into problems with very large full-text indexes and slowdowns when handling many concurrent fuzzy searches. OpenSearch Serverless really feels like the way to go, but estimate costs and performance up front to avoid surprises later on.
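To make the fuzzy-search point concrete, here is a minimal sketch of what a typo-tolerant query looks like in the OpenSearch Query DSL, built as a plain Python dict. The field name `company_name` and the tuning values are hypothetical; `fuzziness`, `prefix_length`, and `max_expansions` are standard options of the `match` query, but the right values depend on your data and latency budget.

```python
import json


def build_fuzzy_query(field: str, text: str, size: int = 10) -> dict:
    """Build a Query DSL body for a typo-tolerant match on a single field."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" scales the allowed edit distance with term length:
                    # exact match for 1-2 chars, 1 edit for 3-5, 2 edits for 6+.
                    "fuzziness": "AUTO",
                    # Require the first character to match exactly; this prunes
                    # the candidate terms and helps keep latency down.
                    "prefix_length": 1,
                    # Cap how many fuzzy variants each query term expands to.
                    "max_expansions": 50,
                }
            }
        },
    }


if __name__ == "__main__":
    # "company_name" is a hypothetical field; the misspelling is deliberate.
    body = build_fuzzy_query("company_name", "Acme Corportion")
    print(json.dumps(body, indent=2))
```

The resulting body would be sent to the index's `_search` endpoint, for example with the opensearch-py client's `client.search(index=..., body=body)`. Raising `prefix_length` or lowering `max_expansions` trades some recall for lower latency, which matters at high QPS.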

Answered By TechGuru33 On

Your instincts are spot on! RDS Aurora might look feasible, but the overhead of indexing at that scale typically outweighs its benefits, especially since MySQL's built-in full-text search has no native typo-tolerant (fuzzy) matching. OpenSearch tends to be the go-to for setups with heavy read traffic and infrequent bulk updates. Just remember to model your index carefully and set up your analyzers to keep query latency in check!
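As a rough illustration of "model your index and set up your analyzers", here is a minimal sketch of a create-index body, assuming only some of the ~50 columns need to be searchable. The analyzer name `folded_text` and the field names are hypothetical. The analyzer lowercases and strips accents so that, for example, "Café" and "cafe" produce the same token, and `"dynamic": false` keeps undeclared columns in `_source` without indexing them. Shard and replica counts are deliberately left out of this sketch.

```python
import json

# Hypothetical analyzer name; adjust the filters to how your data is entered.
ANALYZER = "folded_text"


def build_index_body(text_fields: list[str], keyword_fields: list[str]) -> dict:
    """Build a create-index body: a custom analyzer for fuzzy-searched fields,
    plain keyword mappings for exact-match/filter fields."""
    properties: dict = {}
    for name in text_fields:
        properties[name] = {
            "type": "text",
            "analyzer": ANALYZER,
            # Unanalyzed copy for exact matches, sorting, and aggregations.
            "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
        }
    for name in keyword_fields:
        properties[name] = {"type": "keyword"}
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    ANALYZER: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase, then fold accented characters to ASCII.
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            # Undeclared columns stay in _source but are not indexed,
            # which keeps the index smaller than mapping all 50 columns.
            "dynamic": False,
            "properties": properties,
        },
    }


if __name__ == "__main__":
    # Hypothetical columns: one fuzzy-searched, one used only for filtering.
    print(json.dumps(build_index_body(["company_name"], ["country_code"]), indent=2))
```

With opensearch-py this body could be passed to `client.indices.create(index=..., body=...)`. Indexing only the fields you actually search is usually the biggest lever on index size at 300 million documents.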

Answered By InfoSeeker72 On

You're definitely on the right track thinking about OpenSearch. With that many records, managing indexes in RDS could become a nightmare, and Aurora may not give you the performance or ease of maintenance you're looking for. DynamoDB's lack of built-in fuzzy search makes it a no-go. OpenSearch could give you the performance you need while keeping your operational burden manageable.
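With updates only about once a month, one way to keep the refresh simple is to reload the data through the `_bulk` API, which takes newline-delimited JSON: an action line followed by the document, with every line (including the last) ending in a newline. The sketch below builds those request bodies in batches. The index name, the `id` primary-key column, and the batch size are assumptions; setting `_id` from a stable key means a reload overwrites documents instead of duplicating them.

```python
import json
from typing import Iterable, Iterator


def bulk_payloads(
    index: str, rows: Iterable[dict], id_field: str, batch_size: int = 1000
) -> Iterator[str]:
    """Yield NDJSON bodies for the _bulk API, each holding at most batch_size docs."""
    lines: list[str] = []
    for row in rows:
        # Action line: "index" creates the document or replaces it if the _id exists.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row[id_field])}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
        if len(lines) >= 2 * batch_size:
            # The bulk body must end with a trailing newline.
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical rows standing in for the flattened table.
    sample = [{"id": i, "company_name": f"Company {i}"} for i in range(3)]
    for body in bulk_payloads("records", sample, "id", batch_size=2):
        print(body, end="")
```

In practice the opensearch-py client's `helpers.bulk` can handle batching and retries for you; the hand-built version just shows what goes over the wire.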

Answered By DataWhisperer21 On

Considering the scale you're dealing with, managed OpenSearch (a provisioned domain rather than Serverless) could improve performance and save costs compared to RDS, which tends to struggle under heavy search loads. The main downside is a somewhat higher operational workload, since you have to size and tune the cluster yourself, but it should pay off in the long run.
