Best Database Solution for Large Data Sets with Fuzzy Search?

Asked By DataDynamo92 On

Hey everyone,

I'm looking for input on the best way to store and search through a massive amount of data. We've got about 300 million records in a flattened table structure with around 50 columns. Our main requirements are fuzzy text search on some fields and a high number of queries per second, with response times between 200 ms and 1 s.

Initially, I thought about using RDS Aurora (MySQL, r6g.xlarge), but I'm concerned about how well it will handle such large data volumes, especially with massive index sizes and maintenance challenges.

DynamoDB seemed like a good choice, but it can't support the fuzzy search we need. Now I'm leaning towards OpenSearch in a serverless setup.

Has anyone tackled a similar situation? We don't expect the data to be updated very often, maybe just once a month.

4 Answers

Answered By LatencyMaster On

Considering your needs, OpenSearch is probably the best bet. With 300 million records and high query demands, it will likely give you better performance and cost-effectiveness than RDS, which could become costly and complicated to manage under heavy workloads. Definitely weigh the pros and cons, but OpenSearch sounds like the smart choice!

Answered By FuzzyFinder88 On

You're on the right track with OpenSearch! Given your criteria (mostly reads, infrequent updates, and the need for fuzzy text search), RDS may overcomplicate things. Managing large indexes in Aurora could lead to slow and unpredictable performance. OpenSearch can meet your latency goals, especially if you design the index carefully and only index the fields you actually need to search.
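To make "keep your search fields focused" a bit more concrete, here is a minimal sketch of an index mapping and a fuzzy `match` query, written as plain Python dicts that could be sent to OpenSearch as JSON. The field names (`full_name`, `company`, `country_code`) are hypothetical, since the question doesn't name its columns, and the `fuzziness` and `prefix_length` values are starting points to tune, not recommendations.

```python
import json

# Hypothetical column names; the original post does not list its 50 columns.
FUZZY_FIELDS = ["full_name", "company"]
FILTER_FIELDS = ["country_code"]


def build_mapping(fuzzy_fields, filter_fields):
    """Build an index body that analyzes only the fuzzy-searched fields."""
    properties = {}
    for name in fuzzy_fields:
        # Analyzed text: eligible for fuzzy matching.
        properties[name] = {"type": "text"}
    for name in filter_fields:
        # Exact-match only: cheap to index and to filter on.
        properties[name] = {"type": "keyword"}
    # "dynamic": False leaves unlisted columns in _source without indexing them.
    return {"mappings": {"dynamic": False, "properties": properties}}


def build_fuzzy_query(field, text, size=10):
    """Build a match query that tolerates small typos in `text`."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",  # allowed edit distance scales with term length
                    "prefix_length": 1,   # first character must match exactly
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(FUZZY_FIELDS, FILTER_FIELDS), indent=2))
    print(json.dumps(build_fuzzy_query("full_name", "jonh smith"), indent=2))
```

With roughly 50 columns, indexing only the handful that need fuzzy search keeps the index smaller, and a non-zero `prefix_length` reduces how many candidate terms each fuzzy query has to consider.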

Answered By QueryKing007 On

OpenSearch is definitely well-suited for your needs. Given that you have a high read-to-write ratio, and with OpenSearch supporting fuzzy searches, it seems like a great fit. Using Aurora might be a headache due to the need for extensive full-text indexes and potential performance issues under load. I’d recommend going with OpenSearch Serverless. You'll have somewhat less control compared to traditional setups, but the performance and cost benefits could be worth it.

Answered By DataWizard42 On

Yeah, RDS is tough at that scale and can be prone to index bloat. Your instinct about OpenSearch is solid. It’s commonly used for this kind of workload, and since your data updates are minimal, reindexing won’t be a nightmare. Just keep an eye on how you set up your analyzers to avoid blowing up query times.
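As a rough illustration of the analyzer point, the sketch below defines a simple custom analyzer (standard tokenizer plus `lowercase` and `asciifolding` filters) as a Python dict, and a small helper that counts how many tokens an n-gram tokenizer would emit for a single term. The analyzer name `folded_text` is made up for this example, and whether n-grams are involved at all in your setup is an assumption; the helper is only there to show why aggressive analyzers can inflate index size and query work.

```python
# Hypothetical analyzer name; pick whatever fits your index.
ANALYZER_NAME = "folded_text"


def build_settings(analyzer_name=ANALYZER_NAME):
    """Index settings with a lightweight analyzer: one token per word."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Normalize case and strip accents; no token expansion.
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        }
    }


def ngram_count(term, min_gram, max_gram):
    """Number of substrings of length min_gram..max_gram in `term`."""
    return sum(max(len(term) - n + 1, 0) for n in range(min_gram, max_gram + 1))


if __name__ == "__main__":
    # A plain analyzer emits 1 token for "database"; 2..3-grams emit many more.
    print(ngram_count("database", 2, 3))
```

A lightweight analyzer combined with query-time `fuzziness` is usually the cheaper place to start; n-gram analysis is worth measuring against your own data before committing to it at 300 million records.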
