Hey everyone,
I'm looking for input on the best way to store and search through a massive amount of data. We've got about 300 million records in a flattened table structure, around 50 columns. Our main requirements are fuzzy text search on some fields and handling a high number of queries per second, with response times between 200 ms and 1 s.
Initially, I thought about using RDS Aurora (MySQL, r6g.xlarge), but I'm concerned about how well it will handle such large data volumes, especially with massive index sizes and maintenance challenges.
DynamoDB seemed like a good choice, but it can't support the fuzzy search we need. Now I'm leaning towards OpenSearch in a serverless setup.
Has anyone tackled a similar situation? We don't expect the data to be updated very often, maybe just once a month.
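Since the data only changes about once a month, one low-effort pattern is to reload it with OpenSearch's `_bulk` API, which takes newline-delimited JSON (an action line followed by a document line). Here is a minimal sketch, assuming plain Python with only the standard library; it only builds the request bodies and leaves out the signed HTTP call. The index name `records` and the `record_id` key are hypothetical, and it assumes the collection accepts client-supplied document IDs.

```python
import json
from typing import Iterable, Iterator


def bulk_ndjson_chunks(
    index: str, rows: Iterable[dict], id_field: str, chunk_size: int = 1000
) -> Iterator[str]:
    """Yield _bulk request bodies containing up to chunk_size documents each."""
    lines: list[str] = []
    count = 0
    for row in rows:
        # Action line: target index plus a stable _id, so re-running the
        # monthly load overwrites existing documents instead of duplicating them.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row[id_field])}}))
        # Source line: the flattened record itself.
        lines.append(json.dumps(row))
        count += 1
        if count == chunk_size:
            # The _bulk API requires the body to end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"record_id": i, "customer_name": f"name {i}"} for i in range(3)]
    for body in bulk_ndjson_chunks("records", sample, "record_id", chunk_size=2):
        print(body)
```

Chunk size is a tuning knob; at 300 million rows the load is simply many such requests, so it is worth measuring throughput with a few sizes rather than trusting the default here.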
4 Answers
Considering your needs, OpenSearch is probably the best bet. With 300 million records and high query demands, it will likely give you better performance and cost-effectiveness than RDS, which could become costly and complicated to manage under heavy workloads. Definitely weigh the pros and cons, but OpenSearch sounds like the smart choice!
You're on the right track with OpenSearch! Given your criteria like mostly reads, infrequent updates, and the need for fuzzy text search, RDS may overcomplicate things. Managing large indexes in Aurora could lead to slow and unpredictable performance. OpenSearch can definitely meet your latency goals, especially if you design the index correctly and keep your search fields focused.
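One way to "keep your search fields focused" is to map only the handful of columns you actually query and let the rest ride along in `_source`. A minimal sketch, assuming Python and an explicit mapping; the field names are hypothetical stand-ins for your ~50 columns. With `"dynamic": False`, unmapped columns are still stored and returned with hits, but they get no inverted index or doc values.

```python
import json


def build_mapping(search_fields: list[str], filter_fields: list[str]) -> dict:
    """Return an index body that indexes only the listed fields."""
    properties: dict = {}
    for name in search_fields:
        # Analyzed text: the fields that fuzzy queries will target.
        properties[name] = {"type": "text"}
    for name in filter_fields:
        # Exact-value fields for filters and aggregations, no analysis.
        properties[name] = {"type": "keyword"}
    return {
        "mappings": {
            # Columns not listed above stay in _source but are not indexed.
            "dynamic": False,
            "properties": properties,
        }
    }


if __name__ == "__main__":
    body = build_mapping(["customer_name", "address"], ["country_code", "status"])
    print(json.dumps(body, indent=2))
```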
OpenSearch is definitely well-suited for your needs. Given your high read-to-write ratio and OpenSearch's support for fuzzy search, it seems like a great fit. Using Aurora might be a headache due to the need for extensive full-text indexes and potential performance issues under load. I’d recommend going with OpenSearch Serverless. You'll have somewhat less control compared to traditional setups, but the performance and cost benefits could be worth it.
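For reference, a fuzzy lookup in OpenSearch is typically a `match` query with `fuzziness` set. A minimal sketch, assuming Python that only builds the query body (no client library); `customer_name` is a hypothetical field. `"AUTO"` lets OpenSearch choose the allowed edit distance from each term's length, and `prefix_length` requires the first character(s) to match exactly, which cuts down the candidate terms on a large index.

```python
import json


def build_fuzzy_query(field: str, text: str, size: int = 20) -> dict:
    """Return a typo-tolerant OpenSearch query body for a single text field."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    # Edit distance picked per term length by OpenSearch.
                    "fuzziness": "AUTO",
                    # First character must match exactly; fewer terms to expand.
                    "prefix_length": 1,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_fuzzy_query("customer_name", "jonh smtih"), indent=2))
```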
Yeah, RDS is tough at that scale and can be prone to index bloat. Your instinct about OpenSearch is solid. It’s commonly used for this kind of workload, and since your data updates are minimal, reindexing won’t be a nightmare. Just keep an eye on how you set up your analyzers to avoid blowing up query times.
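As an example of keeping analyzers lean, here is a sketch of index settings with one custom analyzer: the standard tokenizer plus the built-in `lowercase` and `asciifolding` filters, so "Müller" and "muller" index the same way. It assumes Python building the request body only; the analyzer name `folded_text` and the `customer_name` field are hypothetical, and it is worth confirming your collection type accepts custom analysis settings.

```python
import json


def build_index_body() -> dict:
    """Return index settings with a simple accent-folding analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "folded_text": {
                        "type": "custom",
                        # Split on word boundaries, then normalize case and accents.
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "customer_name": {"type": "text", "analyzer": "folded_text"},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

The chain is deliberately short; heavier options such as n-gram token filters multiply the number of indexed terms, which is usually where index size and query cost start to grow.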