Hey everyone! I'm looking for recommendations on how to store and efficiently search across a vast amount of data. We've got a flattened table structure with around 300 million records across nearly 50 columns. Our main requirement is to perform fuzzy text search on specific fields while maintaining a high number of queries per second (QPS). We're aiming for response times that align with synchronous API calls, ideally between 200ms and 1s. Initially, we considered loading the data into RDS Aurora (MySQL, r6g.xlarge), but I'm unsure how well it would handle such a massive volume, especially with index maintenance. I also thought about DynamoDB, but it seems that the fuzzy search requirement makes that impractical. I'm now leaning towards OpenSearch serverless as a potential solution. Has anyone dealt with a similar scenario? We anticipate updates to this table to be infrequent, maybe once a month at most.
4 Answers
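OpenSearch seems like a great fit for your situation! You have a heavy read/query load, infrequent updates, and a need for fuzzy text search, which it handles well out of the box. Aurora could work in theory, but MySQL FULLTEXT indexes don't do typo-tolerant (edit-distance) matching. You'd end up emulating fuzzy search with patterns like `LIKE '%...%'`, which can't use a normal B-tree index, and performance would degrade quickly under many simultaneous fuzzy searches over 300 million rows. OpenSearch Serverless really feels like the way to go. Just estimate costs (it bills by OpenSearch Compute Units, or OCUs) and load-test latency before committing, so you don't get surprises later on.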
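To make the fuzzy-search point concrete, here's a minimal sketch of a query body you could send to an OpenSearch `_search` endpoint. It uses the standard `match` query with `fuzziness`. The field name `company_name` is a hypothetical placeholder, and the `prefix_length`/`max_expansions` values are starting points to tune, not numbers from your setup. It only builds the JSON, so you can pass it to whichever client you use.

```python
import json


def build_fuzzy_query(field: str, text: str, size: int = 10) -> dict:
    """Build an OpenSearch query body for typo-tolerant matching on one field."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    # AUTO picks the allowed edit distance from each term's length
                    "fuzziness": "AUTO",
                    # Require the first character to match exactly; this shrinks
                    # the set of candidate terms and helps keep latency down.
                    "prefix_length": 1,
                    # Cap how many term variants each query term expands to.
                    "max_expansions": 50,
                }
            }
        },
    }


# "company_name" is a hypothetical field; note the deliberate typo in the input.
body = build_fuzzy_query("company_name", "acme corporaton")
print(json.dumps(body, indent=2))
```

Raising `prefix_length` is usually the cheapest latency lever for fuzzy queries, because it reduces how many indexed terms have to be compared against each query term.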
Your instincts are spot on! RDS Aurora might look feasible, but at that scale the cost of maintaining large text indexes typically outweighs its benefits, and it still wouldn't give you proper fuzzy matching. OpenSearch tends to be the go-to for setups with heavy read traffic and infrequent bulk updates. Just remember to model your index carefully and configure your analyzers so query latency stays in check!
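As an example of what "set up your analyzers" might look like, here's a sketch of index settings and mappings you could pass when creating the index. The field names are hypothetical placeholders, and the `folded_text` analyzer (standard tokenizer plus `lowercase` and `asciifolding` filters) is one common choice, not the only one. With ~50 columns, mapping display-only columns with `"index": false` keeps them in `_source` without building an inverted index for them.

```python
import json

# Hypothetical index body: field names are placeholders, not from the question.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folded_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # lowercase + asciifolding so "Café" and "cafe" index the same
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # Fuzzy-searched field, plus a keyword sub-field for exact matches
            "company_name": {
                "type": "text",
                "analyzer": "folded_text",
                "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
            },
            "city": {"type": "text", "analyzer": "folded_text"},
            # Columns that are only filtered on don't need full-text analysis
            "country_code": {"type": "keyword"},
            "updated_at": {"type": "date"},
            # Display-only column: returned in _source but not searchable
            "description": {"type": "text", "index": False},
        }
    },
}

print(json.dumps(INDEX_BODY, indent=2))
```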
You're definitely on the right track thinking about OpenSearch. With that many records, managing indexes in RDS could become a nightmare, and Aurora may not give you the performance or ease of maintenance you're looking for. DynamoDB has no built-in text or fuzzy search, so that's a no-go. OpenSearch could give you the performance you need while keeping your operational burden manageable.
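Considering the scale you're dealing with, a managed (provisioned) OpenSearch domain could really improve query performance, and it may cost less than an RDS cluster sized to serve fuzzy searches at high concurrency, since relational databases tend to struggle with that kind of load. The downside compared with Serverless is a somewhat higher operational workload (you choose instance types, shard counts, and replicas yourself), but it should pay off in the long run.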
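If you go with a provisioned domain, shard count is one of the things you'd size yourself. Here's a rough back-of-envelope sketch. It assumes an average document size you'd have to measure (the 1 KB below is made up), about 10% on-disk overhead, and a target of about 30 GiB per primary shard, roughly in line with AWS's guidance of 10 to 30 GiB for latency-sensitive search workloads.

```python
import math


def estimate_primary_shards(doc_count: int, avg_doc_bytes: int,
                            index_overhead: float = 1.1,
                            target_shard_gib: float = 30.0) -> int:
    """Rough primary-shard count for a provisioned OpenSearch domain.

    avg_doc_bytes and index_overhead are assumptions you must measure;
    target_shard_gib is a tunable target, not a hard limit.
    """
    # Estimated on-disk size of the primary data, in GiB
    total_gib = doc_count * avg_doc_bytes * index_overhead / 2**30
    # At least one shard, otherwise enough to stay under the target size
    return max(1, math.ceil(total_gib / target_shard_gib))


# Hypothetical input: 300M records at an assumed 1 KB of source data each.
print(estimate_primary_shards(300_000_000, 1_024))
```

With those assumed inputs it comes out to 11 primary shards. Replicas come on top of that, and they also add read capacity, which helps with QPS.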