What’s the Best Database Structure for Fast Primary Key Lookups?

Asked By TechyTurtle42 On

Hey everyone!

I'm building an application that generates batches of strings using an algorithm. I need to check if these strings exist in a huge dataset that has over 2 billion rows, but loading all of that into memory isn't practical. I'm expecting to generate batches of 100 to 5000 strings and possibly process several million checks per hour.

Right now, I'm considering using AWS for this. My idea is to store the dataset in RDS and make the string itself the primary key, so an existence check is just a primary-key lookup. There are a couple of other columns I'll need info from later, but primarily I want to check whether the strings exist.

Would something like DynamoDB be a better choice? The app will run on ECS, and I'm a bit worried about I/O performance, since streaming the data from local storage is slow. It seems like AWS might have optimizations for containers with mounted storage. My main priorities are cost (RDS Aurora has those per-I/O fees) and performance. Any advice? Thanks!
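For context, here's roughly the kind of batched existence check I have in mind against RDS (Postgres via psycopg2). The endpoint, credentials, table, and column names below are placeholders, since nothing is actually set up yet:

```python
# Sketch of a batched primary-key existence check against RDS Postgres.
# Endpoint, credentials, table, and column names are all placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-db.xxxxxxxx.us-east-1.rds.amazonaws.com",
    dbname="strings_db",
    user="app",
    password="change-me",
)

def existing_strings(batch):
    """Return the subset of `batch` (100-5000 strings) already present in the table."""
    with conn.cursor() as cur:
        # ANY(%s) sends the whole batch in one round trip instead of one query per string.
        cur.execute(
            "SELECT str_value FROM strings WHERE str_value = ANY(%s)",
            (batch,),
        )
        return {row[0] for row in cur.fetchall()}

hits = existing_strings(["abc123", "def456", "ghi789"])
```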

5 Answers

Answered By HashItOut45 On

Just curious about the application you’re working on. If those strings are long, you might consider storing only their hashes instead, to save space and speed up lookups. Just a thought!
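Something like this is all I mean; it assumes SHA-256 and that you'd store and look up the digest rather than the raw string (names are purely illustrative):

```python
import hashlib

def lookup_key(s: str) -> bytes:
    # Fixed 32-byte key no matter how long the original string is.
    return hashlib.sha256(s.encode("utf-8")).digest()

# Store lookup_key(s) as the primary key when loading the dataset, then hash
# each generated string the same way at check time and look up the digest.
digest = lookup_key("some long generated string")
```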

Answered By StringCheckerX On

What kind of string checks are you planning? Exact matches only, or substrings too? And what latency do you need on those checks?

Answered By CleverCoder89 On

If some strings get checked repeatedly, ElastiCache might be beneficial. Check the cache first, and on a miss, query the database and write the result back into ElastiCache. It should help with the frequent lookups, although I doubt you could keep all 2 billion keys in memory.
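Here's a rough cache-aside sketch with redis-py against an ElastiCache Redis endpoint; the hostname and the `exists_in_db` callback are placeholders for whatever backing store you end up with:

```python
import redis

# Placeholder ElastiCache (Redis) endpoint.
cache = redis.Redis(host="my-cache.xxxxxx.use1.cache.amazonaws.com", port=6379)

def string_exists(s: str, exists_in_db) -> bool:
    """Cache-aside check: try ElastiCache first, fall back to the database."""
    cached = cache.get(s)
    if cached is not None:
        return cached == b"1"
    found = exists_in_db(s)                           # your RDS/DynamoDB lookup
    cache.set(s, b"1" if found else b"0", ex=3600)    # cache hits AND misses for an hour
    return found
```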

Answered By DataDynamo88 On

You might want to look into using Bloom filters to speed things up. Bloom filter indexes in Postgres can help you avoid having to scan the whole table to check for string existence—definitely a good strategy if you're dealing with such large data.
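If you try it, the index comes from the contrib `bloom` extension; a minimal sketch (connection string, table, and column names are just examples) would look something like:

```python
# Sketch: enable the Postgres `bloom` extension and index the string column.
import psycopg2

conn = psycopg2.connect("dbname=strings_db")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS bloom;")
    cur.execute(
        "CREATE INDEX IF NOT EXISTS strings_bloom_idx "
        "ON strings USING bloom (str_value);"
    )
```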

CheckingIn80s -

Great advice on Bloom filters! Just a heads up, though: the Postgres bloom index isn’t quite what you might expect, since it stores a per-row signature rather than acting as a single set-membership filter. Still worth checking out!

Answered By SpeedyQuery101 On

Slow local storage is definitely a concern. You could look at an embedded database like RocksDB with Bloom filters enabled for fast point lookups. Basically, you’d bulk-load all your rows once, compact the database, and park the resulting files in S3 so each application run can pull down a prebuilt copy instead of rebuilding anything. If you can live with a small false-positive rate, the whole pipeline gets a lot simpler and faster.
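A sketch of the lookup side using the `python-rocksdb` bindings, assuming the database files were built and compacted offline and synced down from S3 before the app starts (paths and tuning numbers are made up):

```python
# Sketch: local RocksDB keyed by the strings, with block-based Bloom filters so
# most negative lookups never touch disk. Values stay empty since only existence matters.
import rocksdb

opts = rocksdb.Options()
opts.create_if_missing = True
opts.table_factory = rocksdb.BlockBasedTableFactory(
    filter_policy=rocksdb.BloomFilterPolicy(10),    # ~10 bits per key
    block_cache=rocksdb.LRUCache(2 * (1024 ** 3)),  # 2 GB block cache
)

db = rocksdb.DB("/data/strings.db", opts)           # directory synced from S3

def exists(s: str) -> bool:
    return db.get(s.encode("utf-8")) is not None
```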

EfficientEngineer -

How do you ensure you don’t lose accuracy while using Bloom filters? Just curious!
