What’s the Best Way to Transfer a Large Database to an Instance’s Local SSDs?

Asked By DataNinja42

Hey everyone! I'm running a deep learning pipeline with AlphaFold3 on a p4de.24xlarge instance, and it processes a lot of protein data. The instance has eight local SSDs, and I'm planning to store a large sequence database (about 700 GB) on them. Since I'm running eight inference jobs simultaneously, each using a single GPU, I'm worried that having all of them read from a single SSD could slow things down, so I'm considering copying the database to each SSD. Is that a good approach, or is there a better AWS solution that offers fast reads as soon as the instance boots and can handle the load?
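In case it's relevant, here's roughly how I plan to format and mount the eight instance-store volumes; the lsblk-based discovery and the /mnt/local_ssd_{i} mount points are just placeholders for my setup, not anything AlphaFold3 requires:

```python
# Rough sketch of preparing the eight local NVMe SSDs (run as root).
# Device discovery via the lsblk MODEL column and the /mnt/local_ssd_{i}
# mount points are assumptions about my setup, not requirements.
import subprocess


def instance_store_devices() -> list[str]:
    # Instance-store NVMe volumes report a model like
    # "Amazon EC2 NVMe Instance Storage", unlike EBS volumes.
    out = subprocess.run(
        ["lsblk", "-d", "-n", "-o", "NAME,MODEL"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        f"/dev/{line.split()[0]}"
        for line in out.splitlines()
        if "Instance Storage" in line
    ]


def format_and_mount(devices: list[str]) -> None:
    for i, dev in enumerate(devices):
        mount_point = f"/mnt/local_ssd_{i}"
        subprocess.run(["mkfs.xfs", "-f", dev], check=True)  # destroys existing data
        subprocess.run(["mkdir", "-p", mount_point], check=True)
        subprocess.run(["mount", dev, mount_point], check=True)


if __name__ == "__main__":
    devs = instance_store_devices()
    print("Found instance-store devices:", devs)
    format_and_mount(devs)
```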

3 Answers

Answered By TechGuru99

First, make sure your code can reliably point each job at the correct SSD path (see the sketch below). Also, consider Amazon FSx for Lustre if you're looking for shared, high-speed storage that can be mounted when the instance boots. That could improve performance significantly.
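Something like this could let each job derive its database path from the GPU it was assigned; the /mnt/local_ssd_{i} mount points are placeholders, so adjust them to however you actually mount the drives:

```python
# Minimal sketch: pick a per-job database path from the job's GPU index.
# Assumes the eight SSDs are formatted and mounted at /mnt/local_ssd_0 ...
# /mnt/local_ssd_7 (placeholder paths) and each job is launched with
# CUDA_VISIBLE_DEVICES set to a single GPU (0-7).
import os


def database_dir_for_this_job() -> str:
    """Return the SSD mount point assigned to this job's GPU."""
    gpu_index = int(os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(",")[0])
    path = f"/mnt/local_ssd_{gpu_index}/alphafold3_db"
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Expected a database copy at {path}")
    return path


if __name__ == "__main__":
    print("This job will read its sequence database from:",
          database_dir_for_this_job())
```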

Answered By CloudMaster88

Your instance offers 400 Gbps of network bandwidth (4x 100 Gbps), so a straightforward solution would be to use the AWS CLI with the CRT transfer client enabled to download the database from an S3 bucket in the same Region at the start of each job. It keeps things simple!
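Roughly like this at the start of each job; the bucket name and the /mnt/local_ssd_{i} mount points are placeholders, and the CRT transfer client setting needs a reasonably recent AWS CLI v2:

```python
# Rough sketch: enable the AWS CLI v2 CRT transfer client, then sync the
# database from S3 onto this job's local SSD at startup.
# Bucket name and mount points are placeholders for your setup.
import os
import subprocess

BUCKET_URI = "s3://my-protein-db-bucket/alphafold3_db/"  # placeholder bucket


def local_db_dir() -> str:
    # Assumes each job is pinned to one GPU via CUDA_VISIBLE_DEVICES (0-7)
    # and that SSD i is mounted at /mnt/local_ssd_i.
    gpu = os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(",")[0]
    return f"/mnt/local_ssd_{gpu}/alphafold3_db"


def enable_crt_transfer_client() -> None:
    # The CRT-based transfer client gives much higher S3 throughput
    # on large, network-optimized instances.
    subprocess.run(
        ["aws", "configure", "set", "default.s3.preferred_transfer_client", "crt"],
        check=True,
    )


def download_database() -> None:
    # `aws s3 sync` skips files that are already present and unchanged,
    # so re-running it at every job start is cheap after the first pull.
    subprocess.run(["aws", "s3", "sync", BUCKET_URI, local_db_dir()], check=True)


if __name__ == "__main__":
    enable_crt_transfer_client()
    download_database()
```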

Answered By DataSmith88

Here are a few things to think about:
1. What read throughput do your jobs actually need?
2. You might want to consider:
- Amazon EFS or FSx for sharing storage across EC2 instances. EFS One Zone storage is fairly inexpensive (roughly $30 per month for 700 GB) and allows multiple instances to read from it.
- Alternatively, maybe just use S3 to copy the database directly onto each local SSD attached to your instance (see the sketch below)?
Keep in mind that costs can vary: EFS/FSx can get pricey compared to just using the instance-store SSDs that already come with the instance.
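Here's a rough sketch of that second option, pulling the database from S3 onto each of the eight SSDs in parallel; the bucket name and mount points are placeholders:

```python
# Rough sketch of the "S3 directly to each local SSD" option: run one
# `aws s3 sync` per SSD in parallel so all eight copies land concurrently.
# The bucket name and /mnt/local_ssd_{i} mount points are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

BUCKET_URI = "s3://my-protein-db-bucket/alphafold3_db/"  # placeholder bucket
NUM_SSDS = 8


def sync_to_ssd(index: int) -> int:
    dest = f"/mnt/local_ssd_{index}/alphafold3_db"
    result = subprocess.run(["aws", "s3", "sync", BUCKET_URI, dest])
    return result.returncode


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=NUM_SSDS) as pool:
        codes = list(pool.map(sync_to_ssd, range(NUM_SSDS)))
    failed = [i for i, code in enumerate(codes) if code != 0]
    if failed:
        raise SystemExit(f"Sync failed for SSDs: {failed}")
    print("Database replicated to all", NUM_SSDS, "local SSDs.")
```

Note that this pulls the 700 GB from S3 eight times; with that instance's network bandwidth and a same-Region bucket it's workable, but you could also download once and fan out with local SSD-to-SSD copies if you'd rather not re-download.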
