How to Use Local SSDs with EC2 for Batch Processing?

Asked By TechNoWizard42 On

I'm looking for a way to utilize my local SSDs with an EC2 instance for training models. I have around 200TB of data, but I only need to access roughly 1GB at a time for batch processing. My goal is to keep the bulk of this data on my local drives to avoid the steep costs and privacy concerns associated with AWS storage solutions. I'm aware that there will be some latency when loading each batch from my local SSD to EC2, but that's acceptable for my needs. Can anyone advise if this setup is possible or suggest alternative methods to manage this without relying heavily on S3?

5 Answers

Answered By CloudSeeker74 On

Are you planning to run your EC2 instance with just 1GB of data at a time, processing it, and then stopping the instance? If that’s the case, you might need to upload your data in 1GB chunks to S3 every time, which sounds tedious.

TechNoWizard42 -

To clarify, I’m going to train a model on 1GB batches sequentially until I get through all 200TB. I need a way to handle the data without keeping most of it in S3.

Answered By ComputeCrafter101 On

For large datasets, it's much better to keep your data close to your compute resources. Since you only need 1GB at a time, using an EC2 instance type with NVMe instance store (ephemeral) volumes is a good plan. That way, you can copy each batch onto the instance's local disk before running your computation on it.

TechNoWizard42 -

I’d love to avoid S3, and it feels like using ephemeral storage might be the answer. I want to process small batches efficiently without maintaining all the data in the cloud.

Answered By NetNinjaX On

If you have a NAS, you can connect it to your VPC using a VPN. That would keep your drives accessible to the instance without pushing everything to AWS.

Answered By DataDrivenDude58 On

You can't attach an SSD in your own machine to an EC2 instance as a storage volume. The best approach is to pick the specific data you need and upload it to AWS for processing. Reading your drives over a VPN on every access will be limited by your uplink and will slow everything down.
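To put a rough number on the slowdown, here is a back-of-the-envelope sketch. The 1 Gbit/s sustained uplink is purely an assumption for illustration (it is not from the question), and the estimate ignores VPN and protocol overhead, so real numbers would be worse.

```python
def transfer_seconds(num_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move num_bytes over a link, ignoring protocol overhead."""
    return num_bytes * 8 / link_bits_per_second


GB = 10**9
TB = 10**12

# Assumption: 1 Gbit/s sustained uplink from the local site (hypothetical figure).
LINK_BPS = 1 * 10**9

per_batch = transfer_seconds(1 * GB, LINK_BPS)   # one 1GB batch
total = transfer_seconds(200 * TB, LINK_BPS)     # the full 200TB, once through

print(f"one 1GB batch: {per_batch:.0f} s")
print(f"all 200TB:     {total / 86400:.1f} days")
```

Under that assumption each batch takes about 8 seconds to arrive, and a single pass over all 200TB spends roughly 18.5 days on transfer alone. Whether that is acceptable depends on how long the training step on each batch takes.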

TechNoWizard42 -

That was my suspicion. Just hoping for a workaround that might exist.

Answered By SecuritySavvy87 On

Consider enabling EBS encryption for better privacy. You can upload the data in small chunks, and while one chunk is being processed, upload the next. It's a more manageable approach!
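The "upload the next chunk while the current one is processed" idea can be sketched with a background thread and a bounded queue. This is only a minimal illustration: `prefetch`, `fake_fetch`, and the `depth` parameter are hypothetical names, and `fake_fetch` stands in for whatever actually copies a chunk onto the instance.

```python
import queue
import threading


def prefetch(batch_ids, fetch, depth=1):
    """Yield fetch(batch_id) results in order, fetching ahead in a background thread."""
    q = queue.Queue(maxsize=depth)
    done = object()  # marker: no more batches

    def worker():
        try:
            for batch_id in batch_ids:
                q.put(fetch(batch_id))  # blocks once `depth` batches are queued
            q.put(done)
        except Exception as exc:
            q.put(exc)  # hand the failure to the consumer instead of hanging

    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = q.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item
        yield item


def fake_fetch(batch_id):
    # Hypothetical placeholder: a real version would transfer one ~1GB chunk.
    return f"batch-{batch_id}"


# "Processing" here is just upper-casing; training code would go in its place.
processed = [batch.upper() for batch in prefetch(range(3), fake_fetch)]
print(processed)
```

Keeping `depth` small bounds how much data sits on the instance at once, which matches the goal of holding only a batch or two in the cloud at any time.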
