What’s the Best Way to Run Long-Running Parallel Jobs on AWS?

Asked By CuriousCat99 On

I'm working on a Monte Carlo simulation script that takes about an hour to run and produces slightly different results each time. I want to run this script roughly 100 times in parallel on a powerful AWS instance. I've checked out AWS Batch and SageMaker, but I'm unsure how to set everything up for this task. What's the easiest way to run these jobs in parallel?

7 Answers

Answered By MadMaxCoder On

Does your script use up an entire instance? Also, do you need specific inputs for each job? If you can confirm the runs are independent, you might want to call the EC2 RunInstances API with a user data script on a custom AMI. Not the prettiest solution, but it's simple and gets the job done.
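A minimal sketch of that idea with boto3, assuming the script is baked into the AMI at the hypothetical path /opt/sim/simulate.py and accepts a hypothetical --run-index flag. The AMI ID and instance type are placeholders, not recommendations.

```python
def build_user_data(run_index: int) -> str:
    """Shell script that cloud-init runs once at first boot."""
    return "\n".join([
        "#!/bin/bash",
        # Hypothetical script path and flag; adjust to whatever the AMI contains.
        f"python3 /opt/sim/simulate.py --run-index {run_index}",
        # Power off when the script exits; combined with the shutdown
        # behavior set below, this terminates the instance.
        "shutdown -h now",
    ])


def build_run_instances_kwargs(run_index: int, ami_id: str,
                               instance_type: str = "c5.large") -> dict:
    """Keyword arguments for one EC2 RunInstances call (one instance per run)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # placeholder size
        "MinCount": 1,
        "MaxCount": 1,
        # boto3 base64-encodes UserData for run_instances.
        "UserData": build_user_data(run_index),
        "InstanceInitiatedShutdownBehavior": "terminate",
    }


def launch_all(ami_id: str, n_runs: int = 100) -> None:
    """Launch one instance per simulation run."""
    import boto3  # imported here so the builders above work without AWS access

    ec2 = boto3.client("ec2")
    for run_index in range(n_runs):
        ec2.run_instances(**build_run_instances_kwargs(run_index, ami_id))
```

The script itself would still need to copy its results somewhere durable (S3, for example) before the instance shuts down, since the instance's storage is assumed to be thrown away.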

Answered By BatchBro33 On

You're right; Batch is a good choice for this, although it's not very intuitive. You'll need to set up a compute environment and a job queue and containerize your job, but in return you get a framework that schedules, retries, and tracks the jobs for you.

Answered By DataDynamo76 On

Check out Coiled or Dask; they work really well with AWS and offer an easier interface for running parallel jobs. You can find more info in their documentation to get started.
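As a rough illustration of the Dask route, the sketch below fans out independent runs with `Client.map` and collects them with `Client.gather`. The `simulate` function is a stand-in (a small Monte Carlo estimate of pi), not the asker's script, and the local `Client()` in `main` is an assumption; connecting the client to a remote cluster, such as one started through Coiled, should leave `run_all` unchanged.

```python
import random


def simulate(seed: int) -> float:
    """Stand-in for the hour-long job: Monte Carlo estimate of pi."""
    rng = random.Random(seed)  # a distinct seed per run gives distinct results
    n_samples = 10_000
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


def run_all(client, n_runs: int = 100) -> list:
    """Submit one task per seed and wait for every result."""
    futures = client.map(simulate, range(n_runs))
    return client.gather(futures)


def main() -> None:
    # Imported here so the functions above can be read and tested without Dask.
    from dask.distributed import Client

    client = Client()  # local cluster; swap in a remote cluster for real runs
    results = run_all(client)
    print(sum(results) / len(results))
```

Calling `main()` runs the whole batch. Passing the seed in explicitly keeps each run reproducible while still giving 100 different results.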

Answered By ScriptRunner89 On

The most straightforward method is to use AWS Batch, since it handles long-running tasks well. You just need to submit the jobs and pass parameters to each one. It's pretty effective for this type of workload.
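One way to submit all 100 runs at once is a Batch array job, sketched here with boto3. The queue name `sim-queue` and job definition name `sim-job-def` are hypothetical and assumed to exist already. Inside the container, Batch sets the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable, which each child job can use as its run number or seed.

```python
import os


def build_array_job_kwargs(n_runs: int, job_queue: str,
                           job_definition: str) -> dict:
    """Keyword arguments for a single submit_job call covering every run."""
    return {
        "jobName": "monte-carlo",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # One child job per run; children are indexed 0 .. n_runs - 1.
        "arrayProperties": {"size": n_runs},
    }


def submit(n_runs: int = 100) -> str:
    """Submit the array job and return its job ID."""
    import boto3  # imported here so the helpers work without AWS access

    batch = boto3.client("batch")
    # Hypothetical queue and job definition names.
    response = batch.submit_job(
        **build_array_job_kwargs(n_runs, "sim-queue", "sim-job-def")
    )
    return response["jobId"]


def child_index(environ=os.environ) -> int:
    """Run inside the container: which of the child jobs is this one?"""
    return int(environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
```

The script in the container would call `child_index()` at startup and use the value to pick its seed and name its output file.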

Answered By ComputeWhiz101 On

The best solution depends on your script's complexity and resource demands. It sounds like putting your script into a Docker container could unlock more options for running it across various environments. AWS SageMaker provides a range of distributed compute options, but if you don't need all that, you might find it tedious to set up.

For Docker solutions, AWS Batch is solid for scheduling long-running tasks, though it could be overkill for your needs. Alternatively, using ECS to run tasks might be ideal if you're containerized.

Answered By QuickScaleNinja On

AWS Glue could also be an option, but it is built for ETL jobs (Spark or Python shell), so it would only reduce your simulation time if you can restructure the script to scale out as a Glue job.

Answered By TechSavvyGuru28 On

A classic approach is to use SQS to queue up your jobs and set up EC2 instances in an Auto Scaling Group (ASG) to handle the workload. If your jobs can be re-triggered on failure, consider using spot instances to save costs. Since this is a one-time need, you may need to adapt the method a little or utilize Lambda for better control over how many instances you scale up.
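The queue-and-worker pattern described above can be sketched as follows. The message format (`{"run_index": i}`) and the `handler` callback are assumptions for illustration; `sqs` is expected to be a boto3 SQS client, and the Auto Scaling Group setup is not shown.

```python
import json


def enqueue_runs(sqs, queue_url: str, n_runs: int = 100) -> None:
    """Put one message on the queue per simulation run."""
    for run_index in range(n_runs):
        sqs.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps({"run_index": run_index}),
        )


def process_one(sqs, queue_url: str, handler) -> bool:
    """Receive at most one message and run it; return False if none arrived."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # long polling
    )
    messages = response.get("Messages", [])
    if not messages:
        return False
    message = messages[0]
    handler(json.loads(message["Body"]))
    # Delete only after the handler succeeds. If the handler raises or the
    # instance is reclaimed, the message becomes visible again and is retried.
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
    )
    return True


def worker_loop(sqs, queue_url: str, handler) -> None:
    """Run on each instance: keep working until a poll comes back empty."""
    while process_one(sqs, queue_url, handler):
        pass
```

For hour-long jobs, the queue's visibility timeout needs to be longer than a single run; otherwise a message reappears and a second worker picks it up while the first is still running.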

HelpfulHannah42 -

I was hoping for something a bit more hands-off, but I'll give this a try. Cheers!
