Is it Possible to Use EC2/Spot Instances with Lambda for a Serverless GPU Architecture?

May 2, 2025

Asked By TechieExplorer92 On May 2, 2025

I'm exploring options for a serverless setup using AWS, specifically for GPU compute. Currently, I'm utilizing RunPod for serving AI models, but their serverless option hasn't been stable enough for production use. Since AWS doesn't provide native serverless GPU computing, I'm considering whether I could:

- Create a Lambda function that spins up an EC2 or Spot instance.
- Have that instance run a FastAPI server to handle inference requests.
- Automatically terminate the instance once I've received the response.
- Ensure this solution scales for multiple concurrent users on my app.

I'm planning to leverage Boto3 for this. Is this a feasible solution, or is there a better approach I should be considering?
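The launch and terminate steps listed in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: the AMI ID is a placeholder for an image that already starts the FastAPI server on boot, `g4dn.xlarge` is just one example GPU instance type, and the helper names (`build_launch_params`, `launch_gpu_instance`, `terminate_instance`) are hypothetical. The EC2 client is passed in as an argument so the helpers can be exercised without AWS credentials.

```python
"""Sketch: launch and terminate one GPU instance from a Lambda function."""


def build_launch_params(ami_id, instance_type="g4dn.xlarge", use_spot=True):
    """Build keyword arguments for the EC2 RunInstances call."""
    params = {
        "ImageId": ami_id,              # assumption: AMI with model + FastAPI baked in
        "InstanceType": instance_type,  # assumption: example GPU instance type
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "purpose", "Value": "gpu-inference"}],
            }
        ],
    }
    if use_spot:
        # Request Spot capacity instead of On-Demand.
        params["InstanceMarketOptions"] = {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        }
    return params


def launch_gpu_instance(ec2_client, ami_id, use_spot=True):
    """Start one instance and return its instance ID."""
    params = build_launch_params(ami_id, use_spot=use_spot)
    response = ec2_client.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


def terminate_instance(ec2_client, instance_id):
    """Terminate the instance once the response has been received."""
    ec2_client.terminate_instances(InstanceIds=[instance_id])


def lambda_handler(event, context):
    # Imported here so the helpers above can be tested without boto3 installed.
    import boto3

    ec2 = boto3.client("ec2")
    # "ami_id" in the event is a hypothetical field for this sketch.
    instance_id = launch_gpu_instance(ec2, event["ami_id"])
    return {"instance_id": instance_id}
```

The sketch does not handle the case where no GPU or Spot capacity is available; in that situation `run_instances` raises a `botocore.exceptions.ClientError`, and the caller would need a retry or fallback strategy.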

4 Answers

Answered By DevOpsNinja24 On May 4, 2025

I’ve had similar requests from clients wanting to replicate RunPod’s functionality on AWS. The biggest hurdles are GPU availability and instance cold starts. The EC2 instance type you need might not be available right away, and Spot instances complicate things further because their capacity can be reclaimed. I think a pub-sub architecture could be a better fit: your front end pushes messages containing the request data, and a worker picks them up for processing. I recently tested EKS with HPA and Karpenter, which help manage scaling for inference workloads efficiently and minimize cold starts after the initial setup. You could also explore Auto Scaling groups, but I'd recommend testing how effective they are first.
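As a rough illustration of the worker side of such a pub-sub setup, the sketch below uses SQS as the message channel. That choice is an assumption (the answer does not name a broker), and `process_batch`, `run_inference`, and the JSON message shape are hypothetical. The client is passed in, so in real use it would be `boto3.client("sqs")`.

```python
"""Sketch: a GPU worker that pulls inference jobs from a queue."""
import json


def process_batch(sqs_client, queue_url, run_inference):
    """Receive up to 10 messages, run inference on each, delete on success.

    Returns the number of messages processed successfully.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS allows at most 10 per call
        WaitTimeSeconds=20,      # long polling, the SQS maximum
    )
    done = 0
    for message in response.get("Messages", []):
        job = json.loads(message["Body"])
        try:
            run_inference(job)
        except Exception:
            # Leave the message on the queue; it becomes visible again
            # after the visibility timeout and can be retried.
            continue
        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        done += 1
    return done
```

A long-running worker would call `process_batch` in a loop. Because a failed job is not deleted, it is redelivered later, so `run_inference` should tolerate seeing the same job more than once.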

Answered By QuestioningCoder On May 4, 2025

But is that truly serverless? If you're managing servers, even if it's through spot instances, it strays from the idea of serverless computing. Serverless implies you're not handling virtual machines. Just something to think about, though it might not be a big deal.

Answered By CloudGuru42 On May 3, 2025

Starting an EC2 instance can take a while, which might frustrate your users if they're stuck waiting on it for a response. Factor that startup delay into your architecture.
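One way to make that delay explicit is to poll for readiness against a fixed time budget and fail fast when it runs out. The helper below is a minimal sketch; `wait_until_ready` and its parameters are hypothetical names. The `is_ready` callable is an assumption too: it could wrap an HTTP health check against the FastAPI server. Boto3 also has an `instance_running` waiter, but it reports only the EC2 instance state, not whether the model server is accepting requests.

```python
"""Sketch: wait for an inference server with an explicit time budget."""
import time


def wait_until_ready(is_ready, timeout_s=300.0, interval_s=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the budget is exhausted.

    Returns the seconds waited, or None if the next poll would exceed timeout_s.
    """
    start = clock()
    while True:
        if is_ready():
            return clock() - start
        # Stop if sleeping once more would take us past the budget.
        if clock() - start + interval_s > timeout_s:
            return None
        sleep(interval_s)
```

Logging the returned wait time for each launch gives a measured cold-start figure to design around, rather than a guess.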

Answered By ServerlessSeeker77 On May 3, 2025

Consider having your API server send jobs to an SQS queue instead of spinning up instances right away. You could then use EventBridge to trigger jobs on ECS, which should handle GPUs as well. This way, you'd only use infrastructure when it's needed, keeping your costs down without the need for persistent servers. Good luck!
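The producer side of that idea might look like the sketch below: the API server enqueues a job, and a small helper turns the queue backlog into a worker count. The function names, the message shape, and the `jobs_per_worker` and `max_workers` defaults are all assumptions for illustration. The EventBridge and ECS wiring is not shown.

```python
"""Sketch: enqueue inference jobs and size the worker pool from the backlog."""
import json
import math
import uuid


def enqueue_job(sqs_client, queue_url, payload):
    """Send one job to the queue and return its generated job ID."""
    job_id = str(uuid.uuid4())
    sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"job_id": job_id, "payload": payload}),
    )
    return job_id


def queue_backlog(sqs_client, queue_url):
    """Return the approximate number of messages waiting in the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def desired_workers(backlog, jobs_per_worker=4, max_workers=5):
    """Scale to zero when idle; otherwise one worker per jobs_per_worker jobs."""
    if backlog <= 0:
        return 0
    return min(max_workers, math.ceil(backlog / jobs_per_worker))
```

With these assumed defaults, a backlog of 9 jobs maps to 3 workers and an empty queue maps to 0, which matches the goal of paying for GPU capacity only while there is work.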
