Is it Possible to Use EC2/Spot Instances with Lambda for a Serverless GPU Architecture?

Asked By TechieExplorer92 On

I'm exploring options for a serverless setup using AWS, specifically for GPU compute. Currently, I'm utilizing RunPod for serving AI models, but their serverless option hasn't been stable enough for production use. Since AWS doesn't provide native serverless GPU computing, I'm considering whether I could:

- Create a Lambda function that spins up an EC2 or Spot instance.
- Have that instance run a FastAPI server to handle inference requests.
- Automatically terminate the instance once I've received the response.
- Ensure this solution scales for multiple concurrent users on my app.

I'm planning to leverage Boto3 for this. Is this a feasible solution, or is there a better approach I should be considering?
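For what it's worth, the launch-and-terminate flow you describe can be sketched with Boto3 along these lines. The AMI ID, instance type, and max price below are placeholders, not recommendations, and the actual HTTP call to the FastAPI server is omitted:

```python
# Placeholder values -- substitute your own GPU-enabled AMI and instance type.
AMI_ID = "ami-0123456789abcdef0"   # hypothetical AMI ID
INSTANCE_TYPE = "g4dn.xlarge"      # an example GPU instance type


def build_spot_launch_params(ami_id: str, instance_type: str,
                             max_price: str = "0.50") -> dict:
    """Build run_instances parameters for a one-shot Spot request."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "MaxPrice": max_price,
                "SpotInstanceType": "one-time",
                # Terminate (rather than stop) if Spot capacity is reclaimed.
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    }


def run_inference_on_spot(payload: dict) -> str:
    """Launch a Spot instance, wait until it is running, then terminate it.

    The request to the FastAPI endpoint on the instance is stubbed out;
    this only shows the instance lifecycle.
    """
    import boto3  # lazy import so the helper above is usable without AWS

    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(**build_spot_launch_params(AMI_ID, INSTANCE_TYPE))
    instance_id = resp["Instances"][0]["InstanceId"]
    try:
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        # ... send `payload` to the FastAPI server on the instance here ...
    finally:
        # Always terminate, so an error path never leaves a GPU instance running.
        ec2.terminate_instances(InstanceIds=[instance_id])
    return instance_id
```

Note that `run_instances` plus the `instance_running` waiter typically adds minutes of latency per request, which is the cold-start problem the answers below raise.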

4 Answers

Answered By DevOpsNinja24 On

I’ve had similar requests from clients who wanted to replicate RunPod’s functionality on AWS. The biggest hurdles are GPU availability and instance cold starts: when you request an EC2 instance, GPU capacity may not be available right away, and Spot instances complicate things further because they can be interrupted at any time. I think a pub-sub architecture could be a better fit: your front end pushes messages with the request data, and a worker picks them up for processing. I recently tested EKS with the Horizontal Pod Autoscaler (HPA) and Karpenter, which manages scaling for inference tasks efficiently and minimizes cold starts after the initial setup. You could also explore Auto Scaling groups, but I'd recommend testing how well that works first.
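The worker side of that pub-sub pattern can be sketched as an SQS long-polling loop. The queue URL and message shape (`job_id`, `input`) are assumptions for illustration, and the model call itself is stubbed out:

```python
import json


def handle_job(body: str) -> dict:
    """Parse a job message and return a result record (inference stubbed out)."""
    job = json.loads(body)
    # ... run model inference on job["input"] here ...
    return {"job_id": job["job_id"], "status": "done"}


def poll_loop(queue_url: str) -> None:
    """Long-poll SQS and process messages; runs until interrupted."""
    import boto3  # lazy import so handle_job is testable without AWS

    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling cuts down on empty receives
        )
        for msg in resp.get("Messages", []):
            handle_job(msg["Body"])
            # Delete only after successful processing, so failures are retried
            # once the message's visibility timeout expires.
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
```

Deleting the message only after the handler succeeds is what gives you retry-on-failure for free via the visibility timeout.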

Answered By QuestioningCoder On

But is that truly serverless? If you're managing servers, even if it's through spot instances, it strays from the idea of serverless computing. Serverless implies you're not handling virtual machines. Just something to think about, though it might not be a big deal.

Answered By CloudGuru42 On

Starting EC2 instances can take a while, which might frustrate your users if they're waiting too long for responses. You might want to consider that delay when designing your architecture.

Answered By ServerlessSeeker77 On

Consider using SQS for your API server to send jobs instead of spinning up instances right away. You could use EventBridge to trigger jobs on ECS, which should handle GPUs as well. This way, you'd only utilize infrastructure when it's needed, keeping your costs down without the need for persistent servers. Good luck!
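The producer side of this design is small: instead of launching infrastructure, the API server enqueues a job and hands the client an ID to poll on. The queue URL and message fields here are placeholders:

```python
import json
import uuid


def build_job_message(prompt: str) -> str:
    """Serialize an inference job with a unique ID for later correlation."""
    return json.dumps({"job_id": str(uuid.uuid4()), "input": prompt})


def enqueue_job(queue_url: str, prompt: str) -> str:
    """Send a job to SQS and return its job_id for the client to poll on."""
    import boto3  # lazy import: the message builder works without AWS

    body = build_job_message(prompt)
    boto3.client("sqs").send_message(QueueUrl=queue_url, MessageBody=body)
    return json.loads(body)["job_id"]
```

Returning the `job_id` immediately keeps the API responsive; the client then polls (or receives a webhook) for the result once the ECS task finishes.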
