I'm exploring options for a serverless GPU compute setup on AWS. I currently serve AI models on RunPod, but its serverless option hasn't been stable enough for production use. Since AWS doesn't provide native serverless GPU compute, I'm wondering whether I could:
- Create a Lambda function that spins up an EC2 or Spot instance.
- Have that instance run a FastAPI server to handle inference requests.
- Automatically terminate the instance once I've received the response.
- Ensure this solution scales for multiple concurrent users on my app.
I'm planning to leverage Boto3 for this. Is this a feasible solution, or is there a better approach I should be considering?
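For reference, the Lambda part of the plan above might look roughly like this. It is a minimal sketch, not a tested deployment: the AMI ID, the `g5.xlarge` default, the `purpose` tag, and the `event` keys are all placeholders I made up for illustration. `boto3` is imported inside the functions that actually call AWS, so the parameter builder can be checked without credentials.

```python
from typing import Any, Dict


def build_launch_params(ami_id: str,
                        instance_type: str = "g5.xlarge",
                        use_spot: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for EC2 run_instances (one instance)."""
    params: Dict[str, Any] = {
        "ImageId": ami_id,              # assumed: an AMI with the model and FastAPI baked in
        "InstanceType": instance_type,  # assumed GPU instance type
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "purpose", "Value": "inference"}],  # hypothetical tag
        }],
    }
    if use_spot:
        # Request Spot capacity instead of On-Demand.
        params["InstanceMarketOptions"] = {
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        }
    return params


def lambda_handler(event, context):
    """Launch one instance and return its ID (event keys are hypothetical)."""
    import boto3  # imported lazily so the builder above has no AWS dependency
    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        **build_launch_params(event["ami_id"], use_spot=event.get("use_spot", False))
    )
    return {"instance_id": response["Instances"][0]["InstanceId"]}


def terminate(instance_id: str) -> None:
    """Terminate the instance once the inference response has been received."""
    import boto3
    boto3.client("ec2").terminate_instances(InstanceIds=[instance_id])
```

Note that `run_instances` returns before the instance has booted, so the FastAPI server would not be reachable until some time after the handler returns.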
4 Answers
I’ve had similar requests from clients wanting to replicate RunPod’s functionality on AWS. The biggest hurdles are GPU availability and instance cold starts. When you request an EC2 instance, capacity might not be available right away, and Spot instances complicate things even more. I think a pub-sub architecture would be a better fit: your front end pushes messages containing the request data, and a worker picks them up for processing. I recently tested EKS with HPA and Karpenter, which together manage scaling for inference workloads efficiently and minimize cold starts after the initial setup. You could also explore Auto Scaling groups, but I'd recommend testing how effective they are first.
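To make the worker side of that pub-sub idea concrete, here is a minimal sketch that assumes SQS as the queue (my assumption; any broker would work) and JSON message bodies. `poll_once` and `handle` are hypothetical names, and the client is passed in so the loop can be exercised with a stub instead of real AWS.

```python
import json
from typing import Any, Callable, Dict


def poll_once(sqs, queue_url: str, handle: Callable[[Dict[str, Any]], Any]) -> int:
    """Receive up to 10 messages, run inference on each, then delete it.

    `sqs` is expected to be a boto3 SQS client (or a stub with the same
    methods). Returns the number of messages processed.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling: wait for work instead of spinning
    )
    processed = 0
    for message in response.get("Messages", []):
        job = json.loads(message["Body"])  # assumed: body is a JSON job description
        handle(job)                        # e.g. call the model and store the result
        # Delete only after successful handling; if handle() raises, the message
        # becomes visible again after the visibility timeout and is retried.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```

A GPU worker would call `poll_once(boto3.client("sqs"), queue_url, run_inference)` in a loop, where `run_inference` is whatever function wraps your model.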
But is that truly serverless? If you're managing servers, even if it's through spot instances, it strays from the idea of serverless computing. Serverless implies you're not handling virtual machines. Just something to think about, though it might not be a big deal.
Starting an EC2 instance can take a while, and a user waiting on a response will feel every second of that delay. Factor that startup time into your architecture.
Instead of spinning up instances right away, consider having your API server send jobs to SQS. You could then use EventBridge to trigger jobs on ECS, which should handle GPUs as well. This way, you only use infrastructure when it's needed, which keeps costs down without persistent servers. Good luck!
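A rough sketch of the producer side, plus one simple way to get the "only when needed" behaviour. This is not the EventBridge wiring itself; it is a swapped-in variant that sizes an ECS service from the queue backlog. The function names, `jobs_per_task`, and `max_tasks` are hypothetical values to tune, and both clients are passed in so the logic can be tried with stubs.

```python
import json
import math
from typing import Any, Dict


def enqueue_job(sqs, queue_url: str, payload: Dict[str, Any]) -> str:
    """Called by the API server: push one inference job onto the queue."""
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(payload))
    return response["MessageId"]


def desired_task_count(backlog: int, jobs_per_task: int = 4, max_tasks: int = 8) -> int:
    """Scale to zero when idle, otherwise one task per `jobs_per_task` queued jobs."""
    if backlog <= 0:
        return 0
    return min(max_tasks, math.ceil(backlog / jobs_per_task))


def scale_service(sqs, ecs, queue_url: str, cluster: str, service: str) -> int:
    """Read the approximate backlog and set the ECS service's desired count."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    count = desired_task_count(backlog)
    ecs.update_service(cluster=cluster, service=service, desiredCount=count)
    return count
```

`scale_service` could run on a schedule (for example from a small Lambda). The backlog figure from SQS is approximate, so treat the resulting count as a rough target, not an exact one.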