Hey everyone! I'm pretty new to AWS and have been diving into Fargate, which seems like a great option since it manages instances for me. I'm looking to host around 20-25 large language models (LLMs) for a web app where users can pick a model to be their personal assistant.
I'm curious if using Fargate is a good choice for this purpose. Also, I'm trying to figure out how to estimate the costs involved in this architecture. I looked at the AWS pricing calculator, but got stuck on some terms like 'pod' and 'tasks.' Can anyone shed light on what those mean, especially in the context of my project? Feel free to ask if you need more details!
3 Answers
Keep in mind that many LLMs need GPUs to perform efficiently, and as of now, Fargate doesn't support GPUs at all. If you want acceptable inference performance, hosting these models on GPU-backed EC2 instances (e.g., the g5 or p4 families), which can be quite pricey on AWS, is the more common route. On your terminology question: in ECS/Fargate, a "task" is a running set of one or more containers launched from a task definition; a "pod" is the equivalent concept in Kubernetes, which you'd only encounter if you used EKS instead. Just something to consider while planning your setup!
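To make the cost side concrete: Fargate bills each task by vCPU-hours plus GB-of-memory-hours. Here's a minimal sketch of that formula in Python — the pricing *model* is real, but the two rate constants below are example values for illustration; look up your region's current rates on the AWS pricing page before relying on any numbers.

```python
# Rough Fargate cost estimator. The pricing model (per-vCPU-hour plus
# per-GB-memory-hour) is how Fargate actually bills; the RATE values
# below are assumed example rates, not authoritative prices.
VCPU_RATE_PER_HOUR = 0.04048   # assumed example rate (USD per vCPU-hour)
GB_RATE_PER_HOUR = 0.004445    # assumed example rate (USD per GB-hour)

def monthly_task_cost(vcpus: float, memory_gb: float, hours: float = 730) -> float:
    """Cost of one always-on Fargate task over ~one month (730 hours)."""
    return hours * (vcpus * VCPU_RATE_PER_HOUR + memory_gb * GB_RATE_PER_HOUR)

# Example: one 4 vCPU / 16 GB task running 24/7.
cost = monthly_task_cost(4, 16)
print(f"${cost:.2f}/month per task")
```

Multiply by the number of always-on tasks you'd need (one per model, if each model gets its own service) and it becomes clear why 20-25 large models on CPU-only Fargate adds up fast even before you hit the GPU limitation.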
You might also want to check out Amazon Bedrock. It offers LLMs as a managed service and may provide better performance and cost-effectiveness than packaging everything into Fargate containers yourself. Just remember that using Bedrock means you're limited to the models it offers.
I think you're right to question Fargate's suitability. If you really need GPU access, that's a dealbreaker since Fargate doesn’t support it at this time. The Bedrock service could potentially meet your needs better, albeit with some model limitations.