I'm looking to use AWS GPU instances for training and inference on some of my custom models, but I'm struggling to find an instance type that fits my needs. Ideally, I want flexibility similar to what I see with platforms like RunPod, where I can choose from various GPUs (anywhere from 1 to 10 of them) and control the CPU, storage, and RAM in the configuration. AWS seems limited in this regard: for example, to get 8 T4 GPUs I have to use the g4dn.metal type, which comes with 96 vCPUs that I really don't need; I'm only after the GPUs and their VRAM. I've also hit my service quota and requested an increase. I'd be willing to pay a bit more than with RunPod if I could get similar flexibility.

Why does AWS (and even GCP) lack this kind of configuration option? What are my best options for using GPUs effectively on AWS? Right now I need 1-5 GPUs in parallel with 15 to 80 GB of VRAM, with the higher numbers being rare cases.
2 Answers
AWS tends to design its instance families around predictable performance and reliability rather than the customizable configurations some other platforms offer. The fixed CPU-to-GPU ratios exist because many AWS customers need consistent performance and straightforward capacity planning. If your workload isn't CPU-heavy, that can feel wasteful, but it keeps the hardware SKUs uniform and easier for AWS to provision at scale.
For your scenario, consider g5 instances for smaller GPU workloads, or run several smaller instances instead of one large one. For training, some teams also move data preprocessing onto separate CPU-only instances so the GPU instances aren't paying for idle cores. If you want RunPod-style flexibility, you'll likely need to adapt your approach, because AWS favors fixed, predictable configurations over customizable ones.
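As a rough illustration of that sizing approach, here's a minimal boto3 sketch that maps a desired GPU count to a g5 size and launches it. The AMI ID, key pair, and the pick_g5_type helper are placeholders/assumptions for illustration, not anything AWS defines:

```python
import boto3

# g5 instances use A10G GPUs (24 GB VRAM each) and come in 1-, 4-, and 8-GPU sizes.
# Hypothetical mapping from GPU count to the smallest g5 size that fits.
G5_BY_GPU_COUNT = {1: "g5.xlarge", 4: "g5.12xlarge", 8: "g5.48xlarge"}

def pick_g5_type(gpus_needed: int) -> str:
    for count, instance_type in sorted(G5_BY_GPU_COUNT.items()):
        if count >= gpus_needed:
            return instance_type
    raise ValueError("more GPUs than a single g5 instance offers")

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder: substitute your Deep Learning AMI
    InstanceType=pick_g5_type(2),      # e.g. 2 GPUs needed -> g5.12xlarge (4 GPUs)
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",             # placeholder key pair name
)
print(response["Instances"][0]["InstanceId"])
```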
I totally agree about the fixed ratios being frustrating. Have you looked at providers that offer more GPU choice per instance, like A40s or A100s? They seem to have better flexibility.
On AWS, GPU instances are fixed SKUs, meaning you can't rent GPUs independently; they come bundled with a set CPU, RAM, and storage ratio. If you're after efficiency, consider running several smaller instances with 1 GPU each rather than one big instance. For example, five g5.xlarge instances give you five A10G GPUs (24 GB VRAM each) with only 4 vCPUs per instance, so you aren't paying for CPU capacity you don't use. You might also look at AWS ParallelCluster, which can provision and manage a cluster of such instances for HPC-style workloads.
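To make that concrete, here's a minimal boto3 sketch of launching five single-GPU g5.xlarge instances in one call. The AMI ID, key pair, and security group are placeholders you'd replace with your own:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch five 1-GPU g5.xlarge instances (A10G, 24 GB VRAM, 4 vCPUs each)
# instead of one large multi-GPU instance.
response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",    # placeholder: your Deep Learning AMI
    InstanceType="g5.xlarge",
    MinCount=5,
    MaxCount=5,
    KeyName="my-key-pair",              # placeholder key pair
    SecurityGroupIds=["sg-xxxxxxxx"],   # placeholder security group
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "purpose", "Value": "gpu-workers"}],
    }],
)

for inst in response["Instances"]:
    print(inst["InstanceId"], inst["InstanceType"])
```

Each instance can then run its own training or inference job; anything that needs data parallelism across instances would go through something like torchrun or a ParallelCluster-managed scheduler.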
Thanks for the insight! However, I still find the g5 instances limiting. Is there an option to have a mix of GPUs (like A40, A100, RTX 5090) on a single instance? That would be ideal for my needs.

I get that, especially since other platforms have more diverse options. Can you clarify what you mean by users pushing data hard through GPUs? And regarding flexibility, I'd be interested in hearing about platforms other than RunPod that you think offer similar configurations.