I'm dealing with latency issues in our AWS GPU setup for a latency-sensitive workload that demands heavy GPU compute. My AWS Enterprise rep suggested that moving to bare metal servers could improve control and reduce latency. Before committing, I'd like to explore adjustments within our existing AWS setup. What optimizations or AWS-native tweaks, like placement groups or enhanced networking, genuinely help low-latency GPU workloads? What are the pros and cons of moving to bare metal for this kind of work? And are there any hybrid solutions combining AWS and bare metal worth considering?
2 Answers
Transitioning from AWS to bare metal comes with significant trade-offs. On one hand, bare metal offers more consistent performance: there is no hypervisor jitter and no noisy neighbors, and you get full control over hardware-level settings like BIOS, NUMA topology, and kernel tuning. On the other hand, it means far less elasticity when you need to scale up quickly, plus you take on hardware procurement and maintenance. Many companies opt for a hybrid approach, keeping latency-sensitive tasks on bare metal while using AWS for less critical operations or overflow capacity. That may balance performance and scalability well for your needs!
Before making any drastic changes, have you profiled your system? It's crucial to identify which component is actually causing the latency: GPU compute, host-to-device transfers, the network, or storage I/O. AWS GPU instances can indeed be tricky when you need consistent performance. You might try newer instance types like p5 or g6e, and consider their bare-metal variants where available, which remove the hypervisor layer and can reduce jitter. Enabling EFA (Elastic Fabric Adapter) and launching nodes in a cluster placement group can lower interconnect latency between instances. It can be tedious, but it's also worth experimenting with process and NUMA pinning, and keeping your data close to the compute: local NVMe instance storage or FSx for Lustre will usually beat S3 or EBS for latency-sensitive reads.
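On the process-pinning point, here's a minimal sketch using Linux's scheduler-affinity API from Python. The "pin to the first half of the cores" split is purely illustrative; in practice you'd match the pinned set to the NUMA node closest to your GPU (e.g. as reported by `nvidia-smi topo -m`). Note that `os.sched_setaffinity` is Linux-only.

```python
import os

# Illustrative policy: restrict this process to the first half of the
# host's cores, leaving the rest for OS and background threads.
# On a real GPU box you would pin to the cores local to the GPU's NUMA node.
n_cores = os.cpu_count() or 1
pinned = set(range(max(1, n_cores // 2)))

# PID 0 means "the calling process" (Linux-only API).
os.sched_setaffinity(0, pinned)

# Confirm the kernel accepted the CPU mask.
print(sorted(os.sched_getaffinity(0)))
```

For multi-process training jobs, the same call is typically made per worker right after fork, giving each rank its own disjoint core set so workers don't migrate across sockets mid-step.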
That makes sense! I’ll definitely look into EFA and process pinning.

Exactly! A hybrid solution sounds like a smart compromise.