How Can I Run Stable GPU Inference on AWS Without Spot Interruptions?

August 1, 2025

Asked By CoolGalaxy123 On August 1, 2025

I'm working with a small team on large language model inference using AWS, but we're struggling with frequent spot instance interruptions. We've tried various solutions like capacity-optimized ASGs and fallback strategies, as well as checkpointing, but these still fail when low latency is crucial. Reserved Instances don't provide the flexibility we need, and on-demand pricing is tough on our budget. Is there a reliable way to use AWS that allows us to keep our workloads stable while also being mindful of costs?

6 Answers

Answered By DevGuru_22 On August 5, 2025

Could you explain how your current setup handles eviction notices? It sounds like you need a system that can migrate your tasks as soon as a notice arrives, before the instance is actually reclaimed. You might explore AWS FIS (Fault Injection Service) to simulate Spot interruptions, and check out KubeRay (Ray on Kubernetes). Ray offers fault-tolerant task APIs, checkpointing for long-running tasks, and the ability to save progress when your setup gets a termination signal. You can also tune settings like `terminationGracePeriodSeconds` to give your pods time to shut down cleanly.
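
To make the termination-signal part concrete, here is a minimal Python sketch of a worker that stops taking new requests once SIGTERM arrives and reports what it left unprocessed. The names `GracefulShutdown`, `serve`, and `handle_request` are hypothetical and not part of Ray or KubeRay; the sketch assumes a single-process worker that pulls requests from an in-memory list.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks whether a termination signal has been received."""

    def __init__(self):
        self._stop = threading.Event()

    def install(self):
        # Kubernetes sends SIGTERM first, then SIGKILL once
        # terminationGracePeriodSeconds has elapsed.
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self._stop.set()

    @property
    def stopping(self):
        return self._stop.is_set()


def serve(requests, handle_request, shutdown):
    """Process requests until a stop is requested.

    Returns (results, unprocessed) so the caller can hand the
    unprocessed requests to another replica or re-queue them.
    """
    pending = list(requests)
    results = []
    while pending and not shutdown.stopping:
        results.append(handle_request(pending.pop(0)))
    return results, pending
```

In a real pod you would call `shutdown.install()` at startup and size `terminationGracePeriodSeconds` to cover your slowest in-flight request; both choices depend on your serving stack.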

Answered By DataDrivenSteve On August 4, 2025

Here's my take: If you can't tolerate interruptions at all, then relying on spot instances may not be the best choice. They're not a one-size-fits-all solution.

Answered By CloudNinja99 On August 3, 2025

Are you requesting just one instance type or several? If you request a variety of instance types that can all do the job, it could improve your chances of getting and keeping capacity. Just keep in mind that with the AI boom, finding affordable GPU options is a real challenge!
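
As a rough illustration of requesting several instance types, the sketch below builds a `MixedInstancesPolicy` dictionary in the shape that boto3's `create_auto_scaling_group` accepts. The helper name `build_mixed_instances_policy`, the launch template name, and the instance types in the usage note are placeholders, and the small On-Demand base is an assumption added here as a hedge, not something the answer above prescribes.

```python
def build_mixed_instances_policy(launch_template_name, instance_types,
                                 on_demand_base=1):
    """Build a MixedInstancesPolicy dict for an Auto Scaling group.

    instance_types: every GPU instance type that can run the model.
    on_demand_base: instances that are always On-Demand (assumed hedge).
    """
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per acceptable instance type widens the
            # set of Spot pools the group can draw from.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            # Everything above the base runs on Spot.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }
```

For example, `build_mixed_instances_policy("llm-inference", ["g5.xlarge", "g5.2xlarge", "g4dn.xlarge"])` could be passed as the `MixedInstancesPolicy` argument; whether those types actually fit your model's memory needs is something only you can check.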

Answered By CodeMasterX On August 2, 2025

If you route Spot interruption events to an SQS queue (for example, through an EventBridge rule), a consumer can start draining the affected instance before you're fully impacted. That's one strategy to check out!
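
A consumer for such a queue might start with something like the sketch below, which pulls the instance ID out of an "EC2 Spot Instance Interruption Warning" event. It assumes the queue is a direct EventBridge target, so the SQS message body is the raw event JSON; the function name `interrupted_instance_id` is hypothetical.

```python
import json

SPOT_WARNING = "EC2 Spot Instance Interruption Warning"


def interrupted_instance_id(message_body):
    """Return the instance ID from a Spot interruption warning, else None."""
    try:
        event = json.loads(message_body)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    # Ignore other event types that may share the queue.
    if event.get("detail-type") != SPOT_WARNING:
        return None
    detail = event.get("detail")
    if not isinstance(detail, dict):
        return None
    return detail.get("instance-id")
```

What you do with the returned ID (cordon the node, deregister it from the load balancer, start a replacement) depends on your setup and is left out here.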

Answered By SysAdminJoe On August 1, 2025

Why not consider adding Kubernetes to your approach? That two-minute warning you get is usually sufficient to scale up and migrate workloads to a new node.
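
If you want to check how much of the warning window is actually left, the interruption notice can also be read from the instance metadata service on the Spot instance itself. The sketch below assumes IMDSv2 and the `spot/instance-action` metadata path; the helper names `fetch_instance_action` and `seconds_remaining` are made up for illustration.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=1.0):
    """Return the raw spot/instance-action body, or None if no notice is pending."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        return urllib.request.urlopen(req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def seconds_remaining(body, now):
    """Seconds between `now` (timezone-aware) and the notice's action time."""
    notice = json.loads(body)
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()
```

`fetch_instance_action` only works on an EC2 instance; `seconds_remaining` can be tried anywhere with a sample body such as `{"action": "terminate", "time": "2025-08-01T12:02:00Z"}`. Comparing the result with how long your model takes to load on a fresh node tells you whether migrating inside the window is realistic.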

Answered By TechWhiz87 On August 1, 2025

Honestly, I think there might not be an ideal solution. We've transitioned most of our GPU tasks to Lambda Labs since it feels like cloud prices are just skyrocketing for GPUs lately.
