Hey everyone,
I'm looking for a more efficient way to handle CI jobs that get killed when their AWS spot instance is terminated. Currently, I use a script that re-runs these jobs every 15 minutes through the GitHub API, but it isn't reliable: it still misses some of the terminated workflows. I came across an older post discussing this issue and thought I'd see if any of you have found better solutions in the meantime. Any suggestions would be greatly appreciated! Thanks!
4 Answers
You might consider setting up a workflow that triggers when another workflow completes, for example:
```yaml
on:
  workflow_run:
    workflows: ["Main Workflow"]
    types:
      - completed
```
This way, you can check whether the workflow succeeded or was terminated and rerun it, without polling every 15 minutes. It may also be worth checking whether you can use a catch-all for workflows instead of listing each one individually. I haven’t tried it myself, but I’d love to hear if it works for you!
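For what it's worth, the check-and-rerun step could be sketched in Python roughly like this. It's untested against a real repo and rests on a few assumptions: it runs as a step inside the `workflow_run`-triggered workflow, the step passes `GITHUB_TOKEN` in its environment with `actions: write` permission, and a reclaimed spot runner shows up as a `failure` conclusion. `should_rerun`, `rerun_failed_jobs`, and the cap of 3 attempts are made-up names and values for illustration. The conclusion alone can't tell a spot termination from a genuine test failure, which is why there is an attempt cap.

```python
import json
import os
import urllib.request

MAX_ATTEMPTS = 3  # assumed cap so a genuinely broken job can't retry forever


def should_rerun(run: dict, max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Decide whether a completed workflow run is worth retrying."""
    # Assumption: a terminated spot runner surfaces as conclusion == "failure".
    failed = run.get("conclusion") == "failure"
    return failed and run.get("run_attempt", 1) < max_attempts


def rerun_failed_jobs(repo: str, run_id: int, token: str) -> int:
    """Call the REST endpoint that re-runs only the failed jobs of a run."""
    url = f"https://api.github.com/repos/{repo}/actions/runs/{run_id}/rerun-failed-jobs"
    req = urllib.request.Request(
        url,
        data=b"{}",  # empty JSON body so the POST carries a Content-Length
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def main() -> None:
    # GITHUB_EVENT_PATH points at the JSON payload of the triggering event.
    with open(os.environ["GITHUB_EVENT_PATH"]) as f:
        run = json.load(f)["workflow_run"]
    if should_rerun(run):
        rerun_failed_jobs(os.environ["GITHUB_REPOSITORY"], run["id"], os.environ["GITHUB_TOKEN"])


# Only act when running inside GitHub Actions, where the event file exists.
if __name__ == "__main__" and "GITHUB_EVENT_PATH" in os.environ:
    main()
```

Re-running only the failed jobs means the jobs that already passed aren't repeated, which matters if you have hundreds of tests.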
If you’re using Buildkite, they have built-in automated retries for steps which can handle spot instances much more smoothly. Might be worth checking out!
Good question! First off, why do you need to re-run terminated workflows automatically? Just curious if there's a specific reason for that.
For any workload running on spot instances, you really should design it to be easily restartable. That might solve some of your issues.
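As one rough illustration of what "restartable" could mean for a test job, here is a minimal Python sketch that records finished tests in a checkpoint file and skips them on the next attempt. `run_restartable`, `run_test`, and the checkpoint path are hypothetical, and it assumes the checkpoint file is kept somewhere that survives the instance (a cache, artifact, or shared volume).

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def run_restartable(
    test_ids: Iterable[str],
    run_test: Callable[[str], None],
    checkpoint: Path,
) -> List[str]:
    """Run each test once, skipping tests a previous interrupted attempt finished."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    executed = []
    for test_id in test_ids:
        if test_id in done:
            continue  # already completed before the last interruption
        run_test(test_id)
        done.add(test_id)
        executed.append(test_id)
        # Persist after every test so a termination loses at most one test's work.
        checkpoint.write_text(json.dumps(sorted(done)))
    return executed
```

With something like this, a re-run after a spot termination only repeats the test that was in flight rather than the whole suite.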
I'm asking because I have hundreds of tests running daily, and manually re-running each isn't feasible.