How can I effectively re-run terminated AWS spot instance jobs in CI?

Asked By SparkyPineapple42 On

Hey everyone,

I'm looking for a more efficient way to handle terminated AWS spot instance jobs in my CI pipeline. Currently, I use a script that re-runs these jobs every 15 minutes through the GitHub API, but it's not working well as it's still missing some of the terminated workflows. I came across an older post discussing this issue and thought I'd see if any of you have found better solutions in the meantime. Any suggestions would be greatly appreciated! Thanks!

4 Answers

Answered By WorkflowWhiz On

You might consider setting up a workflow that triggers when another one completes, for example:

```yaml
on:
  workflow_run:
    workflows: ["Main Workflow"]
    types:
      - completed
```

This way, the follow-up workflow can check whether the run succeeded or was terminated and re-run it, with no need to poll every 15 minutes. It may also be worth checking whether a catch-all for workflows is possible instead of listing each one individually. I haven't tried it myself, but I'd love to hear if it works for you!
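To make the "check and re-run" step concrete, here is a minimal Python sketch of what the triggered workflow could run. It assumes the `workflow_run` event payload (with its `status`, `conclusion`, and `run_attempt` fields) and GitHub's REST endpoint for re-running failed jobs. The set of retryable conclusions and the attempt cap are assumptions of mine, not something from this thread, so check what your spot-terminated runs actually report before relying on them.

```python
import json
import urllib.request

# Conclusions treated as "possibly killed by a spot interruption".
# This set is an assumption; inspect your own terminated runs to confirm.
RETRYABLE_CONCLUSIONS = {"failure", "cancelled"}
MAX_ATTEMPTS = 3  # hypothetical cap so a genuinely broken build is not rerun forever


def should_rerun(run: dict) -> bool:
    """Decide from a workflow_run payload whether to rerun the failed jobs."""
    return (
        run.get("status") == "completed"
        and run.get("conclusion") in RETRYABLE_CONCLUSIONS
        and run.get("run_attempt", 1) < MAX_ATTEMPTS
    )


def rerun_failed_jobs(repo: str, run_id: int, token: str) -> int:
    """POST to the rerun-failed-jobs endpoint and return the HTTP status code."""
    url = f"https://api.github.com/repos/{repo}/actions/runs/{run_id}/rerun-failed-jobs"
    req = urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Inside the triggered workflow, the payload can be loaded with `json.load` from the file named by the `GITHUB_EVENT_PATH` environment variable. Its `workflow_run` key is what `should_rerun` expects, and `workflow_run["id"]` is the run ID to pass to `rerun_failed_jobs`. The token needs write access to Actions.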

Answered By DevOpsMasterX On

If you’re using Buildkite, they have built-in automated retries for steps which can handle spot instances much more smoothly. Might be worth checking out!

Answered By CuriousDev99 On

Good question! First off, why do you need to re-run terminated workflows automatically? Just curious whether there's a specific reason for that.

SparkyPineapple42 -

I'm asking because I have hundreds of tests running daily, and manually re-running each one isn't feasible.

Answered By TechGeek123 On

For any workload running on spot instances, you really should design it to be easily restartable. That might solve some of your issues.
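One way to read "easily restartable" is checkpointing: record each finished unit of work so a re-run skips what already completed. Below is a minimal sketch of that idea, not a definitive implementation. The function name `run_restartable` and the JSON checkpoint format are hypothetical, and it assumes the checkpoint path lives on storage that survives the instance (a network volume or a synced directory, for example).

```python
import json
from pathlib import Path


def run_restartable(tasks, run_task, checkpoint: Path):
    """Run each named task once, recording finished names so a restart skips them."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for name in tasks:
        if name in done:
            continue  # already finished before the interruption
        run_task(name)
        done.add(name)
        # Write to a temp file, then rename, so a kill mid-write
        # cannot leave a half-written checkpoint behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        tmp.replace(checkpoint)
    return done
```

With something like this in place, re-running a terminated job only repeats the task that was in flight when the instance went away, which matters when there are hundreds of tests per day.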
