Hey everyone, I'm currently using a script that triggers every 15 minutes to re-run jobs that have been terminated, but I'm finding it doesn't catch all the terminated workflows. I came across an old post discussing AWS spot instances in CI jobs and I'm curious if there are any newer, better solutions available. I'd appreciate any insights or advice! Thanks!
4 Answers
As a general rule, any workload on spot instances should ideally be designed to be restartable. This can save you a lot of hassle in the long run.
Before diving into solutions, I'm curious why you're re-running those terminated workflows automatically? Is it a necessity due to the large number of tests you’re running?
If you're using Buildkite, it has built-in automatic retries for steps, which can take care of re-running jobs lost to spot instance terminations.
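For reference, here's a minimal sketch of what that could look like in a Buildkite `pipeline.yml`. The label and script name are made up, and the limit of 2 is just an assumption; as far as I know, an exit status of -1 is what Buildkite reports when the agent is lost, which is typically what a spot termination looks like.

```yaml
steps:
  - label: "Tests"                # hypothetical step name
    command: "./run-tests.sh"     # hypothetical script
    retry:
      automatic:
        - exit_status: -1         # agent was lost (e.g. spot instance reclaimed)
          limit: 2                # assumed cap on automatic retries
```

Scoping the retry to that exit status means genuine test failures are not retried, only jobs whose agent disappeared.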
Have you considered setting up your workflow like this?
```yaml
on:
  workflow_run:
    workflows: ["Main Workflow"]
    types:
      - completed
```
This way, you can check whether the workflow finished successfully or was terminated, and re-run it if needed. It should be more efficient than polling every 15 minutes, since it fires as soon as the run completes. You could also look into whether `workflows: [all]` works instead of listing them individually, which would save you some time. Let me know if you give this a shot!
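To make that a bit more concrete, here's a rough sketch of a full "rerunner" workflow built on that trigger. The file name, job name, and the cap of three attempts are my own assumptions, and it assumes a spot termination shows up as a `failure` conclusion, so check what your terminated runs actually report. It uses the `gh run rerun --failed` command to re-run only the failed jobs.

```yaml
# Hypothetical file: .github/workflows/rerun-terminated.yml
name: Rerun terminated jobs

on:
  workflow_run:
    workflows: ["Main Workflow"]
    types:
      - completed

permissions:
  actions: write  # needed so the token can trigger a re-run

jobs:
  rerun:
    # Assumption: a terminated run ends with conclusion "failure".
    # The run_attempt check caps retries so a real failure can't loop forever.
    if: >-
      github.event.workflow_run.conclusion == 'failure' &&
      github.event.workflow_run.run_attempt < 3
    runs-on: ubuntu-latest  # GitHub-hosted, so the rerunner itself isn't on spot
    steps:
      - name: Re-run only the failed jobs
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
        run: gh run rerun ${{ github.event.workflow_run.id }} --failed
```

Keep in mind that this can't tell a spot termination apart from a genuine test failure, so real failures will also be retried up to the cap. Also, `workflow_run` only fires if this workflow file exists on the default branch.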
That's exactly why I'm asking for advice! I have hundreds of tests running daily and can't afford to manually re-run each one.