Hey everyone,
I'm using a script that runs every 15 minutes to re-run terminated jobs via the GitHub API, but it's not the most efficient solution and it still misses some terminated workflows from time to time. I came across an old post from a few years back on this topic, and I'm curious whether anyone has found a better way to handle re-running jobs now. Any fresh ideas or solutions would be greatly appreciated!
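For context, the current setup is roughly this shape (simplified; the filters, limits, and job names are just placeholders): a scheduled workflow that lists recent failed runs and re-runs their failed jobs with the `gh` CLI.

```yaml
on:
  schedule:
    - cron: "*/15 * * * *"   # poll every 15 minutes

permissions:
  actions: write   # lets GITHUB_TOKEN re-run workflows

jobs:
  rerun-terminated:
    runs-on: ubuntu-latest
    steps:
      - name: Re-run failed jobs from recent runs
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Grab the IDs of recent failed runs and re-run only their failed jobs.
          # With a fixed limit and status filter, some terminated runs can still be missed.
          gh run list --repo "${{ github.repository }}" --status failure --limit 20 \
            --json databaseId --jq '.[].databaseId' |
          while read -r run_id; do
            gh run rerun "$run_id" --failed --repo "${{ github.repository }}" || true
          done
```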
3 Answers
Have you considered using events to trigger reruns instead of polling every 15 minutes? You could set up something like this in your workflow configuration:
```yaml
on:
  workflow_run:
    workflows: ["Main Workflow"]
    types:
      - completed
```
This way, you can check whether the triggering run succeeded or was terminated and re-run it accordingly. Reacting to events should be more efficient than constant polling! Let me know how it goes if you give it a shot!
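If it helps, here's a rough sketch of what the follow-up workflow could look like (the watched workflow name, retry cap, and conclusion filter are placeholders, so adjust them to your setup). It uses the `gh` CLI, which is preinstalled on GitHub-hosted runners, to re-run only the failed jobs of the triggering run:

```yaml
name: Rerun terminated jobs

on:
  workflow_run:
    workflows: ["Main Workflow"]   # placeholder: the workflow you want to watch
    types:
      - completed

permissions:
  actions: write   # needed so GITHUB_TOKEN can re-run workflows

jobs:
  rerun-failed:
    runs-on: ubuntu-latest
    # Only react to unsuccessful runs, and cap attempts to avoid a retry loop
    if: >-
      github.event.workflow_run.conclusion == 'failure' &&
      github.event.workflow_run.run_attempt < 3
    steps:
      - name: Re-run only the failed jobs of the triggering run
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: >-
          gh run rerun ${{ github.event.workflow_run.id }}
          --failed --repo ${{ github.repository }}
```

Depending on how your runs end up when a runner disappears, you may also need to match `cancelled` in that `if` condition.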
That's an interesting question! First off, it might be useful to consider why you're having to rerun the workflows automatically. If there are high volumes of tests each day, it's understandable you can't do it manually. You could look into making sure all workloads set to run on spot instances are restartable. That way, if they terminate, it won’t be such a hassle to get them running again.
I've had a great experience with Buildkite, which has automatic retries for steps and handles spot instance disruptions smoothly. It might be worth checking out if that aligns with your workflow setup!
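For reference, a retry rule in Buildkite is configured per step in the pipeline YAML; something roughly like this (the label, command, and limit are just illustrative) retries a step automatically when the agent is lost, which is what a reclaimed spot instance looks like:

```yaml
steps:
  - label: "tests"
    command: "./scripts/run-tests.sh"   # illustrative command
    retry:
      automatic:
        - exit_status: -1   # -1 means the agent was lost (e.g. spot instance reclaimed)
          limit: 2          # retry up to twice before giving up
```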
Exactly, that's why I'm hoping to find a better approach!