I'm struggling with a backend issue in a web application where my AI-driven workflows take longer than a single request and involve multiple steps. Each workflow starts with a web request that triggers a background task, which may call external services and perform various other actions. The problems appear once that work spans several steps and outlives the original request.
Here are the issues I've encountered:
- Execution state can be lost between steps if the process restarts.
- Making retries safe is hard: a retried step can repeat side effects that already happened, such as a call to an external service.
- Pausing a workflow and resuming it later without restarting the entire chain is complicated.
- While logs are useful, piecing together what happened during retries and across steps is still quite painful.
I've also tried some solutions:
- Using a queue with workers while saving partial state in a database.
- Implementing idempotency keys for certain operations.
- Breaking the workflows into smaller tasks, although this adds to orchestration complexity.
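For reference, the combination in the list above (partial state in a database plus idempotency keys) can be sketched in Python, with SQLite standing in for the real store. Everything here is hypothetical: the table names, `run_step`, `open_store`, and the `charge_card` stand-in for an external call. Each completed step's result is stored under `(workflow_id, step_name)`, so a retried or restarted task replays that stored result instead of re-running the step. An append-only `events` table records every start, failure, completion, and skip, which gives a queryable history of retries.

```python
import json
import sqlite3
from typing import Any, Callable

# Hypothetical schema: one row per completed step, plus an append-only event log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS step_results (
    workflow_id TEXT NOT NULL,
    step_name   TEXT NOT NULL,
    result_json TEXT NOT NULL,
    PRIMARY KEY (workflow_id, step_name)
);
CREATE TABLE IF NOT EXISTS events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    workflow_id TEXT NOT NULL,
    step_name   TEXT NOT NULL,
    kind        TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_event(conn: sqlite3.Connection, workflow_id: str, step_name: str, kind: str) -> None:
    with conn:  # commit immediately so the history survives a crash
        conn.execute(
            "INSERT INTO events (workflow_id, step_name, kind) VALUES (?, ?, ?)",
            (workflow_id, step_name, kind),
        )


def run_step(conn: sqlite3.Connection, workflow_id: str, step_name: str,
             fn: Callable[[str], Any]) -> Any:
    """Return the stored result if this step already completed; otherwise run fn.

    A crash between fn() returning and the commit below means fn runs again,
    so fn receives an idempotency key to forward to the external service.
    """
    row = conn.execute(
        "SELECT result_json FROM step_results WHERE workflow_id = ? AND step_name = ?",
        (workflow_id, step_name),
    ).fetchone()
    if row is not None:
        log_event(conn, workflow_id, step_name, "skipped")
        return json.loads(row[0])

    log_event(conn, workflow_id, step_name, "started")
    try:
        result = fn(f"{workflow_id}:{step_name}")
    except Exception:
        log_event(conn, workflow_id, step_name, "failed")
        raise
    with conn:  # store the result and the "completed" event in one transaction
        conn.execute(
            "INSERT INTO step_results (workflow_id, step_name, result_json) VALUES (?, ?, ?)",
            (workflow_id, step_name, json.dumps(result)),
        )
        conn.execute(
            "INSERT INTO events (workflow_id, step_name, kind) VALUES (?, ?, ?)",
            (workflow_id, step_name, "completed"),
        )
    return result


if __name__ == "__main__":
    calls = []

    def charge_card(idempotency_key: str) -> dict:
        # Stand-in for an external API call; a real one would send the key along.
        calls.append(idempotency_key)
        return {"charge_id": "ch_1"}

    conn = open_store()
    first = run_step(conn, "wf-42", "charge", charge_card)
    again = run_step(conn, "wf-42", "charge", charge_card)  # e.g. a retried task
    print(first == again, calls)
    for event in conn.execute("SELECT step_name, kind FROM events ORDER BY id"):
        print(event)
```

The weak spot is the window between the external call returning and the result being committed: a crash there re-runs the step, so the external service still has to honor the idempotency key for the side effect to happen only once.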
My main goal is to find a reliable method to run and monitor long-running, stateful workflows within a typical web backend, without having to build an entire distributed systems framework.
I'm particularly stuck on a few points:
- What's the best way to model execution state in this scenario?
- Are state machines or workflow engines worth the added complexity?
- How do you practically handle pause and resume functionality?
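One way to approach the state-modeling and pause/resume questions is an explicit status state machine plus a persisted "next step" pointer. The sketch below is a minimal Python illustration under assumptions: the `Workflow` dataclass stands in for a database row, the status names and allowed transitions are hypothetical, and pausing is cooperative, meaning the worker only checks for a pause request between steps.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"
    FAILED = "failed"


# Hypothetical transition table; anything not listed is rejected.
ALLOWED = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.PAUSED, Status.COMPLETED, Status.FAILED},
    Status.PAUSED: {Status.RUNNING},
    Status.FAILED: {Status.RUNNING},  # retry after a failure
    Status.COMPLETED: set(),
}


@dataclass
class Workflow:
    # In a real backend these fields would live in a database row.
    id: str
    status: Status = Status.PENDING
    next_step: int = 0
    pause_requested: bool = False

    def transition(self, new: Status) -> None:
        if new not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new.value}")
        self.status = new


def run(wf: Workflow, steps: list[Callable[[], None]]) -> None:
    """Run steps starting at wf.next_step, stopping early if a pause was requested."""
    wf.transition(Status.RUNNING)
    while wf.next_step < len(steps):
        if wf.pause_requested:
            wf.pause_requested = False
            wf.transition(Status.PAUSED)
            return  # the worker exits; nothing is kept in memory while paused
        try:
            steps[wf.next_step]()
        except Exception:
            wf.transition(Status.FAILED)
            raise
        wf.next_step += 1  # persist this together with the step's result
    wf.transition(Status.COMPLETED)


if __name__ == "__main__":
    wf = Workflow("wf-1")

    def step_a() -> None:
        print("step a")

    def step_b() -> None:
        print("step b")
        wf.pause_requested = True  # simulate a pause request arriving mid-run

    def step_c() -> None:
        print("step c")

    steps = [step_a, step_b, step_c]
    run(wf, steps)
    print(wf.status, wf.next_step)
    run(wf, steps)  # resume: continues at step_c
    print(wf.status)
```

Resuming is just running the worker again: it moves the workflow from `paused` back to `running` and continues at `next_step`. Because nothing is held in memory while paused, the pause can last arbitrarily long and survive process restarts, provided `status` and `next_step` are written to durable storage after each step.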
I'm looking for any real-world patterns or approaches that have been successful in production.
1 Answer
You might want to look at an article on this topic. It covers ideas from actor systems that could apply to your workflows. In particular, the connection between BEAM-style supervision (the Erlang/Elixir model, in which supervisor processes restart failed worker processes) and your concurrency and durability problems is worth exploring.
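A minimal Python sketch of the supervision idea, assuming a single process and an in-memory dict standing in for a database row. The names `supervise`, `RestartLimitExceeded`, `intensity`, and `period` are hypothetical; the last two loosely mirror the restart-intensity settings of an OTP supervisor. The example also shows how supervision relates to durability: the supervisor only restarts the work, while the checkpointed `next_step` is what stops a restart from redoing completed steps.

```python
import time
from typing import Callable


class RestartLimitExceeded(Exception):
    """Raised when the child crashes more often than the restart budget allows."""


def supervise(child: Callable[[], None], intensity: int = 3, period: float = 5.0,
              clock: Callable[[], float] = time.monotonic) -> int:
    """Run child until it returns, restarting it after each exception.

    If more than `intensity` crashes happen within `period` seconds, give up
    and escalate. Returns the total number of restarts that were needed.
    """
    recent: list[float] = []  # crash timestamps inside the sliding window
    total = 0
    while True:
        try:
            child()
            return total
        except Exception as exc:
            now = clock()
            recent = [t for t in recent if now - t < period]
            recent.append(now)
            if len(recent) > intensity:
                raise RestartLimitExceeded(f"{len(recent)} crashes within {period}s") from exc
            total += 1
            # A production supervisor would usually add backoff before restarting.


if __name__ == "__main__":
    # Progress lives in `state` (a stand-in for a database row), so a restart
    # resumes from the last checkpoint instead of starting over at step 0.
    state = {"next_step": 0}
    attempts = {"count": 0}

    def child() -> None:
        attempts["count"] += 1
        while state["next_step"] < 3:
            if state["next_step"] == 1 and attempts["count"] < 3:
                raise RuntimeError("transient failure in step 1")
            print(f"attempt {attempts['count']}: finished step {state['next_step']}")
            state["next_step"] += 1

    restarts = supervise(child)
    print("restarts:", restarts, "state:", state)
```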

Thanks for sharing! The article helped me think about concurrency, but I'm more concerned with durability and keeping track of what happens over long periods, especially across retries and pauses. Which aspects do you think map well to practical workflows?