Hey everyone! I'm looking for some advice on managing asynchronous workflows, especially on making sure they actually run to completion. I've noticed issues where processes don't finish, or worse, don't trigger at all, when multiple steps are involved, like API Gateway to Lambda to SQS and back to Lambda. It can be pretty frustrating when things break without a clear error message.
For example, I run an end-of-day workflow that failed due to a bug in a calculation determining the next steps. Because of this bug, it didn't send a message to the queue, leaving me unaware that the workflow had crashed. It wasn't until a few days later that I caught this.
I know we can dig through logs to find issues, but that's only helpful if you already know something went wrong. I'm curious if you all have any tools or methods for monitoring these async workflows to catch these kinds of problems early and track the actual versus expected flow. Any tips would be appreciated!
1 Answer
Have you considered adding AWS X-Ray tracing? It can help you visualize the flow and spot where things are breaking down in your async process. It’s pretty useful for tracking performance and errors across different services!
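On the "actual versus expected flow" part of the question: tracing mostly shows you what did run, so it may help to pair it with a small completeness check that flags a run whose expected steps never showed up. Here's a rough sketch of that idea in plain Python. Everything in it is an assumption for illustration: the step names, the `WorkflowRun` class, the 30-minute deadline, and the in-memory storage (in practice each step would probably write its record to a shared store, and a scheduled job would run the check and raise an alert).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical step names for a workflow like the one in the question.
EXPECTED_STEPS = ["api_received", "calc_done", "queued", "consumer_done"]


@dataclass
class WorkflowRun:
    """Tracks which steps of one workflow run have reported in."""
    run_id: str
    started_at: datetime
    completed_steps: set[str] = field(default_factory=set)

    def record(self, step: str) -> None:
        # Each step calls this (or writes an equivalent record) when it finishes.
        self.completed_steps.add(step)


def missing_steps(run: WorkflowRun, expected=EXPECTED_STEPS) -> list[str]:
    """Return the expected steps that have not reported, in workflow order."""
    return [step for step in expected if step not in run.completed_steps]


def is_overdue(run: WorkflowRun, now: datetime,
               deadline: timedelta = timedelta(minutes=30),
               expected=EXPECTED_STEPS) -> bool:
    """True if the run is past its deadline and still has missing steps."""
    return bool(missing_steps(run, expected)) and now - run.started_at > deadline


if __name__ == "__main__":
    start = datetime.now(timezone.utc) - timedelta(hours=1)
    run = WorkflowRun("eod-run", start)
    run.record("api_received")
    run.record("calc_done")  # the bug hits here, so nothing is ever queued
    if is_overdue(run, datetime.now(timezone.utc)):
        print(f"ALERT {run.run_id}: missing {missing_steps(run)}")
```

The point of the design is that the alert fires on the absence of a step, which is exactly the case where no error gets logged, like your end-of-day run that never sent its queue message.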

Sorry, are you referring to X-Ray tracing specifically?