I've been wrestling with about 80 end-to-end tests in our pipeline, and around 15 of them are unreliable. They pass most of the time, both locally and in CI, but about one run in ten they fail because of timing issues or quirks in how our test environment loads. This flakiness has eroded trust in our CI results: when the build fails, people just rerun the tests instead of investigating the failure. I've tried the usual fixes, such as extending wait times and adding retry logic, but they only help to a certain extent. The real issue isn't just the flaky tests; it's that nobody has time to rewrite them to be more reliable, since we're a small team and rewriting tests doesn't contribute directly to shipping features. I'm currently using Playwright and had a brief look at another tool called Spur, which seems more stable, but I have nothing concrete yet. I'm looking for recommendations on tools or practices that have worked for other teams in similar situations. What has helped you tackle flaky tests?
4 Answers
Playwright is really solid if you write your tests the right way. You've already pointed out the issue with your flaky tests: they need to be restructured for better stability. If management isn't giving your team time to fix them, testing doesn't seem to be a priority. So you either need to allocate time to fix them or consider whether they can be eliminated altogether. Playwright has a good reputation for a reason, but it's only as good as the tests you build with it.
As a temporary fix, you might want to look into automatically rerunning tests that fail. You could flag tests that pass after a retry as 'warn' instead of success. This won't solve the flaky test issue completely, but it could give you better data on what's going on and make the CI results a bit more reliable in the interim.
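To make the 'warn' idea concrete, here is a minimal sketch of the classification step. It assumes you can collect, per test, the ordered list of attempt outcomes from your runner's report; the `classifyAttempts` and `summarize` names and the input shape are hypothetical, not part of any tool.

```typescript
// Hypothetical input shape: one boolean per attempt, in order (true = passed).
type Verdict = "pass" | "warn" | "fail";

// First attempt passed -> "pass"; passed only after a retry -> "warn";
// every attempt failed -> "fail".
function classifyAttempts(attempts: boolean[]): Verdict {
  if (attempts.length === 0) {
    throw new Error("no attempts recorded");
  }
  if (attempts[0]) {
    return "pass";
  }
  return attempts.some((ok) => ok) ? "warn" : "fail";
}

// Group test names by verdict so the CI summary can list the warnings separately.
function summarize(results: Record<string, boolean[]>): Record<Verdict, string[]> {
  const summary: Record<Verdict, string[]> = { pass: [], warn: [], fail: [] };
  for (const name of Object.keys(results)) {
    summary[classifyAttempts(results[name])].push(name);
  }
  return summary;
}
```

The build can then stay green while the `warn` list is posted somewhere visible, so a test that needed a retry is recorded rather than silently forgotten.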
Dealing with flaky tests can be frustrating, but remember that a CI pipeline that keeps failing costs you time too. It might actually slow down your feature releases more than fixing the flaky tests would. Try identifying the root causes of these failures and track how often they occur. If you can quantify the hours wasted on them, presenting that data to management might help make the case for investing time in fixing the underlying issues. In our case, we managed to eliminate flaky Selenium tests completely, thanks to a better testing framework that handles retries and race conditions much more reliably.
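For the tracking part, something like the following rough sketch could turn raw run history into numbers you can show management. The `RunRecord` shape, the function names, and the minutes-per-rerun figure are all assumptions for illustration; you would feed it whatever your CI exports.

```typescript
// Hypothetical record: one entry per test per CI run.
interface RunRecord {
  test: string;
  passed: boolean;
}

interface FlakeStat {
  test: string;
  runs: number;
  failures: number;
  failureRate: number; // failures / runs, between 0 and 1
}

// Count runs and failures per test, then sort the worst offenders first.
function flakeStats(records: RunRecord[]): FlakeStat[] {
  const counts: Record<string, { runs: number; failures: number }> = {};
  for (const r of records) {
    const c = counts[r.test] || (counts[r.test] = { runs: 0, failures: 0 });
    c.runs += 1;
    if (!r.passed) {
      c.failures += 1;
    }
  }
  return Object.keys(counts)
    .map((test) => ({
      test,
      runs: counts[test].runs,
      failures: counts[test].failures,
      failureRate: counts[test].failures / counts[test].runs,
    }))
    .sort((a, b) => b.failureRate - a.failureRate);
}

// Rough cost estimate: each failed build costs one rerun of a given length.
function hoursLost(failedBuilds: number, minutesPerRerun: number): number {
  return (failedBuilds * minutesPerRerun) / 60;
}
```

Even a crude estimate like `hoursLost(failedBuilds, minutesPerRerun)` over a month tends to be more persuasive than "the tests are flaky", and the sorted list tells you which few tests to fix first.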
Have you thought about using Playwright's built-in retry mechanism? It might help you stabilize those flaky tests without having to overhaul everything right away. You can find more info on it in their official documentation.
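For reference, a minimal config sketch of what that looks like. The retry count of 2 and the CI-only condition are illustrative choices, not recommendations from the docs:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failed tests on CI only; the count of 2 is an arbitrary example.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Record a trace when a test is retried, to help diagnose the flake.
    trace: 'on-first-retry',
  },
});
```

Tests that fail and then pass on a retry show up as "flaky" in Playwright's report, so you still get a record of which tests are unstable. You can also set the count for a single run with `npx playwright test --retries=2`.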
