I'm facing a huge bottleneck with our testing pipeline. We have a Selenium test suite with around 650 tests that runs with every pull request, but it has become a productivity killer for our team. On average, it takes about 40 minutes to complete, sometimes going up to an hour. The bigger issue is the test flakiness; we see about 8 to 12 tests fail during each run, and the failing tests are always different, which has led the developers to just rerun the tests and grab a coffee instead of trusting the results. We're trying to deploy several times a day, but the QA stage is holding us back. It feels like a lot of the tests have cried wolf too many times, and when something does genuinely fail, everyone assumes it's just another selector issue.
We've attempted to run tests in parallel but hit our continuous integration runner limits. We also tried optimizing which tests run when, but that caused integration issues of its own. It feels like we're stuck between slow, unreliable tests and no clear path to a suite that is fast, stable, and actually catches real bugs. I'm starting to wonder if the entire selector-based testing approach is flawed for complex modern web applications. Has anyone found effective ways to tackle these problems?
5 Answers
You've really hit the nail on the head with the testing approach. Heavy reliance on Selenium for UI tests tends to produce flakiness, especially when the app renders components asynchronously and the tests don't wait for them properly. Consider alternatives like switching to Playwright, which auto-waits for elements, or at least cleaning up the waiting logic in your current tests; either can offer better reliability and speed.
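If you do stay on Selenium for now, a big share of async-related flakiness comes from fixed sleeps and immediate assertions; explicit waits usually help. A minimal sketch, assuming the Python bindings and a made-up page and selector:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/dashboard")  # placeholder URL

# Wait up to 10 seconds for the asynchronously rendered rows to appear,
# instead of asserting against them immediately (a common flakiness source).
wait = WebDriverWait(driver, 10)
rows = wait.until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#orders-table tbody tr"))
)
assert len(rows) > 0

driver.quit()
```

Playwright builds this kind of waiting into every action, which is a large part of why suites tend to get less flaky after a migration.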
Can you implement an auto-retry feature in your pipeline? It might help with the immediate issue of flaky tests failing at random. Also, evaluate how many of those 650 tests are actually necessary. Old tests often stick around after the features they cover have changed or been removed, and they just clutter the pipeline.
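If the suite runs under pytest, the pytest-rerunfailures plugin is one way to get retries without touching the CI configuration. A sketch, assuming that plugin is installed and a driver fixture exists in your conftest; the test and selector are purely illustrative:

```python
# Requires: pip install pytest-rerunfailures
# You can rerun all failures from the command line:
#   pytest --reruns 2 --reruns-delay 1
# or mark only the known-flaky tests so the retries stay targeted and visible.
import pytest

@pytest.mark.flaky(reruns=2, reruns_delay=1)
def test_checkout_banner_visible(driver):
    driver.get("https://example.com/checkout")  # placeholder URL
    banner = driver.find_element("css selector", ".promo-banner")
    assert banner.is_displayed()
```

Retries only mask the symptom, though; keep a record of which tests needed a rerun so they eventually get fixed or deleted rather than quietly retried forever.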
It sounds like you need some serious changes, not just on the technical side but also with management's expectations. I've been in a similar situation where we ended up disabling all end-to-end tests just to get things moving again. Focus on improving those flaky tests; they shouldn't be failing without reason. And if management continues to push for deliveries despite the broken tests, they need a wake-up call about the costs of their decisions.
It's essential to figure out why these tests are flaky in the first place. Look for common patterns in failures or specific tests that cause issues consistently. Gathering this data can help prioritize fixes. Also, carefully assess whether all those tests need to run on every pull request. A smart strategy could involve categorizing tests by importance.
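One cheap way to gather that data: archive the JUnit XML reports your runner already produces and aggregate failures across the last few dozen runs. A rough sketch in Python; the directory layout is an assumption about how you store artifacts:

```python
# Count failures per test across archived JUnit XML reports.
# Assumes reports were downloaded to ./reports/<run-id>/*.xml;
# adjust the glob to match however your CI stores artifacts.
import glob
import xml.etree.ElementTree as ET
from collections import Counter

failures = Counter()

for path in glob.glob("reports/**/*.xml", recursive=True):
    root = ET.parse(path).getroot()
    for case in root.iter("testcase"):
        # A <failure> or <error> child element marks a failed test case.
        if case.find("failure") is not None or case.find("error") is not None:
            failures[f"{case.get('classname')}.{case.get('name')}"] += 1

# The top of this list is where fixing or quarantining pays off most.
for test, count in failures.most_common(20):
    print(f"{count:3d}  {test}")
```

In most suites the flakiness is concentrated in a small fraction of tests, so even a crude count like this tells you where to spend effort first.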
Honestly, it seems like a classic case of mismanagement of the testing process. Flaky tests should not be blocking your releases. How about moving those tests to a different phase of the pipeline: run the full suite nightly instead of on every PR, and keep only the most important tests in the pull-request gate?
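If you go that route, markers make the split explicit instead of relying on folder structure. A sketch assuming pytest; the marker names and tests are placeholders:

```python
# Register the markers in pytest.ini or pyproject.toml to avoid warnings.
# "smoke" and "nightly" are placeholder tier names.
import pytest

@pytest.mark.smoke
def test_login_succeeds(driver):
    ...  # critical path, runs on every pull request

@pytest.mark.nightly
def test_full_report_export(driver):
    ...  # slower, lower-risk coverage, runs on the nightly schedule

# Pull-request job:   pytest -m smoke
# Nightly job:        pytest -m "smoke or nightly"
```

The trade-off is that bugs covered only by nightly tests surface a day later, so the smoke tier needs to stay honest about what really must block a merge.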
