Flaky tests can cause a toxic cycle. The more unreliable tests are, the less likely you are to trust their results. At best, that means wasted time as people re-run a failing step in an automated pipeline until they get the result they want. It then takes two or three times longer to recognize a “real” failure than it would have if someone had noticed it the first time the test failed. At worst, it means the test results get ignored completely. “It’s probably fine, it always fails,” someone might say. This is often an excuse to keep tests out of automated pipelines, or leave them as non-blocking. “We can’t let failing tests get in the way of releasing, because failing tests are never real.”
The feedback loop comes in because as soon as you start ignoring one flaky result, it becomes easier to ignore two or three flaky tests. The less you look at test results, the easier it is to let them get even flakier. As they get even flakier, you trust them even less, and the loop repeats.
There are two obvious ways to break out of this cycle: you either have to fix the tests, or finally admit that they aren’t adding value and get rid of them.
People often take for granted that the right thing to do is to fix the tests. People outside a team where this is a problem will ask “why doesn’t the team just fix the tests?” as if the team had never considered it. Maybe it’s a matter of time, competing priorities, or expertise. Maybe someone just has to insist on making the tests blocking and accept that they can’t release until they’re made reliable. But fixing seems like the obvious route because we default to believing that tests are good, and more tests are better.
The rarer question, it seems, is: “are these tests actually useful?” It could very well be true that the one flaky test is also the only one that actually catches real bugs. This is always the excuse to default to keeping every test: “No, I can’t remember the last time this caught a real bug, but what if one day we did introduce a bug here?” However, when we find ourselves so deep in the toxic feedback loop that people barely even notice when the tests fail, it’s very likely that people wouldn’t notice if those tests ever did catch a real bug anyway. Especially if it’s an intermittent bug. (And sometimes a flaky test is flaky because the app is flaky, but the tests get the blame.)
How do you know when it really is safe to cut your losses and delete the flaky test? One way to think of it is to remember what tests are actually for. We don’t have tests just to see them passing or failing; on one level at least, we test in order to catch issues before they make it to production. So, are the flaky tests helping that goal, or hindering it?
I recently spoke with a group with this exact problem: automated testing wasn’t being included in their release pipelines because it was too flaky, and it was hard to put effort into fixing flaky tests because they were too easily ignored outside the pipeline. But here’s where context matters: their change fail rate was consistently well under 10%. Products had two or three production issues per year, and each was usually addressed in a few hours. For this group, that was quite good. Nobody on the team was stressed by it and their stakeholders were happy. And this was in spite of the fact that big pieces of automated testing were frequently ignored.
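If you want to check your own context the same way, change fail rate is just the share of releases that led to a production issue. Here is a minimal sketch in Python; the `Deployment` record, the `change_fail_rate` helper, and the numbers are all made up for illustration and are not that group's real data.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record of one release to production."""
    identifier: str
    caused_production_issue: bool


def change_fail_rate(deployments: list[Deployment]) -> float:
    """Return the fraction of deployments that caused a production issue."""
    if not deployments:
        raise ValueError("need at least one deployment to compute a rate")
    failures = sum(1 for d in deployments if d.caused_production_issue)
    return failures / len(deployments)


# Illustrative numbers only: 40 releases in a year, 3 of which caused an issue.
failed_releases = {7, 19, 33}
deployments = [
    Deployment(f"release-{i}", caused_production_issue=(i in failed_releases))
    for i in range(40)
]

rate = change_fail_rate(deployments)
print(f"Change fail rate: {rate:.1%}")
```

With these invented numbers the rate comes out at 3 in 40, which is under the 10% mark mentioned above. The number alone doesn't settle anything; it only matters alongside how quickly issues get fixed and whether the team and its stakeholders are happy with the result.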
While there is always a risk that dropping tests will allow more issues to go unnoticed until it’s too late, in this scenario that isn’t going to be as much of a concern. If it’s true that the tests already weren’t doing their job, with no ill effects in production, then there was little point running them in the first place, and even less point in investing time to fix them. Their process was already safe enough without them.
Context here matters more than anything else, but just maybe you don’t need to bother fixing those flaky tests after all. Would anybody notice if you just deleted them instead?