Pros and Cons of Quarantined Tests
Flaky tests, i.e., those that only fail sometimes, are the bane of any end-to-end automated test suite.
Another type of problem test is one that fails every time but covers something that is deemed not important enough to fix right now. If you have to ignore some failing tests, sooner or later you're going to ignore one that you should have paid attention to. Or worse, you might decide to ignore them all, because clearly no one is fixing the bugs.
If a test is broken, fixing it should be the first course of action whenever possible. But what if some other task has a higher priority? If you're confident that the problem is in the test and not in the software being tested, it might be reasonable to let the test keep failing, at least temporarily.
When you frequently ignore some failing tests, the whole suite is at risk of being seen as unreliable. A common way to prevent that is to quarantine the flaky or failing tests. Quarantine in this context means isolating the troublesome tests from the rest of the test suite. Not out of fear of contagion, except in the sense that their failures can drag down the perception of the rest of the tests.
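To make that a little more concrete, here's a minimal sketch of one way quarantining can look, assuming a JUnit 4 suite (the post mentions @Ignore later) and its Categories feature. The Quarantined marker interface, the test class, and the issue number are all made up for illustration:

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Marker interface used only as a category label for troublesome tests.
interface Quarantined {}

public class CheckoutFlowTest {

    @Test
    public void completesAnOrder() {
        // Reliable test: stays in the main run.
    }

    // Flaky: intermittent timeouts under load, tracked as PROJ-123 (hypothetical issue).
    @Category(Quarantined.class)
    @Test
    public void appliesDiscountCodes() {
        // Quarantined test: isolated from the main run by the category.
    }
}
```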
I think I first came across the concept in an article by Martin Fowler. It's a great read on the topic of flaky tests and how to identify and resolve the causes of their flakiness. This post isn't about how to fix them, so check out that article if you're after that kind of info.
More recently, an article on the Google Testing Blog mentioned the same technique for dealing with the same types of troublesome tests.
Even though quarantining tests can be a good temporary solution, if you don't fix the tests (or the bugs) you can end up in the situation I mentioned before: a few failing tests create the impression that the entire suite is unreliable, and quarantine effectively becomes a death sentence for the tests sent there.
My team and I try to avoid that death sentence in a few ways:
- Report quarantined test results separately from the rest of the test suite. That way everyone can see the results of the reliable tests and knows that a failure there is something to look at immediately, and we don't have to pick the "true" failures out from amongst the flaky ones. (One way to split the runs is sketched after this list.)
- Tag quarantined tests with the reason they're quarantined. Flaky tests get tagged as such. Failing tests that aren't going to get fixed for a while get reported and tagged with the relevant issue number, and comments can be added if the tag isn't sufficient. This isn't enough to rescue a quarantined test from oblivion, but it helps us avoid losing track of why a test was quarantined in the first place.
- Schedule a regular review of quarantined tests. If it's not scheduled, it's not likely to happen. Failing tests can be assigned to someone to fix if priorities change, and time can be invested in fixing a flaky test if we decide it's more important than we first thought.
- Delete the test. If a test stays in quarantine for a long time, it's worth rethinking the value it provides. Maybe it turns out that unit tests, or even exploratory testing, provide enough coverage. Or the test might cover a part of the software that rarely changes or doesn't get much use, so a regression there isn't a big deal. If it seems likely someone might want to write the test again, we might @Ignore it and leave a comment explaining why, rather than deleting it (a sketch of this also follows the list).
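For the separate reporting, one option is to run quarantined and non-quarantined tests as separate suites so each gets its own results. This is a sketch, again assuming JUnit 4 Categories and the hypothetical Quarantined marker and CheckoutFlowTest class from the earlier example; each suite would live in its own file and be run by its own CI job:

```java
import org.junit.experimental.categories.Categories;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// ReliableSuite.java -- run by the main CI job; a failure here should be
// looked at immediately.
@RunWith(Categories.class)
@Categories.ExcludeCategory(Quarantined.class)
@Suite.SuiteClasses({CheckoutFlowTest.class})
public class ReliableSuite {}

// QuarantinedSuite.java -- run and reported separately; only the tests
// tagged as quarantined end up here.
@RunWith(Categories.class)
@Categories.IncludeCategory(Quarantined.class)
@Suite.SuiteClasses({CheckoutFlowTest.class})
public class QuarantinedSuite {}
```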
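And when we keep a test around instead of deleting it, the explanation can go straight into the @Ignore annotation. A hypothetical example; the test, the reason, and the issue number are invented:

```java
import org.junit.Ignore;
import org.junit.Test;

public class LegacyReportExportTest {

    // Kept as a record of what was once covered, not as a running test.
    @Ignore("Feature rarely changes and unit tests cover it; rewrite if PROJ-456 is picked up again.")
    @Test
    public void exportsToCsv() {
    }
}
```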
How do you deal with flaky or failing tests that don’t get fixed quickly?