
Bisecting Python unit test errors to find test interdependencies

 

Many of our test runs use parallelization to run faster. Sometimes we see test
failures which we can’t reproduce locally, because locally we usually run
sequentially; and even then, the test ordering seems to be somewhat
unpredictable so it’s hard to reproduce the exact test ordering seen in our
test runner.

Most of the time these failures are due to unidentified test interdependencies:
either test A causes test B to pass (where running test B in isolation would
fail), or test A causes B to fail (where running B in isolation would pass).
And we have seen more complex scenarios where C passes, A-B-C passes, but A-C
fails (because A sets C up for failure, while B would set C up for success).

We added some diagnostic output to our test runner so it would show exactly the
list of tests each process runs. This way we can copy the list and run it
locally, which usually reproduces the failure.
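To make the interdependency scenarios above concrete, here is a minimal sketch of how such a leak can happen. It is not taken from our test suite: `REGISTRY`, `LeakyTests` and `run_in_order` are hypothetical names, and the shared dictionary stands in for any global state (a cache, a settings object, a patched module) that one test can leave dirty for the next.

```python
import unittest

# Hypothetical shared state that tests can leak into.
REGISTRY = {"mode": "default"}


class LeakyTests(unittest.TestCase):
    def test_a(self):
        # A changes global state and never restores it.
        REGISTRY["mode"] = "strict"
        self.assertEqual(REGISTRY["mode"], "strict")

    def test_b(self):
        # B happens to reset the state, which masks A's leak.
        REGISTRY["mode"] = "default"
        self.assertEqual(REGISTRY["mode"], "default")

    def test_c(self):
        # C silently depends on the default state.
        self.assertEqual(REGISTRY["mode"], "default")


def run_in_order(names):
    """Run the named tests in exactly the given order; True if all pass."""
    REGISTRY["mode"] = "default"  # simulate starting from a fresh process
    suite = unittest.TestSuite(LeakyTests(name) for name in names)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

With this setup, `run_in_order(["test_c"])` and `run_in_order(["test_a", "test_b", "test_c"])` both succeed, while `run_in_order(["test_a", "test_c"])` fails, which is the C / A-B-C / A-C pattern described above.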

But we needed a tool to then determine exactly which of the tests preceding the failing one was setting up the failure conditions. So I wrote this simple bisection script. It expects a file with a list of test names, which must contain the failing test “A”, and of course the name of the failing test “A” itself. It looks for “A” in the list and uses bisection to determine which of the tests preceding “A” is causing the failure.

#!/usr/bin/python3
"""
Find which test in the test list is causing the failure of a known-failing
test. That is, given a test list which dictates a specific test order,
under which a test X (which passes when run in isolation) is failing, find
out which of the tests that, on the list, run before X, are causing it to
fail.

Many of our test runs use parallelization to run faster. Sometimes we see test
failures which we can't reproduce locally, because locally we usually run
sequentially; and even then, the test ordering seems to be somewhat
unpredictable so it's hard to reproduce the exact test ordering seen in our
test runner.

Most of the time these failures are due to unidentified test interdependencies:
either test A causes test B to pass (where running test B in isolation would
fail), or test A causes B to fail (where running B in isolation would pass).
And we have seen more complex scenarios where C passes, A-B-C passes, but A-C
fails (because A sets C up for failure, while B would set C up for success).

We added some diagnostic output to our test runner so it would show exactly the
list of tests each process runs. This way we can copy the list and run it
locally, which usually reproduces the failure.

But we needed a tool to then determine exactly which of the tests preceding the
failing one was setting up the failure conditions. So I wrote this simple
bisecter script, which expects a list of test names, which must contain the
failing test "A", and of course, the name of the failing test "A". It looks
for "A" in the list and will use bisection to determine which of the tests
preceding "A" is causing the failure.

Note it's not very tunable, it will run "make test" with
ARGS='--failfast $LIST_OF_TESTS'
and interpret any non-zero exit code as "a test failed".
"""
import argparse
import math
import subprocess
import sys


def bisect_run(f_list, f_test):
    # Always called with a f_list that causes f_test to fail.
    if len(f_list) == 1:
        return "The test that causes the failure is {}".format(f_list[0])
    if len(f_list) == 0:
        return "No test causes the failure? what?"
    # Integer division: a float slice index raises TypeError on Python 3.
    middle = len(f_list) // 2
    first_half = f_list[:middle]
    second_half = f_list[middle:]
    print("{} elements in the list, about {} iterations left".format(
        len(f_list), int(math.log(len(f_list), 2))))
    try:
        # Run the first half of the list, followed by the failing test.
        list_of_tests = first_half[:]
        list_of_tests.append(f_test)
        test_plan = " ".join(list_of_tests)
        subprocess.check_output(
            "make test ARGS='--failfast {}'".format(test_plan),
            shell=True, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError:
        # Non-zero exit code: the first half reproduces the failure.
        print("Test causing failure is in first half of given list")
        return bisect_run(first_half, f_test)
    else:
        # The first half passed, so the culprit is assumed (not verified)
        # to be in the second half.
        print("Test causing failure is in second half of given list")
        return bisect_run(second_half, f_test)


def main():
    parser = argparse.ArgumentParser(description="""
    Find which test in the test list is causing the failure of a known-failing
    test. That is, given a test list which dictates a specific test order,
    under which a test X (which passes when run in isolation) is failing, find
    out which of the tests that, on the list, run before X, are causing it to
    fail.
    """)
    parser.add_argument("test_list", help="File containing a list of "
                        "test names, one per line.")
    parser.add_argument("failing_test", help="Name of the test that fails. "
                        "It must exist in the test_list file.")
    args = parser.parse_args()
    with open(args.test_list, "r") as test_list_file:
        test_list = [s.strip() for s in test_list_file.readlines()]
    # We don't need to bother with tests after failing_test: keep only
    # the tests that run before it.
    f_index = test_list.index(args.failing_test)
    test_list = test_list[:f_index]
    print(bisect_run(test_list, args.failing_test))


if __name__ == "__main__":
    sys.exit(main())


As an example, I used it to find a test failure in Ubuntu SSO:

python bisecter.py  test-orders/loadbad1.txt webui.tests.test_decorators.SSOLoginRequiredTestCase.test_account_must_require_two_factor
273 elements in the list, about 8 iterations left
Test causing failure is in second half of given list
137 elements in the list, about 7 iterations left
Test causing failure is in second half of given list
69 elements in the list, about 6 iterations left
Test causing failure is in first half of given list
34 elements in the list, about 5 iterations left
Test causing failure is in second half of given list
17 elements in the list, about 4 iterations left
Test causing failure is in second half of given list
9 elements in the list, about 3 iterations left
Test causing failure is in second half of given list
5 elements in the list, about 2 iterations left
Test causing failure is in second half of given list
3 elements in the list, about 1 iterations left
Test causing failure is in second half of given list
2 elements in the list, about 1 iterations left
Test causing failure is in first half of given list
The test that causes the failure is webui.tests.test_views_account.AccountTemplateTestCase.test_backup_device_warning
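One caveat worth illustrating: the script never verifies the second half. If the first half passes, it assumes the culprit is in the other one, which breaks down in the A-B-C scenario where another test masks the failure. The following is only a sketch of a more defensive variant, not the script I actually run. `find_culprit` and the `fails` callback are hypothetical names; `fails(plan)` is assumed to run the tests in `plan` in order and return True when the run fails (for example by wrapping the `make test` call).

```python
def find_culprit(candidates, target, fails):
    """Return the single candidate that makes `target` fail, or None.

    Assumes exactly one culprit and no masking between the other tests.
    `fails(plan)` is a hypothetical callback: True when running `plan`
    in order ends in a failure.
    """
    candidates = list(candidates)
    if not fails(candidates + [target]):
        return None  # the full list does not reproduce the failure
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first, second = candidates[:middle], candidates[middle:]
        if fails(first + [target]):
            candidates = first
        elif fails(second + [target]):
            candidates = second
        else:
            # Neither half fails on its own: more than one test is involved.
            return None
    return candidates[0] if candidates else None


def fake_fails(plan):
    # Simulated runner: "t7" poisons "target" whenever it runs before it.
    return "t7" in plan and plan.index("t7") < plan.index("target")


tests = ["t{}".format(i) for i in range(20)]
print(find_culprit(tests, "target", fake_fails))
```

The price of the extra check is up to two test runs per step instead of one, so on a slow suite the original single-run approach may still be the better first attempt.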

 

 
