## Silly testcase hacks

Martin Pool linked to an old post by Evan Miller on how writing tests could be more pleasant if you could just do the setup and teardown parts once, and (essentially) rely on backtracking to make sure it happens for every test. He uses a functional language for his example, and it’s pretty interesting.

But it is overly indented, and hey, I like my procedural code, so what about trying the same thing in Python? Here’s my go at it. The code under test was the simplest thing I could think of — a primality checker:

 def is_prime(n): if n == 1: return False i = 2 while i*i <= n: if n % i == 0: return False i += 1 return True 

My test function then tests a dozen numbers numbers which I know are prime or not, return True if is_prime got the right answer, and False otherwise. It makes use of a magic "branch" function to work out which number to test:

 def prime_test(branch): if branch(True, False): n = branch(2,3,5,7,1231231) return is_prime(n) else: n = branch(1,4,6,8,9,10,12312312) return not is_prime(n) 

In order to get all the tests run, we need a loop, so the test harness looks like:

 for id, result in run_tests(prime_test): print id, result 

(Counting up successes, and just printing the ids of failures would make more sense, probably. In any event, the output looks like:

 [True, 2] True [True, 3] True [True, 5] True [True, 7] True [True, 1231231] True [False, 1] True [False, 4] True [False, 6] True [False, 8] True [False, 9] True [False, 10] True [False, 12312312] True 

Obviously all the magic happens in run_tests which needs to work out how many test cases there'll end up being, and provide the magic branch function which will give the right values. Using Python's generators to keep some state makes that reasonable straightforward, if a bit head-twisting:

 def run_tests(test_fn): def branch(*options): if len(idx) == state: idx.append(0) n = idx[state] if n+1 < len(options): state = state state += 1 vals.append(options[n]) return options[n] idx = [] while True: state, vals = [0, None], [] res = test_fn(branch) yield (vals, res) if state is None: break idx[state] += 1 idx[state+1:] = [] 

This is purely a coding optimisation -- any setup and teardown in prime_test is performed each time, there's no caching. I don't think there'd be much difficulty writing the same thing in C or similar either -- there's no real use being made of laziness or similar here -- I'm just passing a function that happens to have state around rather than a struct that happens to include a function pointer.

Anyway, kinda nifty, I think!

(Oh, this is also inspired by some of the stuff Clinton was doing with abusing fork() to get full coverage of failure cases for code that uses malloc() and similar, by using LD_PRELOAD)

1. aj says:

After posting I had an idea for a more general approach. First a class:

 class Brancher: def __iter__(self): return self def next(self): if not hasattr(self, "idx"): self.idx = [] elif self.nxt is not None: self.idx[self.nxt] += 1 self.idx[self.nxt+1:] = [] else: raise StopIteration() self.vals = [] self.cnt, self.nxt = 0, None return self def __call__(self, *options): if len(self.idx) == self.cnt: self.idx.append(0) n = self.idx[self.cnt] if n+1 < len(options): self.nxt = self.cnt r = options[n] self.vals.append(r) self.cnt += 1 return r 

Then you can do something like:

 for branch in Brancher(): n = branch(*range(2,10)) m = branch(*range(2,10)) if is_prime(n*m): print "FAIL: is_prime(%d*%d) = TRUE" % (n, m) 

and have the loop cover all combinations of the branches. The run_tests function then becomes:

 def run_tests(test_fn): for branch in Brancher(): try: res = test_fn(branch) except Exception as e: yield (branch.vals, None, e) else: yield (branch.vals, res, None) 

(at least, it does when you add in support for exception handling).

2. Martin Pool says:

That is a cute way to write inverted control, but is it really the same thing? You are not testing what happens in a situation built up by previous tests, which is what seems to me to be essential there.

The situation he describes could be described as an application of the Chained Tests pattern, and it has mixed benefits. One of the benefits is that you may need less explicit setup code; one drawback is it’s hard to implement correctly and hard to maintain in non-pure languages and in particular it is hard to have branching.

The kind of case you’re describing is something a lot like what testscenarios handles:

class TestPrimality(TestCase):
scenarios = [(str(a), {‘number’: a, ‘prime’: b)
for a, b in
[(1, True), (2, True), (3, True), (4, False), (6, False)]]

def test_is_prime(self):
self.assertEquals(self.prime, is_prime(self.number))

3. Martin Pool says:

html-stripping web forms are particularly cruel to python…

4. aj says:

Okay, I think that’s the fault of my examples more than the idea though. Here’s (I think) a better example.

Code under test is a primality test that (optionally) also does the Sieve of Eratosthenes thing to speed things up.

 def euler_sieve(n): if n > 1000: raise Exception("cannot do a sieve with %d elements" % n) # from wikipedia candidates = range(n+1) fin = int(n**0.5) for i in xrange(2, fin+1): if not candidates[i]: continue candidates[2*i::i] = [None] * (n//i - 1) return [i for i in candidates[2:] if i] def is_prime(n, sieve = None, sieve_max = None): if sieve is None: i = 2 else: if n < = sieve_max: return n in sieve else: for i in sieve: if n % i == 0: return False i = sieve_max while i*i <= n: if n % i == 0: return False i += 1 return True 

Test code that deals with a bunch of different cases (are we using a pregenerated sieve, does the sieve actually generate, is the number prime, a unit, negative) might be:

def prime_test(branch):
have_sieve = branch(True, False)
if have_sieve:
sieve_size = branch(-1, 0, 1, 2, 10, 100, 500, 1000)
sieve = euler_sieve(sieve_size)
else:
sieve_size = 0
sieve = None
actual_prime = branch(True, False)
if actual_prime:
n = branch(2, 3, 5, 7, 30133, 1231231)
else:
n = branch(-1, 0, 1, 4, 6, 8, 100, 1000000)
return is_prime(n, sieve, sieve_size) == actual_prime

Failures are:

 FAIL: [True, -1] : exception negative number cannot be raised to a fractional power FAIL: [True, 0, True, 2] : exception integer division or modulo by zero FAIL: [True, 0, True, 3] : exception integer division or modulo by zero FAIL: [True, 0, True, 5] : exception integer division or modulo by zero FAIL: [True, 0, True, 7] : exception integer division or modulo by zero FAIL: [True, 0, True, 30133] : exception integer division or modulo by zero FAIL: [True, 0, True, 1231231] : exception integer division or modulo by zero FAIL: [True, 0, False, 1] : exception integer division or modulo by zero FAIL: [True, 0, False, 4] : exception integer division or modulo by zero FAIL: [True, 0, False, 6] : exception integer division or modulo by zero FAIL: [True, 0, False, 8] : exception integer division or modulo by zero FAIL: [True, 0, False, 100] : exception integer division or modulo by zero FAIL: [True, 0, False, 1000000] : exception integer division or modulo by zero FAIL: [True, 1, True, 2] : wrong result FAIL: [True, 1, True, 3] : wrong result FAIL: [True, 1, True, 5] : wrong result FAIL: [True, 1, True, 7] : wrong result FAIL: [True, 1, True, 30133] : wrong result FAIL: [True, 1, True, 1231231] : wrong result FAIL: [True, 2000] : exception cannot do a sieve with 2000 elements FAIL: [False, False, -1] : wrong result FAIL: [False, False, 0] : wrong result FAIL: [False, False, 1] : wrong result 

So you only get one message when your sieve creation fails (creating a sieve of the first -1 numbers [True, -1] fails when sqrt(-1) gets invoked, and for the first 2000 numbers fails because it's deliberately disabled), but when sieve creation (apparently) succeeds you get results for each of the following-on branches you've setup.

5. aj says:

Oh, the above comment’s output is generated by invoking:

 for test, res, exc in run_tests(prime_test): if exc is None: if not res: print "FAIL: %s : wrong result" % (test) else: print "FAIL: %s : exception %s" % (test, exc) 

6. Martin Pool says:

That is a better example. It still seems a little odd to be recreating by hand something very close to being a generator.

Perhaps the tests would be better factored as first checking on generation of the sieve, and then that the sieve is used correctly. (Or perhaps that is just arguing the example.)