Hypothesis

Test faster, fix more

One of the big differences between Hypothesis and Haskell QuickCheck is how shrinking is handled.

Specifically, the way shrinking is handled in Haskell QuickCheck is bad and the way it works in Hypothesis (and also in test.check and EQC) is good. If you’re implementing a property based testing system, you should use the good way. If you’re using a property based testing system and it doesn’t use the good way, you need to know about this failure mode.

Unfortunately many (and possibly most) implementations of property based testing are based on Haskell’s QuickCheck and so make the same mistake.

The big difference is whether shrinking is integrated into generation.

In Haskell’s QuickCheck, shrinking is defined based on types: Any value of a given type shrinks the same way, regardless of how it is generated. In Hypothesis, test.check, etc. instead shrinking is part of the generation, and the generator controls how the values it produces shrinks (this works differently in Hypothesis and test.check, and probably differently again in EQC, but the user visible result is largely the same)

This is not a trivial distinction. Integrating shrinking into generation has two large benefits:

  • Shrinking composes nicely, and you can shrink anything you can generate regardless of whether there is a defined shrinker for the type produced.
  • You can guarantee that shrinking satisfies the same invariants as generation.

The first is mostly important from a convenience point of view: Although there are some things it let you do that you can’t do in the type based approach, they’re mostly of secondary importance. It largely just saves you from the effort of having to write your own shrinkers.

But the second is really important, because the lack of it makes your test failures potentially extremely confusing.

To see this, lets consider the following example in Hypothesis:


from hypothesis import given
from hypothesis.strategies import integers

even_numbers = integers().map(lambda x: x * 2)

@given(even_numbers)
def test_even_numbers_are_even(n):
  assert n % 2 == 0

This test always passes: We generate an even number by multiplying an integer we’ve generated by two. No problem.

Now suppose we made the test fail:


from hypothesis import given
from hypothesis.strategies import integers

even_numbers = integers().map(lambda x: x * 2)

@given(even_numbers)
def test_even_numbers_are_even(n):
  assert n % 2 == 0
  assert n <= 4

This test will of course fail: Any value of n which is at least 5 will fail the test.

But which assertion will fail, and what value will cause it to fail?

In Hypothesis it’s the second one, and it will fail with n=6: The numbers passed in will still always be even integers, and the smallest even integer which fails the test is 6.

But suppose Hypothesis implemented type based shrinking (very early versions of it did implement type based shrinking, but this stopped being the case somewhere well before 1.0 when the API looked very different).

In that case Hypothesis would just know that these things that were failing the tests were integers, so it would say “How about 1? 1 is a perfectly valid integer. Lets try 1”.

It would pass in n=1, the first assertion would trigger, and the test would promptly fail. A successful shrink!

But it’s completely not what we wanted. We were just trying to test on even integers and the shrinking messed this up.

This is in some sense the classic shrinkers are fuzzers problem where an error is reduced to a different error, but it’s a particularly annoying version of that because an error we care about is being reduced to an error we don’t care about.

So we have to duplicate the constraint logic in our test to make this work:


from hypothesis import given, assume
from hypothesis.strategies import integers

even_numbers = integers().map(lambda x: x * 2)

@given(even_numbers)
def test_even_numbers_are_even(n):
  assume(n % 2 == 0)
  assert n % 2 == 0
  assert n <= 4

(Having both the assume and the first assert there is of course redundant)

In this example the problem was relatively obvious and so easy to work around, but as your invariants get more implicit and subtle it becomes really problematic: In Hypothesis it’s easy and convenient to generate quite complex data, and trying to recreate the invariants that are automatically satisfied with that in your tests and/or your custom shrinkers would quickly become a nightmare.

I don’t think it’s an accident that the main systems to get this right are in dynamic languages. It’s certainly not essential - the original proposal that lead to the implementation for test.check was for Haskell, and Jack is an alternative property based system for Haskell that does this - but you feel the pain much more quickly in dynamic languages because the typical workaround for this problem in Haskell is to define a newtype, which lets you turn off the default shrinking for your types and possibly define your own.

But that’s a workaround for a problem that shouldn’t be there in the first place, and using it will still result in your having to encode the invariants into your your shrinkers, which is more work and more brittle than just having it work automatically.

So although (as far as I know) none of the currently popular property based testing systems for statically typed languages implement this behaviour correctly, they absolutely can and they absolutely should. It will improve users’ lives significantly.

This of course goes doubly for dynamic languages, where even working around this problem is hard.

So, in conclusion, just say no to type based shrinking. It’s a bad idea, and failures your property based tests will become significantly harder to understand in its presence.