The Design and Use of QuickCheck
January 14, 2017
Newsletter ↳
QuickCheck is the grandfather of property-based testing libraries. Despite being imitated in over thirty languages, the original implementation remains pre-eminent due to the type system and consistent logic of the Haskell language in which it is written.
Prior randomized testing tools required learning a special language and grammar to generate complex test cases. QuickCheck embeds the generator language in Haskell for a succinct domain specific language that reduces the learning curve and build system dependencies.
Property-based testing is enormously effective and it pays to learn to use it well. Getting started with QuickCheck is easy, but there is a lot to learn under the surface to achieve greater customization and better results. The typical examples you’ll find online have deceptively simple code (usually involving list reversal). The examples make it hard to learn how their succinct tests are possible, and how to customize them when things go wrong. This guide puts it all together piece by piece so you can use QuickCheck with confidence.
First things first, what is property-based testing? A property of a program is an observation that we expect to hold true regardless of the program’s inputs. It may involve only the output (“always outputs a positive number”) or compare input and output (“preserves list length”) or even assess external effects (“matches the output of a trusted external program”).
That is, instead of writing individual unit tests consisting of certain expected input-output pairs of a function, we write the desired properties of functions. Then QuickCheck automatically generates random values which can be run to verify (or rather, falsify) the property.
Using QuickCheck, the developer is forced to think about what the code should do at a high level rather than grunt out a few unmotivated test cases. The tool finds corner-cases where the specification is violated, which leads to either the code or the specification getting fixed. Additionally the specifications persist as rich machine-checkable documentation.
Proponents of formal methods sometimes stress the notion of specification above that of implementation. However it is the inconsistencies between these two independent descriptions of the desired behavior that reveal the truth. We discover incomplete understanding in the specs and bugs in the implementation. Programming does not flow in a single direction from specifications to implementation but evolves by cross-checking and updating the two. Property-based testing quickens this evolution.
Of course the only way to truly guarantee properties of programs is by mathematical proof. However property-based tests approximate deductive confidence with less work by checking properties for a finite number of randomized inputs called test cases.
Methodology of Testing
Given that property based tests can discover only the presence of bugs and never their absence, when should tests be considered adequate? One criterion is code coverage: that every program statement be executed at least once. QuickCheck does not measure code coverage, nor does it knowingly generate test data to exercise particular code paths. When running a test suite you should enable a code coverage report.
# Be sure to enable a code coverage report
# https://docs.haskellstack.org/en/latest/coverage/
$ stack test --coverage
Random data tends to exercise small programs considerably, so applying QuickCheck against one small function at a time is a good way to build total coverage. Also test cases whose statistical distribution follows that of actual data tend to generate higher coverage. For example, think of a function that manipulates email addresses. The vast majority of long random strings will exercise only the invalid email code path. Realistic data will exercise more. QuickCheck’s built-in test case generators use a uniform distribution but the library provides a way to customize generation.
Note there are situations when random test cases are not the most effective tool. QuickCheck will be less definitive than an alternative like SmallCheck for functions which take a finite number of possible input values. SmallCheck generates not just a random set of test cases but all possible cases beneath a given “size.” For example all Unicode strings of length ten or less. If a function has a finite (and small enough) number of possible input values then SmallCheck can prove conclusively when properties hold.
Generating Random Test Cases
As discussed, QuickCheck generates random test cases to falsify program properties. It uses a newtype with a single type argument, Gen a
, to generate values of type a
. Gen wraps a function mapping a random number generator and a size parameter to a value of a given type.
newtype Gen a = MkGen {
unGen :: QCGen -> Int -> a
}
QuickCheck provides combinators to obtain and modify Gen values. For instance here are two basic functions to make Gens:
-- Generate a random element in the given inclusive range
-- (using Random from the "random" package)
choose :: Random a => (a, a) -> Gen a
-- Pick elements at random from a (non-empty) list
elements :: [a] -> Gen a
You get values out of Gen by using generate
. It runs in the IO monad so that it can advance along an internal sequence of pseudo-random values.
-- generate :: Gen a -> IO a
-- produce 1, 2, or 3
$ elements [1,2,3]
generate
-- produce a lowercase letter
$ choose ('a', 'z')
generate
-- produce a constant value (since Gen has a Monad instance)
$ return 1 generate
So far the grammar is fairly limited, but QuickCheck uses a typeclass called Arbitrary
to produce generators for more varied types.
class Arbitrary a where
arbitrary :: Gen a
Generally it’s a Haskell convention that typeclass methods are inter-related by laws (like m >>= return ≡ m
for Monads), but Arbitrary doesn’t follow that convention. It’s an “unprincipled” typeclass in this regard, but still useful.
You can think of Arbitrary as providing common-sense default generators for basic types. You’re still free to create your own generators or use those from third-party libraries, but QuickCheck provides plenty of Arbitrary instances. Compare two ways to generate the same distribution of values:
-- manually generating random Bool
$ choose (False,True)
generate
-- ...vs using the Bool instance of Arbitrary
arbitrary :: Gen Bool) generate (
QuickCheck add-on libraries differ in their approach to providing new Gen types. Some (like quickcheck-unicode) provide functions which return Gen directly, and others (like quickcheck-instances) introduce newtypes with Arbitrary instances.
Type inference allows Arbitraries to combine recursively and easily build more complicated generators from simpler ones. This compounding means it is helpful to take the time to create Arbitrary instances for application-specific data types so they can be more easily incorporated into property tests.
-- there are many recursive instances like these
-- Arbitrary a => Arbitrary [a]
-- (Arbitrary a, Arbitrary b) => Arbitrary (a, b)
-- QuickCheck uses them to build things like this
arbitrary :: Gen [(Int, Bool)]) generate (
Gen is also an instance of Applicative, so it can be applied to fill the fields of custom datatypes.
data MyType = MyType {
foo :: Int
bar :: Bool
, baz :: Float
,deriving (Show)
}
$ MyType <$> arbitrary <*> arbitrary <*> arbitrary generate
If you’re truly lazy you can avoid repeatedly applying arbitrary
by using the generic-random package.
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics
import Generic.Random.Generic
data MyType = MyType {
foo :: Int
bar :: Bool
, baz :: Float
,deriving (Show, Generic)
}
genericArbitrary :: Gen MyType) generate (
To generate more complicated recursive test cases, use the Monad and Applicative instance of Gen. For instance,
-- Randomly use one of several generators with
-- oneof :: [Gen a] -> Gen a
myList :: Arbitrary a => Gen [a]
= oneof
myList return []
[ :) <$> arbitrary <*> myList
, ( ]
This generator tends to produce very short lists, many of them empty. Statistically that’s how it works when at each step the function is fifty-percent likely to terminate. Rather than use oneof
to choose generators we can skew the probability.
-- like oneof, but with weighted distribution
-- frequency :: [(Int, Gen a)] -> Gen a
myList' :: Arbitrary a => Gen [a]
= frequency
myList' 1, return [])
[ (4, (:) <$> arbitrary <*> myList')
, ( ]
Now the function is four times as likely to proceed at every step, generating longer lists on average. Between myList
and myList'
– one which generates short lists and the other slightly longer – there is a common pattern. It would be more flexible to have an adjustable size.
In general when generating unbounded types like Integer or list the question naturally arises: how large an integer or long a list will we choose? It’s impossible to pick from among all integers with equal probability. QuickCheck resolves this conundrum by including the size parameter in a generator. Remember the Gen type wraps a function whose second parameter is an Int which holds the “size.” Some generators such as choose (0, 9)
disregard it, but others consult it.
We can modify our list generator to heed its size parameter. We’ll tie the size to the likelihood of continuing to build a list and use the sized
function to read the parameter.
-- sized :: (Int -> Gen a) -> Gen a
flexList :: Arbitrary a => Gen [a]
= sized $ \n ->
flexList
frequency1, return [])
[ (:) <$> arbitrary <*> flexList)
, (n, ( ]
The size parameter is especially useful when generating structures like trees which are less likely to terminate when randomly spawning new branches. The size can be decreased deeper in the tree to stop its growth.
Our old friend the generate
function sets size to thirty. Notice the length of lists produced by the flexList
generator by default:
-- there is a debugging function to call generate repeatedly,
-- sample :: Show a => Gen a -> IO ()
> sample (flexList :: Gen [Int])
[]-1]
[3,0,1,1]
[0]
[-4]
[0,7,-2,-2,2,-5,6,10,4,9,-5,-4]
[-12,-2,-1,-11,7,-3]
[-6,8,-3,-2,-6,9,-14,14,14,3,-6]
[0,-10,-14,11,-12,-12,-6,4,-6,-16,-16,-12,-9,4,-13,-7,-7,1,13]
[-16,-6,18,1,0,-17,-14,-18,2,-7,7,6,12,15]
[-12,11,-15,13,-17,-20,10,-18,-15,-5] [
The size parameter applies to all generators at once in a complex result. Notice that the Int values in the generated lists above vary between -30 and 30 (in fact a strictly smaller range in the example above). Generating longer lists also generates larger numbers inside. QuickCheck is designed this way to avoid correlation between the sizes of different parts of the test data which may distort test results.
There are helper function to adjust the size parameter in a generator as well.
-- We can modify a generator to use a fixed size with
-- resize :: Int -> Gen a -> Gen a
--
-- Here's a big list of big numbers
1000 flexList :: Gen [Int])
generate (resize
-- Alternately we can alter a generator's sensitivity
-- to size. This will be useful later
*33) flexList :: Gen [Int]) generate (scale (
Now that we can generate a variety of test cases it is time to move on to the fun part, testing programs.
Specifying Laws
Laws, or properties are predicates that can be checked by testing. QuickCheck represents them as Gen Result
where Result holds various facts about a test including:
- Is the test successful, failed, or should it be discarded?
- Whether to retry the test
- Statistics about test cases
- Reason for failure
- Exceptions thrown
At a high level each property generates a test case inside the Gen monad, evaluates the data against part of the program under test, then constructs a Result. As all this happens inside Gen, the result is of type Gen Result
aka Property
. To run the tests, QuickCheck generates the results and displays statistics about them.
Let’s make a Property by hand (the hard way) and then see how QuickCheck makes it easier.
-- returned values are modified versions of
-- succeeded :: Result
-- failed :: Result
prop_commutativeAdd :: Gen Result
= do
prop_commutativeAdd <- arbitrary :: Gen (Int, Int)
(x, y) return $ if x + y == y + x
then succeeded
else failed { reason = "stupid non-commutative addition" }
(This is a slight simplification in that nowadays Property
is defined as Gen
of a tree of Results rather than literally Gen Result
, but it’s fine for illustration.)
Unfortunately prop_commutativeAdd
contains a lot of boilerplate to test a simple assertion. QuickCheck offers a Testable
typeclass to help us simplify. Testable
is to Property
as Arbitrary
is to Gen
. In particular Testable a
provides a function property :: a -> Property
. QuickCheck actually runs Testable and extracts a Property rather than expecting Property directly.
-- These two instances make life easy:
--
-- Testable Bool
-- (Arbitrary a, Show a, Testable prop) => Testable (a -> prop)
prop_commutativeAdd :: Int -> Int -> Bool
= x + y == y + x prop_commutativeAdd x y
The Testable instance of Bool is similar to what we wrote by hand for the first version of prop_commutativeAdd. With that and because Functions from Arbitrary to Testable are themselves Testable, a simple function like the one above is Testable. It’s important to see how Properties work because the flexibility of Testable makes properties so short that it obscures how things work under the hood.
In fact our example can get shorter. Simple properties like commutativity, associativity, or idempotence have helper methods in the test-invariant package.
import Test.Invariant
prop_commutativeAdd :: Int -> Int -> Bool
= commutative (+) prop_commutativeAdd
To check an individual property use the quickCheck
function:
-- quickCheck :: Testable prop => prop -> IO ()
> quickCheck prop_commutativeAdd
+++ OK, passed 100 tests.
Note that properties must use monomorphic types, you can’t have a property with polymorphism like [a] -> [a] -> Bool
. You must pick a monomorphic instance for which to generate test cases. Be careful which instance you choose! Observe the difference between addition of Ints vs Doubles.
> quickCheck $ associative ((+) :: Int -> Int -> Int)
+++ OK, passed 100 tests.
> quickCheck $ associative ((+) :: Double -> Double -> Double)
*** Failed! Falsifiable (after 5 tests and 188 shrinks):
-1.4210880596288213e-14
2.842174061166933e-14
128.0
Test Case Distribution and Shrinking
Now that we’ve seen the basics of properties and test case generation let’s see them in action. Imagine we want to test two functions, encode
to perform Base64 encoding of binary data, and decode
to restore the data back again. Let’s focus on encode
first. Our goal is to determine properties of this function that hold for all inputs.
Base64 encoding breaks its input into groups of six bits, and assigns the sixty-four possible values to characters which are safe for various kinds of transmission. Now, regardless of the target alphabet, this maps each six bits to one byte, leading to an output size of ceil(4n/3) where n is the number of input bits. (The encoding pads its output with trailing equals signs to make up for input whose bit length is not divisible by six.)
Let’s express the input-output bit ratio property in QuickCheck. The encode function will deal with lazy ByteString to read raw binary data. ByteString does not have an Arbitrary instance supplied by core QuickCheck, but it’s available in the quickcheck-instances package.
import Test.QuickCheck.Instances ()
import qualified Data.ByteString.Lazy as BL
-- assuming our encode function is defined
-- encode :: BL.ByteString -> BL.ByteString
import Codec.Binary.Base64 (encode)
-- Haskell infers the type of b because it is passed to
-- to encode. The quickcheck-instances provides an Arbitrary
-- for it, and this whole property is thus Testable. That's
-- a reminder of why it is written so concisely.
=
prop_sizeRatio b ==
BL.length (encode b) 4 * ceiling (fromIntegral (BL.length b) / 3)
If we tried to use the identity function, id :: a -> a
, as our supposed encode
then this test will fail with any non-empty series of bytes:
> quickCheck prop_sizeRatio
=== prop_sizeRatio ===
*** Failed! Falsifiable (after 3 tests and 1 shrink):
"\NUL"
Examine the output more carefully. We haven’t talked yet about how long and in what order QuickCheck tries test cases. The quickCheck
function generates Results starting with a small size parameter, gradually increasing it. This makes it more likely to find simpler counterexamples if they exist. By default it checks one hundred cases and declares the test a success if they all pass.
In our case it found a counterexample and then started “shrinking” it to find a simpler one. Shrinking is an operation defined per-type as an optional method in the Arbitrary class. It has type shrink :: a -> [a]
, returning a list of simpler potential counterexamples. Candidates are tried in the order they appear in the list.
If not customized by Arbitrary instances the default behavior is to return an empty list, which will prevent shrinking. The documentation in the Test.QuickCheck.Arbitrary module is quite good, and suggests that implementations of shrink should try at least three things:
- Shrink a term to any of its immediate subterms.
- Recursively apply ‘shrink’ to all immediate subterms.
- Type-specific shrinkings such as replacing a constructor by a simpler constructor.
Additionally when defining shrink
on a recursive type it’s important that the shrunken results are actually smaller in some measureable way to avoid infinite descending chains. Never-ending or cyclic shrinkage will prevent QuickCheck from terminating.
In the case of ByteString, shrinking shortens the string and causes the byte values inside to move their values closer to zero (0x00). That’s why our string stopped shrinking at a single zero byte, denoted \NUL
. It can’t go any closer to zero and any shorter string passes the test.
Not only ought the input and output sizes be in a constant ratio, but Base64 encoding should pad the end with equals signs when needed to keep the output a multiple of four bytes. For instance “A” maps to “QQ==”, “AB” to “QUI=” and “ABC” doesn’t need any padding and maps to “QUJD.” This is another property we can test.
import Text.Regex.Posix
=
prop_endsWithPadding b =~ ("(^|[^=])" <> suffix <> "$")) -- at end
(encB && not (encB =~ ("=[^=]"::BL.ByteString)) -- only at end
where
= encode b
encB = fromIntegral $ BL.length b `rem` 3
remainder = BL.replicate
suffix 3 - remainder) `rem` 3)
((fromIntegral $ ord '=') (
This checks with regexes that the output ends with the correct number of equals signs and that no stray ones exist earlier. If our encode function is written correctly then it passes! However, are the random inputs really exercising all the padding possibilities? What if the input cases never attain certain sizes modulo four? We can get a better window into the test by labeling each case.
-- Just add this to the beginning of the property
-- collect :: (Show a, Testable prop) => a -> prop -> Property
=
prop_endsWithPadding b $ ... collect suffix
The test results show that the cases we care about are all represented about equally.
=== prop_endsWithPadding ===
+++ OK, passed 100 tests:
36% ""
35% "="
29% "=="
We’ve now established that the encode function outputs the correct length and pads it properly. The point of Base64 encoding however is to keep its output within a limited subset of ASCII for transmission over restricted protocols. It’s important to verify that the output stays within its allowed alphabet. Here’s the first version of the test:
import Data.Char (ord)
import Data.Monoid ((<>))
import qualified Data.Set as S
=
prop_outputAlphabet b `S.isSubsetOf` allowed
used where
= S.fromList . BL.unpack $ encode b
used = S.fromList . map (fromIntegral . ord) $
allowed 'A'..'Z'] <> ['a'..'z'] <> ['0'..'9'] <> ['+','/','='] [
If the encode function is written correctly then this test will pass! However, as with the padding property, we want to know if it is testing thoroughly enough. Does encode
output to the full alphabet? It might never hit certain values (maybe a range is off). To check we can classify certain outputs as being “half-alphabet” or “full-alphabet.”
-- classify :: Testable prop =>
-- Bool
-- -> String
-- -> prop -> Property
=
prop_outputAlphabet b >= 32) "half-alphabet"
classify (S.size used . classify (S.size used >= 63) "full-alphabet"
$ used `S.isSubsetOf` allowed
where
...
Running it reveals some of the outputs are half-alphabet, but none are full-alphabet. Our tests are not giving us the full picture.
=== prop_outputAlphabet ===
+++ OK, passed 100 tests (36% half-alphabet).
Calling classify
provides a nice way to keep a casual eye on distributions but we want a stronger guarantee. To insist that we get at least one full-alphabet output we switch from classify
to cover
.
=
prop_outputAlphabet b >= 63) 1 "full-alphabet"
cover (S.size used $ used `S.isSubsetOf` allowed
where
...
Now the test fails the way we want.
=== prop_outputAlphabet ===
*** Insufficient coverage after 100 tests (only 0% full-alphabet, not 1%).
Either the encode function has a bug or we’re not giving it big enough test cases to exercise the full output alphabet. Let’s see how to crank up the generator.
Refining Generated Data
As mentioned, QuickCheck begins with a small size parameter when generating results and raises it slowly as testing proceeds. If our failure to get output coverage in the previous section was due to insufficiently large input then there are at least two ways to enlarge the input: use more test cases to allow the size to grow further, or scale the generator sensitivity.
Using more test cases is a matter of configuring the test runner, which we will discuss in a later section. Instead let’s make the most of the number of tests already used. We’ll employ a custom generator which is a scaled version of one provided by quickcheck-instances.
-- To use a custom generator in a Property use
-- forAll :: (Show a, Testable prop)
-- => Gen a -> (a -> prop) -> Property
=
prop_outputAlphabet *3) (arbitrary :: Gen BL.ByteString)) $ \b ->
forAll (scale (let ... in
>= 63) 1 "full-alphabet"
cover (S.size used $ used `S.isSubsetOf` allowed
We saw how to scale a generator a few sections before, so that part is nothing new. As luck would have it, running this modified property with a suitable encode function reliably covers full-alphabet in one hundred runs.
We succeeded above by modifying the generator. Another way to affect generated values is by using newtypes specially designed to adjust generators of underlying types. This approach avoids the need for forAll
. The Test.QuickCheck.Modifiers module includes helper newtypes like NonNegative, NonZero, Positive, OrderdedList etc. Internally there is not much to these types, they simply allow a way for QuickCheck to define new Arbitrary instances. You use them in Testable type signatures:
prop_foo :: (Positive i) -> Bool
= ... prop_foo
A useful member of this family is Small a
. Restricting the number of distinct values taken in test cases is appropriate for testing certain code paths. For instance Gen HashMap
will detect more corner cases if the keys’ cardinalities are reduced to cause collisions.
Some people recommend creating a newtype and Arbitrary instance any time you use a custom Gen. The idea is that it may be more likely that you’ll use the custom Gen uniformly by using the newtype throughout code and tests. However it’s a matter of style.
Getting back to the Base64 example, we want to enforce that encode and decode are inverses. Encoding first and then decoding is the easy direction because encode is not picky about its inputs. The propery looks like this:
-- assume the decode function has this type:
-- decode :: ByteString -> Either ParseError ByteString
-- the easy direction
= decode (encode b) == Right b prop_decEnc b
The other direction does not work as well because many strings are not valid Base64 messages and cannot be decoded.
import Data.Either (rights)
-- this will quickly fail
=
prop_encDec b == (encode <$> rights [decode b]) [b]
We can discard test cases that aren’t suitable by using the conditional operator.
-- (==>) :: Testable prop => Bool -> prop -> Property
=
prop_encDec b ==> [b] == (encode <$> rights [dec])
legit where
= decode b
dec = isRight dec legit
This works but it is inefficient when the ratio of suitable to unsuitable test cases is small. After getting too many rejections QuickCheck fails.
=== prop_encDec ===
*** Gave up! Passed only 45 tests.
It’s always more efficient to use a custom generator than to conditionally discard test cases. Here’s generator for messages in the Base64 alphabet with proper equals padding, plus the modified test.
-- using more Gen combinators: listOf, vectorOf
encoded :: Gen BL.ByteString
= do
encoded <- concat <$> listOf (group 0)
body <- group =<< choose (0, 2)
end return . BL.pack $ body <> end
where
group :: Int -> Gen [Word8]
group pad = do
<- vectorOf (4 - pad)
letters . elements . map (fromIntegral . ord)
$ ['A'..'Z'] <> ['a'..'z'] <> ['0'..'9'] <> ['+','/','=']
return $ letters <> replicate pad 61 -- 61 is ascii for =
= forAll encoded $ \b ->
prop_encDec == (encode <$> rights [decode b]) [b]
QuickCheck, being the powerful tool it is, actually discovers a problem.
=== prop_encDec ===
*** Failed! Falsifiable (after 1 test):
"1yx="
Decoding “1yx=” and re-encoding produces “1yw=”. How can this be? RFC4648 Section 3.5 designates certain encodings as “canonical” and the Base64 decode function is only injective when restricted to the canonical encodings. The prop_encDec
test will pass only if the custom generator produces nothing but canonical forms. Modifying the generator is an exercise for the reader!
Testing encoder/decoders is a natural task for QuickCheck, and this examples goes to show you should make two properties to check that functions are inverses of one another – check both directions. Even our check that the encoder hits all letters of the alphabet did not reveal the importance of its canonical encodings.
A final note about generators. When the data type under test is internally complicated but presents an API for modification it may be easier to generate an Arbitrary series of actions against the API. Have test cases execute the API actions, thus building the datatype. This is the approach used at Pusher to test their Raft protocol implementation with QuickCheck.
Running Multiple Tests
We already used the quickCheck
function to run a single test, but given a file full of Testable properties how do you run them all? The Test.QuickCheck.All module uses Template Haskell to detect properties and run them.
{-# LANGUAGE TemplateHaskell #-}
module MyTests where
import Test.QuickCheck.All
= ...
prop_foo
= ...
prop_bar
--------------------------
return []
runTests :: IO Bool
= $quickCheckAll runTests
This creates a runTests
function which detects the tests due to the prop_
prefix in their names. That is the reason for the naming convention which might have looked a little odd.
To adjust the number of test cases run against each property (or other parameters) use forAllProperties
rather than quickCheckAll
.
-- quickCheckWithResult :: Testable prop => Args -> prop -> IO Result
-- stdArgs :: Args
return []
runTests :: IO Bool
= $forAllProperties $
runTests = 10000}) quickCheckWithResult (stdArgs {maxSuccess
Your main test runner itself can call runTests
for each testing module and exit with the appropriate Unix status code.
module Main where
import qualified MyTests as MT
import System.Exit
main :: IO ()
= do
main -- add test runners into the array for each module
<- and <$> sequence [MT.runTests]
good if good
then exitSuccess
else exitFailure
While the Tasty, HSpec, and test-framework testing frameworks can run QuickCheck Properties they do not output statistics equally well. The built-in Template Haskell test runner in QuickCheck outputs full information (including collect and classify results), as does Tasty. The other two hide the statistics and obscure when failures are caused by too many rejected conditionals.
Testing IO or ST Actions
QuickCheck was originally designed for pure functions, but the Test.QuickCheck.Monadic module supports tests which perform IO actions like calling external programs. There is some highly academic material about it online but it’s quite easy in practice. Here’s the general form of monadic tests:
-- using the functions
-- monadicIO :: PropertyM IO a -> Property
-- run :: Monad m => m a -> PropertyM m a
-- assert :: Monad m => Bool -> PropertyM m ()
= monadicIO $ do
prop_monadic a b <- run (f a)
a' <- run (f b)
b' -- ...
assert someBoolean
The module exposes a bunch of new monadic functions but they correspond to concepts we already know. The examples in Test.QuickCheck.Monadic are very clear and will quickly get you up and running.
pre
is(==>)
forAllM
(andpick
) isforAll
monitor
adaptscollect
/classify
monadic
,monadicIO
,monadicST
all fit the new code into regular Properties
CoArbitrary and Gen (a -> b)
Sometimes it’s useful to generate not just values, but functions. Consider a test involving images. The JuicyPixels package provides a function generateImage
to make them. It requires a function from coordinate to pixel value as its first argument, and the image width and height. Suppose we want to use it to generate not a specific picture with known pixel function, but random pictures.
We know how to generate an arbitrary width and height, but how do we get a random function to assign colors to pixels? It turns out all we have to do is provide an Arbitrary instance for pixels and QuickCheck can “magically” generate functions Pixel px => (Int -> Int -> px)
. Let’s see it in action first, and then discover how it works internally.
-- Magically generates a function "f" and also uses
-- suchThat :: Gen a -> (a -> Bool) -> Gen a
import Codec.Picture
instance Arbitrary PixelRGB8 where
= PixelRGB8 <$> arbitrary <*> arbitrary <*> arbitrary
arbitrary
genImage :: Gen (Image PixelRGB8)
= do
genImage <- arbitrary
f <- arbitrary `suchThat` ( \(x,y) -> x > 0 && y > 0 )
(x, y) return $ generateImage f x y
Here is one of these images.
<- generate $ resize 1000 genImage
img "/tmp/qc.png" $ encodePng img BL.writeFile
(As an aside: in real test code you won’t even have to write your own image generator. Image and other media generators are provided by the QuickFuzz library.)
QuickCheck was able to automatically generate a random function for us because the type of the function’s arguments, Int
, is an instance of the CoArbitrary
typeclass.
The fundamental trick is the observation that a function which modifies a generator can be transformed into a generator of functions. That is, it is possible to write a function promote :: (a -> Gen b) -> Gen (a -> b)
. (The modern incarnation of QuickCheck generalizes to Monad m => m (Gen a) -> Gen (m a)
.) The way such a function would work is not too deep, it’s a matter of rearranging arguments:
a -> Gen b = {meaning of Gen} a -> Int -> Rand -> b
= {reorder args} Int -> Rand -> a -> b
= {meaning of Gen} Gen (a -> b)
The real trick is how to get a function a -> Gen b
in the first place. QuickCheck provides a way to get a related function, a -> Gen b -> Gen b
for a variety of choices of a
through instances of the CoArbitrary
typeclass:
class CoArbitrary a where
coarbitrary :: a -> Gen b -> Gen b
The type signature of coarbitrary
allows it to fit into the crucial instance for Arbitrary (a -> b)
.
instance (CoArbitrary a, Arbitrary b) => Arbitrary (a -> b) where
= promote (\a -> coarbitrary a arbitrary) arbitrary
It was this instance which, through repeated application, gave us Arbitrary (Int -> Int -> PixelRGB8)
.
How do CoArbitrary instances even work? How do you write a function a -> Gen b -> Gen b
to transform any generator, without knowing something about the b
type and how to alter its values? The answer is to deterministically alter the stream of pseudo-random numbers feeding the generator. QuickCheck provides a primitive variant :: Int -> Gen a -> Gen a
function to do this.
With high probability, variant
alters a generator differently for different integer values. In fact the function has been designed carefully so that any composition variant i_0 . variant i_1 . … . variant i_k
modifies a generator differently for distinct i_0, i_1, …, i_k
.
The variant
primitive allows CoArbitrary instances to modify unknown generators:
instance CoArbitrary Bool where
=
coarbitrary b if b then variant 0 else variant 1
The variant
composition property above allows the technique to extend to recursively defined datatypes. QuickCheck provides CoArbitrary instances for a large number of basic types in the Test.QuickCheck.Arbitrary module, even Map and Set. Creating variants can be automated in your own datatypes by deriving Generic, at which point you can simply write instance CoArbitrary MyType
.
Resources
Other Documentation
- http://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf
The original paper introducing QuickCheck. It’s an excellent read. - https://hackage.haskell.org/package/QuickCheck
The package on Hackage. Read the docs in its modules. - http://haskellbook.com chapter fourteen
A thorough introductory Haskell book with lots of examples and exercises. - http://www.cse.chalmers.se/~rjmh/QuickCheck/manual_body.html
A pithy howto. - http://www.cse.chalmers.se/~rjmh/Papers/QuickCheckST.ps
Testing IO using algebraic laws, abstract models, or pre- and post-conditions. - http://publications.lib.chalmers.se/records/fulltext/232550/local_232550.pdf
Experience report about testing nontrivial properties. - https://jaspervdj.be/posts/2015-03-13-practical-testing-in-haskell.html#the-action-trick
Generating actions against an API rather than directly constructing datatypes.
Helper Libraries
- https://hackage.haskell.org/package/quickcheck-unicode
Wider sample of Char with unicode weirdness. - https://hackage.haskell.org/package/quickcheck-special
Boundary values for ints and floats. - https://hackage.haskell.org/package/quickcheck-instances
Arbitrary instances for many types, including dates and times. - https://hackage.haskell.org/package/quickcheck-regex
Generator for strings matching a regex. - http://quickfuzz.org/
Generators for images, audio, documents, net addresses, time. - https://hackage.haskell.org/package/props, https://hackage.haskell.org/package/test-invariant
Helper abbreviations to test laws like associativity and idempotence. - https://hackage.haskell.org/package/checkers
Generator combinators and properties for common typeclasses. - https://hackage.haskell.org/package/fgl-arbitrary
Arbitrary instances for graphs.