Fuzz testing makes you a better programmer

by Michael Bernstein on June 27, 2017

How to gain confidence through randomness in Elm

The programming languages that will gain the most mindshare in front-end programming are those that steal the best ideas from other languages and incorporate them naturally. A few languages are trying to steal as liberally as possible: ClojureScript has done a great job by this measure, as has PureScript. Today, though, I’m going to write a bit about how Elm has combined theft with usability in a way that can help an average programmer like me gain more confidence when testing tricky bits of code.

First, I’ll introduce the idea of “Fuzz Testing” and briefly describe why it’s so useful. Then I’ll describe how Fuzz Testing is integrated into elm-test, and argue that this integration is just as important as the technique itself.

The idea: Reach for the “fuzz”

Fuzz Testing (also known as generative testing or property-based testing) starts with one idea:

Fuzz Tests randomly generate appropriate data to automatically test your code.

This idea is probably easiest to understand in contrast with Unit Testing, a widely used technique for asserting the correctness of complex code. With Unit Testing, you write individual test cases, each designed to express one specific behavior that you’d like to assert about your program. With Fuzz Testing, you write one fuzz test that is run against many generated input values, so your tests cover a much wider range of potential inputs than you could realistically enumerate by hand in unit tests.

As an example, I’m working on some code that requires tests asserting the behavior of a data structure that is core to my program. The main data structure is a Funnel, and a Funnel is composed of Stages. Here are the type declarations in Elm:

type alias Stage =
    { name : String
    , count : Maybe Float
    , conversion : Maybe Float
    }


type alias Funnel =
    { stages : Dict Int Stage
    }

A Stage is a record containing three fields of domain data. A Funnel is a record with one field, stages, which is a Dict (a map) from Int keys to Stage values. A Funnel has three main operations:

Add a stage
Delete a stage
Update a stage

Because of implementation details that aren’t important here, updating is simple and doesn’t need much testing. Adding and removing stages, in contrast, are more complicated because they involve compacting and ordering the Int ids used as keys in the stages Dict. That means they need some really good tests.
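The implementation isn’t shown in this post, so the following is only a hypothetical sketch of what “compacting and ordering” the ids could look like: after every insert or delete, the stages are renumbered so their ids are always exactly 0 through n - 1, in order. The module name FunnelSketch and the helper fromOrderedList are made up for illustration, and the real deleteStage and insertStage may behave differently (for example, in how they treat out-of-range positions).

```elm
module FunnelSketch exposing (Funnel, Stage, deleteStage, insertStage)

-- Hypothetical sketch, NOT the author's implementation: stage ids are kept
-- "compact", i.e. always exactly 0 .. n-1, in display order.

import Dict exposing (Dict)


type alias Stage =
    { name : String
    , count : Maybe Float
    , conversion : Maybe Float
    }


type alias Funnel =
    { stages : Dict Int Stage
    }


-- Rebuild the Dict from an ordered list, numbering stages 0 .. n-1.
fromOrderedList : List Stage -> Dict Int Stage
fromOrderedList stages =
    stages
        |> List.indexedMap (\i stage -> ( i, stage ))
        |> Dict.fromList


-- Remove the stage with this id (a no-op if it doesn't exist), then renumber.
-- Dict.values returns values in ascending key order, so order is preserved.
deleteStage : Int -> Funnel -> Funnel
deleteStage id funnel =
    { funnel | stages = fromOrderedList (Dict.values (Dict.remove id funnel.stages)) }


-- Insert a stage at a position, shifting later stages up by one.
-- Out-of-range positions are clamped to the start or end of the funnel.
insertStage : Int -> Stage -> Funnel -> Funnel
insertStage position stage funnel =
    let
        ordered =
            Dict.values funnel.stages

        at =
            clamp 0 (List.length ordered) position
    in
    { funnel | stages = fromOrderedList (List.take at ordered ++ stage :: List.drop at ordered) }
```

Rebuilding the whole Dict on every change keeps the sketch simple at the cost of touching every stage, and this renumbering is exactly the kind of logic where an off-by-one can hide.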

Thanks to my relatively well-honed complexity Spidey-Sense, I was particularly concerned about the algorithm maintaining a correct index after a sequence of stage deletions and insertions. I started by writing it out as a unit test, and was suddenly faced with the following question: *what sequence of stage ids might cause incorrect behavior?* Not so surprisingly, I had no idea!

I tried a few different sequences and then realized that this was a perfect situation for applying a fuzz test. I knew the basic situation I had to test, but didn’t know which combination of parameters might cause bad behavior. I could give a fuzz test the appropriate parameters, and let it loose to find some bad values for me. Here’s a picture to help explain:

Unit tests require choosing concrete inputs for test data, while fuzz tests allow you to simply provide descriptions of that data.

The top part of this graphic depicts how you would approach this test with a Unit Test. Typically, you’d study your code, reason about some potential input combinations, and enter them by hand to check that some invariant holds in those cases. This is not a bad approach overall, but it is blind to inputs that simply never occur to you, even when you’re trying to consider every combination.

The bottom part of the graphic depicts how you approach this with fuzz testing: instead of manually trying different colored stars that represent specific combinations of concrete values, you supply a cool rainbow star that represents all integers! This tells the test framework that you need random integer values in certain places. You then have it run the test many times, sampling a wide variety of inputs (random sampling isn’t exhaustive, but it covers far more cases than you would write by hand), and sit back to see whether anything fails.

Fuzz tests make you a better programmer by making it easier to spot the flaws in your implementations. As the person writing tests for code you’ve already written, or code you’re about to write, you’ll have a good intuition about the kinds of things that may go wrong. That’s all fuzz testing needs: instead of knowing exactly which inputs will break your code, you only have to know the kinds of things that may go wrong. That’s a big difference.

Usability rules: How to fuzz in Elm

Here’s a small snippet demonstrating how I changed a simple Unit Test I wrote in elm-test into a Fuzz Test. Note that I’ve elided the “Expect” part of each test because it’s distracting and irrelevant:

-- Imports from elm-test (Funnel and Stage are the app's own code)
import Expect
import Fuzz exposing (int)
import Test exposing (fuzz3, test)

-- This unit test runs once, with one fixed combination of inputs

test "Max index equals length - 1 after deletions" <|
  \_ ->
    let
      newFunnel =
        Funnel.initFunnel
          |> Funnel.deleteStage 2
          |> Funnel.deleteStage 3
          |> Funnel.insertStage 1 (Stage "Nobody" Nothing Nothing)
    in
    Expect.equal ...

-- The fuzz version runs 100 times by default, each time with three
-- freshly generated Ints (f1, f2, f3) in place of 2, 3, and 1

fuzz3 int int int "Max index equals length - 1 after deletions" <|
  \f1 f2 f3 ->
    let
      newFunnel =
        Funnel.initFunnel
          |> Funnel.deleteStage f1
          |> Funnel.deleteStage f2
          |> Funnel.insertStage f3 (Stage "Nobody" Nothing Nothing)
    in
    Expect.equal ...
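The Expect part is elided above. Purely as a hypothetical illustration, not the assertion used in the original test, the invariant named in the test could be checked with a small helper that compares the Dict’s keys against the range 0 to length - 1. The module name FunnelExpect and the function expectCompactIndices are made up for this example.

```elm
module FunnelExpect exposing (expectCompactIndices)

import Dict exposing (Dict)
import Expect exposing (Expectation)


-- Hypothetical helper: passes only when the keys are exactly 0 .. n-1.
-- That implies "max index equals length - 1", and also rules out negative
-- ids and gaps (e.g. keys -5, 0, 2), which checking the largest key alone
-- would not catch.
expectCompactIndices : Dict Int a -> Expectation
expectCompactIndices dict =
    Dict.keys dict
        |> Expect.equal (List.range 0 (Dict.size dict - 1))
```

With a helper like this, the elided line in either test could become expectCompactIndices newFunnel.stages.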

If the differences seem extremely minimal, it’s because they are, and this alone sums up why I think elm-test nailed the usability of adding Fuzz Testing to its API. Here’s how to use Fuzz Testing in elm-test:

Instead of writing the word “test,” write the word “fuzz” (or fuzz2 or fuzz3 when the test takes several inputs, as above).
Examine your function inputs to see what the appropriate Fuzzer might be for your specific test (this can sometimes be very tricky, but that’s a topic for a different essay).
Instead of passing concrete values, pass the randomly generated data your Fuzzer produces, which conforms to your function’s inputs.
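As one example of the second step, here is a sketch under stated assumptions. Fuzz.int ranges widely and can produce large and negative values, so with a funnel of only a few stages, many generated ids would likely not refer to any existing stage. A hypothetical stageId fuzzer built with Fuzz.intRange concentrates on the ids that matter. The deleteAndCompact function is a stand-in for the real Funnel code, defined only to keep the example self-contained; the module name and range bounds are likewise assumptions.

```elm
module CompactionTests exposing (deleteAndCompact, stageId, suite)

import Dict exposing (Dict)
import Expect
import Fuzz exposing (Fuzzer)
import Test exposing (Test, fuzz2)


-- Hypothetical fuzzer: ids from -1 to 6 cover "just below the start",
-- every slot of a small funnel, and a few ids past the end.
stageId : Fuzzer Int
stageId =
    Fuzz.intRange (-1) 6


-- Hypothetical stand-in for the code under test: drop one id, then
-- renumber 0 .. n-1 (Dict.values yields values in ascending key order).
deleteAndCompact : Int -> Dict Int a -> Dict Int a
deleteAndCompact id dict =
    Dict.remove id dict
        |> Dict.values
        |> List.indexedMap (\i value -> ( i, value ))
        |> Dict.fromList


-- Fuzz both the starting contents (a list of stage names) and the id to delete.
suite : Test
suite =
    fuzz2 (Fuzz.list Fuzz.string) stageId "ids stay 0 .. n-1 after a deletion" <|
        \names id ->
            let
                result =
                    names
                        |> List.indexedMap (\i name -> ( i, name ))
                        |> Dict.fromList
                        |> deleteAndCompact id
            in
            Dict.keys result
                |> Expect.equal (List.range 0 (Dict.size result - 1))
```

Keeping -1 and a few out-of-range ids in the fuzzer’s range is deliberate, so the test also exercises deleting ids that don’t exist.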

That’s pretty much it. Beyond the transparent interface, the tooling does the extra work for you: it runs each fuzz test 100 times by default. Crank it up to 1,000 just for kicks (the elm-test command-line runner accepts --fuzz 1000). Go ahead, see what happens.

All of this means there’s a chance that programmers who use Elm and elm-test will actually embrace augmenting their Unit Tests with Fuzz Tests. It might even become popular! That would be a very good thing, because writing Fuzz Tests makes you a better programmer.

Special thanks to Ben Linsay and James MacAulay for their thoughtful reviews.