I do most of my interesting programming at home currently. Limited time in the evenings means that development is very stop-starty: projects get periods of mad enthusiasm and then are dropped for a few months while I concentrate on something else.

In this context I've found that having a large suite of automated tests can be a double-edged sword: usually when I come back to a project after a few months I have slightly different goals and so want to change the codebase, often drastically.

If the tests are tightly coupled to the implementation this adds significant drag, and is sometimes enough to cause me to delete a ton of code and start again. Worse than that, occasionally I just stop running tests and hack.

So over the years I've come to spend an increasing amount of effort isolating tests to make them less brittle in the face of change. This is not an exact science but there's often some way to re-jig a test to make it achieve most of the same purpose without being quite so tightly coupled to the implementation. I guess for my purposes I'm advocating mostly blackbox over whitebox testing at some level. Here's a couple of examples:

System test data in 'native' formats

I like to code 'top-down' as it keeps me focused. I usually start with some data that I want presenting or transforming or storing or mining or whatever. I write some small system tests and code down from there.

I learnt this tip the hard way: It's very advantageous to have test data in a native external format that's not likely to change.

On my first data-aggregation project I had most of the tests using an internal triples format of my own design, which meant that when I changed my mind a few months later I had a ton of testdata and tests to change. I ended up deleting a lot of the code and starting again.

The second time I picked up the project I converted all the inline testdata to CSV and JSON and made the tests run an implicit import transform before invoking the top-level functions. The tests became slightly more complex but also less brittle and I'm now much less likely to delete them.
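As a rough sketch of the idea in Python (not the original project's code; `import_csv`, `total_by_region` and the sample data are all made up for illustration), the test data stays in CSV and only the import transform knows about the internal format:

```python
import csv
import io

# Inline test data kept in a stable external format (CSV),
# rather than in whatever internal format the code uses this month.
SALES_CSV = """\
region,amount
north,10
south,5
north,7
"""


def import_csv(text):
    """Hypothetical import transform: CSV text -> internal records.

    If the internal representation changes, only this function
    needs updating; the test data itself stays untouched.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(row["region"], int(row["amount"])) for row in reader]


def total_by_region(records):
    """Hypothetical top-level function under test."""
    totals = {}
    for region, amount in records:
        totals[region] = totals.get(region, 0) + amount
    return totals


def test_total_by_region():
    # The test runs the import transform first, then calls the top level.
    records = import_csv(SALES_CSV)
    assert total_by_region(records) == {"north": 17, "south": 5}
```

The extra `import_csv` step is the "slightly more complex" part; in exchange, a redesign of the internal records should leave `SALES_CSV` and the assertion alone.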

Inputs/outputs as language primitives

Inevitably as a codebase gets bigger I find that adding top-level blackbox tests isn't enough to drive development and that I need whitebox tests at a unit-level to help with algorithmically intense parts of the project. These tests increase motivation and speed up my coding but unfortunately are a lot more brittle during change and tend to be the ones that get deleted first when I come back to a project.

To combat this I often find it's worth refactoring important algorithms into functions that take language-primitive arguments (e.g. ints, lists etc.), separate from the object graph of the application.

A totally contrived illustration. Replace:

        Foo::do-something-clever-with-Bar-Objects( objects )

with:

        do-something-clever-with-id->name-pairs( id/name-pairs )

and have 'Foo' callers unpack the Bar objects into a list of id,name pairs before calling the function.

The tests checking 'do-something-clever' functionality are now less coupled to the internal object graph and are passing only the data required to fulfill the operation.
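Here is one way this might look in Python. It is only a sketch: `ids_sharing_a_name` is a stand-in I've invented for the "clever" algorithm, and the `Bar` and `Foo` classes are hypothetical too.

```python
from collections import defaultdict
from dataclasses import dataclass


def ids_sharing_a_name(id_name_pairs):
    """Stand-in for the 'clever' algorithm.

    Takes plain (id, name) tuples and returns, for each name used
    more than once, the sorted list of ids that share it.
    """
    ids_by_name = defaultdict(list)
    for id_, name in id_name_pairs:
        ids_by_name[name].append(id_)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


@dataclass
class Bar:
    """Hypothetical domain object carrying more than the algorithm needs."""
    id: int
    name: str
    owner: object = None  # object-graph baggage, irrelevant to the algorithm


class Foo:
    """Hypothetical caller that owns the Bar objects."""

    def __init__(self, bars):
        self.bars = bars

    def duplicate_names(self):
        # The caller unpacks Bar objects into primitives before calling.
        return ids_sharing_a_name([(bar.id, bar.name) for bar in self.bars])


def test_ids_sharing_a_name():
    # No Bar or Foo objects are needed to test the algorithm.
    pairs = [(1, "ann"), (2, "bob"), (3, "ann")]
    assert ids_sharing_a_name(pairs) == {"ann": [1, 3]}
```

If `Bar` later grows new fields or gets split into several classes, the test for the algorithm doesn't need to change; only the one-line unpacking in `Foo` does.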

Now this is obviously a tradeoff: The additional unpacking may add overhead (sometimes not). It might make the function interface unnecessarily complicated. Sometimes the tradeoff works well, sometimes it doesn't, but I always at least consider trying to separate out domain-objects from an algorithmically intense function. Often the algorithm is central to the application but the layout and interaction of the object graph is contrived.


It might be that I'm missing some important piece of the testing puzzle - I've mostly coded test-first for as long as I can remember but I've always had a mixed relationship with the outcome. Hopefully there's a silver bullet somewhere that I just haven't been told about yet.