What About Unit Tests?
If you’ve been reading this Fresh Look at Testing series of blog posts, you may have noticed that there is little to no mention of unit tests. So what about unit tests? Unit tests are the most common and most frequently discussed kind of test in software development at large, and they seem to have the most support among developers. However, you won’t see them discussed or recommended in this series, because they are a strange mix of too vague and too specific to be useful, and they often fall short of delivering enough benefit for the effort it takes to write them. Let’s take a look at why.
Unit tests must exercise a single “unit” of code, but what is a unit? Most commonly it is a single method, but it is also often a class or struct, or even an assembly of pieces. Some people even write “unit tests” for views: tests that create a view in code, take some actions, then verify a variety of properties on that view.
Therefore, using the name “unit” is essentially meaningless (too vague), since a unit can be almost anything. And while the vast majority of unit tests are API tests as described in an earlier post, the “unit test” designation is somehow both less specific and less useful.
On the other hand, unit tests are generally forbidden or discouraged from testing “more than one thing” (whatever “thing” may be) or “the interactions between things”, because this is the domain of “integration tests”. And so, in too many cases to count, teams discourage their developers from writing important, even crucial, tests that validate behaviors involving multiple types or methods, all because those tests wouldn’t be “unit tests” and “everyone agrees that we should primarily be writing unit tests”.
This is when the concept of “unit test” becomes too specific — when somehow a test must be reduced in practical scope in order to fulfill an arbitrary requirement that “only one thing” be tested.
Some of the fixation on unit tests comes from a rational desire to have tests that run quickly and aren’t brittle. But any API test that exercises a behavior involving multiple types should be just as fast as a “unit test”, because API tests are code-to-code, not UI-based.
Concern that a test validating a crucial behavior across multiple types will be brittle is probably misplaced. Keeping tests from being brittle is vital, and it is accomplished by testing behaviors rather than implementations.
Neglecting to test and validate a crucial requirement, and instead opting to test only the individual pieces in isolation, means failing to write the most important test in favor of several others that won’t catch the right regressions and are likely never to be useful.
A Real Example
I’ll give you a recent, real-world example I encountered of focusing too narrowly on “unit tests”. In a large app with millions of users, a lot of excess metrics were being sent from each client and chewing up database storage.
A feature was added to allow the server to send a flag that would tell the client app not to send certain performance metrics for the session (sampling). The feature was implemented by first decoding a shouldSendPerformanceEvents key from the startup response. Next, the tracking manager was given a shouldSendPerformanceEvents property which, if false, would cause it to not send any of the performance events that were reported throughout the app.
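To make the setup concrete, here is a minimal sketch of the two pieces described above. The names StartupResponse, shouldSendPerformanceEvents, and trackPerformanceEvent come from the post itself; everything else (the exact shapes, the event being a plain string, recording events in an array instead of sending them over the network) is an assumption made so the sketch is self-contained and runnable.

```swift
import Foundation

// Sketch of the startup response; only the flag is from the post,
// the surrounding shape is assumed.
struct StartupResponse: Decodable {
    let shouldSendPerformanceEvents: Bool
}

// Sketch of the tracking manager described in the post.
// Note the default of false — it turns out to matter later.
final class TrackingManager {
    var shouldSendPerformanceEvents = false
    private(set) var sentEvents: [String] = []

    func trackPerformanceEvent(_ name: String) {
        guard shouldSendPerformanceEvents else { return }
        // The real app would send the event to a backend; recording it
        // locally keeps this sketch self-contained.
        sentEvents.append(name)
    }
}
```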
Two unit tests were written:
1. A test to validate that the flag from the server response was properly decoded into the StartupResponse struct. This test passed, because the flag was being decoded correctly.
2. A test to validate that when the tracking manager’s shouldSendPerformanceEvents property was set to false, calling the trackPerformanceEvent method did not result in an event being sent.
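The two tests might have looked something like the following. This is a hypothetical reconstruction: the type and member names are from the post, the rest is assumed, and the tests are written as plain functions with assert (rather than XCTest cases, as they would be in a real project) so the sketch runs standalone.

```swift
import Foundation

// Minimal re-declaration of the types from the example so this runs standalone.
struct StartupResponse: Decodable {
    let shouldSendPerformanceEvents: Bool
}

final class TrackingManager {
    var shouldSendPerformanceEvents = false
    private(set) var sentEvents: [String] = []
    func trackPerformanceEvent(_ name: String) {
        guard shouldSendPerformanceEvents else { return }
        sentEvents.append(name)
    }
}

// Unit test 1: the flag is decoded correctly from the startup response.
func testFlagIsDecoded() {
    let json = #"{"shouldSendPerformanceEvents": false}"#.data(using: .utf8)!
    let response = try! JSONDecoder().decode(StartupResponse.self, from: json)
    assert(response.shouldSendPerformanceEvents == false)
}

// Unit test 2: when the flag is false, trackPerformanceEvent sends nothing.
func testEventsNotSentWhenFlagIsFalse() {
    let manager = TrackingManager()
    manager.shouldSendPerformanceEvents = false
    manager.trackPerformanceEvent("screenLoadTime")
    assert(manager.sentEvents.isEmpty)
}

testFlagIsDecoded()
testEventsNotSentWhenFlagIsFalse()
```

Both tests pass, exactly as they did in production — and, as described next, the real bug goes completely undetected.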
There is nothing wrong with these unit tests. However, a couple of months after the feature was deployed (with all unit tests passing), it was discovered that NO performance events were being sent at all from any clients. The reason? Even though the feature flag was being decoded properly, and the tracking manager was doing what its shouldSendPerformanceEvents property indicated, no code had ever been written in the app to set the value decoded from the startup response on the tracking manager’s shouldSendPerformanceEvents property. And that property defaulted to false.
So, even though the flag was being sent from the backend as true for 50% of clients, there was no code that set shouldSendPerformanceEvents to that true value on the tracking manager. And since the property’s default was false, no performance events were being sent at any time from any client.
The most important requirement was that the tracking manager’s shouldSendPerformanceEvents property be set to whatever the server specified. But since testing that would involve an interaction between multiple types (the startup code and the tracking manager), and thus would not be a “unit test”, that most important requirement was never tested.
If You Don’t Believe Me, Believe Meme
There may be no better way to make the point about the pitfalls of depending on overly small, isolated unit tests than the many memes you can find under the banner of “2 unit tests, 0 integration tests”. Here are a few good examples 😄
So to say it plainly: there is no value in artificially limiting tests to strictly be “unit tests”, and there is no advantage to writing such narrow tests instead of testing the actual crucial requirements.
It’s worth noting that the reverse is not true: in the real-world example above, there would have been value in writing just the integration test instead of the two unit tests. An integration test that mocked the startup response as input, then validated whether the tracking manager sent events as a result, would have caught this bug. That test would also have caught any bug in decoding the flag, or in the tracking manager failing to act in accordance with its shouldSendPerformanceEvents property. The key point is that the actual, full requirement must be what is tested, not somewhat arbitrary subdivisions of that requirement tested in isolation.
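Such an integration test might look like the sketch below. The handleStartup function is a hypothetical stand-in for the app’s real startup code (the post doesn’t name it), and the types are minimal assumed shapes around the names the post does use. The key is that the test drives the whole path — mocked response in, tracking manager behavior out — so omitting the one assignment inside handleStartup is exactly the production bug, and this test would fail on it.

```swift
import Foundation

// Minimal re-declaration of the example's types so this runs standalone.
struct StartupResponse: Decodable {
    let shouldSendPerformanceEvents: Bool
}

final class TrackingManager {
    var shouldSendPerformanceEvents = false
    private(set) var sentEvents: [String] = []
    func trackPerformanceEvent(_ name: String) {
        guard shouldSendPerformanceEvents else { return }
        sentEvents.append(name)
    }
}

// Hypothetical startup code: decode the response and wire the flag into the
// tracking manager. The assignment below is the line the real app was missing.
func handleStartup(responseData: Data, trackingManager: TrackingManager) throws {
    let response = try JSONDecoder().decode(StartupResponse.self, from: responseData)
    trackingManager.shouldSendPerformanceEvents = response.shouldSendPerformanceEvents
}

// Integration test: mock the startup response, run the real startup path,
// then verify what the tracking manager actually does as a result.
func testPerformanceEventsSentWhenServerEnablesThem() throws {
    let json = #"{"shouldSendPerformanceEvents": true}"#.data(using: .utf8)!
    let manager = TrackingManager()
    try handleStartup(responseData: json, trackingManager: manager)
    manager.trackPerformanceEvent("screenLoadTime")
    assert(!manager.sentEvents.isEmpty,
           "server enabled performance events, but none were sent")
}

try testPerformanceEventsSentWhenServerEnablesThem()
```

This single test fails if the wiring is missing, if the flag decodes incorrectly, or if the tracking manager ignores its own property — covering both original unit tests and the gap between them.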
And this brings up the last problem with unit tests as they are typically implemented: rather than testing an explicit, real requirement for the code, unit tests depend on developers independently coming up with sets of tiny possible subrequirements for a single method or class and creating narrow tests for them. Not only does this put the burden of conceiving of all possible requirements onto individual developers, it encourages a “missing the forest for the trees” mentality, where a dozen tests that don’t relate to a larger real requirement are treated as better than a couple of tests that do.
This can (and does) lead to an explosion of hundreds or thousands of unit tests which tend to be very narrowly useful (or not useful at all), incur a large cost to write and maintain, and yet somehow never manage to detect when the real requirements for the app are being missed. There are some compelling arguments that simply testing higher level requirements will also exercise all the behaviors of the units that matter, and do a better job verifying behaviors while requiring far fewer tests and effort.
For those interested in diving into some of those arguments, these articles are a really interesting place to start:
Why Most Unit Testing is Waste
Wrap-up
Unit tests are generally popular because they are fast and have relatively few dependencies that can cause them to fail (that is, they aren’t brittle). These are good attributes, but validating an actual requirement is the most important concern of any test, and brittleness should only be reduced to the greatest degree that still allows the full requirement to be tested.
So the benefits of speed and minimal brittleness are important, and they are absolutely provided by the concept of API tests. There is no reason to fixate on unit tests specifically (or vaguely) as an artificial constraint on how many pieces a test is allowed to involve. The rather arbitrary definition of what a unit test should be, along with the frankly harmful results of fixating on them, is why this series will not be referencing them or encouraging that mental model at all.
Next up: See how to stop worrying about unit tests by focusing on a useful and practical approach to testing