Towards Better Unit Testing
Last year, I wrote a post about the problems with code coverage as a metric, and left the topic as “to be continued”, saying that I had some ideas for a better approach. Well, it's taken long enough, but here are the first steps toward that objective: evolving a better way to create and measure well-tested software.
What Makes Good Unit Tests
They should exercise a method or function with an explicit output, not side effects. In other words, unit tests are not the place to examine if or how a function changes global state or other parts of a running program (that’s a job for integration testing, another topic). Unit tests must be reducible ultimately to a single logical statement: “the result of calling function x with the input a should be the output b”.
The input may be a simple value, a tuple, an instance of a complex type, or even some structure that captures and holds a number of different states. But the function under test must only access those values by receiving them as input parameters, not through singletons or instance variables on the function’s enclosing type (in the case of methods).
Similarly, the output can be a single value, a tuple of multiple values, or an instance of a complex type. But all results of the function’s work should be contained in this output, not set through singletons, ivars, or other retained references to outside code.
This concept is important, because some of the most common difficulties developers have in writing and maintaining unit tests are managing layers of mocked objects or singletons, establishing the desired (or required) initial environment state, and checking side effects. A change to any one of those moving pieces will break the test, or pollute its results. And a unit test that breaks when something other than the function it tests changes is not a unit test.
This is not to say that those types of tests can’t or shouldn’t be written, but they are a different kind of test, and are far more application-coupled than good unit tests, which should be able to move and be reused seamlessly along with the code they validate. And by adhering to the principles of good unit tests (including this first criterion), you will find that the complexity and fragility of many of the unit tests you’ve probably had difficulty with in the past will fall away.
And if you are thinking that this first requirement of a good unit test has just as much to do with how you write your code as it does with how you write your tests, you are absolutely right :)
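To make this first criterion concrete, here is a minimal sketch contrasting a function that is awkward to unit test with one that is easy. The names `Settings`, `discountedPriceUsingGlobalState(for:)` and `discountedPrice(for:discountPercent:)` are hypothetical and exist only for illustration.

```swift
// Hypothetical shared state, shown only to illustrate the problem.
final class Settings {
    static let shared = Settings()
    var discountPercent = 10
}

// Awkward to unit test: the result depends on a singleton that every test
// would have to set up beforehand and reset afterwards.
func discountedPriceUsingGlobalState(for price: Int) -> Int {
    return price - (price * Settings.shared.discountPercent / 100)
}

// Easy to unit test: everything the function needs arrives as a parameter,
// and everything it produces is in the return value.
func discountedPrice(for price: Int, discountPercent: Int) -> Int {
    return price - (price * discountPercent / 100)
}
```

The second version reduces to the single logical statement described above: calling it with the input (200, 10) should give the output 180, no matter what the rest of the program is doing.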
They should verify not just the most optimistic input and output, but as many possible categories of input as is feasible, and the resulting output. By category, I mean for example, when dealing with a string input: short strings, long strings, alphanumeric strings, strings with punctuation and / or other ASCII symbols, strings with Unicode values in all possible ranges, empty strings, etc.
For unit tests that verify the creation of models from json data, this would mean testing all the broad variations of possible json input: empty or missing keys, wrong values for keys, and combinations of various valid values. Note that many json parsing libraries already have their own unit tests for verifying that missing keys / wrong values are handled in the expected way, so your own unit tests would only be verifying that your own code responds correctly to the output or errors generated by those libraries.
They should most definitely verify the edges of any valid input ranges. For example: if a function expects an input that is an Int between and including 1 and 100, the values 1 and 100 should most definitely be tested as producing a valid result, and the values 0 and 101 (just outside the range) should be tested and validated as creating the expected error or failure.
Although not really relevant to unit testing, a similar principle exists for UI or integration tests. For example, when UI testing something like a label that accepts a string of up to 8 characters, a good test should examine the widest and narrowest possible string values for the situation, e.g. “WWWWWWWW” vs. “llllllll” as the edges of the valid input range.
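The 1 to 100 range check described above could be sketched like this. The function `percentLabel(for:)`, the error type `RangeError` and the helper `throwsError(_:)` are hypothetical names I made up for the example.

```swift
// Hypothetical error type and function, used only to illustrate boundary testing.
enum RangeError: Error, Equatable {
    case outOfRange(Int)
}

// Accepts values from 1 through 100 inclusive and throws for anything else.
func percentLabel(for value: Int) throws -> String {
    guard (1...100).contains(value) else {
        throw RangeError.outOfRange(value)
    }
    return "\(value)%"
}

// Returns true when the call throws, so the checks below read as plain assertions.
func throwsError<T>(_ body: () throws -> T) -> Bool {
    do {
        _ = try body()
        return false
    } catch {
        return true
    }
}

// The two edges of the valid range must succeed...
assert(!throwsError { try percentLabel(for: 1) })
assert(!throwsError { try percentLabel(for: 100) })
// ...and the first value outside each edge must fail.
assert(throwsError { try percentLabel(for: 0) })
assert(throwsError { try percentLabel(for: 101) })
```

Inside an XCTest method the same four checks would normally be written with `XCTAssertNoThrow` and `XCTAssertThrowsError`; plain `assert` is used here only so the sketch runs on its own.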
They should also test invalid cases, meaning: inputs that you don’t expect to ever receive, but which are nonetheless possible. For example, this function:
func characterCount(for value: Any) -> Int {
    // The force cast crashes at runtime if value is not a String.
    return (value as! String).count
}
In order to accept values of any type, the function declares its parameter as Any. But the programmer who wrote it clearly never expects a value of any type other than String to be passed in (because the force cast will crash the program in that case).
But, because a good unit test should test any possible case, even invalid and unexpected ones, a unit test for this function should include passing in a non-String value and validating the desired behavior (even if it is an error) in that case.
Note that there is no need to test cases the compiler won’t allow in the first place. For example, you shouldn’t test trying to pass a nil value into a function that takes a non-optional parameter, or test trying to pass an Int value to a String parameter. These would not be considered possible cases since they should already fail to compile.
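Going back to the Any example: a force-cast crash is hard to assert on, so one option is to make the desired behavior for unexpected input explicit. The sketch below is a hypothetical variant of the function (`safeCharacterCount(for:)` and `CharacterCountError` are my own names, not part of the original code) that throws instead of crashing, which gives the unit test something concrete to validate.

```swift
// Hypothetical error for the illustration; not part of the original function.
enum CharacterCountError: Error, Equatable {
    case notAString
}

// A variant of characterCount(for:) that reports unexpected input instead of crashing.
func safeCharacterCount(for value: Any) throws -> Int {
    guard let string = value as? String else {
        throw CharacterCountError.notAString
    }
    return string.count
}

// Expected case: a String is counted.
assert((try? safeCharacterCount(for: "hello")) == 5)
// Invalid but possible case: a non-String value produces an error, not a crash.
assert((try? safeCharacterCount(for: 42)) == nil)
```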
They should test any special cases specified in business requirements and any bug cases that have been discovered during or after initial development. For example, if for the characterCount() function above, certain strings with Unicode characters are found to return the wrong count (because of the way Unicode allows for combining characters to form single glyphs), not only should the function implementation be corrected to account for this, but a unit test case should be added for the category of input (strings with combining characters) that triggered the initial bug. This test will catch any future changes to the function implementation that might trigger a regression and break what you just fixed.
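As a small sketch of what such a regression case might look like, the snippet below uses a letter followed by a combining accent. The function `glyphCount(of:)` is a hypothetical stand-in for the corrected implementation; it relies on Swift's `String.count`, which counts user-perceived characters (grapheme clusters) rather than Unicode scalars.

```swift
// Hypothetical stand-in for the corrected function: counts user-perceived characters.
func glyphCount(of value: String) -> Int {
    return value.count
}

// "é" written as a base letter plus a combining acute accent (two Unicode scalars).
let decomposed = "e\u{301}"
// "é" written as a single precomposed scalar.
let precomposed = "\u{E9}"

// The category of input that triggered the bug, kept as permanent test data.
let regressionCases: [(input: String, expected: Int)] = [
    (decomposed, 1),
    (precomposed, 1),
    ("cafe\u{301}", 4),
]

// Any entry left in this array is a regression.
let regressions = regressionCases.filter { glyphCount(of: $0.input) != $0.expected }
assert(regressions.isEmpty)
```

An implementation that counted `unicodeScalars` or `utf16` units instead would report 2 for the decomposed form, and these cases would catch it.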
What Are Good Unit Tests Good For Anyway?
So, assuming you write unit tests that meet all the above criteria for being “good unit tests”, what value do you get out of that? This is an important point, because I’ve seen the value of unit tests be both dramatically overstated and understated. And considering how spending time to write unit tests is often controversial with clients and/or management, the more specific and factual the arguments for writing them are, the better!
Safe refactoring. It is often assumed that unit tests exist primarily to ensure that code meets the requirements of some specification, or the acceptance criteria for a feature or user story. But this is very rarely the case for unit tests and is again the domain of integration or functional tests (which pretty much exist entirely to ensure that code performs to a specification). Mostly, this is because requirements and specifications almost never define expectations down to the level of a single function, which is all unit tests should be concerned with. If you do have acceptance criteria or a specification which is so detailed that it tells you what all the different success and failure and edge cases are for individual functions… well, wow — that’s awesome and unit tests will help you prove those out.
But for the rest of us, unit tests are written for developers — ourselves and others — in the future who will be modifying and changing code. Good unit tests are intentionally ignorant of how a function is coded, and care only about “does the function produce the correct output for an input?”. This intentional ignorance allows for something amazing: you can completely rewrite the guts of your app, implement new algorithms and libraries, switch architectures or change almost anything about your code, and your unit tests will never need to themselves be rewritten. Instead, they will simply tell you if any of those changes to your code broke the basic contract and expectations of a given function or method. And when the tests don't fail, they can validate that your changes did not break anything.
This may or may not be a valuable payoff for all developers and teams, but if you plan on maintaining and improving an application for any period of time, the freedom to refactor and change things confidently, quickly and safely is one of the greatest gifts imaginable.
The most complete, and guaranteed up-to-date documentation for your code. The problem with documentation (and this even includes comments and Swift markup in your source code) is that it gets stale. Code changes, but the documentation doesn’t, and before too long your code is on version 3.0 and your documentation is stuck on version 1.0 and is now misleading or useless.
Plus, documentation is often very high level like “this function counts and returns the number of characters in a string”, but does not address specific questions like “how does it handle emojis?” or “does it count punctuation as characters?”. Often, to answer those questions we need to dive into the implementation and start reading through the source code directly. But good unit tests can quickly be read and scanned as simple statements of “calling function x with input a should result in the output b”, remember? And this becomes a fast and easy way to understand exactly how a function is expected to behave without having to read the source code.
And, crucially, because your tests will fail if those expectations are no longer being met, or if the code now reflects different expectations than the unit tests call for, it is practically guaranteed that unit tests as documentation will always be accurate and up-to-date.
This usage of unit tests as documentation is so useful that some languages even allow unit tests to be written right next to the source code, in the same file. This ensures that the tests for code always travel with the code if a file is reused in another project. But it also puts the documentation of expected behavior right where the code is defined, where it is most useful. This may sound strange, but not only does it make sense in light of the benefits covered here, but it is something that may be in Swift’s future as a solution to allowing unit testing of private methods and functions according to Chris Lattner.
And that is it, for the most part. A fairly complete discussion of what good unit tests consist of, and what they are good for. And with that established, we can tackle the main point of this post: how can we make a better way to write, maintain and measure the good unit tests our apps should have?
A Separation of Concerns
Doing things “better” in programming usually involves a few familiar themes, including: making code more reusable instead of writing similar code over and over, breaking code that does too many things into smaller independent pieces that do one thing each, and separating out pieces that need to change frequently from pieces that don’t. How can we apply these themes to writing unit tests in Swift?
Inputs and outputs should be data, not code. As mentioned in the earlier description of good unit tests, there should almost always be a large number and variety of inputs and expected outputs tested on the same function. If you think about it, a test that passes one input into a function and checks the result is logically identical to another test that passes in a different input and checks for a different result. The only difference is in the data (input and expected output), not the code needed to set up the test, call the function being examined, and perform verification.
Yet it is very very common to see iOS unit tests written with one test method per input being tested. So testing 100 different input values and their expected output would mean 100 different test methods. That’s a lot of extra boilerplate, extra maintenance (imagine if the function being tested had a name or signature change… that would need to be updated in 100 places!), and redundant code.
So the first way good unit tests can be made better is to separate the part that doesn’t change (test setup, execution and verification code) from the part that does change (inputs and expected outputs). The mapping of inputs to expected outputs can be set up in a number of ways: a dictionary, a plist, a json file, etc. But the important thing would be to make defining many different inputs and outputs simple and free of boilerplate or redundancy.
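Here is a minimal sketch of that separation using nothing but an array of tuples. The function `hasValidLength(_:)` and its 8 to 16 character rule are assumptions made up for the example.

```swift
// Hypothetical function under test: valid when 8 to 16 characters long.
func hasValidLength(_ password: String) -> Bool {
    return (8...16).contains(password.count)
}

// The part that changes: inputs mapped to expected outputs, expressed as data.
let lengthCases: [(input: String, expected: Bool)] = [
    ("Snd6HHus", true),             // exactly 8 characters
    ("sDG$34DdfsfFs8aa", true),     // exactly 16 characters
    ("Snd6HHu", false),             // 7 characters
    ("sDG$34DdfsfFs8aa1", false),   // 17 characters
    ("", false),                    // empty
]

// The part that does not change: every case runs through the same code.
let failedInputs = lengthCases
    .filter { hasValidLength($0.input) != $0.expected }
    .map { $0.input }
assert(failedInputs.isEmpty)
```

Adding a sixth or a hundredth case is now a one-line change to the data, and a rename of the function under test touches exactly one place.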
Similarly, separate out the code that validates a value returned from a function being tested against an expected value. For example, if a function returns a Person instance, which has the properties firstName, lastName, and age, you will usually see code like this inside multiple XCTest methods:
let expectedOutput = // set up what the expected `Person` output should look like somehow
let testOutput = getPerson(withID: 636) // exercise the function being tested with a known input
XCTAssertEqual(testOutput.firstName, expectedOutput.firstName)
XCTAssertEqual(testOutput.lastName, expectedOutput.lastName)
XCTAssertEqual(testOutput.age, expectedOutput.age)
But ultimately, there should be a single method that can compare a person instance against an expected value and validate all properties. And that method should be called every time validation is needed, rather than verifying a series of properties over and over in different test methods.
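One way such a single validation method might look is sketched below. The property types on `Person` and the helper name `mismatches(between:and:)` are assumptions for the example; the real model may differ.

```swift
// Assumed shape of the model; only the property names come from the text above.
struct Person {
    let firstName: String
    let lastName: String
    let age: Int
}

// Compares every property once, in one place, and describes each difference.
// An empty result means the two values match.
func mismatches(between actual: Person, and expected: Person) -> [String] {
    var problems: [String] = []
    if actual.firstName != expected.firstName {
        problems.append("firstName: expected \(expected.firstName), got \(actual.firstName)")
    }
    if actual.lastName != expected.lastName {
        problems.append("lastName: expected \(expected.lastName), got \(actual.lastName)")
    }
    if actual.age != expected.age {
        problems.append("age: expected \(expected.age), got \(actual.age)")
    }
    return problems
}
```

Each test method then makes one call and asserts that the result is empty. When `Person` gains a property, only this helper changes. Making `Person` conform to `Equatable` and using a single `XCTAssertEqual` is a shorter alternative when a per-property message is not needed.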
Define and stick to a template for good unit tests that simplifies (and limits) writing a unit test to its most minimal form. This means simply pointing to a mapping of inputs and expected outputs, and a small piece of code that applies the inputs to a function or method under test, and then passes the resulting output to the shared validation method described above. This not only creates consistency and a great starting point for developers who aren’t used to writing good unit tests, but it also limits the ability to write brittle or complex unit tests (or integration tests disguised as unit tests). The benefit of this shouldn't be underestimated. If you've seen how convoluted and messy unit tests look on many iOS projects, you'll understand how consistency and simplicity in unit tests can be a game changer!
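To show how small such a template can be, here is a rough generic sketch. `SpecCase` and `runSpec(_:against:)` are hypothetical names for this illustration and are not the API of any particular library.

```swift
// One named group of inputs that should all produce the same output.
struct SpecCase<Input, Output: Equatable> {
    let name: String
    let inputs: [Input]
    let expected: Output
}

// Applies `function` to every input and returns one message per mismatch.
func runSpec<Input, Output: Equatable>(
    _ cases: [SpecCase<Input, Output>],
    against function: (Input) -> Output
) -> [String] {
    var failures: [String] = []
    for specCase in cases {
        for input in specCase.inputs where function(input) != specCase.expected {
            failures.append("\(specCase.name): unexpected output for input \(input)")
        }
    }
    return failures
}

// A test then shrinks to two steps: pick the cases, name the function.
let evenCases = [
    SpecCase(name: "Even numbers", inputs: [0, 2, 10], expected: true),
    SpecCase(name: "Odd numbers", inputs: [1, 3, 11], expected: false),
]
let evenFailures = runSpec(evenCases) { $0 % 2 == 0 }
assert(evenFailures.isEmpty)
```

Because the runner only accepts a closure from input to output, there is simply no room in the template for mocks, environment setup or side-effect checks.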
Introducing TestKit
As those who know me might have guessed, I have followed my own advice and created a small open source library that does the three steps above for you. It’s called TestKit, and if you’ve read this far, you’ll immediately understand the principles it is built on.
With TestKit, there is a very simple template for all unit tests. First, load the spec (inputs and expected outputs) from a json file and provide an optional handler for failed cases:
let spec = TestKitSpec.init(file: "ValidPasswordTests") { XCTFail($0.message) }
Then call the run() method on the spec instance and pass in a closure which receives the test input, passes it to the function being tested, and returns the result from that function, like this:
spec.run() {
    (input: String) -> Bool in
    return isValidPassword(string: input)
}
That’s it! The template makes all your unit tests look like this:
func testIsValidPassword() {
    let spec = TestKitSpec.init(file: "ValidPasswordTests") { XCTFail($0.message) }
    spec.run() {
        (input: String) -> Bool in
        return isValidPassword(string: input)
    }
}
No complex setup or boilerplate code, just a simple “dumb” test that exercises a single method or function.
The accompanying spec / json file looks something like this:
{
    "test-description": "Test cases to verify password validation logic",
    "test-cases": [
        {
            "name" : "Valid Passwords",
            "inputs" : ["1sdfD8sFlk", "Happy763!"],
            "expected-output" : true
        },
        {
            "name" : "Valid: Min and Max Length",
            "inputs" : ["Snd6HHus", "sDG$34DdfsfFs8aa"],
            "expected-output" : true
        },
        {
            "name" : "Invalid: One Below and One Above Allowed Length",
            "inputs" : ["Snd6HHu", "sDG$34DdfsfFs8aa1"],
            "expected-output" : false
        },
        {
            "name" : "Invalid: Too Short",
            "inputs" : ["", "Ask87d"],
            "expected-output" : false
        },
        {
            "name" : "Invalid: Too Long",
            "inputs" : "Asdhalkjd234FSfdjflksj@fsffShkjdkjs5sdfkjh34",
            "expected-output" : false
        },
        {
            "name" : "Invalid: Non-ASCII",
            "inputs" : ["WeirdThing77™", "Asd54Fsd!😀"],
            "expected-output" : false
        },
        {
            "name" : "Invalid: No Number",
            "inputs" : "SfsdfEeEEff!",
            "expected-output" : false
        },
        {
            "name" : "Invalid: No Lowercase",
            "inputs" : "ONLYC8PITALS!",
            "expected-output" : false
        },
        {
            "name" : "Invalid: No Uppercase",
            "inputs" : "lowercase4thewin",
            "expected-output" : false
        }
    ]
}
And this combination of simple unit test with externally defined inputs/outputs results in fully testing 14 sets of input and output values, covering a variety of valid and invalid cases.
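If you are curious how a spec file of this shape could be read, here is a rough sketch using Foundation's `JSONDecoder`. This is not TestKit's actual implementation; `PasswordSpec` and `PasswordCase` are hypothetical types written only for this example. The one interesting detail is that "inputs" appears both as an array and as a single string, so the decoder accepts either.

```swift
import Foundation

// Hypothetical model of one entry in "test-cases".
struct PasswordCase: Decodable {
    let name: String
    let inputs: [String]
    let expectedOutput: Bool

    enum CodingKeys: String, CodingKey {
        case name
        case inputs
        case expectedOutput = "expected-output"
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
        expectedOutput = try container.decode(Bool.self, forKey: .expectedOutput)
        // Accept either an array of strings or a single string for "inputs".
        if let many = try? container.decode([String].self, forKey: .inputs) {
            inputs = many
        } else {
            inputs = [try container.decode(String.self, forKey: .inputs)]
        }
    }
}

// Hypothetical model of the whole file.
struct PasswordSpec: Decodable {
    let summary: String
    let cases: [PasswordCase]

    enum CodingKeys: String, CodingKey {
        case summary = "test-description"
        case cases = "test-cases"
    }
}

// A shortened copy of the spec, inlined so the sketch runs without a file.
let sampleJSON = """
{
  "test-description": "Test cases to verify password validation logic",
  "test-cases": [
    { "name": "Valid Passwords", "inputs": ["1sdfD8sFlk", "Happy763!"], "expected-output": true },
    { "name": "Invalid: No Number", "inputs": "SfsdfEeEEff!", "expected-output": false }
  ]
}
"""

let decodedSpec = try! JSONDecoder().decode(PasswordSpec.self, from: Data(sampleJSON.utf8))
// Total number of individual inputs across all cases in the shortened spec.
let totalInputs = decodedSpec.cases.reduce(0) { $0 + $1.inputs.count }
assert(totalInputs == 3)
```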
The console output looks like this:
Test Case '-[TestKitExampleTests.TestKitExampleTests testIsValidPassword]' started.
TESTKIT: Running tests from file: "ValidPasswordTests.testkit"
TESTKIT: Starting test case named:"Valid Passwords"
input 1/2 verified
input 2/2 verified
TESTKIT: The test case named:"Valid Passwords" has PASSED
TESTKIT: Starting test case named:"Valid: Min and Max Length"
input 1/2 verified
input 2/2 verified
TESTKIT: The test case named:"Valid: Min and Max Length" has PASSED
TESTKIT: Starting test case named:"Invalid: One Below and One Above Allowed Length"
input 1/2 verified
input 2/2 verified
TESTKIT: The test case named:"Invalid: One Below and One Above Allowed Length" has PASSED
TESTKIT: Starting test case named:"Invalid: Too Short"
input 1/2 verified
input 2/2 verified
TESTKIT: The test case named:"Invalid: Too Short" has PASSED
TESTKIT: Starting test case named:"Invalid: Too Long"
input 1/1 verified
TESTKIT: The test case named:"Invalid: Too Long" has PASSED
TESTKIT: Starting test case named:"Invalid: Non-ASCII"
input 1/2 verified
input 2/2 verified
TESTKIT: The test case named:"Invalid: Non-ASCII" has PASSED
TESTKIT: Starting test case named:"Invalid: No Number"
input 1/1 verified
TESTKIT: The test case named:"Invalid: No Number" has PASSED
TESTKIT: Starting test case named:"Invalid: No Lowercase"
input 1/1 verified
TESTKIT: The test case named:"Invalid: No Lowercase" has PASSED
TESTKIT: Starting test case named:"Invalid: No Uppercase"
input 1/1 verified
TESTKIT: The test case named:"Invalid: No Uppercase" has PASSED
TESTKIT: 9/9 test cases PASSED for the file: "ValidPasswordTests.testkit"
Test Case '-[TestKitExampleTests.TestKitExampleTests testIsValidPassword]' passed (0.100 seconds).
Additional Niceties
In addition to testing expected input and output, TestKit can also expect and verify that specific errors are thrown as a result of certain input values. This is an important part of testing failure cases, particularly if you allow your app to recover from certain errors and thus want to make sure the right error is thrown under the right conditions.
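The general idea of expecting a specific error, independent of how TestKit expresses it in a spec file, can be sketched in plain Swift. `PasswordError`, `checkLength(of:)` and `thrownError(for:)` are hypothetical names for this illustration.

```swift
// Hypothetical error type and function used only for this illustration.
enum PasswordError: Error, Equatable {
    case tooShort
    case tooLong
}

// Throws a specific error for each way the length rule can be broken.
func checkLength(of password: String) throws {
    if password.count < 8 { throw PasswordError.tooShort }
    if password.count > 16 { throw PasswordError.tooLong }
}

// Returns the PasswordError thrown for the input, or nil if the call
// succeeds or throws some other kind of error.
func thrownError(for password: String) -> PasswordError? {
    do {
        try checkLength(of: password)
        return nil
    } catch {
        return error as? PasswordError
    }
}

// Inputs mapped to the error each one should produce (nil means no error).
let errorCases: [(input: String, expected: PasswordError?)] = [
    ("Ask87d", .tooShort),
    ("Snd6HHus", nil),
    ("sDG$34DdfsfFs8aa1", .tooLong),
]
let errorFailures = errorCases.filter { thrownError(for: $0.input) != $0.expected }
assert(errorFailures.isEmpty)
```

Checking which error was thrown, and not merely that something was thrown, is what lets recovery code rely on the distinction.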
Lastly, separating the bulk of unit test code into data contained in a json file allows that unit test data to be shared across platforms. So just as we discussed the primary benefit of good unit tests being safe refactoring, TestKit helps you realize one more benefit: safe porting of code to other platforms.
Because all the expected valid, invalid, edge and failure cases are contained in a platform independent json file, any platform (iOS, Android, browser, server) can load and validate the spec. TestKit is a small library consisting of about 500 lines of Swift source code, and once ported to other languages or platforms, the unit tests for iOS code can be moved over to Android, Web, etc. with little to no effort. So when a team ports and / or reimplements code from another platform, TestKit unit tests will validate that the new implementation is correct and fulfills the same expectations as the original code!
What About Measurement?
In my original article on code coverage, and at the beginning of this article, I mentioned the goal of easily measuring the quality of unit test coverage. TestKit currently logs information about the number of cases and inputs tested, and that is a first step. The next step will be to add a general report on the number of functions tested and the number of cases and inputs validated for each. This will give a much better picture of how well your code is tested, vs. the simple true / false of the code coverage metric.
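Purely as a thought experiment, the kind of per-function report described above might be assembled along these lines. `CoverageEntry` and `report(for:)` are hypothetical and do not exist in TestKit today; the numbers in the usage line are the 9 cases and 14 inputs from the password example.

```swift
// Hypothetical record of what was exercised for one function under test.
struct CoverageEntry {
    let functionName: String
    let caseCount: Int
    let inputCount: Int
}

// Builds one summary line per function, sorted by name for stable output.
func report(for entries: [CoverageEntry]) -> [String] {
    return entries
        .sorted { $0.functionName < $1.functionName }
        .map { "\($0.functionName): \($0.caseCount) cases, \($0.inputCount) inputs" }
}

let reportLines = report(for: [
    CoverageEntry(functionName: "isValidPassword", caseCount: 9, inputCount: 14),
])
assert(reportLines == ["isValidPassword: 9 cases, 14 inputs"])
```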
Beyond that, perhaps there can be more formalization of reporting around what kinds of test cases are present (failure, valid, invalid, edge) and perhaps even some built-in reusable input sets for strings, numbers, etc. that allow you to easily expect a function to, for example, succeed with “All variations of Unicode strings of 16 characters or less”.
As for the final step of accomplishing a metric with a breadth similar to "code coverage", but with far more comprehensive requirements for the robustness of those tests, that will depend on the future dynamism of Swift. Specifically, it needs the ability to get a listing of all module methods and functions at runtime, which can be compared with the methods and functions that have TestKit cases to produce a broad, but also thorough, metric of how well-tested your app is.
I’m really looking forward to that, but in the meantime, I hope you give TestKit a try and find it as useful and exciting as I have. As always, I appreciate your feedback here in the comments, or on GitHub.