What Is Testing For Anyway?
The first part of a fresh look at testing
For something that is thought to be precise and unambiguous, the whole subject of software testing tends to be comically unclear. I’ve had a deep interest in automated testing for iOS for years now, and find that if you ask three people what the point of testing is, you’ll probably get a dozen different answers.
To kick off my “Fresh Look at Testing” series, I thought it would be helpful to narrow down what the purpose of automated testing is (and what it isn’t). I believe that testing is much simpler, and has a much more limited purpose, than people tend to think.
First, What It Isn’t
Here are some things that automated testing is often credited with that it shouldn’t be:
Ensuring app quality
This is probably the benefit most commonly attributed to automated testing, but in fact quality is something that is focused on and imbued in an app from inception to execution; it isn’t something that can be added or even “verified” in testing. Put another way: a terrible app can have hundreds of passing unit and even UI tests, while a consistently great app can have none. Tests simply cannot increase or even measure “quality”, and it’s not useful to expect or say that they do.
Guaranteeing the absence of bugs
The second most common misconception about testing is that it can guarantee that an application is free of most or all bugs. This is absolutely untrue. Considering the number of possible execution pathways and combinations of state in a typical codebase (trillions would be a gross underestimation), there has never been and likely never will be a test suite that exercises every possibility for correctness (assuming that what “correctness” means for every possibility could even be described before the end of time). This means that virtually every app has thousands or more “bugs” that will never be noticed by an automated test, and even the best test suites come nowhere near guaranteeing the absence of bugs.
A much more successful approach to guaranteeing an absence of bugs is the evolving field of formal verification, which approaches code as a form of mathematical proof. This isn’t testing, however; it is designing mathematically provable code, and the guarantee is only as good and complete as the underlying proof. Automated testing that consists of loading up a program and simulating hundreds or even thousands of different scenarios at runtime is neither proof, nor math, nor formal verification.
Demonstrating code correctness / correct implementation
The last frequent but incorrect belief about automated testing is that it can be used to demonstrate that programmers have written correct code. There are many problems with this assumption, of which I’ll touch on only a couple.
First, tests do not and often cannot address many of the most common types of incorrect code: code that causes memory issues (leaks, corruption, dangling pointers, etc.), code with dangerous side effects (saving that document does persist it, but it also wipes out other saved data outside the scope of the test), code that is slow or scales poorly, or code that functions adequately under narrow, optimistic conditions but fails in a hundred other scenarios.
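To make that first point concrete, here’s a minimal Swift sketch (the ImageCache class is hypothetical, invented for illustration): the unit test passes reliably, yet every instance of the class leaks through a retain cycle that no assertion will ever notice.

```swift
import XCTest

// Hypothetical cache invented for illustration. Storing an entry
// creates a retain cycle (self -> entries -> closure -> self),
// so no instance of this class is ever deallocated.
final class ImageCache {
    private var entries: [String: () -> Void] = [:]

    func store(key: String) {
        // The closure strongly captures `self`, completing the cycle.
        entries[key] = { self.evict(key) }
    }

    func evict(_ key: String) {
        entries[key] = nil
    }

    var count: Int { entries.count }
}

final class ImageCacheTests: XCTestCase {
    func testStoreIncrementsCount() {
        let cache = ImageCache()
        cache.store(key: "avatar")
        // Passes every time, even though the cache leaks.
        XCTAssertEqual(cache.count, 1)
    }
}
```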
Secondly, any correctness implied by a passing test rests on faith that the test itself is correct. But tests often don’t do or validate what a programmer thinks they do. And very frequently (thanks to silly practices like code coverage mandates and unit test dogma, which I’ll talk about in more depth later), programmers are writing tests only for the sake of saying they wrote tests, or specifically to show that the code they just wrote is correct (usually after the fact). If I write code that says:
2 + 2 = 5
And then I write a test for that code that expects something like:
Given the addends of 2 and 2, the result should be 5
Well, the test “proves” that my code is correct, but in fact both my code and the test are wrong. This is far more common than you might think, and it’s a natural risk involved when programmers write tests for their own code after the fact.
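Translated into a minimal Swift sketch (the add function is a stand-in invented for illustration), the test happily confirms the bug:

```swift
import XCTest

// A stand-in function invented for illustration.
func add(_ a: Int, _ b: Int) -> Int {
    return a + b + 1  // buggy implementation: off by one
}

final class AdditionTests: XCTestCase {
    func testAddingTwoAndTwo() {
        // The test encodes the same wrong assumption as the code,
        // so the bug is "confirmed" as correct behavior.
        XCTAssertEqual(add(2, 2), 5)
    }
}
```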
This may seem critical of programmers writing tests for their own code (as opposed to someone else writing the tests), but in fact I think programmers testing their own code is a good thing, made better when the tests are written first and better yet when the tests themselves are defined by a person or people beyond the programmer... but more on that later in another post.
So Are Automated Tests Useless Then?
It may sound like I’m trying to dismiss automated testing as not being very beneficial. But stick with me here, because I’m actually a big fan of automated testing done right, and for the right reasons. By eliminating the unrealistic goals for automated testing and not wasting time chasing those white whales, we can instead focus on what such tests can do well. Which is really just one thing, with a happy side effect.
The Purpose of Automated Tests
And that leads us to the actual purpose of all automated tests:
Confirming expected behaviors
That sounds obvious, but this simple statement unlocks a lot of important details.
Let’s start with the word “expected”. This is the key to everything, because no test can discover something new or unexpected. It can only compare the results of a specific scenario against a well-defined expectation. This is why tests cannot find new bugs or unanticipated defects the way human testers (and your actual users) can: new bugs are by their very nature defects that were not expected, and thus no automated tests have been written to detect them.
The other important insight from the word “expected” is that tests can never be better or more complete than the explicit requirements they are based on. Let’s take a simple example of a screen that only has a single button on it. If I handed you an app, already coded, that displays this single-button screen and told you to test it, what would you do?
Well, the first thing you would probably try to do is figure out what the button does — Is it tappable? Does it trigger some action when tapped? What does its label say? Only by knowing what is expected of this button and screen can any sort of test be written. In fact, without me giving you a clear statement of those expectations, your tests are likely to be guesses and quite possibly wrong.
For example, even if you just wrote a test to verify the presence of a single button on screen and nothing more, what if I had coded it so that the button only appears on screen during business hours (9am to 5pm)? If your test ran before or after that window it would fail, because it was expecting the wrong behavior. To have a reasonably complete and accurate test suite, you would need the most complete statement of requirements and expectations that I could provide.
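A naive version of that test might look like this minimal XCUITest sketch (the “mainButton” accessibility identifier is an assumption for illustration):

```swift
import XCTest

final class SingleButtonScreenTests: XCTestCase {
    func testButtonIsOnScreen() {
        let app = XCUIApplication()
        app.launch()
        // If the button is only shown between 9am and 5pm, this
        // assertion fails outside business hours, because the
        // expectation was wrong, not the code.
        XCTAssertTrue(app.buttons["mainButton"].exists)
    }
}
```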
And that is why this concise summary of what automated tests are for is particularly enlightening. In addition to being simple, it captures a vital insight: without clearly defined expectations of behaviors, there are no useful tests that can be written. And the corollary insight: no test suite can be more complete than the set of defined requirements for the code.
Unfortunately, on most projects, requirements are vague at best and it is implicitly assumed that the programmer who codes a feature will fill in the blanks. So instead of tests being a clear expression of requirements, they are often a collection of undocumented programmer guesses and assumptions. This topic is something I’ll be talking about in much greater detail, but hopefully you are getting the idea that good testing isn’t really possible without good requirements.
The other important detail is contained in the word “behaviors”. For tests to be useful, they need to verify a behavior as opposed to an implementation. What’s the difference? Well, a behavior is a high level description of what something is supposed to do. On the other hand, an implementation is a specific decision about how to do it.
For example, sorting an array alphabetically is a behavior: given an array of strings in random order, after the sort executes they will be in a predictable alphabetical order. But there are many ways to implement sorting; quick sort, merge sort, and bubble sort are all well-known approaches, and each describes how to execute the sort. One of the key benefits of automated tests is that they allow programmers to change how code is implemented while ensuring that the behavior stays the same.
In the case of a sorting method for arrays, the test shouldn’t know or care about how the sort is accomplished. It should merely confirm the behavior is correct. If the code implements a merge sort algorithm today and passes its automated tests, and tomorrow the implementation is converted to a quick sort, the same tests should continue to pass without being modified because the behavior is the same.
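Here’s a minimal Swift sketch of such a behavior-level test (sortAlphabetically is a hypothetical function under test; its internals could change from merge sort to quick sort without touching the test):

```swift
import XCTest

// Hypothetical function under test. The test below states the
// expected outcome and never asks how it was produced.
func sortAlphabetically(_ input: [String]) -> [String] {
    return input.sorted()  // stand-in implementation; any correct sort works
}

final class SortingBehaviorTests: XCTestCase {
    func testSortsStringsAlphabetically() {
        let shuffled = ["pear", "apple", "banana"]
        XCTAssertEqual(sortAlphabetically(shuffled),
                       ["apple", "banana", "pear"])
    }
}
```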
A test that depends on, or is coupled to, a specific implementation is not useful, because changing the code it exercises will break the test and force it to be rewritten. Tests should only change when requirements change, and should allow code implementations to be modified while continuing to confirm that their behavior is correct.
A Happy Side Effect
While the sole purpose of automated tests can adequately be summed up as Confirming Expected Behaviors, there is a nice benefit to be realized from having automated tests that do this. It’s so nice that it’s a primary reason why some kinds of tests (like unit tests) are written at all. And that benefit is: confident refactoring.
When your automated tests are written well, not coupled to specific implementations, and thoroughly confirm expected behaviors, it gives programmers the freedom to refactor the code underlying those behaviors: move it around, reorganize it, use different dependencies, swap implementations, etc. The automated tests will validate that the behavior before and after the refactor is the same — the key to a successful refactor.
This side effect of confident refactoring is a vital part of healthy teams and codebases, because it enables a wealth of great practices, from continuous refactoring to keep the codebase modern and clean, to faster iteration and feature delivery. Knowing that the latest set of changes to a codebase has not broken or compromised the application’s important expected behaviors is one of the most significant advantages you can bring to your team or project.
And automated tests (the right kind) are the path to get there!
A Fresh Look At Testing
Now that we’ve (hopefully) clarified and simplified the purpose of automated testing, my next posts will be diving into more detail about some of the principles and techniques for writing good automated tests! We’ll be taking a look at both some modern approaches and tools, as well as challenging some common ideas about testing that probably aren’t helping much at all.
Hope you return for the next installment of the Fresh Look at Testing series! Meanwhile, I’d love to hear your thoughts and comments below.
- - -
Next up: Learn about the two kinds of tests that mobile developers should be writing!
Or: Find out where testing fits in with the product cycle and what should and shouldn’t be considered “testing”.