Creating a Sustainable Software Cycle
Breaking the perpetual deadlock of legacy code, lack of testing, and missing documentation
Having been an iOS developer at quite a few tech companies, from small to large, I’ve found that they all mostly share the same painful reality of software development:
- Lots of fragile legacy code that is risky to refactor or update because it is untested
- Tests can’t be written for the legacy code because no one knows exactly what it does (or why). And since it was never tested in the first place, it usually isn’t structured in a way that allows testability to be added later.
- Tests (usually unit tests) written for new or recent code rarely, if ever, catch actual bugs. Emphasis is placed on writing these tests, but the return on investment is questionable: even with thousands of tests running continuously, roughly the same volume of bugs reaches production as before.
- When refactors or changes to recent code are needed, the unit tests usually end up having to be modified, rewritten, or simply turned off, since they are usually too tightly coupled to a specific implementation.
- Eventually this recent code, with its disabled, short-circuited, or reduced-coverage tests, turns into fragile legacy code that is risky to refactor, and the cycle begins again.
Typically, software teams try to solve this vicious cycle with the “enforcement of testing” approach, which looks something like:
a. All PRs must have tests with them, and / or
b. Code coverage percentage must not be lowered with any new PRs.
This seems reasonable on the surface, except that this sort of mandate doesn’t really ensure good or meaningful testing. It’s easy to write a test that technically results in “coverage” but will never catch any bugs. This ends up just perpetuating the cycle above and does nothing to address legacy code either.
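To make that concrete, here is the kind of test that technically adds coverage but can never catch a bug (the Transaction and MockAccount types here are hypothetical stand-ins, the same shape as the ones used in the examples later in this post):

func testChargeRuns() {
    // Exercises the charge path, so its lines count as "covered"...
    let transaction = Transaction(account: MockAccount())
    _ = transaction.charge(10.00)  // ...but the result is thrown away
    XCTAssertTrue(true)            // ...and this assertion can never fail
}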
The underlying problem
The reason why this cycle is so hard to break is because the apparent problem (lack of tests) is not the real problem, but is merely a symptom of the real problem. Additionally, merely increasing the quantity of tests does not result in a well-tested codebase.
So what is the real underlying problem? It’s a lack of clear, documented, testable requirements for everything in the codebase.
Keep in mind that programming is a one-way hashing algorithm that turns requirements into source code. It is not possible to unhash source code back into the requirements that resulted in that code. Comments can be very helpful in illuminating motivations, but they are informal and prone to rot. Therefore, the process of programming is a lossy process that loses intent and context along the way, unless the intent and context (i.e. requirements) are explicitly persisted and maintained somehow.
Typically how this underlying problem manifests is when some individual developer is expected to:
- Receive some vague, incomplete description of what to implement
- Create a (hopefully) working implementation (and if it doesn’t work, hopefully it will be caught before it is released by QA or other internal testing)
- Then invent their own tests to “provide code coverage” for the code they wrote.
Six months later, no one knows exactly how the code that developer wrote works or why it was written that way (including the developer themself!). The best you can do is search through JIRA for the ticket that described the original request, which won’t include all the conversations that happened during development.
Or maybe your team is really good and wrote some sort of technical design doc in Google Docs or Confluence in advance. That's better, but not only is it decoupled from the code itself (so it may not be easy to find), it starts becoming obsolete as soon as development begins, as workarounds, bug fixes, and edge cases are discovered and implemented. Not to mention the iterative changes that come immediately afterwards, which alter the code without the documentation being updated.
So to solve this whole mess, the missing ingredient of explicit, testable requirements needs to be introduced at every stage of development (grooming, iteration, bug fixes). And those requirements must be constantly in sync with the actual reality of the code, not maintained in a separate universe somewhere.
What is a good requirement?
First, let me be clear that I’m not advocating for long rambling requirements documents that are worked on for weeks before any code can be written. While that has many advantages over no requirements, it's a slow process that is just as prone to rapid obsolescence.
A good requirement is a conceptual rule or intent coupled with specific examples to demonstrate how it is expected to behave. The number of examples included for a rule should be as many as needed to capture every unique behavior of the rule. Examples are what makes a requirement testable. Requirements without examples are not testable.
Example of a bad requirement:
Debit card users shouldn't be able to overdraft
👆 This requirement is missing context and has no clarifying or testable examples
So let’s try to make it better:
Debit card charges should be denied if the amount of the charge is greater than the user’s available balance
👆 Much better context! But not yet testable as there are no examples, and we could ask many questions about edge cases that are not answered by this single statement.
Rule: Debit card charges should be denied if the amount of the charge is greater than the user’s available balance
- Example: Given the user has a $50 available balance, when a $51 charge is requested then the transaction should be declined.
- Example: Given the user has a $50 available balance, when a $49 charge is requested then the transaction should be approved.
👆 This is the first implementable, testable requirement! It has the rule and specific examples that can be easily tested. To illustrate, look how straightforward it would be to write tests for this requirement:
func testOverdraftDeclined() {
    // Given the user has a $50 available balance
    let mockAccount = MockAccount()
    mockAccount.balance = 50.00
    // When a $51 charge is requested
    let testTransaction = Transaction(account: mockAccount)
    let result = testTransaction.charge(51.00)
    // Then the transaction should be declined
    XCTAssertEqual(result, .declined)
}
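The second example maps to a test just as directly (assuming the result type has an .approved case mirroring .declined):

func testChargeWithinBalanceApproved() {
    // Given the user has a $50 available balance
    let mockAccount = MockAccount()
    mockAccount.balance = 50.00
    // When a $49 charge is requested
    let testTransaction = Transaction(account: mockAccount)
    let result = testTransaction.charge(49.00)
    // Then the transaction should be approved
    XCTAssertEqual(result, .approved)
}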
Note that at this point, this probably isn’t a truly complete requirement, but it’s defined enough to start coding, testing and moving ahead incrementally.
During development, or perhaps in QA, someone might notice a scenario that was not accounted for in the original requirement: if the user’s account has a pending charge, it is possible for a new charge to use money that should have been reserved for the pending charge, resulting in an eventual overdraft.
Therefore, the requirement gets updated to:
Rule: Debit card charges should be denied if the amount of the charge is greater than the user’s available balance minus any pending transactions
- Example: Given the user has a $50 available balance, when a $51 charge is requested then the transaction should be declined.
- Example: Given the user has a $50 available balance, when a $49 charge is requested then the transaction should be approved.
- Example: Given the user has a $50 available balance and there is a $25 pending charge on the user’s account, when a $49 charge is requested then the transaction should be declined.
Notice how requirements get more specific and comprehensive over time, iteratively. Also note that a developer, product manager, or QA analyst coming back to this code years later can see exactly how the code is supposed to work.
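A test for the new pending-charge example follows the same pattern. Here I'm assuming the hypothetical MockAccount can be given pending charges; the real shape of the mock would depend on your codebase:

func testChargeDeclinedWhenPendingChargesReduceBalance() {
    // Given the user has a $50 available balance
    // and there is a $25 pending charge on the user's account
    let mockAccount = MockAccount()
    mockAccount.balance = 50.00
    mockAccount.pendingCharges = [25.00]
    // When a $49 charge is requested
    let testTransaction = Transaction(account: mockAccount)
    let result = testTransaction.charge(49.00)
    // Then the transaction should be declined
    XCTAssertEqual(result, .declined)
}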
Keeping requirements in sync with code
Now that we’ve looked at what good, implementable requirements look like, the next questions are:
- Where do we keep them?
- How do we test them?
- How do we ensure they don’t grow stale and out of sync with the actual state of code?
There can be different answers to these questions, but I strongly advocate for one answer that addresses all three: requirements should be kept inside the repo alongside the code and automatically tested on every commit using a test framework like Cucumber or (shameless plug) my own TestKit.
Let’s break this down a bit: the requirements given above as examples are all written in the “Given, When, Then” format for describing a behavior. This loose structure for expressing requirements has been formalized into a standard called Gherkin. Gherkin is just a set of keywords and conventions for writing these kinds of example-based requirements. Because of its standardized nature, Gherkin can be parsed and fed into testing frameworks, which can then attempt to validate every statement in the requirements.
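For illustration, the overdraft requirement from earlier might live in the repo as a .feature file that looks something like this (the file organization and feature name are my own invention):

Feature: Debit card overdraft protection

  Rule: Debit card charges should be denied if the amount of the charge is greater than the user's available balance minus any pending transactions

    Example: Charge exceeding the available balance
      Given the user has a $50 available balance
      When a $51 charge is requested
      Then the transaction should be declined

    Example: Charge within the available balance
      Given the user has a $50 available balance
      When a $49 charge is requested
      Then the transaction should be approved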
The most common framework for testing Gherkin requirements is called Cucumber, but it isn’t particularly well maintained for iOS. However, Gherkin is a defined standard and so other frameworks can be written and maintained to test Gherkin requirements, such as my previously mentioned TestKit and others. These frameworks all essentially work the same way:
1. They provide a way for test code to register blocks of functionality that run in response to requirement lines matching specified RegEx patterns. For example, this code registers test behavior that will run in response to the requirement phrase "Given the user has a $50 available balance":
given("the user has a $50 available balance") {
    testAccount = MockAccount()
    testAccount.balance = 50.00
}
This is how developers hook into requirements and "prove" each line of a requirement with test code. Because the matching uses RegEx, the registered test functionality can also be dynamic, e.g.:
given("the user has a \\$(?<balance>\\d+) available balance") {
    guard let balance = $0["balance"].floatValue else {
        XCTFail("Invalid balance in requirement")
        return
    }
    testAccount = MockAccount()
    testAccount.balance = balance
}
This dynamic version will read any specified amount for the available balance out of a requirement and configure the mock account appropriately.
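Assuming the framework also exposes when and then registrations analogous to given (TestKit's actual API may differ in its details), the remaining lines of the example requirement could be hooked up similarly:

when("a \\$(?<amount>\\d+) charge is requested") {
    guard let amount = $0["amount"].floatValue else {
        XCTFail("Invalid amount in requirement")
        return
    }
    // testAccount was configured by the matching "given" step above
    result = Transaction(account: testAccount).charge(amount)
}

then("the transaction should be declined") {
    XCTAssertEqual(result, .declined)
}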
2. They parse all requirements in a given file or folder.
3. If a requirement line doesn’t have any test functionality registered to handle it, the test fails.
4. Each parsed line of every requirement has its associated test functionality invoked, in order, allowing the tests to validate that the requirements are true.
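To make the mechanics concrete, here is a minimal sketch of that parse, match, and invoke loop in plain Swift. This is illustrative only, not Cucumber's or TestKit's actual implementation:

import Foundation
import XCTest

// A toy step registry demonstrating how these frameworks work internally.
final class StepRegistry {
    private var steps: [(pattern: NSRegularExpression, run: () -> Void)] = []

    // Register a block of test functionality for requirement lines
    // matching the given RegEx pattern.
    func register(_ pattern: String, run: @escaping () -> Void) throws {
        steps.append((try NSRegularExpression(pattern: pattern, options: []), run))
    }

    // Invoke each requirement line's registered functionality in order;
    // a line with no registered handler fails the test run.
    func validate(_ requirementLines: [String]) {
        for line in requirementLines {
            let range = NSRange(line.startIndex..., in: line)
            if let step = steps.first(where: { $0.pattern.firstMatch(in: line, options: [], range: range) != nil }) {
                step.run()
            } else {
                XCTFail("No test functionality registered for: \(line)")
            }
        }
    }
}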
There are all sorts of strategies and conventions to make this process as effective as possible (which I’ll talk about more in future posts), but even this basic setup gives us the following benefits:
- Adding or changing a requirement in the project without also registering test code to prove it will fail the test suite. This ensures that all requirements are validated and tested at all times.
- The test suite is ultimately based on high-level requirements, not implementation details, so tests are always able to validate successful refactors. Note that there may be implementation details in the registered test functions (like how to set a balance on a mock account), but these details are updated separately from the actual requirements, which the suite validates and which don't change due to refactoring.
- Any changes to the code which break a requirement will fail the tests. Any changes to the requirements which are not reflected in the code will cause the tests to fail as well.
In short, this approach brings us to the ideal state of requirements which are never out of sync with the code, and code which is never undocumented or diverged from requirements!
Bringing it all together
By using this requirements-driven approach to software development, we can finally achieve the sustainable process that has been so elusive. Once a codebase has been set up with a testing framework like Cucumber or TestKit (and has usable test setups for both UI tests and API (or “unit”) tests) it only requires one simple rule in order to keep the entire process working: “Requirements are always written before coding begins, and the first step to coding is updating the requirement”. This single rule ensures that testing occurs, documentation exists, developers are implementing the right thing, and refactors in the future will be safe and validated.
Note that bugs and bug fixes must follow this pattern as well. When tested requirements exist, a bug becomes simply a requirement example that wasn’t captured earlier. A bug is fixed by:
1. Adding the example to the requirements, including what should be the outcome of that example
2. Writing or fixing the code to validate that new example (while also continuing to meet all previously captured requirements and examples).
And thus the sustainable software development cycle looks like this:
- Product or QA (in the case of bugs) identify rules and examples that the product should follow (backlog)
- Product + Developers + QA ask questions and surface specific examples to identify how the rule should be followed for any possible scenario that is thought of (grooming). These examples are written as Gherkin requirements.
- Developers are assigned specific requirements to implement. The requirements are already captured in the project; the Jira ticket holds no requirements itself, it just notes which previously decided requirements will be implemented. The requirements to be implemented ARE the tests, so the developer works on the implementation until all the requirements tests pass, which is when the ticket is done. Note that PRs can’t pass CI or merge until the developer has hooked up test functionality to prove the requirements.
- The cycle starts again and runs iteratively: based on user feedback or defects, product and / or QA provides new rules and examples which become requirements in the repo, and so on.
- When it comes time to completely refactor a portion of the application (say, from UIKit to SwiftUI), this can be done confidently, knowing that all the same requirements that were used to define the existing code must also be validated against the new code.
Further discussion
While this basic arrangement really does provide a truly sustainable software development cycle, there are unsurprisingly devils in the details. I plan to address these in subsequent posts, but just to give proper acknowledgement to some of the key challenges:
- It requires a disciplined grooming process to translate all work into testable, example-based requirements. Ideally, this process includes product, QA, and software engineers together in a discussion, asking questions and capturing rules and examples in Gherkin format (usually the best Gherkin-writer, often someone in QA or a developer, translates the discussion into the actual Gherkin that everyone else gives a thumbs up to). Similarly, it takes work with QA to translate bug reports into specific missing rules or examples.
- Writing good Gherkin that is plainly understandable and not overly implementation-focused takes practice. Additionally, Gherkin has a few conveniences and features that should be learned in order to facilitate the process. For example, tags can be used to limit requirements to certain platforms or OS versions, or to exclude new requirements from being tested until a ticket is in flight to implement them (see the brief tag example after this list).
- Requirements are either focused on user experience (what the user sees) or API behaviors (what responses calling code will receive following method calls with certain parameter values, etc.). Testing both kinds of requirements means that the project needs to have basic testability in place for both code (i.e. mocks for unit tests, etc.) and UI (the ability to launch the app with various mock preconditions in place for when the UI is interacted with by test code).
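As a quick illustration of the tags mentioned above (the tag names are just conventions a team might choose, not anything mandated by Gherkin):

@ios @wip
Example: Charge declined when pending charges would cause an overdraft
  Given the user has a $50 available balance
  And there is a $25 pending charge on the user's account
  When a $49 charge is requested
  Then the transaction should be declined

A test runner can then be configured to skip anything tagged @wip until its ticket is in flight, or to validate @ios requirements only on iOS builds.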
I hope that the content in this article is nonetheless enough to get you started on the path to a well-tested, well-understood and sustainable project that can be maintained confidently without regressions!