A degrading understanding of a system makes it harder to confidently make necessary changes - up to the point where no one dares to touch anything anymore, for fear of unknowingly breaking existing functionality. Having a suite of automated tests in place can greatly reduce that risk and thereby enable the continuous evolution of the system.
- What are properties that automated tests should have in order to maximize the value they provide over the lifetime of a system?
- How can systems be designed to facilitate the creation of useful tests?
- What can be done if a system is not (yet) designed in a way that accommodates such tests?
Most of the aspects reflected on below are described in depth, among others, in the following books:
- Working Effectively with Legacy Code, M. C. Feathers, 2005 - an often-recommended classic motivating why to create automated tests, and how to make it possible [Feathers, 2005]
- Software Engineering at Google, T. Winters, T. Manshreck, and H. Wright, 2020 - devoting several chapters to the way tests are used in that software organization [Winters et al., 2020]
- Clean Architecture, R. C. Martin, 2017 - containing general advice on how to structure (testable) systems [Martin, 2017]
- Object-Oriented Reengineering Patterns, S. Demeyer, S. Ducasse, O. Nierstrasz, 2003 - a collection of approaches for restructuring existing systems, showing ways to improve the design, and describing strategies to use and grow test-suites [Demeyer et al., 2003]
1. Working with legacy code
Most relevant software lives for a long time, and its continuous adaptation to new requirements is an ongoing challenge.
As a system grows in size, age, and complexity, it is not unusual for even seemingly simple changes to take longer and longer to implement, and to carry an increasing risk of breaking existing functionality.
When it comes to making changes, [Feathers, 2005] describes developers who “edit and pray” that the change does not break anything, building their confidence on experience with the system and some exploratory manual testing.
Obviously this only gets riskier once different people start to extend different parts of the system, and once experienced contributors leave the team while new developers join.
Slowly, the architectural vision of the design starts to blur, and the understanding of how the system works gets lost, up to the point where no one really knows anymore what is going on.
Needless to say, this just continues to increase the risk of breaking changes (after all, how could one do any regression testing without even knowing how the system is supposed to behave?).
As an alternative, Feathers describes an approach of covering existing functionality with automated tests, providing a “safety net” which allows for controlled refactorings as well as the controlled addition of new features or fixes. This way, the chance of breaking existing functionality is reduced, and all developers (experienced or not) gain confidence in the correctness of the software.
2. Properties of useful automated tests
[Winters et al., 2020] mention different dimensions to classify automated tests:
- scope (describing the amount of validated code)
  - narrow (e.g. a single class, or even a single function)
  - medium (e.g. multiple classes)
  - large (e.g. system / end-to-end tests, verifying the interaction of sub-systems)
- size (describing the amount of resources the test needs)
  - small (the test and its dependencies all run inside a single thread)
  - medium (everything runs locally; separate processes, network calls to localhost, or file-system access are allowed)
  - large (calls to external systems are allowed)
Fast and deterministic
The execution time of a test is typically determined by its size, and the larger the test, the more flaky it tends to be, since e.g. network calls to external systems may time out (flakiness: the tendency of a test to fail occasionally without any actually problematic code change causing it).
To keep a test suite fast and deterministic, it is recommended to rely on a majority of small tests whenever possible (leaving no room for flakiness, and even allowing simple parallel execution by multiple threads). Furthermore, [Winters et al., 2020] emphasize that only fast, small tests can practically be run as part of the normal development workflow, and experience shows that longer-running tests tend not to be executed at all. However, they also acknowledge the importance of larger tests for covering aspects that small tests cannot verify.
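For example, explicitly tagging the larger tests, e.g. with JUnit 5 tags, makes it easy to exclude them from the default local run while still executing them regularly in CI. A minimal sketch (the tag name and the test contents are illustrative; the actual filtering would happen in the build configuration, e.g. via Maven Surefire's excludedGroups or Gradle's excludeTags):

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class CalculationTests {

    // small: single thread, no I/O - cheap enough to run on every local build
    @Test
    void calculationProducesExpectedResult() {
        assertEquals(4, 2 + 2); // placeholder for a real small test
    }

    // large: talks to external systems - excluded locally, executed regularly in CI
    @Tag("large")
    @Test
    void calculationWorksAgainstRealDatabase() {
        // placeholder for a real large test against an actual database
    }
}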
Robust and maintainable
When it comes to scope, there are different aspects to keep in mind:
- On the one hand, narrow-scoped tests (e.g. testing a single implementation class) usually allow for a quick and precise analysis of the root-cause in the event of failure.
- On the other hand, narrow-scoped tests tend to be brittle (they break on unrelated changes), since a simple redistribution of some logic between closely collaborating classes can already cause a lot of test failures.
[Winters et al., 2020] recommend testing business-relevant behaviour via the public API instead of depending directly on implementation details, to avoid having to change the tests frequently (“Don’t depend on volatile things!” - [Martin, 2017] even recommends a dedicated testing API to shield tests from changing implementation details).
Pure refactorings should not break tests; if they do, this may indicate an inappropriate level of abstraction (test behaviour - not methods/classes).
The same holds true for changes that introduce new features or fix bugs, which also should not require an adjustment of existing tests.
So with respect to scope, following these recommendations may typically result in testing at least several related classes together.
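To make this concrete, here is a hypothetical example (the OrderService, its 10% bulk-discount rule, and all names below are made up for illustration): the business rule is asserted against the public API only, so the internal helper classes remain free to be restructured without breaking the test.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class OrderServiceTest {

    // Hypothetical public API; internally it may delegate to several collaborating classes.
    interface OrderService {
        long priceInCentsFor(int quantity, long unitPriceInCents);
    }

    @Test
    void bulkOrdersGetTenPercentDiscount() {
        OrderService orderService = createOrderService();

        // The business rule is asserted through the public API; how (and by which
        // internal classes) the discount is computed remains free to change.
        assertEquals(9000L, orderService.priceInCentsFor(10, 1000L));
    }

    private OrderService createOrderService() {
        // Wiring of the real internal collaborators is elided; a trivial stand-in is used here.
        return (quantity, unitPriceInCents) -> {
            long total = quantity * unitPriceInCents;
            return quantity >= 10 ? total - total / 10 : total; // 10% bulk discount
        };
    }
}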
Another interesting aspect of scope is that it is defined by the amount of validated code, as opposed to executed. In particular, [Winters et al., 2020] argue that - if possible - a test should stick with the real implementations of the dependencies of the tested code, instead of replacing them with test doubles by default (preferring classical over mockist testing):
“Using real implementations can cause your test to fail if there is a bug in the real implementation. This is good! You want your tests to fail in such cases because it indicates that your code won’t work properly in production.”
This is especially true when the dependency itself is not properly tested on its own.
Obviously, dependencies on things running outside the test thread (e.g. external services, databases) must be replaced by test doubles in order to keep the test small. Here, [Winters et al., 2020] prefer the usage of lightweight fake implementations over mocks to be able to test state instead of interactions.
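A minimal sketch of this preference, with a hypothetical UserRepository dependency and RegistrationService under test (all names are assumptions made for the example): the fake carries real, in-memory behavior, so the test can assert on the resulting state rather than on recorded interactions.

import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.HashSet;
import java.util.Set;
import org.junit.jupiter.api.Test;

class RegistrationTest {

    // Hypothetical dependency of the code under test.
    interface UserRepository {
        void save(String userName);
        boolean exists(String userName);
    }

    // Lightweight fake: real behavior, but state is kept in memory.
    static class InMemoryUserRepository implements UserRepository {
        private final Set<String> users = new HashSet<>();
        @Override public void save(String userName) { users.add(userName); }
        @Override public boolean exists(String userName) { return users.contains(userName); }
    }

    // Hypothetical class under test, using constructor injection.
    static class RegistrationService {
        private final UserRepository repository;
        RegistrationService(UserRepository repository) { this.repository = repository; }
        void register(String userName) { repository.save(userName); }
    }

    @Test
    void registeredUserIsStored() {
        InMemoryUserRepository repository = new InMemoryUserRepository();
        new RegistrationService(repository).register("alice");

        // Assert on the resulting state via the fake, instead of verifying method calls on a mock.
        assertTrue(repository.exists("alice"));
    }
}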
With respect to scope, [Feathers, 2005] also mentions the importance of narrow-scoped tests, among other reasons because of the ease with which failure causes can be located.
However, it should normally also be easy to spot the root cause of a problem even among a couple of collaborators.
3. How to make code untestable
If testing is not carefully taken into account, it is surprisingly easy to end up with code which makes it incredibly hard to add any useful (fast, small) automated tests ex-post.
Consider the following Java sample:
@Service
class CalculationServiceImpl implements CalculationService {

    /**
     * @return true, if successful
     */
    @Override
    public boolean calculate(int input) {
        Result result = new FirstCalculator().calculateFirstPart(input); // 1.
        SecondProcessor.calculateSecondPart(result, input);              // 2.
        SessionContext.store("myResult1", result);                       // 3.
        boolean isSuccessful = result.getValue32() == 13;                // 4.
        if (isSuccessful) {
            ThirdProcessor.calculateThirdPart();                         // 5.
            NotificationService.sendKafkaMessageToCalculationsTopic();   // 6.
        }
        return isSuccessful;
    }
}
class FirstCalculator {

    Result calculateFirstPart(int input) {
        int baseValue = new MariaDbDatabaseAccess().getBaseValue(); // 1.1
        // [...] some calculation logic
        return result;
    }
}

class SecondProcessor {

    static void calculateSecondPart(Result result, int input) {
        int extraInfo = CalculationHelperWebService.getExtraInfo(); // 2.1
        // [...] some calculation logic
    }
}

class ThirdProcessor {

    static void calculateThirdPart() {
        Result result = (Result) (SessionContext.get("myResult1")); // 5.1
        // [...] some calculation logic
    }
}
// [...] various other classes
Imagine writing a test for the calculate functionality of the CalculationService (the public API). The actual calculation logic seems to be distributed over at least three different collaborator classes, but the distribution seems rather arbitrary. In order to avoid creating a brittle test which is broken by the first upcoming refactoring, none of the three calculation parts should be tested in isolation.
new FirstCalculator().calculateFirstPart(input)
First, there is a call to a collaborating class (FirstCalculator), which in turn collects some additional information from a database (1.1):
int baseValue = new MariaDbDatabaseAccess().getBaseValue();
This is already a first problem, since there is no simple way to replace the MariaDB with a lightweight fake implementation: we would either need to run the complete test against a real MariaDB (making the test larger), or resort to advanced mocking library features (which are hopefully available).

SecondProcessor.calculateSecondPart(result, input);
Then the collaborator SecondProcessor is called, which first fetches some needed information from a remote service via the network before running its calculations (2.1):
int extraInfo = CalculationHelperWebService.getExtraInfo();
Again, replacing this static call may require advanced mocking magic to be available (possibly including surprising side effects in case the static mock is not properly cleaned up at the end of the test).

SessionContext.store("myResult1", result);
This is another possibly hard-to-replace static call, which depends on some framework-provided (thread-local?) session state being set up so that later parts of the calculation logic can retrieve it (e.g. 5.1 and 6.). Relying on side effects like this is a great way to create a sufficiently confusing data flow, which does not make creating relevant test cases any easier. Why would one ever resort to using some sort of SessionContext here at all? Well, at least it allowed us to share information between different places without having to refactor much existing logic, which would have been risky to touch.

boolean isSuccessful = result.getValue32() == 13;
At 4. the orchestrating logic of the calculate method is interrupted by some core business logic, breaking with the abstraction level of the method. Apart from making the already widespread calculation logic even harder to understand, this does not really hinder creating the test. It merely serves as an example of a needed refactoring and is probably again a symptom of missing tests, since it was presumably too risky to put this logic elsewhere in the first place. It also illustrates the importance of testing via stable interfaces (the public API), since implementation details such as the distribution of logic among collaborating classes are likely to change - especially in systems where missing tests prevented adding new features at the right places.

Finally, 5. just represents another dependency on the SessionContext, and 6. another hard-coded dependency on an external system.
Obviously the overall sample is kept short; the length of realistic methods won’t make writing tests any easier.
To sum up, some general problems which may accumulate over time and result in hard-to-test code:
- hard-wired dependencies to external systems (hard to fake/mock)
- dependencies hidden deep in the core business logic
- behavior relying on side-effects and arcane features of the used framework
- mixing different aspects and abstraction levels, just to avoid changing code in other places
4. Design for testability
As shown in the previous part, testability should be a key aspect during system design/implementation. So how can code be structured to allow for useful tests?
Providing Seams
The Seam is a central concept of [Feathers, 2005], described as “a place to alter behavior without editing that place”.
In particular, Object Seams are recommended, i.e. providing places to allow replacing problematic dependencies with subtypes.
Dependency injection (DI) is a central feature of many popular frameworks such as Spring or Quarkus, and both DI containers allow replacing existing bindings of (problematic) implementation classes with mocks or custom fake implementations.
However, when using constructor injection it is also simply possible to construct instances of the classes under test by hand, without relying on any DI framework functionality.
Of course, manual construction comes with the disadvantage of having to create the complete dependency graph by hand, but it may speed up test execution significantly.
For the problematic FirstCalculator above, a more testable version could look like this:
class FirstCalculator {

    private final MariaDbDatabaseAccess dbAccess;

    // constructor allows providing fake/mock dependencies
    FirstCalculator(MariaDbDatabaseAccess dbAccess) {
        this.dbAccess = dbAccess;
    }

    Result calculateFirstPart(int input) {
        int baseValue = dbAccess.getBaseValue();
        // [...] some calculation logic
        return result;
    }
}
This way, a fake implementation of MariaDbDatabaseAccess could be provided (subclassing it and overriding the problematic behavior). Additionally, the CalculationServiceImpl would also need to offer a seam, providing a constructor that allows injecting the FirstCalculator instance with the faked database access dependency:
@Service
class CalculationServiceImpl implements CalculationService {

    private final FirstCalculator firstCalculator;

    // constructor allows providing test-specific dependencies
    CalculationServiceImpl(FirstCalculator firstCalculator) {
        this.firstCalculator = firstCalculator;
    }

    @Override
    public boolean calculate(int input) {
        Result result = firstCalculator.calculateFirstPart(input);
        // [...]
    }
}
(Luckily, Lombok may reduce some of this constructor boilerplate.)
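With these seams in place, a small test can assemble the dependency graph by hand and substitute the database access. The sketch below subclasses MariaDbDatabaseAccess and overrides the problematic call; it assumes getBaseValue() is overridable, that the test lives in the same package, that the remaining collaborators of calculate have been given similar seams, and the concrete values and expected outcome are purely illustrative.

import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class CalculationServiceTest {

    // Fake database access: subclasses the real class and overrides the problematic call.
    static class FakeDatabaseAccess extends MariaDbDatabaseAccess {
        @Override
        public int getBaseValue() {
            return 42; // fixed value instead of a real MariaDB query
        }
    }

    @Test
    void calculateSucceedsForKnownInput() {
        // The constructors act as object seams: the object graph is assembled by hand,
        // without any DI container.
        FirstCalculator firstCalculator = new FirstCalculator(new FakeDatabaseAccess());
        CalculationService service = new CalculationServiceImpl(firstCalculator);

        assertTrue(service.calculate(7)); // input and expected result are illustrative
    }
}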
Clean architecture
[Martin, 2017] describes clean architecture, a general approach to structure a system so that central business logic is properly separated from external dependencies.
The main idea is also at the core of similar concepts like hexagonal architecture or onion architecture.
In particular, a Dependency Rule is formulated, which states:
“Source code dependencies must point only inward, toward higher-level policies.”
When core business logic needs to invoke functionality from outer layers, this dependency must be inverted (Dependency Inversion Principle), so that the source-code dependency still only points inward, opposing the flow of control.
Consider the sample above: a somewhat cleaner version could be realized as follows, replacing the low-level MariaDB dependency of the core logic with the abstract interface BaseValueProvider:
// core business logic
class FirstCalculator {

    private final BaseValueProvider baseValueProvider; // abstract dependency

    // constructor allows providing fake/mock dependencies
    FirstCalculator(BaseValueProvider baseValueProvider) {
        this.baseValueProvider = baseValueProvider;
    }

    Result calculateFirstPart(int input) {
        // core logic invokes functionality implemented in outer layers,
        // but has no source-code dependency on them
        int baseValue = baseValueProvider.getBaseValue();
        // [...] some calculation logic
        return result;
    }

    interface BaseValueProvider { // part of the core business logic
        int getBaseValue(); // abstract functionality needed by the core logic
    }
}

// ---

// implementation detail, outside the core logic
class MariaDbDatabaseAccess implements FirstCalculator.BaseValueProvider {

    @Override
    public int getBaseValue() {
        // [...] actual MariaDB access logic
    }
}
As a result, central business logic can be kept independent of external influences and low-level details, be it frameworks, UI, or databases. This not only makes it simpler to replace a specific database or UI technology when a new one emerges, but also makes it easy to provide seams for swapping out problematic dependencies with mocks or lightweight fake implementations.
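For example, a small test of FirstCalculator can now provide the base value through the core-level interface, without touching any database. A sketch (the fixed base value, the input, and the placeholder assertion are illustrative, since the actual calculation logic is elided in the sample):

import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.junit.jupiter.api.Test;

class FirstCalculatorTest {

    @Test
    void calculatesFirstPartWithFakeBaseValue() {
        // The fake implements the inward-pointing interface directly - no MariaDB involved.
        FirstCalculator.BaseValueProvider fakeBaseValueProvider = () -> 42;

        FirstCalculator calculator = new FirstCalculator(fakeBaseValueProvider);
        Result result = calculator.calculateFirstPart(7);

        assertNotNull(result); // a real test would assert on the concrete calculation result
    }
}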
When implementing a clean architecture, it may also be helpful to enforce the Dependency Rule automatically, e.g. in Java with the help of ArchUnit tests.
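A minimal sketch of such a rule (the package names and the imported root package are assumptions about how the code base might be organized):

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import org.junit.jupiter.api.Test;

class DependencyRuleTest {

    @Test
    void coreLogicMustNotDependOnOuterLayers() {
        JavaClasses classes = new ClassFileImporter().importPackages("com.example.calculation");

        // Source-code dependencies may only point inward: the core must not know
        // about persistence or web details.
        noClasses().that().resideInAPackage("..core..")
                .should().dependOnClassesThat().resideInAnyPackage("..persistence..", "..web..")
                .check(classes);
    }
}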
5. Reengineering for testability
Some systems were not designed with clean architecture or useful automated tests in mind. [Demeyer et al., 2003] give a number of recommendations on reengineering, i.e. on how to restructure systems in an improved form.
One chapter touches on the question of which parts of a system to prioritize. Understandably, reengineering efforts should not focus on stable, flawlessly working parts, but rather on the faulty ones, which require change and suffer the worst from reliance on outdated technologies, developer fluctuation, insufficient documentation, duplicated code, or tangled structure.
Starting with the most problematic parts, the core business functionalities as well as dependencies and auxiliary functions need to be analyzed to identify a cleaner, more testable target design as well as the corresponding target scope of useful automated tests.
In order to safely make the necessary code changes (e.g. introducing seams to break and invert dependencies), [Demeyer et al., 2003] recommend incrementally introducing tests for the parts of the system which are changed (alleviating the legacy code dilemma [Feathers, 2005]: changes are needed to enable the creation of tests, but tests are needed to make those changes safely).
The two-step approach:
- start with larger-sized tests which allow keeping as many dependencies in place as possible, minimizing the amount of code changes necessary to create the tests
- refactor the covered code so that creating small tests of the core business logic becomes feasible
In the first step, this may involve running tests against an existing database or other external services. Even though the larger tests may take time to run and will be subject to some flakiness, they still provide the necessary safety net to incrementally move towards a cleaner design with faster tests.
[Demeyer et al., 2003] advise starting with black-box tests of big abstractions, focusing on business value, instead of testing individual sub-components. In particular, one recommendation is to record business rules as tests, aiming to represent core functionality by a set of canonical examples with well-defined actions and clear, observable results. Since covering all rules may not be feasible (depending on their number and the runtime of the larger tests), it is suggested to start with the essential cases.
With the larger-sized tests in place, the necessary refactorings can be done to break the problematic dependencies and introduce a cleaner architecture. Subsequently, the test scenarios of the larger-sized tests can be reused to create fast-running small tests, swapping out problematic dependencies with lightweight fake implementations. Further small tests can then be added to increase the number of covered business rules.
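One way to reuse the recorded scenarios is to define them once against the public API and let one subclass wire the real dependencies (larger test) and another wire fakes (small test). A sketch based on the seams introduced above (the concrete scenario, the input, and the fake base value are illustrative assumptions):

import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// The business-rule scenarios are written once, against the public API.
abstract class CalculationScenarios {

    // Subclasses decide how the dependencies are wired.
    abstract CalculationService createService();

    @Test
    void knownInputLeadsToSuccessfulCalculation() {
        assertTrue(createService().calculate(7)); // canonical example of a business rule
    }
}

// Small test: dependencies replaced by lightweight fakes, runs in a single thread.
class SmallCalculationTest extends CalculationScenarios {
    @Override
    CalculationService createService() {
        return new CalculationServiceImpl(new FirstCalculator(() -> 42));
    }
}

// Larger test: wired against the real database access, executed less frequently.
class LargeCalculationTest extends CalculationScenarios {
    @Override
    CalculationService createService() {
        return new CalculationServiceImpl(new FirstCalculator(new MariaDbDatabaseAccess()));
    }
}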
While the small tests can be run quickly and often as part of the local development workflow, the larger-sized tests should still be run on a regular basis in order to verify the functionality against the actual dependencies. (Errors seem to stem as often from self-developed business logic as from unexpected behavior of dependencies - be it caused by actual bugs or just by unclear documentation.)
Useful tests
- To be fast and not flaky, automated tests must be small in size (running inside a single thread).
- To be maintainable and not brittle, automated tests should test through stable interfaces (public API), focusing on business requirements instead of being too narrow-scoped. This leaves room to freely refactor the internal implementation.
- Being able to build useful automated tests requires conscious management of source code dependencies. This needs to be kept in mind when designing and implementing a system.
- Adding useful automated tests to a grown system ex-post can be a laborious endeavour, which may benefit from first creating larger-sized tests as an intermediate step towards fast and deterministic smaller tests.