Is Agile Fragile?

This was the proposition put to me several times at iqnite. It contains a grain of truth. It also represents a misunderstanding of what Agile is.

Obviously, there are many ways in which any process can fail. The question I want to address here is Agile’s reliance on automated testing.

Major changes in direction, Refactoring, Collective Code Ownership, Short Iterations: all of these things require significant testing – either manual or automated. Except, of course, manual testing is too expensive to be done often. With manual testing, you are locked into long iterations – unless you are prepared to deploy untested code.

But automated testing is no panacea.

Too many automated test-suites are clogged with broken tests, tests that cannot be run, tests that fail but are not fixed.

Broken tests are not the same thing as failed tests – the very purpose of any test is to fail when appropriate. A test that passes is broken if it won’t fail when it should. A failed test is not a broken test if it leads to a code fix.

Developers rarely commit failing tests. Tests fail when things change.

Developers frequently commit broken tests. Tests break when a developer assumption is violated or incorrect.

Developers necessarily make assumptions about the test environment, business rules and the behaviour of the rest of the system. They have to. Testing is the art of capturing assumptions. A test is an assumption captured in a testable format: “Test that when X happens, the system does Y”.
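As a minimal sketch of “when X happens, the system does Y”: here is a hypothetical business rule captured as a pair of small tests. The `bulk_discount` function and the 10%-discount-over-100-units rule are invented purely for illustration.

```python
# Hypothetical rule: orders of more than 100 units get a 10% discount.

def bulk_discount(quantity: int, unit_price: float) -> float:
    """Illustrative system-under-test: price an order with a bulk discount."""
    total = quantity * unit_price
    return total * 0.9 if quantity > 100 else total

def test_bulk_order_gets_discount():
    # When X happens (an order of 150 units)...
    total = bulk_discount(quantity=150, unit_price=2.0)
    # ...the system does Y (applies the 10% discount).
    assert total == 270.0

def test_small_order_gets_no_discount():
    # The assumption also says what should NOT happen below the threshold.
    assert bulk_discount(quantity=10, unit_price=2.0) == 20.0
```

Each test names one assumption; when it fails, it points straight at the assumption that was violated.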

But there are better and worse ways to approach capturing assumptions.

The standard lazy way to capture assumptions is to manually load test-data into the test-database, create your code, observe the output and then create tests to validate that the code reliably produces that output. The best that can be said about such tests is that they are cheap to create – they can be created even when the coder does not know what output the code would create or should create.
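A sketch of this “observe, then pin” pattern, with an invented `generate_report` function: the expected string below is simply whatever the code happened to produce, so the test asserts that the output never changes, not that it is correct.

```python
# Hypothetical example of the "observe then pin" pattern: the expected value
# below was copied from whatever the code happened to produce, not derived
# from a business rule.

def generate_report(orders):
    """Illustrative system-under-test: format an order report."""
    lines = [f"{name}: {qty}" for name, qty in orders]
    return "\n".join(lines)

def test_report_matches_recorded_output():
    orders = [("widgets", 3), ("gadgets", 7)]
    # Pinned snapshot of observed output: any change to formatting,
    # ordering, or data breaks this test, whether or not the change is a bug.
    assert generate_report(orders) == "widgets: 3\ngadgets: 7"
```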

The downside is that they are fragile – they are highly likely to be irreparably broken, even though they pass. They often fail when they shouldn’t and frequently don’t fail when they should. When such tests fail, they rarely point clearly at the violated assumption. Instead of analysing the target system, the tester all too often spends time debugging the tests, sometimes just to get them to run. Even when they are ‘fixed’, they are still prone to break again the next time anything in the system changes. The cost of maintaining badly written automated tests very often exceeds their ongoing value.

The better way to capture assumptions is to have the test-suite load the necessary test-data into the test-database, to create one test at a time, to create each test before creating the code that makes that test pass, to make each test as small as possible, and to make each test pass before moving onto the next test. This is harder, much harder, because it requires the developer to have a fuller understanding in advance, rather than just trying and tweaking until it ‘looks’ right. But it produces more robust and reliable tests.

Having the test suite create the test-data eliminates the risk that the data will not be there – and eliminates the need for future testers to reverse engineer the data the test needs.
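A sketch of a test that builds its own data, using an in-memory SQLite database so that nothing depends on rows someone loaded by hand. The schema and figures are invented for illustration.

```python
import sqlite3
import unittest

class OrderTotalTest(unittest.TestCase):
    """Hypothetical example: the suite creates its own test-data,
    so the test never depends on manually loaded rows."""

    def setUp(self):
        # A fresh in-memory database for every test: the data a test
        # needs is right here, not reverse-engineered from a shared DB.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        self.db.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [("acme", 40.0), ("acme", 60.0), ("globex", 15.0)],
        )

    def tearDown(self):
        self.db.close()

    def test_total_for_customer(self):
        (total,) = self.db.execute(
            "SELECT SUM(amount) FROM orders WHERE customer = ?", ("acme",)
        ).fetchone()
        self.assertEqual(total, 100.0)
```

Because the setup is part of the test, a future maintainer can read the assumed data directly instead of guessing what the test-database is supposed to contain.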

Implementing tests one at a time, and keeping each test as small as possible, reduces duplication between tests and reduces the risk of creating untested code. This reduces the rework required when business rules change, because it reduces the number of ‘false positives’ and ‘false negatives’; it reduces the risk that a single change will cause many tests to fail, or even worse, not cause any tests to fail.

Implementing the test before the code means that the test validates not just the code but also the business rule; it validates what the code should do, not just what the code does. This reduces the risk of creating a test that passes only when the code does the wrong thing.
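A sketch of that difference, assuming a hypothetical free-shipping rule. The test is written from the rule first, so a draft implementation that got the boundary wrong would fail it – whereas a test pinned to that draft’s observed output would have passed.

```python
# Hypothetical business rule: shipping is free for orders of 50.0 or more.
# Writing the test first forces the threshold to come from the rule,
# not from whatever the code happens to do.

def test_free_shipping_at_threshold():
    assert shipping_cost(50.0) == 0.0   # the rule says "or more"
    assert shipping_cost(49.99) == 5.0  # just under the threshold

# Code written afterwards, to make the test pass. A first draft using
# `>` instead of `>=` would fail the boundary assertion above – exactly
# the kind of bug a test pinned to observed output would silently bless.
def shipping_cost(order_total: float) -> float:
    return 0.0 if order_total >= 50.0 else 5.0
```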

As with code, the higher the initial quality, the lower the ongoing maintenance cost. The cost of maintaining well written automated tests is usually manageable.

Tests that signal only that something has changed and that debugging is required can have a place, but they are really a form of manual test. They need to be updated every time anything changes, and they need a human to interpret the result.

Manual tests will always be more robust. They require less effort to update, because they are less explicit, relying on the tester for the extra details. In manual testing, the tester can be expected to interpret vague instructions, to correct minor errors on the fly, to adjust for changes in business rules, and to intelligently evaluate variations from the originally specified results as either pass or fail. All of these things break automated tests. But of course, manual tests are more expensive to run – at least when compared to well crafted automated tests.

Automated tests will never be as robust, but that does not mean they have to be brittle, if sufficient care is put into their creation. They are more expensive to create and maintain than manual tests, but can have a lower total cost because manual tests are so expensive to run. Automated tests also have intangible benefits: they are repeatable, they can test things that are not accessible to manual tests, and they can be run more often and in full. How best to take advantage of those benefits is a topic for another blog post.