The OnePub Dart Side Blog - The myths and principles of Testing

All this testing is testing my patience. Yes, you can actually over-test an application.

In this week's Dart Side Blog post we are going to discuss not how to test, but what and when to test. We are also going to look at some of the common misconceptions around testing.

Whether you are working for a startup, a large organization or anywhere in the middle, you are going to have to make trade-offs all the way through the development life cycle of your Flutter app.

  • Do we add a new feature or do a security review?

  • Do we write more tests or push the beta out?

A core element of a developer’s job is to understand where and when to apply those trade-offs.

The right answer to any trade-off will depend on factors such as:

  • The target audience

  • The phase of the development cycle (beta, production…)

  • The sensitivity of the data

  • The legislative environment

  • Your employer’s appetite for risk

  • The complexity of the code

Take two extreme examples:

  • A solitaire app

  • A banking app

Now ask the question: Do we add a new feature or do we do a security review?

I think you will agree, we get two quite different answers.

Testing is no different: we need to make trade-offs between features, security and testing. In fact, we need to make trade-offs as to what we test and when we test.

The Test Driven Development (TDD) myth

Let’s start by putting this one to bed, or perhaps a little more graphically, take the dog out and shoot it.

TDD is one of those ideas that looks interesting but is completely misguided. TDD assumes that you know what your code is going to look like ahead of time.

This is a pipe dream, particularly with UI development, which can take some really nasty twists and turns. The whole reason state management is such a big deal in Flutter is that it is so hard to get all of our widgets talking to each other.

During the early days of a project, the shape of our code changes almost daily. We will go through multiple refactoring cycles as the code tells us how it needs to be structured.

System design is like a battle plan, and no battle plan survives engagement with the enemy.
The enemy, in this case, is the Flutter framework.

Believing that you know the shape of your code is like believing in the creature that lives in that mythical waterfall that we all abandoned.

TDD simply doesn't fit into a world where we need to be (dare I use the term) agile - where planning is short-term and change is constant.

Tests are by nature brittle and intrusive, and TDD dictates that we must constantly attend to the suite of unit tests rather than focus on building code.

Maintaining test suites is a critical process, but it isn't critical in the early stages of an application's life cycle and is actually detrimental to getting that MVP (minimum viable product) out the door.

Having said all of that, there are exceptions, which we discuss below.

The myth of 100% code coverage.

I’ve often seen posts from developers bragging about achieving 100% code coverage and then receiving resounding compliments for a job well done.

The truth is that code coverage is a meaningless metric and trying to achieve 100% code coverage is a waste of resources.

If 100% code coverage is to have some meaning then you would expect that in some way it relates to the level of testing required to validate an application.

Let’s have a look at a little example:

import 'package:test/test.dart';

/// Sums only the arguments that are greater than zero.
int addPositives(int a, int b, int c, int d) {
  var sum = 0;
  if (a > 0) sum += a;
  if (b > 0) sum += b;
  if (c > 0) sum += c;
  if (d > 0) sum += d;
  return sum;
}

void main() {
  test('addPositives', () {
    expect(addPositives(1, 2, 3, 4), 10);
  });
}

The above unit test provides 100% code coverage for the addPositives method.

Yet if we look at the addPositives method, there are actually 16 paths (2^4) through the code, so we need to perform 16 tests to ensure that we have tested every path. Every additional conditional statement in your code doubles the number of test paths (2^n paths for n conditionals).
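
To make the combinatorics concrete, here is a rough sketch (reusing the addPositives function above) of what genuine path coverage would look like - one test case per sign combination of the four arguments:

import 'package:test/test.dart';

// A sketch of exhaustive path coverage for addPositives: one case per
// sign combination of the four arguments (2^4 = 16 paths). The expected
// value is simply the sum of whichever inputs are positive.
void main() {
  test('addPositives - all 16 paths', () {
    for (var mask = 0; mask < 16; mask++) {
      final args = List<int>.generate(
          4, (i) => ((mask >> i) & 1) == 1 ? i + 1 : -(i + 1));
      final expected =
          args.where((v) => v > 0).fold<int>(0, (a, b) => a + b);
      expect(addPositives(args[0], args[1], args[2], args[3]), expected);
    }
  });
}

Sixteen cases are still manageable; the problem is how quickly that number grows.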

It is fairly easy to see that code coverage bears no relation to the actual number of tests required to provide comprehensive coverage of your application.

The reality is that it is impossible to write enough tests to provide complete 'path' coverage of your app. It only takes 32 conditional statements in your app to require over 4 billion unit tests, and I need them written by Tuesday.

So yes, you are going to have to make some choices as to what you test and what you don’t test. Using code coverage as a metric drives you to write tests for code that doesn’t need to be tested, leaving less time to write the important tests. To be clear, when I say ‘doesn’t need to be tested’, I mean that it doesn’t need a unit, widget or integration test, but more later.

Throw away code coverage as a metric; it's meaningless and, worse, misleading.

This piece of Rubber is not like the others.

Lie back, close your eyes and let me take you on a road trip across outback Australia.

OK, maybe don’t close your eyes… are you still there? Oh good, we can continue.

Imagine if you will, we are doing the road trip between Australia's two largest cities, Melbourne and Sydney (population 10 million combined).

800 km (500 miles) of straight black bitumen (except when it's red), with little to see and little to do except avoid the occasional roo (the colloquial term for kangaroo) - because the buggers are none too bright - whilst trying not to be squished by the all too frequent road trains (a semi/tractor hauling three trailers, 50 m (165 ft) long).

So your mind looks for things to entertain it.

It’s at this point that you notice all the bits of rubber covering the road. On closer inspection, you realize that they are bits of truck tires. Now truck tires are expensive so you don’t expect them to be scattered all across the road. But it turns out that the fact that they are expensive is why we see them scattered all across the road.

When a truck tire goes bald, they don’t just throw it away, they add a new layer of rubber (with treads) to the old tire and sell it as a ‘retread’. As these retreads age, they tend to throw off the new outer layer leaving bits of rubber littering the blacktop.

So two developers are sitting in this car, and developers like patterns, so it doesn't take long to notice there is a pattern to the placement of the rubber.

With rubber being thrown from the tires of a fast-moving truck, you would expect the rubber to exhibit a largely random distribution across the road.

It doesn’t. There is far more rubber on the side of the road and even the rubber on the road is not spread evenly.

So we spend an easy 100 km looking for patterns, and they quickly become apparent.

Excluding the bumps in the road caused by the rotting corpses of roadkill, the maimed bodies of Dropbear victims (I don’t want to talk about it) and the occasional pothole, the distribution of rubber shows a very clear pattern.

Picture a graph with energy on the y-axis and the breadth of the road on the x-axis.

The energy peaks are caused by the passage of the wheels of vehicles. Any rubber that lies in a high-energy area is likely to be hit by a wheel. When hit by a wheel it is likely to be moved in a random direction. If it moves to another high-energy area it is likely to be hit again, if it moves to a low-energy area it is less likely to be hit.

Research has shown that potholes in the road create local minima, until sufficiently filled with road kill.

By this process, over time, all of the rubber migrates to the edge of the road where it has a low probability of being hit; unless some farmer comes along with a slasher and the rubber is once again thrown onto the road.

We can essentially say that the rubber will move down the energy gradient to the lowest energy state.

Code is like rubber. Some code exists in a high-energy state and some code exists in a low-energy state. When writing tests, we want to identify and target the code that exists in a high-energy state.

You can think of tests as helping to move code to a lower energy state.

But what do we mean when we say code is in a high-energy state? Essentially, I’m referring to the risk of the code causing a boo-boo.

The more likely the code is to cause a boo-boo that the customer cares about, the higher the energy state the code is in.

So let’s define a new metric, which I like to call the ‘boo-boo index’.
We can roughly define the boo-boo index as how likely the code is to cause us to lose our job.

So testing reduces the boo-boo index of the tested code because it reduces the chance that the piece of code will cause us to lose our job.

But we are professionals here, and boo-boo index sounds a bit childish, so let's instead use BBI when we are talking to our boss.

Now you might say that this definition means I should write tests for every path; alas, late delivery of a project increases the BBI of the entire code base, hence the need for trade-offs.

We have already alluded to the factors that define the BBI.

But let’s look at them in some more detail:

  • The phase of the development cycle (beta, production…)

  • The target audience

  • The sensitivity of the data being held

  • The legislative environment

  • Your employer's appetite for risk

  • The type of application

  • The complexity of the code

  • Time to delivery

Development Cycle

Now rubber on the road can be a hazard. If a truck passes our car, kicking up rubber, the momentum of the rubber is high enough that it could break a windscreen.

But these roads are long and the traffic interactions are few, so the chance of a smashed windscreen is low. In urban areas, the rubber poses a much greater risk as the number of vehicle interactions is much higher. You can think of rural roads as your beta environment and urban areas as your production environment.

During beta, the danger posed by high energy code is much less and therefore the need to provide tests is much lower. Essentially the code has a lower BBI during beta testing.

There is a very strong argument to minimize tests during the early development stages as they essentially slow development down with a low payback. Beta testers are generally very tolerant of bugs (in fact some love to find bugs for you).

Bugs during beta have a lower BBI, which tells us that we can delay much of our test suite development until late in the beta or early production.

But you need to know your customers and perhaps more importantly how they value their data.

In the development of OnePub we knew that our beta testers were very likely to upload real code. A security breach, even during the beta, could be lethal to the business.

So we had a very high BBI in our security subsystem (even during beta). Unit and integration testing helped us reduce the BBI of these critical pieces of code.
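
To make that concrete, here is a purely hypothetical sketch (not OnePub's actual code) of the kind of high-BBI, security-critical check that earns a unit test even during beta:

import 'package:test/test.dart';

// Hypothetical security-critical check (not OnePub's actual code):
// a token is valid only if it has not expired and its scope matches.
bool isTokenValid({
  required DateTime expiresAt,
  required String requiredScope,
  required Set<String> grantedScopes,
  DateTime? now,
}) {
  final current = now ?? DateTime.now();
  return current.isBefore(expiresAt) && grantedScopes.contains(requiredScope);
}

void main() {
  test('expired tokens are rejected even with the right scope', () {
    expect(
      isTokenValid(
        expiresAt: DateTime(2024, 1, 1),
        requiredScope: 'publish',
        grantedScopes: {'publish'},
        now: DateTime(2024, 6, 1),
      ),
      isFalse,
    );
  });
}

A handful of tests like this cost little and take a lot of energy out of the scariest code paths.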

As your code moves to production you will want to increase the number of unit tests. An early MVP is still going through a lot of changes, so try to restrict your tests to critical pieces of code. As the product matures you can add more tests which increases your deployment confidence. Over time, tests become an advantage rather than a burden. As always you need to understand your environment to judge that curve.

Target audience

It is really important to understand your customer. A user playing solitaire does not yield the same BBI as a driver relying on our autopilot software.

If you have a risk-averse customer, you are going to need to implement more tests earlier in the development phase.

Data sensitivity

If we are talking about the solitaire high score stored on the customer’s local device, I don’t think we need to do a security review and unit tests in the beta phase are broadly meaningless (with exceptions we talk about below).

If you are storing credit card details, then you are operating in a completely different environment and you may need to heavily encase your credit card processing and storage logic in test code.
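
As a hedged illustration (the maskCardNumber helper below is hypothetical, not part of any real payment library), this is the kind of test you might wrap around card-handling code long before general test coverage becomes a priority:

import 'package:test/test.dart';

// Hypothetical helper: keeps only the last four digits of a card number
// for display or logging, so the full number is never exposed.
String maskCardNumber(String pan) {
  final digits = pan.replaceAll(RegExp(r'\D'), '');
  if (digits.length < 4) {
    throw ArgumentError('not a card number');
  }
  return digits.substring(digits.length - 4).padLeft(digits.length, '*');
}

void main() {
  test('card numbers are never exposed in full', () {
    expect(maskCardNumber('4111 1111 1111 1234'), '************1234');
    expect(() => maskCardNumber('123'), throwsArgumentError);
  });
}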

The legislative environment

But I’m just a developer?

Tell that to James Liang, the engineer at the center of Volkswagen's Dieselgate, who was sentenced to 40 months in prison for his part in the fraud.

Many (all?) jurisdictions won't accept a lack of knowledge of the law as an excuse. If you have concerns about possible legal ramifications, raise them officially up the command chain - in writing - and write more tests.

Your employer's appetite for risk

There is a big difference between working for a game developer and building the SLS launch software.

Understand your employer's appetite for risk. This will largely be a function of their customer base and the phase of development, but don't be deaf to the way your employer reacts to boo-boos.

You may need to talk to management about allocating additional resources to writing tests if they have a low appetite for risk. Open a communications channel and let them know your concerns (in writing). Make certain they understand the trade-off: more testing resources, or more boo-boos.

Type of application

Low-risk apps need less testing than high-risk apps. In low-risk apps, tests become more about improving the deployment confidence and can be left until later in the development life cycle.

The risk level of the app is an emergent factor of each of the other issues that make up the BBI.
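
As a purely illustrative sketch (the factor names and weights below are invented for this post, not a real metric), you could imagine roughing out a BBI score to decide where tests pay off first:

// A toy BBI heuristic. The factors and weights are invented for this
// sketch - the point is the trade-off thinking, not the numbers.
class BbiFactors {
  final int dataSensitivity;    // 0 (local high scores) .. 5 (credit cards)
  final int audienceRisk;       // 0 (casual gamers) .. 5 (safety critical)
  final int codeComplexity;     // 0 (one conditional) .. 5 (parser-grade)
  final int productionMaturity; // 0 (early beta) .. 5 (long-lived production)

  const BbiFactors(this.dataSensitivity, this.audienceRisk,
      this.codeComplexity, this.productionMaturity);

  int get score =>
      dataSensitivity * 3 + audienceRisk * 3 + codeComplexity * 2 + productionMaturity;
}

void main() {
  const solitaireHighScores = BbiFactors(0, 0, 1, 1);
  const cardProcessing = BbiFactors(5, 4, 4, 3);

  // Spend your testing budget on the high scorers first.
  print('solitaire: ${solitaireHighScores.score}'); // 3
  print('cards: ${cardProcessing.score}');          // 38
}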

Complexity

This is an important one, and it breaks all of the rules - because there are no rules, only principles, which you should break all the time.

If you asked me what architectures, methodologies or design patterns I use, I would say: 'All of them, as little as possible'.

Some problem domains benefit from the TDD methodology.

Functions that parse data are a classic example. I'm the author of the money2 package, which parses and formats money amounts. The parser is quite complex and really easy to break, and the only way I can have any confidence that the code is working is to have a large number of unit tests. Not quite per the TDD methodology, I developed these methods and the associated unit tests together, and I add more tests as edge cases are discovered.
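
As a hedged illustration (parseMinorUnits below is a made-up toy, not money2's actual API), this is the sort of edge-case suite a parser inevitably accumulates:

import 'package:test/test.dart';

// Hypothetical parser: converts a string like '1,234.56' into minor units
// (cents). Not the money2 API - just a sketch of why parsers need many tests.
int parseMinorUnits(String input) {
  final cleaned = input.replaceAll(',', '').trim();
  final negative = cleaned.startsWith('-');
  final digits = negative ? cleaned.substring(1) : cleaned;
  final parts = digits.split('.');
  final major = int.parse(parts[0].isEmpty ? '0' : parts[0]);
  final minor =
      parts.length > 1 ? int.parse(parts[1].padRight(2, '0').substring(0, 2)) : 0;
  final result = major * 100 + minor;
  return negative ? -result : result;
}

void main() {
  test('parseMinorUnits edge cases', () {
    expect(parseMinorUnits('1,234.56'), 123456);
    expect(parseMinorUnits('0.5'), 50);    // single decimal digit
    expect(parseMinorUnits('.99'), 99);    // no leading zero
    expect(parseMinorUnits('-10'), -1000); // negative, no decimals
  });
}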

On the other hand, a function with a single conditional statement can be tested by either running up the app and making certain it works or stepping through it a couple of times with the debugger to observe its behaviour.

You can identify functions that are low on the BBI by the following features:

  • a small number of lines of code

  • a small number of conditional statements

  • a low number of calls to other low-complexity functions

  • naturally tested by running the app

  • used in a low-risk area of the code base

Timeline

Late delivery increases the BBI for the entire code base. Don’t let perfection get in the way of delivery.

The easiest way to lose your job is to not deliver.

Choose your trade-offs carefully, and push back on feature creep.

If your boss requests a new feature, ask them: does the app deliver value without this feature?

If the answer is yes, then the feature should wait until after the initial release.

Conclusion

Testing your code is important, but during the early stages of development it is more cost-effective to do manual testing. Understand your environment and adapt the quantity of tests you write to suit it.

  • Throw out TDD (largely) and ignore useless code coverage metrics.

  • Don't write tests for low boo-boo functions; they just sap your energy and provide no value.

  • Areas of code with a high BBI may require tests to be written in the early development stages.

  • Focus your testing where it yields the most value.

  • You don't always need to write a unit test when you fix a bug. If the code has a low BBI and you are convinced running it through the debugger is adequate, then leave it alone. If, instead, you realize that the code is high on the BBI, then it's time to write some tests.

Testing your code is critical to publishing a successful Flutter app. However, 'publishing' is also a critical step in delivering a successful Flutter app.

To publish an app, we need to make a series of trade-offs. It's your job to work out which are the right trade-offs.

Hopefully this blog gives you some principles to work with in making those decisions.

Happy coding.

The Pragmatic Programmer.