Types of testing
Everyone seems to have their own definitions of the terms regression, integration and unit testing. I’ve seen it cause many spirited discussions (including some I’ve been at the centre of). A lot of the time, the disagreements are merely around nomenclature, and so are unimportant. Sometimes, however, there are subtle and important points of difference between these terms that can influence decisions about what kind of test suite you should be building.
I don’t pretend to have a canonical definition of these terms; feel free to disagree with the following. However, I would like to describe the different types of testing (particularly automated testing), how they fit together, and their general uses.
Let’s start at the top, and work our way down…
Regression testing is also known as end-to-end testing, or sometimes integration testing (the first example of a contradictory definition). This is where you test as much of your system as possible via the public interface. For end-user interfaces (e.g. web applications), this can be quite difficult, and often this sort of testing is done manually. However, there are frameworks like Selenium emerging to attempt to tackle this problem in an automated way. If your system has a machine-only interface (e.g. XML or JSON over HTTP), automated regression testing is a lot easier.
The big advantage of this kind of testing is that you are testing a lot. The project I’m currently working on even spins up a set of virtual servers, re-deploys the entire application and starts up its various components, just as it would in the production environment. This is awesome for catching those little problems you don’t want to see during a go-live day (like file permission problems).
The downside is these kinds of tests tend to be very fragile. Even if nothing’s changed in the code base, things can break randomly. There are so many different variables involved that something is bound to break once in a while. Maybe someone left themselves logged in, and the database couldn’t be rebuilt. Maybe the server’s a bit slow, and some timeout starts getting hit. Maybe there was a small, cosmetic change to the UI which inadvertently breaks a lot. It’s no fun spending all day chasing false positives, but if you’re smart, you can massage your test harness to account for a lot of these issues, or at least to fail quickly with a good error message.
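To make "fail quickly with a good error message" a bit more concrete, here is a minimal Python sketch of the kind of helper a regression harness might use. The names (wait_until, EnvironmentNotReady) and the default timings are my own inventions for illustration, not part of any particular framework.

```python
import time


class EnvironmentNotReady(Exception):
    """Raised when a precondition for the regression run never becomes true."""


def wait_until(check, description, timeout=5.0, interval=0.1):
    """Poll check() until it returns True, or fail with a readable message.

    check       -- zero-argument callable returning True when the system is ready
    description -- human-readable text naming what we are waiting for
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            # Name the thing that never happened, so nobody has to dig
            # through logs to work out why the run stopped.
            raise EnvironmentNotReady(
                f"Timed out after {timeout}s waiting for: {description}"
            )
        time.sleep(interval)
```

A harness could call something like wait_until(database_is_rebuilt, "test database rebuild") before any test runs, where database_is_rebuilt is whatever check suits your environment. A slow server or a stuck login then produces one clear failure rather than a page of unrelated ones.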
Another problem is that these tests are slow. That means you won’t run them very often. Maybe only once a day. And you certainly don’t want this as your only verification that a particular change works, and hasn’t broken everything (or even a few things). Developers could waste days making a speculative fix, kicking off the regression tests, waiting to see the results, then attempting to fix up the new problems they’ve introduced. For that we need to go down a level…
Integration testing is in the middle of the three kinds I’m going to talk about, so it looks a little different depending on which side you’re viewing it from. If regression tests are testing the whole system (or as much as possible), integration tests are obviously testing a smaller sub-set of the system. That might be how two specific components of the system integrate with each other. That might be how parts of the application integrate with the database. That might simply be how two particular classes integrate with each other.
The key is, we’re not testing the whole system, but we’re not yet at the smallest possible unit. As you can see, that’s a somewhat vague definition. And this is where we can get into debates over nomenclature. For instance, to some people, it’s not an integration test if you mock some part of the system. For example, if you’re testing how some classes integrate with each other, but some of them call to the database, you might want to fake the calls to the database. This makes the tests less fragile – they don’t require a database to be set up in a particular known state to run correctly, and so they won’t fail if that setup breaks for some reason. It also removes variables from your tests – if a particular test isn’t about database functionality, it makes sense to remove that from the equation, and test only the things you’re interested in.
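As a rough sketch of what faking the database can look like, here is a small Python example. All of the classes (InMemoryUserStore, UserService, WelcomeNotifier) are hypothetical and exist only for illustration. The two real classes are wired together as they would be in production, and only the database-backed store is swapped for an in-memory stand-in.

```python
class InMemoryUserStore:
    """Hypothetical stand-in for a database-backed store."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def fetch_name(self, user_id):
        return self._rows.get(user_id)


class UserService:
    """Real class under test: cleans up names read from the store."""

    def __init__(self, store):
        self._store = store

    def display_name(self, user_id):
        name = self._store.fetch_name(user_id)
        if name is None:
            raise KeyError(f"unknown user {user_id}")
        return name.strip().title()


class WelcomeNotifier:
    """Real class under test: builds a message using UserService."""

    def __init__(self, user_service):
        self._users = user_service

    def message_for(self, user_id):
        return f"Welcome back, {self._users.display_name(user_id)}!"


def test_notifier_and_service_work_together():
    # Only the store is fake; the two classes integrate for real.
    store = InMemoryUserStore({1: "  ada lovelace "})
    notifier = WelcomeNotifier(UserService(store))
    assert notifier.message_for(1) == "Welcome back, Ada Lovelace!"
```

Whether you call this an integration test or not, it needs no database to be in a known state, so it can only fail because of the two classes it is actually about.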
This distinction between integration vs unit testing (which I’ll describe in a minute) is more than just pedantry. Often, integration tests are run separately to unit tests, specifically because they are more fragile (and take longer to run) than unit tests. In an ideal world (somewhere I’ve never been) a developer would have a fast running, stable set of tests she can run before she checks in a change. If those tests run slowly and she has to spend an hour chasing false positives afterwards, her productivity is going to suffer hugely. Or more likely, she won’t bother to run the tests at all.
But the important distinction is not whether a test is integration or unit, but whether it’s potentially fragile or slow-running. And in practice, that’s the better way to separate your tests.
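One way to separate tests along those lines, sketched here with Python's standard unittest module, is to tag the slow or fragile ones and leave them out of the default run. The RUN_SLOW_TESTS environment variable and the test classes are made up for this example; your own setup will have its own switch.

```python
import io
import os
import unittest

# Assumption: slow or fragile tests only run when this variable is set to "1".
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"
slow = unittest.skipUnless(RUN_SLOW, "slow or fragile; set RUN_SLOW_TESTS=1 to run")


class PriceMathTest(unittest.TestCase):
    """Fast and stable: safe to run before every check-in."""

    def test_total(self):
        self.assertEqual(sum([2, 3]), 5)


@slow
class FullDeployTest(unittest.TestCase):
    """Placeholder for a test that would need a deployed environment."""

    def test_homepage_loads(self):
        self.assertTrue(True)  # a real test would talk to the running system


def run_all():
    """Run both test classes and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (PriceMathTest, FullDeployTest):
        suite.addTests(loader.loadTestsFromTestCase(case))
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)
```

Note that the tag says nothing about whether a test is "integration" or "unit". It records only the property that matters for deciding when to run it.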
If integration tests can possibly use mocks, unit tests almost always do. A unit test tests one method of one class or module. Besides simple methods (or perhaps heavily computational methods) that don’t make calls to any other methods (including ones in its own class), you’ll have to mock something to make it a true unit test.
The big advantage of unit tests is they avoid all the problems of regression tests and some integration tests. They run quickly and they don’t break easily. But, as much as we’d like them to be, unit tests can never be the full picture. They test the smallest part of the system possible. So instead of worrying about false positives, false negatives become a problem. All your unit tests pass, you try to release to production, only to find out DNS wasn’t set up properly. Your client wonders how you can have 90% test coverage but miss something so obvious. Or an even more likely scenario is you have two classes, beautifully written, with perfect interfaces and 100% unit test coverage. But one calls the other using the incorrect parameters.
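The two-classes scenario is easy to reproduce. Here is a small Python sketch using unittest.mock from the standard library. RateTable and Invoice are hypothetical classes invented for the example. The first test passes against a plain mock even though the real call would blow up; the second uses create_autospec, which checks calls against the real method signature and so exposes the mismatch.

```python
from unittest import mock


class RateTable:
    def rate_for(self, country, year):
        return 0.2 if country == "NZ" else 0.1


class Invoice:
    def __init__(self, rates):
        self._rates = rates

    def tax(self, amount, country):
        # Bug: rate_for() also needs a year, but only the country is passed.
        return amount * self._rates.rate_for(country)


def unit_test_with_plain_mock():
    # A plain Mock accepts any arguments, so the bug goes unnoticed.
    rates = mock.Mock()
    rates.rate_for.return_value = 0.5
    assert Invoice(rates).tax(100, "NZ") == 50.0
    rates.rate_for.assert_called_once_with("NZ")


def unit_test_with_autospec():
    # create_autospec copies RateTable's signatures, so the bad call
    # raises TypeError here, just as it would against the real class.
    rates = mock.create_autospec(RateTable, instance=True)
    rates.rate_for.return_value = 0.5
    Invoice(rates).tax(100, "NZ")
```

Spec'd mocks narrow the gap, but they are no substitute for at least one test in which the two real classes actually talk to each other.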
Other than this obvious problem, some languages can make certain kinds of unit testing or mocking difficult. Perl doesn’t tend to suffer from this, as it doesn’t really put any limits on the programmer (although that can come with its own risks). Languages such as Java can be a bit more restrictive, although clever people have found a way around most of those issues (e.g. simply adding your tests to a class’s namespace so you can unit test private methods).
A bigger reason not to unit test is sometimes it simply doesn’t make sense to do so. A common example is database access layers, which do little but connect to the database, and perhaps contain a little data-level logic. You can spend quite a lot of time creating a framework that mocks away the database, but then find the tests catch very few real bugs. Instead, it might make sense to test these only with integration tests, but do so in a way that makes them fairly robust, so they can still be run with the rest of the unit tests (I blogged about my approach for doing this in Perl a while ago).
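To give a flavour of a robust integration test for a data access layer, here is a sketch in Python using an in-memory SQLite database. This is not the Perl approach mentioned above, just one possible illustration; UserDao and the users table are made up. Each test builds its own throwaway database, so nothing depends on shared state left over from an earlier run.

```python
import sqlite3


class UserDao:
    """Hypothetical data access class: little more than SQL plus a bit of logic."""

    def __init__(self, conn):
        self._conn = conn

    def add(self, name):
        cur = self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def names(self):
        rows = self._conn.execute("SELECT name FROM users ORDER BY name")
        return [row[0] for row in rows]


def fresh_connection():
    """Create a private in-memory database with the schema the DAO expects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def test_names_are_sorted():
    dao = UserDao(fresh_connection())
    dao.add("zoe")
    dao.add("abe")
    assert dao.names() == ["abe", "zoe"]
```

The obvious trade-off is that SQLite may not behave exactly like your production database, so this works best when the SQL involved is fairly plain.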
There are of course several other types of testing. For instance, component testing – if your architecture is broken up into many components, each of which has an interface that can be tested (a very good idea!), you could have a set of tests just for that component. This is kind of an intermediate type between integration and regression testing.
And I’m not even going to mention load, performance or stress testing (and probably several others I don’t even know about). Apart from anything, this post is already long enough.
So which is the most important type of testing?
It’s tempting to say “all of them”. But, of course, we (usually) have limited resources, and can’t always do everything we want to. You could make a reasonable argument that if you had good unit test coverage (and some non-fragile integration tests to make sure classes talk to each other properly), and a good suite of regression tests, the integration tests become less important, because the application is more or less covered. That’s all well and good if you can get a stable enough set of regression tests. I’ve rarely seen this with a UI-based app. I’m sure it’s possible, just not easy.
Other important lessons at least tangentially related to the above:
- Be clear about what you’re testing – each test should be designed to test something in particular, and should include as much or as little of the system as is necessary to achieve that aim. Thinking about which type of testing you’re trying to do can help with that.
- Be clear about what you’re not testing – equally important, remember which parts of the system you’re missing out. Are they covered somewhere else?
- Think about the fragility of your tests – much more important than getting the terms right. Tests sometimes need to be fragile, but make sure you know about that, and do your best to improve them. A test suite is worthless if you have no confidence in it because it continuously throws false positives.