Everyone seems to have their own definitions of the terms regression, integration and unit testing. I’ve seen it cause many
arguments spirited discussions (including some I’ve been at the centre of). A lot of the time, the disagreements are merely around nomenclature, and so are unimportant. Sometimes, however, there are subtle and important points of difference between these terms that can influence decisions about what kind of test suite you should be building.
I don’t pretend to have a canonical definition of these terms; feel free to disagree with the following. However, I would like to describe the different types of testing (particularly automated testing), how they fit together, and their general uses.
Let’s start at the top, and work our way down…
Also known as end-to-end testing, or sometimes integration testing (the first example of a contradictory definition). This is where you test as much of your system as possible via the public interface. For end-user interfaces (e.g. web applications), this can be quite difficult, and often this sort of testing is done manually. However, there are frameworks like Selenium emerging to attempt to tackle this problem in an automated way. If your system has a machine-only interface (e.g. XML or JSON over HTTP) automated regression testing is a lot easier.
The big advantage of this kind of testing is that you are testing a lot. The project I’m currently working on even spins up a set of virtual servers, re-deploys the entire application and starts up its various components, just as it would in the production environment. This is awesome for catching those little problems you don’t want to see during a go-live day (like file permission problems).
The downside is these kinds of tests tend to be very fragile. Even if nothing’s changed in the code base, things can break randomly. There are so many different variables involved that something is bound to break once in a while. Maybe someone left themselves logged in, and the database couldn’t be rebuilt. Maybe the server’s a bit slow, and some timeout starts getting hit. Maybe there was a small, cosmetic change to the UI which inadvertently breaks a lot. It’s no fun spending all day chasing false positives, but if you’re smart, you can massage your test harness to account for a lot of these issues, or at least to fail quickly with a good error message.
Another problem is that these tests are slow. That means you won’t run them very often. Maybe only once a day. And you certainly don’t want this as your only verification that a particular change works, and hasn’t broken everything (or even a few things). Developers could waste days making a speculative fix, kicking off the regression tests, waiting to see the results, then attempting to fix up the new problems they’ve introduced. For that we need to go down a level…
This kind of testing is in the middle of the three I’m going to talk about, so looks a little different depending on which side you’re viewing it from. If regression tests are testing the whole system (or as much as possible), integration tests are obviously testing a smaller sub-set of the system. That might be how two specific components of the system integrate with each other. That might be how parts of the application integrate with the database. That might simply be how two particular classes integrate with each other.
The key is, we’re not testing the whole system, but we’re not yet at the smallest possible unit. As you can see, that’s a somewhat vague definition. And this is where we can get into debates over nomenclature. For instance, to some people, it’s not an integration test if you mock some part of the system. For example, if you’re testing how some classes integrate with each other, but some of them call to the database, you might want to fake the calls to the database. This makes the tests less fragile – they don’t require a database to be setup in a particular known state to run correctly, and so they won’t fail if that setup breaks for some reason. It also removes variables from your tests – if a particular test isn’t about database functionality, it makes sense to remove that from the equation, and test only the things you’re interested in.
This distinction between integration vs unit testing (which I’ll describe in a minute) is more than just pedantry. Often, integration tests are run separately to unit tests, specifically because they are more fragile (and take longer to run) than unit tests. In an ideal world (somewhere I’ve never been) a developer would have a fast running, stable set of tests she can run before she checks in a change. If those tests run slowly and she has to spend an hour chasing false positives afterwards, her productivity is going to suffer hugely. Or more likely, she won’t bother to run the tests at all.
But the important distinction is not whether a test is integration or unit, but whether it’s potentially fragile or slow-running. And in practice, that’s the better way to separate your tests.
If integration tests can possibly use mocks, unit tests almost always do. A unit test tests one method of one class or module. Besides simple methods (or perhaps heavily computational methods) that don’t make calls to any other methods (including ones in it’s own class), you’ll have to mock something to make it a true unit test.
The big advantage of unit tests is they avoid all the problems of regression tests and some integration tests. They run quickly and they don’t break easily. But, as much as we’d like it, unit tests can never be the full picture. They test the smallest part of the system possible. So instead of worrying about false positives, false negatives become a problem. All your unit tests pass, you try to release to production, only to find out DNS wasn’t setup properly. Your client wonders how you can have 90% test coverage but miss something so obvious. Or an even more likely scenario is you have two classes, beautifully written, with perfect interfaces and 100% unit test coverage. But one calls the other using the incorrect parameters.
Other than this obvious problem, some languages can make certain kinds of unit testing or mocking difficult. Perl doesn’t tend to suffer from this, as it doesn’t really put any limits on the programmer (although that can come with its own risks). Languages such as Java can be a bit more restrictive, although clever people have found a way around most of those issues (e.g. simply adding your tests to a class’s namespace so you can unit test private methods).
A bigger reason not to unit test is sometimes it simply doesn’t make sense to do so. A common example is database access layers, which do little but connect to the database, and perhaps contain a little data-level logic. You can spend quite a lot of time creating a framework that mocks away the database, but then find the tests catch very few real bugs. Instead, it might make sense to test these only with integration tests, but do so in a way that makes them fairly robust, so they can still be run with the rest of the unit tests (I blogged about my approach for doing this in Perl a while ago).
There are of course several other types of testing. For instance, component testing – if your architecture is broken up into many components, each of which have an interface that can be tested (a very good idea!), you could have a set of tests just for that component. This is kind of an intermediate type between integration and regression testing.
And I’m not even going to mention load, performance or stress testing (and probably several I others don’t even know about). Apart from anything, this post is already long enough 🙂
So which is the most important type of testing?
It’s tempting to say “all of them”. But, of course, we (usually) have limited resources, and can’t always do everything we want to. You could make a reasonable argument that if you had good unit test coverage (and some non-fragile integration tests to make sure classes talk to each other properly), and a good suite of regression tests, the integration tests become less important, because the application is more or less covered. That’s all well and good if you can get a stable enough set of regression tests. I’ve rarely seen this with a UI-based app. I’m sure it’s possible, just not easy.
Other important lessons at least tangentially related to the above:
- Be clear about what you’re testing – each test should be designed to test something in particular, and should include as much or as little as the system as is necessary to achieve that aim. Thinking about which type of testing you’re trying to do can help with that
- Be clear about what you’re not testing – equally important, remember which parts of the system you’re missing out. Are they covered somewhere else?
- Think about the fragility of your tests – much more important than getting the terms right. Tests sometimes need to be fragile, but make sure you know about that, and do you best to improve them. A test suite is worthless if you have no confidence in it because it continuously throws false positives.
Recently, I’ve been coming across a few Perl projects (both at work, and in the wild) that are reasonably large and complex, have test suites, but don’t use Test::Class. And I really struggle to understand why. Most of them have decent test suites, at least in terms of coverage. But, personally, I have a hard time looking at the test files and understanding exactly what’s going on.
Most other widely used languages have at least one class-based testing framework that is the de-facto standard, usually based around xUnit. I find there are several benefits to using a framework like Test::Class:
- As I mentioned, it makes your tests more readable by breaking them up into methods (I blogged a while ago about breaking them up even further for maximum readability). This ensures you’re clear about what you’re testing in every test, and you don’t get a really complex setup, used by a whole lot of different tests. Not only is that difficult to read, but it quickly becomes complex to maintain, and the tests become fragile, throwing you false negatives.
- Makes code-reuse easier, with inheritance (ok, so we all use roles now, but inheritance is still better than nothing).
- Makes it easy to setup fixtures by storing them in $self, using setup, teardown, etc. methods, all of which encourages more code-reuse
- Helps with test driven development, by allowing you two map one application class to one test class, or set of classes where necessary. You can do it with a bunch of .t scripts as well, I guess, but I usually find it easier with classes.
- And many more…!
We like to have a lot of options in Perl, and I’m fine with that, but you usually should have a good reason for deviating from the norm. So is Test::Class the norm (or did I just get a bad sample)? If it’s not the standard, why isn’t it? It seems to have a pretty thin set of dependencies, is stable and mature code, and fits right in with Test::Builder and friends.
In my opinion, you should use Test::Class unless you have a damn good reason not to!
I’ve noticed three progressive stages that I’ve moved through in learning to write tests. They may not be the same for everyone, but I suspect there are some people out there who can relate. At any rate, the following stages have been my experience. By the way, I’m really talking about unit / integration testing, rather than functional testing, or (since those terms can be quite muddy) about testing done by programmers rather than testers.
Stage 1: Testing because you have to
To begin with, a lot of people write tests because their place of work says they have to. For others (like me) it may be because a lot of people online keep going on about it. In my case, this also lead me to attempt to introduce it as a standard at work, even though I was just starting out with it myself.
Often during this stage, testing can be a bit of a chore. Part of the reason may be because you don’t really (deeply) understand what you’re doing, and so you make a lot of mistakes. Your tests are just plain wrong. For example, you might mock too much or too little, making your tests fragile.
But this is a natural learning process to me. There’s certainly nothing wrong with adopting something simply because most of the industry says it’s a best practice, especially when you don’t know any better. Once you’ve mastered it yourself, you’re then in a position to decide whether to retain it, or to tweak the process. It’s much better to adopt something you don’t understand than to ignore it for the same reason.
Stage 2: Testing to spot regressions
Once you start writing tests in an automated and repeatable manner, one of the first benefits you notice is indication of regression. If you refactor a method, you can be sure it still behaves in the same way if you have some test coverage. Additionally, you might also get tests complaining if you make a change to one class, and it breaks another (assuming you haven’t mocked out too much, or have integration as well as unit tests).
This is where some sort of continuous integration starts to become important (i.e. running the tests frequently – preferably on every checkin), especially in teams with more than a handful of developers, because the tighter your feedback loop becomes, the easier it is to track down the cause of the issue.
Stage 3: Testing to see if your code works
I started programming around the age of 10. While it was obviously a long time before I did much professionally, I’ve still always needed to see if the code I’ve written actually works. Until a couple of years ago, I’d usually achieve this by somehow running the program. This makes sense for something simple, but anything more complex can lead to the need for complicated (and often ad-hoc) testing rigs, or large amounts of time trying to get the application into the desired state.
Nowadays I often simply write a test. It’s a subtle change, but my brain process is not “I should write a test because it’s a best practice”, or “I should write a test in case something breaks this later”, but “I should write a test to see if this bit of code actually does what I want it to do”. I guess it took a long time to break down those years of habitually trying to run the code to see if it worked.
This stage is really what most people call Test Driven Development. Some would argue that if you just learnt to do “test first”, and to always write tests, you’d skip straight to this point. I tend to think you can still be in stage 1 even if you’re writing your tests first, and it’s not always practical – or even necessary – to write tests for everything. There are a lot of people who get pretty fanatical when it comes to testing… I guess I’m a pragmaticist at heart.
Still, I’m sure a lot of people have gone straight to stage 3. Or progressed in a different order. Or have reached further stages of Testing Enlightenment that I’m not yet aware of. If so, I’d love to hear your experiences.
I thought I’d pass on a simple tip that was given to me by a colleague last year. When writing tests, it can help to break it into sections called “Given”, “When”, and “Then”, indicated by comments:
- “Given” – set up the criteria for your test
- “When” – execute the code for the test
- “Then” – inspect the results, and make assertions
I’ve tried a lot of different ways of writing testing frameworks for database interaction, but have finally settled on an approach that seems to work pretty well. I figured I’d describe it here in case it’s of use to anyone. Pretty much every project I’ve worked on in the last few years has used an ORM; for Perl I use DBIx::Class, and in Java, Hibernate. This has led me down a particular path in terms of an approach to testing, but it could probably work without an ORM, so long as you have a well-encapsulated data access layer. But for the purposes of this post, I’ll use DBIx::Class code as an example.