[unit-testing] Database data needed in integration tests; created by API calls or using imported data?

3 Answers

In integration tests, you need to test with real database, as you have to verify that your application can actually talk to the database. Isolating the database as dependency means that you are postponing the real test of whether your database was deployed properly, your schema is as expected and your app is configured with the right connection string. You don't want to find any problems with these when you deploy to production.

You also want to test with both precreated data sets and empty data set. You need to test both path where your app starts with an empty database with only your default initial data and starts creating and populating the data and also with a well-defined data sets that target specific conditions you want to test, like stress, performance and so on.

Also, make sur that you have the database in a well-known state before each state. You don't want to have dependencies between your integration tests.


This question is more or less programming language agnostic. However as I'm mostly into Java these days that's where I'll draw my examples from. I'm also thinking about the OOP case, so if you want to test a method you need an instance of that methods class.

A core rule for unit tests is that they should be autonomous, and that can be achieved by isolating a class from its dependencies. There are several ways to do it and it depends on if you inject your dependencies using IoC (in the Java world we have Spring, EJB3 and other frameworks/platforms which provide injection capabilities) and/or if you mock objects (for Java you have JMock and EasyMock) to separate a class being tested from its dependencies.

If we need to test groups of methods in different classes* and see that they are well integration, we write integration tests. And here is my question!

  • At least in web applications, state is often persisted to a database. We could use the same tools as for unit tests to achieve independence from the database. But in my humble opinion I think that there are cases when not using a database for integration tests is mocking too much (but feel free to disagree; not using a database at all, ever, is also a valid answer as it makes the question irrelevant).
  • When you use a database for integration tests, how do you fill that database with data? I can see two approaches:
    • Store the database contents for the integration test and load it before starting the test. If it's stored as an SQL dump, a database file, XML or something else would be interesting to know.
    • Create the necessary database structures by API calls. These calls are probably split up into several methods in your test code and each of these methods may fail. It could be seen as your integration test having dependencies on other tests.

How are you making certain that database data needed for tests is there when you need it? And why did you choose the method you choose?

Please provide an answer with a motivation, as it's in the motivation the interesting part lies. Remember that just saying "It's best practice!" isn't a real motivation, it's just re-iterating something you've read or heard from someone. For that case please explain why it's best practice.

*I'm including one method calling other methods in (the same or other) instances of the same class in my definition of unit test, even though it might technically not be correct. Feel free to correct me, but let's keep it as a side issue.

I've used DBUnit to take snapshots of records in a database and store them in XML format. Then my unit tests (we called them integration tests when they required a database), can wipe and restore from the XML file at the start of each test.

I'm undecided whether this is worth the effort. One problem is dependencies on other tables. We left static reference tables alone, and built some tools to detect and extract all child tables along with the requested records. I read someone's recommendation to disable all foreign keys in your integration test database. That would make it way easier to prepare the data, but you're no longer checking for any referential integrity problems in your tests.

Another problem is database schema changes. We wrote some tools that would add default values for columns that had been added since the snapshots were taken.

Obviously these tests were way slower than pure unit tests.

When you're trying to test some legacy code where it's very difficult to write unit tests for individual classes, this approach may be worth the effort.

This will probably not answer all your questions, if any, but I made the decision in one project to do unit testing against the DB. I felt in my case that the DB structure needed testing too, i.e. did my DB design deliver what is needed for the application. Later in the project when I feel the DB structure is stable, I will probably move away from this.

To generate data I decided to create an external application that filled the DB with "random" data, I created a person-name and company-name generators etc.

The reason for doing this in an external program was: 1. I could rerun the tests on by test modified data, i.e. making sure my tests were able to run several times and the data modification made by the tests were valid modifications. 2. I could if needed, clean the DB and get a fresh start.

I agree that there are points of failure in this approach, but in my case since e.g. person generation was part of the business logic generating data for tests was actually testing that part too.

I generally use SQL scripts to fill the data in the scenario you discuss.

It's straight-forward and very easily repeatable.