Database data needed in integration tests; created by API calls or using imported data?


In integration tests, you need to test with a real database, because you have to verify that your application can actually talk to the database. Isolating the database as a dependency means postponing the real test of whether your database was deployed properly, your schema is as expected, and your app is configured with the right connection string. You don't want to discover problems with any of these when you deploy to production.

You also want to test with both pre-created data sets and an empty data set. You need to test both the path where your app starts with an empty database containing only your default initial data and begins creating and populating data, and the path where it starts with well-defined data sets that target specific conditions you want to test, such as stress, performance and so on.

Also, make sure that you have the database in a well-known state before each test. You don't want to have dependencies between your integration tests.
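To illustrate the well-known-state point, here is a minimal sketch of a runner that resets the data before every test. `FakeDatabase` and `IsolatedTestRunner` are hypothetical names, and the in-memory list only stands in for a real database so that the sketch runs on its own; in an actual integration test, `reset()` would truncate tables and reload your default initial data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a real test database. In an actual integration
// test, reset() would truncate tables and reload the default initial data.
class FakeDatabase {
    private final List<String> students = new ArrayList<>();

    void reset() {
        students.clear();
        students.add("default-admin"); // assumed default initial data
    }

    void insert(String name) {
        students.add(name);
    }

    int count() {
        return students.size();
    }
}

class IsolatedTestRunner {
    // Runs each test against a freshly reset database, so no test can
    // observe rows left behind by an earlier one.
    static List<Integer> runAll(FakeDatabase db, List<Consumer<FakeDatabase>> tests) {
        List<Integer> rowCountAfterEachTest = new ArrayList<>();
        for (Consumer<FakeDatabase> test : tests) {
            db.reset();
            test.accept(db);
            rowCountAfterEachTest.add(db.count());
        }
        return rowCountAfterEachTest;
    }
}
```

Because the reset happens before each test rather than after it, a test that crashes halfway cannot poison the next one.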


This question is more or less programming language agnostic. However, as I'm mostly into Java these days, that's where I'll draw my examples from. I'm also thinking about the OOP case, so if you want to test a method you need an instance of that method's class.

A core rule for unit tests is that they should be autonomous, and that can be achieved by isolating a class from its dependencies. There are several ways to do it and it depends on if you inject your dependencies using IoC (in the Java world we have Spring, EJB3 and other frameworks/platforms which provide injection capabilities) and/or if you mock objects (for Java you have JMock and EasyMock) to separate a class being tested from its dependencies.

If we need to test groups of methods in different classes* and see that they are well integrated, we write integration tests. And here is my question!

  • At least in web applications, state is often persisted to a database. We could use the same tools as for unit tests to achieve independence from the database. But in my humble opinion there are cases where not using a database for integration tests is mocking too much (feel free to disagree; not using a database at all, ever, is also a valid answer, as it makes the question irrelevant).
  • When you use a database for integration tests, how do you fill that database with data? I can see two approaches:
    • Store the database contents for the integration test and load them before starting the test. Whether they are stored as an SQL dump, a database file, XML or something else would be interesting to know.
    • Create the necessary database structures by API calls. These calls are probably split up into several methods in your test code and each of these methods may fail. It could be seen as your integration test having dependencies on other tests.
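To make the two approaches concrete, here is a rough sketch of what I mean. All names (`StudentApi`, `TestData`, the sample rows) are made up for illustration, and the in-memory list only stands in for a real database so the sketch runs without a driver.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical application API backed by an in-memory list that stands in
// for the real database.
class StudentApi {
    final List<String[]> rows = new ArrayList<>();

    // The API validates its input, which is what approach 2 exercises.
    void register(String firstName, String lastName) {
        if (firstName.isEmpty() || lastName.isEmpty()) {
            throw new IllegalArgumentException("names must not be empty");
        }
        rows.add(new String[] {firstName, lastName});
    }
}

class TestData {
    // Approach 1: load stored contents directly, bypassing the API.
    // A CSV string is used here; an SQL dump or XML file would play the same role.
    static void loadStored(StudentApi api, String csv) {
        for (String line : csv.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] cols = line.split(",");
            api.rows.add(new String[] {cols[0].trim(), cols[1].trim()});
        }
    }

    // Approach 2: build the same state through API calls. If register()
    // is broken, the setup fails before the actual test even starts.
    static void createViaApi(StudentApi api) {
        api.register("Jim", "Smith");
        api.register("Judy", "Jones");
    }
}
```

Both end up with the same rows; the difference is that approach 1 does not depend on `register()` working, while approach 2 never lets data in that the API would have rejected.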

How are you making certain that the database data needed for tests is there when you need it? And why did you choose the method you chose?

Please provide an answer with a motivation, as it's in the motivation that the interesting part lies. Remember that just saying "It's best practice!" isn't a real motivation; it's just reiterating something you've read or heard from someone. In that case, please explain why it's best practice.

*I'm including one method calling other methods in (the same or other) instances of the same class in my definition of unit test, even though it might technically not be correct. Feel free to correct me, but let's keep it as a side issue.

If you are going to be building lists of students, you can make a list builder class, StudentsBuilder. By default the builder class will generate a list of Students with pseudo-random properties defined by you. This is similar to the approach of AutoPoco.

I find that making your own list builder class is more flexible in terms of defining the creation behavior and supporting any type of class. I make a builder class with IList<T> fields (similar to a data-oriented structure of arrays (SoA) approach).

using System.Collections.Generic;
using System.Linq;

// Student, MentorBuilder and RandomStringGenerator are assumed to be defined elsewhere.
public class StudentsBuilder
{
    private readonly int _size;
    private IList<string> _firstNames;
    private IList<string> _lastNames;
    private IList<MentorBuilder> _mentors;

    public StudentsBuilder(int size = 10)
    {
        _size = size;
        _firstNames = new RandomStringGenerator(size).Generate();
        _lastNames = new RandomStringGenerator(size).Generate();
        _mentors = Enumerable.Range(0, size).Select(_ => new MentorBuilder()).ToList();
    }

    // Overrides the generated first names; supply one name per student.
    public StudentsBuilder WithFirstNames(params string[] firstNames)
    {
        _firstNames = firstNames;
        return this;
    }

    public IList<Student> Build()
    {
        var students = new List<Student>();
        for (int i = 0; i < _size; i++)
        {
            students.Add(new Student(_firstNames[i], _lastNames[i], _mentors[i].Build()));
        }
        return students;
    }
}

Each field list is overridden using a separate method taking a params array argument. You could also make field lists public in order to use a fancier With(Action<StudentsBuilder> action) syntax for overriding values. Test code looks like:

var students = new StudentsBuilder(size: 4)
    .WithFirstNames("Jim", "John", "Jerry", "Judy")
    .Build();

Spring testing: which is the common way of creating and maintaining test data?

If you are just unit testing, it's good to mock the database connection and data.

If you are doing end-to-end testing, you need to design your tests so that they create data, perform the tests and then finally remove the data.

Most of the time you already have services for CRUD operations, and you can use those existing services intelligently to manage test data. The following approach worked for me:

  • Check whether the data exists (using an id reserved for tests) and remove it if it is already there.
  • Create the data using the service (e.g. with some unique id).
  • Perform the update and fetch operations.
  • Finally, delete the test data.

This would be a clean approach, and you might want to use your DEV database for it.

There might be a better approach than this, but the above worked for me.
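As a rough sketch of the steps above (check and clean up, create, update and fetch, delete): `StudentService`, `StudentCrudScenario` and the reserved id value are hypothetical, and the in-memory map only stands in for your real service and database so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical CRUD service; the map stands in for the real service
// and the database behind it.
class StudentService {
    private final Map<Long, String> table = new HashMap<>();

    boolean exists(long id) {
        return table.containsKey(id);
    }

    void create(long id, String name) {
        if (table.containsKey(id)) {
            throw new IllegalStateException("duplicate id " + id);
        }
        table.put(id, name);
    }

    void update(long id, String name) {
        if (!table.containsKey(id)) {
            throw new IllegalStateException("no row with id " + id);
        }
        table.put(id, name);
    }

    String fetch(long id) {
        return table.get(id);
    }

    void delete(long id) {
        table.remove(id);
    }
}

class StudentCrudScenario {
    static final long TEST_ID = 999_999L; // id reserved for tests (assumed value)

    static String run(StudentService service) {
        // 1. Remove leftovers from an earlier, possibly aborted, run.
        if (service.exists(TEST_ID)) {
            service.delete(TEST_ID);
        }
        try {
            // 2. Create the data through the service.
            service.create(TEST_ID, "Jim");
            // 3. Perform the update and fetch operations under test.
            service.update(TEST_ID, "John");
            return service.fetch(TEST_ID);
        } finally {
            // 4. Delete the test data even if a step above threw.
            service.delete(TEST_ID);
        }
    }
}
```

Putting the delete in a `finally` block keeps the shared database clean when a step fails, and the check at the start covers the case where the process was killed before the cleanup ran.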