What's the best strategy for unit-testing database-driven applications?database


I'm always running tests against an in-memory DB (HSQLDB or Derby) for these reasons:

  • It makes you think which data to keep in your test DB and why. Just hauling your production DB into a test system translates to "I have no idea what I'm doing or why and if something breaks, it wasn't me!!" ;)
  • It makes sure the database can be recreated with little effort in a new place (for example when we need to replicate a bug from production)
  • It helps enormously with the quality of the DDL files.

The in-memory DB is loaded with fresh data once the tests start and after most tests, I invoke ROLLBACK to keep it stable. ALWAYS keep the data in the test DB stable! If the data changes all the time, you can't test.

The data is loaded from SQL, a template DB or a dump/backup. I prefer dumps if they are in a readable format because I can put them in VCS. If that doesn't work, I use a CSV file or XML. If I have to load enormous amounts of data ... I don't. You never have to load enormous amounts of data :) Not for unit tests. Performance tests are another issue and different rules apply.


I work with a lot of web applications that are driven by databases of varying complexity on the backend. Typically, there's an ORM layer separate from the business and presentation logic. This makes unit-testing the business logic fairly straightforward; things can be implemented in discrete modules and any data needed for the test can be faked through object mocking.

But testing the ORM and database itself has always been fraught with problems and compromises.

Over the years, I have tried a few strategies, none of which completely satisfied me.

  • Load a test database with known data. Run tests against the ORM and confirm that the right data comes back. The disadvantage here is that your test DB has to keep up with any schema changes in the application database, and might get out of sync. It also relies on artificial data, and may not expose bugs that occur due to stupid user input. Finally, if the test database is small, it won't reveal inefficiencies like a missing index. (OK, that last one isn't really what unit testing should be used for, but it doesn't hurt.)

  • Load a copy of the production database and test against that. The problem here is that you may have no idea what's in the production DB at any given time; your tests may need to be rewritten if data changes over time.

Some people have pointed out that both of these strategies rely on specific data, and a unit test should test only functionality. To that end, I've seen suggested:

  • Use a mock database server, and check only that the ORM is sending the correct queries in response to a given method call.

What strategies have you used for testing database-driven applications, if any? What has worked the best for you?

Personally, I create a dummy testing database and populate it with a known data set for each testing run (I do that right in the setUp functions). Then the tests run against that data set, and then it's removed on tearDown...

Now, this is more of a Integration test than an Unit test (And personally I treat it differently from a unit test, run on its own schedule along with other integration tests), but it's still quite useful.

It sounds like you're having issues in the DAO code itself? Otherwise, the DAO layer is the obvious place to do your mocking, but if you're trying to test the DAO, then you'll need to mock that which comes beneath.

Personally, I tend to stay away from mocking large, complex libraries; if you really need to test the DAO layer directly and the DAO works directly with JDBC, you've got three obvious choices:

  1. Run an integrated test that includes the DAO and JDBC along with a Database
  2. Add a layer above JDBC with a thinner interface, better suited for mocking.
  3. Use JDBC mocks either of your own writing, or some of the items listed above.

I would almost always choose #1 or #2. Because there's a host of possibilities of errors in malformed SQL syntax and the like I tend to lean towards #1. I realize, however, that that's not what you're asking for. ;)

Where to start with the development of first database driven Web App (long question)?

I'm not sure where to start.

We always start with the data model.

having never worked with databases before, how to I incorporate these items into the models and test units?

Ah, well, there's the problem.

You have to start with the data, because everything starts with the data.

  1. Write use cases. Your 5 technical requirements aren't really very good use cases because there's no Actor who interacts with a system. You need to write down the things an actor does -- what the actor interacts with.

  2. Identify the objects that the actor interacts with. When in doubt, write down all the nouns in your use cases.

  3. Draw a picture of the classes that will implement those objects.

  4. Walk through your use cases to be sure that all the things an Actor needs to do are based on attributes of your class definitions. All the data an actor needs has to be in these classes, which will become tables in your database.

Now, you have to start doing more technical work. You need a web framework with an ORM layer. You're using PHP, so you'll have to find a good framework for this. Cake or Doctrine or something.

  1. Define one class in your ORM layer. You don't have to be complete or precise, you just need something that you can test with.

  2. Write unit tests for this class to be sure that it has the right data elements. This will be relatively simple at first, since your individual data classes will start out simple.

Once the basic set of classes are defined, you'll need to look at relationships among classes. This is where the real work starts.

  1. Look at a relationship in your picture. Any line. Pick one randomly.

  2. Write unit tests to be sure that you can "navigate" across that line. From one object, fetch the related objects at the other end of the line. This shouldn't be very complex, just a few lines of code.

  3. Create a "fixture" -- test data you can load into a database -- to represent the objects and their relationships.

  4. Run the test. If it breaks, update your ORM definitions to support the navigation properly.

When you've finished doing all the various relationships, you will have built a reasonable complete data model in a Test-Driven manner.

Now you can build your PHP presentation for the model objects.

First the backbone of any web application is the database. I would suggest starting simple, doing up the database design and then writing model and view classes to render them. Second, you should design your DB with your goals in mind (sub-pages, tree structure) but don't be afraid to make mistakes. We could drill you on normalization or benefits of using a object relational model but if you personally don't see the need for those, it'll be hard to internalize the concepts.

TDD can be employed from day one, as you test the model and view classes of the DB. UML I am not sure it would be helpful - if possible just stick to sequence, class and collaboration diagrams.

A concrete answer for using DB in unit test - insert dummy rows during setup, and remove them during the teardown stages. You may want to write a PHP script to "clean up" a DB after testing.

Hope this helps!