What's the best strategy for unit-testing database-driven applications?
I'm always running tests against an in-memory DB (HSQLDB or Derby) for these reasons:
- It makes you think which data to keep in your test DB and why. Just hauling your production DB into a test system translates to "I have no idea what I'm doing or why and if something breaks, it wasn't me!!" ;)
- It makes sure the database can be recreated with little effort in a new place (for example when we need to replicate a bug from production)
- It helps enormously with the quality of the DDL files.
The in-memory DB is loaded with fresh data once the tests start and after most tests, I invoke ROLLBACK to keep it stable. ALWAYS keep the data in the test DB stable! If the data changes all the time, you can't test.
The data is loaded from SQL, a template DB or a dump/backup. I prefer dumps if they are in a readable format because I can put them in VCS. If that doesn't work, I use a CSV file or XML. If I have to load enormous amounts of data ... I don't. You never have to load enormous amounts of data :) Not for unit tests. Performance tests are another issue and different rules apply.
I work with a lot of web applications that are driven by databases of varying complexity on the backend. Typically, there's an ORM layer separate from the business and presentation logic. This makes unit-testing the business logic fairly straightforward; things can be implemented in discrete modules and any data needed for the test can be faked through object mocking.
But testing the ORM and database itself has always been fraught with problems and compromises.
Over the years, I have tried a few strategies, none of which completely satisfied me.
Load a test database with known data. Run tests against the ORM and confirm that the right data comes back. The disadvantage here is that your test DB has to keep up with any schema changes in the application database, and might get out of sync. It also relies on artificial data, and may not expose bugs that occur due to stupid user input. Finally, if the test database is small, it won't reveal inefficiencies like a missing index. (OK, that last one isn't really what unit testing should be used for, but it doesn't hurt.)
Load a copy of the production database and test against that. The problem here is that you may have no idea what's in the production DB at any given time; your tests may need to be rewritten if data changes over time.
Some people have pointed out that both of these strategies rely on specific data, and a unit test should test only functionality. To that end, I've seen suggested:
- Use a mock database server, and check only that the ORM is sending the correct queries in response to a given method call.
What strategies have you used for testing database-driven applications, if any? What has worked the best for you?
For JDBC based project (directly or indirectly, e.g. JPA, EJB, ...) you can mockup not the entire database (in such case it would be better to use a test db on a real RDBMS), but only mockup at JDBC level.
Advantage is abstraction which comes with that way, as JDBC data (result set, update count, warning, ...) are the same whatever is the backend: your prod db, a test db, or just some mockup data provided for each test case.
With JDBC connection mocked up for each case there is no need to manage test db (cleanup, only one test at time, reload fixtures, ...). Every mockup connection is isolated and there is no need to clean up. Only minimal required fixtures are provided in each test case to mock up JDBC exchange, which help to avoid complexity of managing a whole test db.
Acolyte framework includes a JDBC driver and utility for this kind of mockup: http://acolyte.eu.org .
Even if there are tools that allow you to mock your database in one way or another (e.g. jOOQ's
MockConnection, which can be seen in this answer - disclaimer, I work for jOOQ's vendor), I would advise not to mock larger databases with complex queries.
Even if you just want to integration-test your ORM, beware that an ORM issues a very complex series of queries to your database, that may vary in
- order (!)
Mocking all that to produce sensible dummy data is quite hard, unless you're actually building a little database inside your mock, which interprets the transmitted SQL statements. Having said so, use a well-known integration-test database that you can easily reset with well-known data, against which you can run your integration tests.