I've been working with Java applications that require a database, and I've struggled to find an effective way to populate the database for integration tests. Here are some of my main issues: 1. Using Java code to insert data can be hard to maintain and often ends up being pretty verbose. Plus, when it comes to the database's initial state for tests, it might not be clear what's actually in there. 2. SQL scripts can be difficult to read and when the model changes, updating these files can be tedious. 3. Relying solely on the app's API to fill the database can slow down tests significantly, especially when specific setups are needed right from the start. 4. I explored DBUnit but it doesn't quite address my concerns. I've developed a tool that addresses some pain points, but I'm still interested to hear how others manage their integration test setups. Here's a link to my project: [https://gitlab.com/carool1/matchadb](https://gitlab.com/carool1/matchadb). It currently works with Hibernate, but I'm considering creating a version that works with just JPA. What approach do you guys use for integrating test data? Am I missing something?
5 Answers
I think the best way is to use the same type of database as in production, like MySQL or Postgres, rather than an in-memory option. When a test starts, the database should be completely empty except for the expected structure. You can populate it using Java code (like Hibernate), and ensure there’s no implicit state left after the test by truncating all the tables afterwards.
Why should your integration tests assume any data exists by default? Just create what you need right in your tests. It cuts down on setup time and ensures you’re testing your code directly. If a product or user needs to be present, create them within the test! This way, you're testing the very functionality you need.
Yeah, that's the drawback of this method. You can end up with slower tests if each test needs to create everything from scratch.
I've tried numerous approaches, and none seem to be perfect. We use flying scripts for migrations, and my tests are a mix of unit mocks and integration with test containers. Honestly, it’s a bit of a mess, and I feel like consolidating into one strategy would be better.
I use migration scripts to ensure the database structure is consistent across environments. For integration tests, I either have bootstrap scripts or use a custom framework that can insert records as needed. For larger datasets, sometimes I go through a UI tool to bulk load data.
Using the same database as production is the way to go for me. I also think integrating some test containers for your tests can be powerful. If you do need a large dataset, loading it from a `.sql` file at the start can save time!

I get what you're saying, but I found that having to call the API to create users every time can really slow things down. I prefer having some basic data already in the test database.