I'm interested in figuring out the best practices for seeding a local database for development without resorting to production data. I've experienced a variety of methods across different projects, most of which have their downsides:
- Some teams rely on outdated seed scripts that break during migrations.
- Others suggest just taking a production dump, but that always comes with the hassle of anonymizing sensitive data.
- Then there's the option of starting from an empty database and manually creating test records, which can be labor-intensive and prone to missing relationships, especially with complex foreign key setups (like users linking to orders, line items, and products).
I'm particularly curious about what others do these days. Here are a few specific questions:
1. Do you manage seed files manually, or is there a better approach?
2. What ORM factory libraries do you use, if any?
3. Have you ever dumped and anonymized production data for development?
4. Is there another unique method you're using?
I'm also interested in how you handle consistent data for continuous integration and continuous deployment testing.
6 Answers
This really is a tough nut to crack! Ideally, you'd have well-maintained seed scripts that give a good mix of realistic data, plus examples of past bugs. Just about every tool I've seen falls short, either at making seed data easy to maintain or at giving access to production data without a headache. For some projects, I've seen custom import/export tools that help isolate tricky data scenarios, like specific price histories, which have proven incredibly useful for development.
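On the production-data side, the anonymizing step from the question can be roughed out like this. It is only a sketch under assumptions: the `users` table, its `name` and `email` columns, the `SALT` value, and the `pseudonym` and `anonymize_users` helpers are all hypothetical, and salted hashing is pseudonymization rather than a guarantee of anonymity.

```python
import hashlib
import sqlite3

# Hypothetical salt; keep the real one out of the repo so aliases are hard to guess.
SALT = "dev-only-salt"


def pseudonym(value: str, prefix: str) -> str:
    """Map a sensitive value to a stable placeholder derived from a salted hash."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:10]
    return f"{prefix}_{digest}"


def anonymize_users(conn: sqlite3.Connection) -> None:
    """Overwrite name and email in a hypothetical users table, in place."""
    rows = conn.execute("SELECT id, email FROM users").fetchall()
    for user_id, email in rows:
        alias = pseudonym(email, "user")
        conn.execute(
            "UPDATE users SET name = ?, email = ? WHERE id = ?",
            (alias, f"{alias}@example.com", user_id),
        )
    conn.commit()


# Demo on an in-memory table standing in for a restored dump.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES ('Ada Smith', 'ada@corp.test')")
anonymize_users(conn)
anonymized = conn.execute("SELECT name, email FROM users").fetchall()
```

Because the alias is derived from the original value, the same input always maps to the same placeholder, so joins across tables that share the value still line up after scrubbing.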
What about using an automated system for generating seed data? Do you think that could help?
Did you know AI can help in this area? It can generate relevant seed data just from your schema without needing any production data.
I think you touched on most of the sensible approaches! There’s no universal solution since it really depends on your specific datasets and requirements. What do you find works best in practice?
Using an ORM factory with Faker has really streamlined my process! We get consistent, flexible test data quickly and easily.
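For anyone who hasn't seen the pattern, here is a rough stdlib-only sketch of the idea. It is not the API of Faker or of any factory library; the schema and the `seed` and `build_db` names are made up for illustration, loosely following the users, orders, line items, and products example from the question. In a real project a factory library plus Faker would likely replace the hand-written value generators.

```python
import random
import sqlite3

# Hypothetical schema mirroring the users -> orders -> line items -> products chain.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price_cents INTEGER NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id));
CREATE TABLE line_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL
);
"""


def seed(conn, rng, n_users=3, n_products=4):
    """Insert parents before children so every foreign key resolves."""
    user_ids = [
        conn.execute(
            "INSERT INTO users (email) VALUES (?)", (f"user{i}@example.com",)
        ).lastrowid
        for i in range(n_users)
    ]
    product_ids = [
        conn.execute(
            "INSERT INTO products (name, price_cents) VALUES (?, ?)",
            (f"Product {i}", rng.randint(100, 9999)),
        ).lastrowid
        for i in range(n_products)
    ]
    for user_id in user_ids:
        # Each user gets one to three orders, each with one or two distinct products.
        for _ in range(rng.randint(1, 3)):
            order_id = conn.execute(
                "INSERT INTO orders (user_id) VALUES (?)", (user_id,)
            ).lastrowid
            for product_id in rng.sample(product_ids, k=rng.randint(1, 2)):
                conn.execute(
                    "INSERT INTO line_items (order_id, product_id, quantity) VALUES (?, ?, ?)",
                    (order_id, product_id, rng.randint(1, 5)),
                )
    conn.commit()


def build_db(seed_value=42):
    """Create an in-memory database and fill it from a fixed random seed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    seed(conn, random.Random(seed_value))
    return conn
```

Passing a fixed seed to `random.Random` means two runs produce identical rows, which is one way to get the consistent CI data the question asks about.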
I've set up a container with a database that mirrors production data. You just spin it up and run migration scripts—simple and effective!
My workflow prioritizes setting up the dev database first, filled with mock data for testing. Seed files are built alongside the tests for consistency. Then we have a UAT environment with realistic data, making the transition to production much smoother. Can be a bit of a laborious journey, but it pays off in the end!
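To make the "seed files built alongside the tests" part concrete, a minimal sketch might look like the following. The tables, the `SEED_USERS` and `SEED_ORDERS` rows, and the `seeded_connection` and `order_count` helpers are hypothetical, and SQLite stands in for whatever database the project actually uses.

```python
import sqlite3
import unittest

# Hypothetical seed rows, kept next to the tests that depend on them.
SEED_USERS = [(1, "alice@example.com"), (2, "bob@example.com")]
SEED_ORDERS = [(1, 1), (2, 1), (3, 2)]  # (order id, user id)


def seeded_connection():
    """Build a fresh in-memory database containing only the seed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id))"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", SEED_USERS)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", SEED_ORDERS)
    conn.commit()
    return conn


def order_count(conn, user_id):
    """Example code under test: count the orders belonging to one user."""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


class OrderCountTest(unittest.TestCase):
    def setUp(self):
        # Every test starts from the same known rows.
        self.conn = seeded_connection()

    def tearDown(self):
        self.conn.close()

    def test_counts_match_seed(self):
        self.assertEqual(order_count(self.conn, 1), 2)
        self.assertEqual(order_count(self.conn, 2), 1)
```

Since the expected values in the test come straight from the seed rows, a schema change that breaks the seed data fails the test suite right away instead of going stale unnoticed.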
That UAT environment can be tricky! How do you fill it with realistic data—are you sampling from production or creating it by hand?

Exactly! It's one of those scenarios where if you really want to get it right, you need to invest the time—otherwise, it becomes a total mess when you hit a snag.