I'm having a frustrating issue with our CI/CD pipeline: our file-processing microservices (image resizing, video transcoding, document parsing) keep failing tests because of inconsistent test data. The test suite runs in Docker containers and needs various file types and sizes for boundary tests: some files must be exactly 10MB, others over 100MB. However, I can't commit large binary files to the repository.
Currently, I've tried:
- Downloading random files from the internet, which is unreliable since the sizes vary.
- Storing test files in S3, which works but adds an external dependency.
- Using dd commands to create files, but they end up with the wrong headers or formats.
The S3 approach works but feels like overkill for simple unit tests, and some of our environments have no internet access at all. I've built a simple tool that generates files exactly to spec in the browser (check it out [here](https://filemock.com?utm_source=reddit&utm_medium=social&utm_campaign=devops)), and I'm now considering wiring it into the pipeline via headless Chrome for on-demand file creation. Has anyone tackled something similar? How do you generate test files without relying on external dependencies or bloating your repo?
3 Answers
Consider using ORAS to store your test files as OCI artifacts in the Docker registry you're already running, if you're short on other storage options. It's really handy for that purpose: we use it for database snapshots, and it fits your use case well. The CLI docs are [here](https://oras.land/docs/commands/use_oras_cli/).
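If you'd rather not shell out to the CLI from your pipeline scripts, ORAS also ships a Python SDK (oras-py). A minimal sketch, assuming oras-py's documented `OrasClient` push/pull interface; the registry host and repository name are hypothetical placeholders:

```python
import oras.client

# Hypothetical names: point these at your own registry and fixture repo.
REGISTRY = "registry.example.com"
TARGET = f"{REGISTRY}/test-fixtures/boundary-files:v1"

client = oras.client.OrasClient(hostname=REGISTRY)

# Publish the fixtures once, e.g. from a one-off setup job.
client.push(files=["exactly_10mb.jpg", "over_100mb.mp4"], target=TARGET)

# In the test container, pull them back down before the suite runs.
client.pull(target=TARGET, outdir="./fixtures")
```

Since the fixtures live next to your images in the registry, the test job needs no credentials or network egress beyond what it already has for pulling images.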
S3 is definitely a solid choice: it integrates well with most setups and the cost is minimal over time. If you're already running something like Artifactory, that works too; just package your fixtures as zip archives. But for your case I'd also suggest a dedicated script that generates exactly the files you need, in Python rather than dd, so you control headers and sizes precisely (rough sketch below). You can run it as a pipeline step or fold it into your test setup.
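Here's a minimal sketch of such a generator (the format table and `make_test_file` helper are names I made up): it writes real magic bytes for the format, pads with random data to an exact byte count, and appends a trailer. That's enough to satisfy magic-number sniffing like the `file` command, though the output isn't fully decodable media:

```python
import os

# Magic bytes and trailers for a few formats (a small subset; extend as needed).
# These pass magic-number checks, but the files won't survive a real decoder.
MAGIC = {
    "jpg": (b"\xff\xd8\xff\xe0", b"\xff\xd9"),   # JPEG SOI/APP0 ... EOI
    "png": (b"\x89PNG\r\n\x1a\n", b""),          # 8-byte PNG signature
    "pdf": (b"%PDF-1.4\n", b"\n%%EOF\n"),        # PDF header ... trailer
}

def make_test_file(path: str, size: int, kind: str) -> None:
    """Write exactly `size` bytes: real magic bytes, random padding, trailer."""
    header, trailer = MAGIC[kind]
    padding = size - len(header) - len(trailer)
    if padding < 0:
        raise ValueError(f"{size} bytes is too small for a {kind} header")
    with open(path, "wb") as f:
        f.write(header)
        f.write(os.urandom(padding))  # use b"\x00" * padding if speed matters
        f.write(trailer)

if __name__ == "__main__":
    make_test_file("exactly_10mb.jpg", 10 * 1024 * 1024, "jpg")  # 10 MiB boundary
    make_test_file("over_100mb.pdf", 101 * 1024 * 1024, "pdf")   # >100 MB case
```

Call it from a pytest fixture or an early pipeline step; the files are reproducible on demand, work offline, and never touch the repo.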
What about generating files with something like `dd if=/dev/urandom of=file bs=1024 count=1024`? That gives you exactly 1 MiB; scale `bs` and `count` for your boundary sizes (e.g. `bs=1M count=10` for 10 MiB). It won't give you the right headers, but it's a quick way to create large files.
That's pretty smart! I hadn't thought of using container registries for artifact storage.