I'm using FluxCD to maintain the desired states of our Production and Development clusters, which are stored in a Git repository. To sync persistent volume (PV) data between the two, we manually restore Production backups to Development using Velero, but this takes about 2-3 hours each time, which is frustrating. I'd like to automate the restore to run nightly or weekly, but it's complex and needs many manual adjustments, such as deleting scheduled backups, updating S3 secrets, and suspending the Flux controllers so they don't delete resources during the restore. Is there a better, simpler way to keep Production and Development cluster data in sync than going through this tedious process? I've tried syncing only the PV data but ran into permission issues. Any insights from your experience would be really appreciated!
2 Answers
It sounds like you might be overcomplicating things a bit. Syncing data from Production to Development is tricky for a reason: data security is key! Instead of syncing Production data directly, have you considered using masked or anonymized copies? That lets you test with realistic datasets without exposing sensitive information. Many companies do this routinely to stay compliant while keeping their test environments realistic. Just be careful about how you handle PII when pulling data over.
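To make the masking idea a bit more concrete, here's a minimal Python sketch of pseudonymizing PII fields before a record leaves Production. The field names, the `MASKING_KEY` value, and both helper functions are made up for illustration and not tied to any particular tool. A keyed hash is just one possible approach; treat it as a starting point rather than a full anonymization scheme.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real one would come from a secret store.
MASKING_KEY = b"dev-masking-key"


def pseudonymize(value: str, key: bytes = MASKING_KEY) -> str:
    """Return a stable token for a sensitive value using a keyed SHA-256 hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncate to keep the token short; the same input always maps to the same token.
    return digest[:16]


def mask_record(record: dict, pii_fields: set) -> dict:
    """Return a copy of the record with the listed PII fields replaced by tokens."""
    return {
        field: pseudonymize(str(value))
        if field in pii_fields and value is not None
        else value
        for field, value in record.items()
    }
```

Because the same input always produces the same token, joins between tables on a masked column should still line up in Development, which is often what makes the masked copy usable for testing.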
It’s risky to use Production data in Development because of compliance issues. Instead, try creating stub datasets or using fixtures for your automated tests. That way you can still run your tests on dev without the complications. Importing Production data directly can lead to serious compliance and data security nightmares.
Just like you said! Relying on Production data in Development isn’t ideal; it can land you in regulatory trouble. Stick with realistic yet anonymized datasets to keep Development safe and compliant.
Exactly! You can mimic the production environment without actually importing sensitive data. Creating synthetic datasets for development can save you from a ton of headaches.
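For what it's worth, a synthetic dataset doesn't have to be fancy. Below is a rough Python sketch that generates seeded fake user rows. The schema (`id`, `email`, `plan`, `monthly_requests`) and the function name are assumptions picked for the example, not anything from the original setup.

```python
import random


def make_synthetic_users(count: int, seed: int = 0) -> list:
    """Generate `count` fake user rows; the same seed yields the same rows."""
    rng = random.Random(seed)  # local generator so global random state is untouched
    plans = ["free", "pro", "enterprise"]
    users = []
    for i in range(count):
        users.append(
            {
                "id": i + 1,
                # .test is a reserved TLD, so these addresses can never be real
                "email": f"user{i + 1}@example.test",
                "plan": rng.choice(plans),
                "monthly_requests": rng.randint(0, 10_000),
            }
        )
    return users
```

Seeding the generator keeps the data reproducible, so a Development cluster could be reseeded nightly and still end up with identical rows.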