How Do You Sync Data from Prod to Dev Without the Hassle?

Asked By CloudyNomad42 On

I'm currently working on syncing data between our Production and Development Kubernetes clusters. We maintain the desired state using FluxCD and have a setup similar to the example in the FluxCD GitHub repository. However, syncing Persistent Volume (PV) data is quite annoying since we have to manually restore a Velero backup from Production to Development, which takes about 2-3 hours each time.

We're looking into automating the restore process to run nightly or weekly. Right now, our restore sequence involves:
1. Restoring basic Kubernetes resources like flux-controllers, ingress, sealed-secrets-controller, and cert-manager.
2. Restoring PostgreSQL using pgBackRest.
3. Managing secrets.
4. Restoring Kubernetes apps that rely on Postgres, such as GitLab and Grafana.
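
For a nightly or weekly run, the four steps above could be driven by one small script that stops at the first failure. This is only a rough sketch, not a tested pipeline: it assumes the `velero`, `pgbackrest`, and `kubectl` CLIs are available where it runs, and the backup name, stanza, namespaces, and `dev-secrets/` directory are all hypothetical placeholders.

```python
import subprocess
from typing import Callable, List, Optional

Command = List[str]


def build_steps(backup: str = "prod-nightly", stanza: str = "main") -> List[Command]:
    """Return the restore commands in the order described in the question.

    All names below (backup, stanza, namespaces, secrets directory) are
    hypothetical placeholders, not values from a real cluster.
    """
    return [
        # 1. Base resources: controllers, ingress, cert-manager.
        ["velero", "restore", "create", "--from-backup", backup,
         "--include-namespaces", "flux-system,ingress-nginx,cert-manager",
         "--wait"],
        # 2. PostgreSQL via pgBackRest. Where this runs (for example inside
        #    the database pod) depends on your setup.
        ["pgbackrest", f"--stanza={stanza}", "restore"],
        # 3. Dev-specific secrets, e.g. read-only S3 credentials.
        ["kubectl", "apply", "-f", "dev-secrets/"],
        # 4. Apps that depend on Postgres.
        ["velero", "restore", "create", "--from-backup", backup,
         "--include-namespaces", "gitlab,monitoring", "--wait"],
    ]


def _run_checked(cmd: Command) -> None:
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True)


def run_all(steps: List[Command],
            run: Optional[Callable[[Command], None]] = None) -> None:
    """Run each step in order; an exception from any step aborts the rest."""
    runner = run or _run_checked
    for cmd in steps:
        runner(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_all(build_steps(), run=lambda cmd: print(" ".join(cmd)))
```

Passing the runner in as a parameter keeps the ordering logic testable without touching a cluster; a CronJob or CI schedule could call `run_all(build_steps())` for the real run.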

During restoration, we have to make numerous adjustments to Kubernetes resources, like deleting scheduled backups, updating S3 secrets to read-only, and temporarily suspending flux-controllers to prevent them from removing Velero restore resources that aren't defined in our desired state repository. I've also been utilizing Velero Resource Policies and Restore Hooks to manage these adjustments.
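
One way to make the "temporarily suspend Flux" step harder to forget is to wrap the restore in a context manager that always resumes reconciliation, even when the restore fails. A minimal sketch, assuming the `flux` CLI is installed and that suspending every Kustomization in the default `flux-system` namespace is acceptable; the injectable `run` parameter is an illustrative addition for dry runs.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional


def _run_checked(cmd: List[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True)


@contextmanager
def flux_suspended(
    run: Optional[Callable[[List[str]], None]] = None,
) -> Iterator[None]:
    """Suspend Flux Kustomizations for the duration of the block.

    The resume command runs in a finally clause, so reconciliation is
    switched back on even if the wrapped restore raises.
    """
    runner = run or _run_checked
    runner(["flux", "suspend", "kustomization", "--all"])
    try:
        yield
    finally:
        runner(["flux", "resume", "kustomization", "--all"])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    with flux_suspended(run=lambda cmd: print(" ".join(cmd))):
        print("... run the Velero restore here ...")
```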

I'm wondering if there's a simpler way to keep Production and Development cluster data in sync or if I'm missing something in my approach. I previously attempted just syncing PV data, but encountered permission issues with certain pods accessing the data. Would love to hear how others are handling this issue!

3 Answers

Answered By CautiousCoder88 On

For real, avoid pulling production data into dev like the plague. You should be generating synthetic data instead. Even performance tests, or tests that supposedly require real data, should always run against anonymized or stubbed datasets. Keeping it professional means steering clear of production data in development.

Answered By DataGuard555 On

I completely agree! That kind of process seems risky. Have you thought about what kinds of data you actually need for development? Most of it can often be replicated externally. Why not just create stubs and automate data imports rather than pulling production data? That way you avoid any compliance issues or nightmares later!
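
As a rough illustration of the stub idea, a seeded generator can produce the same fake dataset on every run, which makes automated imports repeatable. The record shape here (`id`, `username`, `email`, `projects`) is purely hypothetical and would need to match whatever your apps actually expect.

```python
import random
from typing import Dict, List, Union

Record = Dict[str, Union[int, str]]


def make_users(n: int, seed: int = 42) -> List[Record]:
    """Generate n fake user records; the same seed gives the same data."""
    rng = random.Random(seed)  # local RNG, does not touch global state
    users: List[Record] = []
    for i in range(n):
        username = f"user{i:04d}"
        users.append({
            "id": i + 1,
            "username": username,
            # example.test is a reserved domain, so no real mailbox is hit.
            "email": f"{username}@example.test",
            "projects": rng.randint(0, 20),
        })
    return users


if __name__ == "__main__":
    for user in make_users(3):
        print(user)
```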

Answered By DevOpsWiz99 On

Are you sure you want to sync production data directly to dev? It can lead to serious security issues. When businesses do use production data, they usually pull it, anonymize it, and only then load it into dev environments, which is crucial for avoiding data leaks and compliance headaches. Even so, synthetic data is usually the better choice for development: you can still create representative test data without the risks that come with actual production data.
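
To make the "pull, anonymize, then load" step concrete, here is a minimal sketch of keyed pseudonymization for exported rows. It assumes rows arrive as plain dictionaries; the field names (`email`, `password_hash`) and the key handling are hypothetical. A keyed HMAC keeps the mapping stable, so the same source email always maps to the same fake one and joins between tables still line up. Note that this is pseudonymization rather than full anonymization, so the key has to stay out of the dev environment.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable


def pseudonymize_email(email: str, key: bytes) -> str:
    """Map an email to a stable fake address using HMAC-SHA256."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # .invalid is a reserved TLD, so these addresses can never receive mail.
    return f"user-{digest[:16]}@example.invalid"


def anonymize_row(
    row: Dict[str, Any],
    key: bytes,
    email_fields: Iterable[str] = ("email",),
    drop_fields: Iterable[str] = ("password_hash",),
) -> Dict[str, Any]:
    """Return a copy of row with emails replaced and sensitive fields dropped."""
    dropped = set(drop_fields)
    cleaned = {k: v for k, v in row.items() if k not in dropped}
    for field in email_fields:
        if isinstance(cleaned.get(field), str):
            cleaned[field] = pseudonymize_email(cleaned[field], key)
    return cleaned


if __name__ == "__main__":
    secret = b"replace-with-a-key-kept-outside-dev"  # hypothetical key
    row = {"id": 7, "email": "Jane@Corp.example", "password_hash": "abc"}
    print(anonymize_row(row, secret))
```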
