How to Automate SQL Queries Safely?

0
15
Asked By TechieTurtle92 On

I'm currently dealing with a pretty manual workflow for executing SQL queries at my company. The process goes like this: a Software Development Engineer (SDE) writes a query in a post or thread, then someone from DevOps or Sysadmin has to review the query, execute it on the database, check the output for any exposed confidential data, and then share the sanitized results back with the SDE. We keep it manual to ensure data confidentiality and proper query review, but it really slows things down. Recently, my manager expressed that this manual method isn't sustainable anymore.

I'm looking for suggestions on how to automate this process while maintaining the necessary safeguards. Specifically, I'd like to know:
- What DevOps or data engineering tools could streamline this workflow?
- Ideally, I'd like a solution where SDEs can create queries, DevOps can review and approve them, and then the queries run in a secure environment with proper logging.
- It would be great if the system could differentiate between read-only and write queries too.

Has anyone set up something like this? Do you think tools like GitHub PR with CI/CD, Airflow with manual triggers, or a custom internal tool would be the way to go?

5 Answers

Answered By ApprovalAce35 On

We use Git with GitHub Actions and Liquibase for managing database changes. For ad-hoc query retrievals, I recommend implementing a ticketing system where two people need to approve a query before it can run. This adds a layer of security and keeps things manageable.

Answered By InformedGuy77 On

I think creating views that mask sensitive data could work. Like, you could show just the first few characters of sensitive fields or hash the values. Grant devs read-only accounts that can only access these views, enable query logging, and monitor for any heavy queries. It lets developers query freely under controlled conditions, but you'll need to keep an eye on it.

Answered By DataDynamo88 On

You might want to consider doing an ETL to a read-only database. This approach allows you to mask sensitive data while ensuring that developers only have read access. This way, queries don’t run against the production database, and developers can perform checks themselves without needing to involve the DevOps team constantly.

Answered By SkepticalDev23 On

I've been down this road before, and it didn't end well. I think all read-only queries should happen through a data warehouse tool, while write operations should only be performed via applications or migrations. If your developers have messed with production data in the past, adding automated tools might just give them more chances to screw things up.

Answered By CloudySkies21 On

For ad-hoc queries, AWS Athena or Google BigQuery could be fantastic options. Both are serverless SQL query services with solid security features. You could set up an approval workflow using AWS Lambda or Google Cloud Functions to automate the review and execution process. This would also help in maintaining logs and ensuring data privacy while cutting down on manual reviews.

Related Questions

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.