I'm working with AWS Aurora and looking to log long-running queries for performance analysis. Normally, for MySQL you can set parameters like "slow_query_log", and for PostgreSQL, "log_min_duration_statement". These settings log queries that exceed a specified duration, so you can analyze performance issues later, perhaps through CloudWatch alerts.
However, I have concerns about this approach in organizations that handle Personally Identifiable Information (PII) or Payment Card Industry (PCI) data, such as financial institutions. There is a risk of sensitive information appearing in logs, since sensitive values can be embedded directly in SQL query literals.
How can we safely implement logging features without unintentionally exposing sensitive information? I'm trying to balance the need for monitoring performance while complying with regulatory requirements to protect sensitive data, especially in a large organization where many developers are running queries daily.
4 Answers
In regulated industries, it’s often recommended to disable options like "log_min_duration_statement" due to compliance requirements, since logging raw query text can put the organization at risk. If you must log certain queries for troubleshooting, consider implementing PII scrubbing that aligns with your existing protocols. Also think critically about whether slow query logging is actually necessary in production: the volume of logs can become overwhelming and may not tell you much that is new. Instead, try profiling queries before they hit production and monitoring latency via Performance Insights for a clearer picture.
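To illustrate the PII-scrubbing idea, here is a minimal application-side sketch that masks literals before a statement is ever written to a log. The `scrub_sql_literals` helper and its regexes are illustrative assumptions, not a library API, and real SQL needs a proper tokenizer to handle every edge case:

```python
import re

# Hypothetical helper: replace string and numeric literals in a SQL
# statement with '?' before the statement reaches any log sink. This is
# similar in spirit to what MySQL's statement digests do, but applied
# at the application layer so raw PII never leaves the process.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'")
_NUMERIC_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")

def scrub_sql_literals(sql: str) -> str:
    """Return a copy of `sql` with literal values masked."""
    sql = _STRING_LITERAL.sub("?", sql)   # 'John Doe' -> ?
    sql = _NUMERIC_LITERAL.sub("?", sql)  # 42, 3.14   -> ?
    return sql
```

A statement like `SELECT name FROM accounts WHERE ssn = '123-45-6789' AND id = 42` would be logged as `SELECT name FROM accounts WHERE ssn = ? AND id = ?`.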
Consider using CloudWatch’s capabilities efficiently: enable slow query logs, but pair them with a strict data protection policy that masks sensitive data as it is ingested. With CloudWatch’s data protection features, you can monitor performance without exposing sensitive details. See [AWS Database Products](https://aws.amazon.com/products/databases/) and the related RDS and Aurora documentation.
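As a sketch of the "enable slow query logs" half, here is how turning on the log and its CloudWatch export might look with boto3, assuming an Aurora MySQL cluster. The cluster and parameter-group names, the 2-second threshold, and the function name are placeholder assumptions, and actually calling it requires AWS credentials:

```python
# Illustrative parameter-group settings that turn on slow query logging
# for Aurora MySQL; the 2-second threshold is an example choice.
SLOW_LOG_PARAMETERS = [
    {"ParameterName": "slow_query_log", "ParameterValue": "1",
     "ApplyMethod": "immediate"},
    {"ParameterName": "long_query_time", "ParameterValue": "2",
     "ApplyMethod": "immediate"},
]

def enable_slow_query_export(cluster_id: str, parameter_group: str) -> None:
    """Apply the logging parameters, then export the slow log to CloudWatch."""
    import boto3  # deferred: needs AWS credentials when actually called
    rds = boto3.client("rds")
    rds.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=parameter_group,
        Parameters=SLOW_LOG_PARAMETERS,
    )
    rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        CloudwatchLogsExportConfiguration={"EnableLogTypes": ["slowquery"]},
    )
```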
You might want to follow a generic approach regardless of the database system:
1. Enable slow query log exports from RDS to CloudWatch Logs.
2. Create a CloudWatch data protection policy for the RDS log group.
3. Use custom data identifiers, or the preconfigured ones for PII, to mask sensitive data during ingestion.
This way, you keep logging essential performance data while safeguarding sensitive information. Check out AWS’s guidelines on handling sensitive log data for more details.
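Steps 2 and 3 above can be sketched as a data protection policy attached to the exported log group. The policy name, the particular managed data identifier ARNs, and the helper names are assumptions to adapt (check AWS’s current list of managed data identifiers), and applying the policy requires credentials:

```python
import json

# Sketch of a CloudWatch Logs data protection policy. The two managed
# data identifiers below are examples; add custom identifiers as needed.
PII_IDENTIFIERS = [
    "arn:aws:dataprotection::aws:data-identifier/EmailAddress",
    "arn:aws:dataprotection::aws:data-identifier/Ssn-US",
]

def build_policy(name: str = "rds-slowlog-protection") -> dict:
    """Audit matches, then mask them as log events are ingested."""
    return {
        "Name": name,
        "Version": "2021-06-01",
        "Statement": [
            {"Sid": "audit",
             "DataIdentifier": PII_IDENTIFIERS,
             "Operation": {"Audit": {"FindingsDestination": {}}}},
            {"Sid": "redact",
             "DataIdentifier": PII_IDENTIFIERS,
             "Operation": {"Deidentify": {"MaskConfig": {}}}},
        ],
    }

def apply_policy(log_group_name: str) -> None:
    """Attach the policy to the slow query log group."""
    import boto3  # deferred: needs AWS credentials when actually called
    boto3.client("logs").put_data_protection_policy(
        logGroupIdentifier=log_group_name,
        policyDocument=json.dumps(build_policy()),
    )
```

With a policy like this in place, matched values show up masked in the log group, while anyone with the extra `logs:Unmask` permission can still view the originals for incident response.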
MySQL has a statement digest feature that normalizes queries by replacing literal values with placeholders, which helps keep sensitive information out of logs. Can those digests be pushed to CloudWatch in a compliant way? Also, remember to maintain good log hygiene across all application logs, since those can touch sensitive data too.
