I'm looking for a way to capture every INSERT, UPDATE, and DELETE from my Aurora PostgreSQL database directly into S3 in Parquet format. This is for compliance and historical analytics, essentially implementing Slowly Changing Dimension (SCD) Type 2 for all tables. AWS Database Migration Service (DMS) with Change Data Capture (CDC) looks like a good fit, because its table mappings accept wildcard patterns, so I wouldn't have to configure each table individually. However, DMS is usually seen as a tool for one-time migrations, and I'm worried it might not be suitable for long-term continuous operation. Is there a native AWS solution for this? I'd like to avoid writing custom code for each table, and to avoid atomicity issues in the services that write to the database.
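For reference, DMS selects tables through a JSON table-mapping document in which `%` acts as a wildcard, so a single selection rule can cover a whole schema. A minimal sketch, assuming all tables live in the `public` schema; the helper name `build_table_mappings` and the `tmp_%` exclusion pattern are hypothetical illustrations, not part of any existing setup.

```python
import json
from typing import List, Optional


def build_table_mappings(schema: str = "public", table_pattern: str = "%",
                         exclude_patterns: Optional[List[str]] = None) -> str:
    """Return a DMS table-mapping document as a JSON string.

    "%" is the DMS wildcard, so the defaults select every table in "public".
    The function name and defaults are hypothetical, for illustration only.
    """
    rules = [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": schema, "table-name": table_pattern},
            "rule-action": "include",
        }
    ]
    # Exclusions (e.g. hypothetical scratch tables) are extra selection
    # rules whose rule-action is "exclude"; ids and names must be unique.
    for i, pattern in enumerate(exclude_patterns or [], start=2):
        rules.append(
            {
                "rule-type": "selection",
                "rule-id": str(i),
                "rule-name": f"exclude-{i}",
                "object-locator": {"schema-name": schema, "table-name": pattern},
                "rule-action": "exclude",
            }
        )
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    print(build_table_mappings(exclude_patterns=["tmp_%"]))
```

The resulting string is what a replication task takes as its table mappings; because the rule matches by pattern, there is no per-table list to maintain.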
We use DMS for ongoing replication from Aurora to S3 in Parquet format and have run into very few issues. DMS supports ongoing replication (CDC) directly, not just one-time migrations.
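The answer doesn't include the actual configuration, so the following is only an illustrative sketch of what an S3 target endpoint writing Parquet might look like. It builds keyword arguments for `boto3.client("dms").create_endpoint(**kwargs)` without making any AWS calls. The `S3Settings` keys are ones documented for DMS S3 targets, but verify them against the current DMS docs; the endpoint identifier, bucket, folder, role ARN, and timestamp column name are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def build_s3_target_endpoint(bucket: str, role_arn: str,
                             folder: str = "cdc") -> Dict[str, Any]:
    """Return kwargs for a DMS S3 target endpoint that writes Parquet.

    Intended for boto3.client("dms").create_endpoint(**kwargs); this
    function only builds the dict and makes no AWS calls.
    """
    return {
        "EndpointIdentifier": "aurora-cdc-to-s3",  # hypothetical name
        "EndpointType": "target",
        "EngineName": "s3",
        "S3Settings": {
            # IAM role DMS assumes to write to the bucket (placeholder ARN).
            "ServiceAccessRoleArn": role_arn,
            "BucketName": bucket,
            "BucketFolder": folder,
            "DataFormat": "parquet",
            "ParquetVersion": "parquet-2-0",
            "EnableStatistics": True,
            # CDC files carry an Op column (I/U/D); this adds it to
            # full-load files as well, so both share one layout.
            "IncludeOpForFullLoad": True,
            # Extra column holding the source commit time for CDC rows,
            # useful for ordering versions in an SCD2 view.
            "TimestampColumnName": "dms_commit_ts",  # hypothetical name
            # Optional: split CDC output into folders by commit date.
            "DatePartitionEnabled": True,
            "DatePartitionSequence": "YYYYMMDD",
            "DatePartitionDelimiter": "SLASH",
        },
    }


if __name__ == "__main__":
    kwargs = build_s3_target_endpoint(
        "my-cdc-archive", "arn:aws:iam::123456789012:role/dms-s3-role"
    )
    print(json.dumps(kwargs, indent=2))
```

An endpoint like this would then be paired with a replication task created with `MigrationType="full-load-and-cdc"` (or `"cdc"` to capture only changes from the start time onward) and a wildcard table mapping.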

That sounds like exactly what I need! Could you share how you set it up? Also, do you partition the Parquet files by date? And does it rely on Lambda, as many AWS setups do?