Hey everyone! We're moving our workloads from an on-prem Cloudera setup to AWS. Today we use Sqoop for daily loads from an RDBMS into HDFS, and I'm looking for a comparable tool in the AWS ecosystem. Ideally we'd avoid binlog-based CDC, since our use case is simple: the tables we want to load have a reliable updated_date column, and records are never deleted. Any suggestions?
2 Answers
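You might want to check out AWS Database Migration Service (DMS). It's built for moving data from relational databases into AWS targets such as S3, and besides ongoing replication (CDC) it can run plain full-load tasks, so you don't have to use CDC at all.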
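If you go the DMS route without CDC, one option is a full-load task whose table mapping carries a source filter on updated_date, so each run copies only recent rows. The minimal sketch below only builds the table-mapping JSON; the schema and table names, the cutoff date, and the function name `build_table_mappings` are hypothetical. The resulting string would be passed as `TableMappings` to boto3's `create_replication_task` (with `MigrationType="full-load"`) or to `modify_replication_task` before each run. Check the DMS table-mapping documentation for the date and time formats your source engine accepts in filters.

```python
import json


def build_table_mappings(schema: str, table: str, cutoff: str) -> str:
    """Build a DMS table-mapping document (hypothetical helper) that selects
    one table and keeps only rows with updated_date >= cutoff."""
    mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
                # Source filter: DMS applies this condition when reading rows.
                "filters": [
                    {
                        "filter-type": "source",
                        "column-name": "updated_date",
                        "filter-conditions": [
                            {"filter-operator": "gte", "value": cutoff}
                        ],
                    }
                ],
            }
        ]
    }
    return json.dumps(mappings, indent=2)


if __name__ == "__main__":
    # Example: copy rows from a hypothetical sales.orders table changed since 2024-01-01.
    print(build_table_mappings("sales", "orders", "2024-01-01"))
```

Note that a full-load task has no memory between runs, so the cutoff has to be advanced by whatever schedules the task.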
AWS Glue would be a good fit! It can connect to JDBC-compatible databases like MySQL, so you can set up a Glue job that pulls only the rows whose updated_date is newer than the last run (for JDBC sources, Glue job bookmarks can use a column such as updated_date as the bookmark key). You can write the data to S3 as Parquet or CSV, with S3 taking the place of HDFS, and a scheduled Glue trigger can run the job daily. Just a heads-up: connecting Glue to your on-prem database takes some network setup, because the Glue connection runs inside your VPC and needs a route back to your data center, typically over Direct Connect (DX) or a Site-to-Site (S2S) VPN.
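Whichever service runs the job, pulling rows based on updated_date is a watermark pattern: remember the largest updated_date you loaded, and next time select only rows newer than that. Because rows can be updated but never deleted, upserting each batch by primary key keeps the target current. This is a minimal sketch under stated assumptions: an in-memory sqlite3 database stands in for the MySQL source, a plain dict stands in for the table in S3, and the table `orders`, its columns, and all function names are hypothetical.

```python
import sqlite3


# Stand-in for the on-prem source: an in-memory SQLite table.
# The table name "orders" and its columns are hypothetical.
def make_demo_source() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "2024-01-01 08:00:00"),
            (2, 20.0, "2024-01-02 09:30:00"),
            (3, 30.0, "2024-01-02 17:45:00"),
        ],
    )
    return conn


def extract_increment(conn: sqlite3.Connection, last_watermark: str):
    """Return rows changed after last_watermark, plus the next watermark."""
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically,
    # so a plain comparison works here.
    rows = conn.execute(
        "SELECT id, amount, updated_date FROM orders "
        "WHERE updated_date > ? ORDER BY updated_date",
        (last_watermark,),
    ).fetchall()
    # Advance to the newest updated_date seen; keep the old value
    # when nothing changed so no rows are skipped next time.
    next_watermark = rows[-1][2] if rows else last_watermark
    return rows, next_watermark


def merge_into_snapshot(snapshot: dict, rows) -> dict:
    """Upsert rows by primary key (sufficient because rows are never deleted)."""
    for row_id, amount, updated_date in rows:
        snapshot[row_id] = (amount, updated_date)
    return snapshot


if __name__ == "__main__":
    conn = make_demo_source()
    # First run: a very old watermark turns the query into a full load.
    rows, watermark = extract_increment(conn, "1970-01-01 00:00:00")
    snapshot = merge_into_snapshot({}, rows)

    # "Next day": row 2 is updated and row 4 is inserted at the source.
    conn.execute(
        "UPDATE orders SET amount = 25.0, updated_date = '2024-01-03 10:00:00' WHERE id = 2"
    )
    conn.execute("INSERT INTO orders VALUES (4, 40.0, '2024-01-03 11:00:00')")
    rows, watermark = extract_increment(conn, watermark)
    snapshot = merge_into_snapshot(snapshot, rows)
    print(len(rows), watermark, snapshot[2])
```

A strict `>` against the stored watermark can miss rows committed late with an older timestamp. A common safeguard is to re-read a small overlap window before the watermark, which the primary-key upsert makes harmless.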