I've been using AWS Glue for our internal data processing, and it works great for crawling S3 buckets, transforming Parquet files, and loading data into Redshift. However, it struggles with extracting data from SaaS applications. There are no prebuilt connectors for popular platforms like Salesforce, HubSpot, or NetSuite, so we have to build custom extractors: Python scripts or Lambda functions that handle the API calls, pagination, rate limiting, and OAuth token management, then dump the data into S3 for Glue to process. Each extractor takes weeks to develop and needs constant maintenance whenever a SaaS vendor changes its API.
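For context, the core of such a custom extractor is usually a pagination loop with rate-limit handling. Below is a minimal sketch, assuming a cursor-paginated API. `fetch_page`, `RateLimited`, and the cursor format are hypothetical stand-ins for whatever HTTP client and vendor API you actually use, and OAuth token handling is deliberately left out.

```python
import json
import time
from typing import Callable, Iterable, Iterator, Optional


class RateLimited(Exception):
    """Hypothetical error a page fetcher raises when the API answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


# Assumed fetcher contract: takes a cursor (None for the first page) and
# returns (records, next_cursor); next_cursor is None on the last page.
PageFetcher = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_records(
    fetch_page: PageFetcher,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Follow cursor pagination, backing off exponentially on rate limits."""
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                records, next_cursor = fetch_page(cursor)
                break
            except RateLimited as exc:
                if attempt == max_retries:
                    raise  # give up after max_retries backoffs
                # Prefer the server's Retry-After hint, else base_delay * 2**attempt.
                if exc.retry_after is not None:
                    delay = exc.retry_after
                else:
                    delay = base_delay * 2 ** attempt
                sleep(delay)
        yield from records
        if next_cursor is None:
            return
        cursor = next_cursor


def to_ndjson(records: Iterable[dict]) -> bytes:
    """Serialize records as newline-delimited JSON for staging in S3."""
    return b"".join(json.dumps(r, sort_keys=True).encode() + b"\n" for r in records)
```

The resulting bytes could then be uploaded with boto3, e.g. `s3_client.put_object(Bucket=..., Key=..., Body=data)`, for a Glue crawler to pick up. Injecting `sleep` keeps the backoff logic testable without real waits.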
With around 20 SaaS sources, that just isn't practical for our team of three. Notably, AWS Marketplace lists a handful of ETL tools that target exactly this problem, which suggests AWS itself recognizes Glue's limitations in SaaS data ingestion. What tools or approaches are you all using for SaaS extraction?
9 Answers
We opted for a managed ETL tool specifically designed for SaaS sources. It drops data into S3, and then Glue takes over for transformations. This way, we avoid writing custom API code, and the tool takes care of the heavy lifting like rate limiting and incremental sync, which previously drained our resources.
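Incremental sync, mentioned above, typically boils down to tracking a high-water mark. Here is a minimal sketch of the idea, assuming each record carries an ISO 8601 `updated_at` timestamp with a UTC offset; the field name and the function are hypothetical, not how any particular tool implements it.

```python
from datetime import datetime
from typing import Iterable, Optional


def incremental_batch(
    records: Iterable[dict],
    watermark: Optional[datetime],
    field: str = "updated_at",  # hypothetical field name
) -> tuple[list[dict], Optional[datetime]]:
    """Return the records changed after `watermark`, plus the new watermark."""
    new_rows: list[dict] = []
    new_mark = watermark
    for rec in records:
        ts = datetime.fromisoformat(rec[field])
        # A None watermark means first run: take everything.
        if watermark is None or ts > watermark:
            new_rows.append(rec)
            if new_mark is None or ts > new_mark:
                new_mark = ts
    return new_rows, new_mark
```

The strict `>` comparison can skip rows that share the watermark's exact timestamp but arrive later; one common mitigation is to re-read with `>=` and deduplicate on the primary key downstream.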
We use Precog for SaaS extraction, and it loads the data directly into Redshift for us. We still use Glue for the transformation part since it integrates nicely with our AWS setup. Different tools excel at different tasks, and trying to force Glue to handle SaaS ingestion was too much hassle.
AWS is definitely aware of this challenge, which is why they built Amazon AppFlow. In my experience, though, it may not cover everything you need, and its connector selection is fairly limited.
We've run into similar challenges in our field, so we built our own ingestion schemas for several major SaaS products. During a fellowship with AWS, we even created an open-source framework for this purpose. We prefer our internal services over Glue, since our model leans more toward ELT.
Honestly, you’re making me feel lucky for not having to deal with AWS right now!
I've faced the same issue! We use Glue for our internal ETL processes, but we switched to a different tool for SaaS ingestion. We tried AppFlow, but the connector coverage didn't meet our needs and the pricing was confusing as we scaled. Eventually, we decided to go with a third-party solution.
I hate to say it, but AI might be a game changer for this kind of data extraction. It handles the tedious parts, like field mapping and connector code, really well, especially if you set it up to trigger on the errors or bugs you encounter. Tools like Claude Code can work within your existing AWS ecosystem to manage API calls and fixes without needing extra licenses.
Have you thought about using something like n8n to handle the API calls? You could push data to S3 from there, letting Glue take care of the rest. Just a thought!
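If n8n (or any other extractor) lands files in S3, the key layout affects how easily Glue picks them up. A minimal sketch, assuming one folder per source object with a Hive-style `dt=` date partition; the prefix, folder names, and file naming are hypothetical choices, not an n8n or Glue requirement.

```python
from datetime import date


def s3_landing_key(prefix: str, source: str, obj: str, day: date, part: int) -> str:
    """Build a key like raw/hubspot/contacts/dt=2024-05-01/part-00000.json.

    The `dt=YYYY-MM-DD` segment follows Hive-style key=value naming, so a
    crawler pointed at the object's folder can infer `dt` as a partition column.
    """
    return (
        f"{prefix.strip('/')}/{source}/{obj}/"
        f"dt={day.isoformat()}/part-{part:05d}.json"
    )
```

Keeping each object in its own folder means a crawler run against that folder should produce one table per object rather than mixing schemas from different SaaS sources.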
We moved to Fivetran and dbt for our SaaS pipelines. While that doesn't cover every use case, it made the process much easier and let us focus more on transformations with Glue.