I'm looking to overhaul our current SaaS data ingestion setup. We designed it to be cloud-native on AWS, with a separate Lambda function for each of our 18 SaaS sources. Each function pulls data via APIs, processes the responses, and saves JSON files to S3, all orchestrated by Step Functions and monitored by CloudWatch. The architecture looks great on paper but is a nightmare in practice. Each Lambda was written differently, some run on deprecated Python runtimes, and the error handling varies widely. Failures are tough to trace since everything's asynchronous. I'm considering switching to a managed ingestion tool that could feed data directly into Redshift and possibly keep some raw data in S3. My primary concern is whether the reduction in maintenance burden justifies the tool's cost. Has anyone else made this switch, and what was your experience?
5 Answers
For what it's worth, I've gone the custom-backend route before, running Python on Elastic Beanstalk with SQS and Celery. If you're looking for a higher-level abstraction, AWS offers managed Apache Airflow (MWAA), which could simplify your orchestration: DAGs map naturally onto what Step Functions does for you today, and migrating your Python Lambdas to Airflow tasks should be relatively straightforward.
I did this swap about eight months ago. We kept everything downstream intact but replaced the custom Lambdas with a managed tool. The data goes into S3 in the same structure, so our Glue jobs didn’t need any adjustments. The operational overhead dropped significantly since we stopped getting woken up at 2 AM due to Lambda failures. The cost of the tool was comparable to what we were spending on Lambda executions and the engineering time to maintain them.
It sounds like what you really need is a process improvement rather than a tool switch. Ensure consistency and maintainability in your Lambda setups by applying a standard review process, using linting tools, and setting up a proper CI/CD pipeline. This could alleviate many of the issues you've encountered without switching to a completely new tool.
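One concrete way to enforce that consistency is a shared handler wrapper that every Lambda uses, so logging and error reporting look the same across all 18 sources. This is a minimal sketch; the `standard_handler` decorator and the `salesforce` source name are illustrative, not from your codebase:

```python
import functools
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ingest")

def standard_handler(source_name):
    """Wrap a per-source extract function with uniform logging and
    error reporting, so every Lambda fails the same way."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(event, context):
            try:
                records = fn(event, context)
                logger.info("source=%s records=%d", source_name, len(records))
                return {"status": "ok", "source": source_name, "count": len(records)}
            except Exception:
                # Log with full traceback, then re-raise so Step Functions
                # sees the failure and can route to a retry/catch state.
                logger.exception("source=%s failed", source_name)
                raise
        return wrapper
    return decorator

@standard_handler("salesforce")
def handler(event, context):
    # Hypothetical extract logic; real code would call the source API.
    return [{"id": 1}, {"id": 2}]
```

Ship the decorator as a shared Lambda layer and a lint rule ("every handler must use `standard_handler`") becomes trivially checkable in CI.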
If you keep using the S3 landing zone, confirm that your chosen tool supports the file format and folder structure your Glue jobs need. Aligning these details from the start can save you a lot of rework later.
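To make that alignment concrete, it helps to pin the layout down in one place rather than per-source. A sketch of a Hive-style partitioned key builder; the `raw/`, `source=`, and `ingest_date=` names are assumptions you'd swap for whatever your Glue jobs actually expect:

```python
from datetime import date

def landing_key(source: str, run_date: date, file_name: str) -> str:
    """Build a Hive-style partitioned S3 key (key=value path segments)
    so Glue crawlers can pick up partitions without custom classifiers."""
    return (
        f"raw/source={source}/"
        f"ingest_date={run_date.isoformat()}/"
        f"{file_name}"
    )

key = landing_key("hubspot", date(2024, 5, 1), "contacts.json")
# raw/source=hubspot/ingest_date=2024-05-01/contacts.json
```

If the managed tool can't write to that exact structure, a small Lambda or Glue job that renames objects on arrival is usually cheaper than reworking every downstream job.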
Your problems seem to stem from skill and process gaps rather than the tools themselves, and switching to a SaaS tool won't fix that magically. Some platforms, like Boomi or Jitterbit, can even complicate your ETL. Instead of doing everything in one job, split the work: extract and load the data first, then transform it (ELT). Smaller, single-purpose jobs mean fewer bugs and cleaner code, though you'll still have to manage idempotency and other concerns yourself.
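On the idempotency point, one simple pattern for the load step is to derive the S3 object key from the extraction window rather than a timestamp, so re-running a failed window overwrites the same object instead of creating duplicates. A minimal sketch; the function names are hypothetical, and `put_object` is injected so the example runs without AWS (in real code it would wrap boto3's `s3.put_object`):

```python
import hashlib
import json

def deterministic_key(source: str, window_start: str, window_end: str) -> str:
    """Derive the object key from the extraction window, not wall-clock
    time, so retries of the same window produce the same key."""
    digest = hashlib.sha256(
        f"{source}:{window_start}:{window_end}".encode()
    ).hexdigest()[:12]
    return f"raw/{source}/{window_start}_{digest}.json"

def load(bucket, source, window_start, window_end, records, put_object):
    """put_object is any callable taking (bucket, key, body); re-running
    the same window overwrites rather than appends."""
    key = deterministic_key(source, window_start, window_end)
    put_object(bucket, key, json.dumps(records))
    return key
```

With keys like this, a Step Functions (or Airflow) retry of the extract-load job is safe by construction, and the transform step can treat each window's object as the single source of truth.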

That's great to hear! My worry was that switching would force me to completely redesign the pipeline. Sounds like you managed to swap out just the ingestion layer without other major changes.