Need Help with AWS Data Pipeline: Avoiding Duplication and Step-by-Step Setup

Asked By TechDreamer42 On

Hey everyone! I'm just getting started with building a data pipeline using AWS services, and I'd love to get some advice and best practices from you all. I'm working on a setup where I have a mock API hosted on EC2 that returns sales data. Here's what I have planned:

- A Lambda function, triggered daily via EventBridge, fetches data from the API and saves it in an S3 bucket under a /raw/ directory.
- Then, a Glue Crawler and Glue Job run daily to clean the data, convert it to Parquet format, and add some derived fields, saving the results into another location under /processed/.
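For illustration, the ingestion step in the first bullet might look roughly like the sketch below. It is only a sketch under assumptions: the bucket name, the `raw/<date>/data.json` key layout, and the helper names `raw_key` and `save_raw` are hypothetical, and the S3 client is passed in so the logic can be tried without AWS access. The only AWS call used is `put_object`.

```python
import json
from datetime import date


def raw_key(run_date: date) -> str:
    """Build the S3 key for one day's raw dump (assumed layout)."""
    return f"raw/{run_date.isoformat()}/data.json"


def save_raw(s3_client, bucket: str, records: list, run_date: date) -> str:
    """Serialize the fetched records to JSON and write them to the raw area.

    `s3_client` is expected to behave like boto3.client("s3").
    Returns the key that was written.
    """
    key = raw_key(run_date)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(records).encode("utf-8"),
    )
    return key
```

Inside the Lambda handler this would be called with `boto3.client("s3")` and whatever the API call returned. Because the key depends only on the run date, invoking the function twice for the same day overwrites the same object rather than adding a second file.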

However, I'm running into a few issues:

1. Data Duplication: The Glue job picks up all files from the /raw/ folder each day, causing old data to be processed along with new data, leading to duplicates. I'm thinking of organizing raw data by date (like /raw/{date}/data.json) to avoid this—is that a good idea? But if I run the Glue job manually for the same date, won't I still face duplication in the /processed/ folder?

2. Keeping Athena Updated: How can I make sure Athena is always aware of the latest data?

3. Step-by-Step Guidance: Since I'm learning, if anyone has a detailed walkthrough or example for this type of setup (from batch ingestion to transformation to reporting), that would be super helpful!
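To make questions 1 and 2 concrete, here is one possible shape for a rerun-safe layout, sketched under assumptions rather than taken from a working setup. The `processed/dt=<date>/` prefix, the table name, and the helper names are hypothetical. The idea is that each run owns exactly one date prefix, clears it before writing, and then registers that partition with Athena using `ALTER TABLE ... ADD IF NOT EXISTS PARTITION`. The S3 calls used are `list_objects_v2` and `delete_objects`.

```python
from datetime import date


def processed_prefix(run_date: date) -> str:
    """Output prefix owned by a single run date (assumed Hive-style layout)."""
    return f"processed/dt={run_date.isoformat()}/"


def clear_prefix(s3_client, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix` and return how many were removed.

    Running this before the job writes its output means a manual rerun for
    the same date replaces that day's files instead of adding to them.
    """
    deleted = 0
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def add_partition_sql(table: str, bucket: str, run_date: date) -> str:
    """Athena DDL that registers the day's partition; harmless if it exists."""
    day = run_date.isoformat()
    location = f"s3://{bucket}/{processed_prefix(run_date)}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{day}') LOCATION '{location}'"
    )
```

The Glue job would then read only `raw/<date>/` and write only under the matching processed prefix. Rerunning the crawler or running `MSCK REPAIR TABLE` are other ways to pick up new partitions; the explicit statement is just one option.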

Thanks so much in advance for your insights and resources!

1 Answer

Answered By InsightfulAnalyst88 On

If you're facing issues with QuickSight, I feel you. We had a similar experience and switched to using Metabase, and it made a huge difference in how we visualize data, especially for Redshift. It's definitely worth checking out if you're looking for better analytics tools.

CuriousCoder73 -

Oh, thanks! I'll definitely consider it.

MetabaseFan21 -

Hey, sorry to hear about your frustrations with QuickSight. If you want to share your issues, I suggest sending feedback directly to their service teams; it might help them improve! They’re open to hearing from users.
