Best AWS Services for Implementing a Data Lakehouse

0
6
Asked By DataWiz42 On

I'm currently leading a pilot project to set up an enterprise Data Lakehouse on AWS for a university. I chose the Medallion architecture, which includes Bronze for raw data, Silver for clean validated data, and Gold for modeled BI data, to ensure data quality and scalability. I'm looking for recommendations on AWS services to support this flow. I'm considering the AWS Glue Data Catalog for central indexing on S3, using Amazon Athena for SQL queries on the Gold data, and Amazon QuickSight for visualization. However, I'm facing challenges with ingestion, storage, and transformation. My current database is on RDS, but I'm wondering what the best options are. I'm also interested in any courses or tutorials that could help me out. Thanks!

2 Answers

Answered By TechieTom On

If you're managing a lot of users without AWS console access, Sagemaker Unified Studio can simplify collaboration. It offers a more user-friendly interface for those people, which could streamline some of your processes.

Answered By CloudGuru77 On

Choosing the right services really hinges on your data type, refresh rates, and overall volume. For ETL, I highly recommend using AWS Glue. For data warehousing, you can't go wrong with Redshift for your Gold layer. For your Bronze and Silver layers, consider Iceberg Tables on S3 for efficient data lake storage. For cataloging, look into DataZone or Sagemaker Catalog, and for BI, QuickSight is a solid choice. If your dataset is relatively small or well-partitioned, Athena can be a great option for querying as well.

DataWiz42 -

Thanks for your insights! The data I'm dealing with is extensive, covering everything from student records to classroom info, so I'm considering batch updates in the early mornings. Cost optimization is a priority, hence keeping the original format in S3 for the Bronze layer.

Related Questions

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.