RDBMS, Synapse, or Databricks: What’s the Best Choice for Our Data Architecture?

Asked By CuriousExplorer92 On

Hey everyone! We're currently weighing our options between sticking with our EDBPS (PostgreSQL) setup or moving to a lakehouse architecture. Our goal is to calculate stock replenishment based on future demand using daily stock movements combined with historical sales data, shipment costs, and taxes.

Here's some context:
- **Data Infrastructure**: We have high-frequency stock movement and sales pipelines that update every 5 minutes, while shipment and tax data refresh monthly or upon request. Our stock and sales tables have about 2 million records each, while shipment and tax tables are much smaller.
- **Performance History**: We recently noticed significant slowdowns with PostgreSQL. The job took 3 hours on 4 CPUs, and even on 8 CPUs it still takes 2 to 3 hours. We tried Databricks with an on-demand cluster but ultimately decided on Synapse Serverless SQL Pool for its cost-effectiveness and performance.

I'm looking for feedback on our choice. Do you think we missed any architectural issues? Is Synapse the best fit for our needs with this volume and SLA? And how would you handle 2 million record joins more efficiently?

3 Answers

Answered By DataWizard44 On

Honestly, 2 million rows isn't that large. If PostgreSQL was taking hours, it likely means you had a missing index or a poorly optimized query. A well-configured SQL instance should handle that join pretty quickly.
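If you want to check the missing-index theory, compare the query plan before and after adding an index on the join key. Here's a rough sketch using Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table names, column names, and index name are all made up for illustration. On PostgreSQL itself you would run `EXPLAIN (ANALYZE, BUFFERS)` against your real query instead.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; tables, columns, and the index
# name are hypothetical and only illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_movements (sku TEXT, moved_at TEXT, qty INTEGER);
    CREATE TABLE sales (sku TEXT, sold_at TEXT, units INTEGER);
""")

JOIN_SQL = """
    SELECT m.sku, m.qty, s.units
    FROM stock_movements AS m
    JOIN sales AS s ON s.sku = m.sku
"""

def plan(sql):
    """Return the planner's description of each step as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(JOIN_SQL)

# Index the join key so the inner side of the join becomes an index lookup.
conn.execute("CREATE INDEX idx_sales_sku ON sales (sku)")
plan_after = plan(JOIN_SQL)

print("before:", plan_before)
print("after: ", plan_after)
```

If the plan on your real database still shows full scans on both 2-million-row tables after indexing, the problem is more likely in how the query is written than in the hardware.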

Between your new options, I think Synapse Serverless is the best choice. Databricks might be excessive for what you need, both in capabilities and costs. Just watch out for lots of small files in ADLS, since they can slow down Synapse's performance.
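One way to deal with small files is to periodically merge them into fewer, larger ones. The sketch below only plans which files to merge together; it does not read or write anything. The 256 MB target and the function name `plan_compaction` are assumptions for illustration, so pick a target that suits your own workload.

```python
from typing import Dict, List

TARGET_BYTES = 256 * 1024 * 1024  # assumed target size per merged file

def plan_compaction(file_sizes: Dict[str, int],
                    target: int = TARGET_BYTES) -> List[List[str]]:
    """Greedily group small files into batches whose total size stays within target."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    def flush() -> None:
        # Rewriting a single file on its own gains nothing, so skip it.
        if len(current) > 1:
            batches.append(list(current))

    # Smallest files first, so the tiniest ones get merged together.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1]):
        if size >= target:
            continue  # already large enough, leave it alone
        if current and current_size + size > target:
            flush()
            current.clear()
            current_size = 0
        current.append(name)
        current_size += size
    flush()
    return batches
```

Each returned batch is a list of file names you would rewrite as one larger parquet file with whatever tool you already use for writing to ADLS.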

Answered By AnalyticsNinja88 On

Synapse Serverless seems like a great fit based on your requirements. Just make sure you partition your parquet files properly in ADLS to minimize data scanning. For the joins, think about pre-aggregating data into daily rollups instead of needing those 5-minute updates for your monthly reports.
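To make the rollup idea concrete, here's a minimal sketch in plain Python. It assumes each movement arrives as an `(iso_timestamp, sku, qty)` tuple, which is a made-up shape, and it builds a Hive-style `year=/month=/day=` folder path as one possible partition layout. The function names are hypothetical too.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

def daily_rollup(events: Iterable[Tuple[str, str, int]]) -> Dict[Tuple[str, str], int]:
    """Sum quantities per (date, sku) from (iso_timestamp, sku, qty) events."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for ts, sku, qty in events:
        day = datetime.fromisoformat(ts).date().isoformat()
        totals[(day, sku)] += qty
    return dict(totals)

def partition_path(day: str) -> str:
    """Build a Hive-style folder path for one day's data from a YYYY-MM-DD string."""
    year, month, dom = day.split("-")
    return f"year={year}/month={month}/day={dom}"

events = [
    ("2024-01-15T00:05:00", "SKU-1", 4),
    ("2024-01-15T00:10:00", "SKU-1", -1),
    ("2024-01-16T09:00:00", "SKU-2", 7),
]
rollup = daily_rollup(events)
```

Joining two daily rollups touches one row per SKU per day instead of every 5-minute record, and a date-based folder layout lets a query that filters on date skip folders outside the range.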

Also, keep an eye on scalability; 2 million rows are manageable now, but you might need to rethink things as your data grows.

If possible, look into further tuning after you’ve settled into your current setup.

Answered By TechLover21 On

I'd steer clear of Databricks! I spent a ton there and it didn't live up to my expectations. Why pay that much when Synapse gives you what you need for way less?
