Hey everyone! We're currently weighing our options between sticking with our EDBPS (PostgreSQL) setup or moving to a lakehouse architecture. Our goal is to calculate stock replenishment based on future demand using daily stock movements combined with historical sales data, shipment costs, and taxes.
Here's some context:
- **Data Infrastructure**: We have high-frequency stock movement and sales pipelines that update every 5 minutes, while shipment and tax data refresh monthly or upon request. Our stock and sales tables have about 2 million records each, while shipment and tax tables are much smaller.
- **Performance History**: We recently noticed significant slowdowns with PostgreSQL. With 4 CPUs the calculation took 3 hours, and even with 8 CPUs it still takes 2 to 3 hours. We tried Databricks with an on-demand cluster but ultimately decided on Synapse Serverless SQL Pool for its cost-effectiveness and performance.
I'm looking for feedback on our choice. Do you think we missed any architectural issues? Is Synapse the best fit for our needs with this volume and SLA? And how would you handle 2 million record joins more efficiently?
3 Answers
Honestly, 2 million rows isn't that large. If PostgreSQL was taking hours, it likely means you had a missing index or a poorly optimized query. A well-configured SQL instance should handle that join pretty quickly.
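To make the missing-index point concrete, here is a minimal sketch of checking whether a join can use an index. It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; on PostgreSQL you would run `EXPLAIN` or `EXPLAIN ANALYZE` on the real query instead. The table names, column names, and index name are hypothetical, not taken from the question.

```python
import sqlite3

# Hypothetical index name, used only for this illustration.
INDEX_NAME = "idx_movements_product"

# Hypothetical join between sales and stock movements on product_id.
JOIN_SQL = """
    SELECT s.product_id, s.units, m.qty
    FROM sales AS s
    JOIN stock_movements AS m ON m.product_id = s.product_id
"""


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (
            sale_id INTEGER PRIMARY KEY, product_id INTEGER, units INTEGER
        );
        CREATE TABLE stock_movements (
            movement_id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER
        );
        """
    )
    return conn


def join_plan(conn):
    """Return the planner's description of each step of the join."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail text


def add_join_index(conn):
    """Index the join key so the inner side can be searched, not scanned."""
    conn.execute(f"CREATE INDEX {INDEX_NAME} ON stock_movements (product_id)")


if __name__ == "__main__":
    conn = make_db()
    print("before:", join_plan(conn))
    add_join_index(conn)
    print("after: ", join_plan(conn))
```

If the plan after adding the index mentions the index by name, the join key is being used for lookups. On a slow PostgreSQL query, the equivalent check is to look for sequential scans on the large tables in the `EXPLAIN` output.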
Between your new options, I think Synapse Serverless is the best choice. Databricks might be excessive for what you need, both in capabilities and costs. Just watch out for small files in ADLS, since lots of them can slow down Synapse’s performance.
Synapse Serverless seems like a great fit based on your requirements. Just make sure you partition your parquet files properly in ADLS to minimize data scanning. For the joins, think about pre-aggregating data into daily rollups instead of needing those 5-minute updates for your monthly reports.
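As a rough sketch of the rollup and partitioning idea, the snippet below collapses 5-minute movement events into one total per day and product, and builds a date-based folder path for each day's output. The event shape `(timestamp, product_id, quantity)`, the `stock_daily` root folder, and the `year=/month=/day=` folder naming are all assumptions for illustration; adapt them to whatever layout your pipeline already writes to ADLS.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_rollup(events):
    """Sum quantities per (day, product_id).

    `events` is an iterable of (iso_timestamp, product_id, quantity)
    tuples, a hypothetical shape for the 5-minute movement feed.
    """
    totals = defaultdict(int)
    for iso_timestamp, product_id, quantity in events:
        day = datetime.fromisoformat(iso_timestamp).date()
        totals[(day, product_id)] += quantity
    return dict(totals)


def partition_path(day: date, root: str = "stock_daily") -> str:
    """Build a date-partitioned folder path (assumed naming convention)."""
    return f"{root}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


if __name__ == "__main__":
    sample = [
        ("2024-01-15T00:00:00", "A", 5),
        ("2024-01-15T00:05:00", "A", -2),
        ("2024-01-15T00:05:00", "B", 3),
        ("2024-01-16T00:00:00", "A", 1),
    ]
    for (day, product_id), total in sorted(daily_rollup(sample).items()):
        print(partition_path(day), product_id, total)
```

Joining the daily rollups keeps the join inputs far smaller than the raw 5-minute rows, and a date-based folder layout lets a query restricted to a date range read only the matching folders.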
Also, keep an eye on scalability; 2 million rows are manageable now, but you might need to rethink things as your data grows.
If possible, look into further tuning after you’ve settled into your current setup.
Totally steer clear of Databricks! I had a rough time spending a ton there and found it didn't live up to expectations. Like, why pay that much when Synapse gives you what you need for way less?
