I'm diving into building a custom database to handle web analytics, focusing on metrics like user traffic, funnels, and sessions. I've experimented with OLAP solutions like ClickHouse, but it wasn't efficient with my memory usage given my low web traffic. This isn't just a side project – I'm serious about creating a workable solution for my SaaS applications. I'm seeking guidance on the best programming languages to use (I know Rust, C++, and Go), and whether I should prioritize read performance over write speeds. Any insights or how-to resources would be greatly appreciated!
7 Answers
While it's great to want to build something from scratch for the experience, existing database engines are very mature. They have been tested and refined over the years with ongoing support, so your time might be better spent optimizing what’s already out there instead of reinventing the wheel.
Building your own database can be a fantastic learning experience! But remember, if you just need something for a project, existing solutions might save you a lot of time and headaches. If you want to research further, consider looking at resources like the 'Build Your Own X' GitHub repo for guidance.
Is there really a need to create a new database for web analytics? There are plenty of existing solutions that can handle this quite well. What unique requirements do you have that warrant a new design?
Building a database might be more complex than you realize. Most database systems are designed to utilize memory efficiently, so if there's free memory, they'll use it to optimize performance. Remember, ClickHouse is made specifically for web analytics, so it seems like it should fit your needs well.
Honestly, ClickHouse was built with web analytics in mind, so it might be your best bet. Just explore its capabilities fully before deciding to start fresh with a new project. Plus, you could check out Tinybird's web analytics starter kit if you're worried about memory issues.
Why not consider using PostgreSQL? It's quite powerful, and if you add the Timescale extension, it can effectively manage time-series data, which could be beneficial for your needs.
From my experience, finding a balance between read and write speeds is key for web analytics. If you’re focused on real-time data capture, writing speeds should take precedence. Given your background, using Go might be the best option for a performant backend. Just keep in mind the importance of integrating with proven technologies rather than starting from scratch.

Absolutely! Timescale makes PostgreSQL a solid choice for analytics.