I'm running an uptime monitoring service where we perform checks (HTTP, TCP, UDP, ICMP) every two minutes from multiple regions. Each check generates a row in BigQuery, which is becoming pricey as the table grows. Our users often want to open dashboards that analyze historical data, and on-demand pricing makes it expensive, especially for frequent scans.
I'm writing thousands of rows per second, but my queries—like getting incidents with exact timestamps—need to hit the raw data, which can be costly. I've considered alternatives like ClickHouse and TimescaleDB. ClickHouse is frequently recommended, but I'm unsure about its ingest process and setup for a heavy write workload. TimescaleDB appeals because it's based on Postgres, but I'm worried about reaching its limits as well.
I'm curious if anyone has transitioned from BigQuery to a different solution without escalating costs. Did you face any new challenges, or was it worth it? What about hybrid setups—like using ClickHouse for hot data and moving cold data to a storage solution? I'm open to sharing numbers in the comments if it helps!
4 Answers
Switching to ClickHouse could work well for your situation. When I moved my company from BigQuery to ClickHouse, we found it particularly effective for percentile queries across time ranges. Pre-aggregating with materialized views is essential for performance: you keep the raw data for detailed drill-downs, while most dashboard queries hit the materialized views, which is a big win on both performance and cost.
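Here's a rough sketch of the pattern. The schema (check_id, region, checked_at, status_code, latency_ms) and the hourly bucket are just assumptions for illustration; adapt them to whatever your tables actually look like:

```sql
-- Raw table: one row per check result (schema is a placeholder)
CREATE TABLE check_results
(
    check_id    UInt64,
    region      LowCardinality(String),
    checked_at  DateTime,
    status_code UInt16,
    latency_ms  Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(checked_at)
ORDER BY (check_id, checked_at);

-- Hourly pre-aggregates, stored as partial aggregate states
CREATE TABLE check_results_hourly
(
    check_id  UInt64,
    region    LowCardinality(String),
    hour      DateTime,
    latency_q AggregateFunction(quantiles(0.5, 0.95, 0.99), Float64),
    checks    AggregateFunction(count),
    failures  AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (check_id, region, hour);

-- Populated automatically on every insert into the raw table
CREATE MATERIALIZED VIEW check_results_hourly_mv TO check_results_hourly AS
SELECT
    check_id,
    region,
    toStartOfHour(checked_at) AS hour,
    quantilesState(0.5, 0.95, 0.99)(latency_ms) AS latency_q,
    countState() AS checks,
    sumState(toUInt64(status_code >= 500)) AS failures
FROM check_results
GROUP BY check_id, region, hour;

-- Dashboard query hits the small rollup and merges the partial states
SELECT
    hour,
    quantilesMerge(0.5, 0.95, 0.99)(latency_q) AS latency_quantiles,
    countMerge(checks) AS total_checks,
    sumMerge(failures) AS failed_checks
FROM check_results_hourly
WHERE check_id = 42 AND hour >= now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```

The raw table keeps exact timestamps for incident drill-downs, while the hourly table is a small fraction of the size and answers the typical dashboard query cheaply.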
I'm curious about your constraints. What's stopping you from running a traditional database on a VPS or dedicated server? It could be a feasible option!
Honestly, I'm pretty good at deploying functions but databases aren’t my strongest point, so I tend to lean towards managed solutions. Still, a self-hosted Postgres on my VPS is definitely something I can consider.
You might want to consider implementing a daily rollup to condense the rows from the last day into a single entry per check. This could help with costs. However, keep in mind that certain queries, like those needing exact timestamps or detailed incident logs, will still require you to access raw data. You can't solely rely on rollups for everything, but it could mitigate some cost concerns. Just my two cents!
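If you stay on BigQuery for now, the rollup can be a scheduled query along these lines. Table and column names here are made up for illustration; adjust them to your schema:

```sql
-- Scheduled daily: condense yesterday's raw checks into one row per check/region/day.
-- monitoring.check_results and its columns are assumptions, not your real schema.
INSERT INTO monitoring.check_rollup_daily
  (day, check_id, region, checks, failures, p95_latency_ms)
SELECT
  DATE(checked_at)                              AS day,
  check_id,
  region,
  COUNT(*)                                      AS checks,
  COUNTIF(status_code >= 500)                   AS failures,
  APPROX_QUANTILES(latency_ms, 100)[OFFSET(95)] AS p95_latency_ms
FROM monitoring.check_results
WHERE DATE(checked_at) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY day, check_id, region;
```

Dashboards that only need daily trends can scan the small rollup table, and the raw table only gets touched for incident-level queries.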
I actually have a daily rollup already, which does help with some queries. The more complex ones, though, still rely on raw data, especially for troubleshooting.
Have you thought about using a time-series database like InfluxDB or TimescaleDB? They're built for high write loads and could cut your costs while keeping dashboard queries fast. You could also push aggregated older data back into BigQuery if you still need it there.
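In TimescaleDB the same rollup idea is a hypertable plus a continuous aggregate. A minimal sketch, assuming a made-up schema and bucket size:

```sql
-- Raw results in a hypertable, chunked on checked_at (schema is a placeholder)
CREATE TABLE check_results (
    check_id    BIGINT           NOT NULL,
    region      TEXT             NOT NULL,
    checked_at  TIMESTAMPTZ      NOT NULL,
    status_code INT,
    latency_ms  DOUBLE PRECISION
);
SELECT create_hypertable('check_results', 'checked_at');

-- Hourly continuous aggregate that dashboards read instead of raw rows
CREATE MATERIALIZED VIEW check_results_hourly
WITH (timescaledb.continuous) AS
SELECT
    check_id,
    region,
    time_bucket('1 hour', checked_at) AS bucket,
    count(*)                          AS checks,
    sum((status_code >= 500)::int)    AS failures,
    avg(latency_ms)                   AS avg_latency_ms,
    max(latency_ms)                   AS max_latency_ms
FROM check_results
GROUP BY check_id, region, bucket;

-- Keep the aggregate fresh and eventually drop old raw chunks
SELECT add_continuous_aggregate_policy('check_results_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');
SELECT add_retention_policy('check_results', INTERVAL '90 days');
```

One caveat from memory: ordered-set aggregates like percentile_cont aren't usable inside continuous aggregates, so if you need percentiles in the rollup you'd go through the timescaledb_toolkit percentile functions instead.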
Yeah, I've actually been looking into TimescaleDB because it's built on Postgres, which I trust. InfluxDB seems trickier when it comes to joining results against my check configuration data. Any insights you can share from your experience?

Don't forget to account for storage growth from all those per-check rows and aggregates, and for how you'll handle ingestion at that write rate. Buffering writes through Kafka can help with the ingestion side, and keeping older raw data in Parquet on cheap object storage helps with the rest.
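If you land on ClickHouse, the usual shape of that is a Kafka engine table drained by a materialized view, plus a periodic export of old data to Parquet. A sketch with placeholder broker, topic, path, and table names:

```sql
-- Kafka-backed ingestion: ClickHouse consumes the topic and a materialized view
-- drains it into the raw MergeTree table in large batches instead of many tiny inserts.
CREATE TABLE check_results_queue
(
    check_id    UInt64,
    region      String,
    checked_at  DateTime,
    status_code UInt16,
    latency_ms  Float64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'check-results',
         kafka_group_name  = 'clickhouse-ingest',
         kafka_format      = 'JSONEachRow';

CREATE MATERIALIZED VIEW check_results_queue_mv TO check_results AS
SELECT * FROM check_results_queue;

-- Offload raw data older than ~90 days to Parquet (run via clickhouse-client),
-- then drop the corresponding partitions from the hot table.
SELECT *
FROM check_results
WHERE checked_at < now() - INTERVAL 90 DAY
INTO OUTFILE '/backups/check_results_cold.parquet'
FORMAT Parquet;
```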