Should I Use Sidecars or a Central Receiver for Thanos with Kubernetes?

Asked By CuriousCoder88 On

Hey everyone! I'm considering upgrading my Prometheus setup for long-term metric storage, and I'm leaning towards Thanos. I'll have a few Kubernetes clusters, each running its own Prometheus to collect metrics. From what I gather, sidecar containers are the recommended approach. However, I'm not thrilled about having to update the central Thanos with new targets every time I add a remote sidecar, even though my scale is small.

Here are the two options I'm thinking about:

**Option 1:** Each Kubernetes cluster runs a sidecar, which means:
* The sidecar exports metrics to S3
* The sidecar exposes a gRPC port
* The central Thanos fetches the last two hours of metrics from each sidecar
* I have to update the Thanos config whenever a new Kubernetes cluster is added
* S3 credentials have to be configured on each sidecar

**Option 2:** Each Prometheus uses remote_write to send metrics to a central Thanos, with no need to modify the Thanos config for new clusters. All metrics would then be local to the central Thanos, which makes the overall configuration simpler.

I'm leaning towards Option 2, but I'd love to hear your thoughts! Thanks!

4 Answers

Answered By StatsWhiz On

I’m in a similar situation. Remote write to a central Thanos is easier to manage and helps reduce cloud resource usage, but it could mean higher long-term costs and more fragility, especially when the traffic crosses regions. I'm planning to run Prometheus with sidecars on my EKS clusters in different regions, with each sidecar uploading data blocks to an S3 bucket in its own region. This way, my centralized Thanos can query each bucket without incurring too much cost.

CloudCrafted -

That sounds like a solid plan! My infrastructure is distributed across multiple data centers too, and I see your point about transfer costs, although they aren't a concern in my case. Still, network issues can happen, which makes me think sidecars could be the better choice.

Answered By SysAdminFan77 On

Definitely go with the sidecar! It’s designed for this kind of setup and helps reduce network traffic and CPU usage. Plus, keeping metrics local means you have better control and visibility. Remote write might seem easier, but it can introduce delays and confusion if things go wrong, especially with alerting and partial data scenarios.

K8sNinja -

What if the receiver is near the storage or querier? The only risk would be losing metrics if there's a network issue between the receiver and remote Prometheus. I'm just trying to understand all the possibilities.

Answered By TheCloudExpert On

Using remote write can rack up network costs if you’re on a public cloud and your clusters are in separate regions.
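
To put a rough number on that for your own setup, here is a minimal Python sketch of the arithmetic. The function name `monthly_remote_write_cost` is made up for illustration, and the series count, bytes per sample, and price per GB are placeholder assumptions, not measured values or any provider's actual pricing. Plug in what you observe on your own Prometheus and your provider's inter-region rate.

```python
# Back-of-envelope estimate of cross-region remote_write transfer cost.
# Every input is an assumption; replace the values with your own figures.

def monthly_remote_write_cost(
    active_series: int,
    scrape_interval_s: float,
    bytes_per_sample: float,
    price_per_gb: float,
    days: int = 30,
) -> tuple[float, float]:
    """Return (gigabytes_per_month, cost_per_month)."""
    # Each active series produces one sample per scrape interval.
    samples_per_second = active_series / scrape_interval_s
    # Bytes sent over the wire for the whole period (86,400 seconds per day).
    total_bytes = samples_per_second * bytes_per_sample * 86_400 * days
    gigabytes = total_bytes / 1e9
    return gigabytes, gigabytes * price_per_gb


if __name__ == "__main__":
    # Hypothetical cluster: all four numbers are placeholders.
    gb, cost = monthly_remote_write_cost(
        active_series=500_000,
        scrape_interval_s=30,
        bytes_per_sample=2.0,   # assumed average on-the-wire size after compression
        price_per_gb=0.02,      # assumed inter-region transfer price
    )
    print(f"{gb:.1f} GB/month, roughly {cost:.2f} per month in transfer")
```

The estimate only covers sample payloads, so retries after network failures and protocol overhead would push the real figure higher. With sidecars, the comparable cost is the querier pulling data on demand, which depends on how often you query rather than how much you scrape.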

DataDrivenDev -

I'm self-hosted and used to remote write, but what you said definitely makes sense. Thanks for pointing that out!

Answered By CloudGuru97 On

I chose Option 2 but went with Mimir instead since it has a Helm chart available.

TechieTom -

I looked into Mimir as well, but something about it feels off to me. I'm just more comfortable with Prometheus. Appreciate your insight!
