
Best Strategies for Observability with Metrics and Distributed Tracing

September 16, 2025

Asked By TechWhiz42 On September 16, 2025

I'm on the lookout for a solid observability solution that can handle metrics and distributed tracing. Currently, we're shipping logs using Grafana Agent in our cluster, but I need a way to achieve full end-to-end tracing from service to service without making any code changes to our existing applications. I've come across Odigos, but I'm interested in exploring other options as well.

Here are my main concerns:
1. Can I really get reliable tracing between services in a production environment without altering the application code?
2. What tools or tech stacks have you seen companies using effectively for this?
3. How do larger organizations typically manage observability in these scenarios?

I'd appreciate any tool recommendations or real-life examples of how others have tackled this problem!

5 Answers

Answered By CodeNinja23 On September 19, 2025

You might want to check out OpenTelemetry along with Grafana Tempo. It's a solid combo even without code changes, and if your developers can add some OpenTelemetry instrumentation later, the traces get a lot richer.

Answered By MetricMaster99 On September 18, 2025

OpenTelemetry paired with Datadog or Grafana is also fairly common among big companies.

Answered By NatashaNetOps On September 18, 2025

For larger setups, I've seen many turn to a service mesh like Istio or Linkerd, which offers great tracing and metrics without needing to modify application code. Typically, Prometheus is the go-to for metrics, while Grafana is used for dashboards. Some prefer managed solutions like Datadog or New Relic to avoid the operational overhead, but keep in mind that their pricing can be unpredictable. More budget-friendly APM tools I've worked with include CubeAPM, Coralogix, and SigNoz.

One effective stack I set up used the OpenTelemetry Operator, Tempo, and Loki in Grafana Cloud, which gave us traces, logs, and metrics all integrated. Devs only need to make small changes, and only if you want extra detail.
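
One caveat on the mesh route: the sidecar proxies emit spans on their own, but to stitch them into a single end-to-end trace the app generally still has to copy the incoming trace headers onto its outgoing requests. Here's a rough sketch of that forwarding step, assuming the W3C `traceparent` header format. The function names `parse_traceparent` and `child_traceparent` are made up for illustration and don't come from any library.

```python
import re
import secrets

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
# This sketch only accepts the four-field form.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return (trace_id, parent_id, flags), or None if the header is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero ids are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, flags


def child_traceparent(incoming):
    """Build the traceparent value to attach to an outgoing request.

    Keeps the caller's trace id so both hops land in the same trace,
    and generates a fresh span id for this hop.
    """
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        # Assumption: with no usable context we start a new, sampled trace.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(child_traceparent(incoming))
```

In practice an auto-instrumentation agent or an HTTP client middleware does this for you, which is why meshes are often paired with one of those.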

Answered By DataGuru85 On September 17, 2025

Look into the OpenTelemetry Operator; it helps with auto-instrumentation by injecting libraries. I’ve used it with Grafana Cloud and Grafana Alloy, and it provided a lot of insights without putting heavy demands on the devs. They’ve started to see its value and are working on filling in any gaps now.
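
For reference, the operator's injection is driven by an `Instrumentation` resource plus a per-workload annotation. Below is a rough sketch that builds both as plain dicts and prints them as JSON, which `kubectl` also accepts. The resource name `demo-instrumentation`, the collector endpoint, and the 25% sampling ratio are placeholder assumptions, and the field names are worth checking against the operator version you run.

```python
import json


def instrumentation_manifest(name, otlp_endpoint, sample_ratio):
    """Build an Instrumentation custom resource for the OpenTelemetry Operator."""
    return {
        "apiVersion": "opentelemetry.io/v1alpha1",
        "kind": "Instrumentation",
        "metadata": {"name": name},
        "spec": {
            # Where the injected agents send their telemetry (placeholder URL).
            "exporter": {"endpoint": otlp_endpoint},
            # Header formats used to carry trace context between services.
            "propagators": ["tracecontext", "baggage"],
            # Sample a fraction of new traces, honoring the parent's decision.
            "sampler": {
                "type": "parentbased_traceidratio",
                "argument": str(sample_ratio),
            },
        },
    }


def inject_annotation(language):
    """Pod-template annotation that opts a workload in to injection."""
    return {f"instrumentation.opentelemetry.io/inject-{language}": "true"}


if __name__ == "__main__":
    manifest = instrumentation_manifest(
        "demo-instrumentation", "http://otel-collector:4317", 0.25
    )
    print(json.dumps(manifest, indent=2))

    # Patch body for a Deployment's pod template (Java chosen as an example).
    patch = {
        "spec": {"template": {"metadata": {"annotations": inject_annotation("java")}}}
    }
    print(json.dumps(patch, indent=2))
```

The annotation goes on the pod template rather than on the Deployment itself, so existing pods need a restart before the libraries get injected.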

LazyCoder88 - September 19, 2025

Fingers crossed! I'd love a solution that allows for deployment without having to set up every app individually.

AppDevHero - September 19, 2025

Totally agree! The Application Observability product in Grafana is fantastic once you establish basic instrumentation.

Answered By CloudEvangelist77 On September 16, 2025

In my experience, large organizations often use Dynatrace. It’s effective but can be pricey. For Kubernetes applications, it runs an agent on every node that collects detailed metrics and traces the calls between apps.
