How to Transition from Datadog to Grafana on AWS?

Asked By CuriousCoder42 On

I've been tasked with creating a proof of concept to replace Datadog with the Grafana/Prometheus/Loki/Alloy stack, possibly adding more of the Grafana suite, such as Tempo, later. The setup will run on AWS using EKS, and I need to monitor over 30 accounts with a focus on serverless services. AWS has made it easier to share logs and metrics across accounts, but the open-source options (like OTel Collectors) still don't fully support this, more than a year after the feature was released. There are some PRs that add the functionality, but none have been merged.

Right now, I'm scraping logs by creating IAM roles in each account and configuring an OTel Collector on a per-account basis. However, the OTel Collectors can't automatically discover shared cross-account metrics and logs, so I'm using Kinesis streams to send logs to the Firehose receiver. I'm struggling to tag logs properly, and setting up metric namespaces manually for each account feels overwhelming.

Has anyone successfully transitioned from Datadog using this stack? I couldn't find any discussions online about this process, and getting even the basics working seems daunting. Is it just too complex, or do companies shy away from it for a reason?

4 Answers

Answered By GrafanaNinja On

Check out the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) as a solid alternative. There's a Helm chart for it here: https://github.com/grafana/helm-charts/tree/main/charts/lgtm-distributed. It uses Prometheus for scraping and supports OpenTelemetry. You can run a DaemonSet such as Vector or Fluent Bit to route all container logs to Loki. Just keep in mind that OTel is more focused on metrics and tracing.

Answered By CloudGuru91 On

Transitioning away from Datadog isn't a simple task, especially with so many AWS accounts to manage. The Grafana stack is certainly capable, but it's not a plug-and-play solution. You'll have to configure each account manually, since cross-account support in OTel isn't fully available yet. Tagging and enriching logs is also tedious, which adds to the workload.
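One way to make the per-account setup less painful is to generate it from a single account list instead of writing it by hand. The snippet below is only a minimal sketch of that idea: the role name `observability-read`, the label keys, and the default namespaces are assumptions for illustration, not anything the OTel Collector or Alloy requires.

```python
from dataclasses import dataclass, field

# Hypothetical role name: substitute whatever read-only role you deploy to each account.
ROLE_NAME = "observability-read"


@dataclass
class AccountTarget:
    """Everything a per-account collector config needs, derived from two inputs."""

    account_id: str
    alias: str
    # Example CloudWatch namespaces for serverless workloads (an assumption; adjust per account).
    namespaces: list = field(default_factory=lambda: ["AWS/Lambda", "AWS/SQS"])

    @property
    def role_arn(self) -> str:
        # ARN of the role the collector would assume in this account.
        return f"arn:aws:iam::{self.account_id}:role/{ROLE_NAME}"

    @property
    def labels(self) -> dict:
        # Labels to attach to every log line and metric coming from this account.
        return {"aws_account_id": self.account_id, "aws_account": self.alias}


def build_targets(accounts: dict) -> list:
    """Turn {account_id: alias} into a list of targets, sorted by account ID."""
    return [AccountTarget(account_id=a, alias=n) for a, n in sorted(accounts.items())]
```

You would then feed each target into whatever renders your collector config (a Helm values file, a Terraform variable, and so on), so that adding account number 31 is a one-line change.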

Answered By TechExplorer On

You might find this article helpful: https://digitalis.io/post/beyond-datadog-how-to-create-a-scalable-cost-effective-monitoring-solution. It provides insights on building a more scalable and cost-effective monitoring solution that could help with your transition.

Answered By TerraformTinkerer On

We recently tackled the same issue by using Terraform to set up CloudWatch Logs ingestion into Alloy through a Firehose, with a Lambda linking it to the internal ALB for Loki. It’s been pretty challenging for us since we have over 100 AWS accounts and around 60 Kubernetes clusters! Just a heads up, the AWS network costs and ALB charges really add up due to metrics cardinality and log noise. Overall, though, it's still more cost-effective than the per-cluster solution we used before.
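For anyone wondering what the tagging step in a pipeline like this might look like, here is a rough sketch, not the actual Lambda described above. It assumes the record is a CloudWatch Logs subscription payload as delivered through Firehose (base64-encoded, gzip-compressed JSON) and reshapes it into the JSON body that Loki's push endpoint (`/loki/api/v1/push`) accepts. The function name and label keys are made up for illustration.

```python
import base64
import gzip
import json
from typing import Optional


def firehose_record_to_loki(data_b64: str) -> Optional[dict]:
    """Convert one Firehose record holding CloudWatch Logs data into a Loki push body."""
    payload = json.loads(gzip.decompress(base64.b64decode(data_b64)))

    # CloudWatch also sends control messages that carry no log events; skip them.
    if payload.get("messageType") != "DATA_MESSAGE":
        return None

    # "owner" is the source AWS account ID, so the account tag comes for free.
    # The log stream is deliberately left out of the labels to keep cardinality down.
    labels = {
        "aws_account_id": payload["owner"],
        "log_group": payload["logGroup"],
    }

    # CloudWatch timestamps are milliseconds; Loki expects nanoseconds as strings.
    values = [
        [str(event["timestamp"] * 1_000_000), event["message"]]
        for event in payload["logEvents"]
    ]
    return {"streams": [{"stream": labels, "values": values}]}
```

Keeping the label set small matters here because each unique label combination becomes its own stream in Loki, which ties in with the cardinality costs mentioned above.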
