I'm trying to optimize a system that works with metric data. Currently, an AWS Lambda function triggered by an EventBridge scheduler makes 50 API calls per minute and sends the data to CloudWatch. The open-source YACE (Yet Another CloudWatch Exporter) then scrapes that data every 5 minutes and pushes it to Prometheus for Grafana dashboards. The problem is that this introduces a 5-minute delay in the data displayed on Grafana. I'm looking for insights on how to remove Lambda, CloudWatch, and YACE from this flow to make it more efficient and streamlined while still meeting my data needs. Any suggestions?
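For context, the relevant part of the Lambda looks roughly like this (simplified; fetch_latest_metrics is a stand-in for the rate-limited SDK I'm polling, and the namespace and metric names are placeholders):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def fetch_latest_metrics():
    # Stand-in for the rate-limited upstream SDK call
    return [{"name": "queue_depth", "value": 42}]

def handler(event, context):
    samples = fetch_latest_metrics()
    # Publish each sample as a custom CloudWatch metric
    cloudwatch.put_metric_data(
        Namespace="MyApp/Metrics",
        MetricData=[
            {"MetricName": s["name"], "Value": s["value"], "Unit": "Count"}
            for s in samples
        ],
    )
```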
3 Answers
Consider using AWS Step Functions to orchestrate your data flow. It can replace some of the complexity of Lambda and its triggers while still letting you control how your metric data is routed.
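Here's a rough sketch of what that could look like with boto3; the state machine still invokes a worker task, and the function ARN, state machine name, and IAM role are placeholders you'd swap for your own:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Minimal Amazon States Language definition: a single task that fetches
# and forwards the metrics, with a retry on failure.
definition = {
    "StartAt": "FetchAndForwardMetrics",
    "States": {
        "FetchAndForwardMetrics": {
            "Type": "Task",
            # Placeholder ARN for whatever does the actual fetch/forward work
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:fetch-metrics",
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
            "End": True,
        }
    },
}

sfn.create_state_machine(
    name="metrics-pipeline",  # placeholder name
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/sfn-metrics-role",  # placeholder role
)
```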
It sounds like your setup has become overly complicated. Instead of going through CloudWatch and YACE, consider a more direct approach. Start from what your final dashboards need to show and what data that actually requires. If you're using Lambda only to fetch data for CloudWatch, you might be able to skip both and push your metric data directly to Prometheus. It might be worth taking a step back to reevaluate your needs and simplify the flow.
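One way to do that is to push from your poller to a Prometheus Pushgateway and let Prometheus scrape the gateway. A minimal sketch with the prometheus_client library, assuming a Pushgateway is reachable at a placeholder address and using a made-up metric name:

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()

# Hypothetical metric; use whatever names your dashboards actually expect
g = Gauge("myapp_queue_depth", "Current queue depth", registry=registry)
g.set(42)

# Assumes a Pushgateway at this address that Prometheus is configured to scrape
push_to_gateway("pushgateway.internal:9091", job="metrics_poller", registry=registry)
```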
Given the data aggregation you're dealing with, have you looked into Amazon Kinesis Data Firehose? It can ingest your metric data and deliver it to a configured destination without Lambda doing the heavy lifting. You might find it simplifies your workload, though you should consider how much data you actually need to query at once.
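A rough sketch of the producer side with boto3, assuming a delivery stream named "metrics-stream" already exists and is configured with whatever downstream destination you want:

```python
import json
import boto3

firehose = boto3.client("firehose")

# Newline-delimited JSON records are a common format for Firehose destinations
records = [
    {"Data": (json.dumps({"name": "queue_depth", "value": 42}) + "\n").encode()}
]

firehose.put_record_batch(
    DeliveryStreamName="metrics-stream",  # placeholder stream name
    Records=records,
)
```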
I haven't fully explored Kinesis Firehose yet. It sounds promising, but I'm worried about the constraints since I'm pulling data from an SDK that limits my calls. I'll take a closer look!
You're right! The initial setup was primarily for Grafana. Since I've got the data in Prometheus, keeping it in multiple places just complicates things. I'm keen on removing the redundancy.