I'm looking for advice on setting up an in-house solution for monitoring network latency and infrastructure health across multiple AWS accounts and regions. I specifically want to avoid AWS-native tools such as CloudWatch, Amazon Managed Service for Prometheus, or X-Ray because of cost and flexibility concerns. I'd like to use Lambda as the automation tool for running periodic tests. The goal is to scale this across a large multi-account, multi-region AWS deployment, covering use cases such as VPN latency, Transit Gateway attachments, and VPC connectivity. Has anyone built or come across an observability pattern that spans regions and accounts without relying on AWS-native tools or dashboards?
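Here is a minimal sketch of the kind of periodic Lambda test described above. It assumes the function is attached to a VPC that has routes to the targets (for example, hosts reached over a VPN or through a Transit Gateway attachment). It uses TCP handshake time as the latency measure because that needs no special socket privileges. The event shape (`targets` with `host`, `port` and an optional `timeout`) is hypothetical, and the function would be invoked on a schedule, for instance by an EventBridge rule.

```python
import socket
import time
from typing import Any


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> dict[str, Any]:
    """Time a TCP handshake to host:port; return latency in ms or the error name."""
    target = f"{host}:{port}"
    start = time.perf_counter()
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
        return {"target": target, "ok": True, "latency_ms": round(elapsed_ms, 2)}
    except OSError as exc:
        # Covers refused connections, timeouts and DNS failures
        return {"target": target, "ok": False, "error": type(exc).__name__}


def handler(event: dict, context: Any) -> dict:
    """Lambda entry point.

    Hypothetical event shape:
    {"targets": [{"host": "10.1.2.3", "port": 443, "timeout": 2.0}, ...]}
    """
    results = [
        probe_tcp(t["host"], int(t["port"]), float(t.get("timeout", 2.0)))
        for t in event.get("targets", [])
    ]
    return {"results": results}
```

Each account and region would run its own copy of the function. Returning plain dicts keeps the choice of where results are stored separate from the probe itself.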
Start by instrumenting only the hosts you control, and focus on service-level measurements rather than getting bogged down in network-level metrics for hosts you can't manage. Make monitoring your service quotas and using the AWS Health APIs the priority. Also make sure your IaC has policies that catch mistakes such as pushing bad routes. We only use EC2: we build with Packer, use Cloud Custodian for scheduling, and collect metrics with Prometheus.
What do you think of using Lambda for this?