I've been running several small ECS tasks, and monitoring them has been a challenge. Errors are easy to track with CloudWatch alarms, but I'm struggling to detect when services fail to start. I've enabled Container Insights to check RunningTaskCount, but because I run many small Fargate tasks, the monitoring cost is about as high as my CPU cost. I want to be notified when my tasks aren't running correctly without incurring high costs. Is there a better way to set up alarms for ECS tasks that fail to start?
5 Answers
You can also rely on the sample count of service-level metrics such as CPUUtilization. Each running task contributes a datapoint, so the SampleCount statistic works as a cheap proxy for the running task count without needing Container Insights.
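A minimal sketch of that idea, assuming (as the answer suggests) that each running task contributes roughly one CPUUtilization datapoint per minute to the service-level metric in the `AWS/ECS` namespace. The function name `build_task_count_alarm` and the alarm naming scheme are hypothetical; the returned dict holds keyword arguments for CloudWatch's `PutMetricAlarm` API. Missing data is treated as breaching, since a service with zero running tasks reports no datapoints at all.

```python
def build_task_count_alarm(cluster, service, min_tasks, topic_arn):
    """Build kwargs for CloudWatch put_metric_alarm (hypothetical helper).

    Fires when fewer than `min_tasks` tasks report CPUUtilization for the
    service, assuming ~one datapoint per running task per period.
    """
    return {
        "AlarmName": f"{cluster}-{service}-running-tasks-low",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        # SampleCount counts datapoints, used here as a proxy for task count.
        "Statistic": "SampleCount",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": float(min_tasks),
        "ComparisonOperator": "LessThanThreshold",
        # No datapoints at all means no task is reporting: treat as an alarm.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }
```

With boto3 available, the result would be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. It is worth checking the metric's SampleCount in the console against a known task count before relying on the threshold.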
For our setup, we created an additional cluster and service with just one task. We deploy changes there first, which helps us catch startup issues before they affect our main production cluster.
I set up an EventBridge rule that listens for ECS task state changes and sends those events to a Lambda function. The Lambda checks the containers' exit codes, and if any are non-zero, I get an email notification. It's been a reliable, low-effort way to keep tabs on task status.
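A rough sketch of that setup, assuming the Lambda publishes to an SNS topic with an email subscription. The event pattern matches EventBridge's `ECS Task State Change` events for tasks that reach `STOPPED`. The environment variable `ALERT_TOPIC_ARN` and the helper `failed_containers` are hypothetical names. A missing `exitCode` is counted as a failure because containers that never started (for example after an image pull error) usually have none.

```python
import json
import os

# EventBridge rule pattern: ECS task state changes where the task has stopped.
EVENT_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"lastStatus": ["STOPPED"]},
}


def failed_containers(event):
    """Return (name, exit_code) pairs for containers that did not exit cleanly."""
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return []
    failures = []
    for container in detail.get("containers", []):
        code = container.get("exitCode")  # None if the container never ran
        if code != 0:
            failures.append((container.get("name", "?"), code))
    return failures


def handler(event, context):
    """Lambda entry point: email (via SNS) only when some container failed."""
    failures = failed_containers(event)
    if not failures:
        return {"notified": False}
    import boto3  # bundled with the AWS Lambda Python runtime

    detail = event["detail"]
    message = json.dumps(
        {
            "taskArn": detail.get("taskArn"),
            "stoppedReason": detail.get("stoppedReason"),
            "failures": failures,
        },
        indent=2,
    )
    boto3.client("sns").publish(
        TopicArn=os.environ["ALERT_TOPIC_ARN"],  # hypothetical env var
        Subject="ECS task stopped with a non-zero exit code",
        Message=message,
    )
    return {"notified": True}
```

The pattern can be registered with `put_rule(EventPattern=json.dumps(EVENT_PATTERN))`. Expect some noise from tasks stopped during deployments or scale-in, whose containers may exit non-zero after receiving SIGTERM.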
Make sure your health checks are working correctly. If autoscaling is set up properly, it should handle replacing failed tasks. Just decide in advance how you want to respond when an alarm triggers.
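As one illustration of a container-level health check, here is a sketch of a container definition fragment as it might be passed in `containerDefinitions` to ECS `register_task_definition`. The helper name is hypothetical, and the check command assumes the image includes `curl` and serves HTTP on the given port; adjust both for your service.

```python
def container_with_health_check(name, image, port):
    """Hypothetical helper: ECS container definition with a health check."""
    return {
        "name": name,
        "image": image,
        "essential": True,
        "portMappings": [{"containerPort": port, "protocol": "tcp"}],
        "healthCheck": {
            # Assumes curl exists in the image and the app answers on `port`.
            "command": ["CMD-SHELL", f"curl -f http://localhost:{port}/ || exit 1"],
            "interval": 30,     # seconds between checks
            "timeout": 5,       # seconds before a single check counts as failed
            "retries": 3,       # consecutive failures before UNHEALTHY
            "startPeriod": 60,  # grace period (seconds) for slow startups
        },
    }
```

A generous `startPeriod` avoids marking slow-starting tasks unhealthy, while still letting tasks that never become healthy be stopped and replaced.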
If your tasks are registered with a target group, check the HealthyHostCount metric. It gives a good indication of how many of your tasks are actually running and passing health checks.
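A minimal sketch of such an alarm, assuming an Application Load Balancer (metrics in the `AWS/ApplicationELB` namespace). The dimension values are the ARN suffixes of the target group and load balancer (e.g. `targetgroup/<name>/<id>` and `app/<name>/<id>`); the helper name and alarm naming scheme are hypothetical. The dict holds keyword arguments for CloudWatch's `PutMetricAlarm`.

```python
def build_healthy_host_alarm(name, target_group_suffix, lb_suffix, min_healthy, topic_arn):
    """Hypothetical helper: kwargs for put_metric_alarm on ALB HealthyHostCount."""
    return {
        "AlarmName": f"{name}-healthy-hosts-low",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HealthyHostCount",
        "Dimensions": [
            {"Name": "TargetGroup", "Value": target_group_suffix},
            {"Name": "LoadBalancer", "Value": lb_suffix},
        ],
        # Minimum over the period catches brief drops in healthy targets.
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": float(min_healthy),
        "ComparisonOperator": "LessThanThreshold",
        # Treat gaps as failures rather than silently staying in OK.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }
```

This only covers services behind a load balancer; for a Network Load Balancer the namespace would be `AWS/NetworkELB` instead.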