I'm currently working with SQS in production and honestly, the dead letter queue (DLQ) management is a total mess. I've got a CloudWatch alarm set up, but a lot of my team doesn't seem to trust it, and we've faced issues with messages stacking up unnoticed. I've talked to a few people recently, and it seems like no two teams handle this the same way. Some are using Lambda functions to monitor and send alerts, while others just check them manually (definitely not ideal). A few have integrated it with Datadog but then complain about the expenses. I'm just wondering, what solutions are you using? Is there a practical approach I'm missing, or is everyone just dealing with their own makeshift fixes?
4 Answers
Have you thought about setting a sensible message retention period on the DLQ (SQS's version of an expiration time)? Messages older than that are deleted automatically, so the DLQ manages itself to some extent and unprocessed messages can't pile up indefinitely.
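For anyone who wants to try this, here is a minimal sketch of setting the retention period with boto3, assuming Python and a queue URL you already have. The helper names (`retention_attributes`, `apply_retention`) are made up for illustration; `MessageRetentionPeriod` is the real SQS attribute, given in seconds and limited to between 60 seconds and 14 days.

```python
# Sketch only: helper names are hypothetical, the SQS attribute is real.
SQS_MIN_RETENTION_SECONDS = 60                 # 1 minute
SQS_MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # 14 days = 1209600 seconds


def retention_attributes(days: float) -> dict:
    """Build the Attributes dict for SetQueueAttributes from a number of days."""
    seconds = int(days * 24 * 60 * 60)
    if not SQS_MIN_RETENTION_SECONDS <= seconds <= SQS_MAX_RETENTION_SECONDS:
        raise ValueError("SQS retention must be between 60 seconds and 14 days")
    # SQS expects attribute values as strings.
    return {"MessageRetentionPeriod": str(seconds)}


def apply_retention(queue_url: str, days: float) -> None:
    """Apply the retention period to a queue (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    sqs = boto3.client("sqs")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=retention_attributes(days),
    )
```

One thing to watch on standard queues: a message's age is counted from when it was first enqueued on the source queue, not from when it landed in the DLQ, so the DLQ's retention generally needs to be longer than the source queue's or messages may expire sooner than you expect.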
The distinction between ApproximateNumber and NumberOfMessagesSent is crucial! We messed that up as well. I hadn't considered the retention period mismatch either. I really wish these things were pre-configured out of the box!
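In case it helps others, this is roughly what an alarm on the right metric could look like with boto3. It assumes the "ApproximateNumber" metric meant here is `ApproximateNumberOfMessagesVisible`, and that you already have an SNS topic for notifications; the function names, alarm name, and threshold are illustrative choices, not a recommended standard. The reason `NumberOfMessagesSent` is misleading on a DLQ is that messages moved there automatically by a redrive policy are not counted by it.

```python
# Sketch only: function names and alarm naming scheme are hypothetical.
def build_dlq_alarm(queue_name: str, sns_topic_arn: str) -> dict:
    """Build PutMetricAlarm parameters that fire when the DLQ is non-empty."""
    return {
        "AlarmName": f"{queue_name}-not-empty",
        "Namespace": "AWS/SQS",
        # Counts messages sitting in the queue, however they got there.
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,            # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # An idle queue may stop reporting; don't alarm on missing data.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_dlq_alarm(queue_name: str, sns_topic_arn: str) -> None:
    """Create or update the alarm in CloudWatch (needs AWS credentials)."""
    import boto3  # imported here so build_dlq_alarm works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_dlq_alarm(queue_name, sns_topic_arn))
```

Alarming on "greater than 0" is the strictest option; if your DLQ normally holds a few messages, a higher threshold or more evaluation periods may be less noisy.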
Definitely feels like there should be a better solution than just shelling out a ton of cash for Datadog, especially for smaller teams like ours.
We use Datadog too, but since we also need it for security information and event management (SIEM), we only collect logs once and split the cost with our security team. Datadog can be pricey, but you can manage the costs. Here are a few tips:
1) Only send what you really need to minimize incoming data, and drop unnecessary stuff at the index level to save on indexing costs.
2) Keep logs for a shorter duration if they're just for alerts. The default retention is 30 days, which can get expensive, so route alert logs to a separate index with a shorter retention; keeping them for only 3 days can save money.
3) Consider a one-year contract for a lower rate if your usage is steady.
