I manage a set of workers that pull data from a queue and log it to CloudWatch, especially during busy periods related to sporting events. However, with all events crammed into a single log group, it's a nightmare to audit specific event logs. When I try to find information about a particular event, the process is painfully slow, and downloading large data sets (around 15-20 GB per hour for a three-hour event) takes forever. I'd like to know if there's a way to create separate log groups for each sporting event, so I could avoid unrelated log lines and download only the necessary ones for my audit. Am I missing a critical setting, or is there a better approach?
5 Answers
You might want to check out CloudWatch Logs Insights! It lets you filter the log events in a log group by fields such as an event identifier, which can speed things up significantly. Another alternative is using Athena for querying, though that might get pricey and could be more than you need.
Athena is good, but just a heads up, it can time out if the queries are too complex. Finding a solid partition strategy is crucial!
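To make the partitioning point concrete, here is a minimal sketch of one possible layout, assuming logs are exported to S3 under Hive-style `key=value` prefixes so Athena can prune by event and hour. The `partition_prefix` helper, the `logs` base prefix, and the `match-1234` ID are all made up for illustration.

```python
from datetime import datetime, timezone


def partition_prefix(event_id: str, ts: datetime, base: str = "logs") -> str:
    """Build a Hive-style S3 key prefix partitioned by event, date, and hour.

    `ts` should be timezone-aware; it is normalized to UTC so that every
    worker writes to the same partition for the same instant.
    """
    ts = ts.astimezone(timezone.utc)
    return f"{base}/event_id={event_id}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"


# Hypothetical event ID and timestamp, for illustration only.
prefix = partition_prefix(
    "match-1234", datetime(2024, 6, 1, 18, 30, tzinfo=timezone.utc)
)
print(prefix)  # logs/event_id=match-1234/dt=2024-06-01/hour=18/
```

With a layout like this, a query that filters on `event_id` and `dt` only has to scan the objects under the matching prefixes rather than the whole bucket.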
Honestly, it sounds like you're doing the right things, but CloudWatch is not very efficient for large-scale audits. Ensure your workers log in structured JSON and include the sporting event ID on each line. This will make CloudWatch Logs Insights much more effective for filtering. Long-term, consider shipping your logs to S3 and querying with Athena. It'll help you focus on just the data you need without sifting through tons of logs.
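A minimal sketch of what "structured JSON with the event ID on each line" could look like, using only Python's standard `logging` module. The `JsonFormatter` class, the `make_logger` helper, and the `event_id` field name are assumptions for illustration, not something taken from your workers.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Set by the LoggerAdapter below; None if the adapter is not used.
            "event_id": getattr(record, "event_id", None),
        }
        return json.dumps(payload)


def make_logger(event_id: str, stream=sys.stdout) -> logging.LoggerAdapter:
    """Return a logger that stamps every line with the given event ID."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(f"worker.{event_id}")
    logger.handlers = [handler]  # replace handlers so repeated calls don't duplicate lines
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logging.LoggerAdapter(logger, {"event_id": event_id})


# Usage with a hypothetical event ID, writing to an in-memory buffer.
buf = io.StringIO()
log = make_logger("match-1234", buf)
log.info("score updated")
line = json.loads(buf.getvalue())
```

Because the ID is a top-level JSON key, Logs Insights can filter on it as a field instead of doing a substring match on the whole message.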
Are you using structured logs? If your logs are in JSON format, it can simplify querying. CloudWatch Logs Insights is helpful, but the query syntax can be tricky to remember.
Yeah, my logs are JSON. The queries work but they're slow and return way too much data. I might just end up writing a custom script to handle anomalies instead of browsing through all that info.
CloudWatch Logs Insights can help if your log messages include event identifiers. You can then filter on the identifier and restrict the query to the event's time window to pinpoint exactly what you need.
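As a rough sketch of that idea, the snippet below builds the parameters for a Logs Insights query scoped to one event ID and one time window. It assumes the JSON logs carry a top-level `event_id` field; the `build_insights_query` helper, the log group name, and the event ID are hypothetical. The resulting dict is shaped for boto3's `start_query` call on the CloudWatch Logs client, which the sketch deliberately does not invoke.

```python
from datetime import datetime, timedelta, timezone


def build_insights_query(log_group: str, event_id: str,
                         start: datetime, end: datetime,
                         limit: int = 1000) -> dict:
    """Build keyword arguments for a Logs Insights query on one event.

    Note: event_id is interpolated as-is, so it must not contain
    double quotes.
    """
    query = (
        "fields @timestamp, @message\n"
        f'| filter event_id = "{event_id}"\n'
        "| sort @timestamp asc"
    )
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(end.timestamp()),      # epoch seconds
        "queryString": query,
        "limit": limit,
    }


# Hypothetical log group and three-hour event window.
kickoff = datetime(2024, 6, 1, 18, 0, tzinfo=timezone.utc)
params = build_insights_query(
    "/workers/events", "match-1234", kickoff, kickoff + timedelta(hours=3)
)
```

From there, something like `boto3.client("logs").start_query(**params)` followed by polling `get_query_results` with the returned query ID would run it. Keeping the time window tight matters as much as the filter, since the window bounds how much data the query has to scan.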
We're using Insights for general reviews, which is helpful, but when it comes to detailed audits, it often feels like a lost cause.
We handle terabytes of logs daily with Insights. It's all about structuring your data and figuring out how to query effectively to get the needed results.

I'll look into Athena. It sounds like we might be over-relying on CloudWatch for something it wasn't really designed for.