I'm looking for insights on managing large volumes of AWS logs, particularly CloudTrail, VPC Flow Logs, and WAF logs. We have multiple AWS WAFs generating a massive amount of logs that quickly exceed our SIEM ingestion limits, sometimes within just an hour. I'm considering a few strategies: using Athena to query logs stored in S3 (though we may need a Glue ETL job to convert them to Parquet to keep it cost-effective), using Amazon Security Lake for AWS logs (though it doesn't seem to work well with non-AWS sources), or using a tool like Cribl to reduce log size before sending logs to the SIEM. As a non-profit, our budget doesn't allow for simply increasing the SIEM limits, so I'm eager to learn how others are tackling these challenges!
4 Answers
Have you looked into using Firehose to aggregate the logs and reduce their size before they reach the SIEM? It could cut your workload significantly. You could also add a feature flag to toggle aggregation on and off, since not every log is essential outside of specific incidents.
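To make the aggregation idea concrete, here is a minimal Python sketch of the reduction step on its own, not tied to Firehose. It collapses WAF records into counts per action, rule, and client IP, and an `aggregate` argument plays the role of the feature flag. The function name and the summary shape are my own illustration, and I'm assuming records follow the usual WAF log JSON layout (`action`, `terminatingRuleId`, `httpRequest.clientIp`).

```python
from collections import Counter


def reduce_waf_records(records, aggregate=True):
    """Collapse WAF log records into per-(action, rule, client IP) counts.

    With aggregate=False the records pass through unchanged, which is the
    "feature flag off" path for incidents where full detail is needed.
    """
    if not aggregate:
        return list(records)
    counts = Counter(
        (r["action"], r["terminatingRuleId"], r["httpRequest"]["clientIp"])
        for r in records
    )
    return [
        {"action": action, "terminatingRuleId": rule, "clientIp": ip, "count": n}
        for (action, rule, ip), n in counts.items()
    ]


# Hypothetical sample records, trimmed to the fields used above.
sample = [
    {"action": "BLOCK", "terminatingRuleId": "rate-limit",
     "httpRequest": {"clientIp": "203.0.113.7"}},
    {"action": "BLOCK", "terminatingRuleId": "rate-limit",
     "httpRequest": {"clientIp": "203.0.113.7"}},
    {"action": "ALLOW", "terminatingRuleId": "Default_Action",
     "httpRequest": {"clientIp": "198.51.100.2"}},
]

summary = reduce_waf_records(sample)
passthrough = reduce_waf_records(sample, aggregate=False)
```

Here three raw records become two summary rows, while the passthrough call returns all three untouched. In practice you would probably also bucket by time window so the summaries stay useful for investigation.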
Athena's $5 per TB scanned isn't too bad if you use proper partitioning and date predicates, since you only pay for the data each query actually reads. CloudTrail and VPC Flow logs are already somewhat structured, which can help reduce costs. If your querying is occasional, Athena should work well for you. For more consistent analysis, consider Redshift Spectrum, but weigh the costs before deciding.
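A rough back-of-the-envelope sketch of why partitioning matters at that price. The 200 GB/day volume and 90-day retention are made-up numbers for illustration, and I'm treating a TB as 10^12 bytes, which is an assumption; check the current Athena pricing page for the exact definition and any per-query minimum.

```python
TB = 10**12  # assumption: decimal terabyte


def athena_scan_cost(bytes_scanned, price_per_tb=5.0):
    """Estimate the cost in USD of a query that scans bytes_scanned bytes."""
    return bytes_scanned / TB * price_per_tb


daily_bytes = 200 * 10**9  # hypothetical: 200 GB of WAF logs per day
retention_days = 90        # hypothetical retention window

# Without a date predicate, the query reads the whole retention window.
full_scan = athena_scan_cost(daily_bytes * retention_days)

# With day-level partitions and a predicate on one day, it reads one day.
one_day = athena_scan_cost(daily_bytes)

print(f"full scan: ${full_scan:.2f}, single day: ${one_day:.2f}")
```

With these numbers the unpartitioned query costs about $90 and the single-day query about $1. Converting to Parquet would shrink both further, since columnar storage lets Athena skip columns the query doesn't touch.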
I recently heard about a new consolidated logging feature that could fit your needs perfectly. It's designed for high-volume logs and might help streamline your process. Check out the details on AWS's website; it was just announced a couple of days ago!
What I do is dump all logs into S3 after a few weeks of ingestion. Instead of sending everything to the SIEM, I focus on local analysis and pull only the relevant findings (like those from GuardDuty) into the SIEM, with additional context when necessary. Going back in time is easy with S3 archives, especially if you store logs in Parquet format. Keep in mind that traditional SIEMs may not be well suited to modern cloud applications that generate tons of data, so it's worth reconsidering what you actually need the SIEM for.
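Going back in time stays cheap if the archive is laid out by date. A minimal sketch of a Hive-style key layout (`year=/month=/day=`) that Athena and Glue can treat as partitions; the `partitioned_key` helper, the `waf` prefix, and the file name are hypothetical, and this is just one common layout rather than the only option.

```python
from datetime import datetime, timezone


def partitioned_key(source, event_time, filename):
    """Build an S3 object key with Hive-style date partitions (in UTC)."""
    t = event_time.astimezone(timezone.utc)
    return f"{source}/year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"


key = partitioned_key(
    "waf",
    datetime(2024, 3, 5, 23, 30, tzinfo=timezone.utc),
    "part-0000.parquet",
)
print(key)
```

Pass timezone-aware datetimes here; a naive datetime would be interpreted as local time by `astimezone`, which can put late-evening events in the wrong day's partition.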
