I'm working on centralizing logs from our Application Load Balancers (ALBs) and CloudFront into S3 so our SIEM can read them. I'm assuming we should have one main bucket with a structured folder layout. Should each folder contain the logs for an individual load balancer, or is there a better way to organize it? It's important that we can also run Athena queries efficiently, since our developers need log access but can't be given access to the SIEM for security reasons.
4 Answers
In my experience, giving each log format its own top-level prefix (for example, one for ALB logs and one for CloudFront logs) helps a lot. Everything under a prefix then shares one format, which makes it way easier to parse in your SIEM.
Many AWS services that deliver logs to S3 write them under date-based key prefixes (ALB uses `yyyy/mm/dd`), and those map naturally onto Athena partitions. Keep that in mind when planning your structure!
According to the docs, if you don't change the prefix settings, your ALB logs will be structured like this: `bucket[/prefix]/AWSLogs/aws-account-id/elasticloadbalancing/region/yyyy/mm/dd/aws-account-id_elasticloadbalancing_region_app.load-balancer-id_end-time_ip-address_random-string.log.gz`. CloudFront standard logs also land under whatever prefix you configure, but by default the layout is flatter: the date is part of the file name (`[prefix/]distribution-id.YYYY-MM-DD-HH.unique-id.gz`) rather than the key path. It really helps to stick to the defaults to avoid extra work. Check out the AWS documentation for more specifics!
Definitely saves time that would otherwise go into worrying about the folder structure!
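To make the default ALB layout concrete, here is a minimal Python sketch that builds the S3 prefix holding one day of ALB logs. The function name, bucket name, account ID, and the optional `alb` top-level prefix are all made up for illustration; only the `AWSLogs/.../yyyy/mm/dd` shape comes from the path quoted above.

```python
from datetime import date


def alb_day_prefix(bucket, account_id, region, day, prefix=""):
    """Return the S3 URI prefix where ALB logs for `day` are expected.

    Hypothetical helper: it only mirrors the default key layout
    [prefix/]AWSLogs/<account>/elasticloadbalancing/<region>/yyyy/mm/dd/.
    """
    top = prefix.strip("/")
    parts = [top] if top else []
    parts += [
        "AWSLogs",
        account_id,
        "elasticloadbalancing",
        region,
        f"{day:%Y/%m/%d}",  # zero-padded year/month/day, as in the default layout
    ]
    return f"s3://{bucket}/" + "/".join(parts) + "/"


# Example with placeholder values (not real resources):
print(alb_day_prefix("central-logs", "123456789012", "us-east-1",
                     date(2024, 3, 5), prefix="alb"))
# s3://central-logs/alb/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/03/05/
```

A prefix like this is handy for spot checks (listing one day's objects) and it is the same shape an Athena partition location would need to match.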
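ALB logs will go to S3 with minimal setup. Just stick with the defaults! When creating your Athena table for querying, consider using partition projection: Athena then works out partition locations from the table properties instead of needing each partition added to the catalog, and queries that filter on the date only scan the matching prefixes, which keeps costs down. You might want to use a tool like ChatGPT to help draft the Athena table DDL.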
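As a rough sketch of what partition projection involves, the Python below assembles the table properties for a date-projected ALB log table and renders them as a `TBLPROPERTIES` clause. It assumes the table is declared with a single string partition column named `day` (`PARTITIONED BY (day STRING)`); that column name, the helper names, and the bucket/account values are illustrative choices. The column list and SerDe settings are left out, so take those from the AWS documentation rather than from here.

```python
def alb_projection_properties(bucket, account_id, region, start_day, prefix=""):
    """Build Athena partition-projection properties for ALB logs.

    Assumes one partition column called `day` whose values look like
    2024/03/05, matching the yyyy/mm/dd part of the default key layout.
    `start_day` is the first day to project, e.g. "2024/01/01".
    """
    base = "/".join(
        p for p in [prefix.strip("/"), "AWSLogs", account_id,
                    "elasticloadbalancing", region] if p
    )
    return {
        "projection.enabled": "true",
        "projection.day.type": "date",
        "projection.day.range": f"{start_day},NOW",
        "projection.day.format": "yyyy/MM/dd",
        "projection.day.interval": "1",
        "projection.day.interval.unit": "DAYS",
        # ${day} is substituted by Athena at query time, not by Python.
        "storage.location.template": f"s3://{bucket}/{base}/${{day}}",
    }


def render_tblproperties(props):
    """Render a dict of properties as a TBLPROPERTIES clause for the DDL."""
    body = ",\n".join(f"  '{k}' = '{v}'" for k, v in props.items())
    return f"TBLPROPERTIES (\n{body}\n)"


# Placeholder values; paste the output at the end of your CREATE EXTERNAL TABLE.
print(render_tblproperties(
    alb_projection_properties("central-logs", "123456789012",
                              "us-east-1", "2024/01/01", prefix="alb")
))
```

With properties like these, a query such as `WHERE day = '2024/03/05'` should only read that day's prefix, and there is no need to register new partitions as days go by.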
Great stuff, thanks! Helps to not have to do any of this structuring ourselves.