I'm trying to figure out a way to delete Amazon S3 objects based on their last accessed date. I know that S3 Intelligent-Tiering moves objects between tiers based on access, but it never deletes them. Standard lifecycle rules don't help either, since they expire objects by age (time since creation), not by last access. What methods have you folks found effective for this? I've heard that using server access logs with Athena can get pretty pricey. Is there a way to piggyback on Intelligent-Tiering for this?
5 Answers
It's tricky without knowing exactly what you use S3 for. AWS often recommends "S3 access logs + Athena + Lambda" for these situations. Whatever you pick, you need three things: a last-accessed timestamp per object, a mechanism that keeps it up to date, and a job that deletes objects whose timestamp is older than your cutoff. If all reads go through a backend you control, you could update the timestamp on every GetObject call and run the deletions later based on that.
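A minimal sketch of that backend approach, in case it helps. `LastAccessTracker`, `record_get`, and `sweep` are hypothetical names, the in-memory dict stands in for whatever durable store you'd really use, and `delete_fn` is a placeholder for the real delete call (for example a wrapper around boto3's `delete_object`).

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional


class LastAccessTracker:
    """Hypothetical helper: remembers when each key was last read."""

    def __init__(self) -> None:
        # Assumption: a real deployment would use a durable store, not a dict.
        self._last_access: Dict[str, datetime] = {}

    def record_get(self, key: str, when: Optional[datetime] = None) -> None:
        """Call this whenever the backend serves a GetObject for `key`."""
        self._last_access[key] = when or datetime.now(timezone.utc)

    def expired_keys(self, max_idle: timedelta,
                     now: Optional[datetime] = None) -> List[str]:
        """Return keys whose last read is older than `max_idle`."""
        now = now or datetime.now(timezone.utc)
        return sorted(k for k, t in self._last_access.items()
                      if now - t > max_idle)

    def sweep(self, max_idle: timedelta, delete_fn: Callable[[str], None],
              now: Optional[datetime] = None) -> List[str]:
        """Delete every expired key via `delete_fn` and stop tracking it."""
        expired = self.expired_keys(max_idle, now)
        for key in expired:
            delete_fn(key)
            del self._last_access[key]
        return expired
```

Note that an object that is never read never gets an entry here, so you would probably also want to call `record_get` at upload time.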
Yeah, access logs can be a bit pricey, but I can't think of a solid alternative. An S3 Inventory report doesn't include last-accessed info, and there's no event notification for reads either, so access logs seem to be necessary. I’d weigh your options: either run Athena queries over the logs, or create a process that reads each log file as it arrives, updates a DynamoDB table with the last access time per object, and then discards the log.
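Here's a rough sketch of the "process the logs and keep only the last access time" idea. It assumes the standard space-delimited server access log layout (owner, bucket, bracketed timestamp, remote IP, requester, request ID, operation, key, ...) and counts only `REST.GET.OBJECT` as a read. `latest_reads` is a made-up name, and the returned dict stands in for the DynamoDB table.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Optional
from urllib.parse import unquote

# Leading fields of a server access log record (assumed layout, see above).
LOG_RE = re.compile(
    r'^\S+ (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'\S+ \S+ \S+ (?P<op>\S+) (?P<key>\S+) '
)


def latest_reads(lines: Iterable[str],
                 latest: Optional[Dict[str, datetime]] = None
                 ) -> Dict[str, datetime]:
    """Fold log lines into a {key: most recent GET time} mapping."""
    latest = {} if latest is None else latest
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("op") != "REST.GET.OBJECT":
            continue  # skip malformed lines and non-read operations
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        key = unquote(m.group("key"))  # keys are URL-encoded in the logs
        if key not in latest or when > latest[key]:
            latest[key] = when
    return latest
```

Once a log file has been folded into the mapping it can be deleted, so you only pay to store the logs briefly. Keep in mind that AWS delivers access logs on a best-effort basis, which matters if you need the timestamps to be exact.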
A solid first step is enabling Amazon S3 server access logging. It might help track access patterns. Then again, maybe there are better options, right?
It really depends on your needs. Do you prioritize accuracy, or is keeping costs low your main focus? You also have to consider how your S3 bucket is accessed—directly or through an app. If accuracy is key and you want to avoid constant operational costs, you might want to explore using AWS tools strategically.
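Did you know that S3 can send an event notification (`s3:IntelligentTiering`) when Intelligent-Tiering moves an object? As far as I know it only fires when an object moves into the opt-in Archive Access or Deep Archive Access tiers, not on the automatic move from Frequent Access to Infrequent Access after 30 days. So if you enable the Archive Access tier (the minimum is 90 days without access), you could set up a Lambda function that deletes each object when that event arrives.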
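A sketch of what that Lambda could look like. To my knowledge the notification covers only moves into the Archive Access and Deep Archive Access tiers, so the code checks for those. The record field names (`eventName` equal to `IntelligentTiering`, and `intelligentTieringEventData.destinationAccessTier`) are my reading of the S3 event message format, so treat them as assumptions and verify against a real event before trusting it with deletes. `objects_to_delete` is a made-up helper name.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus

# Assumed values of destinationAccessTier for the archive tiers.
ARCHIVE_TIERS = {"ARCHIVE_ACCESS", "DEEP_ARCHIVE_ACCESS"}


def objects_to_delete(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pick (bucket, key) pairs for objects that just moved to an archive tier."""
    targets = []
    for record in event.get("Records", []):
        if record.get("eventName") != "IntelligentTiering":
            continue  # ignore any other notification types
        tier = (record.get("intelligentTieringEventData") or {}).get(
            "destinationAccessTier")
        if tier not in ARCHIVE_TIERS:
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        targets.append((s3["bucket"]["name"],
                        unquote_plus(s3["object"]["key"])))
    return targets


def lambda_handler(event: Dict[str, Any], context: Any) -> None:
    import boto3  # imported lazily; bundled with the Lambda Python runtime

    client = boto3.client("s3")
    for bucket, key in objects_to_delete(event):
        client.delete_object(Bucket=bucket, Key=key)
```

On a versioned bucket, `delete_object` without a version ID only adds a delete marker, so you would still need a lifecycle rule to clean up noncurrent versions.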

Yeah, I do need it to be super accurate because the data on S3 is quite important. I’d prefer to keep costs low, though, since retaining data for the sake of deletion can add up. I initially thought Intelligent-Tiering's internal mechanisms could handle expiration based on access, but that doesn’t seem to be the case.