I'm noticing a significant increase in my AWS bills lately, and I need some advice on how to manage our storage costs. We have a hybrid setup where most of our raw data is generated on-premises, and currently, we're pushing everything into an S3 bucket for processing and long-term storage. However, about 95% of this data becomes cold almost immediately, yet we have to keep it for compliance purposes, sometimes up to 10 years.
S3 Standard is getting expensive, and even S3 Standard-Infrequent Access (Standard-IA) doesn't cut it. I've considered S3 Glacier Deep Archive for its low cost, but retrievals take hours and objects have to be restored before they can be read, so it isn't transparent to our applications.
Right now, we have a tape library on-prem that costs us effectively nothing in OpEx. I'm looking for a solution that allows us to use S3 for a hot/warm tier and then move older data to our on-prem tape archive without having to manually relocate every single file. Are there any hybrid users out there who have implemented a successful workflow for this?
1 Answer
Have you thought about using S3 Glacier for long-term storage? It's quite cheap, but I get your concern about retrieval times. I wonder if keeping the warm tier in S3 and the cold tier on your on-prem tape might actually end up being faster than restoring from Glacier. Just a thought, though!
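If you do keep a cold tier in S3, a lifecycle rule can handle the transition automatically instead of anyone moving objects by hand. Here's a minimal boto3 sketch; the bucket name and prefix are placeholders, and 30 days is just an example cutoff:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under the raw/ prefix to Glacier Deep Archive after 30 days.
# Bucket name, prefix, and the 30-day cutoff are placeholders for your own layout.
s3.put_bucket_lifecycle_configuration(
    Bucket="raw-data-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cold-data-to-deep-archive",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    },
)
```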
I get that sticking to the cloud is simpler, but Glacier's retrieval and egress fees really stack up. What if you processed the data in the cloud and then deleted the originals from S3 once a verified copy is on tape locally? Could save some costs!
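A rough sketch of that cleanup step, assuming your tape workflow already lands copies under a local staging path and that a size comparison is enough verification for your purposes (both are assumptions, adjust to your setup):

```python
import boto3
from datetime import datetime, timezone, timedelta
from pathlib import Path

s3 = boto3.client("s3")
BUCKET = "raw-data-archive"               # placeholder bucket name
LOCAL_ROOT = Path("/mnt/tape-staging")    # placeholder path where the tape workflow stages copies
AGE_THRESHOLD = timedelta(days=30)        # example cutoff

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="raw/"):
    for obj in page.get("Contents", []):
        age = datetime.now(timezone.utc) - obj["LastModified"]
        local_copy = LOCAL_ROOT / obj["Key"]
        # Delete from S3 only when the object is old enough AND a local copy
        # of the same size already exists on the tape staging area.
        if age > AGE_THRESHOLD and local_copy.exists() and local_copy.stat().st_size == obj["Size"]:
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
```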