What Tools Are People Using to Compress Logs and Cut Storage Costs?

Asked By TechWhiz77

I've been developing a custom compression tool tailored for log files and similar structured data that is immutable and time-indexed. My initial tests show impressive results: 15-20x compression while still supporting queries against the compressed data. The main motivation is cost: many cloud vendors charge high fees for log storage, and existing open-source solutions can rack up hardware costs once your log volume exceeds 20-30GB per day. For instance, at a daily output of 30GB, I estimate you'd spend about $400 per month just to store a single month's worth of logs.

In designing this tool, I had a few key ideas:
- Querying shouldn't require decompressing the data or loading it into memory.
- The index and data files are decoupled, so when the data lives in S3, only the index file needs to be downloaded for common queries on timestamps and facets (see the sketch below).
- The goal is to keep storage costs as low as possible (currently below $1/TB) without requiring extra compute; the data can stay in S3 and be fetched on demand.
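
To make the index/data decoupling concrete, here's a minimal sketch of how a query path like that can work. Everything in it is hypothetical (the bucket, key names, index schema, and per-block gzip framing are placeholders, not my tool's actual format); the point is that the small index is fetched whole, and only the blocks overlapping the query are pulled from S3 with ranged GETs and decompressed:

```python
# Minimal sketch of a decoupled index/data layout. Each block of log
# lines is gzip-compressed independently, the blocks are concatenated
# into one data object, and a small JSON index maps each block's time
# range to its byte range in that object.

import bisect
import gzip
import json

import boto3  # assumes AWS credentials are already configured

s3 = boto3.client("s3")
BUCKET = "my-log-archive"          # hypothetical bucket
DATA_KEY = "logs/2024-06-01.dat"   # concatenated compressed blocks
INDEX_KEY = "logs/2024-06-01.idx"  # small JSON index, fetched whole

def query_range(start_ts: int, end_ts: int):
    """Yield log lines in [start_ts, end_ts) without downloading the
    whole data file: fetch the index, then ranged-GET only the blocks
    whose time span overlaps the query."""
    index = json.loads(
        s3.get_object(Bucket=BUCKET, Key=INDEX_KEY)["Body"].read()
    )
    # index["blocks"] is sorted by min_ts:
    # [{"min_ts": ..., "max_ts": ..., "offset": ..., "length": ...}, ...]
    starts = [b["min_ts"] for b in index["blocks"]]
    i = bisect.bisect_right(starts, start_ts) - 1
    for block in index["blocks"][max(i, 0):]:
        if block["min_ts"] >= end_ts:
            break  # blocks are sorted, nothing later can overlap
        if block["max_ts"] < start_ts:
            continue
        # Fetch and decompress only this block via an HTTP range request.
        byte_range = f"bytes={block['offset']}-{block['offset'] + block['length'] - 1}"
        body = s3.get_object(Bucket=BUCKET, Key=DATA_KEY, Range=byte_range)["Body"].read()
        for line in gzip.decompress(body).splitlines():
            ts, _, rest = line.partition(b" ")  # assumes "epoch message" lines
            if start_ts <= int(ts) < end_ts:
                yield rest.decode()
```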

I'm really interested in how others are tackling this issue or if you've discovered better solutions. Here are some specific questions I have:
1. Are data storage costs a concern for you?
2. How are you managing long-term log retention?
3. What compression methods are you using, and are you able to query without decompressing the data?
4. What are the compliance retention periods you need to deal with?
5. Would you be open to specialized tools for this, or do existing methods like gzip suffice?

1 Answer

Answered By ByteBuster

I generally convert my logs into a binary format, compress them, and stream them to a storage backend that handles tiering for me. I'm a fan of gzip because, with the right hardware, decompression is blazing fast; the main thing is keeping up with incoming data without significant bottlenecks. If you're on AWS, M7i.metal instances can do the trick too. At the right scale, moving to a time-series database like Prometheus might also be the way to go, since those handle aggregation queries decently.
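
For a rough idea of the binary-then-compress step, here's a minimal sketch assuming newline-delimited logs that start with an epoch timestamp; the record layout (u64 timestamp, u32 length, raw message bytes) is illustrative rather than any standard format:

```python
# Sketch: pack text log lines into fixed-layout binary records, then
# gzip the stream. Fixed-width numeric fields are what make binary
# encoding pay off: they parse without string scanning, and the length
# prefixes let a reader skip records without decoding the messages.

import gzip
import struct

RECORD_HEADER = struct.Struct("<QI")  # u64 epoch seconds, u32 message length

def encode_logs(in_path: str, out_path: str) -> None:
    """Pack each 'epoch message' text line into a binary record and gzip it."""
    with open(in_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        for line in src:
            ts, _, msg = line.rstrip(b"\n").partition(b" ")
            dst.write(RECORD_HEADER.pack(int(ts), len(msg)))
            dst.write(msg)

def decode_logs(path: str):
    """Stream (timestamp, message) tuples back out without loading the file."""
    with gzip.open(path, "rb") as src:
        while header := src.read(RECORD_HEADER.size):
            ts, length = RECORD_HEADER.unpack(header)
            yield ts, src.read(length)
```

From there the gzipped files can be streamed to object storage and lifecycle rules take care of the tiering.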

DataGenius -

What do you recommend for converting logs to a binary format? Also, how does that improve anything? I hear you on hardware acceleration, but that might not be feasible for everyone.
