I'm curious about best practices for deleting data stored on Kafka brokers. If we're already saving all records to a data warehouse like Snowflake, can we automate deletion from Kafka and send out notifications when it happens? How can Kafka help with this? I'm not very experienced with Kafka yet. What strategies do you all use in production to keep costs down?
4 Answers
You can have Kafka delete old records automatically by leaving the topic's cleanup.policy set to delete (the default) and tuning the retention.ms property, which defines how long messages are kept before the broker removes expired log segments. In our setup, we typically set this to two weeks.
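If you want to manage this in code rather than by editing broker or topic defaults, here's a minimal sketch using Kafka's Java AdminClient (assumes Kafka 2.3+ for incrementalAlterConfigs; the broker address and the topic name "orders" are placeholders):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder: your cluster

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");

            // Two weeks, expressed in milliseconds.
            long twoWeeksMs = 14L * 24 * 60 * 60 * 1000;

            List<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry("cleanup.policy", "delete"),
                                  AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("retention.ms", Long.toString(twoWeeksMs)),
                                  AlterConfigOp.OpType.SET)
            );

            // incrementalAlterConfigs changes only the listed entries,
            // leaving the topic's other overrides untouched.
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```

One caveat: deletion happens per log segment, so records can outlive retention.ms slightly, until the segment they sit in rolls and expires.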
We rarely delete data proactively. Each team manages its own cleanup policies based on its needs, and if developers need more storage, they have to request additional budget. Since we're on-premises, it isn't all that costly.
It really depends on your organization’s policies regarding data retention.
Retention can be bounded by time (retention.ms) or by the total log size per partition (retention.bytes); old segments are deleted once either limit is exceeded. For example, some of our topics keep data for just 15 minutes, while others retain it for a couple of weeks (see the sketch below). Nothing lasts forever here. I wish we had similar policies for our S3 storage too.
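To illustrate, here's a sketch of creating a topic with both a time and a size bound via the Java AdminClient; the topic name, partition count, replication factor, and limits are all made up:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class CreateShortLivedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder: your cluster

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic: 6 partitions, replication factor 3.
            NewTopic topic = new NewTopic("metrics.raw", 6, (short) 3)
                .configs(Map.of(
                    "retention.ms", "900000",        // 15 minutes by time
                    "retention.bytes", "1073741824"  // ~1 GiB per partition by size
                ));
            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
```

Note that retention.bytes applies per partition, not per topic, so the total on-disk footprint also scales with the partition count.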
What types of topics do you keep for just 15 minutes?