We're currently running OpenShift with an etcd database that, after compaction and defragmentation, is sitting at 5 GiB. However, we've noticed that within just 24 hours, this size balloons to 8 GiB, indicating we have about 3 GiB of old keys. We're trying to pinpoint which API object is creating this churn so we can address the issue effectively. Any advice on how to track this down?
3 Answers
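Actually, just looking at counts might not help if you're modifying a small number of large resources, because every update stores a full new revision of the object. For example, a single 1 MiB object rewritten twice a minute adds up to roughly 2.8 GiB of revisions per day. You need to look at how often and how heavily objects are modified, not just how many of them exist.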
Consider writing a small Go program or even a shell script to examine etcd directly. It can list the most recently modified keys along with their sizes. Just a guess, but it could be events causing the churn!
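A rough sketch of that idea, written in Python rather than Go or shell. It assumes you have already dumped a key range with something like `etcdctl get /kubernetes.io/ --prefix -w json > dump.json`, and that the JSON has the usual shape: a `kvs` list with base64-encoded `key` and `value` plus `mod_revision` and `version`. The function name `summarize_churn`, the `since_revision` cutoff, and the two-segment prefix grouping are all made up for illustration.

```python
import base64
from collections import defaultdict


def summarize_churn(dump, since_revision=0, depth=2):
    """Group recently modified keys by key prefix.

    dump:           parsed JSON from `etcdctl get <prefix> --prefix -w json`
    since_revision: only count keys whose mod_revision is greater than this
    depth:          number of leading path segments used as the group name
                    (2 gives e.g. /kubernetes.io/events)

    Returns (prefix, key_count, value_bytes, versions) tuples,
    largest value_bytes first.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for kv in dump.get("kvs", []):
        if int(kv.get("mod_revision", 0)) <= since_revision:
            continue  # not modified since the baseline revision
        key = base64.b64decode(kv["key"]).decode("utf-8", "replace")
        # "value" may be absent (empty value or a keys-only dump).
        size = len(base64.b64decode(kv.get("value", "")))
        prefix = "/" + "/".join(key.strip("/").split("/")[:depth])
        entry = totals[prefix]
        entry[0] += 1                           # keys touched
        entry[1] += size                        # bytes in current values
        entry[2] += int(kv.get("version", 0))   # writes since key creation
    rows = [(p, c, b, v) for p, (c, b, v) in totals.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)
```

Load the dump with `json.load` and pass it in, using a revision noted right after your last defrag as `since_revision`. A prefix with a high `versions` total and large values is a likely culprit. On a multi-GiB database it is probably safer to dump one prefix at a time than the whole keyspace.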
Check your apiserver metrics to see how many UPDATE requests are being processed for each resource type. That should show which objects are contributing to the growth.
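One way to do that without a full monitoring stack, as a minimal sketch: save the raw metrics (for example `oc get --raw /metrics > metrics.txt`) and total the `apiserver_request_total` counter per resource and verb. The helper name `writes_per_resource` is hypothetical. The set of "write" verbs is also an assumption: depending on the version, the `verb` label may hold HTTP-style values (PUT, PATCH, POST) or API-style ones (UPDATE, CREATE), so both are accepted here.

```python
import re
from collections import Counter

# Matches one sample line of the apiserver_request_total counter.
SAMPLE_RE = re.compile(
    r'^apiserver_request_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
# Matches label="value" pairs inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Assumption: which verb label values count as writes (varies by version).
WRITE_VERBS = {"PUT", "PATCH", "POST", "DELETE", "APPLY", "UPDATE", "CREATE"}


def writes_per_resource(metrics_text, verbs=WRITE_VERBS):
    """Sum successful write requests per (resource, verb), busiest first."""
    totals = Counter()
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # comment line or a different metric
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("verb", "").upper() not in verbs:
            continue
        # Failed requests do not write to etcd, so keep 2xx codes only.
        if not labels.get("code", "2").startswith("2"):
            continue
        resource = labels.get("resource") or "(none)"
        totals[(resource, labels["verb"])] += float(match.group("value"))
    return totals.most_common()
```

These counters are cumulative since the apiserver started, so take two snapshots some time apart and compare them to get a rate. A raw `/metrics` request also reaches only one apiserver instance, so the numbers are a sample rather than a cluster-wide total.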

Thanks for pointing that out! It makes sense to focus on resource modifications rather than just counts.