Hey everyone! I've been trying to get ZGC running in our production environment for a while, but honestly, it's been quite a mess. We run on Kubernetes and use Spring Boot 3.3 with Java 21 and 24. ZGC supposedly only needs a heap size to work properly, yet we keep hitting issues. For starters, the memory reported to Kubernetes is roughly double what we'd expect from the MaxRAMPercentage we set. The working set memory sits close to our container limit, but actual heap usage is only about half of that. We've even had to set SoftMaxHeapSize just to keep things under control and force more aggressive garbage collection. We're struggling to identify the root cause, especially since the documentation suggests we shouldn't have to tweak this much. Has anyone else faced similar challenges? How did you tackle them? I'm open to switching back to G1 if that's what it takes. Thanks! Also, we recently tried generational ZGC in our test environment.
6 Answers
Why did you decide to adopt ZGC in the first place? Just curious about what led you to explore it.
ZGC can be really aggressive in RAM usage. Are your pause times under 10 ms? As long as you stay within your heap limits, does it really matter if it allocates more compared to G1?
Yeah, pause times are fine, but the memory metrics are all over the place.
We run a similar setup (Java 24, ZGC, K8s, SoftMaxHeapSize) and had similar experiences with MaxRAMPercentage not being respected. Switching to an explicit -Xmx plus SoftMaxHeapSize worked wonders for us! I’d also recommend monitoring the process RSS alongside the cgroup memory metrics for better insight.
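If it helps, here is a minimal sketch of that RSS-vs-cgroup comparison, run from inside the JVM. It assumes a Linux container with /proc mounted and the usual cgroup v2 (or v1) memory files; the class name HeapVsRss and its helper methods are made up for illustration, and this is not how Kubernetes itself computes working set.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper: prints the JVM's view of the heap next to the OS and cgroup views.
public class HeapVsRss {

    // Extracts VmRSS from the lines of /proc/<pid>/status and converts kB to bytes.
    static OptionalLong parseVmRssBytes(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.trim().split("\\s+"); // ["VmRSS:", "<n>", "kB"]
                return OptionalLong.of(Long.parseLong(parts[1]) * 1024L);
            }
        }
        return OptionalLong.empty();
    }

    // Reads a file holding a single number; empty if missing or not numeric.
    static OptionalLong readLong(Path path) {
        try {
            return OptionalLong.of(Long.parseLong(Files.readString(path).trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    static String mib(OptionalLong bytes) {
        return bytes.isPresent() ? (bytes.getAsLong() / (1024 * 1024)) + " MiB" : "n/a";
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        OptionalLong rss;
        try {
            rss = parseVmRssBytes(Files.readAllLines(Path.of("/proc/self/status")));
        } catch (IOException e) {
            rss = OptionalLong.empty(); // not on Linux, or /proc not mounted
        }

        // Try the cgroup v2 file first, then the cgroup v1 location as a fallback.
        OptionalLong cgroup = readLong(Path.of("/sys/fs/cgroup/memory.current"));
        if (cgroup.isEmpty()) {
            cgroup = readLong(Path.of("/sys/fs/cgroup/memory/memory.usage_in_bytes"));
        }

        OptionalLong heapMax = heap.getMax() >= 0
                ? OptionalLong.of(heap.getMax())
                : OptionalLong.empty(); // getMax() returns -1 when undefined

        System.out.println("heap used:      " + mib(OptionalLong.of(heap.getUsed())));
        System.out.println("heap committed: " + mib(OptionalLong.of(heap.getCommitted())));
        System.out.println("heap max:       " + mib(heapMax));
        System.out.println("process RSS:    " + mib(rss));
        System.out.println("cgroup usage:   " + mib(cgroup));
    }
}
```

You could run it with the same flags as the service, for example java -XX:+UseZGC -Xmx2g -XX:SoftMaxHeapSize=1536m HeapVsRss (the sizes here are placeholders, not a recommendation). If heap committed stays well below the cgroup number, the gap is coming from outside the Java heap or from how the memory is being counted.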
It sounds like ZGC can be a bit tricky. We've dealt with similar issues where the reported memory was way off, sometimes up to three times higher than actual usage. I’d recommend checking out the article on ZGC and heap multi-mapping for some insights.
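To see whether multi-mapping is what inflates your numbers, one rough check is to compare RSS with PSS for the JVM process. The idea, as I understand it, is that non-generational ZGC maps the same heap memory at several virtual addresses, so RSS can count a page once per mapping, while PSS should count it closer to once. The sketch below assumes Linux 4.14+ with /proc/self/smaps_rollup available; the class name RssVsPss and its methods are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: compares RSS and PSS of the current JVM process.
public class RssVsPss {

    // Returns the kB value of the first line starting with the given key
    // (e.g. "Rss:" or "Pss:"), or -1 if the key is not present.
    static long fieldKb(List<String> lines, String key) {
        for (String line : lines) {
            if (line.startsWith(key)) {
                String[] parts = line.substring(key.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // RSS divided by PSS; NaN when PSS is missing or zero.
    static double inflation(long rssKb, long pssKb) {
        return pssKb > 0 ? (double) rssKb / pssKb : Double.NaN;
    }

    public static void main(String[] args) {
        try {
            List<String> lines = Files.readAllLines(Path.of("/proc/self/smaps_rollup"));
            long rss = fieldKb(lines, "Rss:");
            long pss = fieldKb(lines, "Pss:");
            System.out.printf("RSS %d kB, PSS %d kB, ratio %.2f%n",
                    rss, pss, inflation(rss, pss));
        } catch (IOException e) {
            // Not on Linux, or the kernel does not expose smaps_rollup.
            System.out.println("smaps_rollup not readable: " + e.getMessage());
        }
    }
}
```

A ratio well above 1 would hint that the same pages are being counted several times. Libraries shared with other processes also push the ratio up a little, so treat it as a hint rather than proof.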
I heard that only the non-generational mode used to cause those issues, and that the generational mode introduced in JDK 21 has improved things.
We went through the same struggles with ZGC initially, thinking it was going to be the future of garbage collection. We found that the memory accounting was really inaccurate and didn’t match our actual usage. So we went back to tuning G1, and that fixed a lot of our headaches!
Right? No blog ever mentions that you might just have to find these things out on your own.
Honestly, a well-tuned G1 outperformed ZGC for us across multiple microservices. You may also want to try Shenandoah with its adaptive heuristics; we found it handled our loads better, but you won’t hear much about it from Oracle.
We've been experimenting with new Java features and wanted to stay updated, especially since big companies are pushing these changes.