Hey folks! I'm facing some issues with tuning CPU and memory configurations for our Spring Boot microservices running on EKS. We have 6 Java 17 microservices (Spring Boot 3.5.4) that are primarily I/O-bound, with a few that are memory-heavy, but none are CPU-bound. We have HPA enabled and multiple nodes in our cluster. Here's a quick breakdown of our setup:
- Deployment YAML resources:
- Requests: CPU: 750m, Memory: 850Mi
- Limits: CPU: 1250m, Memory: 1150Mi
- Using the image: eclipse-temurin:17-jdk-jammy
- JVM Flags: -XX:MaxRAMPercentage=50
- Usage stats show:
- Idle: ~520Mi
- Under traffic: ~750Mi
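For context, the setup above would typically look something like this in the pod template (a sketch, not our exact manifest; the container name is illustrative, and the flag is passed via `JAVA_TOOL_OPTIONS` as one common convention):

```yaml
# Sketch: container-aware JVM sizing in the Deployment's pod template.
# With -XX:MaxRAMPercentage=50, the JVM sizes the max heap off the
# container memory LIMIT (1150Mi), so max heap is ~575Mi; the rest is
# left for metaspace, thread stacks, and other native memory.
containers:
  - name: orders-service            # illustrative name
    image: eclipse-temurin:17-jdk-jammy
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=50"
    resources:
      requests:
        cpu: 750m
        memory: 850Mi
      limits:
        cpu: 1250m
        memory: 1150Mi
```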
Currently, HPA targets 80% CPU utilization (actual usage is around 1%) and 80% memory utilization (actual usage is about 83%). We're running 6 pods, but we're stuck in a scaling-limited state because memory usage sits above the target.
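The HPA configuration described here would look roughly like the following (a sketch using the `autoscaling/v2` API; the resource names and the min/max replica counts are assumptions, since they aren't stated above):

```yaml
# Sketch of the HPA described in the question: scale on both CPU and
# memory utilization at 80%. Utilization is measured against the pod's
# resource REQUESTS, not its limits.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-service-hpa        # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-service          # illustrative name
  minReplicas: 2                  # assumed
  maxReplicas: 6                  # assumed; matches the 6 pods mentioned
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```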
Some of our challenges include:
- High CPU usage during startup, so we raised the CPU limit to 1250m to reduce cold-start latency.
- Post-startup, CPU usage drops to ~1%, but HPA still wants to scale based on memory thresholds, leading to unnecessary CPU over-allocation.
- Class loading on the first request causes significant latency: the first request takes about 500 ms, while subsequent requests are around 80 ms.
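One common mitigation for the first-request penalty is to exercise the hot path before the pod receives traffic, e.g. via a `startupProbe` pointed at a warm-up endpoint. This is a sketch under assumptions: `/warmup` is a hypothetical endpoint you'd add that runs a representative request path; the Actuator readiness path assumes `management.endpoint.health.probes.enabled` behavior in Spring Boot 3:

```yaml
# Sketch: gate traffic behind a warm-up call so class loading happens
# before the first real request. /warmup is a hypothetical endpoint.
containers:
  - name: orders-service          # illustrative name
    startupProbe:
      httpGet:
        path: /warmup             # hypothetical warm-up endpoint
        port: 8080
      failureThreshold: 30        # allow up to 30 * 2s = 60s to start
      periodSeconds: 2
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      periodSeconds: 5
```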
I'm looking for tips on:
- Properly tuning requests/limits for Java services in Kubernetes, particularly since CPU is mostly a concern at startup.
- Whether I should decouple HPA from memory and only scale based on CPU or custom metrics.
- Any best practices for JVM flags, such as MaxRAMPercentage or container-aware GC tuning for EKS.
Thanks for any insights or stories you can share!
1 Answer
One approach you might consider is lowering the CPU request and not setting a CPU limit at all. That lets the service burst during startup without over-allocating CPU for the rest of its life. Also remember that HPA utilization is calculated against requests, not limits. You're at ~750Mi under load against an 850Mi memory request, which is roughly 88% utilization and above your 80% target, so the HPA keeps wanting to scale. Either raise the memory request so steady-state usage lands comfortably below the target, or raise the memory threshold to around 90-95%.
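As a concrete sketch of that suggestion (the specific values are illustrative, not a recommendation for your exact workload):

```yaml
# Sketch: modest CPU request, no CPU limit, memory request raised so
# steady-state usage (~750Mi) sits below an 80% HPA target.
resources:
  requests:
    cpu: 250m        # sized for steady state; startup can burst above this
    memory: 1Gi      # ~750Mi under load -> ~73% utilization vs ~88% before
  limits:
    memory: 1150Mi   # keep a memory limit; CPU limit intentionally omitted
```

With no CPU limit, the container can use idle CPU on the node during startup, while its CPU request still guarantees it a proportional share under contention.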

I get what you're saying, but I have some concerns. If you set no limits and a service scales up, it could hog CPU resources on the node. This could hurt the performance of other services, and in a worst-case scenario, lead to those services restarting. How does the boot process of one service impact the overall performance of the other services?