I have a Kubernetes pod that's running some AI inference models, but I'm noticing that it's not utilizing the CPU resources I'm allocating effectively. I set the pod to allow a maximum of 10 CPUs, but it's only hitting around 8. I even tried reducing the max to 8 CPUs, and now it seems to peak around 6. I'm not sure why it's not using all the resources available to it, and I'm looking for some insights. Is there something I'm missing that could help maximize resource usage?
3 Answers
To fine-tune your settings, consider running a tool like https://github.com/robusta-dev/krr on your cluster. It can help you find the right CPU and memory settings for your application. A good strategy is to set the CPU request to the value you would otherwise have used as the limit, then remove the CPU limit altogether, so the pod keeps its guaranteed share but can still use idle CPU on the node without being throttled.
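To make the "requests without a CPU limit" idea concrete, here is a minimal sketch that builds the `resources` section of a container spec as a Python dict and prints it as JSON (Kubernetes accepts JSON manifests as well as YAML). The helper name and the values `"8"` and `"16Gi"` are placeholders for illustration, not recommendations for your workload.

```python
import json


def resources_without_cpu_limit(cpu_request: str, memory: str) -> dict:
    """Build a container `resources` section with a CPU request but no CPU limit.

    Memory request and limit are set to the same value, so memory stays capped
    while CPU is only guaranteed, not capped.
    """
    return {
        "requests": {"cpu": cpu_request, "memory": memory},
        "limits": {"memory": memory},  # deliberately no "cpu" key here
    }


# Placeholder values: substitute what krr (or your own measurements) suggests.
resources = resources_without_cpu_limit("8", "16Gi")
print(json.dumps(resources, indent=2))
```

The printed section would go under `spec.containers[].resources` in the pod or deployment manifest.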
Good news! There's nothing wrong with your Kubernetes setup. The behavior you're seeing might come from how the application itself is written. Some applications and runtimes only use a portion of the CPUs available to them; Java's JVM, for example, sizes its internal thread pools from the CPU count it detects and might not push for every CPU you allocated. It's worth looking into how your specific application decides how many threads or workers to run.
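As an illustration of how an application might work out how many CPUs it really has, the sketch below compares the host CPU count with the limit the container was given. It assumes a Linux container on cgroup v2, where the limit is usually readable at `/sys/fs/cgroup/cpu.max`; that path and the function names are assumptions, and your inference framework may do something different entirely.

```python
import os
from pathlib import Path
from typing import Optional


def cpus_from_cpu_max(text: str) -> Optional[float]:
    """Convert the contents of a cgroup v2 cpu.max file into a CPU count.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no CPU limit is set. Returns None when there is no limit.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def effective_cpus(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> float:
    """Return the smaller of the host CPU count and the cgroup CPU limit."""
    host = os.cpu_count() or 1
    try:
        limit = cpus_from_cpu_max(Path(cpu_max_path).read_text())
    except (OSError, ValueError):
        limit = None  # file missing or unreadable: fall back to the host count
    return host if limit is None else min(host, limit)


print("host CPUs:", os.cpu_count())
print("effective CPUs:", effective_cpus())
```

If the application sizes its worker pool from a number smaller than the limit, it will never reach the limit no matter what Kubernetes allows; if it sizes from the host count, it may run far more threads than the limit can feed.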
Check your CPU throttling metrics if you're already collecting them. When a process tries to use more CPU time than its limit allows, the kernel throttles it until the next scheduling period, which would explain usage that plateaus below the limit. In general I'd recommend avoiding CPU limits altogether; the kernel typically manages CPU sharing better than we can. You should still set memory limits, though.
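If you aren't collecting those metrics yet, a quick way to check for throttling is to read the container's `cpu.stat` file. The sketch below assumes cgroup v2, where the file contains `nr_periods` and `nr_throttled` counters and is usually found at `/sys/fs/cgroup/cpu.stat` inside the container; the sample text is made up for illustration. On the cluster side, cAdvisor exposes the same counters as `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total`.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the "key value" lines of a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats: dict) -> float:
    """Fraction of scheduling periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods elapsed yet
    return stats.get("nr_throttled", 0) / periods


# Made-up sample; in a real container you would read /sys/fs/cgroup/cpu.stat.
SAMPLE = """usage_usec 8000000
nr_periods 1000
nr_throttled 250
throttled_usec 1200000
"""

stats = parse_cpu_stat(SAMPLE)
print(f"throttled in {throttled_fraction(stats):.0%} of periods")
```

A fraction that keeps growing while usage sits below the limit suggests the workload is bursty: it briefly needs more CPU than the limit allows, gets paused, and then averages out lower.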

Got it, thanks! Do you have any recommendations on what I should read or look into to understand this better? I've been thrown into this Kubernetes role with no prior experience.