How can I manage excessive exited containers in my Kubernetes cluster?

Asked By CloudNinja77 On

I'm running a Kubernetes cluster (OpenShift) with Argo Workflows, and the client wants to keep their workflow runs around for a while before cleaning them up. Unfortunately, this has led to thousands of exited containers piling up on a single node. My coworker noticed the kubelet throwing gRPC errors and the node sitting in a 'not ready' state before he manually cleaned up the exited containers.

One error message we encountered was related to exceeding the message size limit: `rpc error: code = ResourceExhausted desc = grpc: received message larger than max (16788968 vs. 16777216)`. Additionally, the Multus CNI configuration file was missing, which seems odd to me.

During a recent test, we ran a cron job over the weekend that spawned 10 containers without cleaning them up. The node went into a 'not ready' state, and we couldn't even SSH into it. The OpenStack logs were flooded with out-of-memory errors: many processes, including Fluent Bit and some .NET applications, were killed because memory usage got out of control. What strategies or configurations can we put in place to address this and manage exited containers more effectively in our workflows?

3 Answers

Answered By OpsMaster3000 On

It sounds like the resource requests on your containers might not be set appropriately. Make sure the requests reflect what the containers actually consume, so the node isn't pushed into out-of-memory situations.
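
Just to illustrate what I mean, here's a rough sketch (with placeholder names and sizes, not your actual workload) of explicit requests, plus limits for good measure, on a container spec built with client-go types:

```go
// Rough illustration only: explicit requests and limits on a container,
// built with client-go types. The name, image, and sizes are placeholders.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	c := corev1.Container{
		Name:  "workflow-step",                // hypothetical container name
		Image: "registry.example.com/step:v1", // hypothetical image
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("250m"),
				corev1.ResourceMemory: resource.MustParse("256Mi"),
			},
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("500m"),
				corev1.ResourceMemory: resource.MustParse("512Mi"),
			},
		},
	}

	memRequest := c.Resources.Requests[corev1.ResourceMemory]
	memLimit := c.Resources.Limits[corev1.ResourceMemory]
	fmt.Printf("%s: memory request %s, limit %s\n", c.Name, memRequest.String(), memLimit.String())
}
```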

MemorySaver12 -

Interesting point, but it's worth noting that the containers were working fine on other nodes. The main issue here seems to be the excessive number of exited containers taking down the node.

Answered By TechGuru92 On

Isn't the Kubernetes garbage collector supposed to manage old containers automatically? It feels like it should help keep things tidy without manual intervention.

ContainerExpert88 -

As far as I know, the garbage collector only cleans up orphaned containers. Since you can still see these pods with `kubectl get pods`, they aren't considered orphaned, so the GC won't touch them.
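
So if you want to keep the workflow history without the pods dragging a node down, something has to delete the completed pods once they age out. If I remember correctly, Argo Workflows has `ttlStrategy` and `podGC` settings on the Workflow spec for exactly this. Otherwise, here's a minimal sketch of a cleanup job using client-go, assuming in-cluster credentials with pod list/delete permissions, a hypothetical `argo` namespace, and a seven-day retention window:

```go
// Minimal sketch, not a production controller: delete Succeeded pods older
// than a retention window so their exited containers can be garbage
// collected. The "argo" namespace and 7-day window are assumptions.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	const namespace = "argo"        // hypothetical namespace for workflow pods
	retention := 7 * 24 * time.Hour // keep completed runs for a week

	pods, err := client.CoreV1().Pods(namespace).List(context.Background(),
		metav1.ListOptions{FieldSelector: "status.phase=Succeeded"})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		if time.Since(p.CreationTimestamp.Time) < retention {
			continue
		}
		if err := client.CoreV1().Pods(namespace).Delete(
			context.Background(), p.Name, metav1.DeleteOptions{}); err != nil {
			log.Printf("failed to delete %s: %v", p.Name, err)
		}
	}
}
```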

Answered By DevOpsDynamo On

You might be running into the default gRPC message size limit of 16 MiB (16777216 bytes). If you have control over the client, consider setting the `grpc.MaxCallSendMsgSize()` call option to allow larger messages. Here's the gRPC documentation on this if you're interested: [gRPC MaxCallSendMsgSize](https://pkg.go.dev/google.golang.org/grpc?utm_source=godoc#MaxCallSendMsgSize)
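
For illustration, here's a minimal sketch of how those options are applied on a Go gRPC client you control. The endpoint and the 32 MiB value are placeholders, not anything from the kubelet's configuration, and since the error in the question is about a *received* message, `MaxCallRecvMsgSize()` is shown alongside the send option:

```go
// Minimal sketch, not the kubelet's own code: raising the gRPC message size
// limits on a client via dial options. The endpoint and 32 MiB value are
// placeholders.
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	const maxMsgSize = 32 * 1024 * 1024 // 32 MiB, double the 16 MiB default

	conn, err := grpc.Dial(
		"unix:///run/containerd/containerd.sock", // hypothetical endpoint
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallSendMsgSize(maxMsgSize), // outgoing requests
			grpc.MaxCallRecvMsgSize(maxMsgSize), // incoming responses (the side the error above complains about)
		),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
	log.Println("connected with raised gRPC message size limits")
}
```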

FixItFelix -

Unfortunately, this limit is hardcoded in the kubelet's CRI client. There's an open issue on GitHub discussing it, but if you're seeing this error, you've likely already got a more serious problem with your setup.
