I've been running into a problem with my nightly cron job that scales down pods to zero. Instead of terminating properly, they often end up in an 'Error' state. When I later scale the app back up, the new pods start just fine, but I'm left with these old pods stuck in error that I have to delete manually. This issue only seems to happen with one particular app; the others are functioning normally. Can anyone help me figure out what's going on?
4 Answers
Another thing you could try is removing any finalizers on those pods. If a finalizer is blocking deletion, clearing it should let the pods disappear from your list.
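Something along these lines should do it (the pod name and namespace here are placeholders, so substitute your own):

```
# Clear all finalizers from a stuck pod so the API server can remove it
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
```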
If the pods are not being cleaned up automatically, you may need to delete them manually or wait for the pod garbage collector. Keep in mind that by default `terminated-pod-gc-threshold` is set to 12500, so garbage collection won't kick in until the cluster has accumulated that many terminated pods.
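For the manual route, you can delete every pod in the Failed phase in one go, for example (the namespace is a placeholder):

```
# Remove all pods that have ended up in the Failed phase in a given namespace
kubectl delete pods --field-selector=status.phase=Failed -n <namespace>
```

You could even bolt that onto the same nightly cron job that does the scale-down, so the errored pods never pile up.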
When you describe the pod, do you see any errors in its state? For example, some of my pods showed as 'Terminated' with reason 'Error' and exit code 137, which usually means they were OOMKilled (killed for exceeding memory). Check the memory usage and limits in your `kubectl describe` output.
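A quick way to check, assuming placeholder pod and namespace names (the `kubectl top` command also assumes metrics-server is installed in the cluster):

```
# Show the last state, reason, and exit code of the failing container
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Last State"

# Compare current memory usage against the configured limits (needs metrics-server)
kubectl top pod <pod-name> -n <namespace>
```

If it really is exit code 137, raising the container's memory limit in the Deployment spec (or fixing whatever is leaking memory) is the usual fix.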
Have you checked the logs for those pods using `kubectl logs`? That might give you some insight into what's causing the error state.
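Since the container has already exited, it can help to look at both the last run and, if the container restarted before failing, the run before that (pod name and namespace are placeholders):

```
# Logs from the pod's most recent container run
kubectl logs <pod-name> -n <namespace>

# Logs from the previous run, if the container restarted before ending up in Error
kubectl logs <pod-name> -n <namespace> --previous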