I have a Docker container in a Kubernetes cluster that runs a Java app for video processing with ffmpeg and ffprobe. It worked perfectly until last week, but after a recent code push from my dev team it started failing at the ffprobe command. I did a hard reset to our previous commit and rebuilt the image, yet it still doesn't work. Interestingly, the old image runs fine, and the same Docker image works in one cluster but not in another. I'm really stumped on what to check next!
3 Answers
Using an image built from the same commit isn’t the same as deploying the old image. If the old one works fine but the rebuilt one fails, something outside your source code changed between the two builds, most likely the base image or the packages installed on top of it (including ffmpeg itself). Look closely at your Dockerfile: are you pulling unpinned package versions or using a 'latest' tag somewhere?
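To make that Dockerfile check concrete, here's a minimal Python sketch that flags `FROM` lines with no tag or a `latest` tag. The function name and the sample Dockerfile are made up for illustration, it doesn't resolve `ARG` substitutions, and keep in mind that even a versioned tag can be re-pushed, so only a digest is a true pin.

```python
def find_floating_base_images(dockerfile_text):
    """Return (line_number, image) pairs for FROM lines that are not pinned.

    A base image counts as floating when it has no tag or uses 'latest'.
    Images referenced by digest (@sha256:...) are treated as pinned.
    """
    findings = []
    stage_names = set()  # aliases declared with "FROM ... AS <name>"
    for lineno, raw in enumerate(dockerfile_text.splitlines(), start=1):
        tokens = raw.split()
        if len(tokens) < 2 or tokens[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_stage_ref = image.lower() in stage_names
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())
        # 'scratch' and earlier build stages are not pulled from a registry.
        if image.lower() == "scratch" or is_stage_ref:
            continue
        if "@sha256:" in image:
            continue
        # Only look for a tag after the last '/', so a registry port
        # (localhost:5000/foo) is not mistaken for a tag.
        last_component = image.rsplit("/", 1)[-1]
        tag = last_component.split(":", 1)[1] if ":" in last_component else None
        if tag is None or tag == "latest":
            findings.append((lineno, image))
    return findings


# Hypothetical Dockerfile content, for illustration only.
sample = "\n".join([
    "FROM eclipse-temurin:17-jre AS base",
    "FROM ubuntu",
    "FROM --platform=linux/amd64 alpine:latest AS tools",
    "FROM base",
])
print(find_floating_base_images(sample))
```

Anything it reports is a place where two builds of the same commit can end up with different contents.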
You might want to check if there's a service mesh or firewall blocking outbound connections in the failing cluster. Also, consider whether the node is running a cached copy of the image: if tags in your registry can be overwritten and the pull policy is IfNotPresent, two nodes can run different images under the same tag. Don't forget to review any Kyverno or OPA Gatekeeper policies that could be limiting capabilities, and double-check your pod's security context too. Running 'docker inspect' or 'docker history' on the images involved could also show some useful differences. Lastly, look for configuration discrepancies between the two clusters that could be relevant.
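If eyeballing two 'docker inspect' dumps gets tedious, something like the sketch below can do the comparison for you. It's only a rough helper under assumptions: the function names are mine, the image references in the usage comment are placeholders, and it assumes the `docker` CLI is on your PATH and both images are present locally.

```python
import json
import subprocess


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists and scalars stay as values."""
    flat = {}
    if isinstance(obj, dict) and obj:
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    else:
        flat[prefix] = obj
    return flat


def diff_inspect(old, new):
    """Return {dotted_key: (old_value, new_value)} for every field that differs."""
    old_flat, new_flat = flatten(old), flatten(new)
    return {
        key: (old_flat.get(key), new_flat.get(key))
        for key in sorted(old_flat.keys() | new_flat.keys())
        if old_flat.get(key) != new_flat.get(key)
    }


def inspect_image(ref):
    """Run `docker image inspect` and return the parsed JSON for one image."""
    result = subprocess.run(
        ["docker", "image", "inspect", ref],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)[0]  # the CLI prints a JSON array


# Usage (placeholder image references, not from the original post):
#   changes = diff_inspect(inspect_image("myapp:old"), inspect_image("myapp:rebuilt"))
#   for key, (before, after) in changes.items():
#       print(key, before, "->", after)
```

Fields worth a close look are `Architecture`, `Config.Env`, `Config.User`, and `RootFS.Layers`. A different architecture or user would also fit the "works in one cluster but not the other" symptom.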
If the old image works but a rebuild from the same code doesn't, check whether your dependencies are actually version-pinned. Sometimes the CI/CD pipeline pulls the latest versions instead of matching what you have locally. Make sure your Dockerfile references a specific base image version and that your package lock files are committed and used during the build.
For sure! Diffing the old working image against the rebuilt one should give you a clearer picture of what changed.
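One way to do that diff at the package level, sketched in Python. This assumes a Debian- or Ubuntu-based image where you can dump the installed packages from each image with `dpkg-query -W` (one "name version" pair per line). The function names and the sample version strings are made up for illustration.

```python
def parse_packages(listing):
    """Parse 'name<whitespace>version' lines into a {name: version} dict."""
    packages = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank lines and entries without a version
            packages[parts[0]] = parts[1]
    return packages


def diff_packages(old_listing, new_listing):
    """Return {name: (old_version, new_version)} for packages that differ.

    A version of None means the package is missing from that image.
    """
    old, new = parse_packages(old_listing), parse_packages(new_listing)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Illustrative listings only; real ones would come from running
# `dpkg-query -W` inside the old and the rebuilt image.
old_listing = "ffmpeg\t7:4.4.2-0ubuntu0.22.04.1\nlibc6\t2.35-0ubuntu3.1\n"
new_listing = "ffmpeg\t7:6.1.1-3ubuntu5\nlibc6\t2.35-0ubuntu3.1\nlibx264-164\t2:0.164\n"

for name, (before, after) in diff_packages(old_listing, new_listing).items():
    print(f"{name}: {before} -> {after}")
```

If ffmpeg or one of its codec libraries shows up in the output, that's a strong candidate for why ffprobe behaves differently in the rebuilt image.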