I'm experiencing issues where my Kubernetes pods fail to start because AWS ECR lifecycle policies are expiring images. Even though the upstream public images are still accessible through the Pull Through Cache, my pods are hitting `ImagePullBackOff` errors. Specifically, this has caused disruptions in my Istio service mesh since sidecar containers couldn't start when the `istio/proxyv2` image expired. My current workaround is to manually pull images whenever this happens, but this isn't a scalable or reliable solution for production. I'm considering using `imagePullPolicy: Always`, but I'm concerned this will slow down pod startup times and increase the number of registry calls. What are the best practices in the Kubernetes community for handling this issue?
1 Answer
You might want to change your lifecycle policy to keep a fixed number of the most recent images instead of expiring them by age. ECR lifecycle rules support a count-based selection (`countType: imageCountMoreThan`) alongside the time-based one (`sinceImagePushed`), so at least one image always stays available to pull, which should prevent these failures. It’s a simple adjustment that could save you a lot of headaches.
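For illustration, here is a rough sketch of what a count-based policy could look like, built in Python so it can be printed or applied with boto3. The function names, the repository name, and the choice of keeping 5 images are all assumptions made for the example, not something from your setup. It uses `tagStatus: any` so that it does not depend on a particular tag prefix.

```python
import json


def build_keep_last_n_policy(keep: int = 5) -> dict:
    """Build an ECR lifecycle policy that expires images only once the
    repository holds more than `keep` of them (count-based, not age-based)."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Keep only the {keep} most recent images",
                "selection": {
                    # "any" matches tagged and untagged images alike
                    "tagStatus": "any",
                    "countType": "imageCountMoreThan",
                    "countNumber": keep,
                },
                "action": {"type": "expire"},
            }
        ]
    }


def apply_policy(repository_name: str, policy: dict) -> None:
    """Apply the policy to a repository. Requires boto3 and AWS credentials;
    imported lazily so the rest of the sketch runs without them."""
    import boto3

    ecr = boto3.client("ecr")
    ecr.put_lifecycle_policy(
        repositoryName=repository_name,
        lifecyclePolicyText=json.dumps(policy),
    )


if __name__ == "__main__":
    policy = build_keep_last_n_policy(keep=5)
    print(json.dumps(policy, indent=2))
    # Hypothetical repository name; adjust to your pull-through cache prefix.
    # apply_policy("docker-hub/istio/proxyv2", policy)
```

The printed JSON can also be pasted into the console or passed to `aws ecr put-lifecycle-policy` if you would rather not script it. Running a lifecycle policy preview first to see which images the rule would expire is probably a good idea.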
Does AWS ECR allow for that? I'm using the latest tag right now.