Hey everyone! I'm diving into K3s for my homelab, and I've hit a bit of a snag. Here's what my setup looks like:
I have three nodes: `debian` (the main worker), `docker` (a backup worker), and `hatsune` (control plane). The nodes run a mix of K3s v1.34.4 and v1.34.5.
I started off by deploying `immich-postgres` and waited for all replicas to be online. Then, I deployed Immich, but it can't resolve the address of the postgres cluster (`acid-minimal-cluster`). The deployment has an initContainer that tries to resolve that address, and because the lookup never succeeds, the Immich pod never starts.
Here's the weird part: when I delete and restart the CoreDNS pod, everything works fine. However, when I drain the `debian` node and try to run services on the `docker` node, the issue pops up again and I have to restart CoreDNS once more. I checked the CoreDNS configmap, and it has the `cache 30` option, so it should be functioning properly. Any ideas on why this is happening? I've provided enough context, I hope!
1 Answer
It sounds like you're experiencing a DNS race condition during startup. Immich might be trying to start before the postgres service name is fully registered in CoreDNS, hence the resolution fails. Restarting CoreDNS clears that issue temporarily, but you might want to ensure that your initContainer is properly set up to wait for the service to be available. Having it pause until the `nslookup` for postgres succeeds is a smart approach!
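To illustrate the "wait until the name resolves" idea, here is a minimal sketch of the retry loop such an initContainer could run. It is an assumption about how the wait might look, not the check Immich or your manifest actually uses: the function name `wait_for_dns`, the timeout, and the interval are all hypothetical, and it uses Python's standard resolver rather than `nslookup`. Note that it only helps with startup ordering; it will not fix a CoreDNS instance that never returns the record.

```python
import socket
import time


def wait_for_dns(host, timeout=300.0, interval=5.0, resolve=socket.getaddrinfo):
    """Retry a DNS lookup until it succeeds or `timeout` seconds pass.

    Returns True once `host` resolves, False if the deadline is reached.
    `resolve` is injectable so the loop can be exercised without a cluster.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Port is irrelevant here; we only care whether the name resolves.
            resolve(host, None)
            return True
        except socket.gaierror:
            # Name not resolvable (yet); fall through and retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_dns("acid-minimal-cluster")` from inside a pod would rely on the search domains in the pod's `/etc/resolv.conf` to expand the short service name, so it only works for a pod in the same namespace as the service.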

Thanks for the feedback! I figured out that I do need that initContainer, but it’s been weird because my immich deployment is over 35 hours old, and postgres is still not resolving. The pods are healthy, and the service is available with an IP, but it’s still not working.