I have a home Kubernetes cluster running on Talos Linux, where some applications rely on SQLite databases stored on an iSCSI target from my TrueNAS server. I'm manually defining Persistent Volumes (PV) and Persistent Volume Claims (PVC) for these workloads, rather than using CSI drivers. Every now and then, I need to restart my TrueNAS server for maintenance or updates, which causes the iSCSI target to go offline for about 5 to 30 minutes.
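For illustration, a statically defined pair of that kind looks roughly like this (the names, portal address, IQN, and size below are placeholders, not values from my actual setup):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-config-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  iscsi:
    targetPortal: 192.168.1.10:3260        # placeholder TrueNAS portal
    iqn: iqn.2005-10.org.freenas.ctl:app   # placeholder target IQN
    lun: 0
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-config-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""   # disable dynamic provisioning; bind statically
  volumeName: app-config-pv
  resources:
    requests:
      storage: 1Gi
```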
During this downtime, my liveness probes fail and Kubernetes restarts the affected containers, while the failing readiness probes take the pods out of service. However, even once the TrueNAS server is back online, the pods still hit input/output errors, reporting that they can't write to the config folder backed by the iSCSI target. I find that manually deleting the problematic pod lets Kubernetes create a new one that works fine.
I'm curious how others are managing disconnects with iSCSI targets, especially when they stay offline for extended periods. Any tips on auto-restarting pods or preventing these stale connections?
3 Answers
Unfortunately, there's not much that can be done once the underlying infrastructure is lost for an extended period; the volume mount usually becomes stale and irrecoverable. You might need to adjust your maintenance practices to scale down instances before taking storage offline or ensure that your pods can restart automatically after such events.
I use a custom liveness script that checks the status of the mounted volumes. When it detects a stale volume, the pod is terminated, allowing Kubernetes to attempt to re-establish the connection, which usually succeeds when the external storage returns.
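A minimal sketch of such a check, assuming the iSCSI-backed volume is mounted at /config (the mount path and marker file name are placeholders):

```shell
# Hypothetical liveness check: returns non-zero when the mount is stale.
check_mount() {
  path="$1"
  # A stale iSCSI mount tends to hang or return EIO, so bound every
  # call with a timeout and treat either failure mode as unhealthy.
  timeout 5 stat "$path" >/dev/null 2>&1 || return 1
  # The symptom described above is EIO on write, so also verify writes.
  timeout 5 touch "$path/.livenessprobe" 2>/dev/null || return 1
  rm -f "$path/.livenessprobe"
}

# Wired into the pod spec as an exec probe, e.g.:
#   livenessProbe:
#     exec:
#       command: ["/bin/sh", "/scripts/check-mount.sh"]
check_mount "${MOUNT_PATH:-/config}" && echo healthy || echo stale
```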
The downside with a liveness script is that it typically only restarts the container and not the whole pod. Even if the iSCSI server comes back online, the pod often retains a stale connection, leading to repeated container restarts without real recovery. Do you not experience that issue with your setup?
Have you considered scaling down your workloads in Kubernetes before you restart the TrueNAS server? That way no pod holds an open iSCSI session during the planned maintenance, which avoids most of the complications with stale connections afterward.
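A sketch of what that could look like, assuming single-replica Deployments (typical for SQLite-backed apps). The namespace and workload names are placeholders, and KUBECTL defaults to echo so the commands are only printed until you set KUBECTL=kubectl:

```shell
# Dry-run by default: set KUBECTL=kubectl to actually apply the changes.
KUBECTL="${KUBECTL:-echo kubectl}"
NAMESPACE="${NAMESPACE:-default}"
WORKLOADS="${WORKLOADS:-app-one app-two}"   # placeholder Deployment names

scale_all() {
  replicas="$1"
  for w in $WORKLOADS; do
    $KUBECTL -n "$NAMESPACE" scale deployment "$w" --replicas="$replicas"
  done
}

scale_all 0   # before rebooting TrueNAS
# ... reboot TrueNAS and wait for the iSCSI target to come back ...
scale_all 1   # restore the single-replica workloads
```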
I can see how that would help. However, I usually forget to scale them down and back up. It would be great if Kubernetes could automatically detect when iSCSI volumes become unavailable and restart the entire pod instead of just the containers.

I'm particularly interested in how to ensure automatic pod restarts. My understanding is that liveness and startup probes only restart the failing container; the pod and its volume mounts stay in place, so a stale iSCSI mount survives the restart. I've also tried running an additional pod that checks the main pods, but even that approach didn't resolve the stale connection issue.
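The pod-checker approach can work if the checker deletes the stale pod outright, so the Deployment recreates it with a fresh iSCSI login, rather than merely probing it. A sketch under those assumptions, where KUBECTL again defaults to echo for a dry run and /config is a placeholder mount path:

```shell
# Dry-run by default: set KUBECTL=kubectl (with RBAC allowing pods/exec
# and pod deletion) to act for real.
KUBECTL="${KUBECTL:-echo kubectl}"
NAMESPACE="${NAMESPACE:-default}"

reap_if_stale() {
  pod="$1"
  # Run a bounded write test inside the pod; a hang or EIO means the
  # iSCSI-backed mount is stale.
  if ! $KUBECTL -n "$NAMESPACE" exec "$pod" -- \
      sh -c 'timeout 5 touch /config/.probe && rm -f /config/.probe'; then
    # Deleting the pod (not just restarting the container) makes the
    # controller recreate it, which sets up the mount from scratch.
    $KUBECTL -n "$NAMESPACE" delete pod "$pod"
  fi
}

reap_if_stale app-one-7c9d   # placeholder pod name
```

Run from a cron job on another machine, or as a CronJob in the cluster, this gives you the automatic "delete and recreate" behavior you're describing.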