I have a home Kubernetes cluster running on Talos Linux, where I'm using iSCSI targets from my TrueNAS server to serve volumes for applications that rely on SQLite databases. Currently, I'm not using CSI drivers and have manually defined Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) for my workloads.
Sometimes I need to restart my TrueNAS server for maintenance, which makes the iSCSI target temporarily unavailable for about 5 to 30 minutes. I have both liveness and readiness probes in place, so the pods fail and Kubernetes restarts them. However, when the iSCSI server comes back online, the restarted pods still hit I/O errors saying they can't write to the config folder mounted from the iSCSI target. Only when I manually delete the pod and let Kubernetes create a new one does everything work normally again.
It seems like the old iSCSI connection is being reused, causing these issues even after the target is back up. How are others managing iSCSI target disconnects for extended periods?
3 Answers
Unfortunately, once the underlying storage has been offline long enough, the volume mounts go stale and can't recover on their own: a container restart inside the same pod reuses the mount backed by the dead iSCSI session, so only recreating the pod forces a fresh attach and login. I'd improve the maintenance strategy instead: either scale the workloads down before taking the storage offline, or have the pods detect the stale mount and fail their liveness check so Kubernetes replaces them.
One way to handle this is by scaling down your workloads in Kubernetes before doing any maintenance on the TrueNAS server. If it's planned downtime, it shouldn't be a problem to do that beforehand.
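A minimal sketch of that scale-down/scale-up step. The namespace `default` and deployment name `myapp` are placeholders, and `KUBECTL` defaults to `echo kubectl` so the script only prints the commands as a dry run; set `KUBECTL=kubectl` to actually apply them:

```shell
#!/bin/sh
# Dry-run sketch: scale a deployment to zero before TrueNAS maintenance,
# then back up once the iSCSI target is reachable again.
# KUBECTL defaults to "echo kubectl" so nothing is changed until you
# explicitly set KUBECTL=kubectl.
KUBECTL="${KUBECTL:-echo kubectl}"

scale() {
  # $1 = desired replica count for the (placeholder) deployment "myapp"
  $KUBECTL -n default scale deployment myapp --replicas="$1"
}

scale 0   # before taking the iSCSI target offline
# ... perform TrueNAS maintenance, wait for the target to come back ...
scale 1   # restore the workload once storage is back
```

Scaling to zero before the outage means no pod ever holds a mount over a dead session, so nothing goes stale in the first place.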
I've implemented a liveness script that checks if the mounted volumes are stale. If they are, the pod terminates and tries to re-establish the connection, which usually works when the external storage is back online.
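A sketch of what such a liveness script can look like: attempt a small bounded write on the mount, and exit non-zero if it hangs or errors so the kubelet kills the pod. `MOUNT_PATH` defaults to `/tmp` here purely as a stand-in; point it at the volume's actual mountPath (e.g. `/config`), and wire the script up as an `exec` livenessProbe:

```shell
#!/bin/sh
# Liveness sketch for a possibly-stale iSCSI mount.
# MOUNT_PATH is a placeholder; /tmp is used only so the sketch runs
# anywhere -- in a real pod this would be the volume's mountPath.
MOUNT_PATH="${MOUNT_PATH:-/tmp}"
PROBE_FILE="$MOUNT_PATH/.liveness-probe"

check_mount() {
  # A stale iSCSI mount typically hangs or returns EIO on writes,
  # so bound the write attempt with a timeout.
  if timeout 5 sh -c ": > '$1'" 2>/dev/null; then
    rm -f "$1"
    return 0
  fi
  return 1
}

if check_mount "$PROBE_FILE"; then
  echo "mount healthy"
else
  echo "mount stale" >&2
  exit 1
fi
```

The `timeout` matters: a plain write to a stale mount can block indefinitely, which would leave the probe hanging instead of failing.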
