How to Handle iSCSI Target Disconnects in Kubernetes?

Asked By TechieExplorer99 On

I have a home Kubernetes cluster running on Talos Linux, where some applications rely on SQLite databases stored on an iSCSI target from my TrueNAS server. I'm manually defining PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) for these workloads rather than using a CSI driver. Every now and then, I need to restart my TrueNAS server for maintenance or updates, which takes the iSCSI target offline for about 5 to 30 minutes.

During this downtime, my liveness and readiness probes trigger pod failures, leading Kubernetes to attempt restarts. However, when the TrueNAS server is back online, the pods still encounter input/output errors, claiming they can't write to the config folder attached to the iSCSI target. I find that manually deleting the problematic pod allows Kubernetes to create a new one that works fine.

I'm curious how others are managing disconnects with iSCSI targets, especially when they stay offline for extended periods. Any tips on auto-restarting pods or preventing these stale connections?

3 Answers

Answered By StorageGuru101 On

Unfortunately, there's not much that can be done once the underlying infrastructure is lost for an extended period; the volume mount usually becomes stale and irrecoverable. You might need to adjust your maintenance practices to scale down instances before taking storage offline or ensure that your pods can restart automatically after such events.

CuriousCoder42 -

I'm particularly interested in how to ensure automatic pod restarts. My understanding is that liveness and startup probes only cause container restarts, not the entire pod. I've also tried having an additional pod that checks the main pods, but even that approach didn't resolve the stale connection issue.

Answered By VolumeViking88 On

I use a custom liveness script that checks the status of the mounted volumes. When it detects a stale volume, the pod is terminated, allowing Kubernetes to attempt to re-establish the connection, which usually succeeds when the external storage returns.
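A minimal sketch of what such a check could look like, assuming the volume is mounted at `/config` and that Python is available in the container image. The path and the function names are illustrative and not taken from my actual script. It tries to create, fsync, and remove a small file on the mount, and reports failure on any OS-level error:

```python
import os
import sys
import tempfile


def volume_is_writable(mount_path):
    """Return True if a small file can be created and synced under mount_path."""
    try:
        fd, probe_path = tempfile.mkstemp(prefix=".probe-", dir=mount_path)
    except OSError:
        # Missing path, read-only remount, or an I/O error from a stale mount.
        return False
    try:
        os.write(fd, b"ok")
        os.fsync(fd)  # push the write through to the backing block device
        return True
    except OSError:
        return False
    finally:
        # Best-effort cleanup; errors here should not mask the probe result.
        for cleanup in (lambda: os.close(fd), lambda: os.unlink(probe_path)):
            try:
                cleanup()
            except OSError:
                pass


def main(argv):
    """Exit-code style entry point: 0 means healthy, 1 means the mount failed."""
    mount_path = argv[1] if len(argv) > 1 else "/config"  # assumed default path
    return 0 if volume_is_writable(mount_path) else 1
```

Used as an exec probe, the script would end with `sys.exit(main(sys.argv))`. A stale iSCSI mount can also hang instead of returning an error, so the probe's `timeoutSeconds` still matters.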

StorageGuru101 -

The downside with a liveness script is that it typically only restarts the container and not the whole pod. Even if the iSCSI server comes back online, the pod often retains a stale connection, leading to repeated container restarts without real recovery. Do you not experience that issue with your setup?

Answered By CuriousCoder42 On

Have you considered scaling down your workloads in Kubernetes before you restart the TrueNAS server? That way, you can minimize issues during planned maintenance, and it might help avoid some of the complications with stale connections afterward.
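A small wrapper around `kubectl scale` is one way to turn the scale-down step into a single command. This is only a sketch: the namespace and Deployment names are placeholders, and it assumes the workloads are Deployments and that `kubectl` is on the PATH with access to the cluster.

```python
import subprocess

# Hypothetical (namespace, deployment) pairs that mount the iSCSI volumes.
ISCSI_WORKLOADS = [("media", "app-one"), ("home", "app-two")]


def scale_command(namespace, deployment, replicas):
    """Build the kubectl invocation that sets a Deployment's replica count."""
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        "--namespace", namespace,
        f"--replicas={replicas}",
    ]


def scale_all(workloads, replicas, run=subprocess.run):
    """Scale every listed Deployment; `run` is injectable so it can be dry-run."""
    for namespace, deployment in workloads:
        run(scale_command(namespace, deployment, replicas), check=True)
```

Calling `scale_all(ISCSI_WORKLOADS, 0)` before rebooting TrueNAS and `scale_all(ISCSI_WORKLOADS, 1)` once the target is back would cover the planned-maintenance case.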

DatabaseDabbler23 -

I can see how that would help. However, I usually forget to scale them down and back up. It would be great if Kubernetes could automatically detect when iSCSI volumes become unavailable and restart the entire pod instead of just the containers.
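One way to approximate this is a small watchdog that automates the manual fix from the question, deleting the affected pods once the target is reachable again. A rough sketch, assuming the iSCSI portal listens on the default TCP port 3260, that the affected pods share a label, and that the script runs somewhere with `kubectl` access. The host, namespace, and label selector are placeholders:

```python
import socket
import subprocess
import time


def portal_reachable(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to the iSCSI portal can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def delete_pods_command(namespace, selector):
    """Build the kubectl invocation that deletes pods matching a label selector."""
    return ["kubectl", "delete", "pod", "--namespace", namespace, "--selector", selector]


def step(was_up, is_up, on_recovery):
    """Fire on_recovery only on a down-to-up transition; return the new state."""
    if is_up and not was_up:
        on_recovery()
    return is_up


def watch(host, namespace, selector, interval=30.0):
    """Poll the portal forever and delete the labelled pods after each outage."""
    was_up = True  # assume healthy at start so startup does not delete pods
    while True:
        was_up = step(
            was_up,
            portal_reachable(host),
            lambda: subprocess.run(delete_pods_command(namespace, selector), check=False),
        )
        time.sleep(interval)
```

A reachable portal does not guarantee the node has already logged back in to the target, so a short delay before deleting the pods may be needed in practice.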
