How to Handle iSCSI Target Disconnects in a Kubernetes Cluster?

Asked By TechWhiz123 On

I'm running a home Kubernetes cluster on Talos Linux where some applications depend on SQLite databases stored on an iSCSI target from my TrueNAS server. I manually configure Persistent Volumes (PV) and Persistent Volume Claims (PVC) for these workloads and don't use CSI drivers. Occasionally, I need to restart my TrueNAS server for maintenance, causing the iSCSI target to be unavailable for about 5 to 30 minutes.

During this downtime, my pods fail their liveness/readiness probes, and while Kubernetes attempts to restart them once the iSCSI server is back online, I still encounter I/O errors. It seems Kubernetes reuses the old iSCSI connection, leading to failures. The only way to resolve this issue is to delete the pod manually, which then allows everything to function normally again.

How do you all manage iSCSI target disconnects that last for a significant period?

3 Answers

Answered By DataDevil99 On

Unfortunately, there’s no perfect solution to this problem. Once the storage backend has been unavailable for more than a brief period, the volume mount inside the pod goes stale and doesn't recover on its own. Either scale your workloads down before taking the storage offline, or make sure your pods get restarted automatically once the target is reachable again.

Answered By CloudSeeker77 On

Have you considered just scaling down your workloads in Kubernetes before you do maintenance on the TrueNAS server? If it's planned maintenance, that might prevent some of the issues you're facing.
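To make the scale-down step repeatable, something like the following could wrap `kubectl scale`. This is only a rough sketch: the namespace `apps` and the workload name `sqlite-app` are made-up placeholders, it assumes `kubectl` is on your PATH and already pointed at the cluster, and you have to remember the original replica count yourself when scaling back up.

```python
import subprocess
from typing import List


def scale_command(namespace: str, name: str, replicas: int,
                  kind: str = "deployment") -> List[str]:
    """Build the kubectl command that sets the replica count of one workload."""
    return [
        "kubectl", "scale", f"{kind}/{name}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


def scale(namespace: str, name: str, replicas: int,
          kind: str = "deployment", dry_run: bool = False) -> List[str]:
    """Run the scale command, or only return it when dry_run is True."""
    cmd = scale_command(namespace, name, replicas, kind)
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits non-zero
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical usage around a TrueNAS maintenance window:
#   scale("apps", "sqlite-app", 0)   # before taking the iSCSI target offline
#   scale("apps", "sqlite-app", 1)   # after the target is reachable again
```

With `dry_run=True` the function only returns the command list, which is handy for checking what would be run before touching the cluster.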

Answered By StorageGuru45 On

I use a liveness script that checks whether the mounted volumes have gone stale. If the script fails, the pod gets terminated, which hopefully helps re-establish the connection when the external storage returns.
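For illustration, a stale-volume check along these lines might look like the sketch below. This is not the exact script described above, just one possible approach: it assumes the container image ships a Python interpreter, that the volume is mounted at `/data` (a placeholder path), and that a stale mount shows up as an I/O error on a small write plus `fsync`. If the I/O hangs instead of failing, the probe's own timeout would have to catch it.

```python
import os
import tempfile


def volume_is_healthy(mount_path: str) -> bool:
    """Return True if a small write and fsync succeed under mount_path."""
    try:
        # Create a throwaway file on the volume (assumed to be writable)
        fd, probe_path = tempfile.mkstemp(prefix=".liveness-", dir=mount_path)
        try:
            os.write(fd, b"ok")
            # fsync forces the write down to the block device
            os.fsync(fd)
        finally:
            os.close(fd)
            os.unlink(probe_path)
        return True
    except OSError:
        # I/O errors, a read-only remount, or a missing path all land here
        return False


def main(argv: list) -> int:
    """Exit code for an exec probe: 0 means healthy, 1 means stale."""
    path = argv[1] if len(argv) > 1 else "/data"  # "/data" is a placeholder
    return 0 if volume_is_healthy(path) else 1
```

To use it as an exec liveness probe, the file would end with `import sys; sys.exit(main(sys.argv))`. Keep in mind that a failed liveness probe normally restarts the container rather than recreating the pod, so given the original question, this may still need to be combined with deleting the pod.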
