I'm having a tough time troubleshooting a CIFS mount issue in a mixed Linux/Windows domain setup. Our architecture includes a Windows Server 2022 domain controller and Windows Server 2025 file servers, along with Rocky Linux 8/9 application servers. Here's the setup:
- We mount a Windows DFS share on our Rocky boxes via CIFS in the fstab file.
- Everything works fine until I reboot the primary file server (FS1). When FS1 goes down, the Rocky application server (RS1) fails its mount over to the secondary server (FS2).
- After FS1 comes back online, RS1 doesn't revert to FS1 unless I either reboot RS1 or force an unmount and remount.
I've checked the DFS Namespaces (DFSN) settings: the ordering method is set to 'Lowest Cost', the option for clients to fail back to preferred targets is enabled, and the cache timeout is set to 10 seconds. Even with this configuration, the mount doesn't switch back automatically once FS1 is available again. How can I troubleshoot this further? Failing that, is there a reliable way to determine which file server the mount is currently pointing to, so that I can address slow-performance complaints more effectively?
2 Answers
It sounds like you're running into caching issues with the DFS setup. Even Windows clients have quirks with DFS and path traversal, so this isn't unique to Linux CIFS. Unfortunately, the Linux CIFS implementation doesn't handle SYSVOL path traversal too well either, which may lead to stale mount points in your situation. A lot of users hit this when working with DFS. One workaround is a full unmount and remount of the share, but that's disruptive to anything holding files open and not ideal for your situation. You might want to set up a script that checks which server the mount is pointing to and alerts you if it's not the primary one. That way, you can act before users notice the slowdown.
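For the "which server am I on" check, here is a minimal Python sketch. It assumes the `addr=` option that the kernel prints for CIFS mounts in `/proc/mounts` reflects the target the mount is currently connected to; I'd cross-check that against `/proc/fs/cifs/DebugData` on your kernel before trusting it. The function names and the IP addresses in the usage note are hypothetical.

```python
def cifs_mount_addrs(mounts_text):
    """Map each CIFS mount point to the addr= value in its mount options.

    mounts_text is the content of /proc/mounts (or text in the same format).
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: source, mount point, fstype, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("cifs", "smb3"):
            continue
        for opt in fields[3].split(","):
            if opt.startswith("addr="):
                result[fields[1]] = opt[len("addr="):]
    return result


def mounts_off_primary(mounts_text, primary_addr):
    """Return {mount point: addr} for CIFS mounts not connected to primary_addr.

    primary_addr is the IP you expect for FS1 (an assumption you supply).
    """
    return {
        mount_point: addr
        for mount_point, addr in cifs_mount_addrs(mounts_text).items()
        if addr != primary_addr
    }
```

On RS1 you would call it with something like `mounts_off_primary(open("/proc/mounts").read(), "10.0.0.11")`, where `10.0.0.11` stands in for FS1's address, and alert whenever the result is non-empty.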
Have you considered running `mount -a` from a cron job every once in a while? As far as I know it normally only mounts fstab entries that aren't mounted yet, so I'm not certain it re-evaluates an existing mount, but it's cheap to try and might help switch back to FS1 once it's available again, assuming the DFS settings cooperate.
I was under the impression that `mount -a` only applies to unmounted filesystems. But if it's able to check the cost settings and point to the right server, that could be worth trying!

Glad to hear it's only the DFS Namespace causing trouble! Another workaround could be to implement scheduled checks: something that detects when you're connected to FS2 and triggers an unmount/remount. It might save you some headaches.
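A rough sketch of that scheduled check in Python, under a few assumptions: you already know the address the mount is currently using (for example from the `addr=` option in `/proc/mounts`), the share has an fstab entry so `mount <mountpoint>` is enough to remount it, and the job runs as root. All function names here are hypothetical. It only plans a remount when FS1 actually answers on TCP 445, and it defaults to a dry run.

```python
import socket
import subprocess


def is_reachable(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (SMB uses 445)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_failback(current_addr, primary_addr, primary_up, mountpoint):
    """Return the commands for a fail-back, or [] if nothing should be done.

    Nothing is planned when the mount is already on the primary, or when
    the primary is not reachable (remounting would land on FS2 again).
    """
    if current_addr == primary_addr or not primary_up:
        return []
    # 'mount <mountpoint>' relies on the fstab entry for source and options.
    return [["umount", mountpoint], ["mount", mountpoint]]


def run_plan(plan, dry_run=True):
    """Print the planned commands, or execute them when dry_run is False."""
    for argv in plan:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            # check=True stops at the first failure, e.g. umount on a busy mount.
            subprocess.run(argv, check=True)
```

From cron you would wire these together roughly as `run_plan(plan_failback(current, fs1, is_reachable(fs1), "/mnt/dfs"), dry_run=False)`. Keep in mind that `umount` fails while files are open on the share, so the remount may need a quiet window.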