I'm experiencing an intermittent "No space left on device" error on a large-scale Dell storage system during a data-gathering project. A multi-core Linux server mounts the storage over NFS, and the filesystem serves files for thousands of clients, each with roughly 800 to 1000 files. When I tar the files for clients that meet certain criteria, the process sometimes fails with that no-space error even though total storage appears sufficient. Because it only happens intermittently, it's frustrating to diagnose. In fact, when the error occurs, the system reports no free space available while nearly all inodes show as unused. I've consulted our storage engineers, but no clear cause has been identified. Has anyone experienced and resolved something similar?
5 Answers
I'd recommend putting monitoring on the storage server to examine the filesystem. Processes sometimes hold large files open, and even after those files are deleted, the space still counts as used until the last filehandle is closed. Also check how your filesystem manages inodes: ext4 allocates a fixed inode table at mkfs time and can run out of inodes, while XFS and btrfs allocate them dynamically.
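The open-but-deleted case is easy to reproduce. A minimal sketch (Linux-specific, it relies on `/proc`) showing that an unlinked file's space is only released when the last descriptor closes:

```shell
# Hold a file open, delete it, and show the inode is still referenced.
tmp=$(mktemp)
exec 3>"$tmp"                      # open fd 3 for writing
head -c 1048576 /dev/zero >&3      # write 1 MiB through the descriptor
rm "$tmp"                          # unlink: the name is gone, the space is not
link=$(readlink "/proc/$$/fd/3")   # Linux marks the target "(deleted)"
echo "$link"
exec 3>&-                          # only now is the space actually freed
```

In practice, `lsof +L1` lists such deleted-but-open files system-wide, which is usually the quickest way to find the process responsible.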
Keep an eye on that; it could definitely shed some light on the issue.
An errno 28 (ENOSPC, "No space left on device") can be misleading: it does not always indicate a genuine shortage of data blocks. Try running `watch "df -ih && lsblk"` while your tar job is running; it can catch the problem as it develops. It's also worth leaning on your storage engineers for support, since they should be able to help you diagnose this.
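If watching manually is impractical, you can script the check. A small sketch that flags mounts past an inode-usage threshold, assuming POSIX `df -iP` output; `inode_alert` is a hypothetical helper name, not a standard tool:

```shell
# Flag filesystems whose inode usage meets or exceeds a threshold.
# Usage: inode_alert THRESHOLD   (reads `df -iP`-style lines on stdin)
inode_alert() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                  # strip the % from the IUse% column
        if (use + 0 >= limit) print $6 " at " use "%"
    }'
}

# Example: warn on any filesystem at or past 90% inode usage
df -iP | inode_alert 90
```

Dropped into cron, this would tell you whether inode exhaustion lines up with the times your tar jobs fail.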
Thanks! I've updated my post with more details. At the time of the exception, it showed no used inodes, but the directory itself was reported as full.
Monitoring is key. That could help you replicate the issue in a controlled environment.
Is your data backed up? It might be helpful to compress and archive old data, then delete it from the live system. As a side note, it's usually the responsibility of the teams who own the data to handle cleanup, not just sysadmins.
My boss has us help out researchers, so we're occasionally tasked with unusual jobs like this.
It sounds like you're running into one of two common problems. First, make sure the filesystem where you're writing the tar files has enough free space for the entire archive. Second, you may be running out of inodes, which produces the same error even when there's still disk space available. Running `df -hi` shows the inode status. Keep an eye on both!
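For the first problem, you can check the space up front before each archive rather than letting tar fail partway through. A sketch under the assumption that the uncompressed source size is a safe upper bound; `preflight_tar` is a hypothetical helper:

```shell
# Pre-flight check: compare source size against free space on the destination
# before starting tar. Usage: preflight_tar SRC_DIR DEST_DIR
preflight_tar() {
    need_kb=$(du -sk "$1" | awk '{print $1}')        # worst case: no compression
    free_kb=$(df -kP "$2" | awk 'NR==2 {print $4}')  # available KB on destination
    if [ "$need_kb" -ge "$free_kb" ]; then
        echo "skip: need ${need_kb}K, only ${free_kb}K free on $2" >&2
        return 1
    fi
    tar -czf "$2/$(basename "$1").tar.gz" -C "$(dirname "$1")" "$(basename "$1")"
}
```

This won't catch inode exhaustion (a `df -iP` check could be added the same way), but it rules out the plain out-of-space case.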
Bingo!
You're right; I did add some extra info to my post. The destination directory stats showed full while inodes were empty. How can that happen?
Do you understand what inodes are? It's possible to run out of inodes while still having free disk space. The approach to managing this issue depends on the filesystem you're using.
Yes, I know about inodes. I mentioned that at the time of the exception, no inodes were in use. It's strange that the destination directory appeared full regardless.
Definitely worth considering that if multiple processes are holding large files open, it may affect the available space.

Thanks! That makes sense. This process is using as much parallelism as I can fit, which could be contributing.
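One way to bound how many archives are open mid-write at once is to cap that parallelism explicitly. A sketch using `xargs -P`; the `archive_all` helper and the directory layout are hypothetical, not the poster's actual script:

```shell
# Archive each client directory with at most JOBS tar processes at a time.
# Usage: archive_all SRC_ROOT DEST_DIR JOBS
archive_all() {
    src_root=$1; dest=$2; jobs=$3
    # find emits one client directory per entry; xargs -P caps concurrency
    find "$src_root" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$jobs" -I{} sh -c '
            name=$(basename "$1")
            tar -czf "$2/$name.tar.gz" -C "$(dirname "$1")" "$name"
        ' sh {} "$dest"
}
```

Lowering the job count also reduces the peak number of partially written archives consuming space on the destination at any moment, which may matter if the failures cluster around bursts of concurrent writes.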