I'm managing a classroom of 61 identical machines running RHEL 7.8. Recently I noticed that the /boot partition on one machine is 100% full according to 'df', but 'du /boot' only accounts for 51MB of usage. The partition is 1GB, and the other machines show anywhere from 11% to 80% usage even though they all carry the same files. We've checked for open files with 'lsof', ran 'fsck.xfs' without issues, and even booted from a recovery disk to confirm there's nothing hidden there before /boot is mounted. After deleting some older kernels the usage dropped to 95%, but it's creeping back up to 100% without any new files or changes. Is there some hidden metadata causing this? Any thoughts?
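For reference, this is roughly the check I run on each machine; `/dev/sda1` below is just a stand-in for whatever the real /boot device is:

```sh
# Compare what the filesystem reports (df) against what the files add up to (du).
df -h /boot            # shows 100% used on the bad machine
du -sh /boot           # the files themselves only add up to about 51M

# Deleted-but-still-open files on /boot (link count < 1); nothing shows up.
sudo lsof -a +L1 /boot
```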
6 Answers
I had a similar issue and discovered a process was writing into a directory that was later covered by an NFS mount: the hidden files kept consuming space that 'df' counted but 'du' (which only walks the visible tree) could not see. Make sure there aren't any double mounts or files buried under a mount point, as that leads to exactly this kind of confusion about space usage.
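One way to peek underneath a mount point without unmounting anything is to bind-mount the parent filesystem somewhere else and inspect the hidden path through the bind mount. A rough sketch, assuming /boot is a separate partition mounted on the root filesystem (paths are placeholders):

```sh
# The bind mount exposes the root filesystem's directory tree as it exists on
# disk, ignoring whatever is mounted on top of /boot.
mkdir -p /mnt/rootpeek
mount --bind / /mnt/rootpeek

# Anything hiding underneath the /boot mount point shows up here.
ls -la /mnt/rootpeek/boot
du -sh /mnt/rootpeek/boot

umount /mnt/rootpeek
rmdir /mnt/rootpeek
```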
First things first: why are you using XFS? Just kidding, but really, switching to a more user-friendly filesystem might help prevent these issues in the future!
This commonly happens when a file is deleted but a process still holds an open handle to it: 'df' keeps counting that space as used, while 'du' can't see the file any more. The space is normally released on reboot, unless something in your systemd units or init scripts immediately re-creates the situation. Check for open handles with `sudo lsof | grep -i deleted`.
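If lsof does turn something up, you can usually reclaim the space without a reboot by truncating the deleted file through /proc. A hedged sketch; the PID and file descriptor number are placeholders you'd read off the lsof output:

```sh
# List open files on /boot whose link count is below 1, i.e. deleted but still
# held open by some process (-a ANDs the two conditions together).
sudo lsof -a +L1 /boot

# Say lsof reports PID 1234 holding the deleted file on descriptor 7:
# truncating it through /proc frees the blocks immediately, and the process is
# left with a valid (now empty) file descriptor.
sudo truncate -s 0 /proc/1234/fd/7
```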
It sounds like you might be dealing with the internal XFS log, which 'du' doesn't count. Its size is fixed at mkfs time, so it can differ between machines if the filesystems were created with different tools or options. To investigate, run `xfs_info /boot` and check the log size. You can also get a free-space summary straight from the allocator with `xfs_db -r -c "freesp -s" /dev/<boot-partition>` and see whether it agrees with 'df'. If things still look off, you could run `xfs_repair` on the unmounted filesystem; treat `-L` as a last resort, since it zeroes the log and can throw away recent metadata updates.
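A rough sequence for lining the numbers up, with the device name as a placeholder:

```sh
# df's view of the filesystem versus what du can actually walk.
df -k /boot
du -sk /boot

# Filesystem geometry: block size, data blocks, and the internal log size.
xfs_info /boot

# Free-space histogram straight from the allocator; -r opens the device
# read-only, which is safe while the filesystem is mounted.
xfs_db -r -c "freesp -s" /dev/sda1
```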
Thanks for the tips! I ran lsattr and found nothing out of the ordinary, and xfs_info shows the same 10MB log size as the other machines. I can't try a repair while class is in session, but I'll get to it soon.
Don't forget to check inode usage with `df -i /boot`. Inode exhaustion can make a filesystem report itself as full in misleading ways, so it's worth ruling out!
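Side by side, with /boot assumed to be the filesystem in question:

```sh
# Block usage versus inode usage for the same filesystem.
df -h /boot     # space: Size / Used / Avail / Use%
df -i /boot     # inodes: Inodes / IUsed / IFree / IUse%
```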
You might have reserved blocks or hidden metadata affecting the space reporting. Check `xfs_info /boot` for the filesystem geometry and any log or realtime sections. If you're still puzzled, `xfs_db -r -c "sb 0" -c "print" /dev/sdX` gives a deeper look at the superblock counters. Also, `lsattr` might show some oddities, so keep an eye out for that.
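For the superblock counters specifically, something like this (read-only; the device name is a placeholder) shows what the filesystem itself thinks is allocated, which you can compare against df:

```sh
# Select superblock 0 and print its fields. fdblocks is the free data block
# count and dblocks the total; (dblocks - fdblocks) * blocksize should land
# close to df's "used" figure if nothing strange is going on.
xfs_db -r -c "sb 0" -c "print" /dev/sda1
```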
I did already look for that, as mentioned in the question, but lsof didn't list any deleted open handles.