I'm looking for an efficient way to compare files in two directories because I suspect I have duplicates. I know there are hundreds of files involved, and I'm aware that some files exist in one directory but not the other. What I need is a method to quickly compare the file names and sizes, and then get a list of files that have the same name but different sizes, plus any files that are in one directory but missing from the other. What tools or commands can help me with this?
5 Answers
I’ve had good success using checksums for comparison. I have a script with relevant bits which I can share later if no one else does first.
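In case it helps before I dig that script out, the core of it looks roughly like this (a minimal sketch, assuming GNU coreutils and bash process substitution; `checksum_list` and `compare_dirs` are just placeholder names):

```shell
#!/usr/bin/env bash
# Emit "md5-digest  relative-path" for every regular file under a
# directory, sorted by path so the two listings line up for diff.
checksum_list() {
  (cd "$1" && find . -type f -exec md5sum {} + | sort -k2)
}

# Diffing the listings flags files whose contents differ as well as
# files that exist on only one side.
compare_dirs() {
  diff <(checksum_list "$1") <(checksum_list "$2")
}

# Usage: compare_dirs dir1 dir2
```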
You could also try something like this: `diff <(cd dir1; du -s * | sort -n) <(cd dir2; du -s * | sort -n)`. It's a neat trick for comparing sizes directly, with two caveats: `du` reports allocated disk blocks rather than exact byte counts, and sorting by size (`sort -n`) misaligns the two listings whenever the directories differ, so sorting by name (e.g. `sort -k2`) usually produces a cleaner diff.
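If you want byte-exact sizes rather than `du`'s block counts, GNU `stat` can print them per file (a sketch under that assumption; `size_list` is a made-up helper name):

```shell
#!/usr/bin/env bash
# Print "bytes filename" for each entry in a directory, sorted by name
# so the two listings stay aligned for diff.
size_list() {
  (cd "$1" && stat -c '%s %n' -- * | sort -k2)
}

# Usage: diff <(size_list dir1) <(size_list dir2)
```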
You might want to try the command `rsync -avn dir1/ dir2/`. The `-n` (`--dry-run`) option ensures it doesn't actually copy any files, and the trailing slash on the source matters: it makes rsync compare the directories' contents rather than nesting `dir1` inside `dir2`. By default rsync compares by size and modification time; add `-c` if you want the comparison based on file contents (checksums) instead.
Another approach is a checksum pipeline like `find dir1 dir2 -type f -exec md5sum {} + | sort | uniq -Dw32`, which prints every file whose MD5 digest (the first 32 characters of each line) appears more than once. Plus, tools like `fdupes` and `jdupes` are fantastic for large directories; I prefer `jdupes` for its speed!
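Spelled out as a reusable function (a sketch assuming GNU coreutils; the key detail is telling `uniq` to compare only the 32-character digest, since the full lines contain differing paths):

```shell
#!/usr/bin/env bash
# Hash every regular file under the given directories, sort by digest,
# and print only the lines whose first 32 characters (the MD5 digest)
# repeat, i.e. the groups of duplicate files.
find_dupes() {
  find "$@" -type f -exec md5sum {} + | sort | uniq -Dw32
}

# Usage: find_dupes dir1 dir2
```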
Another great option is `diff -rq /dir1 /dir2`. It's straightforward and gives a quick recursive comparison. Depending on your needs, you might want to benchmark it against `rsync`; the two also produce different output formats.
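To get exactly the two lists the question asks for, the `diff -rq` output splits cleanly on its two line formats (a sketch; the function names are placeholders):

```shell
#!/usr/bin/env bash
# `diff -rq` prints "Files A and B differ" for same-named files whose
# contents differ, and "Only in DIR: name" for files present on only one
# side. Pinning the C locale keeps those message texts stable for grep.
changed_files()  { LC_ALL=C diff -rq "$1" "$2" | grep '^Files '; }
one_side_only()  { LC_ALL=C diff -rq "$1" "$2" | grep '^Only in '; }

# Usage: changed_files dir1 dir2
#        one_side_only dir1 dir2
```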
I had no idea `diff` could do all that! I'm so glad I found this out after all these years. It definitely saves the hassle of scripting.
Looks like I’m not the only one who didn’t know about this command—very helpful!

I love `jdupes` too! It’s saved me so much time in searching for duplicates in my large libraries.