Find missing files but with same file name

rikona rikona at sonic.net
Sat May 21 21:59:29 UTC 2016


Thursday, May 19, 2016, 8:56:55 AM, Karl wrote:

> On Wed, 2016-05-18 at 20:59 -0700, rikona wrote:
>> What I'd like to do is find archived pix that may have the SAME file
>> name as a current pix, but are DIFFERENT pix.

> The following script delivers a list of all files from two
> directories where the filenames in both directories are the same,
> but their contents differ.

Thank you!! Linux has a lot of good tools, and I was hoping a script
guru might be able to put something together to help with this
problem.

> Call the script "cmpdir", and pass the two directory names on the
> command line. It would be a good idea to pass fully qualified directory
> names. Also a good idea to capture the output for later perusal.

>    cmpdir /home/me/archive1 /home/me/archive2

> Some limitations:

> - only two directories. The outer DO loop can be as many directories as
> you like though. The algorithm doesn't care, just adjust as needed.

> - not recursive. You will have to run it on every pair of directories
> you want to compare.

This is a problem - each of the many archives has a lot of
subdirectories for the well-organized photos.
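For what it's worth, one rough way to make the same idea recursive is to let find walk the whole tree and key on basename plus hash, so identical name+content pairs collapse away and same-name/different-content files survive. An untested sketch (the temp directories and file names below are made up for illustration, and it still assumes no spaces in paths, like the original):

```shell
#!/bin/sh
# Recursive sketch: emit "basename hash" for every regular file under
# two trees, sort, then let uniq -u drop pairs that occur twice
# (same name AND same content). What remains is same-name files whose
# contents differ. Note this loses the full path; it only reports the
# names. Assumes no spaces or newlines in paths.
set -e

A=$(mktemp -d); B=$(mktemp -d)
mkdir -p "$A/sub" "$B/sub"
echo same     > "$A/sub/dup.jpg"
echo same     > "$B/sub/dup.jpg"       # identical copy: collapses away
echo original > "$A/sub/changed.jpg"
echo edited   > "$B/sub/changed.jpg"   # same name, new content: kept

OUT=$(
  find "$A" "$B" -type f -exec md5sum {} + |
    awk '{ p = $2; sub(".*/", "", p); print p, $1 }' |
    sort |
    uniq -u
)
echo "$OUT"

rm -rf "$A" "$B"
```

Here the two changed.jpg lines survive (one per tree, different hashes) while the dup.jpg pair disappears.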

> - doesn't deal with spaces in filenames or directory names.

Probably okay for the photos, but all the subdirectories have spaces.
Any easy way to fix that?
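The usual fix is to quote every variable expansion ("$d", "$i", "$LIST") and to avoid word-splitting file lists entirely. A small untested sketch of a spaces-safe hashing pass (the temp directories and "my pic.jpg" name are invented for the demo):

```shell
#!/bin/sh
# Spaces-safe replacement for the unquoted "for i in *" loop.
# find -print0 emits NUL-terminated paths and xargs -0 consumes them,
# so spaces in file or directory names never split an argument.
# md5sum then prints "hash  path" for each file.
set -e

D1=$(mktemp -d)
D2=$(mktemp -d)

printf 'one\n' > "$D1/my pic.jpg"
printf 'two\n' > "$D2/my pic.jpg"

OUT=$(find "$D1" "$D2" -type f -print0 | xargs -0 md5sum)
echo "$OUT"

rm -rf "$D1" "$D2"
```

In the original script, quoting gets you most of the way: cd "$d", [ -f "$i" ], H=`md5sum "$i"`.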

> - not great code

> - not well tested

Understood. As long as it can't destroy anything, I'll be quite careful. :-))

> - not optimised. It reads the entire content of every file. It doesn't
> check modified dates, file sizes or try any other cheap ways to see if
> two files differ.

> - it compares ALL files in the directories, it doesn't limit itself to
> image formats.

>> And, I'd also like to see ones with different names that are in the
>> archives but not in the current pix file trees.

> Left as an exercise for the reader :-)

Looks like I'll be learning a bit of python... :-) [I probably need to
do that anyway...]
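Before reaching for python, the "in the archive but not in the current tree" half can also be done with the same stock tools: sort a name list from each tree and let comm print what is unique to the first. A rough sketch (directory names and files are placeholders):

```shell
#!/bin/sh
# Sketch: basenames present under an "archive" tree but absent from a
# "current" tree. comm -23 prints lines unique to the first sorted
# list. Assumes no spaces/newlines in names, as in the original.
set -e

ARCHIVE=$(mktemp -d); CURRENT=$(mktemp -d)
touch "$ARCHIVE/old.jpg" "$ARCHIVE/kept.jpg" "$CURRENT/kept.jpg"

L1=$(mktemp); L2=$(mktemp)
find "$ARCHIVE" -type f -exec basename {} \; | sort > "$L1"
find "$CURRENT" -type f -exec basename {} \; | sort > "$L2"

OUT=$(comm -23 "$L1" "$L2")
echo "$OUT"

rm -rf "$ARCHIVE" "$CURRENT" "$L1" "$L2"
```

Here only old.jpg is reported, since kept.jpg exists (by name) in both trees.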

> Regards, K.

> #!/bin/sh

> D1=$1
> D2=$2
> CD=`pwd`
> LIST="/tmp/list.$$"

> # Dump list of files plus hashes
> for d in $D1 $D2 ; do
> {
>    cd $d
>    for i in * ; do
>    {
>       if [ -f $i ] ; then
>          H=`md5sum $i`
>          echo "$d/$i $H" >> $LIST
>       fi
>    }
>    done
>    cd $CD
> }
> done

> # Sort, skipping the path
> sort -k 2,3 $LIST >> $LIST.sorted

> # Use uniq to discard identical lines
> # Skip the path when comparing
> uniq -f 1 -u $LIST.sorted > $LIST.unique

> # Sort again, this time only on name
> sort -k 3 $LIST.unique | cut -d\  -f1

> rm $LIST.sorted
> rm $LIST.unique
> rm $LIST

Doesn't this remove the file with the unique names?

I find this an impressive example of how one can use a group of Linux
tools to achieve a desired result. Thank you!

-- 

 rikona        

More information about the ubuntu-users mailing list