Find missing files but with same file name

Karl Auer kauer at biplane.com.au
Thu May 19 15:56:55 UTC 2016


On Wed, 2016-05-18 at 20:59 -0700, rikona wrote:
> What I'd like to do is find archived pix that may have the SAME file
> name as a current pix, but are DIFFERENT pix.

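The following script lists every file in the two directories whose
combination of name and contents is not matched in the other directory.
That means files with the same name in both directories but different
contents, plus any file whose name occurs in only one of them.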

Call the script "cmpdir", and pass the two directory names on the
command line. It would be a good idea to pass fully qualified directory
names, since the paths in the output are built from them. It is also a
good idea to capture the output for later perusal.

   cmpdir /home/me/archive1 /home/me/archive2

Some limitations:

- only two directories. The outer "for" loop can take as many
directories as you like, though. The algorithm doesn't care, so just
adjust as needed.

- not recursive. You will have to run it on every pair of directories
you want to compare.

- doesn't deal with spaces in filenames or directory names.

- not great code

- not well tested

- not optimised. It reads the entire content of every file. It doesn't
check modified dates, file sizes or try any other cheap ways to see if
two files differ.

- it compares ALL files in the directories; it doesn't limit itself to
image formats.
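
A couple of these limitations (spaces in names, reading every byte of
every file) can be sidestepped with a different technique: compare each
same-named pair directly with cmp instead of hashing. This is only a
sketch, not the cmpdir script; the function name same_name_diff is made
up here. Unlike cmpdir it reports only names present in both
directories, and it prints the path from the first directory only.

```shell
#!/bin/sh
# Hypothetical helper, not part of cmpdir: print files in directory $1
# whose name also exists in directory $2 but whose contents differ.
same_name_diff() {
   for f in "$1"/* ; do
      [ -f "$f" ] || continue        # regular files only
      other="$2/${f##*/}"            # same file name in the second directory
      [ -f "$other" ] || continue    # skip names missing from $2
      # cmp -s prints nothing and exits non-zero when the files differ;
      # it stops reading at the first differing byte.
      cmp -s "$f" "$other" || printf '%s\n' "$f"
   done
}
```

Called as "same_name_diff /home/me/archive1 /home/me/archive2". Like
cmpdir, it is not recursive and it skips dot files.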

> And, I'd also like to see ones with different names that are in the
> archives but not in the current pix file trees.

Left as an exercise for the reader :-)
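
A possible starting point for that exercise, for one pair of
directories only. It compares names, not contents, and does not walk a
tree. The function name only_in_first is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print the names of regular files that are in
# directory $1 but have no entry of the same name in directory $2.
only_in_first() {
   for f in "$1"/* ; do
      [ -f "$f" ] || continue        # regular files only
      name=${f##*/}                  # strip the directory part
      [ -e "$2/$name" ] || printf '%s\n' "$name"
   done
}
```

Called as "only_in_first /home/me/archive1 /home/me/current", where the
second path is just an example name for the current directory.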

Regards, K.

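#!/bin/sh
# cmpdir: list files whose name and contents are not matched
# by a file in the other directory.

if [ $# -ne 2 ] ; then
   echo "usage: cmpdir DIR1 DIR2" >&2
   exit 1
fi

D1=$1
D2=$2
CD=$(pwd)
LIST="/tmp/list.$$"

# Remove the temporary files however the script ends
trap 'rm -f "$LIST" "$LIST.sorted" "$LIST.unique"' EXIT

# Start with an empty list so sort has a file to read
: > "$LIST"

# Dump list of files plus hashes
# Each line is: <dir>/<name> <md5 hash>  <name>
for d in "$D1" "$D2" ; do
   cd "$d" || exit 1
   for i in * ; do
      if [ -f "$i" ] ; then
         H=$(md5sum "$i")
         echo "$d/$i $H" >> "$LIST"
      fi
   done
   cd "$CD" || exit 1
done

# Sort on hash and name, skipping the path
sort -k 2,3 "$LIST" > "$LIST.sorted"

# Use uniq to discard identical lines
# Skip the path when comparing
uniq -f 1 -u "$LIST.sorted" > "$LIST.unique"

# Sort again, this time only on name, and print just the path
sort -k 3 "$LIST.unique" | cut -d' ' -f1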
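
To see what the sort/uniq steps do, here is the same pipeline run on
four made-up lines in the script's intermediate format (path, hash,
name). The hashes are placeholders, not real md5 sums, and the function
name demo is just for this illustration.

```shell
#!/bin/sh
# Illustration only: a.jpg has the same (placeholder) hash in both
# directories, b.jpg has a different one in each.
demo() {
   printf '%s\n' \
      'd1/a.jpg 1111  a.jpg' \
      'd2/a.jpg 1111  a.jpg' \
      'd1/b.jpg 2222  b.jpg' \
      'd2/b.jpg 3333  b.jpg' |
   sort -k 2,3 |       # group lines with equal hash and name
   uniq -f 1 -u |      # ignore the path, keep lines that occur once
   sort -k 3 |         # order what is left by name
   cut -d' ' -f1       # print only the path
}
```

Running demo prints d1/b.jpg and d2/b.jpg. The two a.jpg lines are
identical once the path is skipped, so uniq -u drops both.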


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer (kauer at biplane.com.au)
http://www.biplane.com.au/kauer
http://twitter.com/kauer389

GPG fingerprint: E00D 64ED 9C6A 8605 21E0 0ED0 EE64 2BEE CBCB C38B
Old fingerprint: 3C41 82BE A9E7 99A1 B931 5AE7 7638 0147 2C3C 2AC4






