Find missing files but with same file name
Karl Auer
kauer at biplane.com.au
Thu May 19 15:56:55 UTC 2016
On Wed, 2016-05-18 at 20:59 -0700, rikona wrote:
> What I'd like to do is find archived pix that may have the SAME file
> name as a current pix, but are DIFFERENT pix.
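The following script lists the files from two directories that have the
same filename in both directories but different contents. Note that a
file that exists in only one of the two directories is listed as well,
because its line has no twin for uniq to discard.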
Call the script "cmpdir", and pass the two directory names on the
command line. It would be a good idea to pass fully qualified directory
names. It is also a good idea to capture the output for later perusal.

    cmpdir /home/me/archive1 /home/me/archive2
Some limitations:
- only two directories. The outer for loop can take as many directories
  as you like, though; the algorithm doesn't care, so adjust as needed.
- not recursive. You will have to run it on every pair of directories
  you want to compare.
- doesn't deal with spaces in filenames or directory names.
- not great code
- not well tested
- not optimised. It reads the entire content of every file. It doesn't
  check modification dates or file sizes, or try any other cheap way to
  see whether two files differ.
- it compares ALL files in the directories; it doesn't limit itself to
  image formats.
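
Several of the limitations above (spaces in names, hashing every file in full, and files that exist on one side only) could be avoided by comparing same-named files directly with cmp rather than going through a hash list. The following is only a rough sketch under that assumption, not a tested replacement for the script; the function name differing_pairs is made up for illustration.

```sh
#!/bin/sh
# Sketch only: print the names that exist as regular files in BOTH
# directories but whose contents differ. The function name is hypothetical.
differing_pairs() {
    d1=$1
    d2=$2
    for f in "$d1"/*; do
        [ -f "$f" ] || continue          # regular files only
        name=${f##*/}                    # strip the directory part
        [ -f "$d2/$name" ] || continue   # skip names missing from the second directory
        # cmp -s prints nothing and stops at the first differing byte
        if ! cmp -s "$f" "$d2/$name"; then
            printf '%s\n' "$name"
        fi
    done
}
```

It would be called as: differing_pairs /home/me/archive1 /home/me/archive2. It is still not recursive and still looks at every file, not just images.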
> And, I'd also like to see ones with different names that are in the
> archives but not in the current pix file trees.
Left as an exercise for the reader :-)
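
For readers who want a starting point, here is one possible sketch. It assumes the question means "file names that occur somewhere under the archive tree but nowhere under the current tree", and it compares names only, not contents. The function name names_only_in_archive is made up, and file names containing newlines would confuse it.

```sh
#!/bin/sh
# Sketch only: print base names found under the archive tree but not
# under the current tree. The function name is hypothetical.
names_only_in_archive() {
    archive=$1
    current=$2
    a=$(mktemp) || return 1
    c=$(mktemp) || return 1
    # Base names of all regular files in each tree, sorted for comm
    find "$archive" -type f | sed 's|.*/||' | sort -u > "$a"
    find "$current" -type f | sed 's|.*/||' | sort -u > "$c"
    # comm -23 keeps the lines that appear only in the first file
    comm -23 "$a" "$c"
    rm -f "$a" "$c"
}
```

It would be called as: names_only_in_archive /home/me/archive /home/me/current. Matching by content instead of by name would need hashes compared in a similar way.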
Regards, K.
#!/bin/sh
# Usage: cmpdir DIR1 DIR2

D1=$1
D2=$2
CD=$(pwd)
LIST="/tmp/list.$$"

# Start with an empty list file
: > "$LIST"

# Dump list of files plus hashes, one "path hash name" line per file
for d in "$D1" "$D2" ; do
    cd "$d" || exit 1
    for i in * ; do
        if [ -f "$i" ] ; then
            H=$(md5sum "$i")
            echo "$d/$i $H" >> "$LIST"
        fi
    done
    cd "$CD" || exit 1
done

# Sort on hash and name, skipping the path
sort -k 2,3 "$LIST" > "$LIST.sorted"

# Use uniq to discard identical lines
# Skip the path when comparing
uniq -f 1 -u "$LIST.sorted" > "$LIST.unique"

# Sort again, this time only on name, and print just the path
sort -k 3 "$LIST.unique" | cut -d' ' -f1

rm "$LIST.sorted"
rm "$LIST.unique"
rm "$LIST"
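
To see what the sort and uniq steps do, the same pipeline can be run on a few made-up lines. The paths and the shortened "hashes" below are invented purely for illustration, with a single space between fields for readability.

```sh
#!/bin/sh
# Made-up "path hash name" lines in the shape the script writes to its list file
sample='/d1/a.jpg 1111 a.jpg
/d1/b.jpg 2222 b.jpg
/d2/a.jpg 1111 a.jpg
/d2/b.jpg 3333 b.jpg'

# Same steps as the script: sort on hash+name, drop lines that have a twin
# (ignoring the path), sort by name, keep only the path
result=$(printf '%s\n' "$sample" | sort -k 2,3 | uniq -f 1 -u | sort -k 3 | cut -d' ' -f1)
printf '%s\n' "$result"
```

The two a.jpg lines are identical once the path is skipped, so uniq -u drops both; the two b.jpg lines have different hashes, so both paths are printed.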
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer (kauer at biplane.com.au)
http://www.biplane.com.au/kauer
http://twitter.com/kauer389
GPG fingerprint: E00D 64ED 9C6A 8605 21E0 0ED0 EE64 2BEE CBCB C38B
Old fingerprint: 3C41 82BE A9E7 99A1 B931 5AE7 7638 0147 2C3C 2AC4