Find missing files but with same file name
rikona
rikona at sonic.net
Sat May 21 21:59:29 UTC 2016
Thursday, May 19, 2016, 8:56:55 AM, Karl wrote:
> On Wed, 2016-05-18 at 20:59 -0700, rikona wrote:
>> What I'd like to do is find archived pix that may have the SAME file
>> name as a current pix, but are DIFFERENT pix.
> The following script delivers a list of all files from two
> directories where the filenames in both directories are the same,
> but their contents differ.
Thank you!! Linux has a lot of good tools, and I was hoping a script
guru might be able to put something together to help with this
problem.
> Call the script "cmpdir", and pass the two directory names on the
> command line. It would be a good idea to pass fully qualified directory
> names. Also a good idea to capture the output for later perusal.
> cmpdir /home/me/archive1 /home/me/archive2
> Some limitations:
> - only two directories. The outer DO loop can be as many directories as
> you like though. The algorithm doesn't care, just adjust as needed.
> - not recursive. You will have to run it on every pair of directories
> you want to compare.
This is a problem - each of the many archives has a lot of
subdirectories for the well-organized photos.
> - doesn't deal with spaces in filenames or directory names.
Probably okay for the photos, but all the subdirectories have spaces.
Any easy way to fix that?
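One way to sidestep the word-splitting problem (a sketch, not from the original script - the directory and file names here are made-up test data) is to let find hand the names to md5sum directly with `-exec ... {} +`, so the shell never splits them:

```shell
# Space-safe hashing pass: find passes each filename to md5sum as a
# single argument, so names containing spaces survive intact.
# -maxdepth 1 mirrors the original script's non-recursive behaviour.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir one" "$tmp/dir two"
printf 'aaa\n' > "$tmp/dir one/my photo.jpg"
printf 'bbb\n' > "$tmp/dir two/my photo.jpg"
LIST="$tmp/list"
for d in "$tmp/dir one" "$tmp/dir two"; do
    find "$d" -maxdepth 1 -type f -exec md5sum {} + >> "$LIST"
done
# Count list entries and distinct hashes to show both files were seen.
entries=$(wc -l < "$LIST")
hashes=$(cut -d' ' -f1 "$LIST" | sort -u | wc -l)
rm -rf "$tmp"
echo "entries=$entries distinct_hashes=$hashes"
```

Dropping `-maxdepth 1` would also make the scan recursive, which addresses the subdirectory limitation in the same stroke.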
> - not great code
> - not well tested
Understood. As long as it can't destroy anything, I'll be quite careful. :-))
> - not optimised. It reads the entire content of every file. It doesn't
> check modified dates, file sizes or try any other cheap ways to see if
> two files differ.
> - it compares ALL files in the directories, it doesn't limit itself to
> image formats.
>> And, I'd also like to see ones with different names that are in the
>> archives but not in the current pix file trees.
> Left as an exercise for the reader :-)
Looks like I'll be learning a bit of python... :-) [I probably need to
do that anyway...]
> Regards, K.
> #!/bin/sh
> D1=$1
> D2=$2
> CD=`pwd`
> LIST="/tmp/list.$$"
> # Dump list of files plus hashes
> for d in $D1 $D2 ; do
> {
> cd $d
> for i in * ; do
> {
> if [ -f $i ] ; then
> H=`md5sum $i`
> echo "$d/$i $H" >> $LIST
> fi
> }
> done
> cd $CD
> }
> done
> # Sort, skipping the path
> sort -k 2,3 $LIST >> $LIST.sorted
> # Use uniq to discard identical lines
> # Skip the path when comparing
> uniq -f 1 -u $LIST.sorted > $LIST.unique
> # Sort again, this time only on name
> sort -k 3 $LIST.unique | cut -d' ' -f1
> rm $LIST.sorted
> rm $LIST.unique
> rm $LIST
Doesn't this remove the file with the unique names?
I find this an impressive example of how one can use a group of Linux
tools to achieve a desired result. Thank you!
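A toy run of the sort/uniq trick from the script, on made-up list lines in the same `path hash name` layout, may make it clearer: `uniq -f 1` skips the first field (the path) and compares hash+name, so a file whose name AND hash match in both directories drops out, and only the same-name-different-content pair survives.

```shell
# Demonstrate the field-skipping dedup on a hand-written list.
list=$(mktemp)
cat > "$list" <<'EOF'
dirA/a.jpg h111 a.jpg
dirB/a.jpg h111 a.jpg
dirA/b.jpg h222 b.jpg
dirB/b.jpg h333 b.jpg
EOF
# a.jpg has the same hash in both dirs -> the pair is discarded by -u;
# b.jpg differs -> both of its lines are unique and are kept.
survivors=$(sort -k 2,3 "$list" | uniq -f 1 -u | cut -d' ' -f1 | sort)
rm -f "$list"
echo "$survivors"
```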
--
rikona
More information about the ubuntu-users mailing list