Find missing files but with same file name

Joel Roth joelz at pobox.com
Thu May 19 09:48:53 UTC 2016


rikona wrote:
> I lost some photo files, possibly doing backups because they have the
> same file name as a different photo. In this case pc050049.jpg is NOT
> the same pix as pc050049.jpg, for example. Pictures that I know I had
> [I have a print :-) ] at an event are missing.
> 
> I have many archive disks, CDs and DVDs with perhaps a hundred or so
> large dir trees of archived pix.
> 
> What I'd like to do is find archived pix that may have the SAME file
> name as a current pix, but are DIFFERENT pix. Given MANY archives, is
> there an efficient way to do this in Ubuntu? About 20,000 current pix
> to process.
> 
> And, I'd also like to see ones with different names that are in the
> archives but not in the current pix file trees.

Hi Rikona,

For this type of job, I would typically write a Perl script,
but any scripting language that has hash variables
(dictionaries) will do. It is most straightforward if your
system has enough RAM to keep the entire data structure in
memory, and if you can have all the pictures mounted on your
system at once.

For each name, such as pc050049.jpg, you want to store the
path, the size, and probably an MD5 hash of the file contents,
once for every place a file of that name turns up. Here is an
approximate YAML representation:

pc050049.jpg:
 - path: /home/rikona/photos/2012-5-16
   size: 25683934
   md5: adfec80203fedcab
 - path: /mnt/backup1/dvd12/photos/2003-4-8
   size: 18403456
   md5: 8938475deadbeef8383894
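
If you go with Python rather than Perl, building that structure
could look roughly like the sketch below. The function names
(md5_of, build_index) and the 'path' key are my own choices for
illustration, and it assumes every tree you pass in is mounted
and readable. It indexes every file it finds, not just pictures.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(roots):
    """Map each file name to a list of {path, size, md5} entries.

    `roots` is a list of top-level directories: the current photo
    tree plus every mounted archive tree (hypothetical layout).
    """
    index = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                index[name].append({
                    "path": dirpath,  # directory holding the file
                    "size": os.path.getsize(full_path),
                    "md5": md5_of(full_path),
                })
    return index
```

Hashing every file on a hundred archive trees will take a while.
One possible shortcut is to compare sizes first and only hash
the files whose names collide.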

Then you iterate through the data structure, looking for
same-named files with differing content, that is, entries
under one name whose size or md5 differ.
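
As a rough Python sketch of that pass, assuming an index shaped
like the YAML above (a dict mapping each file name to a list of
entries with 'path', 'size' and 'md5' keys): the function names
and the 'path' key are made up for illustration, and the md5
strings are just the placeholder values from the example. The
second function takes a stab at your other question, archived
pictures whose content is not anywhere in the current tree.

```python
def find_name_conflicts(index):
    """Return the names that occur with more than one distinct content."""
    conflicts = {}
    for name, entries in index.items():
        versions = {(e["size"], e["md5"]) for e in entries}
        if len(versions) > 1:
            conflicts[name] = entries
    return conflicts


def archive_only(index, current_root):
    """List (name, path) for archived files whose content is not under current_root.

    Content is compared by (size, md5), so a picture that was merely
    renamed in the current tree is not reported.
    """
    current_content = set()
    for entries in index.values():
        for e in entries:
            if e["path"].startswith(current_root):
                current_content.add((e["size"], e["md5"]))

    missing = []
    for name, entries in index.items():
        for e in entries:
            in_current_tree = e["path"].startswith(current_root)
            if not in_current_tree and (e["size"], e["md5"]) not in current_content:
                missing.append((name, e["path"]))
    return missing


# Sample data copied from the YAML example; the md5 values are placeholders.
sample_index = {
    "pc050049.jpg": [
        {"path": "/home/rikona/photos/2012-5-16",
         "size": 25683934, "md5": "adfec80203fedcab"},
        {"path": "/mnt/backup1/dvd12/photos/2003-4-8",
         "size": 18403456, "md5": "8938475deadbeef8383894"},
    ],
}

for name, entries in find_name_conflicts(sample_index).items():
    print(name)
    for e in entries:
        print("   ", e["path"], e["size"], e["md5"])

for name, path in archive_only(sample_index, "/home/rikona/photos"):
    print("only in archive:", path, name)
```

Comparing on size and md5 together is a design choice on my part.
The hash alone would normally do, but the size is cheap to keep
and makes the report easier to eyeball.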

That's an analysis of the process. It's a good first project
for learning to code in a language of your choice.

Have fun,

 
> Thanks,
> 
> rikona
> 
> 

-- 
Joel Roth
  




