Weeding out Duplicate files in Hardy Heron 8.04
Loïc Grenié
loic.grenie at gmail.com
Sun Jan 17 21:55:21 UTC 2010
2010/1/17 Patton Echols <p.echols at comcast.net>:
> On 01/15/2010 06:47 AM, Loïc Grenié wrote:
>> 2010/1/15 Rafiq Hajat <ipi.malawi at gmail.com>:
>>
>>> Hi,
>>> I've been transferring all my music (MP3, OGG, FLAC, M4A, WMA) into my
>>> laptop and have noticed numerous duplications of the same song. The
>>> directory size is now 16GB and it's a nightmare to collate and
>>> categorise. Any ideas on how to get rid of the duplications?
>>
>> If the files are the same you can use md5sum
>>
>> Loic
>
> I can see how you would use md5sum to see if the files are the same, but
> how would you use it to remove duplicates?
If your files are in the directory /home/dir/mm, you can try the
following:

    find /home/dir/mm -type f -print0 | xargs -0 md5sum > /tmp/files

(it all has to be typed on a single line) and then

    sort /tmp/files | uniq -w 32 -D > /tmp/dupes

(again on a single line). /tmp/dupes now contains the list of
duplicate files, each prepended with its MD5 sum: uniq -w 32
compares only the first 32 characters of each line (the checksum)
and -D prints every line whose checksum occurs more than once.
You can look at the list using

    less /tmp/dupes

(type q to quit). If the list is not too big, you can remove all
but one file of each group of "duplicates" by hand (that way you
can also check visually that the names belong to real
duplicates). If you want something more automatic, you can try
the following:
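    # Keep the first file of each checksum group and delete the rest.
    sum=z
    while read -r md name
    do
        if [ "$md" = "$sum" ]; then
            # Same checksum as the previous line: this is a duplicate.
            rm -- "$name"
        else
            sum="$md"
        fi
    done < /tmp/dupes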
Warning: this erases files automatically, so it may be dangerous.
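One cautious way to use the loop is to do a dry run first. The
following is only a sketch: list_removals is a made-up function
name, not part of any tool, and it assumes the file it reads was
built with the sort | uniq command above, so that identical
checksums sit on consecutive lines. It prints the files the loop
would delete, without deleting anything.

```shell
# Dry run (sketch): print the files the removal loop would delete.
# list_removals is a hypothetical helper name; its argument is a
# file in the same format as /tmp/dupes.
list_removals() {
    prev=""
    # -r keeps backslashes literal; the first field is the checksum
    # and the rest of the line is the file name.
    while read -r md name
    do
        if [ "$md" = "$prev" ]; then
            # Same checksum as the previous line: would be removed.
            printf 'would remove: %s\n' "$name"
        else
            # First file of a new checksum group: would be kept.
            prev="$md"
        fi
    done < "$1"
}

# Usage: list_removals /tmp/dupes | less
```

If the output looks right, the real loop should remove exactly
those files. File names that begin with a space or contain a
newline are not handled by this sketch.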
Hope this helps,
Loïc