37.16 Duplicate Photos

20220122

Duplicate photos readily accumulate as we copy photos around our storage and attempt to manage large collections that use different file naming schemes.

Duplicate photos can readily be found using fdupes (see Section 19.5). With no options, fdupes lists the groups of duplicated files in the specified directory, with a blank line between groups:

$ fdupes .

./20180323_122434_02.jpg
./20180323_122434_01.jpg
./20180323_122434_00.jpg

./20030102_092312_03.jpg
./20031012_092312_00.jpg

./20200531_151245_01.jpg
./20200531_151245_00.jpg
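To illustrate the kind of check fdupes performs, here is a minimal sketch that groups files with identical content by their SHA-256 checksums. It is not how fdupes itself is implemented. The function name find_dupes is hypothetical, and the sketch assumes GNU find, sha256sum, sort, uniq and cut, plus filenames free of newlines and backslashes (sha256sum escapes such names, which would upset the fixed-column cut). Because find descends into subdirectories, it behaves more like fdupes with -r.

```shell
#!/bin/sh
# Hypothetical helper: print groups of files under a directory (default .)
# whose contents are identical, one group per paragraph, much like fdupes.
# Assumes GNU coreutils and filenames without newlines or backslashes.
find_dupes() {
  dir="${1:-.}"
  # Checksum every regular file; each line is "<64 hex digits>  <path>".
  find "$dir" -type f -exec sha256sum {} + |
    # Sort so that identical checksums become adjacent.
    sort |
    # Keep only lines whose first 64 characters (the checksum) repeat,
    # printing each group separated by a blank line.
    uniq -w 64 --all-repeated=separate |
    # Drop the checksum and the two spaces after it, leaving the path.
    cut -c 67-
}
```

For example, find_dupes . lists each set of identical files under the current directory as a paragraph of paths.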

With -r (--recurse), subdirectories are also searched. A summary of the duplicates is obtained with -m (--summarize):

$ fdupes --summarize .

13567 duplicate files (in 6407 sets), occupying 16996.0 megabytes
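A similar summary can be approximated without fdupes. In this sketch the helper dupe_summary is hypothetical; it assumes GNU sha256sum and stat, and it counts every copy after the first in each set as a duplicate, which is our assumption about how fdupes reckons its figures. It reports bytes rather than megabytes.

```shell
#!/bin/sh
# Hypothetical helper: summarise duplicated content under a directory,
# counting every copy after the first in each set as a duplicate.
# Assumes GNU coreutils (sha256sum, stat). Only checksums and sizes are
# written to the pipe, so unusual filenames do not disturb the counts.
dupe_summary() {
  dir="${1:-.}"
  find "$dir" -type f -exec sh -c '
    for f; do
      # Emit "<checksum> <size in bytes>" for each file.
      printf "%s %s\n" "$(sha256sum < "$f" | cut -c 1-64)" "$(stat -c %s "$f")"
    done' sh {} + |
    awk '
      { count[$1]++; size[$1] = $2 }
      END {
        for (k in count)
          if (count[k] > 1) {
            sets++
            files += count[k] - 1
            bytes += (count[k] - 1) * size[k]
          }
        printf "%d duplicate files (in %d sets), occupying %.0f bytes\n",
               files, sets, bytes
      }'
}
```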

When deleting duplicates, fdupes retains the first file listed within each group. For filenames that differ only by a numeric suffix, combining --order='name' with --reverse can be useful, so that the lowest numbered file is listed first and is the one kept. Experiment with --order to find what works best for your collection.

$ fdupes --order='name' --reverse .

./20180323_122434_00.jpg
./20180323_122434_01.jpg
./20180323_122434_02.jpg

./20031012_092312_00.jpg
./20030102_092312_03.jpg

./20200531_151245_00.jpg
./20200531_151245_01.jpg
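Since --delete with --noprompt cannot be undone, it can help to preview which file of each group would survive. The following sketch reads fdupes output, relying only on the layout shown above of paths grouped by blank lines, and marks the first path of each group as kept and the rest as deleted. The function name fdupes_preview is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read fdupes output (groups of paths separated by
# blank lines) on stdin and show which file of each group would be kept
# (the first listed) and which would be deleted (the rest).
fdupes_preview() {
  awk '
    BEGIN { first = 1 }
    # A blank line ends a group; the next path starts a new one.
    /^$/  { first = 1; next }
    first { print "keep   " $0; first = 0; next }
          { print "delete " $0 }
  '
}
```

For example: fdupes --order='name' --reverse . | fdupes_preview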

The following command deletes the duplicates without prompting, keeping the first file listed in each group:

$ fdupes --delete --noprompt --recurse --order='name' --reverse .

   [+] ./2020/20200926_063024.jpg
   [-] ./camera/20200926_063024.jpg
   [-] ./todo/20200926_063024.jpg


   [+] ./2020/20201114_061818.jpg
   [-] ./camera/20201114_061818.jpg
   [-] ./todo/20201114_061818.jpg


   [+] ./2020/20201003_051104.jpg
   [-] ./camera/20201003_051104.jpg
   [-] ./todo/20201003_051104.jpg


   [+] ./2020/20201114_062446.jpg
   [-] ./camera/20201114_062446.jpg
   [-] ./todo/20201114_062446.jpg

...
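A more cautious alternative to deleting is to move the duplicates aside and remove them once checked. This is a sketch only: the function fdupes_move_aside and its default holding directory ./duplicates are hypothetical, and paths are assumed to be relative (as fdupes prints them for .) and free of newlines. Saving the fdupes output to a file first means nothing is moved while fdupes is still scanning. A holding directory inside the scanned tree will be picked up by a later recursive run, so remove or relocate it when done.

```shell
#!/bin/sh
# Hypothetical helper: read fdupes output on stdin and, instead of
# deleting, move every file after the first in each group into a
# holding directory (default ./duplicates), preserving relative paths.
# Assumes relative paths without embedded newlines.
fdupes_move_aside() {
  dest="${1:-./duplicates}"
  first=1
  while IFS= read -r f || [ -n "$f" ]; do
    # A blank line separates groups; the next path is a keeper.
    if [ -z "$f" ]; then
      first=1
      continue
    fi
    if [ "$first" -eq 1 ]; then
      first=0          # keep the first file listed in the group
      continue
    fi
    mkdir -p "$dest/$(dirname "$f")"
    mv -- "$f" "$dest/$f"
  done
}
```

For example:

$ fdupes --recurse --order='name' --reverse . > dupes.txt
$ fdupes_move_aside < dupes.txt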


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0