# this howto was tested to work with
cat /etc/os-release | grep PRETTY
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"

why?

duplicate files are a waste of disk space.

every system experiences catastrophic failures, slowdowns and program crashes when RAM or disk space runs out 😀

BUT: under no circumstances shall a program be designed to allow accidents that delete ALL files 😀 (without safeguards like: “this will delete all files under this folder” “are you sure?” “are you really sure?”)

sometimes files are stored in a certain folder for a reason.

there are programs that allow

1. finding duplicate files

2. then deleting one copy

3. then setting a link to the still existing copy

= disk space is saved, and all files are still accessible via their folders (a short demo of how hardlinks behave follows below)
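how a hardlink behaves can be demonstrated with plain coreutils (a minimal sketch, the file names are made up):

# create a file and a hardlink to it
echo "hello" > original.txt
ln original.txt copy.txt

# both names report the same inode number and a link count of 2
stat -c '%i %h %n' original.txt copy.txt

# removing one name does not remove the data, the other name still works
rm original.txt
cat copy.txt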

hardlink

hardlink.manpage.txt

hardlink is a tool which replaces copies of a file with hardlinks, thereby saving disk space.

examples:

# start a test dry-run on the current directory
hardlink -v --dry-run .

per default it also searches all subdirectories of the given directory:

# do it for real
hardlink -v .

# check the result: running the dry-run again finds nothing left to link, nice :D it works! :D
hardlink -v --dry-run .
Mode: dry-run
Files: 14
Linked: 0 files
Compared: 0 xattrs
Compared: 0 files
Saved: 0 bytes
Duration: 0.00 seconds
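if only larger files are of interest: the util-linux version of hardlink that ships with bullseye also has size and exclude filters (a sketch, assuming its -s/--minimum-size and -x/--exclude options):

# only consider files of 1 MiB and larger, skip anything under .git/
hardlink -v --dry-run -s 1M -x '\.git/' .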

jdupes

jdupes.manpage.txt

website: https://github.com/jbruchon/jdupes

WARNING: jdupes IS NOT a drop-in compatible replacement for fdupes!

identify and delete or link duplicate files

examples:

jdupes -m .
Scanning: 7 files, 1 items (in 1 specified)
6 duplicate files (in 1 sets), occupying 6 MB

     -L --linkhard
              replace all duplicate files with hardlinks to the first file in each set of duplicates
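combined with recursion this turns the summary from above into actual space savings; a sketch based on the flags quoted above (-r is jdupes’ recurse flag):

# first inspect: the default action only prints the duplicate sets
jdupes -r .

# then replace duplicates with hardlinks (CAUTION: this modifies the filesystem!)
jdupes -r -L .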

fdupes

identifies duplicate files within the given directories (fdupes.manpage.txt)

su - root;
apt update;
apt install fdupes;
-H --hardlinks
       normally, when two or more files point to the same disk area
       they are treated as non-duplicates; this option will change this behavior

examples:

fdupes -r -m .
8 duplicate files (in 1 sets), occupying 8.4 megabytes
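fdupes has no option to replace duplicates with hardlinks (that is what jdupes’ -L is for), but it can delete them interactively; a sketch using its -d/--delete option and the -H flag quoted above:

# interactively delete duplicates (prompts which copy of each set to keep)
fdupes -r -d .

# re-run the summary, this time counting hardlinked files as duplicates
fdupes -r -m -H .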

rdfind

rdfind.manpage.txt

su - root;
apt update;
apt install rdfind;
# dry run (no file is removed)
rdfind -dryrun true ./search/in/this/folder

# WARNING! THIS REMOVES FILES! MAKE BACKUP!
rdfind -deleteduplicates true ./search/in/this/folder
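since this article is about saving space via hardlinks: rdfind can also replace duplicates with hardlinks instead of deleting them (the -makehardlinks option from its manpage):

# replace duplicates with hardlinks (all paths keep working, data is stored once)
rdfind -makehardlinks true ./search/in/this/folder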

creditz: https://www.tecmint.com/find-and-delete-duplicate-files-in-linux/

duff

website: http://duff.dreda.org/

duff.manpage.txt

su - root;
apt-get update;
apt-get install duff; # install duff

duff examples:

Normal mode

Shows normal output, with a header before each cluster of duplicate files, in this case using:

  • recursive search (-r) in the folder comics
duff -r comics
2 files in cluster 1 (43935 bytes, digest ea1a856854c166ebfc95ff96735ae3d03dd551a2)
comics/Nemi/n102.png
comics/Nemi/n58.png
3 files in cluster 2 (32846 bytes, digest 00c819053a711a2f216a94f2a11a202e5bc604aa)
comics/Nemi/n386.png
comics/Nemi/n491.png
comics/Nemi/n512.png
2 files in cluster 3 (26596 bytes, digest b26a8fd15102adbb697cfc6d92ae57893afe1393)
comics/Nemi/n389.png
comics/Nemi/n465.png
2 files in cluster 4 (30332 bytes, digest 11ff80677c85005a5ff3e12199c010bfe3dc2608)
comics/Nemi/n380.png
comics/Nemi/n451.png

The header can be customized (with the -f flag), for example outputting only the number of files that follow:

duff -r -f '%n' comics
2
comics/Nemi/n102.png
comics/Nemi/n58.png
3
comics/Nemi/n386.png
comics/Nemi/n491.png
comics/Nemi/n512.png
2
comics/Nemi/n389.png
comics/Nemi/n465.png
2
comics/Nemi/n380.png
comics/Nemi/n451.png

Excess mode

Duff can report all but one file from each cluster of duplicates (with the -e flag).

This can be used in combination with, for example, rm to remove duplicates, but should only be done if you don’t care which of the duplicates are removed.

duff -re comics
comics/Nemi/n58.png
comics/Nemi/n491.png
comics/Nemi/n512.png
comics/Nemi/n465.png
comics/Nemi/n451.png
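to actually remove them, the excess list can be piped to rm; a sketch, using GNU xargs -d '\n' so that file names with spaces survive (names containing newlines would still break it):

# CAUTION: removes files! duff -e lists all but one file per cluster,
# so exactly one copy of each file survives
duff -re comics | xargs -d '\n' rm --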

czkawka

czkawka is the Rust-rewritten successor to fslint (fslint is no longer in the default Debian repo, for whatever reason)

what is neat about czkawka:

  • it searches a directory for duplicate files and lists the biggest files first
  • be aware:
    • the terminal version is sufficient (imho)
      • on Debian 11, the GUI could not be installed (yet)
      • plz prepare for a lengthy install that involves downloading and compiling a lot of software

install:

# as default user
curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh

# check rust is installed
rustc --version
rustc 1.64.0 (a55dd71d5 2022-09-19)

# warning! THIS WILL DOWNLOAD AND COMPILE A LOT!
cargo install czkawka_cli

# run it
czkawka_cli dup --directories /where/to/search/for/duplicates | less
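to keep the results for later inspection instead of paging through them, the list can be written to a text file (a sketch, assuming the --file-to-save option listed in czkawka_cli dup --help):

# save the list of duplicates to a file
czkawka_cli dup --directories /where/to/search/for/duplicates --file-to-save duplicates.txt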

# if the gui was required
# become root
su - root
apt update
apt install software-properties-common ffmpeg
apt install libgdk-pixbuf-2.0-dev libghc-pango-dev libgraphene-1.0-dev librust-pango-sys-dev libglib2.0-dev libcairo2-dev

# Ctrl+D (logoff root)

# can try to install the gui, but (on Debian 11) it won't work
cargo install czkawka_gui

https://lib.rs/crates/czkawka_cli
