# this howto was tested to work with
cat /etc/os-release |grep PRETTY
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
why?
duplicate files are a waste of disk space.
every system experiences catastrophic failures, slowdowns and program crashes when RAM or disk space runs out 😀
BUT: under no circumstances shall a program be designed to allow accidents that delete ALL files 😀 (without safeguards like: “you will delete all files under this folder?” “are you sure?” “are you really sure?”)
sometimes files are stored in a certain folder for a reason.
there are programs that
1. find duplicate files
2. then delete one copy
3. then create a link to the still existing copy in its place
= disk space is saved, and all files are still accessible via their folders
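the three steps above can be sketched with plain GNU coreutils/findutils. this is only an illustration of the idea, not how any of the tools below work internally; `dedupe_by_hardlink` is a hypothetical helper name. it assumes all files are on one filesystem (hardlinks cannot cross filesystems) and that file names contain no newlines, backslashes or trailing blanks.

```shell
# dedupe_by_hardlink DIR: hypothetical helper, a sketch only.
# Assumes: GNU coreutils/findutils, one filesystem, "simple" file names
# (no newlines, backslashes or trailing blanks). MAKE A BACKUP FIRST!
dedupe_by_hardlink() {
    # 1. find duplicates: hash every regular file, sort so equal hashes are adjacent
    find "$1" -type f -exec sha256sum -- {} + | sort |
    {
        prev_hash='' keeper=''
        while read -r hash file; do
            if [ "$hash" != "$prev_hash" ]; then
                # first file with this hash: this copy is kept
                prev_hash=$hash
                keeper=$file
            elif [ ! "$file" -ef "$keeper" ] && cmp -s -- "$keeper" "$file"; then
                # 2. + 3. byte-identical extra copy: ln -f removes it and
                #         creates a hardlink to the kept file under the same name
                ln -f -- "$keeper" "$file" && echo "linked: $file -> $keeper"
            fi
        done
    }
}
```

usage: `dedupe_by_hardlink ./search/in/this/folder`. the `cmp` check guards against (very unlikely) hash collisions, and the `-ef` check skips files that are already hardlinked to the kept copy.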
hardlink
hardlink is a tool which replaces copies of a file with hardlinks, therefore saving space.
examples:
# start a test dry-run on the current directory
hardlink -v --dry-run .
by default, it also searches all subdirectories of the given directory (here: the current directory).
# do it for real
hardlink -v .
# checking the result, nice :D it works! :D
# (a second dry-run finds nothing left to link: "Linked: 0 files")
hardlink -v --dry-run .
Mode: dry-run
Files: 14
Linked: 0 files
Compared: 0 xattrs
Compared: 0 files
Saved: 0 bytes
Duration: 0.00 seconds
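to double-check the result, GNU find can list the files that now have more than one hardlink, sorted by inode number (names sharing an inode share the same data on disk). a sketch: `show_hardlinked` is a hypothetical helper name and `-printf` requires GNU find. note that the link count also includes links that live outside the searched directory.

```shell
# show_hardlinked [DIR]: hypothetical helper, a sketch only (GNU find needed).
# Prints "inode link-count path" for every regular file with more than one
# hardlink; lines with the same inode number point to the same data on disk.
show_hardlinked() {
    find "${1:-.}" -type f -links +1 -printf '%i %n %p\n' | sort -n
}

# usage: show_hardlinked .
```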
jdupes
website: https://github.com/jbruchon/jdupes
WARNING: jdupes IS NOT a drop-in compatible replacement for fdupes!
identify and delete or link duplicate files
examples:
jdupes -m .
Scanning: 7 files, 1 items (in 1 specified)
6 duplicate files (in 1 sets), occupying 6 MB
-L --linkhard
replace all duplicate files with hardlinks to the first file in each set of duplicates
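a possible workflow combining the two options above: first look at the summary, then link. a sketch only, assuming jdupes is installed (e.g. via apt install jdupes) and all files are on one filesystem; `link_dupes_with_jdupes` is a hypothetical wrapper name, only `-r`, `-m` and `-L` are jdupes options.

```shell
# link_dupes_with_jdupes [DIR]: hypothetical wrapper, a sketch only.
# Assumes jdupes is installed and all files are on one filesystem.
# MAKE A BACKUP FIRST!
link_dupes_with_jdupes() {
    local dir="${1:-.}"
    command -v jdupes >/dev/null 2>&1 || { echo "jdupes not installed" >&2; return 127; }
    # -r: recurse into subdirectories, -m: only print a summary
    jdupes -r -m "$dir"
    # -L: replace duplicates with hardlinks to the first file of each set
    jdupes -r -L "$dir"
}
```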
fdupes
identifies duplicate files within the given directories (fdupes.manpage.txt)
su - root; apt update; apt install fdupes;
-H --hardlinks: normally, when two or more files point to the same disk area (i.e. they are hardlinks of each other), they are treated as non-duplicates; this option changes this behavior
examples:
fdupes -r -m .
8 duplicate files (in 1 sets), occupying 8.4 megabytes
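what -H changes can be seen with a throwaway directory holding one file plus a hardlink to it (two names, same disk area). a sketch assuming fdupes is installed; `fdupes_hardlink_demo` is a hypothetical name. going by the manpage text above, the plain run should not report the pair, while the -H run should.

```shell
# fdupes_hardlink_demo: hypothetical demo, a sketch only (fdupes must be installed).
fdupes_hardlink_demo() {
    command -v fdupes >/dev/null 2>&1 || { echo "fdupes not installed" >&2; return 127; }
    local dir
    dir=$(mktemp -d)
    printf 'hello\n' > "$dir/original"
    ln "$dir/original" "$dir/hardlink"   # second name for the same inode/disk area
    echo "without -H:"
    fdupes -r "$dir"                     # hardlinks treated as non-duplicates
    echo "with -H:"
    fdupes -r -H "$dir"                  # hardlinks reported as duplicates
    rm -rf "$dir"
}
```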
rdfind
su - root; apt update; apt install rdfind;
# dry run (no file is removed)
rdfind -dryrun true ./search/in/this/folder
# WARNING! THIS REMOVES FILES! MAKE BACKUP!
rdfind -deleteduplicates true ./search/in/this/folder
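instead of deleting, rdfind can also replace duplicates with hardlinks (-makehardlinks true), which fits the "save space but keep every file reachable via its folder" idea from the top of this article. a sketch assuming rdfind is installed; `rdfind_link_dupes` is a hypothetical wrapper name. rdfind writes its report to results.txt in the current directory.

```shell
# rdfind_link_dupes [DIR]: hypothetical wrapper, a sketch only.
# Assumes rdfind is installed. MAKE A BACKUP FIRST!
rdfind_link_dupes() {
    local dir="${1:-.}"
    command -v rdfind >/dev/null 2>&1 || { echo "rdfind not installed" >&2; return 127; }
    # preview: report what would be replaced by hardlinks, change nothing
    rdfind -dryrun true -makehardlinks true "$dir"
    # real run: replace each duplicate with a hardlink to the file rdfind ranks as original
    rdfind -makehardlinks true "$dir"
}
```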
creditz: https://www.tecmint.com/find-and-delete-duplicate-files-in-linux/
duff
website: http://duff.dreda.org/
su - root; apt-get update; apt-get install duff; # install duff
duff examples:
Normal mode
Shows normal output, with a header before each cluster of duplicate files; in this case using recursive search (-r) in the folder comics:
duff -r comics
2 files in cluster 1 (43935 bytes, digest ea1a856854c166ebfc95ff96735ae3d03dd551a2)
comics/Nemi/n102.png
comics/Nemi/n58.png
3 files in cluster 2 (32846 bytes, digest 00c819053a711a2f216a94f2a11a202e5bc604aa)
comics/Nemi/n386.png
comics/Nemi/n491.png
comics/Nemi/n512.png
2 files in cluster 3 (26596 bytes, digest b26a8fd15102adbb697cfc6d92ae57893afe1393)
comics/Nemi/n389.png
comics/Nemi/n465.png
2 files in cluster 4 (30332 bytes, digest 11ff80677c85005a5ff3e12199c010bfe3dc2608)
comics/Nemi/n380.png
comics/Nemi/n451.png
The header can be customized (with the -f flag), for example to output only the number of files that follow:
duff -r -f '%n' comics
2
comics/Nemi/n102.png
comics/Nemi/n58.png
3
comics/Nemi/n386.png
comics/Nemi/n491.png
comics/Nemi/n512.png
2
comics/Nemi/n389.png
comics/Nemi/n465.png
2
comics/Nemi/n380.png
comics/Nemi/n451.png
Excess mode
Duff can report all but one file from each cluster of duplicates (with the -e flag).
This can be combined with, for example, rm to remove the duplicates, but this should only be done if you don’t care which of the duplicates are removed.
duff -re comics
comics/Nemi/n58.png
comics/Nemi/n491.png
comics/Nemi/n512.png
comics/Nemi/n465.png
comics/Nemi/n451.png
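when piping the excess list into rm, a preview step and file names with spaces are the usual pitfalls. a sketch: `remove_listed` is a hypothetical helper that reads one path per line and, unless DRY_RUN=0 is set, only prints what it would delete. it assumes file names contain no newlines.

```shell
# remove_listed: hypothetical helper, a sketch only.
# Reads one path per line from stdin; dry-run by default, DRY_RUN=0 really deletes.
remove_listed() {
    while IFS= read -r path; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            rm -- "$path" && printf 'removed: %s\n' "$path"
        else
            printf 'would remove: %s\n' "$path"
        fi
    done
}

# usage:
#   duff -re comics | remove_listed             # preview only
#   duff -re comics | DRY_RUN=0 remove_listed   # WARNING! really deletes
```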
czkawka
czkawka is the successor to fslint, written in Rust (fslint is no longer in the default Debian repo, for whatever reason)
what is neat about czkawka:
- it searches a directory for duplicate files and lists the biggest files first
- be aware:
  - Debian 11 (so far): was unable to install the GUI
  - plz prepare for a lengthy install that involves downloading and compiling a lot of software
- the terminal version is sufficient (imho)
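to get a rough idea of what "duplicates, biggest first" means, here is an approximation with GNU tools. it is NOT czkawka's algorithm and it hashes every single file, so it is slow on large trees; `dup_groups_biggest_first` is a hypothetical helper name, and file names must not contain tabs, newlines or backslashes.

```shell
# dup_groups_biggest_first [DIR]: hypothetical helper, rough approximation only.
# Prints each group of identical files, largest file size first.
# Assumes GNU findutils/coreutils, awk, and no tabs/newlines/backslashes in names.
dup_groups_biggest_first() {
    local tab
    tab=$(printf '\t')
    # 1. "size<TAB>path" for every regular file
    find "${1:-.}" -type f -printf '%s\t%p\n' |
    # 2. add the content hash: "size<TAB>sha256<TAB>path"
    while IFS="$tab" read -r size path; do
        printf '%s\t%s\t%s\n' "$size" "$(sha256sum -- "$path" | cut -d' ' -f1)" "$path"
    done |
    # 3. biggest size first, then by hash
    sort -t "$tab" -k1,1nr -k2,2 |
    # 4. collect files per (size, hash) and print only groups with 2+ members
    awk -F '\t' '
        { key = $1 FS $2; count[key]++; files[key] = files[key] "\n  " $3
          if (count[key] == 1) order[++n] = key }
        END { for (i = 1; i <= n; i++) if (count[order[i]] > 1) {
                  split(order[i], part, FS); print part[1] " bytes:" files[order[i]] } }'
}
```

usage: `dup_groups_biggest_first /where/to/search/for/duplicates | less`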
install:
# as default user
curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh
# load cargo into the current shell (or log out and back in)
. "$HOME/.cargo/env"
# check rust is installed
rustc --version
rustc 1.64.0 (a55dd71d5 2022-09-19)
# warning! THIS WILL DOWNLOAD AND COMPILE A LOT!
cargo install czkawka_cli
# run it
czkawka_cli dup --directories /where/to/search/for/duplicates | less
# if the gui was required
# become root
su - root
apt update
apt install software-properties-common ffmpeg
apt install libgdk-pixbuf-2.0-dev libghc-pango-dev libgraphene-1.0-dev librust-pango-sys-dev libglib2.0-dev libcairo2-dev
# Ctrl+D (logoff root)
cargo install cairo-dev
# can try to install gui, but won't work
cargo install czkawka_gui
https://lib.rs/crates/czkawka_cli
liked this article?
- only together we can create a truly free world
- plz support dwaves to keep it up & running!
- (yes the info on the internet is (mostly) free but beer is still not free (still have to work on that))
- really really hate advertisement
- contribute: whenever a solution was found, blog about it for others to find!
- talk about, recommend & link to this blog and articles
- thanks to all who contribute!
