This is actually VERY useful to find files that waste disk space.
lsb_release -a; # tested on
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
the solution: czkawka_cli
- install Rust for the default non-root user (no need to install Rust as root; a minimal rustup sketch follows below)
- then install czkawka_cli for that user:

cargo install czkawka_cli

- after a bit of downloading & compiling, run it as the non-root user (it is installed only for that user, not for root):
czkawka_cli dup --directories /where/to/search/for/duplicates/ | less
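The Rust toolchain itself (needed for the cargo install above) is usually set up per-user with the official rustup installer; a minimal sketch (an assumption here, since the original install link is not reproduced):

# download and run the official rustup installer as the non-root user
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# make cargo available in the current shell (rustup writes this env file)
source "$HOME/.cargo/env"
# verify
cargo --version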
czkawka_cli AUTOMATICALLY sorts by filesize (great job! all involved! :D)
in order to limit output to 30 blocks:
czkawka_cli dup --directories /where/to/search/for/duplicates/ > /scripts/find_largest_30_duplicates.sh.txt
then define the following helper script
(it reads the output file generated above, so both the file and the script need to exist)
vim /scripts/output_x_amount_of_textblocks.sh

#!/bin/bash
echo "=== output $2 amount of text blocks from $1 defined by delimiter $3 ==="
TEMPFILE=$1
# TEMPFILE=/scripts/find_largest_30_duplicates.sh.txt
# fdupes -Sr $1 | grep "bytes each" -A 2 > $TEMPFILE
BLOCK_LIMIT=$2
BLOCK_COUNTER=0
DELIMITER="$3"

while read -r LINE; do
    if [ $BLOCK_COUNTER -le $BLOCK_LIMIT ]; then
        # verbose output
        # echo "currently on block: "$BLOCK_COUNTER;
        echo "$LINE"
        if [[ "$LINE" == *"$DELIMITER"* ]]; then
            ((BLOCK_COUNTER++))
        fi
    fi
done < "$TEMPFILE"

# then call it like this
/scripts/output_x_amount_of_textblocks.sh "/scripts/find_largest_30_duplicates.sh.txt" 30 "----"
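If a separate helper script feels like overkill, roughly the same block-limiting can be done with a single awk call; a sketch, reusing the "----" delimiter and the output file from above:

# mirror the helper script: print lines while at most 30 delimiter ("----") lines have been seen
awk -v limit=30 -v delim="----" \
    'count > limit { exit } { print } index($0, delim) { count++ }' \
    /scripts/find_largest_30_duplicates.sh.txt | less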
all in one script:
vim /scripts/find_largest_30_duplicates.sh

#!/bin/bash
BLOCK_LIMIT=30
BLOCK_COUNTER=0
DELIMITER="----"
TEMPFILE=./find_largest_30_duplicates.sh.txt

# create temp file and make sure it's empty
echo "" > "$TEMPFILE"

# verbose: monitor changes to tempfile
# tail -f "$TEMPFILE" &

# czkawka_cli AUTOMATICALLY sorts by filesize 😀
czkawka_cli dup --directories "$1" > "$TEMPFILE"
# alternative: fdupes -Sr "$1" | grep "bytes each" -A 2 > "$TEMPFILE"

echo "=== output $BLOCK_LIMIT text blocks from $1 defined by delimiter $DELIMITER ==="

while read -r LINE; do
    if [ $BLOCK_COUNTER -le $BLOCK_LIMIT ]; then
        # verbose output
        # echo "currently on block: "$BLOCK_COUNTER;
        echo "$LINE"
        if [[ "$LINE" == *"$DELIMITER"* ]]; then
            ((BLOCK_COUNTER++))
        fi
    fi
done < "$TEMPFILE"
then run it like:
# mark it runnable
chmod +x /scripts/*.sh
# run it
/scripts/find_largest_30_duplicates.sh /path/to/duplicates/ | less
alternative with fdupes
# define script
vim /scripts/find_largest_30_duplicates.sh

#!/bin/bash
TEMPFILE=./find_largest_30_duplicates.sh.txt
fdupes -Sr $1 | grep "bytes each" -A 2 > $TEMPFILE
BLOCK_LIMIT=30
BLOCK_COUNTER=0
DELIMITER="--"

while read -r LINE; do
    if [ $BLOCK_COUNTER -le $BLOCK_LIMIT ]; then
        # verbose output
        # echo "currently on block: "$BLOCK_COUNTER;
        echo "$LINE"
        if [[ "$LINE" == *"$DELIMITER"* ]]; then
            ((BLOCK_COUNTER++))
        fi
    fi
done < "$TEMPFILE"

# call it like this
/scripts/find_largest_30_duplicates.sh "/where/to/search/for/duplicates/"
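Note that fdupes itself does not sort the duplicate groups by size; if only the sizes of the largest duplicate sets matter, the "bytes each" header lines can be sorted numerically instead (a sketch):

# list the 30 largest duplicate-set sizes reported by fdupes (sizes only, no file names)
fdupes -Sr /where/to/search/for/duplicates/ | grep "bytes each" | sort -rn | head -n 30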
compiling msort
WARNING! MULTIPLE versions of MSORT exist!
Bill Poser's msort (billposer AT alum DOT mit DOT edu) (https://packages.debian.org/trixie/msort)
and the BSD msort
how to compile Bill Poser’s msort
manpage: msort.man.txt
Second, msort requires support for Unicode normalization. It can be compiled to use either libicu (International Components for Unicode), which may be obtained
from http://www.icu-project.org/, or libutf8proc, which may be obtained from http://www.flexiguided.de/publications.utf8proc.en.html.
ICU is fairly widely used, so you may already have it on your system.
To use it, give the option --disable-utf8proc to configure. msort defaults to using utf8proc because utf8proc is smaller and easier to install.
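To check whether ICU is already present on a Debian system (a quick check, not from the original article):

# list any installed ICU library packages
dpkg -l | grep -i libicu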
Of course it is much easier (and also recommended) to just:

su - root
apt update
apt install msort
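Either way, it is worth checking what the packaged build actually supports before compiling anything (the same --version check as used further below):

# show the Debian package details
apt show msort
# show the version and which optional libraries this msort build was linked against
msort --version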
So because this UTF option was missing, the compilation quest began …
lsb_release -a
Description: Debian GNU/Linux 12 (bookworm)

# prepare requirements: tre
# recommended to compile this inside a vm
# because it will require a lot of packages
# build tre (https://github.com/laurikari/tre/)
su - root
# allow username to use sudo
usermod -a -G sudo username
apt install build-essential git wget autoconf automake gettext libtool zip autopoint libutf8proc-dev libuninum-dev
reboot # save all unsaved files and reboot to make user permission (sudo) changes active

# test if sudo works
sudo bash
# if yes: Ctrl+D # become non-root user again
cd ~
mkdir software
git clone https://github.com/laurikari/tre.git
cd tre
./utils/autogen.sh
./configure
make
make check
sudo make install
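sudo make install puts libtre under /usr/local/lib; refreshing the dynamic linker cache afterwards is a standard step (not specific to tre) so the following msort build and binary can actually find the library:

# refresh the shared-library cache so libtre in /usr/local/lib is picked up
sudo ldconfig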
# compile bill's msort from src
cd ~
mkdir software
cd software
wget http://billposer.org/Software/Downloads/msort-8.53.tar.gz
# here is a backup copy of the src and corresponding sha512sum file
tar fxvz msort-8.53.tar.gz
cd msort-8.53
./configure
make
sudo make install

# if all went (almost) well, this should be the output,
# which means: CELEBRATE! IT WORKS :D
msort --version

msort 8.53
lib gmp not linked
lib utf8proc
lib tre 0.8.0
lib uninum not linked
glibc 2.36
Compiled Oct 4 2023 11:52:38 on x86_64
under Linux 6.1.0-12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.52-1 (2023-09-07)
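As quoted from the manpage above, msort can also be built against ICU instead of the default utf8proc; roughly like this (a sketch, assuming Debian's libicu-dev package provides the ICU development files):

# inside the unpacked msort-8.53 source directory
sudo apt install libicu-dev
./configure --disable-utf8proc
make
sudo make install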
how to compile the bsd msort:
lsb_release -a
Description: Debian GNU/Linux 12 (bookworm)

su - root
apt update
apt search autoreconf
apt install dh-autoreconf
# Ctrl+D # logoff root, become non-root user again

cd ~
mkdir -p software
cd software
git clone https://github.com/mayank-02/msort.git
cd msort

# actually start building
autoreconf --install
./configure
make

# install binaries
sudo make install

# the user knows if it is the BSD msort because this will fail
msort --version

msort: invalid option -- '-'
Refer README.md for more information.
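If both the Debian-packaged msort and a self-compiled one end up installed side by side, check which binary the shell actually resolves (plain shell built-ins, nothing msort-specific):

# list every msort found on the PATH, in resolution order
type -a msort
# show only the one that would be executed
command -v msort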
Links:
GNU Linux -> disk usage – why is my harddisk full? find biggest largest files and directories