rem-dups - Remove Duplicate Files Script


rem-dups is a great command-line utility for finding and removing duplicate files.

rem-dups was found while searching the Internet for a useful utility to remove cruft, which builds up for anyone who runs computers long enough or works with large amounts of data, files, pictures, etc.

The nice thing about this script is that it uses standard Linux tools found in most distributions, and you keep control over what to keep or remove. Keep in mind that this script finds exact duplicates only (files with identical contents), so if you are working with files that differ slightly, you'll need another type of tool, such as diff.
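To illustrate what "exact duplicate" means here, the following is a minimal sketch, assuming GNU md5sum is installed. The temporary directory and the file names a.txt, b.txt, and c.txt are made up for the illustration and are not part of rem-dups.

```shell
#!/bin/sh
# Sketch: byte-identical files share an MD5 checksum; a one-byte change does not.
# The directory and file names below are hypothetical.
demo=$(mktemp -d)
printf 'same content\n' > "$demo/a.txt"
cp "$demo/a.txt" "$demo/b.txt"             # exact copy of a.txt
printf 'same content!\n' > "$demo/c.txt"   # differs by one character

# Read each file on stdin so only the checksum matters, not the name.
sum_a=$(md5sum < "$demo/a.txt" | cut -d' ' -f1)
sum_b=$(md5sum < "$demo/b.txt" | cut -d' ' -f1)
sum_c=$(md5sum < "$demo/c.txt" | cut -d' ' -f1)

[ "$sum_a" = "$sum_b" ] && echo "a.txt and b.txt are exact duplicates"
[ "$sum_a" != "$sum_c" ] && echo "c.txt differs, so rem-dups would not list it"

# For near-duplicates, a comparison tool such as diff shows what changed.
diff "$demo/a.txt" "$demo/c.txt" || true
rm -r "$demo"
```

Files like c.txt are the case where a comparison tool is the better fit.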


GET THE SCRIPT

You can download the script "rem-dups", or you can copy and paste the script shown below into a location where you keep handy utility programs:

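#!/bin/sh
# rem-dups - Finds duplicate files, puts them in rem-duplicates.sh for removal
# Usage: rem-dups [directory ...]   (needs GNU find, md5sum, sort, uniq, sed)

OUTF=rem-duplicates.sh

# Write the header of the generated script.
echo "#! /bin/sh" > "$OUTF"
echo "# File created by $0 $1 $2" >> "$OUTF"
echo "cd $(pwd)" >> "$OUTF"

# Checksum every file, sort so equal checksums are adjacent, keep only the
# groups of repeated checksums, then turn each file name into a commented-out
# "#rm" line with special characters backslash-escaped.
find "$@" -type f -exec md5sum {} \; |
  sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
  sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> "$OUTF"
chmod a+x "$OUTF"
ls -l "$OUTF"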

After you download and save it, run the command chmod a+x rem-dups to make the script executable; otherwise you will need to run it as sh rem-dups.
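To see how the middle of the pipeline picks out duplicates, here is a small sketch that runs the same find, sort, and uniq stages on three throwaway files. It assumes the GNU versions of these tools; the file names one.txt, two.txt, and three.txt are invented for the demonstration.

```shell
#!/bin/sh
# Sketch of the find/sort/uniq stages of rem-dups on hypothetical files.
demo=$(mktemp -d)
printf 'cat\n' > "$demo/one.txt"
printf 'cat\n' > "$demo/two.txt"     # same contents as one.txt
printf 'dog\n' > "$demo/three.txt"   # unique contents

# uniq -w 32 compares only the 32-character checksum at the start of each
# line, so lines are grouped by file contents rather than by file name.
groups=$(cd "$demo" && find . -type f -exec md5sum {} \; |
  sort --key=1,32 | uniq -w 32 -d --all-repeated=separate)
echo "$groups"

# Count the non-empty lines, i.e. the files that belong to a duplicate group.
dup_count=$(printf '%s\n' "$groups" | grep -c .)
echo "files in duplicate groups: $dup_count"
rm -r "$demo"
```

Only one.txt and two.txt are listed, since three.txt has a checksum of its own. When there are several groups of duplicates, the separate option puts a blank line between them.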


RUN THE SCRIPT

Once you have downloaded and saved rem-dups, change to the directory you want to check, and run the script to check the directory and all sub-directories for duplicate files. This creates a new "rem-duplicates.sh" script file which you can modify and run afterwards.

Below is an example of what you would type on the command line, followed by an explanation of each command:

[you@genesis ~]$ chmod a+x rem-dups
[you@genesis ~]$ cd Pictures
[you@genesis Pictures]$ sh ~/rem-dups                                           
-rwxr-xr-x 1 you you 256 2011-01-01 00:01 rem-duplicates.sh
[you@genesis Pictures]$

  1. chmod a+x rem-dups - To keep this simple, we assume you saved the script as file "rem-dups" in your home directory (which is directory "~"). Use the chmod command to mark it as an executable file. If you do not want to do that, you will need to run the script as sh rem-dups.

  2. cd Pictures - Change to the directory you want to check (for this example, we assume you want to check directory "Pictures" and all of its sub-directories).

  3. sh ~/rem-dups - You can type ~/rem-dups or sh ~/rem-dups to begin searching the directory for duplicate files. This may take a long time if the directory holds a large number of files. When it is done, it displays a listing (including the size) of the resulting "rem-duplicates.sh" output file.
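The script hands its arguments to find as "$@", so it appears that directory names can be given on the command line instead of changing directory first. The sketch below shows only that hand-off, using find on its own. The Pictures and Backup directories and their files are hypothetical, created in a temporary location.

```shell
#!/bin/sh
# Sketch: how directory arguments would reach find through "$@".
# Pictures and Backup are made-up directories inside a temporary location.
demo=$(mktemp -d)
mkdir "$demo/Pictures" "$demo/Backup"
printf 'x\n' > "$demo/Pictures/p.jpg"
printf 'x\n' > "$demo/Backup/p-copy.jpg"

# "set --" stands in for running: sh ~/rem-dups Pictures Backup
found=$(cd "$demo" && set -- Pictures Backup && find "$@" -type f | sort)
echo "$found"

file_count=$(printf '%s\n' "$found" | grep -c .)
rm -r "$demo"
```

Both directories are searched in one pass, so duplicates that sit in different directories would end up in the same group.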


CHOOSE THE FILES YOU WANT TO REMOVE

After the script finishes, you will have an editable "rem-duplicates.sh" output file.

Use your favorite text editor to choose which files you want to remove. For each file you choose to remove, delete the preceding comment mark "#" so that the rm command on that line can run.

[you@genesis Pictures]$ kwrite rem-duplicates.sh                                
[you@genesis Pictures]$

  1. kwrite rem-duplicates.sh - Use your favorite text editor to choose the files you want to remove. If you are using a desktop such as KDE, you can use kwrite to edit the file. If you use another desktop, you will need to use another editor.

    In this example, we have two pictures with different names, but they are identical.
    #! /bin/sh
    # File created by /home/you/rem-dups  
    cd /home/you/Pictures
    #rm CatsAndDogs.jpg
    #rm DogsAndCats.jpg

    Using a text editor, we chose to remove "DogsAndCats.jpg" by deleting the preceding "#", and to keep "CatsAndDogs.jpg" by leaving the "#" in front of that line.
    #! /bin/sh
    # File created by /home/you/rem-dups  
    cd /home/you/Pictures
    #rm CatsAndDogs.jpg
    rm DogsAndCats.jpg

    Save the edited file when you are done.
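If you would rather not open an editor, the same change can be made from the command line. This is a minimal sketch, assuming GNU sed (for the -i option); it works on a copy of the example file in a temporary directory rather than on a real "rem-duplicates.sh".

```shell
#!/bin/sh
# Sketch: enable one rm line without opening a text editor.
# The file below is a copy of the example shown above, in a temporary place.
work=$(mktemp -d)
cat > "$work/rem-duplicates.sh" <<'EOF'
#! /bin/sh
# File created by /home/you/rem-dups
cd /home/you/Pictures
#rm CatsAndDogs.jpg
#rm DogsAndCats.jpg
EOF

# Strip the leading "#" only from the line that names the file to delete.
sed -i 's/^#rm DogsAndCats\.jpg$/rm DogsAndCats.jpg/' "$work/rem-duplicates.sh"

# Show which rm commands are now active.
active=$(grep '^rm ' "$work/rem-duplicates.sh")
echo "$active"
rm -r "$work"
```

Matching the whole line, from "^" to "$", keeps the command from touching other files with similar names.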


CLEANUP AND EXIT

Run the script ./rem-duplicates.sh to remove the duplicate files you selected, then clean up by removing the script "rem-duplicates.sh". When you are done, you can exit the command-line terminal.

[you@genesis Pictures]$ ./rem-duplicates.sh
[you@genesis Pictures]$ rm rem-duplicates.sh                                    
[you@genesis Pictures]$ exit

  1. ./rem-duplicates.sh - You can type ./rem-duplicates.sh or sh rem-duplicates.sh to erase all the duplicate files you don't want.

  2. rm rem-duplicates.sh - When you finish erasing all the duplicate files you do not want to keep, you can remove rem-duplicates.sh since it is no longer needed.

  3. exit - Exit the command-line terminal. You are done.
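Before running the edited script in step 1, it may be worth checking which lines will actually delete something. The sketch below is one possible check, not part of rem-dups itself; it uses grep on a temporary copy of the edited example file.

```shell
#!/bin/sh
# Sketch: preview which files an edited rem-duplicates.sh would delete.
# The file below is a temporary copy of the edited example from this page.
work=$(mktemp -d)
cat > "$work/rem-duplicates.sh" <<'EOF'
#! /bin/sh
# File created by /home/you/rem-dups
cd /home/you/Pictures
#rm CatsAndDogs.jpg
rm DogsAndCats.jpg
EOF

# Lines starting with "rm " will run; lines starting with "#rm " will not.
to_delete=$(grep -c '^rm ' "$work/rem-duplicates.sh")
still_kept=$(grep -c '^#rm ' "$work/rem-duplicates.sh")
echo "will delete: $to_delete, left commented out: $still_kept"
grep '^rm ' "$work/rem-duplicates.sh"
rm -r "$work"
```

In your own directory, the equivalent check would be grep '^rm ' rem-duplicates.sh.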



Copyright© 2000..2023 Joe's Cat Website