List of duplicate file finders

This is a list of software tools to find and clean duplicate files in a directory.

Open Source

Language

*nix

Windows

OS X

CLI

GUI

Software

Python

ActiveState Recipe - a minimal Python command-line tool that only detects duplicates

Python

{{?}}

{{?}}

dedupe_copy - filters duplicates while copying and allows automatic reordering

C

duff - a Unix command-line utility for quickly finding duplicates in a given set of files

C++

{{?}}

Duff - a GUI duplicate file finder and processor for Windows

C

{{?}}

{{?}}

dupedit - Compares many files at once without checksumming. Avoids comparing files against themselves when multiple paths point to the same file.

Python

{{?}}

dupeguru - runs on various platforms. Special versions for music and pictures are available.

C++

Duplicate Files Finder - GUI application for Windows and Linux.

Perl

{{?}}

{{?}}

{{?}}

{{?}}

{{?}}

dupious - Perl-based duplicate finder for small to large systems or multi-server setups. Formerly finddup.pl.

C

{{?}}

{{?}}

{{?}}

dupmerge - POSIX C compliant and runs on various platforms (Win32/64 with Cygwin, *nix, Linux etc.)

Perl

{{?}}

{{?}}

{{?}}

dupseek - Perl script with an algorithm optimized to reduce reads

Python

fastdupes - a fast and small Python command-line tool to find duplicates

Perl

{{?}}

{{?}}

{{?}}

{{?}}

fdf - Perl/C based and runs across most platforms (Win32, *nix and probably others). Uses MD5, SHA1 and other checksum algorithms.

Perl

{{?}}

{{?}}

fdupe - a small script written in Perl that does its job quickly and efficiently.

C

{{?}}

{{?}}

fdupes - Command-line tool written in C. Compares MD5 signatures first, then byte by byte. Can also compare hard links.

Java

findrepe - free Java-based command-line tool designed for efficient searching of duplicate files. It can search within zips and jars. (GNU/Linux, Mac OS X, *nix, Windows)

C

{{?}}

{{?}}

{{?}}

freedup - POSIX C compliant and runs across platforms (Windows with Cygwin, Linux, AIX, etc.)

Perl

{{?}}

{{?}}

{{?}}

{{?}}

freedups - Perl script that hardlinks duplicates to save space, caches file checksums.

Python

fslint - has command line interface and GUI.

Python

{{?}}

{{?}}

hardlinkpy - A tool that hardlinks identical files together to save space. It is a complete rewrite of, and improvement over, the original hardlink.c code (written by Jakub Jelinek <jakub@redhat.com>). Performance is reported to be orders of magnitude faster than hardlink.c due to a more efficient algorithm.

Python

liten - Pure Python deduplication command-line tool and library, using MD5 checksums and a novel byte comparison algorithm. (Linux, Mac OS X, *nix, Windows)

Python

liten2 - A rewrite of the original Liten, still a command line tool but with a faster interactive mode using SHA-1 checksums (Linux, Mac OS X, *nix)

C#

ndupfinder - uses MD5 hashing to efficiently find duplicates. Binaries are not available as of now, so it must be compiled by the user. A WPF GUI is available for Windows.

C++

{{?}}

rdfind - One of the few tools that rank duplicates by the order of the input parameters (the directories to scan), so that files in "original/well known" sources are not deleted when multiple directories are given. Uses MD5 or SHA1.

Python

remdups - Small Python command-line tool that writes an intermediate hash list file and uses it to produce an option-driven shell script for removing files.

C, Perl, SH

{{?}}

{{?}}

repeats - C and SH, from littleutils. File sizes, then partial-read hashes, then full-read hashes, then (optionally) byte-for-byte comparisons. Highly efficient. (Linux, *nix, Cygwin)

Bash

{{?}}

{{?}}

rmdupe - a shell script that uses Linux tools to detect and remove duplicates.

C

rmlint - Fast finder with a command-line interface and many options for finding other lint as well (uses MD5). Claims to be better than rdfind and fdupes.

C

{{?}}

{{?}}

ssdeep - identifies almost identical files using Context Triggered Piecewise Hashing

Java

{{?}}

DFS - search by content / size / name

C++

{{?}}

{{?}}

ua - Unix/Linux command line tool, designed to work with find (and the like).
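Several entries above describe a staged comparison: file sizes first, then partial-read hashes, then full-read hashes, then a byte-for-byte check (repeats), or a checksum followed by a byte-by-byte comparison (fdupes), while skipping multiple paths that point to the same file (dupedit). The following is a minimal Python sketch of that general idea, not the implementation of any listed tool. The names `find_duplicates`, `_digest`, `_split` and `PARTIAL_BYTES`, the 4096-byte prefix and the choice of SHA-1 are all assumptions made for illustration.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

PARTIAL_BYTES = 4096  # assumed prefix size for the partial-read hash


def _digest(path, limit=None):
    """SHA-1 hex digest of a file, or of its first `limit` bytes."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        if limit is not None:
            h.update(f.read(limit))
        else:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


def _split(groups, key):
    """Split each group by key(path); keep only subgroups of 2+ files."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def find_duplicates(root):
    """Return lists of paths under `root` whose contents are identical."""
    seen = set()  # (device, inode) pairs already collected
    paths = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            if (st.st_dev, st.st_ino) in seen:
                continue  # hard link to a file we already have
            seen.add((st.st_dev, st.st_ino))
            paths.append(path)

    # Cheap checks first, so expensive reads happen only for candidates.
    groups = _split([paths], os.path.getsize)
    groups = _split(groups, lambda p: _digest(p, PARTIAL_BYTES))
    groups = _split(groups, _digest)

    # Final byte-for-byte confirmation guards against hash collisions.
    confirmed = []
    for group in groups:
        remaining = group
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [p for p in rest
                              if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                confirmed.append(same)
            remaining = [p for p in rest if p not in same]
    return confirmed
```

Calling `find_duplicates(".")` returns one list per set of identical files. Each stage only reads files that survived the previous one, so most unique files are ruled out by size alone and are never opened.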

Commercial Or With More Restrictive License

See also

  • List of Unix programs
  • Data deduplication
  • Duplicate code

External Comparisons