A CLI tool for staging and reviewing duplicate or near-duplicate image files. It can scan by exact SHA-256 as well as perceptual hashes (aHash and pHash), move candidate sets into numbered decision folders, and serve a local Flask UI so you can pick which copies to keep; the script then copies the kept files back to their original locations and cleans up.
It is recommended that you back up your files before use so you do not accidentally lose anything!
- `find_dupe_images.py` orchestrates the scan, staging, review loop, and cleanup. It uses `modules.ui_console.UISplit` to show Rich-powered logs while the Flask server is running and can build groups with SHA-256, aHash, or pHash.
- `modules/web_review.py` hosts the `/api` endpoints, maintains per-group state, exposes global folder preferences, and launches the browser-based review grid.
- `modules/ui_console.py` keeps a split-pane terminal layout so Flask logs and the main script's progress can be observed at once.
- `_manifest.tsv` files (one per duplicate group) map every staged file back to its original path.
- `_group_meta.json` records the detection mode/threshold used for that group so resumed runs stay consistent.
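As a concrete illustration, a staged group folder might look something like this (the folder name, manifest columns, and JSON fields shown here are assumptions for illustration, not taken from the code):

```text
_DECISION_DUPES/
  group_0001/
    _manifest.tsv       # hypothetical layout: staged name <TAB> original path
    _group_meta.json    # hypothetical contents: {"mode": "phash", "threshold": 10}
    IMG_1234.jpg
    IMG_1234_copy.jpg
```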
- Python 3.13+
- Install dependencies with:
```bash
pip install -r requirements.txt
# or install the packages directly:
pip install flask pillow python-dotenv rich numpy
# or, if you use uv:
uv sync
```
- `find_dupe_images.py` scans the provided root directory (image extensions in `IMG_EXTS`, unless `--include-all` is supplied).
- Depending on `--mode`:
  - `exact`: files are bucketed by size first, then hashed with SHA-256 to find byte-for-byte duplicates.
  - `ahash`/`phash`: each image is reduced to a 64-bit perceptual fingerprint; groups are formed greedily by Hamming distance, using `--threshold` as the cutoff for "looks similar enough."
- Each candidate set is moved atomically into a decision folder (`_DECISION_DUPES` by default) and annotated with `_manifest.tsv` plus `_group_meta.json`.
- A Flask server from `modules.web_review` serves the grouped files and remembers what you selected via `_review_state.json`.
- After you pick which files to keep from each group in the browser, the script copies the kept files back (with collision-safe names) and deletes the rest, cleaning up the decision folder afterward.
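The exact-mode pass above (bucket by size, then hash) can be sketched like this; the helper names are illustrative, not the script's actual internals:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file in chunks so large images are never fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def exact_groups(files: list[Path]) -> list[list[Path]]:
    # Bucket by size first: files of different sizes can never be identical,
    # so only same-size buckets need to be hashed at all.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in files:
        by_size[p.stat().st_size].append(p)
    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have duplicates
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Size bucketing is the usual cheap pre-filter here: hashing only happens inside buckets that could possibly contain duplicates.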
```bash
python find_dupe_images.py /path/to/photos
```

- `--decision-folder PATH` - where staged duplicate folders live (default `_DECISION_DUPES`).
- `--mode {exact,ahash,phash}` - pick byte-for-byte (`exact`) or perceptual matching (`ahash` or `phash`).
- `--threshold N` - Hamming distance cutoff for perceptual modes (0-64). Lower = stricter. Ignored for `exact` mode.
- `--include-all` - include every regular file instead of just common image extensions.
- `--dry-run` - stage groups and log actions without moving files or starting the review UI.
- `--host` / `--port` - control where Flask listens (`127.0.0.1` and `5173` by default).
- `--no-open` - skip opening the browser automatically if you prefer to navigate to the UI yourself.
- Run the script pointing at a directory tree. If you want to test without touching files, add `--dry-run`.
- When groups are found, the script will either resume an existing `_DECISION_DUPES` folder or stage new groups. Keep an eye on the terminal logging pane for progress messages.
- A browser window opens (unless `--no-open`) with thumbnails grouped by the selected mode. Use the UI to mark which files to keep and then finish the group.
- Back in the terminal, `find_dupe_images.py` copies the kept files back to safe names, deletes the rest (with retries for locked files), and removes the group folder.
- Repeat the browser review for each group until none remain. The script will clean up `_DECISION_DUPES` when it becomes empty.
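The "collision-safe names" step can be pictured roughly like this (a sketch; the real script's naming scheme may differ):

```python
from pathlib import Path

def collision_safe(dest: Path) -> Path:
    """Return dest unchanged if free, else append a numeric suffix before the extension."""
    if not dest.exists():
        return dest
    n = 1
    while True:
        candidate = dest.with_name(f"{dest.stem}_{n}{dest.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

The point is that restoring a kept file never overwrites an unrelated file that happens to share its name.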
Auto-finish (double-click automation when exactly one image is selected) is only available in exact mode; it is automatically disabled for perceptual modes where human review matters more.
If you have a workflow where one folder should always win, the UI remembers your preferred folders and auto-selects their files in later groups (see modules/web_review.py for preference handling).
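That preference behaviour can be sketched like this (illustrative only; see `modules/web_review.py` for the real handling):

```python
from pathlib import PurePath

def auto_select(group: list[str], preferred_folders: set[str]) -> list[str]:
    """Pre-select files whose parent folder the user previously preferred."""
    return [f for f in group if str(PurePath(f).parent) in preferred_folders]
```

A later group then opens with the preferred folder's copies already marked to keep, so a one-folder-always-wins workflow needs only a confirming click per group.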