PhotoCat is a Python utility to catalog, de-duplicate, and consolidate large photo/video collections into a clean, tagged library.
It uses SQLite to track files, SHA-256 for duplicate detection, and optionally ExifTool to embed tags in file metadata.
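The SHA-256 side of duplicate detection is the standard chunked-read pattern; `sha256_of` below is an illustrative helper under that assumption, not necessarily PhotoCat's internal function:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos never load fully into RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Two files are duplicates exactly when their digests match, regardless of filename or folder.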
- Catalog multiple roots (mark as `--sorted` for folder-derived tags, or `--unsorted` to skip tags).
- Compute hashes with progress reporting and periodic commits (safe to interrupt).
- Detect duplicates and choose canonical files.
- Propose a consolidated layout (`YYYY/YYYY-MM-DD/filename.ext`).
- Export the plan as CSV for review in Excel.
- Execute the plan:
  - Default: copy files (safe, originals untouched).
  - Optional: hardlink files with `--use-hardlinks` (saves space, but modifying metadata also changes originals).
- Append folder-derived tags into EXIF/IPTC/XMP keywords with ExifTool.
- Quarantine redundant originals after execution (safe-delete workflow).
- Audit scanned roots with the `roots` command.
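The proposed `YYYY/YYYY-MM-DD/filename.ext` layout is just a date-derived path; a minimal sketch (`planned_dest` is a hypothetical helper, and the capture date would normally come from EXIF data or the file's mtime):

```python
from datetime import datetime
from pathlib import Path


def planned_dest(library_root: Path, filename: str, taken: datetime) -> Path:
    """Place a file under YYYY/YYYY-MM-DD/ based on its capture date."""
    return library_root / taken.strftime("%Y") / taken.strftime("%Y-%m-%d") / filename
```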
- Python 3.11+ (tested with 3.11.9)
- ExifTool installed and in PATH (needed for writing tags)
- SQLite3 (for manual inspection, optional; DB is managed automatically)
```
python photocat.py --db index.db scan --sorted "F:\Photos" --unsorted "F:\PhoneDump"
```

- `--sorted`: folder names become tags (`F:\Photos\Trips\Hawaii2019` → tags `Trips`, `Hawaii2019`).
- `--unsorted`: files are indexed, but no tags are derived from folders.
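Deriving tags from folder names under a `--sorted` root amounts to taking the path components between the root and the file; a sketch (`tags_from_path` is a hypothetical name, and the real tag rules may differ):

```python
from pathlib import Path


def tags_from_path(scan_root: Path, file_path: Path) -> list[str]:
    """Every folder between the scanned root and the file becomes a tag."""
    rel = file_path.relative_to(scan_root)
    return list(rel.parts[:-1])  # drop the filename itself
```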
Check which roots have been scanned:

```
python photocat.py --db index.db roots
```

Compute hashes:

```
python photocat.py --db index.db hash
```

- Commits every 500 files (safe to interrupt).
- Add `--limit 1000` to test on smaller batches.
```
python photocat.py --db index.db propose --library-root "E:\PhotoLibrary"
```

Export the plan for review:

```
python photocat.py --db index.db export --csv proposed_layout.csv
```

Open it in Excel and review columns such as `dest_path` and `tags`.
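If you prefer the command line to Excel, the exported CSV can also be skimmed with the stdlib `csv` module (`preview_plan` is a hypothetical helper; `dest_path` and `tags` are the exported column names mentioned above):

```python
import csv
from pathlib import Path


def preview_plan(csv_path: Path, n: int = 5) -> list[dict]:
    """Return the first n rows of the exported plan as dictionaries."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return [row for _, row in zip(range(n), csv.DictReader(f))]
```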
- Default: copy (safe, originals untouched)

  ```
  python photocat.py --db index.db execute --library-root "E:\PhotoLibrary" --verify --write-tags
  ```

- Optional: hardlink (saves space, but metadata changes affect originals)

  ```
  python photocat.py --db index.db execute --library-root "E:\PhotoLibrary" --use-hardlinks
  ```
Recommended for most users: copy + `--write-tags`.
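The copy-versus-hardlink trade-off comes down to whether the library entry shares the original's underlying data; a sketch, assuming a hypothetical `place_file` helper:

```python
import os
import shutil
from pathlib import Path


def place_file(src: Path, dest: Path, use_hardlinks: bool = False) -> None:
    """Copy by default; hardlink on request (same volume only).

    A hardlink shares the file's data blocks, so editing metadata on the
    library copy also edits the original -- which is why copy is the default.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    if use_hardlinks:
        os.link(src, dest)
    else:
        shutil.copy2(src, dest)  # copy2 preserves timestamps
```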
```
python photocat.py --db index.db quarantine --quarantine "E:\Quarantine" --execute
```

- Moves duplicates safely to a quarantine folder (mirroring their original paths).
- Review and delete when satisfied.
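Mirroring original paths under the quarantine root keeps moved duplicates collision-free and easy to review; a sketch with a hypothetical `quarantine_file` helper:

```python
import shutil
from pathlib import Path


def quarantine_file(file_path: Path, scan_root: Path, quarantine_root: Path) -> Path:
    """Move a duplicate into quarantine, preserving its path relative to the root."""
    dest = quarantine_root / file_path.relative_to(scan_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(file_path), str(dest))
    return dest
```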
- Default is copy: originals remain untouched.
- Use `--use-hardlinks` only if you fully understand the risks.
- If you combine `--use-hardlinks` with `--write-tags`, metadata updates will also modify the originals.
- Hashing is the most time-consuming step; use `--limit` to experiment before full runs.
- The database (`index.db`) stores everything: file metadata, hashes, plans, logs, and roots. It is safe to inspect with DB Browser for SQLite.
```
python photocat.py --db index.db scan --sorted "F:\Photos" --unsorted "F:\PhoneDump"
python photocat.py --db index.db hash --limit 2000
python photocat.py --db index.db propose --library-root "E:\PhotoLibrary"
python photocat.py --db index.db export --csv proposed_layout.csv
python photocat.py --db index.db execute --library-root "E:\PhotoLibrary" --verify --write-tags
python photocat.py --db index.db quarantine --quarantine "E:\Quarantine" --execute
```

Count files hashed:
```
sqlite3 index.db "SELECT COUNT(*) FROM media_file WHERE sha256 IS NOT NULL;"
```

Check duplicate groups:

```
sqlite3 index.db "SELECT sha256, COUNT(*) FROM dup_member GROUP BY sha256 HAVING COUNT(*)>1 ORDER BY COUNT(*) DESC LIMIT 10;"
```

📌 With this setup you can safely build a new, clean photo library, review in CSV before copying, embed tags in your consolidated copies, and eventually quarantine/delete old scattered duplicates.