Subject Frame Extractor

An AI-powered tool for extracting, analyzing, and filtering high-quality frames from video footage. Designed for dataset builders (LoRA / Dreambooth training), content creators, and researchers who need curated image sets from raw video — not just raw frame dumps.

Also includes a Photo Culling mode for scoring and rating RAW photo libraries.

Tech Stack

Layer                  Technology
Runtime                Python 3.10+ (3.12 recommended)
UI                     Gradio 6.x
Segmentation           SAM 3 (Segment Anything Model 3, Facebook Research)
Object detection       YOLO (80 COCO classes)
Face analysis          InsightFace (similarity matching, blink detection, head pose)
Quality scoring        NIQE (perceptual), Laplacian variance (sharpness), pHash + LPIPS (dedup)
Video / media          FFmpeg, yt-dlp
RAW processing         ExifTool (embedded preview extraction, no demosaicing)
Data                   PyTorch, NumPy, OpenCV, SQLite, Pydantic
Dependency management  uv
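
To make the sharpness metric concrete, here is a minimal, dependency-free sketch of Laplacian variance. It is not the project's implementation (which presumably relies on OpenCV). The function name `laplacian_variance` and the nested-list image format are assumptions made for illustration.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a list of rows of grayscale intensities (hypothetical format).
    Higher values indicate more edge energy, i.e. a sharper image.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


# A uniform image has no edges; a checkerboard is all edges.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
```

A blurred frame sits close to the `flat` case and scores low, while a crisp frame scores high, so a slider threshold on this value filters out soft frames.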

Architecture

/
├── app.py                  # Gradio UI entry point
├── cli.py                  # Headless CLI (extract / analyze / full / status / photo)
├── core/
│   ├── config.py           # Full configuration schema (Pydantic)
│   ├── extractor.py        # Extraction strategies (keyframe, interval, scene, Nth)
│   ├── analyzer.py         # AI analysis pipeline (SAM seeding, tracking, metrics)
│   ├── tracker.py          # Subject tracking across scenes
│   ├── face.py             # InsightFace integration
│   ├── quality.py          # NIQE, sharpness, entropy, LPIPS scoring
│   ├── dedup.py            # pHash + LPIPS deduplication
│   ├── photo.py            # Photo culling: RAW ingest, scoring, XMP sidecar export
│   └── database.py         # SQLite session metadata
├── SAM3_repo/              # SAM 3 submodule
└── scripts/
    ├── linux_run_app.sh
    └── setup scripts

Pipeline: Extract frames → scene segmentation → AI seeding (face ref / text / YOLO) → SAM 3 propagation → quality metrics → interactive filtering → AR-aware crop export.
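
The last pipeline step, the AR-aware crop, can be sketched as follows. This is an illustrative assumption about how a subject-centred crop might be computed, not the code in this repository. The name `subject_crop` and its argument layout are hypothetical.

```python
def subject_crop(bbox, image_size, aspect):
    """Return (left, top, right, bottom) of a crop centred on the subject.

    bbox:       (x0, y0, x1, y1) subject box in pixels
    image_size: (width, height) of the frame
    aspect:     target width / height, e.g. 1.0 for 1:1 or 9 / 16 for 9:16
    """
    x0, y0, x1, y1 = bbox
    img_w, img_h = image_size
    box_w, box_h = x1 - x0, y1 - y0

    # Smallest box with the target ratio that still contains the subject.
    crop_w = max(box_w, box_h * aspect)
    crop_h = crop_w / aspect

    # Shrink uniformly if the crop would not fit inside the image
    # (the subject may then be clipped).
    scale = min(1.0, img_w / crop_w, img_h / crop_h)
    crop_w, crop_h = crop_w * scale, crop_h * scale

    # Centre on the subject, then shift back inside the image bounds.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - crop_w / 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), img_h - crop_h)
    return tuple(round(v) for v in (left, top, left + crop_w, top + crop_h))
```

For a 200×400 subject box in a 1920×1080 frame, `subject_crop((800, 200, 1000, 600), (1920, 1080), 1.0)` yields a 400×400 square around the subject.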

Key Features

  • Extraction strategies — keyframes, fixed intervals, scene-based, every Nth frame; YouTube URL support
  • Multi-class tracking — find and track any of 80 COCO objects via YOLO + SAM 3; open-vocabulary text descriptions
  • Face matching — find every frame of a specific person from a single reference photo, using InsightFace
  • Quality filtering — interactive sliders for sharpness, contrast, NIQE perceptual score
  • Smart deduplication — pHash + LPIPS removes near-identical frames per scene
  • AR-aware export — subject-centred crops in 1:1, 9:16, 16:9, or custom ratios
  • Photo culling mode — RAW preview extraction (CR2, NEF, ARW, DNG, ORF…), AI scoring, and star-rating export as XMP sidecars for Lightroom / Capture One
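
As a rough illustration of the hash stage of deduplication, the sketch below greedily drops frames whose perceptual hashes are within a few bits of a frame already kept. It assumes 64-bit integer hashes have been computed elsewhere. The function names and the `max_distance` default are hypothetical, and the LPIPS pass is not shown.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedup_by_hash(hashes, max_distance=6):
    """Greedy near-duplicate removal over one scene.

    A frame is kept only if its hash differs by more than `max_distance`
    bits from every frame kept so far. Returns the indices of kept frames.
    """
    kept = []
    for i, h in enumerate(hashes):
        if all(hamming(h, hashes[j]) > max_distance for j in kept):
            kept.append(i)
    return kept


# Frames 1 and 3 differ from frame 0 by one and two bits, so they are dropped.
scene_hashes = [
    0xFFFF0000FFFF0000,
    0xFFFF0000FFFF0001,
    0x0000FFFF0000FFFF,
    0xFFFF0000FFFF0003,
]
kept_indices = dedup_by_hash(scene_hashes)
```

Running this per scene, as the feature list describes, avoids comparing frames across scene cuts, where similar hashes are less likely to mean true duplicates.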

Quick Start

Prerequisites: Python 3.10+, FFmpeg in PATH, CUDA GPU recommended (~8 GB VRAM for SAM 3)

git clone --recursive https://github.com/tazztone/subject-frame-extractor.git
cd subject-frame-extractor
uv sync

# Launch Gradio UI
uv run python app.py
# → http://127.0.0.1:7860

CLI Usage

# Extract frames
uv run python cli.py extract --video video.mp4 --output ./results --nth-frame 10

# Run AI analysis (with face reference)
uv run python cli.py analyze --session ./results --video video.mp4 --face-ref person.png --resume

# Full pipeline in one command
uv run python cli.py full --video video.mp4 --output ./results --face-ref person.png

# Photo culling workflow
uv run python cli.py photo ingest --folder /path/to/raws --output ./photo_session
uv run python cli.py photo score --session ./photo_session
uv run python cli.py photo export --session ./photo_session   # → XMP sidecars
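
For batch jobs, the `full` command above can be driven from Python. The wrapper below is a hedged sketch: `build_full_command` and `run_full` are hypothetical helper names, only the flags shown above are used, and it assumes the script is run from the repository root.

```python
import subprocess


def build_full_command(video, output, face_ref):
    """Assemble the argv list for `cli.py full` using the flags shown above."""
    return [
        "uv", "run", "python", "cli.py", "full",
        "--video", str(video),
        "--output", str(output),
        "--face-ref", str(face_ref),
    ]


def run_full(video, output, face_ref):
    """Run the full pipeline for one video.

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    subprocess.run(build_full_command(video, output, face_ref), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.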

Configuration

See core/config.py for the full Pydantic schema. Key settings:

Category     Key Fields                             Default
Paths        logs_dir, models_dir, downloads_dir    logs, models, downloads
Models       face_model_name, tracker_model_name    buffalo_l, sam3
Performance  analysis_default_workers, cache_size   4, 200
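
The defaults above can be pictured as a small settings object. The sketch below uses a standard-library dataclass as a stand-in for the real Pydantic model so it runs without dependencies. The class name `ConfigSketch` and the field types are assumptions, and core/config.py remains the authoritative schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConfigSketch:
    """Illustrative stand-in for the schema in core/config.py (types assumed)."""

    # Paths
    logs_dir: str = "logs"
    models_dir: str = "models"
    downloads_dir: str = "downloads"
    # Models
    face_model_name: str = "buffalo_l"
    tracker_model_name: str = "sam3"
    # Performance
    analysis_default_workers: int = 4
    cache_size: int = 200


defaults = ConfigSketch()
# Override a single field without mutating the defaults.
more_workers = replace(defaults, analysis_default_workers=8)
```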

See AGENTS.md for architecture details, critical rules, and development guidelines.

License

MIT — see LICENSE.


Technical Debt & Roadmap

This project uses a semi-automated TODO tracking system to prioritize refactors and features.

  • Check Current Debt: Run uv run python scripts/generate_todo_report.py to generate TODO_REPORT.md.
  • Top 20 Summary:
    1. [High] Refactor core/pipelines.py to use modular core/managers. (In Progress)
    2. [High] Implement thread-safe model access for InsightFace.
    3. [Medium] Add temporal consistency smoothing between frames in MaskPropagator.
    4. [Medium] Add adaptive quality thresholds based on propagation distance.
    5. [Low] Support demosaicing for RAW photo ingest (currently uses previews).
