Subject Frame Extractor

An AI-powered tool for extracting, analyzing, and filtering high-quality frames from video footage. Designed for dataset builders (LoRA / Dreambooth training), content creators, and researchers who need curated image sets from raw video — not just raw frame dumps.

Also includes a Photo Culling mode for scoring and rating RAW photo libraries.

Tech Stack

Layer	Technology
Runtime	Python 3.10+ (3.12 recommended)
UI	Gradio 6.x
Segmentation	SAM 3 (Segment Anything Model 3, Facebook Research)
Object detection	YOLO (80 COCO classes)
Face analysis	InsightFace (similarity matching, blink detection, head pose)
Quality scoring	NIQE (perceptual), Laplacian variance (sharpness), pHash + LPIPS (dedup)
Video / media	FFmpeg, yt-dlp
RAW processing	ExifTool (embedded preview extraction — no demosaicing)
Data	PyTorch, NumPy, OpenCV, SQLite, Pydantic
Dependency management	uv
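
The Laplacian-variance sharpness metric listed above can be illustrated without any dependencies. The sketch below is not the project's implementation (with OpenCV the same idea is commonly written as `cv2.Laplacian(gray, cv2.CV_64F).var()`); the function name `laplacian_variance` and the list-of-lists image format are assumptions made for illustration.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2D list of intensity values (hypothetical input format).
    Higher values indicate more high-frequency detail, i.e. a sharper frame.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("need at least a 3x3 image")

    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)

    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A flat image has no edges; a checkerboard is all edges.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
print(laplacian_variance(flat), laplacian_variance(checker))
```

A blurry frame behaves like the flat image (low variance), which is why a threshold on this value works as a simple sharpness filter.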

Architecture

/
├── app.py                  # Gradio UI entry point
├── cli.py                  # Headless CLI (extract / analyze / full / status / photo)
├── core/
│   ├── config.py           # Full configuration schema (Pydantic)
│   ├── extractor.py        # Extraction strategies (keyframe, interval, scene, Nth)
│   ├── analyzer.py         # AI analysis pipeline (SAM seeding, tracking, metrics)
│   ├── tracker.py          # Subject tracking across scenes
│   ├── face.py             # InsightFace integration
│   ├── quality.py          # NIQE, sharpness, entropy, LPIPS scoring
│   ├── dedup.py            # pHash + LPIPS deduplication
│   ├── photo.py            # Photo culling: RAW ingest, scoring, XMP sidecar export
│   └── database.py         # SQLite session metadata
├── SAM3_repo/              # SAM 3 submodule
└── scripts/
    ├── linux_run_app.sh
    └── setup scripts

Pipeline: Extract frames → scene segmentation → AI seeding (face ref / text / YOLO) → SAM 3 propagation → quality metrics → interactive filtering → AR-aware crop export.
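
The final pipeline step, AR-aware crop export, comes down to fitting a fixed-ratio rectangle around the tracked subject. The following is a minimal sketch of that geometry under stated assumptions: the function name `subject_crop`, the `(x0, y0, x1, y1)` box format, and the clamping policy are hypothetical and not taken from the project's code.

```python
def subject_crop(img_w, img_h, box, ar_w, ar_h):
    """Return (left, top, width, height) of a crop with ratio ar_w:ar_h.

    The crop is the smallest rectangle of that ratio covering the subject
    box, centred on the subject and shifted to stay inside the image.
    """
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0
    if bw <= 0 or bh <= 0:
        raise ValueError("subject box must have positive width and height")

    target = ar_w / ar_h
    if bw / bh < target:
        # Subject is taller than the target ratio: height drives the size.
        ch, cw = bh, bh * target
    else:
        # Subject is wider than the target ratio: width drives the size.
        cw, ch = bw, bw / target

    # Shrink uniformly if the crop would not fit in the image at all.
    scale = min(1.0, img_w / cw, img_h / ch)
    cw, ch = cw * scale, ch * scale

    # Centre on the subject, then clamp to the image bounds.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - cw / 2, 0), img_w - cw)
    top = min(max(cy - ch / 2, 0), img_h - ch)
    return round(left), round(top), round(cw), round(ch)


# 1:1 crop around a subject on the left of a 1920x1080 frame.
print(subject_crop(1920, 1080, (100, 200, 500, 800), 1, 1))
# 16:9 crop around a small subject near the right edge (crop is pushed inward).
print(subject_crop(1920, 1080, (1800, 100, 1900, 200), 16, 9))
```

Clamping rather than padding keeps every exported pixel real image content, at the cost of the subject being slightly off-centre near frame edges.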

Key Features

Extraction strategies — keyframes, fixed intervals, scene-based, every Nth frame; YouTube URL support
Multi-class tracking — find and track any of the 80 COCO object classes via YOLO + SAM 3, or describe the subject with an open-vocabulary text prompt
Face matching — find every frame of a specific person by matching against a reference photo with InsightFace
Quality filtering — interactive sliders for sharpness, contrast, NIQE perceptual score
Smart deduplication — pHash + LPIPS removes near-identical frames per scene
AR-aware export — subject-centred crops in 1:1, 9:16, 16:9, or custom ratios
Photo culling mode — RAW preview extraction (CR2, NEF, ARW, DNG, ORF…), AI scoring, and export of star ratings as XMP sidecars for Lightroom / Capture One
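
To make the deduplication feature concrete, here is a rough sketch of the hash-comparison stage only. It assumes 64-bit perceptual hashes have already been computed for each frame (for example by a pHash library); the function names, the `max_distance` threshold, and the greedy keep-first policy are illustrative assumptions, and the LPIPS stage is omitted.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedup(hashes: dict[str, int], max_distance: int = 4) -> list[str]:
    """Keep a frame only if it differs from every kept frame by more than
    `max_distance` bits. Frames are visited in insertion order, so the
    first frame of each near-duplicate group survives."""
    kept: list[str] = []
    for name, h in hashes.items():
        if all(hamming(h, hashes[k]) > max_distance for k in kept):
            kept.append(name)
    return kept


# Hypothetical hashes: f2 differs from f1 by one bit, f3 is unrelated.
frame_hashes = {
    "f1.png": 0xFFFF0000FFFF0000,
    "f2.png": 0xFFFF0000FFFF0001,
    "f3.png": 0x0000FFFF0000FFFF,
}
print(dedup(frame_hashes))
```

Running this per scene, as the feature description suggests, keeps the pairwise comparison cheap because each scene contains only a small subset of the frames.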

Quick Start

Prerequisites: Python 3.10+, FFmpeg in PATH, CUDA GPU recommended (~8 GB VRAM for SAM 3)

git clone --recursive https://github.com/tazztone/subject-frame-extractor.git
cd subject-frame-extractor
uv sync

# Launch Gradio UI
uv run python app.py
# → http://127.0.0.1:7860
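
If the launch fails, it can help to verify the prerequisites first. The helper below is a hypothetical convenience script, not part of the repository; it only checks the Python version and whether `ffmpeg` is discoverable on PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("ffmpeg",)):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found in PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```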

CLI Usage

# Extract frames
uv run python cli.py extract --video video.mp4 --output ./results --nth-frame 10

# Run AI analysis (with face reference)
uv run python cli.py analyze --session ./results --video video.mp4 --face-ref person.png --resume

# Full pipeline in one command
uv run python cli.py full --video video.mp4 --output ./results --face-ref person.png

# Photo culling workflow
uv run python cli.py photo ingest --folder /path/to/raws --output ./photo_session
uv run python cli.py photo score --session ./photo_session
uv run python cli.py photo export --session ./photo_session   # → XMP sidecars
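
For readers unfamiliar with XMP sidecars, the sketch below shows roughly what a star-rating export produces. It is an illustration, not the project's exporter: the mapping from a 0 to 1 score to stars, the function names, and the `<basename>.xmp` naming are assumptions. The `xmp:Rating` property and the namespaces are the standard XMP ones.

```python
from pathlib import Path

# Minimal XMP document carrying only a star rating.
XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmp:Rating="{stars}"/>
 </rdf:RDF>
</x:xmpmeta>
"""


def score_to_stars(score: float) -> int:
    """Map a score in [0, 1] to 1..5 stars (hypothetical linear mapping)."""
    clamped = min(max(score, 0.0), 1.0)
    return min(5, int(clamped * 5) + 1)


def write_sidecar(raw_path: Path, score: float) -> Path:
    """Write <basename>.xmp next to the RAW file and return its path.

    Note: this overwrites any existing sidecar, discarding its metadata.
    """
    sidecar = raw_path.with_suffix(".xmp")
    sidecar.write_text(
        XMP_TEMPLATE.format(stars=score_to_stars(score)), encoding="utf-8"
    )
    return sidecar
```

A real exporter would normally merge the rating into an existing sidecar instead of overwriting it, since editors store their own adjustments in the same file.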

Configuration

See core/config.py for the full Pydantic schema. Key settings:

Category	Key Fields	Default
Paths	`logs_dir`, `models_dir`, `downloads_dir`	`logs`, `models`, `downloads`
Models	`face_model_name`, `tracker_model_name`	`buffalo_l`, `sam3`
Performance	`analysis_default_workers`, `cache_size`	`4`, `200`
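
As a quick orientation, the fields and defaults from the table can be pictured as one flat settings object. The real schema in core/config.py is a Pydantic model and may be structured differently; the sketch below uses a standard-library dataclass purely so it runs without dependencies, and `ConfigSketch` and `load_config` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConfigSketch:
    # Paths
    logs_dir: str = "logs"
    models_dir: str = "models"
    downloads_dir: str = "downloads"
    # Models
    face_model_name: str = "buffalo_l"
    tracker_model_name: str = "sam3"
    # Performance
    analysis_default_workers: int = 4
    cache_size: int = 200


def load_config(overrides: dict) -> ConfigSketch:
    """Build a config from defaults plus overrides, coercing each value to
    the declared field type and rejecting unknown keys."""
    types = {f.name: f.type for f in fields(ConfigSketch)}
    unknown = set(overrides) - set(types)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    coerced = {key: types[key](value) for key, value in overrides.items()}
    return ConfigSketch(**coerced)


# Override one performance setting, e.g. from an environment variable string.
cfg = load_config({"analysis_default_workers": "8"})
print(cfg.analysis_default_workers, cfg.cache_size)
```

Pydantic provides the same coercion and unknown-key rejection declaratively, which is presumably why the project uses it for the full schema.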

See AGENTS.md for architecture details, critical rules, and development guidelines.

License

MIT — see LICENSE.

Technical Debt & Roadmap

This project uses a semi-automated TODO tracking system to prioritize refactors and features.

Check Current Debt: Run uv run python scripts/generate_todo_report.py to generate TODO_REPORT.md.
Top 20 Summary:
1. [High] Refactor core/pipelines.py to use modular core/managers. (In Progress)
2. [High] Implement thread-safe model access for InsightFace.
3. [Medium] Add temporal consistency smoothing between frames in MaskPropagator.
4. [Medium] Add adaptive quality thresholds based on propagation distance.
5. [Low] Support demosaicing for RAW photo ingest (currently uses previews).
