An AI-powered powerhouse for extracting, analyzing, and filtering high-quality frames from video.
Designed for content creators, dataset builders (LoRA/Dreambooth), and researchers. This tool bridges the gap between raw video footage and curated, high-quality image datasets using state-of-the-art AI.
Traditional frame extraction is noisy. Subject Frame Extractor uses advanced segmentation and quality heuristics to ensure you only keep the frames that matter.
- Intelligent Extraction: Beyond simple intervals—use scene detection and keyframe awareness.
- Multi-Class Tracking: Automatically find and track any of 80 COCO objects (people, cars, animals, etc.) using YOLO26 and SAM3.
- Scene-Level Deduplication: Automated extraction of the single best frame per shot.
- Quality First: Filter by sharpness, contrast, and perceptual quality (NIQE).
- Face Matching: Find every frame of a specific person using InsightFace.
- Extraction Strategies: Keyframes, fixed intervals, scene-based, or every Nth frame.
- YouTube Integration: Direct URL processing with resolution control.
- Scene Intelligence: Automatically segments video into shots to optimize analysis.
- SAM 3 Integration: Precise subject segmentation and tracking across scenes.
- Open-Vocabulary Detection: Describe what you want to find (e.g., "a golden retriever") and let the AI find it.
- Face Analysis: Similarity matching, blink detection, and head pose estimation (yaw/pitch/roll).
- Perceptual Metrics: Real-time quality scoring to surface the "best" frames automatically.
- Interactive Sliders: Filter thousands of frames in real-time based on AI-calculated metrics.
- Smart Deduplication: Uses pHash and LPIPS to remove near-identical frames.
- AR-Aware Cropping: Export subject-centered crops in 1:1, 9:16, 16:9, or custom ratios.
- RAW Support: Extract high-resolution embedded previews from RAW files (CR2, NEF, ARW, DNG, ORF, etc.) using ExifTool. No demosaicing required for ultra-fast ingestion.
- Quality Culling: AI-powered scoring for focus, composition, and technical quality.
- Sharpness: Laplacian-variance edge detection to identify in-focus shots.
- Naturalness (NIQE): Perceptual quality score that measures how "natural" an image looks without needing a reference.
- Information (Entropy): Measures the complexity/detail density of the image.
- Face Prominence: Uses InsightFace to detect faces and score them based on confidence and size.
- Lightroom/C1 Interop: Export internal scores as 1-5 star ratings directly to non-destructive XMP sidecars.
- Segmentation: Segment Anything Model 3 (SAM 3)
- Face Analysis: InsightFace
- UI Framework: Gradio 6.x
- Data Science: PyTorch, NumPy, OpenCV, Pydantic
- Media Handling: FFmpeg, yt-dlp
- Database: SQLite (for lightning-fast metadata filtering)
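Storing per-frame metrics in SQLite is what makes slider-based filtering fast: a threshold change becomes a single indexed query instead of a scan over image files. A minimal sketch with a hypothetical schema (the real table layout is internal to the app):

```python
import sqlite3

# Hypothetical schema for illustration; lower NIQE = more "natural".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (path TEXT, sharpness REAL, niqe REAL)")
conn.executemany("INSERT INTO frames VALUES (?, ?, ?)", [
    ("f1.jpg", 120.5, 3.2),
    ("f2.jpg", 15.0, 7.9),
    ("f3.jpg", 300.1, 2.1),
])

# A "slider" filter: keep sharp, natural-looking frames, best first.
keep = conn.execute(
    "SELECT path FROM frames WHERE sharpness >= ? AND niqe <= ? "
    "ORDER BY sharpness DESC",
    (100.0, 5.0),
).fetchall()
# keep → [('f3.jpg',), ('f1.jpg',)]
```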
- Python 3.10+ (3.12 recommended)
- FFmpeg installed and in your system PATH.
- CUDA-capable GPU (highly recommended; ~8GB VRAM for SAM 3).
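A quick environment sanity check can be scripted before installing; a minimal sketch (the `check_prereqs` helper is illustrative, not shipped with the tool):

```python
import shutil
import sys

def check_prereqs() -> list[str]:
    """Return a list of human-readable problems; empty means ready to go."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ required")
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    return problems
```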
We highly recommend uv for its speed and reliability.
1. **Clone with Submodules**

   ```bash
   git clone --recursive https://github.com/tazztone/subject-frame-extractor.git
   cd subject-frame-extractor
   ```

   Note: Use `git submodule update --init --recursive` if you have already cloned the repository.

2. **Sync Environment**

   ```bash
   uv sync
   ```

3. **Launch**

   ```bash
   uv run python app.py
   ```

   Alternatively, on Linux:

   ```bash
   ./scripts/linux_run_app.sh
   ```

   Access the UI at http://127.0.0.1:7860.

Manual installation with pip:

```bash
python -m venv venv
. venv/bin/activate             # Linux/Mac
# .\venv\Scripts\activate.ps1   # Windows (PowerShell)
pip install -r requirements.txt
pip install -e SAM3_repo
```
The application provides a powerful CLI for automated extraction, analysis, and headless operation. Always use `uv run` to ensure the correct environment.
Extract thumbnails and detect scenes from a video:
```bash
uv run python cli.py extract --video path/to/video.mp4 --output ./results --nth-frame 10
```

- Caching: Subsequent runs with identical settings are skipped automatically using fingerprints.
- Force: Use `--force` to re-extract even if a fingerprint match is found.
- Clean: Use `--clean` to delete the output directory before starting.
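The fingerprint-based caching above can be sketched as hashing the video's identity together with the extraction settings, so a repeat run with identical inputs is detected and skipped (an illustrative sketch; the tool's actual fingerprint format may differ):

```python
import hashlib
import json
from pathlib import Path

def extraction_fingerprint(video: str, settings: dict) -> str:
    """Illustrative: stable hash over video identity + extraction settings."""
    p = Path(video)
    identity = {
        "name": p.name,
        "size": p.stat().st_size if p.exists() else None,
        "settings": settings,
    }
    # sort_keys makes the JSON (and therefore the hash) deterministic
    return hashlib.sha256(
        json.dumps(identity, sort_keys=True).encode()
    ).hexdigest()
```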
Run the full AI pipeline (seeding, tracking, metrics) on an existing extraction:
```bash
uv run python cli.py analyze --session ./results --video path/to/video.mp4 --face-ref person.png --resume
```

- Resume: Use `--resume` to skip already completed scenes (uses `progress.json`).
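Resume logic of this kind boils down to a set difference against `progress.json`; a minimal illustrative sketch (the real file layout and helper names may differ):

```python
import json
from pathlib import Path

def scenes_to_process(session_dir: Path, all_scene_ids: list[int]) -> list[int]:
    """Illustrative resume helper: skip scene IDs already marked completed."""
    progress_file = session_dir / "progress.json"
    done: set[int] = set()
    if progress_file.exists():
        data = json.loads(progress_file.read_text())
        done = set(data.get("completed_scenes", []))
    return [s for s in all_scene_ids if s not in done]
```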
Run extraction and analysis in one command:
```bash
uv run python cli.py full --video video.mp4 --output ./results --face-ref person.png
```

Check the progress and metadata of a session:

```bash
uv run python cli.py status --session ./results
```

Process image folders and sync ratings to sidecars:
```bash
# 1. Ingest folder (crawls images, extracts RAW previews)
uv run python cli.py photo ingest --folder /path/to/raws --output ./photo_session

# 2. Score photos (sharpness, naturalness, face prominence, etc.)
uv run python cli.py photo score --session ./photo_session

# 3. Export XMP sidecars (ratings and labels compatible with Lightroom)
uv run python cli.py photo export --session ./photo_session
```

- Source: Upload a video or paste a YouTube URL. Choose your extraction resolution.
- Extract: Run the extraction. The tool identifies scenes and generates thumbnails.
- Define Subject:
- Hybrid Seeding: Combine face reference with text descriptions and YOLO mask prompts for robust initialization.
- By Face: Upload a reference photo for similarity matching.
- By Text: Enter a description (e.g., "cat", "person in red").
- Auto: Let the AI select the most prominent subject.
- Analyze: Review "Scene Seeds". Run Propagation to track subjects through the video.
- Filter: Use sliders in the Metrics & Filtering tab to curate your dataset.
- Export: Select your crop settings and aspect ratio, then hit Export.
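The XMP export step maps internal scores onto the standard 1-5 star scale in non-destructive sidecars; a minimal illustrative sketch (the helper names and the score-to-star mapping are assumptions, not the tool's actual code, though `xmp:Rating` and `xmp:Label` are the standard Adobe XMP fields):

```python
from pathlib import Path

XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmp:Rating="{rating}" xmp:Label="{label}"/>
 </rdf:RDF>
</x:xmpmeta>
"""

def score_to_stars(score: float, lo: float = 0.0, hi: float = 1.0) -> int:
    """Map a normalized quality score onto the 1-5 star scale."""
    t = (score - lo) / (hi - lo)
    return max(1, min(5, 1 + round(t * 4)))

def write_sidecar(image_path: Path, rating: int, label: str = "") -> Path:
    """Write an .xmp sidecar next to the image (non-destructive)."""
    sidecar = image_path.with_suffix(".xmp")
    sidecar.write_text(XMP_TEMPLATE.format(rating=rating, label=label))
    return sidecar
```

Because the rating lives in a sidecar rather than the image file itself, Lightroom and Capture One pick it up on import without the original ever being modified.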
For detailed information on architecture, critical rules (Agent Memory), development workflows, and testing, refer to `AGENTS.md`.
See `core/config.py` for the full schema.
| Category | Key Fields | Default |
|---|---|---|
| Paths | `logs_dir`, `models_dir`, `downloads_dir` | `logs`, `models`, `downloads` |
| Models | `face_model_name`, `tracker_model_name` | `buffalo_l`, `sam3` |
| Performance | `analysis_default_workers`, `cache_size` | `4`, `200` |
| Quality | `quality_weights_*` | (variable weights) |
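Since the stack uses Pydantic, the schema in `core/config.py` plausibly looks something like this sketch (field names and defaults are taken from the table above; the class name and validators are assumptions):

```python
from pydantic import BaseModel, Field

class AppConfig(BaseModel):
    """Illustrative sketch only; see core/config.py for the real schema."""
    # Paths
    logs_dir: str = "logs"
    models_dir: str = "models"
    downloads_dir: str = "downloads"
    # Models
    face_model_name: str = "buffalo_l"
    tracker_model_name: str = "sam3"
    # Performance
    analysis_default_workers: int = Field(default=4, ge=1)
    cache_size: int = 200
```

A Pydantic model gives the config layer free validation: constructing `AppConfig(analysis_default_workers=0)` would raise instead of silently running with a broken worker pool.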
MIT License. See LICENSE for details.