Episcan

Match TV episode video files to the correct episodes by comparing audio transcriptions against episode subtitles (default) or episode descriptions.

Features

Audio Transcription: Uses OpenAI Whisper to transcribe video audio (loaded on-demand)
Time-Synchronized Comparison: Compares matching time segments so the transcript and the episode subtitles cover the same content
Smart Subtitle Track Selection: Finds the best-matching video subtitle track using similarity analysis
Comprehensive Subtitle Caching: OS-appropriate cache storage with automatic management
Enhanced Subtitle Provider Support:
- Local subtitle files (--subtitles-dir)
- Embedded video subtitles (--try-subtitles)
- Subliminal library (default) - Multiple providers with authentication support
- External ID matching (TMDB/TVDB/IMDB) for better provider accuracy
Robust Subtitle Retry System:
- Configurable retry attempts with exponential backoff (default: 5 retries)
- Smart failure handling (exit/prompt/continue after retries exhausted)
Intelligent Transcription Defaults:
- Subtitle comparison: 3 minutes starting at 1 minute (skips intros)
- Description comparison: Full episode transcription
Universal Subtitle Support: Handles SRT, WebVTT, ASS/SSA, MicroDVD, MPL2, TMP, and JSON formats via pysubs2
Advanced File Management: Smart conflict resolution for renaming with file preservation
Optimal Episode Matching: Uses SBERT embeddings and Hungarian algorithm for unique assignments
GPU Acceleration: CUDA support for both Whisper and sentence transformers
Memory Efficient: Conditional model loading saves resources when not needed
Progress Tracking: ETA calculations and detailed processing feedback
Multiple APIs: TMDB (preferred) and TVDB support with external ID enrichment

Quick Start

# Default: Use subliminal for subtitle downloads with 5 retry attempts
uv run python main.py /path/to/videos

# Use local subtitle files
uv run python main.py /path/to/videos --subtitles-dir /path/to/subtitles

# Compare against episode descriptions instead
uv run python main.py /path/to/videos --use-descriptions

# Custom retry behavior: 3 retries, then continue anyway
uv run python main.py /path/to/videos --subtitle-retries 3 --on-subtitle-failure continue

# Clear cache and disable caching for fresh downloads
uv run python main.py /path/to/videos --clear-cache --no-cache

Usage Examples

Basic Usage

# Default behavior: subtitle comparison using subliminal
episcan /path/to/videos

# Use local subtitle files for comparison
episcan /path/to/videos --subtitles-dir /path/to/subtitles

# Compare against episode descriptions (full transcription)
episcan /path/to/videos --use-descriptions

# Force full episode transcription even for subtitle comparison
episcan /path/to/videos --max-duration 0

Advanced Options

# Custom models, auto-rename, verbose output
episcan /path/to/videos \
    --whisper-model medium \
    --sbert-model sentence-transformers/all-MiniLM-L6-v2 \
    --rename auto \
    --verbose

# Try embedded subtitles first, fall back to Whisper
episcan /path/to/videos --try-subtitles

# Cache management for re-processing shows
episcan /path/to/videos --clear-cache  # Clear existing cache
episcan /path/to/videos --no-cache     # Disable caching entirely

# Adjust transcription timing
episcan /path/to/videos --max-duration 300 --start-offset 30

Environment Variables

export TMDB_API_KEY="your_tmdb_key"             # Preferred
export TVDB_API_KEY="your_tvdb_key"             # Fallback

# Optional: Subtitle provider authentication (improves success rates)
export ADDIC7ED_USERNAME="your_username"        # Addic7ed account
export ADDIC7ED_PASSWORD="your_password"
export OPENSUBTITLES_USERNAME="your_username"   # OpenSubtitles account  
export OPENSUBTITLES_PASSWORD="your_password"
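
The precedence noted in the comments above (TMDB preferred, TVDB as fallback) could be resolved roughly as follows. This is a minimal sketch, not code from main.py: the function name `pick_metadata_api` is hypothetical, `force_tvdb` mirrors the --force-tvdb option, and falling back to TMDB when TVDB is forced but no TVDB key is set is an assumption.

```python
import os


def pick_metadata_api(env=None, force_tvdb=False):
    """Return ("tmdb" | "tvdb", key), or (None, None) if no key is set.

    Hypothetical helper: it only illustrates the documented precedence.
    """
    env = os.environ if env is None else env
    tmdb_key = env.get("TMDB_API_KEY")
    tvdb_key = env.get("TVDB_API_KEY")
    # TVDB is used when forced, or when it is the only key available.
    if tvdb_key and (force_tvdb or not tmdb_key):
        return "tvdb", tvdb_key
    # Assumption: with no TVDB key, a TMDB key is still used even if forced.
    if tmdb_key:
        return "tmdb", tmdb_key
    return None, None
```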

How It Works

Subtitle Comparison (Default)

Check Cache: Looks for previously downloaded subtitles in OS-appropriate cache directory
Download Subtitles: Uses subliminal with enhanced provider configurations and external ID matching
Enhanced Matching: Utilizes TMDB/TVDB/IMDB IDs for better provider accuracy
Retry System: Automatically retries failed downloads with exponential backoff (2s→4s→8s→16s→32s)
Cache Storage: Saves downloaded subtitles with metadata for future use
Time-Synchronized Extraction: For partial transcription, extracts matching time segments from episode subtitles
Smart Track Selection: When using --try-subtitles, finds the video subtitle track most similar to the episode content
Fair Comparison: Compares equivalent content (a 3-minute transcript against the matching 3-minute subtitle segment)
Optimal Assignment: Uses Hungarian algorithm to ensure unique episode matches
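
The time-synchronized extraction step can be pictured as a simple window filter over subtitle cues. The sketch below is an illustration under assumptions: cues are plain `(start_seconds, end_seconds, text)` tuples rather than pysubs2 objects, the function name `extract_window` is hypothetical, and the defaults follow the documented --start-offset 60 and --max-duration 180.

```python
def extract_window(cues, start_offset=60.0, max_duration=180.0):
    """Join the text of cues overlapping [start_offset, start_offset + max_duration).

    cues: iterable of (start_seconds, end_seconds, text) tuples.
    """
    end = start_offset + max_duration
    picked = [text for start, stop, text in cues
              if stop > start_offset and start < end]
    return " ".join(picked)


cues = [
    (5.0, 8.0, "Previously on the show"),   # before the window, skipped
    (61.0, 64.0, "Good morning."),
    (200.0, 203.0, "We need to talk."),
    (245.0, 248.0, "See you tomorrow."),    # after the window, skipped
]
print(extract_window(cues))  # → Good morning. We need to talk.
```

Comparing the transcript of minutes 1 to 4 against only this slice of the episode subtitles keeps both sides of the comparison the same length and content.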

Description Comparison

Full Transcription: Transcribes entire episodes for comprehensive comparison
Metadata Matching: Compares transcripts against episode descriptions from TMDB/TVDB
Similarity Scoring: Uses cosine similarity with sentence transformers
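
As a rough illustration of similarity scoring followed by unique assignment, the sketch below computes cosine similarity in plain Python and picks the one-to-one assignment with the highest total score. It is not the tool's implementation: the dependency list names scipy for the Hungarian algorithm, whereas this dependency-free example uses brute force over permutations, which is only practical for a handful of files. Both function names are hypothetical, and the score matrix is assumed to be square.

```python
from itertools import permutations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_unique_assignment(scores):
    """Return, for each row, the column that maximizes the total score.

    Brute-force stand-in for the Hungarian algorithm; expects a square matrix.
    """
    n = len(scores)
    return max(permutations(range(n)),
               key=lambda cols: sum(scores[r][c] for r, c in enumerate(cols)))


# Rows are video files, columns are episodes.
scores = [[0.90, 0.80],
          [0.85, 0.10]]
print(best_unique_assignment(scores))  # → (1, 0)
```

A greedy pick would give the first file episode 0 (0.90) and leave the second with a 0.10 match; the global optimum swaps them for a total of 1.65.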

Subtitle Sources

Priority Order:

Local Files (--subtitles-dir) - Custom subtitle directory
Subliminal (default) - Enhanced provider support with authentication:
- Multiple provider strategies with intelligent fallback
- External ID matching (TMDB/TVDB/IMDB) for better accuracy
- Optional authentication for Addic7ed and OpenSubtitles (via environment variables)
- Configurable retry system with exponential backoff (default: 5 attempts)
- OS-appropriate cache storage (macOS: ~/Library/Caches/episcan/)
- Cache hits eliminate re-downloads for repeated processing
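
The retry schedule can be sketched as below. This is a minimal illustration, not the code in main.py: the names `backoff_delays` and `fetch_with_retries` are hypothetical, and treating the default of 5 as "one initial attempt plus up to five retries" is an assumption. The delays match the documented 2s→4s→8s→16s→32s sequence.

```python
import time


def backoff_delays(retries=5, base=2.0):
    """Delays in seconds before each retry: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]


def fetch_with_retries(fetch, retries=5, base=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None result or retries run out.

    Returns None if every attempt fails, leaving the exit/prompt/continue
    decision to the caller.
    """
    result = fetch()
    for delay in backoff_delays(retries, base):
        if result is not None:
            break
        sleep(delay)  # wait longer after each failure
        result = fetch()
    return result


print(backoff_delays())  # → [2.0, 4.0, 8.0, 16.0, 32.0]
```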

Why Subliminal?

Enhanced Provider Support: Multiple subtitle providers with authentication for higher success rates
External ID Matching: Uses TMDB/TVDB/IMDB IDs for more accurate content matching
Robust Retry System: Automatic retries with exponential backoff for temporary failures
Smart Caching: Prevents re-downloads with persistent storage and metadata
Format Support: Handles various subtitle formats automatically
Respectful: Built-in rate limiting and intelligent provider rotation
Zero Configuration: Works without API keys, but supports authentication for better results

Command Line Options

positional arguments:
  video_dir                Directory containing video files (default: current directory)

options:
  --tvdb-api-key           TVDB API key (or use TVDB_API_KEY environment variable)
  --tmdb-api-key           TMDB API key (or use TMDB_API_KEY environment variable)
  --subtitles-dir          Directory containing subtitle files for episodes
  --use-descriptions       Use episode descriptions instead of subtitles (default: use subtitles)
  --force-tvdb             Force TVDB even if TMDB key available
  --whisper-model          Whisper model: tiny, base, small, medium, large (default: base)
  --sbert-model            Sentence transformer model (default: all-mpnet-base-v2)
  --max-duration           Transcription duration in seconds; 0 = full episode (default: 180 for subtitle comparison, full episode for descriptions)
  --start-offset           Seconds to skip at the start, e.g. intros (default: 60)
  --rename                 File renaming: none, prompt, auto (default: none)
  --try-subtitles          Try embedded subtitles first, fall back to Whisper
  --subtitle-retries       Number of retry attempts for missing subtitles (default: 5)
  --on-subtitle-failure    Action when subtitles still missing after retries: exit, prompt, continue (default: exit)
  --clear-cache            Clear subtitle cache before processing
  --no-cache               Disable subtitle caching (always download fresh)
  --verbose                Detailed processing information

Supported Subtitle Formats

Thanks to pysubs2 integration, episcan supports all major subtitle formats:

SubRip (.srt)
WebVTT (.vtt)
Advanced SubStation Alpha (.ass, .ssa)
MicroDVD (.sub)
MPL2 (.mpl)
TMP (.tmp)
JSON subtitles

Example Output

Using TMDB API
Found series: Breaking Bad (TMDB ID: 1396, Year: 2008)
  External IDs - IMDB: tt0903747, TVDB: 81189
Loading sentence transformer model (sentence-transformers/all-mpnet-base-v2) on cuda...
Found 8 video files
Detected: Breaking Bad Season 1
Fetching episode subtitles using subliminal...
  Attempting to get subtitles for 8 episodes (checking cache first)...
✓ All episodes have subtitles
Processing 8 video files...
  1/8: episode1.mkv ✓ (12.3s)
  2/8: episode2.mkv ✓ (11.8s)
  ...

Calculating optimal matches...

=== FINAL MATCHES ===
✓ episode1.mkv -> S01E01 - Pilot
  Similarity: 0.847

→ episode5.mkv -> S01E05 - Gray Matter
  Similarity: 0.723

=== FILE RENAMING ===
Planned renames:
  episode1.mkv → Breaking Bad - S01E01 - Pilot.mkv
  episode5.mkv → Breaking Bad - S01E05 - Gray Matter.mkv

Renamed 8/8 files successfully

Performance Tips

Caching Benefits: Repeat runs on the same show are much faster because cached subtitles are not downloaded again
GPU Acceleration: Use CUDA-compatible hardware for 5-10x speed improvement
Memory Optimization: Use --try-subtitles to avoid Whisper loading when embedded subtitles are available
Cache Management:
- Cache persists between runs for faster re-processing
- Use --clear-cache when switching show versions or subtitle preferences
- Cache stored in OS-appropriate locations (Linux: ~/.cache/episcan/, Windows: %LOCALAPPDATA%\episcan\)
Model Selection:
- whisper-model base: Good balance of speed/accuracy (loaded on-demand)
- sbert-model all-MiniLM-L6-v2: Faster but less accurate than default
Transcription Optimization:
- Default 3-minute excerpts work well for most shows with time-synchronized comparison
- Use --max-duration 0 for very short episodes or poor matches
- Adjust --start-offset for shows with long intros
Subtitle Extraction: --try-subtitles can be much faster than transcription for videos with embedded subs
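
To check where the cache would live on a given machine, the documented locations can be approximated with the standard library alone. The dependency list names platformdirs for this, so the sketch below is only an approximation: the function name `default_cache_dir` is hypothetical, and honoring `XDG_CACHE_HOME` on Linux and falling back to `AppData/Local` on Windows are assumptions not stated in this README.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(platform=sys.platform, env=os.environ, home=None):
    """Approximate the documented episcan cache location for each OS."""
    home = Path.home() if home is None else Path(home)
    if platform == "darwin":
        return home / "Library" / "Caches" / "episcan"
    if platform.startswith("win"):
        # Assumption: fall back to AppData/Local when LOCALAPPDATA is unset.
        local = env.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(local) / "episcan"
    # Assumption: Linux and other Unix systems honor XDG_CACHE_HOME.
    return Path(env.get("XDG_CACHE_HOME") or home / ".cache") / "episcan"


print(default_cache_dir())
```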

Troubleshooting

Common Issues

Missing subtitles despite retries:

# Set provider credentials for better access
export ADDIC7ED_USERNAME="your_user"
export ADDIC7ED_PASSWORD="your_pass"

# Increase retry attempts
episcan /path/to/videos --subtitle-retries 10

# Continue anyway with partial coverage
episcan /path/to/videos --on-subtitle-failure continue

# Use description comparison as fallback
episcan /path/to/videos --use-descriptions

File renaming conflicts:

# Conflicts are resolved automatically; the conflicting file is kept under a new name with a short unique suffix
# Files are never deleted - conflicting files get UNMATCHED_ prefix
# Example: "Show - S01E01.mkv" becomes "UNMATCHED_Show - S01E01_a1b2c3d4.mkv"
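
The naming pattern in the example above could be produced along these lines. This is a sketch of the pattern only: the helper name `unmatched_name` is hypothetical, and using the first 8 hex characters of a random UUID for the suffix is an assumption based on the `a1b2c3d4` example.

```python
import uuid
from pathlib import Path


def unmatched_name(path, token=None):
    """Build "UNMATCHED_<stem>_<8 hex chars><suffix>" for a conflicting file."""
    path = Path(path)
    # Assumption: the suffix is the first 8 hex characters of a random UUID.
    token = token or uuid.uuid4().hex[:8]
    return path.with_name(f"UNMATCHED_{path.stem}_{token}{path.suffix}")


print(unmatched_name("Show - S01E01.mkv", token="a1b2c3d4"))
# → UNMATCHED_Show - S01E01_a1b2c3d4.mkv
```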

Poor matching accuracy:

# Try full episode transcription
episcan /path/to/videos --max-duration 0

# Use a larger Whisper model (all-mpnet-base-v2 is already the default SBERT model)
episcan /path/to/videos --whisper-model medium --sbert-model sentence-transformers/all-mpnet-base-v2

Subtitle provider rate limiting:

# Automatic exponential backoff handles most rate limiting
# For persistent issues, try local subtitle files
episcan /path/to/videos --subtitles-dir /path/to/subtitles

Dependencies

subliminal>=2.1.0 - Multi-provider subtitle downloads with dynamic discovery
pysubs2>=1.6.0 - Universal subtitle parsing
platformdirs>=3.0.0 - OS-appropriate cache directories
sentence-transformers - Text similarity embeddings
openai-whisper - Audio transcription
tmdbsimple - TMDB API client
tvdb-v4-official - TVDB API client
scipy - Hungarian algorithm for optimal matching
torch - GPU acceleration support

License

MIT License - see LICENSE file for details.
