AI-Powered, Privacy-First Media Organizer with Video Analysis & Search
Brings clarity to your clutter. 100% local. 100% private. Zero data leaves your machine.
ContentOrganizer is a comprehensive, AI-powered media management system that intelligently analyzes, organizes, and searches your image and video collections. Built with privacy-first principles, all processing happens locally on your machine with zero cloud dependencies.
- 📸 Smart Image Organization - AI-powered visual analysis and automatic folder structuring
- 🎬 Video Analysis & Tagging - Automated content analysis with visual + audio processing
- 🔍 Natural Language Search - Find videos with queries like "rope climbing videos" or "beach footage"
- 📦 Intelligent Export System - Automated retrieval and organization of matching content
- 🛡️ 100% Local & Private - All AI processing runs on your machine
- ⚡ Lightning Fast Search - Redis caching + vector embeddings for instant results
git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer
./setup.sh
# Activate environment
source venv/bin/activate
# Analyze your videos
python -m contentorganizer --video-analyze /path/to/videos
# Search with natural language
python -m contentorganizer --video-search "rope climbing videos"
# Start API server
python -m contentorganizer --api
🔍 AI Analysis Pipeline
- Visual Analysis: Extracts frames and identifies objects, scenes, activities
- Audio Transcription: Uses Whisper to transcribe speech and identify sounds
- Semantic Tagging: Generates meaningful tags from visual + audio content
- Embedding Generation: Creates vector embeddings for similarity search
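As a rough illustration of the frame-extraction step, the sketch below computes which timestamps to sample from a video. It is not ContentOrganizer's code: the function name `sample_timestamps`, the `max_frames` cap, and the reading of the rate as "frames per second" are all assumptions made for the example.

```python
from typing import List


def sample_timestamps(
    duration_s: float, frames_per_second: float, max_frames: int = 500
) -> List[float]:
    """Return evenly spaced timestamps (in seconds) at which to grab frames.

    Hypothetical helper: the cap keeps very long videos from producing
    thousands of frames for the vision model.
    """
    if duration_s <= 0 or frames_per_second <= 0 or max_frames <= 0:
        return []
    # Number of frames requested by the rate, at least one, capped at max_frames.
    count = min(max(1, int(duration_s * frames_per_second)), max_frames)
    step = duration_s / count
    # Sample the middle of each interval so the very first and last
    # frames (often black or fading) are avoided.
    return [round((i + 0.5) * step, 3) for i in range(count)]
```

With a 4-second clip and a rate of 0.5, this yields two timestamps, one in the middle of each 2-second window. The frames themselves would then be read with a library such as OpenCV.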
💾 Smart Storage
- Redis: High-speed caching for instant search results
- LanceDB: Vector database for semantic similarity matching
- Metadata: Comprehensive video information with timestamps
🔍 Natural Language Search
- Semantic Understanding: Finds related content (e.g., "rope" matches "shibari")
- Advanced Filtering: Filter by date, duration, tags, file type
- Relevance Scoring: Results ranked by semantic similarity
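The filtering and ranking described above can be sketched in plain Python. This is a stand-in for illustration only: the real project delegates vector search to LanceDB, and the record fields used here (`file_name`, `duration`, `embedding`) and the function names are assumptions, not the project's schema.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_videos(
    query_vec: Sequence[float],
    videos: List[Dict],
    min_duration: float = 0.0,
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Apply the cheap metadata filter first, then rank by similarity."""
    candidates = [v for v in videos if v["duration"] >= min_duration]
    scored = [
        (v["file_name"], cosine_similarity(query_vec, v["embedding"]))
        for v in candidates
    ]
    # Highest similarity first; sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Filtering before scoring keeps the expensive similarity computation to the videos that can actually appear in the results.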
📦 Automated Export
- Smart Organization: Creates timestamped export folders
- Manifest Tracking: JSON files with complete export metadata
- Flexible Options: Copy files or create symlinks
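A minimal sketch of the copy-or-symlink export with manifest tracking might look like the following. The function name `export_files` and the shape of the error entries are assumptions; the manifest keys mirror the ones shown in the sample manifest in this README, but the project's actual implementation may differ.

```python
import json
import shutil
from pathlib import Path
from typing import Iterable


def export_files(
    paths: Iterable[Path], dest: Path, query: str, use_symlinks: bool = False
) -> dict:
    """Copy (or symlink) files into dest and write export_manifest.json."""
    dest.mkdir(parents=True, exist_ok=True)
    manifest = {
        "query": query,
        "export_mode": "symlink" if use_symlinks else "copy",
        "results": [],
        "errors": [],
    }
    for src in paths:
        target = dest / src.name
        try:
            if use_symlinks:
                target.symlink_to(src.resolve())
            else:
                shutil.copy2(src, target)  # copy2 also preserves timestamps
            manifest["results"].append(
                {"file_name": src.name, "original_path": str(src.resolve())}
            )
        except OSError as exc:
            # Record the failure and keep going (assumed error format).
            manifest["errors"].append({"file_name": src.name, "error": str(exc)})
    manifest["exported_videos"] = len(manifest["results"])
    (dest / "export_manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Symlinks save disk space but break if the originals move, which is one reason to keep copying as the default.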
# Basic search
python -m contentorganizer --video-search "rope videos"
# Advanced search with filters
python -m contentorganizer --video-search "climbing footage" \
--after "2024-01-01" \
--min-duration 60 \
--export ./my_exports
# API search
curl "http://localhost:8000/api/v1/search?q=rope%20climbing&limit=10"
Example queries:
- "rope climbing videos" → Finds climbing, rope work, and related activities
- "shibari sessions" → Semantic search for rope bondage and related content
- "beach sunset footage" → Coastal scenes, sunsets, ocean content
- "outdoor performance art" → Nature settings with artistic performances
- "forest aerial work" → Tree/forest environments with aerial activities
# Run interactive image organization
python -m contentorganizer --images
# Or use the default mode
python -m contentorganizer
Process:
- Select your image directory
- AI analyzes each image for content, objects, scenes
- Preview suggested folder structure
- Confirm and organize automatically
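The "preview, then confirm" step can be thought of as building a move plan without touching the disk. The sketch below is illustrative only: `plan_folders`, the label-to-folder naming rule, and the `unsorted` fallback are assumptions, and the labels would come from the image analyzer in practice.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def plan_folders(labels: Dict[str, str], root: str) -> Dict[str, List[str]]:
    """Group image file names by the folder each would be moved into.

    labels maps an image path to the label the analyzer assigned to it.
    Nothing is moved; the result is only a preview.
    """
    plan: Dict[str, List[str]] = defaultdict(list)
    for image_path, label in sorted(labels.items()):
        # Normalize the label into a folder name (assumed convention).
        folder = label.strip().lower().replace(" ", "_") or "unsorted"
        plan[str(PurePosixPath(root) / folder)].append(
            PurePosixPath(image_path).name
        )
    return dict(plan)
```

Only after the user confirms the preview would the files actually be moved, for example with `shutil.move`.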
# Analyze all videos in a directory
python -m contentorganizer --video-analyze /path/to/videos
# Analyze with custom settings
python -m contentorganizer --video-analyze /path/to/videos --batch-size 5 --skip-transcription
# Basic search
python -m contentorganizer --video-search "your query"
# Search with export
python -m contentorganizer --video-search "rope videos" --export ./exports
# Advanced filtering
python -m contentorganizer --video-search "climbing" \
--after "2024-01-01" \
--before "2025-01-01" \
--min-duration 30 \
--max-duration 600 \
--limit 50 \
--export ./exports
# Use symlinks instead of copying
python -m contentorganizer --video-search "shibari" --export ./exports --symlink
# Move videos from source to organized destination
python -m contentorganizer --video-organize /source/path /dest/path
# Auto-scan and organize videos
python -m contentorganizer --video-auto-scan /path/to/scan
# Separate videos by file extension
python -m contentorganizer --video-separate /path/to/videos
# Default settings (localhost:8000)
python -m contentorganizer --api
# Custom host and port
python -m contentorganizer --api --host 0.0.0.0 --port 9000
# Development mode with auto-reload
python -m contentorganizer --api --reload
Search Videos:
# GET request with query parameters
GET /api/v1/search?q=rope%20climbing&limit=10&after=2024-01-01
# POST request with JSON body
POST /api/v1/search
{
"query": "rope climbing videos",
"after": "2024-01-01",
"min_duration": 30,
"limit": 50
}
Export Videos:
# Export specific video IDs
POST /api/v1/export
{
"video_ids": ["video1", "video2", "video3"],
"export_path": "/path/to/export",
"query": "rope videos",
"copy_mode": true
}
# Check export status
GET /api/v1/export/{task_id}/status
Interactive Documentation:
- Open http://localhost:8000/docs for complete API documentation
- Try endpoints directly in the browser interface
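For scripted access, the search endpoints above can be called from Python's standard library. The sketch below only builds the requests. The helper names are hypothetical, and the response format is not documented here, so it is left for the caller to inspect.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000/api/v1"  # default host and port from this README


def build_search_url(query: str, **filters) -> str:
    """Build the GET search URL; filters with a value of None are skipped."""
    params = {"q": query}
    params.update({k: v for k, v in filters.items() if v is not None})
    # quote_via=quote encodes spaces as %20, matching the examples above.
    return f"{BASE_URL}/search?{urlencode(params, quote_via=quote)}"


def build_search_request(query: str, **filters) -> Request:
    """Build the POST search request with a JSON body."""
    body = {"query": query}
    body.update({k: v for k, v in filters.items() if v is not None})
    return Request(
        f"{BASE_URL}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, `urllib.request.urlopen(build_search_request("rope climbing videos", limit=50))` would send the request; the interactive docs at `/docs` describe the actual response schema.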
- Python: 3.8 or later
- Operating System: Linux, macOS, Windows (WSL recommended)
- Memory: 4GB RAM minimum, 8GB+ recommended
- Storage: 10GB free space for AI models and databases
- Redis: For caching and fast lookups
- FFmpeg: For video processing (auto-installed by setup script)
# Clone repository
git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer
# Run automated setup
./setup.sh
The setup script automatically:
- ✅ Checks Python version compatibility
- ✅ Installs system dependencies (Redis, FFmpeg, etc.)
- ✅ Creates Python virtual environment
- ✅ Installs all Python packages
- ✅ Downloads AI models (Whisper, sentence-transformers, vision models)
- ✅ Sets up database directories and configuration
- ✅ Runs installation tests
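If the setup script fails partway, a quick preflight check can show which prerequisites are missing. This sketch is not what `setup.sh` runs; the function name `preflight` and the choice of binaries to look for are assumptions.

```python
import shutil
import sys
from typing import Dict, Iterable

MIN_PYTHON = (3, 8)  # minimum version listed in the requirements


def preflight(tools: Iterable[str] = ("ffmpeg", "redis-server")) -> Dict[str, bool]:
    """Report whether Python is new enough and each tool is on PATH."""
    report = {"python": sys.version_info >= MIN_PYTHON}
    for tool in tools:
        # shutil.which returns None when the executable is not found.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```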
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install core packages
pip install -e .
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y redis-server ffmpeg python3-opencv
sudo systemctl start redis-server
macOS:
brew install redis ffmpeg opencv
brew services start redis
Arch Linux:
sudo pacman -S redis ffmpeg opencv
sudo systemctl start redis
# Download required models (done automatically on first run)
python -c "
import whisper
import sentence_transformers
whisper.load_model('base')
sentence_transformers.SentenceTransformer('all-MiniLM-L6-v2')
"
Create config.yml in the project root:
# Redis settings
redis:
  host: localhost
  port: 6379
  db: 0
# Database settings
database:
  path: ./video_db
  table_name: videos
# AI Models
models:
  embedding_model: all-MiniLM-L6-v2
  whisper_model: base
  vision_model: google/vit-base-patch16-224
# API settings
api:
  host: 0.0.0.0
  port: 8000
  cors_origins: ["*"]
# Export settings
export:
  base_path: ./exports
  max_file_size_mb: 1000
  copy_mode: true
Supported image formats: .png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp
Supported video formats: .mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v
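The two extension lists above are enough to route a file to the image or video pipeline. A minimal sketch, assuming a case-insensitive match on the file suffix (the function name `media_kind` is hypothetical):

```python
from pathlib import PurePath

# Extension sets copied from the supported-format lists in this README.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".wmv", ".flv", ".webm", ".m4v"}


def media_kind(path: str) -> str:
    """Return 'image', 'video', or 'other' based on the file extension."""
    ext = PurePath(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    return "other"
```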
When you export videos, they're automatically organized:
exports/
├── rope_climbing_videos_2025-10-12_1430/
│ ├── export_manifest.json
│ ├── mountain_climbing_session.mp4
│ ├── rock_climbing_tutorial.mov
│ └── rope_access_work.mp4
├── shibari_sessions_2025-10-12_1445/
│ ├── export_manifest.json
│ ├── basic_ties_tutorial.mp4
│ └── advanced_suspension.mov
└── beach_sunset_footage_2025-10-12_1500/
    ├── export_manifest.json
    ├── golden_hour_beach.mp4
    └── ocean_waves_sunset.mov
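The folder names in the tree above follow a "query slug plus timestamp" pattern. One way such a name could be derived is sketched below. The function name and the slug rules are assumptions inferred from the example names, not taken from the project's code.

```python
import re
from datetime import datetime


def export_folder_name(query: str, when: datetime) -> str:
    """Turn a search query and a timestamp into an export folder name."""
    # Lowercase, collapse anything that is not a letter or digit into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_") or "export"
    return f"{slug}_{when:%Y-%m-%d_%H%M}"
```

For the query "rope climbing videos" exported at 14:30 on 2025-10-12, this produces `rope_climbing_videos_2025-10-12_1430`, matching the first folder in the tree.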
Each export includes a comprehensive manifest file:
{
"query": "rope climbing videos",
"export_date": "2025-10-12T14:30:00Z",
"export_mode": "copy",
"filters": {
"after": "2024-01-01",
"min_duration": 30
},
"total_videos": 15,
"exported_videos": 12,
"errors": [],
"results": [
{
"file_name": "mountain_climbing_session.mp4",
"original_path": "/mnt/f/sorted-videos/mp4_videos/mountain_climbing_session.mp4",
"tags": ["rope", "climbing", "mountain", "outdoor", "sport"],
"duration": 245.6,
"file_size": 89456123,
"score": 0.94
}
]
}
┌─────────────────────────────────────────────────────────────┐
│ ContentOrganizer │
├─────────────────────────────────────────────────────────────┤
│ CLI Interface │ REST API Server │
│ - Image Organization │ - FastAPI with OpenAPI │
│ - Video Operations │ - Background Tasks │
│ - Search & Export │ - CORS Support │
├─────────────────────────────────────────────────────────────┤
│ Core Processing Layer │
│ ┌───────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Image │ │ Video │ │ Search & │ │
│ │ Analyzer │ │ Analyzer │ │ Export │ │
│ │ - Visual AI │ │ - Frame Ext. │ │ - Embeddings │ │
│ │ - Text Ext. │ │ - Whisper │ │ - Filtering │ │
│ │ - Metadata │ │ - Tagging │ │ - Ranking │ │
│ └───────────────┘ └────────────────┘ └────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Storage & Caching │
│ ┌───────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Redis │ │ LanceDB │ │ File System │ │
│ │ - Fast Cache │ │ - Vector DB │ │ - Original │ │
│ │ - Search Cache│ │ - Embeddings │ │ - Organized │ │
│ │ - Metadata │ │ - Similarity │ │ - Exports │ │
│ └───────────────┘ └────────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Whisper (OpenAI): Audio transcription and sound recognition
- Sentence Transformers: Semantic embeddings for similarity search
- Vision Transformers: Visual content analysis and tagging
- Llama 3.2: Text processing and metadata generation
- OpenCV: Video frame extraction and processing
- Caching Strategy: Multi-layer caching with Redis for instant repeated searches
- Batch Processing: Efficient video analysis with configurable batch sizes
- Vector Search: Sub-second similarity search with LanceDB embeddings
- Background Tasks: Non-blocking exports with progress tracking
- Smart Filtering: Pre-filter before expensive operations
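To make the caching idea concrete, here is a small in-memory sketch of a search cache with expiry. It stands in for Redis purely for illustration: the key format, the class `TTLCache`, and the function `cache_key` are assumptions, not the project's implementation.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(query: str, filters: Dict[str, Any]) -> str:
    """Derive a stable key so equivalent searches hit the same cache entry."""
    payload = json.dumps({"q": query.strip().lower(), "f": filters}, sort_keys=True)
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value
```

With Redis, the same effect comes from storing the serialized results under the key with an expiry, for example `SET key value EX 3600`.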
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
# Database Configuration
DB_PATH=./video_db
DB_TABLE=videos
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
# Model Configuration
WHISPER_MODEL=base
EMBEDDING_MODEL=all-MiniLM-L6-v2
VISION_MODEL=google/vit-base-patch16-224
# Processing Configuration
BATCH_SIZE=10
MAX_WORKERS=4
CACHE_TTL=3600
You can customize AI models in config.yml:
models:
  # Whisper models: tiny, base, small, medium, large
  whisper_model: base
  # Sentence transformer models
  embedding_model: all-MiniLM-L6-v2
  # Alternative: all-mpnet-base-v2 (better quality, slower)
  # Vision models
  vision_model: google/vit-base-patch16-224
  # Alternative: microsoft/resnet-50
For High-Volume Processing:
video:
  analysis_batch_size: 20 # Process more videos at once
  max_workers: 8 # Use more CPU cores
  skip_transcription: false # Set true to speed up analysis
  frame_sample_rate: 2 # Extract fewer frames per second
cache:
  redis_ttl: 7200 # Longer cache retention
  max_cache_size: 1000 # More cached searches
For Resource-Constrained Systems:
video:
  analysis_batch_size: 3 # Process fewer videos at once
  max_workers: 2 # Use fewer CPU cores
  frame_sample_rate: 0.5 # Extract fewer frames
models:
  whisper_model: tiny # Smaller, faster model
  embedding_model: all-MiniLM-L6-v2 # Lightweight embedding model
# Redis connection failed
sudo systemctl status redis-server
sudo systemctl start redis-server
# Python dependencies conflict
pip install --upgrade pip
pip install -e . --force-reinstall
# FFmpeg not found
sudo apt-get install ffmpeg # Ubuntu/Debian
brew install ffmpeg # macOS
# Model download failed
python -c "import whisper; whisper.load_model('base')"
# OpenCV issues
pip uninstall opencv-python
pip install opencv-python-headless
# Memory issues during analysis
# Reduce batch size in config.yml:
#   analysis_batch_size: 3
# No search results
# Check if videos have been analyzed
python -m contentorganizer --video-search "test" --limit 1
# Export permission denied
chmod 755 /export/directory
chown $USER:$USER /export/directory
# API server won't start
# Check if port is available
netstat -tulpn | grep :8000
Enable detailed logging:
# Set debug environment
export PYTHONPATH=/path/to/ContentOrganizer/src
export LOG_LEVEL=DEBUG
# Run with verbose output
python -m contentorganizer --video-search "test" --limit 1 -v
Monitor system performance:
# Check Redis memory usage
redis-cli info memory
# Monitor API server
curl http://localhost:8000/api/v1/stats
# Check database size
du -sh ./video_db/
We welcome contributions! Here's how to get started:
# Clone and setup development environment
git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer
./setup.sh
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Code formatting
black src/
flake8 src/
ContentOrganizer/
├── src/contentorganizer/ # Main package
│ ├── main.py # CLI entry point
│ ├── video_analyzer.py # Video analysis engine
│ ├── video_search.py # Search functionality
│ ├── video_api.py # REST API server
│ ├── image_data_processing.py # Image analysis
│ └── sortphoto/ # File utilities
├── tests/ # Test suite
├── docs/ # Documentation
├── config.yml # Configuration
├── setup.sh # Installation script
└── demo.py # Interactive demo
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Write tests for your feature
- Implement the feature following existing patterns
- Update documentation if needed
- Submit a pull request
- 📖 API Documentation (when server is running)
- 🎥 Video Tutorial (coming soon)
- 🔧 Advanced Configuration (coming soon)
- Nexa AI - Local AI model management
- Whisper - Audio transcription
- LanceDB - Vector database
- Redis - In-memory data structure store
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for Whisper transcription models
- Hugging Face for transformer models and sentence-transformers
- Redis Labs for the Redis caching system
- LanceDB team for the vector database
- FastAPI team for the excellent web framework
- All contributors and users who make this project better
Made with ❤️ for privacy-conscious users who want intelligent media organization without compromising their data.