🧠 ContentOrganizer

AI-Powered, Privacy-First Media Organizer with Video Analysis & Search

Brings clarity to your clutter. 100% local. 100% private. Zero data leaves your machine.

🎯 What is ContentOrganizer?

ContentOrganizer is an AI-powered media management system that analyzes, organizes, and searches your image and video collections. It is built on privacy-first principles: all processing happens locally on your machine, with no cloud dependencies.

🌟 Key Capabilities

📸 Smart Image Organization - AI-powered visual analysis and automatic folder structuring
🎬 Video Analysis & Tagging - Automated content analysis with visual + audio processing
🔍 Natural Language Search - Find videos with queries like "rope climbing videos" or "beach footage"
📦 Intelligent Export System - Automated retrieval and organization of matching content
🛡️ 100% Local & Private - All AI processing runs on your machine
⚡ Lightning Fast Search - Redis caching + vector embeddings for instant results

🚀 Quick Start

One-Command Installation

git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer
./setup.sh

Immediate Usage

# Activate environment
source venv/bin/activate

# Analyze your videos
python -m contentorganizer --video-analyze /path/to/videos

# Search with natural language
python -m contentorganizer --video-search "rope climbing videos"

# Start API server
python -m contentorganizer --api

🎬 Video Analysis & Search System

How It Works

🔍 AI Analysis Pipeline
- Visual Analysis: Extracts frames and identifies objects, scenes, activities
- Audio Transcription: Uses Whisper to transcribe speech and identify sounds
- Semantic Tagging: Generates meaningful tags from visual + audio content
- Embedding Generation: Creates vector embeddings for similarity search
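The frame-extraction step comes down to choosing which frame indices to decode. The sketch below assumes the sampling rate is expressed in frames sampled per second of video, as the `frame_sample_rate` comments under Performance Tuning suggest. `sample_frame_indices` is a hypothetical helper and is not part of ContentOrganizer's API.

```python
from __future__ import annotations

import math


def sample_frame_indices(fps: float, frame_count: int, sample_rate: float) -> list[int]:
    """Return evenly spaced frame indices, taking `sample_rate` frames per second.

    Hypothetical helper: assumes sample_rate is measured in frames per second of video.
    """
    if fps <= 0 or frame_count <= 0 or sample_rate <= 0:
        return []
    duration = frame_count / fps              # video length in seconds
    step = 1.0 / sample_rate                  # seconds between sampled frames
    n_samples = math.floor(duration * sample_rate)
    indices = []
    for i in range(n_samples):
        idx = int(round(i * step * fps))      # convert sample time back to a frame index
        if idx < frame_count:
            indices.append(idx)
    return indices
```

With OpenCV you would read `fps` and `frame_count` from `cap.get(cv2.CAP_PROP_FPS)` and `cap.get(cv2.CAP_PROP_FRAME_COUNT)` on a `cv2.VideoCapture`. You would then seek to each index with `cap.set(cv2.CAP_PROP_POS_FRAMES, idx)` before calling `cap.read()`.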
💾 Smart Storage
- Redis: High-speed caching for instant search results
- LanceDB: Vector database for semantic similarity matching
- Metadata: Comprehensive video information with timestamps
🔍 Natural Language Search
- Semantic Understanding: Finds related content (e.g., "rope" matches "shibari")
- Advanced Filtering: Filter by date, duration, tags, file type
- Relevance Scoring: Results ranked by semantic similarity
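Relevance scoring usually means comparing a query embedding against stored video embeddings with cosine similarity. In ContentOrganizer the vector search itself runs in LanceDB. This plain-Python sketch only illustrates the idea, with cheap filters applied before scoring. The record fields (`file_name`, `embedding`, `duration`, `created`) and the function names are hypothetical. The toy 3-dimensional vectors used to exercise it stand in for real sentence-transformer embeddings.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity dot(a, b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, videos, after=None, min_duration=None, limit=10):
    """Apply the cheap filters first, then sort the survivors by cosine score."""
    candidates = [
        v for v in videos
        if (after is None or v["created"] >= after)
        and (min_duration is None or v["duration"] >= min_duration)
    ]
    scored = [(cosine(query_vec, v["embedding"]), v["file_name"]) for v in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)   # highest similarity first
    return scored[:limit]
```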
📦 Automated Export
- Smart Organization: Creates timestamped export folders
- Manifest Tracking: JSON files with complete export metadata
- Flexible Options: Copy files or create symlinks
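The copy-or-symlink option can be illustrated with the standard library alone. `export_files` is a hypothetical helper, and the real exporter may handle name collisions and errors differently. On Windows, creating symbolic links may require elevated privileges or Developer Mode.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def export_files(sources: list[Path], dest_dir: Path, symlink: bool = False) -> list[Path]:
    """Copy (or symlink) each source file into dest_dir and return the new paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    exported = []
    for src in sources:
        target = dest_dir / src.name
        if symlink:
            # Absolute link target, so the link still resolves if the export folder is moved.
            os.symlink(src.resolve(), target)
        else:
            # copy2 also preserves timestamps and other metadata where the OS allows.
            shutil.copy2(src, target)
        exported.append(target)
    return exported
```

Symlinks cost no extra disk space but break if the originals move. Copies are independent of the source library.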

Search Examples

# Basic search
python -m contentorganizer --video-search "rope videos"

# Advanced search with filters
python -m contentorganizer --video-search "climbing footage" \
  --after "2024-01-01" \
  --min-duration 60 \
  --export ./my_exports

# API search
curl "http://localhost:8000/api/v1/search?q=rope%20climbing&limit=10"
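The same GET request can be built from Python with the standard library. This sketch assumes the server is on the default `localhost:8000`. It encodes spaces as `%20` to match the curl example. The `after` parameter is taken from the API endpoint examples. Whether other filters are accepted as query parameters is not confirmed here.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default API host and port


def build_search_url(query: str, limit: int = 10, after: str | None = None) -> str:
    params = {"q": query, "limit": limit}
    if after is not None:
        params["after"] = after
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    return f"{BASE_URL}/api/v1/search?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)


def search(query: str, limit: int = 10) -> object:
    """Send the request and decode the JSON body (the response schema is not assumed)."""
    with urllib.request.urlopen(build_search_url(query, limit), timeout=10) as resp:
        return json.load(resp)
```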

Natural Language Queries

"rope climbing videos" → Finds climbing, rope work, and related activities
"shibari sessions" → Semantic search for rope bondage and related content
"beach sunset footage" → Coastal scenes, sunsets, ocean content
"outdoor performance art" → Nature settings with artistic performances
"forest aerial work" → Tree/forest environments with aerial activities

📖 Complete Usage Guide

Image Organization (Classic Mode)

# Run interactive image organization
python -m contentorganizer --images

# Or use the default mode
python -m contentorganizer

Process:

Select your image directory
AI analyzes each image for content, objects, scenes
Preview suggested folder structure
Confirm and organize automatically

Video Operations

Analyze Videos

# Analyze all videos in a directory
python -m contentorganizer --video-analyze /path/to/videos

# Analyze with custom settings
video-analyze /path/to/videos --batch-size 5 --skip-transcription

Search & Export Videos

# Basic search
python -m contentorganizer --video-search "your query"

# Search with export
python -m contentorganizer --video-search "rope videos" --export ./exports

# Advanced filtering
python -m contentorganizer --video-search "climbing" \
  --after "2024-01-01" \
  --before "2025-01-01" \
  --min-duration 30 \
  --max-duration 600 \
  --limit 50 \
  --export ./exports

# Use symlinks instead of copying
python -m contentorganizer --video-search "shibari" --export ./exports --symlink

Organize Video Files

# Move videos from source to organized destination
python -m contentorganizer --video-organize /source/path /dest/path

# Auto-scan and organize videos
python -m contentorganizer --video-auto-scan /path/to/scan

# Separate videos by file extension
python -m contentorganizer --video-separate /path/to/videos
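Separating by extension amounts to grouping files into one folder per extension. This dry-run sketch computes the planned moves without touching the disk. The `<ext>_videos` folder naming is an assumption, inferred from the `mp4_videos` path in the export manifest example below. `plan_separation` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

# Video extensions listed under "Supported File Types".
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".wmv", ".flv", ".webm", ".m4v"}


def plan_separation(files: list[Path], dest_root: Path) -> dict[Path, Path]:
    """Map each video file to dest_root/<ext>_videos/<name>; non-video files are skipped."""
    plan = {}
    for f in files:
        ext = f.suffix.lower()            # case-insensitive match: ".MP4" -> ".mp4"
        if ext in VIDEO_EXTS:
            plan[f] = dest_root / f"{ext[1:]}_videos" / f.name
    return plan
```

Reviewing the plan before applying it with `shutil.move` is one way to keep the operation reversible.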

REST API Server

Start Server

# Default settings (localhost:8000)
python -m contentorganizer --api

# Custom host and port
python -m contentorganizer --api --host 0.0.0.0 --port 9000

# Development mode with auto-reload
python -m contentorganizer --api --reload

API Endpoints

Search Videos:

# GET request with query parameters
GET /api/v1/search?q=rope%20climbing&limit=10&after=2024-01-01

# POST request with JSON body
POST /api/v1/search
{
  "query": "rope climbing videos",
  "after": "2024-01-01",
  "min_duration": 30,
  "limit": 50
}
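From Python, the POST form can be built with `urllib.request`. This sketch only constructs the request, because sending it requires the server to be running. The body fields are exactly those shown above.

```python
from __future__ import annotations

import json
import urllib.request


def build_search_request(body: dict, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /api/v1/search."""
    return urllib.request.Request(
        f"{base_url}/api/v1/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    {"query": "rope climbing videos", "after": "2024-01-01", "min_duration": 30, "limit": 50}
)
# To send it: with urllib.request.urlopen(req, timeout=10) as resp: print(json.load(resp))
```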

Export Videos:

# Export specific video IDs
POST /api/v1/export
{
  "video_ids": ["video1", "video2", "video3"],
  "export_path": "/path/to/export",
  "query": "rope videos",
  "copy_mode": true
}

# Check export status
GET /api/v1/export/{task_id}/status
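Exports run as background tasks, so a client would typically poll the status endpoint. This sketch assumes the response is JSON with a `status` field whose terminal values are `completed` or `failed`. Those names are hypothetical, so check the actual schema at `/docs`. The fetch function is injectable so the loop can be exercised without a server.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Callable


def fetch_status(task_id: str, base_url: str = "http://localhost:8000") -> dict:
    url = f"{base_url}/api/v1/export/{task_id}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_export(task_id: str, fetch: Callable[[str], dict] = fetch_status,
                    interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll until the (assumed) status field reaches a terminal value, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(task_id)
        if status.get("status") in {"completed", "failed"}:   # hypothetical terminal values
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"export {task_id} still running after {timeout}s")
        time.sleep(interval)
```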

Interactive Documentation:

Open http://localhost:8000/docs for complete API documentation
Try endpoints directly in the browser interface

🛠️ Installation & Setup

System Requirements

Python: 3.8 or later
Operating System: Linux, macOS, Windows (WSL recommended)
Memory: 4GB RAM minimum, 8GB+ recommended
Storage: 10GB free space for AI models and databases
Redis: For caching and fast lookups
FFmpeg: For video processing (auto-installed by setup script)
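Some of these requirements can be confirmed with a small preflight check before running the setup script. This standard-library sketch checks the Python version, the `ffmpeg` and `redis-server` executables on `PATH`, and free disk space. It cannot verify RAM. A binary on `PATH` also does not prove the Redis service is running.

```python
from __future__ import annotations

import shutil
import sys

MIN_PYTHON = (3, 8)
MIN_FREE_GB = 10  # "10GB free space for AI models and databases"


def preflight(path: str = ".") -> dict[str, bool]:
    """Return a pass/fail map for the requirements that can be checked locally."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python>=3.8": sys.version_info[:2] >= MIN_PYTHON,
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "redis-server on PATH": shutil.which("redis-server") is not None,
        f"{MIN_FREE_GB}GB free": free_gb >= MIN_FREE_GB,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'OK  ' if ok else 'FAIL'} {check}")
```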

Automated Installation

# Clone repository
git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer

# Run automated setup
./setup.sh

The setup script automatically:

✅ Checks Python version compatibility
✅ Installs system dependencies (Redis, FFmpeg, etc.)
✅ Creates Python virtual environment
✅ Installs all Python packages
✅ Downloads AI models (Whisper, sentence-transformers, vision models)
✅ Sets up database directories and configuration
✅ Runs installation tests

Manual Installation

1. Core Dependencies

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install core packages
pip install -e .

2. System Dependencies

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install -y redis-server ffmpeg python3-opencv
sudo systemctl start redis-server

macOS:

brew install redis ffmpeg opencv
brew services start redis

Arch Linux:

sudo pacman -S redis ffmpeg opencv
sudo systemctl start redis

3. AI Models

# Download required models (done automatically on first run)
python -c "
import whisper
import sentence_transformers
whisper.load_model('base')
sentence_transformers.SentenceTransformer('all-MiniLM-L6-v2')
"

Configuration

Edit config.yml in the project root (the repository ships one; create it if it is missing):

# Redis settings
redis:
  host: localhost
  port: 6379
  db: 0

# Database settings  
database:
  path: ./video_db
  table_name: videos

# AI Models
models:
  embedding_model: all-MiniLM-L6-v2
  whisper_model: base
  vision_model: google/vit-base-patch16-224

# API settings
api:
  host: 0.0.0.0
  port: 8000
  cors_origins: ["*"]

# Export settings
export:
  base_path: ./exports
  max_file_size_mb: 1000
  copy_mode: true

File Organization & Export Structure

Supported File Types

Images

.png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp

Videos

.mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v

Export Folder Structure

When you export videos, they're automatically organized:

exports/
├── rope_climbing_videos_2025-10-12_1430/
│   ├── export_manifest.json
│   ├── mountain_climbing_session.mp4
│   ├── rock_climbing_tutorial.mov
│   └── rope_access_work.mp4
├── shibari_sessions_2025-10-12_1445/
│   ├── export_manifest.json
│   ├── basic_ties_tutorial.mp4
│   └── advanced_suspension.mov
└── beach_sunset_footage_2025-10-12_1500/
    ├── export_manifest.json
    ├── golden_hour_beach.mp4
    └── ocean_waves_sunset.mov
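The folder names in this layout follow a `<query_slug>_<YYYY-MM-DD>_<HHMM>` pattern. This sketch reproduces that pattern. It assumes lowercase slugs, with every run of non-alphanumeric characters collapsed to one underscore. The exact slug rules are an assumption, and `export_folder_name` is hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime


def export_folder_name(query: str, when: datetime) -> str:
    """e.g. "rope climbing videos" at 2025-10-12 14:30 -> rope_climbing_videos_2025-10-12_1430"""
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_")
    return f"{slug}_{when:%Y-%m-%d_%H%M}"
```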

Export Manifest

Each export includes a comprehensive manifest file:

{
  "query": "rope climbing videos",
  "export_date": "2025-10-12T14:30:00Z",
  "export_mode": "copy",
  "filters": {
    "after": "2024-01-01",
    "min_duration": 30
  },
  "total_videos": 15,
  "exported_videos": 12,
  "errors": [],
  "results": [
    {
      "file_name": "mountain_climbing_session.mp4",
      "original_path": "/mnt/f/sorted-videos/mp4_videos/mountain_climbing_session.mp4",
      "tags": ["rope", "climbing", "mountain", "outdoor", "sport"],
      "duration": 245.6,
      "file_size": 89456123,
      "score": 0.94
    }
  ]
}
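A manifest with this shape can be written with the `json` module. The field names are copied from the example above. `write_manifest`, counting `exported_videos` as the number of result entries, and passing errors as a list of strings are all assumptions.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def write_manifest(export_dir: Path, query: str, filters: dict, results: list[dict],
                   total_videos: int, errors: list[str], mode: str = "copy") -> Path:
    manifest = {
        "query": query,
        # UTC timestamp with a trailing "Z", matching the example manifest.
        "export_date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "export_mode": mode,
        "filters": filters,
        "total_videos": total_videos,
        "exported_videos": len(results),
        "errors": errors,
        "results": results,
    }
    path = export_dir / "export_manifest.json"
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path
```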

🏗️ Architecture & Technical Details

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                     ContentOrganizer                        │
├─────────────────────────────────────────────────────────────┤
│  CLI Interface                │  REST API Server            │
│  - Image Organization         │  - FastAPI with OpenAPI     │
│  - Video Operations           │  - Background Tasks         │
│  - Search & Export            │  - CORS Support             │
├─────────────────────────────────────────────────────────────┤
│                    Core Processing Layer                    │
│  ┌───────────────┐  ┌────────────────┐  ┌────────────────┐ │
│  │ Image         │  │ Video          │  │ Search &       │ │
│  │ Analyzer      │  │ Analyzer       │  │ Export         │ │
│  │ - Visual AI   │  │ - Frame Ext.   │  │ - Embeddings   │ │
│  │ - Text Ext.   │  │ - Whisper      │  │ - Filtering    │ │
│  │ - Metadata    │  │ - Tagging      │  │ - Ranking      │ │
│  └───────────────┘  └────────────────┘  └────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│                     Storage & Caching                       │
│  ┌───────────────┐  ┌────────────────┐  ┌────────────────┐ │
│  │ Redis         │  │ LanceDB        │  │ File System    │ │
│  │ - Fast Cache  │  │ - Vector DB    │  │ - Original     │ │
│  │ - Search Cache│  │ - Embeddings   │  │ - Organized    │ │
│  │ - Metadata    │  │ - Similarity   │  │ - Exports      │ │
│  └───────────────┘  └────────────────┘  └────────────────┘ │
└─────────────────────────────────────────────────────────────┘

AI Models & Processing

Whisper (OpenAI): Audio transcription and sound recognition
Sentence Transformers: Semantic embeddings for similarity search
Vision Transformers: Visual content analysis and tagging
Llama 3.2: Text processing and metadata generation
OpenCV: Video frame extraction and processing

Performance Optimizations

Caching Strategy: Multi-layer caching with Redis for instant repeated searches
Batch Processing: Efficient video analysis with configurable batch sizes
Vector Search: Sub-second similarity search with LanceDB embeddings
Background Tasks: Non-blocking exports with progress tracking
Smart Filtering: Pre-filter before expensive operations
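The search cache can be sketched as the cache-aside pattern. The normalized query and filters are hashed into a key. A cached result is returned if present; otherwise the search runs and its result is stored with a TTL. A tiny in-memory class stands in for Redis so the sketch runs without a server. With redis-py the equivalent calls are `r.get(key)` and `r.setex(key, ttl, value)`. The `search:` key prefix and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import time
from typing import Callable


class InMemoryTTLCache:
    """Stand-in for Redis exposing get/setex with the same argument order."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str | None:
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._data.pop(key, None)         # drop expired (or missing) entries
            return None
        return entry[1]

    def setex(self, key: str, ttl: int, value: str) -> None:
        self._data[key] = (time.monotonic() + ttl, value)


def cache_key(query: str, filters: dict) -> str:
    # Normalize case and whitespace, and sort keys, so equivalent searches share one key.
    payload = json.dumps({"q": " ".join(query.lower().split()), "f": filters}, sort_keys=True)
    return "search:" + hashlib.sha256(payload.encode()).hexdigest()


def cached_search(cache, query: str, filters: dict,
                  run_search: Callable[[str, dict], list], ttl: int = 3600) -> list:
    key = cache_key(query, filters)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)                # json.loads also accepts redis-py's bytes
    results = run_search(query, filters)
    cache.setex(key, ttl, json.dumps(results))
    return results
```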

🔧 Advanced Configuration

Environment Variables

# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

# Database Configuration  
DB_PATH=./video_db
DB_TABLE=videos

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000

# Model Configuration
WHISPER_MODEL=base
EMBEDDING_MODEL=all-MiniLM-L6-v2
VISION_MODEL=google/vit-base-patch16-224

# Processing Configuration
BATCH_SIZE=10
MAX_WORKERS=4
CACHE_TTL=3600
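This section does not say how these variables interact with config.yml. This sketch assumes a common convention: an environment variable, when set, overrides the matching file setting. The variable-to-key pairing is also an assumption. Only variables with an obvious config.yml counterpart are mapped, so BATCH_SIZE, MAX_WORKERS and CACHE_TTL are left out. The config is passed in as an already-parsed dict, so the sketch needs no YAML library.

```python
from __future__ import annotations

import copy
import os
from typing import Mapping

# Variable -> (config section, key, type) it is assumed to override.
ENV_MAP = {
    "REDIS_HOST": ("redis", "host", str),
    "REDIS_PORT": ("redis", "port", int),
    "REDIS_DB": ("redis", "db", int),
    "DB_PATH": ("database", "path", str),
    "DB_TABLE": ("database", "table_name", str),
    "API_HOST": ("api", "host", str),
    "API_PORT": ("api", "port", int),
    "WHISPER_MODEL": ("models", "whisper_model", str),
    "EMBEDDING_MODEL": ("models", "embedding_model", str),
    "VISION_MODEL": ("models", "vision_model", str),
}


def apply_env_overrides(config: dict, env: Mapping[str, str] = os.environ) -> dict:
    """Return a copy of config with any set environment variables applied on top."""
    merged = copy.deepcopy(config)
    for var, (section, key, cast) in ENV_MAP.items():
        if var in env:
            merged.setdefault(section, {})[key] = cast(env[var])
    return merged
```

In practice the input dict would come from parsing config.yml, for example with PyYAML's `yaml.safe_load`.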

Custom Model Configuration

You can customize AI models in config.yml:

models:
  # Whisper models: tiny, base, small, medium, large
  whisper_model: base
  
  # Sentence transformer models
  embedding_model: all-MiniLM-L6-v2
  # Alternative: all-mpnet-base-v2 (better quality, slower)
  
  # Vision models
  vision_model: google/vit-base-patch16-224
  # Alternative: microsoft/resnet-50

Performance Tuning

For High-Volume Processing:

video:
  analysis_batch_size: 20      # Process more videos at once
  max_workers: 8               # Use more CPU cores
  skip_transcription: false    # Set true to speed up analysis
  frame_sample_rate: 2         # Frames sampled per second (lower = fewer frames)

cache:
  redis_ttl: 7200             # Longer cache retention
  max_cache_size: 1000        # More cached searches

For Resource-Constrained Systems:

video:
  analysis_batch_size: 3       # Process fewer videos at once
  max_workers: 2               # Use fewer CPU cores
  frame_sample_rate: 0.5       # Extract fewer frames
  
models:
  whisper_model: tiny          # Smaller, faster model
  embedding_model: all-MiniLM-L6-v2  # Lightweight embedding model
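The `analysis_batch_size` and `max_workers` settings suggest a pattern like the one below. Videos are processed in fixed-size batches, and each batch is spread across a bounded worker pool. This sketches the pattern only. How ContentOrganizer actually schedules work is not documented here, and `analyze_one` is a placeholder for the real per-video analysis. A thread pool keeps the sketch simple. CPU-bound pure-Python work would usually need a process pool instead.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def analyze_all(videos: Sequence[T], analyze_one: Callable[[T], R],
                batch_size: int = 3, max_workers: int = 2) -> list[R]:
    results: list[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batches(videos, batch_size):
            # map() preserves input order. Each batch finishes before the next starts,
            # which bounds how many videos are in flight at once.
            results.extend(pool.map(analyze_one, batch))
    return results
```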

🚨 Troubleshooting

Common Issues

Installation Problems

# Redis connection failed
sudo systemctl status redis-server
sudo systemctl start redis-server

# Python dependencies conflict
pip install --upgrade pip
pip install -e . --force-reinstall

# FFmpeg not found
sudo apt-get install ffmpeg  # Ubuntu/Debian
brew install ffmpeg          # macOS

Video Analysis Issues

# Model download failed
python -c "import whisper; whisper.load_model('base')"

# OpenCV issues
pip uninstall opencv-python
pip install opencv-python-headless

# Memory issues during analysis
# Reduce analysis_batch_size under the video: section of config.yml, e.g.
#   video:
#     analysis_batch_size: 3

Search & Export Problems

# No search results
# Confirm videos have been analyzed (run --video-analyze first), then try a broad query
python -m contentorganizer --video-search "test" --limit 1

# Export permission denied
sudo chown $USER:$USER /export/directory
chmod 755 /export/directory

# API server won't start
# Check whether port 8000 is already in use
netstat -tulpn | grep :8000
# or, on systems without netstat:
ss -tulpn | grep :8000
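You can also run the port check from Python, which works the same on any OS. This sketch treats a successful TCP connection to the port as "in use". `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8000 in use:", port_in_use(8000))
```

If the port is taken, start the server elsewhere with `--port`, as shown under Start Server.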

Debug Mode

Enable detailed logging:

# Set debug environment
export PYTHONPATH=/path/to/ContentOrganizer/src
export LOG_LEVEL=DEBUG

# Run with verbose output
python -m contentorganizer --video-search "test" --limit 1 -v

Performance Monitoring

Monitor system performance:

# Check Redis memory usage
redis-cli info memory

# Monitor API server
curl http://localhost:8000/api/v1/stats

# Check database size
du -sh ./video_db/

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Setup

# Clone and setup development environment
git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer
./setup.sh

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Code formatting
black src/
flake8 src/

Project Structure

ContentOrganizer/
├── src/contentorganizer/          # Main package
│   ├── main.py                   # CLI entry point
│   ├── video_analyzer.py         # Video analysis engine  
│   ├── video_search.py           # Search functionality
│   ├── video_api.py              # REST API server
│   ├── image_data_processing.py  # Image analysis
│   └── sortphoto/                # File utilities
├── tests/                        # Test suite
├── docs/                         # Documentation
├── config.yml                    # Configuration
├── setup.sh                      # Installation script
└── demo.py                       # Interactive demo

Adding Features

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Write tests for your feature
Implement the feature following existing patterns
Update documentation if needed
Submit a pull request

📚 Additional Resources

Documentation

📖 API Documentation (when server is running)
🎥 Video Tutorial (coming soon)
🔧 Advanced Configuration (coming soon)

Community & Support

Related Projects

Nexa AI - Local AI model management
Whisper - Audio transcription
LanceDB - Vector database
Redis - In-memory data structure store

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI for Whisper transcription models
Hugging Face for transformer models and sentence-transformers
Redis Labs for the Redis caching system
LanceDB team for the vector database
FastAPI team for the excellent web framework
All contributors and users who make this project better

Made with ❤️ for privacy-conscious users who want intelligent media organization without compromising their data.
