ContentOrganizer is a local-first, AI-powered tool that intelligently analyzes and organizes your image files into meaningful folders based on visual content—securely, privately, and with zero cloud dependencies.

jharri34/ContentOrganizer

🧠 ContentOrganizer

AI-Powered, Privacy-First Media Organizer with Video Analysis & Search

Brings clarity to your clutter. 100% local. 100% private. Zero data leaves your machine.

Python 3.8+ · License: MIT · Redis · FastAPI


🎯 What is ContentOrganizer?

ContentOrganizer is an AI-powered media management system that analyzes, organizes, and searches your image and video collections. It is built on privacy-first principles: all processing happens locally on your machine, with no cloud dependencies.

🌟 Key Capabilities

  • 📸 Smart Image Organization - AI-powered visual analysis and automatic folder structuring
  • 🎬 Video Analysis & Tagging - Automated content analysis with visual + audio processing
  • 🔍 Natural Language Search - Find videos with queries like "rope climbing videos" or "beach footage"
  • 📦 Intelligent Export System - Automated retrieval and organization of matching content
  • 🛡️ 100% Local & Private - All AI processing runs on your machine
  • ⚡ Lightning-Fast Search - Redis caching + vector embeddings for instant results

🚀 Quick Start

One-Command Installation

git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer
./setup.sh

Immediate Usage

# Activate environment
source venv/bin/activate

# Analyze your videos
python -m contentorganizer --video-analyze /path/to/videos

# Search with natural language
python -m contentorganizer --video-search "rope climbing videos"

# Start API server
python -m contentorganizer --api

🎬 Video Analysis & Search System

How It Works

  1. 🔍 AI Analysis Pipeline

    • Visual Analysis: Extracts frames and identifies objects, scenes, activities
    • Audio Transcription: Uses Whisper to transcribe speech and identify sounds
    • Semantic Tagging: Generates meaningful tags from visual + audio content
    • Embedding Generation: Creates vector embeddings for similarity search
  2. 💾 Smart Storage

    • Redis: High-speed caching for instant search results
    • LanceDB: Vector database for semantic similarity matching
    • Metadata: Comprehensive video information with timestamps
  3. 🔍 Natural Language Search

    • Semantic Understanding: Finds related content (e.g., "rope" matches "shibari")
    • Advanced Filtering: Filter by date, duration, tags, file type
    • Relevance Scoring: Results ranked by semantic similarity
  4. 📦 Automated Export

    • Smart Organization: Creates timestamped export folders
    • Manifest Tracking: JSON files with complete export metadata
    • Flexible Options: Copy files or create symlinks

Search Examples

# Basic search
python -m contentorganizer --video-search "rope videos"

# Advanced search with filters
python -m contentorganizer --video-search "climbing footage" \
  --after "2024-01-01" \
  --min-duration 60 \
  --export ./my_exports

# API search
curl "http://localhost:8000/api/v1/search?q=rope%20climbing&limit=10"

Natural Language Queries

  • "rope climbing videos" → Finds climbing, rope work, and related activities
  • "shibari sessions" → Semantic search for rope bondage and related content
  • "beach sunset footage" → Coastal scenes, sunsets, ocean content
  • "outdoor performance art" → Nature settings with artistic performances
  • "forest aerial work" → Tree/forest environments with aerial activities

📖 Complete Usage Guide

Image Organization (Classic Mode)

# Run interactive image organization
python -m contentorganizer --images

# Or use the default mode
python -m contentorganizer

Process:

  1. Select your image directory
  2. AI analyzes each image for content, objects, scenes
  3. Preview suggested folder structure
  4. Confirm and organize automatically
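
The preview step (3) can be pictured as building a plan in memory before any file is moved. The sketch below is only an illustration: the `plan_folders` helper, the label values, and the one-folder-per-label layout are assumptions, not the project's actual logic.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_folders(labels, dest_root):
    """Map each image to <dest_root>/<label>/ without touching the disk."""
    plan = defaultdict(list)
    for image_path, label in labels.items():
        folder = PurePosixPath(dest_root) / label
        plan[str(folder)].append(PurePosixPath(image_path).name)
    # Sort folders and file names so the preview is stable between runs
    return {folder: sorted(names) for folder, names in sorted(plan.items())}


# Hypothetical output of the analysis step: image path -> predicted label
labels = {
    "/photos/IMG_001.jpg": "beach",
    "/photos/IMG_002.jpg": "forest",
    "/photos/IMG_003.png": "beach",
}
plan = plan_folders(labels, "/photos/organized")
for folder, names in plan.items():
    print(folder, names)
```

Keeping the plan separate from the move makes the confirmation step cheap: nothing changes on disk until the user accepts the preview.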

Video Operations

Analyze Videos

# Analyze all videos in a directory
python -m contentorganizer --video-analyze /path/to/videos

# Analyze with custom settings
python -m contentorganizer --video-analyze /path/to/videos --batch-size 5 --skip-transcription

Search & Export Videos

# Basic search
python -m contentorganizer --video-search "your query"

# Search with export
python -m contentorganizer --video-search "rope videos" --export ./exports

# Advanced filtering
python -m contentorganizer --video-search "climbing" \
  --after "2024-01-01" \
  --before "2025-01-01" \
  --min-duration 30 \
  --max-duration 600 \
  --limit 50 \
  --export ./exports

# Use symlinks instead of copying
python -m contentorganizer --video-search "shibari" --export ./exports --symlink
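
To make the filter flags concrete, here is a minimal sketch of how such filters could be applied to a list of search hits. The `VideoHit` type and `apply_filters` function are hypothetical, and treating every bound as inclusive is an assumption; the CLI's actual semantics may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VideoHit:
    file_name: str
    created: date
    duration: float  # seconds
    score: float     # semantic similarity, higher is better


def apply_filters(hits, after=None, before=None,
                  min_duration=None, max_duration=None, limit=None):
    """Drop hits outside the given bounds, then return the best `limit` hits."""
    kept = []
    for hit in hits:
        if after is not None and hit.created < after:
            continue
        if before is not None and hit.created > before:
            continue
        if min_duration is not None and hit.duration < min_duration:
            continue
        if max_duration is not None and hit.duration > max_duration:
            continue
        kept.append(hit)
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept if limit is None else kept[:limit]


hits = [
    VideoHit("a.mp4", date(2024, 3, 1), 120.0, 0.81),
    VideoHit("b.mp4", date(2023, 12, 31), 90.0, 0.95),  # too old
    VideoHit("c.mov", date(2024, 6, 15), 20.0, 0.90),   # too short
    VideoHit("d.mkv", date(2024, 8, 2), 300.0, 0.88),
]
selected = apply_filters(
    hits,
    after=date(2024, 1, 1), before=date(2025, 1, 1),
    min_duration=30, max_duration=600, limit=50,
)
print([h.file_name for h in selected])
```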

Organize Video Files

# Move videos from source to organized destination
python -m contentorganizer --video-organize /source/path /dest/path

# Auto-scan and organize videos
python -m contentorganizer --video-auto-scan /path/to/scan

# Separate videos by file extension
python -m contentorganizer --video-separate /path/to/videos
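
As an illustration of separating by extension, the sketch below computes a destination for each video without moving anything. The `<ext>_videos` folder naming is an assumption inferred from the example path in the export manifest later in this document, and `plan_separation` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Video extensions listed under "Supported File Types"
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".wmv", ".flv", ".webm", ".m4v"}


def plan_separation(file_names, dest_root):
    """Return {file name: destination path}, skipping non-video files."""
    plan = {}
    for name in file_names:
        ext = PurePosixPath(name).suffix.lower()
        if ext not in VIDEO_EXTENSIONS:
            continue
        folder = f"{ext.lstrip('.')}_videos"  # e.g. mp4_videos (assumed naming)
        plan[name] = str(PurePosixPath(dest_root) / folder / name)
    return plan


plan = plan_separation(["a.MP4", "b.mov", "notes.txt"], "/sorted")
for source, destination in plan.items():
    print(source, "->", destination)
```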

REST API Server

Start Server

# Default settings (localhost:8000)
python -m contentorganizer --api

# Custom host and port
python -m contentorganizer --api --host 0.0.0.0 --port 9000

# Development mode with auto-reload
python -m contentorganizer --api --reload

API Endpoints

Search Videos:

# GET request with query parameters
GET /api/v1/search?q=rope%20climbing&limit=10&after=2024-01-01

# POST request with JSON body
POST /api/v1/search
{
  "query": "rope climbing videos",
  "after": "2024-01-01",
  "min_duration": 30,
  "limit": 50
}
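
The two search calls above can also be issued from Python using only the standard library. This is a minimal sketch: the endpoint paths and field names come from the examples above, the helper names are hypothetical, and the shape of the response is not documented here, so it is simply decoded as JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default server address from this README


def build_get_search(query, **params):
    """Build a GET /api/v1/search request with URL-encoded parameters."""
    qs = urllib.parse.urlencode({"q": query, **params},
                                quote_via=urllib.parse.quote)
    return urllib.request.Request(f"{BASE_URL}/api/v1/search?{qs}", method="GET")


def build_post_search(query, **filters):
    """Build a POST /api/v1/search request with a JSON body."""
    body = json.dumps({"query": query, **filters}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


get_req = build_get_search("rope climbing", limit=10, after="2024-01-01")
post_req = build_post_search("rope climbing videos",
                             after="2024-01-01", min_duration=30, limit=50)
print(get_req.full_url)

# To send a request, the API server must be running:
# with urllib.request.urlopen(post_req) as resp:
#     print(json.load(resp))
```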

Export Videos:

# Export specific video IDs
POST /api/v1/export
{
  "video_ids": ["video1", "video2", "video3"],
  "export_path": "/path/to/export",
  "query": "rope videos",
  "copy_mode": true
}

# Check export status
GET /api/v1/export/{task_id}/status

Interactive Documentation:

  • Open http://localhost:8000/docs for complete API documentation
  • Try endpoints directly in the browser interface

🛠️ Installation & Setup

System Requirements

  • Python: 3.8 or later
  • Operating System: Linux, macOS, Windows (WSL recommended)
  • Memory: 4GB RAM minimum, 8GB+ recommended
  • Storage: 10GB free space for AI models and databases
  • Redis: For caching and fast lookups
  • FFmpeg: For video processing (auto-installed by setup script)

Automated Installation

# Clone repository
git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer

# Run automated setup
./setup.sh

The setup script automatically:

  • ✅ Checks Python version compatibility
  • ✅ Installs system dependencies (Redis, FFmpeg, etc.)
  • ✅ Creates Python virtual environment
  • ✅ Installs all Python packages
  • ✅ Downloads AI models (Whisper, sentence-transformers, vision models)
  • ✅ Sets up database directories and configuration
  • ✅ Runs installation tests

Manual Installation

1. Core Dependencies

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install core packages
pip install -e .

2. System Dependencies

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install -y redis-server ffmpeg python3-opencv
sudo systemctl start redis-server

macOS:

brew install redis ffmpeg opencv
brew services start redis

Arch Linux:

sudo pacman -S redis ffmpeg opencv
sudo systemctl start redis

3. AI Models

# Download required models (done automatically on first run)
python -c "
import whisper
import sentence_transformers
whisper.load_model('base')
sentence_transformers.SentenceTransformer('all-MiniLM-L6-v2')
"

Configuration

Create config.yml in the project root:

# Redis settings
redis:
  host: localhost
  port: 6379
  db: 0

# Database settings  
database:
  path: ./video_db
  table_name: videos

# AI Models
models:
  embedding_model: all-MiniLM-L6-v2
  whisper_model: base
  vision_model: google/vit-base-patch16-224

# API settings
api:
  host: 0.0.0.0        # listens on all network interfaces; use 127.0.0.1 for local-only access
  port: 8000
  cors_origins: ["*"]

# Export settings
export:
  base_path: ./exports
  max_file_size_mb: 1000
  copy_mode: true

📁 File Organization & Export Structure

Supported File Types

Images

  • .png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp

Videos

  • .mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v

Export Folder Structure

When you export videos, they're automatically organized:

exports/
├── rope_climbing_videos_2025-10-12_1430/
│   ├── export_manifest.json
│   ├── mountain_climbing_session.mp4
│   ├── rock_climbing_tutorial.mov
│   └── rope_access_work.mp4
├── shibari_sessions_2025-10-12_1445/
│   ├── export_manifest.json
│   ├── basic_ties_tutorial.mp4
│   └── advanced_suspension.mov
└── beach_sunset_footage_2025-10-12_1500/
    ├── export_manifest.json
    ├── golden_hour_beach.mp4
    └── ocean_waves_sunset.mov
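
The folder names in the tree above appear to follow the pattern `<query with underscores>_<YYYY-MM-DD>_<HHMM>`. The sketch below reproduces that pattern; the `export_folder_name` helper and its handling of punctuation are assumptions rather than the project's actual code.

```python
import re
from datetime import datetime


def export_folder_name(query, when):
    """Turn a search query and a timestamp into an export folder name."""
    # Lowercase, then collapse every run of non-alphanumerics into one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_")
    return f"{slug}_{when.strftime('%Y-%m-%d_%H%M')}"


name = export_folder_name("rope climbing videos", datetime(2025, 10, 12, 14, 30))
print(name)  # rope_climbing_videos_2025-10-12_1430
```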

Export Manifest

Each export includes a comprehensive manifest file:

{
  "query": "rope climbing videos",
  "export_date": "2025-10-12T14:30:00Z",
  "export_mode": "copy",
  "filters": {
    "after": "2024-01-01",
    "min_duration": 30
  },
  "total_videos": 15,
  "exported_videos": 12,
  "errors": [],
  "results": [
    {
      "file_name": "mountain_climbing_session.mp4",
      "original_path": "/mnt/f/sorted-videos/mp4_videos/mountain_climbing_session.mp4",
      "tags": ["rope", "climbing", "mountain", "outdoor", "sport"],
      "duration": 245.6,
      "file_size": 89456123,
      "score": 0.94
    }
  ]
}
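
Because the manifest is plain JSON, it is easy to inspect from a script. The following sketch summarizes a manifest with the fields shown above; `summarize_manifest` is a hypothetical helper, and reading the difference between `total_videos` and `exported_videos` as "not exported" is an assumption.

```python
import json


def summarize_manifest(manifest):
    """Return a few headline numbers from an export manifest dict."""
    results = manifest.get("results", [])
    best = max(results, key=lambda r: r["score"]) if results else None
    return {
        "query": manifest["query"],
        "exported": manifest["exported_videos"],
        "not_exported": manifest["total_videos"] - manifest["exported_videos"],
        "listed_bytes": sum(r["file_size"] for r in results),
        "top_file": best["file_name"] if best else None,
    }


# Trimmed copy of the example manifest above
sample = json.loads("""
{
  "query": "rope climbing videos",
  "total_videos": 15,
  "exported_videos": 12,
  "errors": [],
  "results": [
    {"file_name": "mountain_climbing_session.mp4",
     "duration": 245.6, "file_size": 89456123, "score": 0.94}
  ]
}
""")
summary = summarize_manifest(sample)
print(summary)

# For a real export, load the file instead:
# with open("exports/<export folder>/export_manifest.json") as fh:
#     summary = summarize_manifest(json.load(fh))
```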

🏗️ Architecture & Technical Details

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                     ContentOrganizer                        │
├─────────────────────────────────────────────────────────────┤
│  CLI Interface                │  REST API Server            │
│  - Image Organization         │  - FastAPI with OpenAPI     │
│  - Video Operations           │  - Background Tasks         │
│  - Search & Export            │  - CORS Support             │
├─────────────────────────────────────────────────────────────┤
│                    Core Processing Layer                    │
│  ┌───────────────┐  ┌────────────────┐  ┌────────────────┐ │
│  │ Image         │  │ Video          │  │ Search &       │ │
│  │ Analyzer      │  │ Analyzer       │  │ Export         │ │
│  │ - Visual AI   │  │ - Frame Ext.   │  │ - Embeddings   │ │
│  │ - Text Ext.   │  │ - Whisper      │  │ - Filtering    │ │
│  │ - Metadata    │  │ - Tagging      │  │ - Ranking      │ │
│  └───────────────┘  └────────────────┘  └────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│                     Storage & Caching                       │
│  ┌───────────────┐  ┌────────────────┐  ┌────────────────┐ │
│  │ Redis         │  │ LanceDB        │  │ File System    │ │
│  │ - Fast Cache  │  │ - Vector DB    │  │ - Original     │ │
│  │ - Search Cache│  │ - Embeddings   │  │ - Organized    │ │
│  │ - Metadata    │  │ - Similarity   │  │ - Exports      │ │
│  └───────────────┘  └────────────────┘  └────────────────┘ │
└─────────────────────────────────────────────────────────────┘

AI Models & Processing

  • Whisper (OpenAI): Audio transcription and sound recognition
  • Sentence Transformers: Semantic embeddings for similarity search
  • Vision Transformers: Visual content analysis and tagging
  • Llama 3.2: Text processing and metadata generation
  • OpenCV: Video frame extraction and processing

Performance Optimizations

  • Caching Strategy: Multi-layer caching with Redis for instant repeated searches
  • Batch Processing: Efficient video analysis with configurable batch sizes
  • Vector Search: Sub-second similarity search with LanceDB embeddings
  • Background Tasks: Non-blocking exports with progress tracking
  • Smart Filtering: Pre-filter before expensive operations
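
The caching idea can be sketched without a running Redis instance. The example below uses an in-memory dictionary as a stand-in for Redis and derives one cache key per distinct query-plus-filters combination; the key scheme and the `TTLCache` class are hypothetical, not the project's implementation.

```python
import hashlib
import json
import time


def cache_key(query, filters):
    """Stable key: same query and filters always hash to the same string."""
    payload = json.dumps(
        {"q": query.strip().lower(), "filters": filters},
        sort_keys=True,  # key order in `filters` must not change the hash
    )
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: behave as a cache miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


cache = TTLCache(ttl_seconds=3600)
key = cache_key("rope videos", {"after": "2024-01-01", "limit": 10})
if cache.get(key) is None:
    cache.set(key, ["rope_access_work.mp4"])  # result of the expensive search
print(cache.get(key))
```

Normalizing the query and sorting the filter keys before hashing means that trivially different spellings of the same search share one cache entry.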

🔧 Advanced Configuration

Environment Variables

# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

# Database Configuration  
DB_PATH=./video_db
DB_TABLE=videos

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000

# Model Configuration
WHISPER_MODEL=base
EMBEDDING_MODEL=all-MiniLM-L6-v2
VISION_MODEL=google/vit-base-patch16-224

# Processing Configuration
BATCH_SIZE=10
MAX_WORKERS=4
CACHE_TTL=3600
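
A minimal sketch of reading these variables in Python follows. It assumes the values listed above act as defaults when a variable is unset, which this README does not state explicitly; `load_settings` is a hypothetical helper.

```python
import os

# Values copied from the listing above, assumed here to be the defaults
DEFAULTS = {
    "REDIS_HOST": "localhost",
    "REDIS_PORT": "6379",
    "REDIS_DB": "0",
    "DB_PATH": "./video_db",
    "DB_TABLE": "videos",
    "API_HOST": "0.0.0.0",
    "API_PORT": "8000",
    "WHISPER_MODEL": "base",
    "EMBEDDING_MODEL": "all-MiniLM-L6-v2",
    "VISION_MODEL": "google/vit-base-patch16-224",
    "BATCH_SIZE": "10",
    "MAX_WORKERS": "4",
    "CACHE_TTL": "3600",
}
INT_KEYS = {"REDIS_PORT", "REDIS_DB", "API_PORT",
            "BATCH_SIZE", "MAX_WORKERS", "CACHE_TTL"}


def load_settings(environ=None):
    """Read settings from the environment, falling back to DEFAULTS."""
    environ = os.environ if environ is None else environ
    settings = {}
    for key, default in DEFAULTS.items():
        raw = environ.get(key, default)
        settings[key] = int(raw) if key in INT_KEYS else raw
    return settings


settings = load_settings({"API_PORT": "9000"})  # simulate one override
print(settings["API_PORT"], settings["REDIS_HOST"])
```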

Custom Model Configuration

You can customize AI models in config.yml:

models:
  # Whisper models: tiny, base, small, medium, large
  whisper_model: base
  
  # Sentence transformer models
  embedding_model: all-MiniLM-L6-v2
  # Alternative: all-mpnet-base-v2 (better quality, slower)
  
  # Vision models
  vision_model: google/vit-base-patch16-224
  # Alternative: microsoft/resnet-50

Performance Tuning

For High-Volume Processing:

video:
  analysis_batch_size: 20      # Process more videos at once
  max_workers: 8               # Use more CPU cores
  skip_transcription: false    # Set true to speed up analysis
  frame_sample_rate: 2         # Frames sampled per second of video

cache:
  redis_ttl: 7200             # Longer cache retention
  max_cache_size: 1000        # More cached searches

For Resource-Constrained Systems:

video:
  analysis_batch_size: 3       # Process fewer videos at once
  max_workers: 2               # Use fewer CPU cores
  frame_sample_rate: 0.5       # Extract fewer frames
  
models:
  whisper_model: tiny          # Smaller, faster model
  embedding_model: all-MiniLM-L6-v2  # Lightweight embedding model
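
To see what `frame_sample_rate` trades off, the sketch below computes which frames would be kept for a given rate. It assumes the setting means "frames sampled per second of video" and that sampling is evenly spaced; `sampled_frame_indices` is a hypothetical helper, not the analyzer's actual code.

```python
def sampled_frame_indices(total_frames, video_fps, frame_sample_rate):
    """Indices of the frames kept when sampling `frame_sample_rate` per second."""
    if video_fps <= 0 or frame_sample_rate <= 0:
        raise ValueError("video_fps and frame_sample_rate must be positive")
    # Keep one frame every `step` frames; never skip less than one frame
    step = max(1, round(video_fps / frame_sample_rate))
    return list(range(0, total_frames, step))


# A 10-second clip at 30 fps has 300 frames
high_volume = sampled_frame_indices(300, 30, 2)    # 2 frames per second
constrained = sampled_frame_indices(300, 30, 0.5)  # 1 frame every 2 seconds
print(len(high_volume), len(constrained))
```

Under these assumptions, lowering the rate from 2 to 0.5 cuts the number of frames sent to the vision model by a factor of four, which is where most of the savings on constrained systems would come from.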

🚨 Troubleshooting

Common Issues

Installation Problems

# Redis connection failed
sudo systemctl status redis-server
sudo systemctl start redis-server

# Python dependencies conflict
pip install --upgrade pip
pip install -e . --force-reinstall

# FFmpeg not found
sudo apt-get install ffmpeg  # Ubuntu/Debian
brew install ffmpeg          # macOS

Video Analysis Issues

# Model download failed
python -c "import whisper; whisper.load_model('base')"

# OpenCV issues
pip uninstall opencv-python
pip install opencv-python-headless

# Memory issues during analysis
# Reduce the batch size in config.yml (under the video: section):
#   analysis_batch_size: 3

Search & Export Problems

# No search results
# Check if videos have been analyzed
python -m contentorganizer --video-search "test" --limit 1

# Export permission denied
chmod 755 /export/directory
chown $USER:$USER /export/directory

# API server won't start
# Check if port is available
netstat -tulpn | grep :8000
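
On systems where netstat is unavailable, the same check can be done from Python. This is a small cross-platform sketch using only the standard library; `port_in_use` is a hypothetical helper, and 8000 is the default API port from this README.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if port_in_use(8000):
    print("Port 8000 is already taken; try --port 9000")
else:
    print("Port 8000 is free")
```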

Debug Mode

Enable detailed logging:

# Set debug environment
export PYTHONPATH=/path/to/ContentOrganizer/src
export LOG_LEVEL=DEBUG

# Run with verbose output
python -m contentorganizer --video-search "test" --limit 1 -v

Performance Monitoring

Monitor system performance:

# Check Redis memory usage
redis-cli info memory

# Monitor API server
curl http://localhost:8000/api/v1/stats

# Check database size
du -sh ./video_db/

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Setup

# Clone and setup development environment
git clone https://github.com/jharri34/ContentOrganizer
cd ContentOrganizer
./setup.sh

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Code formatting
black src/
flake8 src/

Project Structure

ContentOrganizer/
├── src/contentorganizer/          # Main package
│   ├── main.py                   # CLI entry point
│   ├── video_analyzer.py         # Video analysis engine  
│   ├── video_search.py           # Search functionality
│   ├── video_api.py              # REST API server
│   ├── image_data_processing.py  # Image analysis
│   └── sortphoto/                # File utilities
├── tests/                        # Test suite
├── docs/                         # Documentation
├── config.yml                    # Configuration
├── setup.sh                      # Installation script
└── demo.py                       # Interactive demo

Adding Features

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Write tests for your feature
  4. Implement the feature following existing patterns
  5. Update documentation if needed
  6. Submit a pull request

📚 Additional Resources

Related Projects

  • Nexa AI - Local AI model management
  • Whisper - Audio transcription
  • LanceDB - Vector database
  • Redis - In-memory data structure store

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • OpenAI for Whisper transcription models
  • Hugging Face for transformer models and sentence-transformers
  • Redis (formerly Redis Labs) for the Redis caching system
  • LanceDB team for the vector database
  • FastAPI team for the excellent web framework
  • All contributors and users who make this project better

Made with ❤️ for privacy-conscious users who want intelligent media organization without compromising their data.
