🔍 OCR Enhanced - Advanced Document Processing

Python 3.8+ | License: MIT | Code style: black

A comprehensive OCR solution that combines local (Tesseract) and cloud (Mistral AI) processing with dynamic folder selection, searchable PDF generation, and hybrid processing modes.

✨ Features

🎯 Core Functionality

  • Hybrid OCR Processing: Try local first, fall back to cloud if needed
  • Multiple Engines: Tesseract (local) + Mistral AI (cloud)
  • Searchable PDFs: Generate PDFs with an invisible text layer
  • Batch Processing: Handle multiple files efficiently
  • Dynamic Folders: Choose input/output directories via GUI

🔧 Processing Modes

  • 🔄 Hybrid: Local first, cloud fallback (recommended)
  • ☁️ Cloud Only: Mistral AI processing only
  • 💻 Local Only: Tesseract processing only
  • 🔒 Privacy: Force local processing (no data sent to cloud)
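
The mode selection above can be sketched as a small dispatcher. This is an illustration only, not the project's actual implementation: `OCRResult`, `Engine`, and `run_ocr` are hypothetical names, and the idea that hybrid mode falls back to the cloud when local confidence drops below a threshold (0.75 here, mirroring the sample `confidence_threshold`) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OCRResult:
    text: str
    confidence: float  # assumed to lie in the range 0.0 to 1.0
    engine: str        # which engine produced the text


# Hypothetical engine signature: takes a file path, returns an OCRResult.
Engine = Callable[[str], OCRResult]


def run_ocr(path: str, mode: str, local: Engine,
            cloud: Optional[Engine] = None,
            threshold: float = 0.75) -> OCRResult:
    """Dispatch one file according to the processing mode."""
    # Local and privacy modes never touch the cloud engine.
    if mode in ("local", "privacy"):
        return local(path)
    if mode not in ("cloud", "hybrid"):
        raise ValueError(f"unknown mode: {mode!r}")
    if cloud is None:
        raise ValueError(f"mode {mode!r} needs a cloud engine")
    if mode == "cloud":
        return cloud(path)
    # Hybrid: try local first, fall back to cloud on error or low confidence.
    try:
        result = local(path)
    except Exception:
        return cloud(path)
    return result if result.confidence >= threshold else cloud(path)
```

Passing the engines in as callables keeps the fallback rule testable without Tesseract or an API key installed.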

🎨 User Experience

  • Modern GUI: Intuitive Tkinter interface with drag & drop
  • Real-time Progress: Detailed progress tracking and logging
  • Folder Selection: Choose custom input/output directories
  • Multi-format Output: JSON, Markdown, and searchable PDF

🚀 Quick Start

Installation

# Install from PyPI (recommended)
pip install ocr-enhanced

# Or install from source
git clone https://github.com/leo-dower/ocr-enhanced-projec.git
cd ocr-enhanced-projec
pip install -e .

System Dependencies

Ubuntu/Debian:

sudo apt install tesseract-ocr tesseract-ocr-por tesseract-ocr-eng poppler-utils

Windows:

macOS:

brew install tesseract poppler

Usage

GUI Application:

ocr-enhanced-gui

Command Line:

ocr-cli --input /path/to/pdfs --output /path/to/results

Python API:

from src.core import OCRProcessor

processor = OCRProcessor(mode='hybrid')
result = processor.process_file('document.pdf')

📁 Project Structure

ocr-enhanced/
├── src/                    # Source code
│   ├── core/              # Core processing logic
│   ├── gui/               # User interface
│   ├── ocr/               # OCR engines
│   └── utils/             # Utilities
├── tests/                 # Test suite
├── docs/                  # Documentation
├── examples/              # Usage examples
└── requirements/          # Dependencies

🔧 Configuration

Environment Variables

# API Configuration
MISTRAL_API_KEY=your_api_key_here

# Default Folders
OCR_INPUT_PATH=/path/to/input
OCR_OUTPUT_PATH=/path/to/output

# Processing Settings
OCR_MODE=hybrid
OCR_LANGUAGE=por+eng
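
As a rough sketch of how these variables might be consumed, the snippet below reads them with `os.environ` and splits the Tesseract-style language string on `+`. The function name `settings_from_env` and the fallback defaults (`hybrid`, `por+eng`) are assumptions taken from the sample values above, not confirmed project behavior.

```python
import os
from typing import Dict, Mapping, Optional


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Collect OCR settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    language = env.get("OCR_LANGUAGE", "por+eng")  # assumed default
    return {
        "api_key": env.get("MISTRAL_API_KEY"),      # None when unset
        "input_path": env.get("OCR_INPUT_PATH"),
        "output_path": env.get("OCR_OUTPUT_PATH"),
        "mode": env.get("OCR_MODE", "hybrid"),      # assumed default
        "language": language,
        # "por+eng" -> ["por", "eng"]
        "languages": [code for code in language.split("+") if code],
    }


if __name__ == "__main__":
    print(settings_from_env())
```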

Configuration File

Create ~/.ocr-enhanced.json:

{
  "default_mode": "hybrid",
  "tesseract_path": "/usr/bin/tesseract",
  "default_language": "por+eng",
  "max_pages_per_batch": 200,
  "confidence_threshold": 0.75
}
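
A minimal sketch of reading this file, assuming missing keys should fall back to the sample values shown above. `load_config` and `DEFAULTS` are hypothetical names; the project's own loader may behave differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Assumed defaults, copied from the sample configuration above.
DEFAULTS: Dict[str, Any] = {
    "default_mode": "hybrid",
    "tesseract_path": "/usr/bin/tesseract",
    "default_language": "por+eng",
    "max_pages_per_batch": 200,
    "confidence_threshold": 0.75,
}


def load_config(path: Optional[Path] = None) -> Dict[str, Any]:
    """Merge ~/.ocr-enhanced.json (if present) over the defaults."""
    path = Path.home() / ".ocr-enhanced.json" if path is None else Path(path)
    config = dict(DEFAULTS)
    if path.is_file():
        with path.open(encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config


if __name__ == "__main__":
    print(load_config())
```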

🧪 Development

Setup Development Environment

# Clone repository
git clone https://github.com/leo-dower/ocr-enhanced-projec.git
cd ocr-enhanced-projec

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Run Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test types
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only
pytest -m "not slow"    # Skip slow tests

Code Quality

# Format code
black src tests

# Sort imports
isort src tests

# Lint code
flake8 src tests

# Type checking
mypy src

# Security check
bandit -r src

📊 Performance

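| Mode   | Speed   | Accuracy  | Privacy  | Cost  |
|--------|---------|-----------|----------|-------|
| Local  | Fast    | Good      | 100%     | Free  |
| Cloud  | Medium  | Excellent | Depends  | Paid  |
| Hybrid | Optimal | Best      | Balanced | Mixed |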

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for your changes
  5. Ensure all tests pass (pytest)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support


Made with ❤️ by the OCR Enhanced Team - Leo-dower and claudecode =)
