A comprehensive OCR solution that combines local (Tesseract) and cloud (Mistral AI) processing with dynamic folder selection, searchable PDF generation, and hybrid processing modes.
- Hybrid OCR Processing: Try local first, fall back to cloud if needed
- Multiple Engines: Tesseract (local) + Mistral AI (cloud)
- Searchable PDFs: Generate PDFs with an invisible text layer
- Batch Processing: Handle multiple files efficiently
- Dynamic Folders: Choose input/output directories via GUI
- 🔄 Hybrid: Local first, cloud fallback (recommended)
- ☁️ Cloud Only: Mistral AI processing only
- 💻 Local Only: Tesseract processing only
- 🔒 Privacy: Force local processing (no data sent to cloud)
- Modern GUI: Intuitive Tkinter interface with drag & drop
- Real-time Progress: Detailed progress tracking and logging
- Folder Selection: Choose custom input/output directories
- Multi-format Output: JSON, Markdown, and searchable PDF
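The "local first, cloud fallback" behaviour listed above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: `OCRResult`, `hybrid_ocr`, and the two stub engines are hypothetical names, and the rule that the cloud is used when the local engine fails or its confidence falls below a threshold (0.75 here, matching the `confidence_threshold` value in the sample configuration) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OCRResult:
    """Hypothetical result container; not the project's actual type."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0
    engine: str


def hybrid_ocr(
    path: str,
    run_local: Callable[[str], OCRResult],
    run_cloud: Callable[[str], OCRResult],
    confidence_threshold: float = 0.75,
) -> OCRResult:
    """Try the local engine first; call the cloud engine only if needed."""
    try:
        local = run_local(path)
    except Exception:
        # Assumption: any local failure triggers the cloud fallback.
        return run_cloud(path)
    if local.confidence >= confidence_threshold:
        return local
    # Local result is below the threshold, so fall back to the cloud.
    return run_cloud(path)


# Stub engines standing in for Tesseract and Mistral AI.
def fake_local(path: str) -> OCRResult:
    return OCRResult(text="blurry scan", confidence=0.40, engine="local")


def fake_cloud(path: str) -> OCRResult:
    return OCRResult(text="clean text", confidence=0.95, engine="cloud")


result = hybrid_ocr("document.pdf", fake_local, fake_cloud)
print(result.engine)
```

Passing the engines in as callables keeps the fallback rule testable without Tesseract or an API key installed.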
# Install from PyPI (recommended)
pip install ocr-enhanced
# Or install from source
git clone https://github.com/leo-dower/ocr-enhanced-projec.git
cd ocr-enhanced-projec
pip install -e .

Ubuntu/Debian:
sudo apt install tesseract-ocr tesseract-ocr-por tesseract-ocr-eng poppler-utils

Windows:
- Download Tesseract from UB-Mannheim
- Install Poppler from conda-forge
macOS:
brew install tesseract poppler

GUI Application:
ocr-enhanced-gui

Command Line:
ocr-cli --input /path/to/pdfs --output /path/to/results

Python API:
from src.core import OCRProcessor
processor = OCRProcessor(mode='hybrid')
result = processor.process_file('document.pdf')

ocr-enhanced/
├── src/ # Source code
│ ├── core/ # Core processing logic
│ ├── gui/ # User interface
│ ├── ocr/ # OCR engines
│ └── utils/ # Utilities
├── tests/ # Test suite
├── docs/ # Documentation
├── examples/ # Usage examples
└── requirements/ # Dependencies
# API Configuration
MISTRAL_API_KEY=your_api_key_here
# Default Folders
OCR_INPUT_PATH=/path/to/input
OCR_OUTPUT_PATH=/path/to/output
# Processing Settings
OCR_MODE=hybrid
OCR_LANGUAGE=por+eng

Create ~/.ocr-enhanced.json:
{
  "default_mode": "hybrid",
  "tesseract_path": "/usr/bin/tesseract",
  "default_language": "por+eng",
  "max_pages_per_batch": 200,
  "confidence_threshold": 0.75
}

# Clone repository
git clone https://github.com/leo-dower/ocr-enhanced-projec.git
cd ocr-enhanced-projec
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install development dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install

# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test types
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m "not slow" # Skip slow tests

# Format code
black src tests
# Sort imports
isort src tests
# Lint code
flake8 src tests
# Type checking
mypy src
# Security check
bandit -r src

| Mode | Speed | Accuracy | Privacy | Cost |
|---|---|---|---|---|
| Local | Fast | Good | Full (no data sent to cloud) | Free |
| Cloud | Medium | Excellent | Depends | Paid |
| Hybrid | Optimal | Best | Balanced | Mixed |
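
One way to read the table is as a lookup from mode to the engines tried, in order. The sketch below is illustrative only: `hybrid` appears in the configuration examples, while the other mode strings (`cloud`, `local`, `privacy`), the engine labels, and both helper functions are assumed names, not the project's API.

```python
from typing import Dict, List

# Assumed mapping from mode name to the engines tried, in order.
ENGINE_ORDER: Dict[str, List[str]] = {
    "hybrid": ["tesseract", "mistral"],  # local first, cloud fallback
    "cloud": ["mistral"],                # cloud only
    "local": ["tesseract"],              # local only
    "privacy": ["tesseract"],            # forced local, nothing leaves the machine
}


def engines_for_mode(mode: str) -> List[str]:
    """Return a copy of the engine order for a mode, rejecting unknown modes."""
    try:
        return list(ENGINE_ORDER[mode])
    except KeyError:
        raise ValueError(f"unknown OCR mode: {mode!r}") from None


def may_use_cloud(mode: str) -> bool:
    """True if the mode can send document data to the cloud engine."""
    return "mistral" in engines_for_mode(mode)


for mode in ENGINE_ORDER:
    print(mode, engines_for_mode(mode), may_use_cloud(mode))
```

Rejecting unknown mode strings early avoids silently sending documents to the cloud because of a typo in `OCR_MODE`.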
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for your changes
- Ensure all tests pass (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Tesseract OCR for local processing
- Mistral AI for cloud OCR capabilities
- PyMuPDF for PDF manipulation
Made with ❤️ by the OCR Enhanced Team - Leo-dower and claudecode =)