A lightweight PDF document processing and translation tool supporting Google Translate and OpenRouter APIs.
- PDF Text Extraction - Extract text while preserving layout structure
- OCR Support - Recognize text from scanned PDFs using Tesseract
- Multi-language Translation - Support for 9+ languages (English, Chinese, Japanese, Korean, etc.)
- Bilingual PDF Generation - Create side-by-side, line-by-line, or overlay bilingual documents
- CLI Interface - Command-line tool for batch processing
- Web UI - Easy-to-use web interface
- Multiple Translation Providers - Google Translate (free) and OpenRouter (LLM-based)
| Language | Code | OCR Support |
|---|---|---|
| English | en | ✅ |
| Simplified Chinese | zh-CN | ✅ |
| Traditional Chinese | zh-TW | ✅ |
| Japanese | ja | ✅ |
| Korean | ko | ✅ |
| French | fr | ✅ |
| German | de | ✅ |
| Spanish | es | ✅ |
| Italian | it | ✅ |
| Portuguese | pt | ✅ |
| Russian | ru | ✅ |
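
The translation codes in the table differ from the language names Tesseract expects (for example `eng` in `epdf ocr --lang eng`). A minimal sketch of how the two could be related is shown below. The `TESSERACT_LANG` dictionary and the `to_tesseract_lang` helper are hypothetical names for illustration and are not part of the package; the values are the standard Tesseract language names, but the tool's own mapping may differ.

```python
# Hypothetical mapping from the translation codes in the table to
# standard Tesseract language names. Not taken from the package itself.
TESSERACT_LANG = {
    "en": "eng",
    "zh-CN": "chi_sim",
    "zh-TW": "chi_tra",
    "ja": "jpn",
    "ko": "kor",
    "fr": "fra",
    "de": "deu",
    "es": "spa",
    "it": "ita",
    "pt": "por",
    "ru": "rus",
}


def to_tesseract_lang(code: str) -> str:
    """Return the Tesseract language name for a translation code."""
    try:
        return TESSERACT_LANG[code]
    except KeyError:
        # Fail loudly instead of silently falling back to English.
        raise ValueError(f"unsupported language code: {code!r}") from None
```

Raising on an unknown code, rather than defaulting to `eng`, keeps a typo in the language code from producing silently wrong OCR output.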
git clone https://github.com/GuillaumeLeone8/EasyPdfForYou.git
cd EasyPdfForYou
pip install -e .

For OCR functionality, install Tesseract:

Ubuntu/Debian:
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra

macOS:
brew install tesseract tesseract-lang

Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
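
After installing, it can help to confirm that the Tesseract binary is discoverable before running OCR. The sketch below is one way to check, assuming the `TESSERACT_CMD` environment variable described in the configuration section takes priority over `PATH`. The `find_tesseract` helper is a hypothetical name and is not part of the package.

```python
import os
import shutil
from typing import Mapping, Optional


def find_tesseract(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path to the Tesseract binary, or None if it cannot be found.

    Assumption: an explicit TESSERACT_CMD value wins over a PATH lookup.
    """
    env = os.environ if env is None else env
    override = env.get("TESSERACT_CMD")
    if override:
        return override
    # shutil.which searches the directories listed in PATH.
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    print(path if path else "Tesseract not found; install it or set TESSERACT_CMD")
```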
# Extract text from PDF
epdf extract document.pdf -o output.txt
# Translate PDF
epdf translate document.pdf --target zh-CN -o translated.pdf
# Use OCR for scanned PDFs
epdf translate scanned.pdf --target zh-CN --ocr -o translated.pdf
# Different layout styles
epdf translate document.pdf --target zh-CN --layout side_by_side
epdf translate document.pdf --target zh-CN --layout line_by_line
epdf translate document.pdf --target zh-CN --layout overlay
# OCR a specific page
epdf ocr document.pdf --page 0 --lang eng
# Show PDF info
epdf info document.pdf

# Start the web server
epdf web --host 0.0.0.0 --port 5000
# Then open http://localhost:5000 in your browser

from easypdfforyou import PdfExtractor, TranslationService, BilingualGenerator
# Extract text
extractor = PdfExtractor()
pages = extractor.extract_text("document.pdf")
# Translate
pages_text = [page.text for page in pages]
translator = TranslationService()
translated = translator.translate_batch(pages_text, "en", "zh-CN")
# Generate bilingual PDF
generator = BilingualGenerator()
generator.generate(
    pages_text,
    translated,
    "output.pdf",
    layout="side_by_side"
)

# OpenRouter API (for LLM-based translation)
export OPENROUTER_API_KEY="your-api-key"
export OPENROUTER_MODEL="google/gemini-2.0-flash-001"
# Tesseract path (if not in PATH)
export TESSERACT_CMD="/usr/bin/tesseract"
# Default settings
export DEFAULT_TARGET_LANG="zh-CN"
export PDF_DPI="300"
export OUTPUT_DIR="./output"

Create config.json:
{
    "openrouter_api_key": "your-api-key",
    "openrouter_model": "google/gemini-2.0-flash-001",
    "default_target_lang": "zh-CN",
    "dpi": 300
}

Then use with CLI:
epdf --config config.json translate document.pdf

Google Translate:
- Free, no API key required
- Supports all major languages
- Rate limited by Google

OpenRouter:
- Higher quality LLM-based translation
- Supports models like Gemini, GPT, etc.
- Requires an API key from https://openrouter.ai
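
The provider notes and the configuration options shown earlier can be combined into a single settings lookup. The sketch below is illustrative only: the `resolve_settings` helper and the `provider` key are hypothetical, and both the precedence (environment variables over config.json) and the rule for choosing a provider (OpenRouter only when an API key is present) are assumptions, not documented behavior of the tool. The variable and key names are the ones listed in this README.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional

# Environment variable -> config.json key, using the names from this README.
ENV_TO_KEY = {
    "OPENROUTER_API_KEY": "openrouter_api_key",
    "OPENROUTER_MODEL": "openrouter_model",
    "DEFAULT_TARGET_LANG": "default_target_lang",
    "PDF_DPI": "dpi",
}


def resolve_settings(
    config_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Merge config.json and environment variables into one dictionary."""
    env = os.environ if env is None else env
    settings: Dict[str, Any] = {}
    if config_path is not None:
        with open(config_path, encoding="utf-8") as fh:
            settings.update(json.load(fh))
    # Assumption: environment variables override config.json values.
    for var, key in ENV_TO_KEY.items():
        if env.get(var):
            settings[key] = env[var]
    # Environment values arrive as strings, so normalize the DPI to an int.
    if "dpi" in settings:
        settings["dpi"] = int(settings["dpi"])
    # Assumption: fall back to Google Translate when no API key is set.
    has_key = bool(settings.get("openrouter_api_key"))
    settings["provider"] = "openrouter" if has_key else "google"
    return settings
```

Passing `env` explicitly, instead of always reading `os.environ`, makes the lookup easy to exercise in tests without touching the real environment.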
git clone https://github.com/GuillaumeLeone8/EasyPdfForYou.git
cd EasyPdfForYou
pip install -e ".[dev]"

pytest
pytest --cov=easypdfforyou

black easypdfforyou tests
flake8 easypdfforyou tests

easypdfforyou/
├── easypdfforyou/
│   ├── core/
│   │   ├── config.py              # Configuration management
│   │   ├── pdf_extractor.py       # PDF text extraction
│   │   ├── ocr_engine.py          # Tesseract OCR
│   │   ├── translator.py          # Translation APIs
│   │   └── bilingual_generator.py # PDF generation
│   ├── cli/
│   │   └── main.py                # CLI interface
│   ├── web/
│   │   ├── app.py                 # Flask web app
│   │   └── templates/
│   │       └── index.html         # Web UI
│   └── utils/
│       └── __init__.py            # Utility functions
├── tests/                         # Test suite
├── docs/                          # Documentation
├── examples/                      # Example files
├── setup.py                       # Package setup
└── README.md                      # This file

See docs/API.md for detailed API documentation.
See docs/USAGE.md for more usage examples.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PyMuPDF - PDF processing
- Tesseract OCR - OCR engine
- Googletrans - Google Translate
- OpenRouter - LLM API
- ReportLab - PDF generation
- Flask - Web framework
- Initial release
- PDF text extraction with layout preservation
- OCR support for scanned documents
- Multi-language translation (Google + OpenRouter)
- Bilingual PDF generation (3 layouts)
- CLI interface
- Web UI
- Full test coverage