Skip to content

GuillaumeLeone8/EasyPdfForYou

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

EasyPdfForYou

Version Python License

A lightweight PDF document processing and translation tool supporting Google Translate and OpenRouter APIs.

Features

  • πŸ“„ PDF Text Extraction - Extract text while preserving layout structure
  • πŸ” OCR Support - Recognize text from scanned PDFs using Tesseract
  • 🌐 Multi-language Translation - Support for 9+ languages (English, Chinese, Japanese, Korean, etc.)
  • πŸ“ Bilingual PDF Generation - Create side-by-side, line-by-line, or overlay bilingual documents
  • πŸ’» CLI Interface - Command-line tool for batch processing
  • 🌐 Web UI - Easy-to-use web interface
  • πŸ”§ Multiple Translation Providers - Google Translate (free) and OpenRouter (LLM-based)

Supported Languages

Language Code OCR Support
English en βœ…
Simplified Chinese zh-CN βœ…
Traditional Chinese zh-TW βœ…
Japanese ja βœ…
Korean ko βœ…
French fr βœ…
German de βœ…
Spanish es βœ…
Italian it βœ…
Portuguese pt βœ…
Russian ru βœ…

Installation

From Source

git clone https://github.com/GuillaumeLeone8/EasyPdfForYou.git
cd EasyPdfForYou
pip install -e .

System Dependencies

For OCR functionality, install Tesseract:

Ubuntu/Debian:

sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra

macOS:

brew install tesseract tesseract-lang

Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki

Quick Start

CLI Usage

# Extract text from PDF
epdf extract document.pdf -o output.txt

# Translate PDF
e pdf translate document.pdf --target zh-CN -o translated.pdf

# Use OCR for scanned PDFs
epdf translate scanned.pdf --target zh-CN --ocr -o translated.pdf

# Different layout styles
epdf translate document.pdf --target zh-CN --layout side_by_side
e pdf translate document.pdf --target zh-CN --layout line_by_line
e pdf translate document.pdf --target zh-CN --layout overlay

# OCR a specific page
epdf ocr document.pdf --page 0 --lang eng

# Show PDF info
epdf info document.pdf

Web UI

# Start the web server
epdf web --host 0.0.0.0 --port 5000

# Then open http://localhost:5000 in your browser

Python API

from easypdfforyou import PdfExtractor, TranslationService, BilingualGenerator

# Extract text
extractor = PdfExtractor()
pages = extractor.extract_text("document.pdf")

# Translate
pages_text = [page.text for page in pages]
translator = TranslationService()
translated = translator.translate_batch(pages_text, "en", "zh-CN")

# Generate bilingual PDF
generator = BilingualGenerator()
generator.generate(
    pages_text,
    translated,
    "output.pdf",
    layout="side_by_side"
)

Configuration

Environment Variables

# OpenRouter API (for LLM-based translation)
export OPENROUTER_API_KEY="your-api-key"
export OPENROUTER_MODEL="google/gemini-2.0-flash-001"

# Tesseract path (if not in PATH)
export TESSERACT_CMD="/usr/bin/tesseract"

# Default settings
export DEFAULT_TARGET_LANG="zh-CN"
export PDF_DPI="300"
export OUTPUT_DIR="./output"

Config File

Create config.json:

{
  "openrouter_api_key": "your-api-key",
  "openrouter_model": "google/gemini-2.0-flash-001",
  "default_target_lang": "zh-CN",
  "dpi": 300
}

Then use with CLI:

epdf --config config.json translate document.pdf

Translation Providers

Google Translate (Default)

  • Free, no API key required
  • Supports all major languages
  • Rate limited by Google

OpenRouter (Optional)

  • Higher quality LLM-based translation
  • Supports models like Gemini, GPT, etc.
  • Requires API key from https://openrouter.ai

Development

Setup Development Environment

git clone https://github.com/GuillaumeLeone8/EasyPdfForYou.git
cd EasyPdfForYou
pip install -e ".[dev]"

Run Tests

pytest
pytest --cov=easypdfforyou

Code Formatting

black easypdfforyou tests
flake8 easypdfforyou tests

Project Structure

easypdfforyou/
β”œβ”€β”€ easypdfforyou/
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ config.py           # Configuration management
β”‚   β”‚   β”œβ”€β”€ pdf_extractor.py    # PDF text extraction
β”‚   β”‚   β”œβ”€β”€ ocr_engine.py       # Tesseract OCR
β”‚   β”‚   β”œβ”€β”€ translator.py       # Translation APIs
β”‚   β”‚   └── bilingual_generator.py  # PDF generation
β”‚   β”œβ”€β”€ cli/
β”‚   β”‚   └── main.py             # CLI interface
β”‚   β”œβ”€β”€ web/
β”‚   β”‚   β”œβ”€β”€ app.py              # Flask web app
β”‚   β”‚   └── templates/
β”‚   β”‚       └── index.html      # Web UI
β”‚   └── utils/
β”‚       └── __init__.py         # Utility functions
β”œβ”€β”€ tests/                      # Test suite
β”œβ”€β”€ docs/                       # Documentation
β”œβ”€β”€ examples/                   # Example files
β”œβ”€β”€ setup.py                    # Package setup
└── README.md                   # This file

API Documentation

See docs/API.md for detailed API documentation.

Usage Examples

See docs/USAGE.md for more usage examples.

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Changelog

v0.1.0 (2026-02-15)

  • Initial release
  • PDF text extraction with layout preservation
  • OCR support for scanned documents
  • Multi-language translation (Google + OpenRouter)
  • Bilingual PDF generation (3 layouts)
  • CLI interface
  • Web UI
  • Full test coverage

About

A lightweight PDF document processing and translation tool supporting Google Translate and OpenRouter APIs

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors