Powerful long-form translation workflows tuned for books, subtitles, and research archives—all orchestrated through large language models you control.
HoloLingo delivers a full-stack studio for transforming expansive manuscripts into new languages while preserving structure, styling, and narrative tone. The system combines automated document detection, adaptive prompt engineering, and granular progress tracking so multilingual publishing teams can focus on storytelling instead of repetitive formatting chores.
- Adaptive chunking that keeps context intact across chapters, scenes, and dialogue-heavy passages.
- Dual translation engines for EPUB and SRT sources with formatting-aware reassembly.
- Pluggable LLM backends (local or cloud) selected at runtime, with safeguards for rate limits and retries.
- Interactive browser console for curating source material, monitoring progress, and downloading outputs.
- Command-line tooling suitable for automation pipelines and scripted batch runs.
- Automatic detection of EPUBs that need simplified output, for compatibility with strict readers.
- Translation states persist between sessions so large jobs can be resumed after interruptions.
- Audit-ready logs record the exact prompts and responses used for every segment.
- Built-in guardrails sanitize metadata and remove unpublished drafts from the export path.
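To make the adaptive chunking feature concrete, here is a minimal sketch of line-based chunking that carries a few trailing lines of the previous chunk forward as read-only context. The names `Chunk` and `chunk_lines` and the default sizes are assumptions made for illustration; the actual engine in src/core may work differently.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    context: list[str]  # trailing lines of the previous chunk, for reference only
    body: list[str]     # lines that should actually be translated


def chunk_lines(lines, lines_per_chunk=25, context_lines=3):
    """Split lines into fixed-size chunks, each paired with preceding context.

    Hypothetical helper: the sizes here are illustrative defaults only.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    if context_lines < 0:
        raise ValueError("context_lines must not be negative")
    chunks = []
    for start in range(0, len(lines), lines_per_chunk):
        # Context is whatever precedes this chunk, capped at context_lines.
        context = lines[max(0, start - context_lines):start]
        body = lines[start:start + lines_per_chunk]
        chunks.append(Chunk(context=context, body=body))
    return chunks
```

Passing the context alongside each body lets a prompt remind the model of the preceding sentences without asking it to translate them again.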
- src/api: Flask blueprint collection that exposes REST and WebSocket interfaces.
- src/core: Translation engines, context optimization, and document adapters.
- src/core/epub: Simple and rich EPUB processors with DOM-safe rewriting utilities.
- src/core/subtitle_translator: Frame-accurate subtitle converter with timing preservation.
- src/utils: Cross-cutting helpers for configuration, file I/O, security, and logging.
- src/web: Lightweight front-end delivering the HoloLingo control surface.
- Python 3.10 or later.
- Git command line tools.
- An optional local LLM runtime such as Ollama, or access credentials for a remote provider.
# 1. Acquire the source
git clone <repository-url>
cd TranslateBookWithLLM
# 2. Create an isolated environment
python -m venv .venv
source .venv/bin/activate # Windows PowerShell: .venv\Scripts\Activate.ps1
# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# 4. Configure runtime values
cp .env.example .env # optional; adjust values as needed
# 5. Launch the studio
python translation_api.py

The web experience runs at http://localhost:5000 by default. Use the PORT environment variable to change the listener port.
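As an illustration of the PORT override, the sketch below resolves the listener port from the environment and falls back to 5000. The helper name `resolve_port` is hypothetical, and how translation_api.py actually reads the variable is not shown in this document.

```python
import os


def resolve_port(default=5000):
    """Return the port from the PORT variable, or the default when unset."""
    raw = os.environ.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```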
- Open the interface in your browser.
- Choose a translation provider (local, OpenAI-compatible, or Gemini-compatible).
- Select source and target languages and configure fallback options.
- Drag in EPUB, SRT, or plain-text files. Mixed media batches are supported.
- Start the session and monitor live progress updates per file.
- Download completed translations individually or as a bundled archive.
python translate.py \
--input path/to/source.epub \
--output translated/output.epub \
--source_lang "English" \
--target_lang "Spanish" \
--provider ollama \
--model mistral-large \
--simple-mode

Run python translate.py --help to review every available flag, including batch folders, retry tuning, and context-window overrides.
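For scripted batch runs, one possible approach is a small Python wrapper that invokes the command above once per file. This is a sketch that uses only the flags shown in the example; the function names, the folder layout, and the dry-run default are assumptions. The built-in batch options listed by --help may be a better fit.

```python
import subprocess
import sys
from pathlib import Path


def build_command(source, output_dir, source_lang="English",
                  target_lang="Spanish", provider="ollama",
                  model="mistral-large"):
    """Assemble one translate.py invocation from the flags shown above."""
    return [
        sys.executable, "translate.py",
        "--input", str(source),
        "--output", str(output_dir / source.name),
        "--source_lang", source_lang,
        "--target_lang", target_lang,
        "--provider", provider,
        "--model", model,
    ]


def translate_folder(input_dir, output_dir, dry_run=True):
    """Build (and optionally run) one command per EPUB in input_dir."""
    commands = [build_command(path, output_dir)
                for path in sorted(input_dir.glob("*.epub"))]
    if not dry_run:
        output_dir.mkdir(parents=True, exist_ok=True)
        for command in commands:
            # check=True stops the batch on the first failing file.
            subprocess.run(command, check=True)
    return commands
```

Calling `translate_folder(Path("books"), Path("translated"))` returns the commands without running them, which makes it easy to inspect a batch before starting it.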
- Curate context: The context optimizer adapts to document length; reducing chunk size improves latency on modest hardware.
- Pick the right engine: Simple EPUB mode maximizes e-reader compatibility; rich mode retains advanced styling.
- Track revisions: Each translation produces deterministic filenames that pair original and generated content for easy diffing.
- Stay resilient: Interruptions are safe—the translation state persists so you can resume without data loss.
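The resume behaviour described in the last tip can be pictured with a small checkpoint loop. This is a minimal sketch, assuming a JSON state file keyed by chunk index; the function names and file format are hypothetical and do not describe how HoloLingo stores its state.

```python
import json
from pathlib import Path


def load_state(path):
    """Return {chunk_index: translation} from a previous run, if any."""
    if path.exists():
        raw = json.loads(path.read_text(encoding="utf-8"))
        return {int(key): value for key, value in raw.items()}
    return {}


def save_state(path, done):
    """Write the state to a temporary file, then swap it into place."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(done, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)  # avoids leaving a half-written state file behind


def translate_resumable(chunks, translate, state_path):
    """Translate chunks in order, skipping those already in the state file."""
    done = load_state(state_path)
    for index, chunk in enumerate(chunks):
        if index in done:
            continue  # finished in an earlier session
        done[index] = translate(chunk)
        save_state(state_path, done)  # checkpoint after every chunk
    return [done[index] for index in range(len(chunks))]
```

Checkpointing after every chunk costs a small write per segment but means an interruption loses at most the chunk in flight.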
Define these variables directly in the shell or within .env:
- LLM_PROVIDER: ollama, openai, or gemini.
- API_ENDPOINT: Base URL for the active provider.
- DEFAULT_MODEL: Preferred model name for initial requests.
- DEFAULT_SOURCE_LANGUAGE / DEFAULT_TARGET_LANGUAGE: Override UI defaults.
- MAIN_LINES_PER_CHUNK: Chunk size for text segmentation.
- REQUEST_TIMEOUT: Seconds to wait before retrying a stalled call.
- OUTPUT_DIR: Destination folder for generated files.
- AUTO_ADJUST_CONTEXT: Enables adaptive prompt shaping.
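A rough sketch of how these variables could be read and validated in Python follows. The variable names come from the list above; the `Settings` class, the `load_settings` helper, and every fallback value are placeholders chosen for illustration, not the project's actual defaults.

```python
import os
from dataclasses import dataclass

VALID_PROVIDERS = {"ollama", "openai", "gemini"}  # values listed above


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    api_endpoint: str
    default_model: str
    main_lines_per_chunk: int
    request_timeout: int
    output_dir: str
    auto_adjust_context: bool


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")
    flag = env.get("AUTO_ADJUST_CONTEXT", "true").strip().lower()
    return Settings(
        llm_provider=provider,
        # Every fallback below is an assumed placeholder value.
        api_endpoint=env.get("API_ENDPOINT", "http://localhost:11434"),
        default_model=env.get("DEFAULT_MODEL", "mistral-large"),
        main_lines_per_chunk=int(env.get("MAIN_LINES_PER_CHUNK", "25")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "60")),
        output_dir=env.get("OUTPUT_DIR", "translated"),
        auto_adjust_context=flag in {"1", "true", "yes", "on"},
    )
```

Validating the provider name up front surfaces a typo in .env at startup instead of partway through a long job.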
- Unit tests focus on EPUB structure preservation and subtitle timing accuracy.
- Sample datasets in tests/data (if present) illustrate expected layouts and edge cases.
- Consider running your own diff checks or neural quality estimators for critical publishing work.
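One simple diff check of your own is to confirm that a translated SRT file keeps exactly the same cue timings as its source. The sketch below assumes standard SubRip timing lines (`HH:MM:SS,mmm --> HH:MM:SS,mmm`); the function names are hypothetical and are not part of the project's test suite.

```python
import re

# Matches a standard SubRip timing line at the start of a line.
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})",
    re.MULTILINE,
)


def extract_timings(srt_text):
    """Return the (start, end) pair of every cue, in file order."""
    return TIMING.findall(srt_text)


def timings_match(source_srt, translated_srt):
    """True when both files contain identical cue timings in the same order."""
    return extract_timings(source_srt) == extract_timings(translated_srt)
```

A mismatch here usually points to a dropped, merged, or reordered cue, which is worth inspecting before publishing.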
- Streaming translation over WebSocket for near-real-time preview.
- Vocabulary memory that carries terminology rules across chapters.
- Collaborative annotations so reviewers can approve sections before export.
- Fork or branch from main.
- Keep contributions self-contained with clear commit messages.
- Add or update tests when introducing new behaviour.
- Submit a pull request describing the motivation, approach, and validation steps.
This project is released under the terms specified in LICENSE. Review the file before redistributing or embedding the software in commercial products.
HoloLingo exists to make translated literature feel native to every reader. Happy localizing!