High-fidelity book translation with large language models, synchronizing chapters, context, and revisions for multilingual publishing.

williamddobson3/hololingo

HoloLingo Translation Studio

Powerful long-form translation workflows tuned for books, subtitles, and research archives, all orchestrated through large language models you control.


Overview

HoloLingo delivers a full-stack studio for translating book-length manuscripts into new languages while preserving structure, styling, and narrative tone. The system combines automatic document detection, adaptive prompt engineering, and granular progress tracking so multilingual publishing teams can focus on storytelling instead of repetitive formatting chores.


Core Capabilities

  • Adaptive chunking that keeps context intact across chapters, scenes, and dialogue-heavy passages.
  • Dual translation engines for EPUB and SRT sources with formatting-aware reassembly.
  • Pluggable LLM backends (local or cloud) selected at runtime, with safeguards for rate limits and retries.
  • Interactive browser console for curating source material, monitoring progress, and downloading outputs.
  • Command-line tooling suitable for automation pipelines and scripted batch runs.
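The idea behind context-preserving chunking can be illustrated with a simplified, fixed-size version: split the text into groups of lines (the unit that MAIN_LINES_PER_CHUNK suggests) and attach the last few lines of the previous chunk as read-only context, so the model sees what came just before. `Chunk`, `chunk_lines`, and `context_lines` are hypothetical names for this sketch; HoloLingo's own chunker adapts sizes to the document and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One unit of work sent to the model (hypothetical structure)."""
    context: List[str]  # trailing lines of the previous chunk, shown for reference only
    body: List[str]     # lines that should actually be translated


def chunk_lines(lines: List[str], lines_per_chunk: int, context_lines: int = 2) -> List[Chunk]:
    """Split text into fixed-size chunks, each carrying a little preceding context.

    Assumption: chunk size is counted in lines, in the spirit of MAIN_LINES_PER_CHUNK;
    `context_lines` is an illustrative parameter, not a documented HoloLingo setting.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    chunks: List[Chunk] = []
    for start in range(0, len(lines), lines_per_chunk):
        body = lines[start:start + lines_per_chunk]
        context = lines[max(0, start - context_lines):start]
        chunks.append(Chunk(context=context, body=body))
    return chunks


if __name__ == "__main__":
    text = [f"Line {i}" for i in range(1, 8)]
    for chunk in chunk_lines(text, lines_per_chunk=3):
        print(chunk.context, "->", chunk.body)
```

Only `body` lines are translated and reassembled; the overlapping `context` lines help the model keep names, tense, and dialogue attribution consistent across chunk boundaries.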

System Highlights

  • Automatic detection of EPUBs that need simple-mode processing ensures compatibility with strict e-readers.
  • Translation states persist between sessions so large jobs can be resumed after interruptions.
  • Audit-ready logs record the exact prompts and responses used for every segment.
  • Built-in guardrails sanitize metadata and remove unpublished drafts from the export path.
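Resumable sessions can be sketched as a per-chunk checkpoint file that is rewritten atomically after each completed chunk, so a restarted job skips work it has already done. This is a minimal sketch assuming progress is tracked per chunk in a JSON file; `CheckpointStore`, `translate_all`, and the file layout are hypothetical, not HoloLingo's actual state format.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


class CheckpointStore:
    """Persist translated chunks so an interrupted job can resume (hypothetical helper)."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.done: Dict[str, str] = {}
        if self.path.exists():
            self.done = json.loads(self.path.read_text(encoding="utf-8"))

    def is_done(self, chunk_id: int) -> bool:
        return str(chunk_id) in self.done

    def save(self, chunk_id: int, translation: str) -> None:
        """Record one chunk, then rewrite the checkpoint atomically."""
        self.done[str(chunk_id)] = translation
        # Write to a temp file in the same directory and swap it in, so a crash
        # mid-write never leaves a truncated checkpoint behind.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.done, fh, ensure_ascii=False)
        os.replace(tmp, self.path)


def translate_all(chunks: List[str], store: CheckpointStore,
                  translate_fn: Callable[[str], str]) -> List[str]:
    """Translate only the chunks missing from the checkpoint, then return all results in order."""
    for i, chunk in enumerate(chunks):
        if store.is_done(i):
            continue  # already translated in an earlier session
        store.save(i, translate_fn(chunk))
    return [store.done[str(i)] for i in range(len(chunks))]
```

Saving after every chunk costs one small file write per model call, which is negligible next to LLM latency and bounds the loss from an interruption to a single chunk.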

Architecture at a Glance

  • src/api: Flask blueprint collection that exposes REST and WebSocket interfaces.
  • src/core: Translation engines, context optimization, and document adapters.
  • src/core/epub: Simple and rich EPUB processors with DOM-safe rewriting utilities.
  • src/core/subtitle_translator: Frame-accurate subtitle converter with timing preservation.
  • src/utils: Cross-cutting helpers for configuration, file I/O, security, and logging.
  • src/web: Lightweight front-end delivering the HoloLingo control surface.
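For readers unfamiliar with the Flask blueprint pattern that src/api is built on, here is a minimal sketch assuming Flask is installed: each feature lives in its own blueprint, and an application factory registers them in one place. The `status` blueprint, the `/api/health` route, and `create_app` are hypothetical and do not describe HoloLingo's actual endpoints.

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical blueprint; HoloLingo's real route names and module layout may differ.
status_bp = Blueprint("status", __name__, url_prefix="/api")


@status_bp.route("/health", methods=["GET"])
def health():
    """Minimal liveness endpoint of the kind a front end could poll."""
    return jsonify({"status": "ok"})


def create_app() -> Flask:
    """Application factory that registers each feature blueprint in one place."""
    app = Flask(__name__)
    app.register_blueprint(status_bp)
    return app
```

Keeping routes in blueprints lets the REST layer grow by module (uploads, progress, downloads) without a single oversized app file.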

Getting Started

Prerequisites

  • Python 3.10 or later.
  • Git command line tools.
  • An optional local LLM runtime such as Ollama, or access credentials for a remote provider.

Setup Steps

# 1. Acquire the source
git clone <repository-url>
cd TranslateBookWithLLM

# 2. Create an isolated environment
python -m venv .venv
source .venv/bin/activate  # macOS/Linux (Git Bash on Windows: source .venv/Scripts/activate)
# Windows PowerShell: .venv\Scripts\Activate.ps1

# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4. Configure runtime values
cp .env.example .env  # optional; adjust values as needed

# 5. Launch the studio
python translation_api.py

The web interface is served at http://localhost:5000 by default. Set the PORT environment variable to listen on a different port.
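A port override like this is typically resolved along the following lines. This is a sketch of the general pattern, assuming only the PORT variable and the 5000 default mentioned above; `resolve_port` is a hypothetical helper, not the code in translation_api.py.

```python
import os


def resolve_port(default: int = 5000) -> int:
    """Return the listener port from PORT, falling back to the default when unset or blank."""
    raw = os.environ.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

With a launcher that behaves this way, `PORT=8080 python translation_api.py` would serve the dashboard at http://localhost:8080.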


Running the Studio

Web Dashboard

  1. Open the interface in your browser.
  2. Choose a translation provider (local, OpenAI-compatible, or Gemini-compatible).
  3. Select source and target languages and configure fallback options.
  4. Drag in EPUB, SRT, or plain-text files. Mixed media batches are supported.
  5. Start the session and monitor live progress updates per file.
  6. Download completed translations individually or as a bundled archive.

Command Line Harness

python translate.py \
  --input path/to/source.epub \
  --output translated/output.epub \
  --source_lang "English" \
  --target_lang "Spanish" \
  --provider ollama \
  --model mistral-large \
  --simple-mode

Run python translate.py --help to review every available flag, including batch folders, retry tuning, and context-window overrides.
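For scripted batch runs, a small wrapper can call translate.py once per file. This sketch uses only the flags shown in the example command; if `--help` lists a built-in batch-folder option, that is likely the simpler choice. `build_command`, `translate_folder`, and the rule that `--simple-mode` applies only to EPUB input are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(src: Path, out_dir: Path, source_lang: str, target_lang: str,
                  provider: str = "ollama", model: str = "mistral-large",
                  simple_mode: bool = False) -> List[str]:
    """Assemble one translate.py call using only the flags from the example above."""
    cmd = [
        sys.executable, "translate.py",
        "--input", str(src),
        "--output", str(out_dir / src.name),
        "--source_lang", source_lang,
        "--target_lang", target_lang,
        "--provider", provider,
        "--model", model,
    ]
    # Assumption: --simple-mode only matters for EPUB input, so it is skipped for SRT.
    if simple_mode and src.suffix.lower() == ".epub":
        cmd.append("--simple-mode")
    return cmd


def translate_folder(in_dir: Path, out_dir: Path, source_lang: str, target_lang: str,
                     **options) -> None:
    """Run translate.py once per EPUB/SRT file; stop at the first non-zero exit."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.iterdir()):
        if src.suffix.lower() not in {".epub", ".srt"}:
            continue
        # check=True raises subprocess.CalledProcessError if translate.py fails.
        subprocess.run(build_command(src, out_dir, source_lang, target_lang, **options),
                       check=True)
```

Passing the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.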


Workflow Tips

  • Curate context: The context optimizer adapts to document length; reducing chunk size improves latency on modest hardware.
  • Pick the right engine: Simple EPUB mode maximizes e-reader compatibility; rich mode retains advanced styling.
  • Track revisions: Each translation produces deterministic filenames that pair original and generated content for easy diffing.
  • Stay resilient: Interruptions are safe. Translation state persists, so you can resume without losing completed work.
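Because output names are deterministic, a re-run lands on the same path as the previous run, which makes revision diffs straightforward. The naming scheme below (`chapter1.txt` becomes `chapter1.spanish.txt`) is a hypothetical convention for illustration; check your OUTPUT_DIR for the names HoloLingo actually produces. The diff itself uses Python's standard `difflib`.

```python
import difflib
from pathlib import Path
from typing import List


def translated_name(src: Path, target_lang: str) -> Path:
    """Derive a deterministic output path (hypothetical convention)."""
    tag = target_lang.strip().lower().replace(" ", "_")
    return src.with_name(f"{src.stem}.{tag}{src.suffix}")


def revision_diff(old_text: str, new_text: str, name: str) -> List[str]:
    """Unified diff between two runs of the same translated file."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=f"previous/{name}", tofile=f"current/{name}", lineterm="",
    ))
```

Comparing an earlier and a later translation of the same file shows exactly which lines a prompt or model change affected.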

Configuration Keys

Define these variables directly in the shell or within .env:

  • LLM_PROVIDER: ollama, openai, or gemini.
  • API_ENDPOINT: Base URL for the active provider.
  • DEFAULT_MODEL: Preferred model name for initial requests.
  • DEFAULT_SOURCE_LANGUAGE / DEFAULT_TARGET_LANGUAGE: Override UI defaults.
  • MAIN_LINES_PER_CHUNK: Chunk size for text segmentation.
  • REQUEST_TIMEOUT: Seconds to wait before retrying a stalled call.
  • OUTPUT_DIR: Destination folder for generated files.
  • AUTO_ADJUST_CONTEXT: Enables adaptive prompt shaping.
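A subset of these keys can be loaded and validated with the standard library alone, as in the sketch below. It assumes shell variables take precedence over .env (the usual convention, but not confirmed here), and every fallback value in it is a placeholder rather than a documented HoloLingo default. `parse_env_text`, `Settings`, and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Mapping

PROVIDERS = {"ollama", "openai", "gemini"}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


@dataclass
class Settings:
    llm_provider: str
    api_endpoint: str
    default_model: str
    main_lines_per_chunk: int
    request_timeout: float
    output_dir: Path
    auto_adjust_context: bool


def load_settings(env_file: Path = Path(".env"),
                  environ: Mapping[str, str] = os.environ) -> Settings:
    """Merge .env with the environment; environment values win (assumption)."""
    file_values = parse_env_text(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    merged = {**file_values, **environ}
    provider = merged.get("LLM_PROVIDER", "ollama").lower()
    if provider not in PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(PROVIDERS)}, got {provider!r}")
    # All fallback values below are placeholders, not HoloLingo's documented defaults.
    return Settings(
        llm_provider=provider,
        api_endpoint=merged.get("API_ENDPOINT", "http://localhost:11434/api/generate"),
        default_model=merged.get("DEFAULT_MODEL", "mistral-large"),
        main_lines_per_chunk=int(merged.get("MAIN_LINES_PER_CHUNK", "25")),
        request_timeout=float(merged.get("REQUEST_TIMEOUT", "60")),
        output_dir=Path(merged.get("OUTPUT_DIR", "translated")),
        auto_adjust_context=merged.get("AUTO_ADJUST_CONTEXT", "true").lower() in {"1", "true", "yes", "on"},
    )
```

Validating LLM_PROVIDER up front turns a typo into an immediate, readable error instead of a failed request midway through a long job.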

Quality Assurance

  • Unit tests focus on EPUB structure preservation and subtitle timing accuracy.
  • Sample datasets in tests/data (if present) illustrate expected layouts and edge cases.
  • Consider running your own diff checks or neural quality estimators for critical publishing work.
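One inexpensive diff check for subtitle work is confirming that translation left every cue's timing untouched. The sketch below compares the `start --> end` lines of a source and a translated SRT file using the standard `re` module; `cue_timings` and `timing_mismatches` are hypothetical helper names, and the check assumes cues appear in the same order in both files.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def cue_timings(srt_text: str) -> List[Tuple[str, str]]:
    """Return (start, end) pairs for every cue, in order."""
    return TIMING.findall(srt_text)


def timing_mismatches(source_srt: str, translated_srt: str) -> List[str]:
    """List readable problems; an empty list means all timings were preserved."""
    src, dst = cue_timings(source_srt), cue_timings(translated_srt)
    problems: List[str] = []
    if len(src) != len(dst):
        problems.append(f"cue count differs: {len(src)} vs {len(dst)}")
    for i, (a, b) in enumerate(zip(src, dst), start=1):
        if a != b:
            problems.append(f"cue {i}: {a[0]} --> {a[1]} became {b[0]} --> {b[1]}")
    return problems
```

Running this over every translated SRT in a batch catches merged or dropped cues before the files reach a video editor.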

Roadmap Snapshots

  • Streaming translation over WebSocket for near-real-time preview.
  • Vocabulary memory that carries terminology rules across chapters.
  • Collaborative annotations so reviewers can approve sections before export.

Contributing

  1. Fork or branch from main.
  2. Keep contributions self-contained with clear commit messages.
  3. Add or update tests when introducing new behaviour.
  4. Submit a pull request describing the motivation, approach, and validation steps.

License

This project is released under the terms specified in LICENSE. Review the file before redistributing or embedding the software in commercial products.


HoloLingo exists to make translated literature feel native to every reader. Happy localizing!
