In Tupi mythology, Nheengatu — "the good language" — was the lingua franca that bridged dozens of peoples across the Amazon basin, turning mutual incomprehension into shared understanding.
Translate EPUB books into a target language, simplified to a chosen CEFR level, using an LLM.
Give it a German novel and ask for Portuguese at A2 — it parses the EPUB, chunks the text, instructs the model to rewrite each chunk at the right level, and produces a new EPUB ready for your Kindle.
A sample book is included (Metamorphosis by Franz Kafka, public domain). Just run:
```shell
nheengatu -i examples/metamorphosis.epub
```

The interactive guide will walk you through the rest. The story chapters are 3, 4, and 5 (the EPUB also contains a cover page and the Project Gutenberg header/footer). Pick chapter 3 to try just the first part and stay within Groq's free tier limits.
Here's what chapter I of Kafka's Metamorphosis looks like at three different levels:
| A1 — Beginner | A2 — Elementary | B1 — Intermediate |
|---|---|---|
| ![]() | ![]() | ![]() |
- Parses the EPUB and splits chapters into chunks (~2500 words each)
- Sends each chunk to an LLM with CEFR-level instructions
- Reassembles the translated/simplified chunks into a new EPUB
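The chunking step can be pictured roughly as follows. This is a minimal illustrative sketch, not the crate's actual code: the function name is an assumption, and the real chunker presumably avoids breaking mid-sentence rather than splitting on raw word counts.

```rust
/// Split a chapter's text into chunks of at most `max_words` words.
/// Simplified sketch: splits on whitespace only, whereas a real
/// chunker would also respect sentence and paragraph boundaries.
fn chunk_text(text: &str, max_words: usize) -> Vec<String> {
    text.split_whitespace()
        .collect::<Vec<_>>()
        .chunks(max_words)
        .map(|chunk| chunk.join(" "))
        .collect()
}

fn main() {
    let sample = "Als Gregor Samsa eines Morgens aus unruhigen Träumen erwachte";
    let chunks = chunk_text(sample, 4);
    assert_eq!(chunks.len(), 3); // 9 words -> chunks of 4 + 4 + 1
    println!("{:?}", chunks);
}
```

Each resulting chunk then becomes one LLM request, which is why the chunk size (`--max-chunk-words`) trades off context quality against request limits.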
For A1 and A2, a two-pass pipeline runs automatically:
- Pass 1 — Simplify in the source language (vocabulary and structure only)
- Pass 2 — Translate the simplified text to the target language
Separating these steps produces significantly better results at low levels — in testing, two-pass A2 output showed 15–25% higher coverage of standard vocabulary lists compared to single-pass. It also prevents the language leakage that occurs when a model tries to simplify and translate simultaneously.
For B1 and above, a single pass is used. Forcing two passes at B1 hurts quality: Pass 1 over-simplifies sentence structure and Pass 2 cannot recover the natural flow.
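The level-dependent behaviour above boils down to a simple rule, sketched here for illustration (the names are assumptions; the override corresponds to the `--two-pass` CLI flag):

```rust
#[derive(Debug, PartialEq)]
enum Strategy {
    TwoPass,    // pass 1: simplify in source language; pass 2: translate
    SinglePass, // simplify and translate in one prompt
}

/// Pick the pipeline strategy for a CEFR level: A1/A2 default to
/// two passes, B1 and above to one, unless two-pass is forced.
fn strategy_for(level: &str, force_two_pass: bool) -> Strategy {
    if force_two_pass || matches!(level, "A1" | "A2") {
        Strategy::TwoPass
    } else {
        Strategy::SinglePass
    }
}

fn main() {
    assert_eq!(strategy_for("A2", false), Strategy::TwoPass);
    assert_eq!(strategy_for("B1", false), Strategy::SinglePass);
    assert_eq!(strategy_for("B1", true), Strategy::TwoPass);
    println!("ok");
}
```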
Just run it — the interactive guide walks you through everything:
```
nheengatu

? What book do you want to translate?
> book.epub
? What language is this book in?
> 1. German
? What language should it be in?
> 5. Portuguese
? What's your level?
> 2. Elementary (A2)
? Which chapters? (enter for all, or e.g. 1,3,5)
>
? Standard or Advanced config?
> 1. Standard (recommended)

Translating "Der Prozess" → Portuguese at A2...
[==================> ] 6/10 chapters
Done: der-prozess-pt-a2.epub
```
On first run it asks for your Groq API key (free at console.groq.com) and saves it so you don't need to enter it again.
Every prompt can be skipped by passing the corresponding flag:
```shell
nheengatu -i book.epub --source-lang de -t pt -l A2
```

```
-i, --input <FILE>            Input EPUB
-o, --output <FILE>           Output EPUB (default: auto-generated from title)
-t, --target-lang <LANG>      Target language (name or code: Portuguese, pt)
-l, --level <LEVEL>           CEFR level: A1 A2 B1 B2 C1 C2
    --source-lang <LANG>      Source language (name or code: German, de)
-b, --backend <BACKEND>       groq or ollama [default: groq]
-m, --model <MODEL>           Model override
    --simplify-backend <B>    Backend for pass 1 (simplify). Defaults to --backend
    --translate-model <MODEL> Model for pass 2 (translate)
    --chapters <N,N,…>        Only translate these chapters (1-based)
    --two-pass                Force two-pass at any level
    --max-chunk-words <N>     Words per chunk [default: 2500]
    --groq-api-key <KEY>      [env: GROQ_API_KEY]
    --ollama-url <URL>        [default: http://localhost:11434]
```
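Putting the flags together, a fully non-interactive run against a local Ollama server might look like this (the chapter numbers are simply the story chapters of the bundled sample; the output filename is just an example):

```shell
# Translate the sample book's story chapters to B1 Portuguese,
# using local Ollama instead of Groq
nheengatu -i examples/metamorphosis.epub \
  --source-lang de -t pt -l B1 \
  -b ollama \
  --chapters 3,4,5 \
  -o metamorphosis-pt-b1.epub
```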
German, English, French, Spanish, Portuguese, Italian, Dutch, Polish, Russian, Japanese, Chinese.
These are the languages that Ollama models support well. The Groq backend can handle more, but the list reflects what works reliably across both backends.
| Level | Description |
|---|---|
| A1 | Beginner — very simple words, present tense only |
| A2 | Elementary — everyday topics, basic phrases |
| B1 | Intermediate — familiar topics, clear structure |
| B2 | Upper-intermediate — complex topics, some nuance |
| C1 | Advanced — nuanced, idiomatic |
| C2 | Mastery — near-native |
nheengatu-core exposes the full pipeline for use in your own applications:
```rust
use nheengatu_core::{
    pipeline::{run_pipeline, PipelineConfig},
    translator::GroqTranslator,
};
use std::path::Path;
use std::sync::Arc;

let translator = Arc::new(GroqTranslator::new(api_key)?);

let config = PipelineConfig {
    target_lang: "pt".into(),
    level: "A2".into(),
    ..Default::default()
};

run_pipeline(
    Path::new("book.epub"),
    Path::new("book-pt-a2.epub"),
    &config,
    translator.as_ref(),
    translator.as_ref(),
).await?;
```

Implement the `Translator` trait to plug in any LLM backend:
```rust
#[async_trait]
impl Translator for MyBackend {
    async fn translate_chunk(&self, text: &str, source: &str, target: &str, level: &str)
        -> Result<String, TranslateError> { … }
}
```

```
core/    Library crate — EPUB parsing, chunking, translator trait, pipeline
cli/     Binary crate — command-line interface
tests/   Integration tests and fixtures
```
```shell
cargo build
cargo test
```

See CONTRIBUTING.md.
MIT — see LICENSE.


