Merge two ePub files in different languages into a single bilingual ebook for parallel reading.
ePubLangMerger is an R/Shiny application that takes two ePub files -- each in a different language -- and merges them into a single bilingual ePub. Every paragraph and heading from the second language is inserted as an XML sibling directly after the corresponding element in the first language, producing an interleaved, side-by-side reading experience.

    docker compose up -d

Then open http://localhost:3838
- Paragraph-level merging -- Pairs `<p>` and `<h1>` through `<h5>` elements across both ePubs and interleaves them as XML siblings.
- Intelligent output naming -- Automatically generates the merged filename from the input filenames and their language codes.
- Caching -- Previously merged files are detected and served instantly without reprocessing.
- Web UI -- Upload two ePub files through a clean Shiny interface, click "Go!", and download the result.
- Batch / CLI mode -- `script.R` provides a standalone, non-interactive version for scripting and automation.
- Duplicate ID resolution -- Appends a `_2` suffix to all `id` attributes from the second ePub to prevent XHTML ID collisions.
| Dependency | Purpose |
|---|---|
| Docker (recommended) | Run the app with no manual dependency management |
| R (>= 3.5) | Runtime (manual installation only) |
| shiny | Web application framework |
| XML | XHTML parsing and manipulation |
| stringr | Filename string operations |
| readr | File I/O for cached results |
| Rcompression | ePub (ZIP) creation |
Note: `Rcompression` is not on CRAN. Install it from GitHub (see Manual Installation below).

    git clone https://github.com/GeiserX/ePubLangMerger.git
    cd ePubLangMerger

Install R dependencies:

    install.packages(c("shiny", "XML", "stringr", "readr"))
    # install_github() comes from the devtools package, which must already be installed
    devtools::install_github("omegahat/Rcompression")

With Docker:

    docker compose up -d

The web UI will be available at http://localhost:3838. Upload your two ePub files and download the merged result.
Launch the Shiny app from R:

    shiny::runApp(".", port = 8080, host = "0.0.0.0", launch.browser = TRUE)

- Upload the first language ePub (this language will appear first in each paragraph pair).
- Upload the second language ePub.
- Click Go!.
- Download the merged bilingual ePub.
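
In batch / CLI mode, call `script.R` with the input directory, both filenames, and the output directory:

    Rscript script.R <input_dir> <file1.epub> <file2.epub> <output_dir>

Example:

    Rscript script.R ./books MyBook_EN.epub MyBook_ES.epub ./output

This mode is useful for automated pipelines or bulk processing.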
The tool expects input filenames in the format Title_LangCode.epub (e.g., MyBook_EN.epub, MyBook_ES.epub). The merged output is named by combining both language codes: MyBook_EN_ES.epub. An optional trailing segment (e.g., a date) is preserved when present.
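
The naming rule can be illustrated with a short Python sketch. This is not the project's R code: the function `merged_name`, the regular expression, the 2-3 letter language code, and the choice to append the optional trailing segment after both language codes are all assumptions made for illustration. Titles that contain underscores can be ambiguous under this pattern.

```python
import re

# Assumed pattern: Title_LANG.epub or Title_LANG_extra.epub, where LANG is a
# 2-3 letter code. The real script.R may parse filenames differently.
NAME_RE = re.compile(
    r"^(?P<title>.+?)_(?P<lang>[A-Za-z]{2,3})(?:_(?P<extra>[^_]+))?\.epub$"
)


def merged_name(first: str, second: str) -> str:
    """Build an output name from two input names (hypothetical helper)."""
    a = NAME_RE.match(first)
    b = NAME_RE.match(second)
    if a is None or b is None:
        raise ValueError("expected names like Title_LangCode.epub")
    # Title comes from the first file, followed by both language codes.
    parts = [a["title"], a["lang"], b["lang"]]
    # Assumption: a trailing segment (e.g., a date) is kept at the end.
    if a["extra"]:
        parts.append(a["extra"])
    return "_".join(parts) + ".epub"


print(merged_name("MyBook_EN.epub", "MyBook_ES.epub"))
```

For the example filenames above, this prints `MyBook_EN_ES.epub`.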
- Extract -- Both ePub files are unzipped to reveal their internal XHTML chapter files (in `OEBPS/`).
- Parse -- Each `.xhtml` file is parsed into an XML DOM tree.
- Merge -- For every chapter, paragraphs (`<p>`) and headings (`<h1>` through `<h5>`) from the second language are inserted as siblings immediately after their corresponding elements in the first language. Duplicate `id` attributes are suffixed with `_2`.
- Reassemble -- The modified XHTML files are saved back, the directory structure from the first ePub is copied into a new folder, and the whole structure is re-compressed into a valid `.epub` file using `Rcompression::zip`.
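
The merge step can be sketched for a single chapter in Python, using only the standard library. This is an illustration, not the project's R implementation: the function `merge_chapter` is hypothetical, and pairing elements purely by their position in document order is an assumption about how "corresponding" elements are matched. Real XHTML that uses named entities such as `&nbsp;` may need a more forgiving parser than the one used here.

```python
import copy
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without a namespace prefix.
ET.register_namespace("", XHTML_NS)

# Fully qualified names of the elements that get interleaved.
TAGS = {f"{{{XHTML_NS}}}{name}" for name in ("p", "h1", "h2", "h3", "h4", "h5")}


def merge_chapter(first_xml: str, second_xml: str) -> str:
    """Insert each <p>/<h1>-<h5> of the second chapter after its counterpart."""
    first = ET.fromstring(first_xml)
    second = ET.fromstring(second_xml)

    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in first.iter() for child in parent}

    # Collect both lists before mutating the tree.
    targets = [el for el in first.iter() if el.tag in TAGS]
    sources = [el for el in second.iter() if el.tag in TAGS]

    # zip() stops at the shorter list, so extra elements stay untouched.
    for target, source in zip(targets, sources):
        clone = copy.deepcopy(source)
        # Suffix ids coming from the second language to avoid collisions.
        for el in clone.iter():
            if "id" in el.attrib:
                el.set("id", el.get("id") + "_2")
        parent = parent_of[target]
        parent.insert(list(parent).index(target) + 1, clone)

    return ET.tostring(first, encoding="unicode")
```

Because the pairing uses `zip`, a chapter with fewer elements in one language merges only the overlapping prefix, which mirrors the behavior described for mismatched counts.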
- Both ePubs must share the same internal structure: identical number of XHTML chapter files with matching filenames inside `OEBPS/`.
- Paragraph and heading counts should match across languages. When they differ, only the minimum overlapping count is merged; extra elements in the longer file are left untouched (not duplicated).
- Only `<p>` and `<h1>` through `<h5>` elements are merged. Other block-level elements (e.g., `<blockquote>`, `<div>`, `<ul>`) are not processed.
- The tool does not modify the ePub's OPF metadata (title, language, etc.).
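
Given these constraints, it can help to check two files before merging. The Python sketch below is one possible pre-flight check and is not part of the project: the helpers `chapter_counts` and `structure_report` are hypothetical, and it assumes chapters are stored as `OEBPS/*.xhtml` entries that parse as well-formed XML.

```python
import zipfile
import xml.etree.ElementTree as ET

MERGED = {"p", "h1", "h2", "h3", "h4", "h5"}


def chapter_counts(epub_path):
    """Map each OEBPS/*.xhtml entry to its number of mergeable elements."""
    counts = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.startswith("OEBPS/") and name.endswith(".xhtml"):
                root = ET.fromstring(zf.read(name))
                # Strip the "{namespace}" prefix before comparing tag names.
                counts[name] = sum(
                    1 for el in root.iter()
                    if el.tag.rsplit("}", 1)[-1] in MERGED
                )
    return counts


def structure_report(first_epub, second_epub):
    """Return (chapters present in only one file, chapters with uneven counts)."""
    a = chapter_counts(first_epub)
    b = chapter_counts(second_epub)
    only_in_one = sorted(set(a) ^ set(b))
    uneven = {
        name: (a[name], b[name])
        for name in sorted(set(a) & set(b))
        if a[name] != b[name]
    }
    return only_in_one, uneven
```

An empty list and an empty dict mean both files have matching chapter filenames and matching counts of mergeable elements.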
- AskePub — Telegram bot for ePub annotation with GPT-4
- epub-and-vtt-to-llm — Fine-tune LLMs from ePub and subtitle data
Contributions are welcome. Please see CONTRIBUTING.md for guidelines.
This project is licensed under the GNU Lesser General Public License v3.0.