A Python-based automation tool that validates and automatically repairs EPUB files to comply with the W3C EPUB 3.3 standard. It detects issues using EPUBCheck and intelligently fixes problems using modern LLMs (Claude, GPT, Gemini, Grok).
This toolkit prioritizes high repair accuracy over raw speed. It analyzes and fixes errors sequentially, and is designed to minimize disk I/O by performing in-memory edits.
- Full EPUBCheck integration: Uses the official W3C EPUBCheck tool for precise validation (including Usage-level messages with -u).
- Multi-LLM support: Choose between Anthropic Claude, OpenAI GPT, Google Gemini, and xAI Grok.
- Sequential error fixes: Processes detected issues one at a time to reduce hallucinations and increase accuracy.
- I/O optimization: Read-once, modify-in-memory, write-once per file to minimize disk operations.
- Rule customization: Define error-specific fix instructions and routing logic via a JSON guide.
- Intelligent routing: Automatically determine the actual file to modify when an error's reported location differs from the fix target (e.g., modify content.opf for missing resources).
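The read-once, modify-in-memory, write-once idea from the list above can be sketched with the standard library alone. This is a minimal illustration, not the toolkit's actual implementation: the function name `rewrite_epub_member` and the `transform` callback are hypothetical, and the sketch assumes the file being repaired is UTF-8 text.

```python
import io
import zipfile


def rewrite_epub_member(epub_bytes: bytes, member: str, transform) -> bytes:
    """Return new archive bytes with `member` replaced by transform(old_text).

    Hypothetical helper: the caller reads the EPUB from disk once, passes the
    bytes here, and writes the returned bytes back once.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        # infolist() preserves archive order, so `mimetype` stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                # Assumption: the member is UTF-8 encoded text (XHTML, OPF, ...).
                data = transform(data.decode("utf-8")).encode("utf-8")
            # The EPUB container format expects `mimetype` to be stored uncompressed.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=method)
    return out.getvalue()
```

All fixes for a file can be composed into a single `transform` call, so the archive is rebuilt only once per repaired file.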
- Python 3.8 or newer
- Java Runtime Environment (JRE) 8+ for running EPUBCheck
- Clone the repository:
  git clone https://github.com/jun-hyung-joon/eBook-Standardization-Toolkit.git
  cd eBook-Standardization-Toolkit
- Install Python dependencies:
  pip install -r requirements.txt
- Install external tools (EPUBCheck):
  python main.py --install-tools
Create a .env file in the project root and add your API keys. This file is ignored by Git for security.
# .env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=AIza...
XAI_API_KEY=xai-...
(Optional) Set the default AI provider:
DEFAULT_AI_MODEL=Gemini
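As an illustration of how these settings might be consumed, the sketch below picks a provider and its API key from the environment. The function `resolve_provider` and the `KEY_VARS` mapping are hypothetical names, and the sketch assumes the values from .env have already been loaded into the process environment; the toolkit's own loading logic may differ.

```python
import os

# Assumed mapping from provider name (as passed to --ai) to its .env variable.
KEY_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "gpt": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "grok": "XAI_API_KEY",
}


def resolve_provider(env=None, cli_choice=None):
    """Return (provider, api_key); an explicit choice wins over DEFAULT_AI_MODEL."""
    env = os.environ if env is None else env
    # Normalize case so that "Gemini" and "gemini" are treated the same.
    name = (cli_choice or env.get("DEFAULT_AI_MODEL", "")).lower()
    if name not in KEY_VARS:
        raise ValueError(f"unknown or unset AI provider: {name!r}")
    key = env.get(KEY_VARS[name])
    if not key:
        raise RuntimeError(f"{KEY_VARS[name]} is not set")
    return name, key
```

Passing a plain dictionary as `env` keeps the function easy to exercise without touching real credentials.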
Adjust default model names or token limits in this YAML file.
models:
  gemini:
    default: "gemini-2.5-flash"
    max_tokens: 8192
  claude:
    default: "claude-sonnet-4-20250514"
    max_tokens: 4096
Define how the AI should handle specific EPUBCheck error codes. The JSON controls:
- hints: detailed fix instructions per error code
- target_overrides: force routing to a different file (e.g., modify content.opf for certain errors)
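A rough sketch of how such a guide could drive routing is shown below. Only the two top-level keys, hints and target_overrides, come from the description above; the exact schema, the sample entry for RSC-008, and the function `plan_fix` are assumptions made for illustration.

```python
import json

# Hypothetical guide content; the real file's schema may differ.
GUIDE_JSON = """
{
  "hints": {
    "RSC-008": "Declare the referenced resource in the manifest."
  },
  "target_overrides": {
    "RSC-008": "content.opf"
  }
}
"""


def plan_fix(guide, error_code, reported_file):
    """Return (file_to_modify, hint_or_None) for one EPUBCheck message."""
    # An override, if present, replaces the location EPUBCheck reported.
    target = guide.get("target_overrides", {}).get(error_code, reported_file)
    hint = guide.get("hints", {}).get(error_code)
    return target, hint


guide = json.loads(GUIDE_JSON)
print(plan_fix(guide, "RSC-008", "OEBPS/ch1.xhtml"))
```

Error codes without an entry fall through to the reported file with no hint, so the guide only needs to list the exceptions.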
Basic conversion using the configured default AI model:
python main.py book.epub
Specify an AI provider (claude, gpt, gemini, grok):
python main.py book.epub --ai claude
Run validation only (no AI fixes):
python main.py book.epub --check-only
- -o, --output: specify output filename
- -v, --verbose: enable verbose logging (for debugging)
- -q, --quiet: minimal output
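For readers who want to see how the options above fit together, here is a minimal argparse sketch of an equivalent command-line interface. It is not taken from main.py: `build_parser` is a hypothetical name, and treating --verbose and --quiet as mutually exclusive is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Optional positional so that `--install-tools` works without an EPUB path.
    parser.add_argument("epub", nargs="?", help="path to the EPUB file")
    parser.add_argument("--ai", choices=["claude", "gpt", "gemini", "grok"],
                        help="AI provider to use for fixes")
    parser.add_argument("--check-only", action="store_true",
                        help="run validation only, no AI fixes")
    parser.add_argument("--install-tools", action="store_true",
                        help="install external tools (EPUBCheck)")
    parser.add_argument("-o", "--output", help="output filename")
    # Assumption: verbose and quiet output cannot be combined.
    noise = parser.add_mutually_exclusive_group()
    noise.add_argument("-v", "--verbose", action="store_true")
    noise.add_argument("-q", "--quiet", action="store_true")
    return parser


args = build_parser().parse_args(["book.epub", "--ai", "claude", "-o", "fixed.epub"])
print(args.epub, args.ai, args.output)
```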
This project is distributed under the MIT License. See the LICENSE file for details.
This project builds upon several outstanding tools and research efforts in the EPUB ecosystem:
- epubcheck: the official EPUB validation tool by W3C (used as the core validation engine)
This toolkit integrates with and requires:
- epubcheck (BSD 3-Clause License) - Official EPUB validation tool
This project is an independent automation toolkit and is not affiliated with, endorsed by, or sponsored by the W3C, the EPUBCheck project, or any LLM provider (OpenAI, Anthropic, Google, xAI).