A comprehensive viewer/editor web application for comparing PDF sources with TEI extraction and annotation results, specifically designed for creating gold standard datasets of TEI documents from legal and humanities literature.
- Dual-pane interface with synchronized PDF viewer and XML editor
- AI-powered extraction supporting multiple extraction engines (GROBID, etc.)
- Version management with branching, merging, and comparison tools
- Schema validation with automatic TEI compliance checking
- Access control with role-based permissions and collection management
- Collection organization for managing document sets
- WebDAV synchronization for external system integration
- Revision tracking with detailed change documentation
- Creating gold standard datasets for reference extraction research
- Manual validation and correction of AI-extracted bibliographic data
- Collaborative annotation of legal and humanities literature
- Training data preparation for machine learning models
- Quality assurance for large-scale digitization projects
This repository is part of the "Legal Theory Knowledge Graph" project at the Max Planck Institute of Legal History and Legal Theory.
Related repositories:
The fastest way to try PDF-TEI Editor is with our deployment script:
# Clone the repository
git clone https://github.com/mpilhlt/pdf-tei-editor.git
cd pdf-tei-editor
# Deploy demo container (builds and starts automatically)
npm run deploy .env.deploy.demo.localhost
Then visit: http://localhost:8080
- Login: admin / admin or demo / demo
Or use Docker directly:
docker run -p 8000:8000 -e APP_ADMIN_PASSWORD=admin123 cboulanger/pdf-tei-editor:latest
Visit: http://localhost:8000
- Login: admin / admin123
📖 For detailed setup: See the Docker Deployment Guide
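The fixed password in the Docker command above is convenient for a quick trial but easy to guess. The following is a minimal sketch of one alternative, assuming a POSIX-like shell with `head`, `od`, and `tr` available: it generates a random password and forwards it to the container through the environment. The helper names `random_password` and `run_editor` and the container name `pdf-tei-editor` are hypothetical and not part of this project; passing `-e VAR` without a value is standard Docker behaviour that copies the variable from the calling shell.

```shell
# Hypothetical helper: print a 32-character hexadecimal password
# built from 16 random bytes.
random_password() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: start the published image in the background with a
# freshly generated admin password, then print that password once.
run_editor() {
  APP_ADMIN_PASSWORD=$(random_password)
  export APP_ADMIN_PASSWORD
  # "-e APP_ADMIN_PASSWORD" (no value) forwards the exported variable.
  docker run -d --rm --name pdf-tei-editor \
    -p 8000:8000 \
    -e APP_ADMIN_PASSWORD \
    cboulanger/pdf-tei-editor:latest || return 1
  printf 'Admin password: %s\n' "$APP_ADMIN_PASSWORD"
}
```

Calling `run_editor` should start the container and print the generated password; log in as admin with that value instead of admin123.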
# Clone the repository
git clone https://github.com/mpilhlt/pdf-tei-editor.git
cd pdf-tei-editor
# Configure environment
cp .env.development .env
# Install dependencies
npm install
# Start development server
npm run start:dev
Visit: http://localhost:8000
📖 For complete setup instructions: See the Installation Guide
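The server may need a few seconds before it answers on http://localhost:8000. The following is a minimal sketch of a readiness check, assuming `curl` is installed. The function name `wait_for_url` and the default of 30 attempts are illustrative choices, not part of this project.

```shell
# Hypothetical helper: poll a URL until it answers with a successful
# HTTP status, or give up after a number of attempts (default 30).
wait_for_url() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (status >= 400).
    if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (run in a second terminal after starting the server):
#   wait_for_url http://localhost:8000 && echo "server is up"
```

The function returns 0 as soon as the URL responds and 1 if it never does, so it can gate later steps in a script, such as opening a browser or starting tests.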
- User Manual - How to use the application
- Getting Started - First-time user guide
- Interface Overview - Understanding the interface
- Workflows - Extraction, editing, and sync workflows
- Developer Documentation - Architecture and development guides
- Architecture Overview - System design and structure
- Testing Guide - Running and writing tests
- Installation Guide - Development environment setup
- CLI - Command line interface
- Code Assistant Documentation - Concise technical guides
- Backend: FastAPI (Python 3.13+), SQLite, lxml
- Frontend: ES6 modules, CodeMirror 6, PDF.js, Shoelace
- Testing: Playwright (E2E), pytest (backend), Node.js test runner (API)
See the project repository for license information.
Developers interested in contributing should:
- Read the Developer Documentation
- Follow Coding Standards
- Write tests following the Testing Guide
- Submit pull requests with proper documentation
Questions or issues? Check the documentation or open an issue.
