A viewer/editor web app to compare the PDF source and the TEI extraction/annotation result

mpilhlt/pdf-tei-editor

PDF-TEI Editor

A comprehensive viewer/editor web application for comparing PDF sources with TEI extraction and annotation results, specifically designed for creating gold standard datasets of TEI documents from legal and humanities literature.

Key Features

  • Dual-pane interface with synchronized PDF viewer and XML editor
  • AI-powered extraction supporting multiple extraction engines (GROBID, etc.)
  • Version management with branching, merging, and comparison tools
  • Schema validation with automatic TEI compliance checking
  • Access control with role-based permissions and collection management
  • Collection organization for managing document sets
  • WebDAV synchronization for external system integration
  • Revision tracking with detailed change documentation
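As a rough illustration of the most basic layer of the schema validation feature, the sketch below uses only Python's standard library to test whether a document is well-formed XML with a TEI root that contains a `teiHeader` and a `text` element. This is not the editor's own validation code, which this README does not show. The function name `check_tei_skeleton` and the sample document are hypothetical, and a real schema validation (for instance with lxml against a RelaxNG or XSD schema) would check far more than this.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI namespace


def check_tei_skeleton(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the skeleton looks fine.

    Hypothetical helper for illustration only, not part of PDF-TEI Editor.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    # The root element must be <TEI> in the TEI namespace.
    if root.tag != f"{{{TEI_NS}}}TEI":
        problems.append(f"unexpected root element: {root.tag}")
    # A TEI document needs a header and a text part as direct children.
    for child in ("teiHeader", "text"):
        if root.find(f"{{{TEI_NS}}}{child}") is None:
            problems.append(f"missing <{child}> element")
    return problems


# Made-up minimal document, used only to exercise the check.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc/></teiHeader>
  <text><body><p>Example</p></body></text>
</TEI>"""

print(check_tei_skeleton(sample))  # prints []
```

A check like this only catches gross structural problems. It says nothing about whether the content inside the header or body conforms to a particular TEI customization.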

Target Use Cases

  • Creating gold standard datasets for reference extraction research
  • Manual validation and correction of AI-extracted bibliographic data
  • Collaborative annotation of legal and humanities literature
  • Training data preparation for machine learning models
  • Quality assurance for large-scale digitization projects

About

This repository is part of the "Legal Theory Knowledge Graph" project at the Max Planck Institute of Legal History and Legal Theory.

Related repositories:

🚀 Quick Start

Try with Docker (Fastest)

The fastest way to try PDF-TEI Editor is with our deployment script:

# Clone the repository
git clone https://github.com/mpilhlt/pdf-tei-editor.git
cd pdf-tei-editor

# Deploy demo container (builds and starts automatically)
npm run deploy .env.deploy.demo.localhost

Then visit: http://localhost:8080

  • Login: admin / admin or demo / demo

Or use Docker directly:

docker run -p 8000:8000 -e APP_ADMIN_PASSWORD=admin123 cboulanger/pdf-tei-editor:latest

Visit: http://localhost:8000 - Login: admin / admin123
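A container can take a moment to start, so a script that launches it may want to wait until the web server answers before opening the browser or running tests. The following is a minimal sketch of such a readiness check using only the Python standard library. The helper name `wait_until_up` and its default timeouts are assumptions made for illustration, and it polls the plain root URL because this README does not document a dedicated health endpoint.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until the server sends any HTTP response or `timeout` expires.

    Hypothetical helper for illustration only, not part of PDF-TEI Editor.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status (e.g. 401 or 404).
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # not reachable yet, try again after a short pause
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the Docker command above, `wait_until_up("http://localhost:8000")` should return `True` once the application is serving requests, and `False` if nothing answers within the timeout.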

📖 For detailed setup: See the Docker Deployment Guide

Development Setup

# Clone the repository
git clone https://github.com/mpilhlt/pdf-tei-editor.git
cd pdf-tei-editor

# Configure environment
cp .env.development .env

# Install dependencies
npm install

# Start development server
npm run start:dev

Visit: http://localhost:8000

📖 For complete setup instructions: See the Installation Guide

Documentation

📚 Complete Documentation

For End Users

For Developers

General reference

For Code Assistants

Technology Stack

  • Backend: FastAPI (Python 3.13+), SQLite, lxml
  • Frontend: ES6 modules, CodeMirror 6, PDF.js, Shoelace
  • Testing: Playwright (E2E), pytest (backend), Node.js test runner (API)

License

See the project repository for license information.

Contributing

Developers interested in contributing should:

  1. Read the Developer Documentation
  2. Follow Coding Standards
  3. Write tests following the Testing Guide
  4. Submit pull requests with proper documentation

Questions or issues? Check the documentation or open an issue.
