OCR publication thread for early 2025
Do not close this issue until all checkboxes below are complete or have been rescheduled:
List of corpora:
In Processed OCR folder (needs chapter divisions, sentence splitting, and full automatic NLP processing, like the Bible corpora)
- Giron Legendes (11 docs)
  - chapter divisions added/checked
  - metadata updated

All documents above need to be moved to https://github.com/CopticScriptorium/auto-corpora
In GitDox
- apocalypse.paul (2)
  - [ ] corpus name needed
  - [ ] other metadata updated
  - possible error in the data: the translation on p. 1043 begins with folio 24a, but the OCR Coptic text begins in the middle of folio 6a (p. 533)
- pscyril.alexandria
  - On Mary: still in XML mode (auto-tagging?)
  - entities and identities
- pscyril.jerusalem
  - on the cross
    - needs corpus name
    - metadata updated
    - chapter & verse need to be updated in the spreadsheet based on open tags in the XML
  - on Mary
    - needs corpus name
    - metadata updated
    - chapter & verse need to be updated in the spreadsheet based on open tags in the XML
  - on the cross
- psepiphanius on Mary
  - translation span
  - entities and identities
  - metadata
- pschrysostom
  - translation span
  - entities and identities
  - metadata
- pscelestinus
- pstimothy.alex
- psote.psoi
- timothy.discourse