Early 2025 Corpora for release
Timeline
- REVIEW state by June 15
- version date 2025-08-15
- version n 6.2.0
High Priority
- Nag Hammadi Marcion (do not check off until this issue is complete)
  - G Philip
- Life of Shenoute (Bohairic)
  - @bkrawiec organizes & sends to @amir-zeldes
  - Part 2
    - metadata @ctschroeder
    - @amir-zeldes & @EarlyCodices NLP
    - @EarlyCodices translation and verses added
    - @EarlyCodices add segmentation/tagging metadata information
  - Part 3 @ctschroeder
    - metadata
    - @amir-zeldes & @EarlyCodices NLP
    - @EarlyCodices translation and verses added
    - @EarlyCodices add segmentation/tagging metadata information
  - Parts 4-6 for next round
- Bohairic Lausiac History
  - Part 2
    - metadata @ctschroeder
    - @amir-zeldes & @EarlyCodices NLP
    - @EarlyCodices translation and verses added
    - @EarlyCodices add segmentation/tagging metadata information
  - Part 3
    - metadata @ctschroeder
    - @amir-zeldes & @EarlyCodices NLP
    - @EarlyCodices translation and verses added
    - @EarlyCodices add segmentation/tagging metadata information
- Paul Dilley's G Thomas @PaulCDilley
- Sahidic Jonah (@stepcla) do not check off until this issue is complete
- NBFB (Arabic trans by @PhilippeZaher)
- maybe Abraham (Arabic trans by @PhilippeZaher)
  - review & validation by @ctschroeder
- Because of you too o prince of evil (Arabic trans & tagging corrections by @PhilippeZaher)
  - review & validation by @ctschroeder
OCR @ctschroeder
- OCR'd material in GitDox (requires sentence splitting, auto-NLP); see OCR google doc from June for details
  - Do not check off until checked off in the OCR checklist issue
  - apocalypse.paul (2)
  - Ps Cyril of Alexandria on Mary
  - ps Cyril of Jerusalem on the cross (3 docs)
  - ps Cyril of Jerusalem on Mary
  - ps John Chrysostom on Raphael
  - psepiphanius
  - pscelestinus
  - pstimothy.alex
  - psote.psoi
  - timothy.discourse
  - should be placed in https://github.com/CopticScriptorium/auto-corpora
-
Part 1
-
- fully automatic tagging of OCR data in https://github.com/CopticScriptorium/OCR/tree/main/Processed%20OCR
- Do not check off until checked off in the OCR checklist issue
- Giron
- should be placed in https://github.com/CopticScriptorium/auto-corpora
Other fully automatic corpora @amir-zeldes + ??
- Sahidic CoptOT corpus: moved to its own issue, New CoptOT release #129
- Life of Aphou (translations added by Aurelio and @PhilippeZaher)
  - Part 2 (Arabic trans ready, English in progress; if English not complete by 6/15 still publish with Arabic)
- Other Marcion? (full auto NLP)
  - Resurrection of Jesus Christ (13 k words)
    - metadata
    - check pb xml:id's and ed_page_n's
    - if time check bound groups
    - chapters
    - https://atlas.paths-erc.eu/manuscripts/191
Other Manually annotated corpora
- Life of Longinus and Lucius Part 1 (Arabic and English trans by Safaa and Randy)
- AP (new + Arabic translations) (do not check off until this thread is complete) @cluckmarq
  - entire AP corpus should be republished due to author change from anonymous to unknown
- A22 (do not check off until this thread is complete) @bkrawiec
- Crislip material @ctschroeder
- So Listen (do not check off until this thread is complete) @ctschroeder
- Tito's editions? (need conversion to Unicode, versification, metadata)
  - **needs annotator assigned**
  - Encomium on Raphael
  - Cyril ?
Threads we should check/close with this release
- Regularize POS tag for Christos (tagger-part-of-speech#6)
- lemmatization (tagger-part-of-speech#3)
- check "redundant" fields #96
- Update about page to reflect developments in 23 (CopticScriptorium.github.io#14)
- treebank: https://github.com/CopticScriptorium/treebank-dev/issues/2
- Arabic_translation vs arabic_translation #141
Questionable corpora
- Life of Phib (translations added by R Komforty)
- Jonah dialects beyond Sahidic
  - Bohairic (@EarlyCodices)
  - Akhmimic?
  - others?
- R Komforty's Life of Hilaria (Sahidic)??
- Sahidic Life of Antony (from HT, see June 2024 meeting notes)