Synthetic trained spaCy pipelines for Latin NLP. Developed by Patrick J. Burns.
LatinCy models are trained on large amounts of Latin data, including all five Latin Universal Dependency treebanks, and deliver strong performance across core NLP tasks:
- POS tagging: 97.41% accuracy
- Lemmatization: 94.66% accuracy
- Morphological tagging: 92.76% accuracy
| Model | Description |
|---|---|
| la_core_web_trf | Transformer pipeline |
| la_core_web_lg | Large pipeline with floret vectors |
| la_core_web_md | Medium pipeline with floret vectors |
| la_core_web_sm | Small pipeline |
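Once one of the model packages above is installed (e.g. via pip from its HuggingFace repository), it loads like any other spaCy pipeline. A minimal sketch, using the standard spaCy API; the guard around `spacy.util.is_package` is just so the snippet degrades gracefully when the model package is absent:

```python
import spacy

# la_core_web_sm is the smallest LatinCy pipeline; swap in any model
# name from the table above. This assumes the package is pip-installed.
MODEL = "la_core_web_sm"

if spacy.util.is_package(MODEL):
    nlp = spacy.load(MODEL)
    doc = nlp("Gallia est omnis divisa in partes tres.")
    # Each token carries the annotations the pipeline produces:
    # lemma, coarse POS tag, and morphological features.
    for token in doc:
        print(token.text, token.lemma_, token.pos_, token.morph)
else:
    print(f"{MODEL} is not installed; install it first, then rerun.")
```

The same pattern applies to the transformer (`_trf`), large, and medium models; only the package name changes.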
| Model | Description |
|---|---|
| grc_dep_web_trf | Transformer pipeline |
| grc_dep_web_lg | Large pipeline with floret vectors |
| grc_dep_web_md | Medium pipeline with floret vectors |
| grc_dep_web_sm | Small pipeline |
| Model | Framework |
|---|---|
| la_udpipe_latincy | UDPipe |
| la_stanza_latincy | Stanza |
| la_flair_latincy | Flair |
- Paper (arXiv)
- All models on HuggingFace
- LatinCy Dashboard — interactive demo
- `latincy-readers` — corpus readers for Latin text collections
```bibtex
@misc{burns_latincy_2023,
  title = {{LatinCy}: Synthetic Trained Pipelines for Latin {NLP}},
  author = {Burns, Patrick J.},
  url = {https://arxiv.org/abs/2305.04365v1},
  date = {2023-05-07},
}
```