
LatinCy

Models/tools/data for Latin NLP


Synthetic trained spaCy pipelines for Latin NLP. Developed by Patrick J. Burns.

LatinCy models are trained on large amounts of Latin data, including all five Latin Universal Dependency treebanks, and deliver strong performance across core NLP tasks:

  • POS tagging: 97.41% accuracy
  • Lemmatization: 94.66% accuracy
  • Morphological tagging: 92.76% accuracy

Models

Latin (spaCy)

Model            Description
la_core_web_trf  Transformer pipeline
la_core_web_lg   Large pipeline with floret vectors
la_core_web_md   Medium pipeline with floret vectors
la_core_web_sm   Small pipeline

Ancient Greek (spaCy)

Model            Description
grc_dep_web_trf  Transformer pipeline
grc_dep_web_lg   Large pipeline with floret vectors
grc_dep_web_md   Medium pipeline with floret vectors
grc_dep_web_sm   Small pipeline

Multi-framework

Model              Framework
la_udpipe_latincy  UDPipe
la_stanza_latincy  Stanza
la_flair_latincy   Flair


Citation

@misc{burns_latincy_2023,
    title = {{LatinCy}: Synthetic Trained Pipelines for Latin {NLP}},
    author = {Burns, Patrick J.},
    url = {https://arxiv.org/abs/2305.04365v1},
    date = {2023-05-07},
}

Pinned

  • latincy-readers: LatinCy-powered corpus readers for Latin text collections (Python)

