feat: add Danish corpus by JeppeKlitgaard · Pull Request #61 · Apsu/cmini

JeppeKlitgaard · 2024-10-19T23:18:40Z

This adds a Danish corpus computed from Wortschatz data using the analyser at https://github.com/JeppeKlitgaard/Corpora/ (see: https://github.com/JeppeKlitgaard/Corpora/blob/master/analyser/danish_recipe.json).

This contains the computed monograms, bigrams, trigrams, and words from 3 million sentences sourced from Wortschatz: 1 million each from news, web, and Wikipedia entries.
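For readers unfamiliar with these counts, here is a minimal sketch of how character n-grams and word frequencies could be tallied from a list of sentences. It is not the analyser linked above; the function names `count_ngrams` and `count_words`, the lowercasing step, and the whitespace tokenisation are all assumptions made for illustration.

```python
from collections import Counter


def count_ngrams(sentences, n):
    """Count character n-grams of length n across all sentences.

    Hypothetical helper: lowercasing and keeping spaces inside n-grams
    are assumptions, not necessarily what the real analyser does.
    """
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        # Slide a window of width n over the sentence.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def count_words(sentences):
    """Count whitespace-separated words (assumed tokenisation)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


sample = ["Hej hej verden"]
monograms = count_ngrams(sample, 1)
bigrams = count_ngrams(sample, 2)
trigrams = count_ngrams(sample, 3)
words = count_words(sample)
```

With the sample sentence, `words` maps `hej` to 2 and `verden` to 1, and `bigrams` counts `he` and `ej` twice each.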

The words.json file is quite large, so it can be either truncated or omitted.
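If truncation is preferred, one possible approach is to keep only the most frequent entries. The sketch below assumes words.json is a flat JSON object mapping each word to its count, which this PR does not confirm; `truncate_words` and the cutoff value are hypothetical.

```python
import json
from collections import Counter


def truncate_words(words, keep):
    """Return the `keep` most frequent entries of a word -> count mapping.

    Assumes a flat mapping of word to count; the actual layout of
    words.json may differ.
    """
    return dict(Counter(words).most_common(keep))


# Tiny in-memory example standing in for the real file contents.
words = {"og": 5, "i": 3, "sjælden": 1}
smaller = truncate_words(words, 2)

# ensure_ascii=False keeps Danish letters such as æ, ø, å readable.
serialized = json.dumps(smaller, ensure_ascii=False)
```

Applied to the real file, this would mean loading it with `json.load`, passing the result through `truncate_words`, and writing it back with `json.dump`.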

I can quickly generate these for other languages as well if there is interest.

Let me know if this is useful and whether I need to make changes to get this into cmini.

f5b7 · 2025-01-29T12:24:33Z

> I can quickly generate these for other languages as well if there is interest.

German, please. 🙏❤

feat: add Danish corpus

e7178ae

JeppeKlitgaard wants to merge 1 commit into Apsu:master from JeppeKlitgaard:master
