From 4c96765c1f24325e3c52c03512c6a3a9cdbd7042 Mon Sep 17 00:00:00 2001
From: whysage
Date: Mon, 18 Apr 2022 15:49:01 +0300
Subject: [PATCH] Issue: 1265 Desc: add information about Ukrainian language apostrophes

---
 docs/crawl-vectors.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/crawl-vectors.md b/docs/crawl-vectors.md
index 5a734861c..18ece4dc7 100644
--- a/docs/crawl-vectors.md
+++ b/docs/crawl-vectors.md
@@ -106,6 +106,8 @@ We used the [*Stanford word segmenter*](https://nlp.stanford.edu/software/segmen
 For languages using the Latin, Cyrillic, Hebrew or Greek scripts, we used the tokenizer from the [*Europarl*](http://www.statmt.org/europarl/) preprocessing tools.
 For the remaining languages, we used the ICU tokenizer.
 
+For Ukrainian language apostrophes are removed.
+
 More information about the training of these models can be found in the article [*Learning Word Vectors for 157 Languages*](https://arxiv.org/abs/1802.06893).
 
 ### License