newmm tokenization
Wannaphong Phatthiyaphaibun edited this page Dec 14, 2020
·
3 revisions
newmm is the code name for the "next maximal matching" engine in PyThaiNLP (it is not the real name of a word tokenizer engine). It is the default engine of pythainlp.word_tokenize. Currently, newmm is the onecut engine.
- multi_cut (PyThaiNLP 1.4 - 1.5): Thai word segmentation with maximum matching. The original source code is from Korakot Chaovavanich. It is now the mm engine in PyThaiNLP.
- onecut (PyThaiNLP 1.6 - now): Dictionary-based maximal matching word segmentation, constrained by Thai Character Cluster (TCC) boundaries. Created by Korakot Chaovavanich.
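The idea of dictionary matching constrained by cluster boundaries can be illustrated with a small sketch. This is not PyThaiNLP's implementation: the function name `segment`, the cost scheme, and the toy Latin-script dictionary are all assumptions made for illustration, and the allowed cut positions are supplied by hand rather than computed from real TCC rules. The sketch picks the segmentation with the fewest tokens, allows cuts only at the given positions, and falls back to a single unmatched cluster when no dictionary word fits.

```python
def segment(text, dictionary, boundaries=None):
    """Toy fewest-token segmentation (hypothetical helper, not PyThaiNLP code).

    boundaries: character offsets where a cut is allowed. They stand in
    for TCC boundaries; None means every offset is allowed.
    """
    n = len(text)
    if n == 0:
        return []
    if boundaries is None:
        cuts = list(range(n + 1))
    else:
        cuts = sorted({0, n} | {b for b in boundaries if 0 < b < n})

    # An unmatched cluster costs more than any possible number of tokens,
    # so dictionary words are always preferred when they fit.
    unknown_cost = n + 1
    best = {0: 0}  # cut position -> lowest cost found to reach it
    back = {}      # cut position -> previous cut on that cheapest path

    for i, start in enumerate(cuts[:-1]):
        for j in range(i + 1, len(cuts)):
            end = cuts[j]
            if text[start:end] in dictionary:
                cost = best[start] + 1
            elif j == i + 1:
                # Fallback: emit one unmatched cluster as its own token.
                cost = best[start] + unknown_cost
            else:
                continue
            if cost < best.get(end, float("inf")):
                best[end] = cost
                back[end] = start

    # Walk the back-pointers from the end of the text to recover tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


words = {"carpet", "sit", "car", "pets", "it"}

# Every offset allowed: the two-token split wins.
print(segment("carpetsit", words))

# Offset 6 forbidden (pretend "ts" is one indivisible cluster), so
# "carpet" can no longer be cut out and a different split is chosen.
print(segment("carpetsit", words, boundaries=[1, 2, 3, 4, 5, 7, 8]))
```

In the second call, removing offset 6 from the allowed cuts rules out any word that starts or ends inside the cluster. This is the role the description above gives to TCC boundaries in onecut.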