-- Remove old, obsolated, deprecated, and experimental code.
-- Thai2fit (Upgrade ULMFiT-related codes to fastai 1.0)
-- ThaiNER 1.0
-- Remove sentiment analysis
-- Improved word_tokenize (newmm, mm) and dict_word_tokenize
-- Improved POS-tagging
-- See examples in [Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb)
+- Improved `word_tokenize` ("newmm" and "mm" engine), a `custom_dict` dictionary can be provided
+- Improved `pos_tag` Part-Of-Speech tagging
+- New `NorvigSpellChecker` spell checker class, which can be initialized with custom dictionary.
+- New `thai2fit` (replacing `thai2vec`, upgrade ULMFiT-related code to fastai 1.0)
+- Updated ThaiNER to 1.0
+- You may need to [update your existing ThaiNER models from PyThaiNLP 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
+- Remove old, obsolated, deprecated, duplicated, and experimental code.
+- Sentiment analysis is no longer part of the library, but rather [a text classification example](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/sentiment_analysis.ipynb).
+- See more examples in [Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb)
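
To make the effect of supplying a custom dictionary concrete, here is a toy greedy longest-match tokenizer in plain Python. It is only a sketch of dictionary-based segmentation in general: it is not PyThaiNLP's `newmm` or `mm` engine, and the function name `longest_match_tokenize` and both word lists are made up for illustration.

```python
def longest_match_tokenize(text, words):
    """Greedy longest-match segmentation against a set of known words.

    Hypothetical helper for illustration only; not part of PyThaiNLP.
    Characters that start no known word are emitted one at a time.
    """
    max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in words:
                tokens.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


base_dict = {"ฉัน", "รัก", "ภาษา", "ไทย"}
custom_dict = base_dict | {"ภาษาไทย"}  # adding a compound changes the result

print(longest_match_tokenize("ฉันรักภาษาไทย", base_dict))
print(longest_match_tokenize("ฉันรักภาษาไทย", custom_dict))
```

With the base word list the compound is split into two tokens; once the compound itself is in the dictionary, the longest match wins and it is kept whole.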
README.md: 44 additions & 45 deletions
@@ -16,7 +16,7 @@ PyThaiNLP is a Python package for text processing and linguistic analysis, simil
 
 **This is a document for development branch (post 2.0). Things will break.**
 
-- The latest stable release is [2.0.3](https://github.com/PyThaiNLP/pythainlp/tree/master)
+- The latest stable release is [2.0.4](https://github.com/PyThaiNLP/pythainlp/tree/master)
 - PyThaiNLP 2 supports Python 3.6+. Some functions may work with older version of Python 3, but it is not well-tested and will not be supported. See [change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
 -[Upgrading from 1.7](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)
 -[Upgrade ThaiNER from 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
@@ -26,15 +26,15 @@ PyThaiNLP is a Python package for text processing and linguistic analysis, simil
 
 ## Capabilities
 
-- Convenient character and word classes, like Thai consonants (```pythainlp.thai_consonants```), vowels (```pythainlp.thai_vowels```), digits (```pythainlp.thai_digits```), and stop words (```pythainlp.corpus.thai_stopwords```) -- comparable to constants like ```string.letters```, ```string.digits```, and ```string.punctuation```
-- Thai word segmentation (```word_tokenize```), including subword segmentation based on Thai Character Cluster (```tcc```) and ETCC (```etcc```)
-- Thai romanization and transliteration (```romanize```, ```transliterate```)
-- Thai part-of-speech taggers (```pos_tag```)
-- Read out number to Thai words (```bahttext```, ```num_to_thaiword```)
-- Thai collation (sort by dictionoary order) (```collate```)
+- Convenient character and word classes, like Thai consonants (`pythainlp.thai_consonants`), vowels (`pythainlp.thai_vowels`), digits (`pythainlp.thai_digits`), and stop words (`pythainlp.corpus.thai_stopwords`) -- comparable to constants like `string.letters`, `string.digits`, and `string.punctuation`
+- Thai word segmentation (`word_tokenize`), including subword segmentation based on Thai Character Cluster (`subword_tokenize`)
+- Thai transliteration (`transliterate`)
+- Thai part-of-speech taggers (`pos_tag`)
+- Read out number to Thai words (`bahttext`, `num_to_thaiword`)
+- Thai collation (sort by dictionoary order) (`collate`)
+- Thai spelling suggestion and correction (`spell` and `correct`)
+- Thai soundex (`soundex`) with three engines (`lk82`, `udom83`, `metasound`)
 - Thai WordNet wrapper
 - and much more - see examples in [PyThaiNLP Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb).
 
@@ -62,20 +62,20 @@ For some advanced functionalities, like word vector, extra packages may be neede
 $ pip install pythainlp[extra1,extra2,...]
 ```
 
-where ```extras``` can be
--```artagger``` (to support artagger part-of-speech tagger)*
--```deepcut``` (to support deepcut machine-learnt tokenizer)
--```icu``` (for ICU support in transliteration and tokenization)
--```ipa``` (for International Phonetic Alphabet support in transliteration)
--```ml``` (to support fastai 1.0.22 ULMFiT models)
--```ner``` (for named-entity recognizer)
--```thai2fit``` (for Thai word vector)
--```thai2rom``` (for machine-learnt romanization)
--```full``` (install everything)
+where `extras` can be
+-`artagger` (to support artagger part-of-speech tagger)*
+-`deepcut` (to support deepcut machine-learnt tokenizer)
+-`icu` (for ICU, International Components for Unicode, support in transliteration and tokenization)
+-`ipa` (for IPA, International Phonetic Alphabet, support in transliteration)
+-`ml` (to support fastai 1.0.22 ULMFiT models)
+-`ner` (for named-entity recognizer)
+-`thai2fit` (for Thai word vector)
+-`thai2rom` (for machine-learnt romanization)
+-`full` (install everything)
 
-* Note: standard ```artagger``` package from PyPI will not work on Windows, please ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` instead.
+* Note: standard `artagger` package from PyPI will not work on Windows, please ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` instead.
 
-** see ```extras``` and ```extras_require``` in [```setup.py```](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) for package details.
+** see `extras` and `extras_require` in [`setup.py`](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) for package details.
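
As a small illustration of the `pip install pythainlp[extra1,extra2,...]` syntax shown in this hunk, the sketch below assembles the requirement string for a chosen set of extras. The helper `pip_requirement` and the `KNOWN_EXTRAS` set are hypothetical names; the extras themselves are copied from the list in the hunk above, and `setup.py` remains the authoritative source.

```python
# Extras named in the README hunk above; setup.py is the authoritative list.
KNOWN_EXTRAS = {
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
}


def pip_requirement(extras):
    """Return a requirement string such as 'pythainlp[icu,ner]'.

    Hypothetical helper for illustration; it only formats a string and
    does not install anything.
    """
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    if not extras:
        return "pythainlp"
    # Keep the caller's order but drop duplicates.
    unique = list(dict.fromkeys(extras))
    return "pythainlp[" + ",".join(unique) + "]"


print("pip install " + pip_requirement(["icu", "ner"]))
```

In most shells the square brackets should be quoted, for example `pip install "pythainlp[icu,ner]"`, so that they are not treated as a glob pattern.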