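Update README: use single backticks for inline code, document `custom_dict` and `soundex`, refresh tokenize API docs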

bact · bact · commit 92da9e017b01 · 2019-04-21T01:03:07.000+02:00
diff --git a/README-pypi.md b/README-pypi.md
@@ -11,8 +11,8 @@ PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, pa
 ## What's new in 2.0 ?
 
 - Terminate Python 2 support. Remove all Python 2 compatibility code.
-- Improved `word_tokenize` ("newmm" and "mm" engine) and `dict_word_tokenize`
-- Improved Part-Of-Speech tagging
+- Improved `word_tokenize` ("newmm" and "mm" engine), a `custom_dict` dictionary can be provided
+- Improved `pos_tag` Part-Of-Speech tagging
 - New `NorvigSpellChecker` spell checker class, which can be initialized with custom dictionary.
 - New `thai2fit` (replacing `thai2vec`, upgrade ULMFiT-related code to fastai 1.0)
 - Updated ThaiNER to 1.0
diff --git a/README.md b/README.md
@@ -26,15 +26,15 @@ PyThaiNLP is a Python package for text processing and linguistic analysis, simil
 
 ## Capabilities
 
-- Convenient character and word classes, like Thai consonants (```pythainlp.thai_consonants```), vowels (```pythainlp.thai_vowels```), digits (```pythainlp.thai_digits```), and stop words (```pythainlp.corpus.thai_stopwords```) -- comparable to constants like ```string.letters```, ```string.digits```, and ```string.punctuation```
-- Thai word segmentation (```word_tokenize```), including subword segmentation based on Thai Character Cluster (```subword_tokenize```)
-- Thai transliteration (```transliterate```)
-- Thai part-of-speech taggers (```pos_tag```)
-- Read out number to Thai words (```bahttext```, ```num_to_thaiword```)
-- Thai collation (sort by dictionoary order) (```collate```)
-- Thai-English keyboard misswitched fix (```eng_to_thai```, ```thai_to_eng```)
-- Thai spelling suggestion and correction (```spell``` and ```correct```)
-- Thai soundex (```lk82```, ```udom83```, ```metasound```)
+- Convenient character and word classes, like Thai consonants (`pythainlp.thai_consonants`), vowels (`pythainlp.thai_vowels`), digits (`pythainlp.thai_digits`), and stop words (`pythainlp.corpus.thai_stopwords`) -- comparable to constants like `string.letters`, `string.digits`, and `string.punctuation`
+- Thai word segmentation (`word_tokenize`), including subword segmentation based on Thai Character Cluster (`subword_tokenize`)
+- Thai transliteration (`transliterate`)
+- Thai part-of-speech taggers (`pos_tag`)
+- Read out number to Thai words (`bahttext`, `num_to_thaiword`)
+- Thai collation (sort by dictionoary order) (`collate`)
+- Thai-English keyboard misswitched fix (`eng_to_thai`, `thai_to_eng`)
+- Thai spelling suggestion and correction (`spell` and `correct`)
+- Thai soundex (`soundex`) with three engines (`lk82`, `udom83`, `metasound`)
 - Thai WordNet wrapper
 - and much more - see examples in [PyThaiNLP Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb).
 
@@ -62,20 +62,20 @@ For some advanced functionalities, like word vector, extra packages may be neede
 $ pip install pythainlp[extra1,extra2,...]
 ```
 
-where ```extras``` can be
-  - ```artagger``` (to support artagger part-of-speech tagger)*
-  - ```deepcut``` (to support deepcut machine-learnt tokenizer)
-  - ```icu``` (for ICU, International Components for Unicode, support in transliteration and tokenization)
-  - ```ipa``` (for IPA, International Phonetic Alphabet, support in transliteration)
-  - ```ml``` (to support fastai 1.0.22 ULMFiT models)
-  - ```ner``` (for named-entity recognizer)
-  - ```thai2fit``` (for Thai word vector)
-  - ```thai2rom``` (for machine-learnt romanization)
-  - ```full``` (install everything)
+where `extras` can be
+  - `artagger` (to support artagger part-of-speech tagger)*
+  - `deepcut` (to support deepcut machine-learnt tokenizer)
+  - `icu` (for ICU, International Components for Unicode, support in transliteration and tokenization)
+  - `ipa` (for IPA, International Phonetic Alphabet, support in transliteration)
+  - `ml` (to support fastai 1.0.22 ULMFiT models)
+  - `ner` (for named-entity recognizer)
+  - `thai2fit` (for Thai word vector)
+  - `thai2rom` (for machine-learnt romanization)
+  - `full` (install everything)
 
-* Note: standard ```artagger``` package from PyPI will not work on Windows, please ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` instead.
+* Note: standard `artagger` package from PyPI will not work on Windows, please ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` instead.
 
-** see ```extras``` and ```extras_require``` in [```setup.py```](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) for package details.
+** see `extras` and `extras_require` in [`setup.py`](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) for package details.
 
 ## Documentation
 
@@ -114,15 +114,15 @@ PyThaiNLP เป็นไลบารีภาษาไพทอนเพื่
 
 ## ความสามารถ
 
-- ชุดค่าคงที่ตัวอักษระและคำไทยที่เรียกใช้ได้สะดวก เช่น พยัญชนะ (```pythainlp.thai_consonants```), สระ (```pythainlp.thai_vowels```), ตัวเลขไทย (```pythainlp.thai_digits```), และ stop word (```pythainlp.corpus.thai_stopwords```) -- เหมือนกับค่าคงที่อย่าง ```string.letters```, ```string.digits```, และ ```string.punctuation```
-- ตัดคำภาษาไทย (```word_tokenize```) และรองรับการตัดระดับต่ำกว่าคำโดยใช้ Thai Character Clusters (```subword_tokenize```)
-- ถอดเสียงภาษาไทยเป็นอักษรละตินและสัทอักษร (```transliterate```)
-- ระบุชนิดคำ (part-of-speech) ภาษาไทย (```pos_tag```)
-- อ่านตัวเลขเป็นข้อความภาษาไทย (```bahttext```, ```num_to_thaiword```)
-- เรียงลำดับคำตามพจนานุกรมไทย (```collate```)
-- แก้ไขปัญหาการพิมพ์ลืมเปลี่ยนภาษา (```eng_to_thai```, ```thai_to_eng```)
-- ตรวจคำสะกดผิดในภาษาไทย (```spell```, ```correct```)
-- soundex ภาษาไทย (```lk82```, ```udom83```, ```metasound```)
+- ชุดค่าคงที่ตัวอักษระและคำไทยที่เรียกใช้ได้สะดวก เช่น พยัญชนะ (`pythainlp.thai_consonants`), สระ (`pythainlp.thai_vowels`), ตัวเลขไทย (`pythainlp.thai_digits`), และ stop word (`pythainlp.corpus.thai_stopwords`) -- เหมือนกับค่าคงที่อย่าง `string.letters`, `string.digits`, และ `string.punctuation`
+- ตัดคำภาษาไทย (`word_tokenize`) และรองรับการตัดระดับต่ำกว่าคำโดยใช้ Thai Character Clusters (`subword_tokenize`)
+- ถอดเสียงภาษาไทยเป็นอักษรละตินและสัทอักษร (`transliterate`)
+- ระบุชนิดคำ (part-of-speech) ภาษาไทย (`pos_tag`)
+- อ่านตัวเลขเป็นข้อความภาษาไทย (`bahttext`, `num_to_thaiword`)
+- เรียงลำดับคำตามพจนานุกรมไทย (`collate`)
+- แก้ไขปัญหาการพิมพ์ลืมเปลี่ยนภาษา (`eng_to_thai`, `thai_to_eng`)
+- ตรวจคำสะกดผิดในภาษาไทย (`spell`, `correct`)
+- soundex ภาษาไทย (`soundex`) 3 วิธีการ (`lk82`, `udom83`, `metasound`)
 - Thai WordNet wrapper
 - และอื่น ๆ ดูตัวอย่างได้ใน [PyThaiNLP Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb)
 
@@ -146,20 +146,20 @@ $ pip install https://github.com/PyThaiNLP/pythainlp/archive/dev.zip
 $ pip install pythainlp[extra1,extra2,...]
 ```
 
-โดยที่ ```extras``` คือ
-  - ```artagger``` (สำหรับตัวติดป้ายกำกับชนิดคำ artagger)*
-  - ```deepcut``` (สำหรับตัวตัดคำ deepcut)
-  - ```icu``` (สำหรับการถอดตัวสะกดเป็นสัทอักษรและการตัดคำด้วย ICU)
-  - ```ipa``` (สำหรับการถอดตัวสะกดเป็นสัทอักษรสากล (IPA))
-  - ```ml``` (สำหรับการรองรับโมเดล ULMFiT)
-  - ```ner``` (สำหรับการติดป้ายชื่อเฉพาะ (named-entity))
-  - ```thai2fit``` (สำหรับ word vector)
-  - ```thai2rom``` (สำหรับการถอดตัวสะกดเป็นอักษรละติน)
-  - ```full``` (ติดตั้งทุกอย่าง)
+โดยที่ `extras` คือ
+  - `artagger` (สำหรับตัวติดป้ายกำกับชนิดคำ artagger)*
+  - `deepcut` (สำหรับตัวตัดคำ deepcut)
+  - `icu` (สำหรับการถอดตัวสะกดเป็นสัทอักษรและการตัดคำด้วย ICU)
+  - `ipa` (สำหรับการถอดตัวสะกดเป็นสัทอักษรสากล (IPA))
+  - `ml` (สำหรับการรองรับโมเดล ULMFiT)
+  - `ner` (สำหรับการติดป้ายชื่อเฉพาะ (named-entity))
+  - `thai2fit` (สำหรับ word vector)
+  - `thai2rom` (สำหรับการถอดตัวสะกดเป็นอักษรละติน)
+  - `full` (ติดตั้งทุกอย่าง)
 
-* หมายเหตุ: แพคเกจ ```artagger``` มาตรฐานจาก PyPI อาจมีปัญหาการถอดรหัสข้อความบน Windows กรุณาติดตั้ง artagger รุ่นแก้ไขด้วยคำสั่ง ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` แทน ก่อนจะติดตั้ง PyThaiNLP
+* หมายเหตุ: แพคเกจ `artagger` มาตรฐานจาก PyPI อาจมีปัญหาการถอดรหัสข้อความบน Windows กรุณาติดตั้ง artagger รุ่นแก้ไขด้วยคำสั่ง ```pip install https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger``` แทน ก่อนจะติดตั้ง PyThaiNLP
 
-** นักพัฒนาสามารถดู ```extras``` และ ```extras_require``` ใน [```setup.py```](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) สำหรับรายละเอียดแพคเกจของเสริม
+** สามารถดู `extras` และ `extras_require` ใน [`setup.py`](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py) สำหรับรายละเอียดแพคเกจของเสริม
 
 ## เอกสารการใช้งาน
 
diff --git a/docs/api/tokenize.rst b/docs/api/tokenize.rst
@@ -8,10 +8,10 @@ The :class:`pythainlp.tokenize` contains multiple functions for tokenizing a chu
 Modules
 -------
 
+.. autofunction:: sent_tokenize
 .. autofunction:: word_tokenize
-.. autofunction:: dict_word_tokenize
+.. autofunction:: syllable_tokenize
 .. autofunction:: subword_tokenize
-.. autofunction:: sent_tokenize
 .. autofunction:: dict_trie
 .. autoclass:: Tokenizer
    :members: word_tokenize, set_tokenize_engine