langdetect_zh

Installation

$ pip install langdetect_zh

Supported Python versions: 2.7 and 3.4+.

Languages

langdetect_zh distinguishes 2 Chinese variants out of the box (ISO 639-1 language code plus a region suffix):

zh-cn (Simplified Chinese), zh-tw (Traditional Chinese)

Basic usage

To get the single most likely language code:

>>> from langdetect_zh import detect
>>> detect("这是一段中文文本")
'zh-cn'

To find out the probabilities for the top languages:

>>> from langdetect_zh import detect_langs
>>> detect_langs("这是一段中文文本")
[zh-cn:0.999997316441747]
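
When the top probability is low, it can be safer to fall back to a default than to trust the first entry. The following is a minimal sketch of that idea, not part of the package. It assumes each item returned by detect_langs exposes `lang` and `prob` attributes, as the upstream langdetect `Language` class does; `pick_language`, `Candidate`, and the sample values are hypothetical and used only so the sketch runs without the package installed.

```python
from collections import namedtuple

# Stand-in for the objects returned by detect_langs(). Upstream langdetect
# exposes `lang` and `prob`; this fork is ASSUMED to behave the same way.
Candidate = namedtuple("Candidate", ["lang", "prob"])


def pick_language(candidates, threshold=0.9, default=None):
    """Return the top language code if its probability reaches `threshold`.

    Falls back to `default` when the list is empty or the best
    candidate is not confident enough.
    """
    if not candidates:
        return default
    best = max(candidates, key=lambda c: c.prob)
    return best.lang if best.prob >= threshold else default


# Hypothetical values for illustration, not real library output.
confident = [Candidate("zh-cn", 0.99)]
ambiguous = [Candidate("zh-cn", 0.57), Candidate("zh-tw", 0.43)]

print(pick_language(confident))                # zh-cn
print(pick_language(ambiguous, default="zh"))  # zh
```

With the package installed, the same helper could be called as `pick_language(detect_langs(text))`, provided the attribute assumption above holds.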

NOTE

The language detection algorithm is non-deterministic: if you run it on text that is too short or too ambiguous, you may get different results each time you run it.

To enforce consistent results, run the following code before the first language detection:

from langdetect_zh import DetectorFactory
DetectorFactory.seed = 0
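
To check whether a given text is affected by this, one option is to call the detector several times and compare the answers. The sketch below is an illustration only: `is_stable`, `steady_detect`, and `alternating_detect` are hypothetical names, and the two stand-in detectors exist so the example runs without the package installed.

```python
import itertools


def is_stable(detect_fn, text, runs=10):
    """Return True if detect_fn(text) gives the same answer on every run."""
    results = {detect_fn(text) for _ in range(runs)}
    return len(results) == 1


# Hypothetical stand-ins for a detector (not the real library).
def steady_detect(text):
    # Always returns the same code, like a seeded detector would.
    return "zh-cn"


_flip = itertools.cycle(["zh-cn", "zh-tw"])


def alternating_detect(text):
    # Alternates between two codes to mimic an unstable result.
    return next(_flip)


print(is_stable(steady_detect, "这是一段中文文本"))       # True
print(is_stable(alternating_detect, "这是一段中文文本"))  # False
```

With the package installed, the real `detect` function can be passed as `detect_fn` after setting `DetectorFactory.seed = 0`.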

Original project

This package is a modified version of langdetect. The specific change is that, when the input is purely Chinese text, the result is further subdivided into Simplified Chinese and Traditional Chinese.

About

Google's langdetect modified for Chinese texts
