
mimno/PyMallet


PyMallet

This package provides tools for extracting latent semantic representations of text, particularly probabilistic topic models.

The implementation of LDA uses Gibbs sampling, which is simple but reliable. People often find the resulting models more useful than those produced by the stochastic variational algorithm used in Gensim.
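To illustrate the idea behind Gibbs sampling for LDA, here is a minimal pure-Python sketch of a collapsed Gibbs sampler. It is not the code in this package: the function name `gibbs_lda`, its parameters, and the default hyperparameters `alpha` and `beta` are assumptions chosen for illustration, and documents are assumed to be lists of integer word ids.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=100, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (hypothetical helper).

    docs is a list of documents, each a list of integer word ids
    in the range [0, vocab_size).
    """
    rng = random.Random(seed)

    # Count tables that the sampler keeps in sync with the assignments.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a uniformly random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic
                # given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Draw a new topic and add the token back to the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, assignments


# Toy corpus: three documents over a five-word vocabulary.
toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4], [0, 1, 3, 4]]
doc_topic, topic_word, assignments = gibbs_lda(
    toy_docs, num_topics=2, vocab_size=5, iterations=20
)
```

Each sweep resamples one token at a time from its conditional distribution, so the only state is a few count tables. That simplicity is what makes the per-token inner loop a natural target for Cython.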

To compile:

python setup.py build_ext --inplace

As an example, the sample_data directory contains 10,000 posts from the stats Stack Exchange forum.

To run on this sample collection with 50 topics:

python lda.py sample_data/stats_10k.txt 50

The script lda_reference.py contains a reference implementation in pure Python (no Cython) to compare speed. The Cython version is currently about 100x faster.
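One way to check the speed difference on your own machine is to time both scripts on the same input. The sketch below is a hedged example: `time_command` and `speedup` are hypothetical helpers, and the commented usage assumes that lda_reference.py accepts the same arguments as lda.py, which is not confirmed here.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Example usage, left commented out because it needs the compiled package
# and assumes both scripts take the same arguments (an assumption):
#
# fast = time_command([sys.executable, "lda.py", "sample_data/stats_10k.txt", "50"])
# slow = time_command([sys.executable, "lda_reference.py", "sample_data/stats_10k.txt", "50"])
# print(f"Cython speedup: {speedup(slow, fast):.1f}x")
```

Wall-clock timing of a whole run includes file loading and interpreter startup, so the measured ratio may differ from the speedup of the sampling loop alone.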
