🍅 tomoto - high performance topic modeling - for Ruby
Add this line to your application’s Gemfile:
gem "tomoto"

Train a model
model = Tomoto::LDA.new(k: 2)
model.add_doc(["tokens", "from", "document", "one"])
model.add_doc(["tokens", "from", "document", "two"])
model.add_doc(["tokens", "from", "document", "three"])
model.train(100) # iterations

Get the summary
model.summary

Get topic words
model.topic_words

Save the model to a file
model.save("model.bin")

Load the model from a file
model = Tomoto::LDA.load("model.bin")

Get topic probabilities for a document
doc = model.docs[0]
doc.topics

Get the number of words for each topic
model.count_by_topics

Get the vocab
model.vocabs

Get the log likelihood per word
model.ll_per_word

Perform inference for unseen documents
doc = model.make_doc(["unseen", "doc"])
topic_dist, ll = model.infer(doc)

Supports:
- Latent Dirichlet Allocation (LDA)
- Labeled LDA (LLDA)
- Partially Labeled LDA (PLDA)
- Supervised LDA (SLDA)
- Dirichlet Multinomial Regression (DMR)
- Generalized Dirichlet Multinomial Regression (GDMR)
- Hierarchical Dirichlet Process (HDP)
- Hierarchical LDA (HLDA)
- Multi Grain LDA (MGLDA)
- Pachinko Allocation (PA)
- Hierarchical PA (HPA)
- Correlated Topic Model (CT)
- Dynamic Topic Model (DT)
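
The calls earlier in this README take documents as arrays of tokens and return per-topic probabilities from inference. Here is a minimal sketch of the glue code on either side of the model. Both `tokenize` and `dominant_topic` are hypothetical helpers, not part of tomoto, and the sketch assumes the topic distribution is a plain array of per-topic probabilities (as in tomotopy), which is worth checking against your installed version.

```ruby
# Hypothetical helper (not part of tomoto): lowercase a string, split it into
# word tokens, and drop a small illustrative stopword list.
STOPWORDS = %w[a an and the of to in].freeze

def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/).reject { |token| STOPWORDS.include?(token) }
end

# Hypothetical helper: given an array of per-topic probabilities, return
# [topic_index, probability] for the most likely topic, or nil if it is empty.
def dominant_topic(topic_dist)
  return nil if topic_dist.empty?

  prob, index = topic_dist.each_with_index.max_by { |p, _i| p }
  [index, prob]
end

tokens = tokenize("The tokens from document one")
# With a model in hand, the tokens can be passed to the calls shown above:
#   model.add_doc(tokens)
#   topic_dist, ll = model.infer(model.make_doc(tokens))
#   index, prob = dominant_topic(topic_dist) # assumes topic_dist is an Array of Floats
```

Real corpora usually need more careful tokenization (stemming, a fuller stopword list, language-specific rules); the regular expression here only handles ASCII words.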
This library follows the tomotopy API. There are a few changes to make it more Ruby-like:
- The get_ prefix has been removed from methods (topic_words instead of get_topic_words)
- Methods that return booleans use ? instead of is_ (live_topic? instead of is_live_topic)
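
When porting code from tomotopy, the two renaming rules above can be applied mechanically. The sketch below is only an illustration: `ruby_method_name` is a hypothetical helper, not part of tomoto, and it does not check whether the resulting method is actually implemented.

```ruby
# Hypothetical helper (not part of tomoto): map a tomotopy method name to the
# Ruby-style name implied by the two rules above.
def ruby_method_name(python_name)
  name = python_name.to_s
  if name.start_with?("get_")
    name.delete_prefix("get_")        # get_topic_words -> topic_words
  elsif name.start_with?("is_")
    "#{name.delete_prefix('is_')}?"   # is_live_topic -> live_topic?
  else
    name                              # other names are left unchanged
  end
end

puts ruby_method_name("get_topic_words")
puts ruby_method_name("is_live_topic")
```

To confirm a translated name exists on a given model, something like `model.respond_to?(ruby_method_name("get_topic_words"))` can be used before calling it.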
If a method or option you need isn’t supported, feel free to open an issue.
tomoto uses AVX2, AVX, or SSE2 instructions to increase performance on machines that support them. Check which instruction set architecture it’s using with:
Tomoto.isa

Choose a parallelism algorithm with:
model.train(parallel: :partition)

Supported values are :default, :none, :copy_merge, and :partition.
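
If the parallelism mode comes from configuration rather than being hard-coded, it can help to validate it against the values listed above before training. This is a minimal sketch: `parallel_mode` and the `TOMOTO_PARALLEL` environment variable are hypothetical names introduced for illustration, and the commented `train` call assumes the iteration count and the `parallel:` option can be passed together.

```ruby
# The values listed above.
PARALLEL_MODES = %i[default none copy_merge partition].freeze

# Hypothetical helper (not part of tomoto): turn a string or symbol into one
# of the supported modes, raising ArgumentError for anything else.
def parallel_mode(value)
  mode = value.to_s.strip.downcase.to_sym
  return mode if PARALLEL_MODES.include?(mode)

  raise ArgumentError,
        "unknown parallel mode #{value.inspect}; expected one of #{PARALLEL_MODES.join(', ')}"
end

# TOMOTO_PARALLEL is a made-up variable name for this example.
mode = parallel_mode(ENV.fetch("TOMOTO_PARALLEL", "default"))
# model.train(100, parallel: mode) # assumption: both arguments can be combined
```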
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone --recursive https://github.com/ankane/tomoto-ruby.git
cd tomoto-ruby
bundle install
bundle exec rake compile
bundle exec rake test