documentation

I'm new to hdp. Your documentation is helpful, but I noticed some issues.

1.
'--data points to a file where each line is of the form (the LDA-C format):'

Here, --data should read data, without the leading --.
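
For reference, a minimal Python sketch of reading one line in the LDA-C format mentioned above. It assumes the usual layout "N term:count term:count ...", where N is the number of unique terms in the document. The name parse_ldac_line is hypothetical and is not part of hdp.

```python
def parse_ldac_line(line):
    """Parse one LDA-C line into a {term_id: count} dict.

    Assumed layout: "<N> <term>:<count> ..." with N unique terms.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    n_unique = int(fields[0])  # leading field: number of unique terms
    counts = {}
    for field in fields[1:]:
        term, count = field.split(":")
        counts[int(term)] = int(count)
    # Sanity check: the header should match the number of distinct terms read.
    if len(counts) != n_unique:
        raise ValueError(f"header says {n_unique} terms, found {len(counts)}")
    return counts
```

For example, parse_ldac_line("3 0:2 5:1 7:4") gives a dict mapping term 0 to 2, term 5 to 1, and term 7 to 4.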

2.
Regarding the documentation for inference: I am unable to run the command as shown, and the output files do not match those specified.

To avoid a core dump, I have to pass --model_prefix. I would change:
'hdp --test_data  data --directory my_model_dir '
to
'hdp --test_data data --directory my_model_dir --model_prefix my_model_dir/final'

The output files also differ from those the documentation lists:
test-*-topics.dat: the word counts for each topic, with each line as a topic
test-*-word-assignments.dat: print each word's assignment to the topic and the table, which is in R-friendly format.
test.log: various information to monitor the Markov chain.
test-*.bin: the binary model file used for inference on newer data.

I do not receive any of those output files.  Instead, I receive:
final-test.beta
final-test.doc.states
final-test.pi
final-test.topics
final-test.counts
final-test.info
final-test.log

What confuses me most, though, is that the program seems to create a new set of topics instead of using the topics and parameters found during the initial --train_data run. For example, the number of topics is often different from the number generated during the training phase. Perhaps new vocabulary found during the testing phase adds more topics? The vocabulary size reported in final-test.info is the same as in the training phase, even though my test input does contain words not seen during training.
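
One way to check whether unseen test words are involved is to count the term ids in the test file that fall outside the training vocabulary. The sketch below is only a diagnostic under stated assumptions: LDA-C input, 0-based integer term ids, and a training vocabulary size supplied by the caller. The name count_unseen_terms is hypothetical and is not part of hdp.

```python
def count_unseen_terms(lines, vocab_size):
    """Return (unseen_types, unseen_tokens) for an iterable of LDA-C lines.

    A term id counts as unseen when id >= vocab_size (assumes 0-based ids).
    """
    unseen_ids = set()
    unseen_tokens = 0
    for line in lines:
        fields = line.split()
        for field in fields[1:]:  # fields[0] is the unique-term count
            term, count = field.split(":")
            if int(term) >= vocab_size:
                unseen_ids.add(int(term))
                unseen_tokens += int(count)
    return len(unseen_ids), unseen_tokens


# Small in-memory sample; a real check would iterate over the test data file.
sample = ["2 0:1 5:3", "1 6:2", "2 5:1 1:1"]
unseen_types, unseen_tokens = count_unseen_terms(sample, vocab_size=5)
```

With this sample and a vocabulary size of 5, term ids 5 and 6 are flagged as unseen.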

