documentation #1

@sjneph

I'm new to hdp. Your documentation is helpful. I noticed some issues.

'--data points to a file where each line is of the form (the LDA-C format):'

Here, --data should read data, without the leading --.
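
For anyone checking their input against that format, here is a minimal sketch of reading one LDA-C line. It assumes the usual LDA-C layout, where each line is a count of unique terms followed by term:count pairs. The helper name parse_ldac_line is my own and is not part of hdp.

```python
def parse_ldac_line(line):
    """Parse one LDA-C line into a {term_id: count} dict.

    Assumed layout: "M term_1:count_1 ... term_M:count_M",
    where M is the number of unique terms in the document.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    n_terms = int(fields[0])
    counts = {}
    for field in fields[1:]:
        term, count = field.split(":")
        counts[int(term)] = int(count)
    # The leading number should match the number of distinct terms listed.
    if len(counts) != n_terms:
        raise ValueError(
            "header says %d terms, found %d" % (n_terms, len(counts))
        )
    return counts
```

Running this over every line of the training and test files is one way to see which term ids appear only in the test data.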

Regarding the documentation for inference, I am unable to run the command as shown, and the output files do not match those specified.

To avoid a core dump, I have to pass --model_prefix. I would change:
'hdp --test_data data --directory my_model_dir'
to
'hdp --test_data data --directory my_model_dir --model_prefix my_model_dir/final'

The output files are different from the documentation:
test-*-topics.dat: the word counts for each topic, with each line as a topic
test-*-word-assignments.dat: print each word's assignment to the topic and the table, which is in R-friendly format.
test.log: various information to monitor the Markov chain.
test-*.bin: the binary model file used for inference on newer data.

I do not receive any of those output files. Instead, I receive:
final-test.beta
final-test.doc.states
final-test.pi
final-test.topics
final-test.counts
final-test.info
final-test.log

What confuses me most, though, is that the program seems to create a new set of topics instead of using the topics and parameters found during the initial --train_data run. For example, the number of topics often differs from the number generated during the training phase. Perhaps new vocabulary found during the testing phase adds more topics? The vocabulary size reported in final-test.info is the same as in the training phase, even though my test input does contain words not found during training.
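
In case it helps reproduce this, here is a rough sketch of how the two topic counts can be compared. It assumes each .topics file holds one topic per line, as the documentation describes for the topics output. That layout and the helper names count_topics and compare_topic_counts are my assumptions, not something hdp provides.

```python
def count_topics(path):
    """Count non-empty lines in a topics file (assumed: one topic per line)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def compare_topic_counts(train_topics_path, test_topics_path):
    """Return (n_train, n_test, same) for the two topics files."""
    n_train = count_topics(train_topics_path)
    n_test = count_topics(test_topics_path)
    return n_train, n_test, n_train == n_test
```

For example, compare_topic_counts("my_model_dir/final.topics", "my_model_dir/final-test.topics"), where final.topics is only my guess at the training-phase file name. If the third value is False, the test run did not end up with the same number of topics as training.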
