fix random assignment of topic value (box size?) #8

@carlosparadis

Currently, topic values are assigned randomly, since neither the paper nor the original code clarifies where the toy dataset values came from:

topicflow/run.py

Lines 344 to 352 in 942457c

for i in range(len(month_list)):
    for j in range(10):
        tmp = {}
        name = str(i) + '_' + str(j)
        # how to calculate the value of a topic? the paper didn't define clearly
        # so here I use a random number
        value = np.random.randint(1,100)
        tmp['name'], tmp['value'] = name, value
        nodes.append(tmp)

I believe these values are mapped to the boxes associated with each topic in the visualization, but this should also be verified.

Given our work since then with other visualizations, it is safe to say the size of the boxes is associated with the number of documents deterministically assigned to the topics (via maximum likelihood or threshold).

If topic values represent box sizes, the code should be changed to use the correct document counts or, at minimum, a fixed size. Using random values is misleading.
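One possible replacement for the random value is sketched below. It assumes a per-month document-topic probability matrix is available; the names `doc_topic_by_month`, `topic_doc_counts`, `build_nodes`, and `threshold` are hypothetical and do not exist in `run.py`. It has not been checked against the visualization, so treat it as a starting point rather than the fix.

```python
import numpy as np


def topic_doc_counts(doc_topic, threshold=None):
    """Count documents per topic from a (n_docs, n_topics) probability matrix.

    Assumption: each row holds one document's topic probabilities.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    n_topics = doc_topic.shape[1]
    if threshold is None:
        # Maximum-likelihood assignment: each document counts toward
        # its single most probable topic.
        return np.bincount(doc_topic.argmax(axis=1), minlength=n_topics)
    # Threshold assignment: a document counts toward every topic whose
    # probability is at or above the threshold (possibly none, possibly several).
    return (doc_topic >= threshold).sum(axis=0)


def build_nodes(doc_topic_by_month, threshold=None):
    """Build node dicts named '<month index>_<topic index>' with count values."""
    nodes = []
    for i, doc_topic in enumerate(doc_topic_by_month):
        counts = topic_doc_counts(doc_topic, threshold)
        for j, count in enumerate(counts):
            # Same naming scheme as the existing loop in run.py.
            nodes.append({'name': str(i) + '_' + str(j), 'value': int(count)})
    return nodes
```

Calling `build_nodes` with one matrix per month yields the same `name`/`value` structure as the current loop, but with deterministic values. Passing `threshold` switches from maximum-likelihood to threshold assignment, in which case a topic's count can be zero and a document can count toward more than one topic.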
