forked from alisonmsmith/topicflow
Currently, topic values are assigned randomly, since neither the paper nor the original code clarifies where the toy dataset values came from:
Lines 344 to 352 in 942457c
    for i in range(len(month_list)):
        for j in range(10):
            tmp = {}
            name = str(i) + '_' + str(j)
            # how to calculate the value of a topic? the paper didn't define clearly
            # so here I use a random number
            value = np.random.randint(1,100)
            tmp['name'], tmp['value'] = name, value
            nodes.append(tmp)
I believe these values are mapped to the boxes associated with each topic in the visualization, but this should also be verified.
Given our work since then with other visualizations, it is safe to say the size of the boxes is associated with the number of documents deterministically assigned to the topics (via maximum likelihood or threshold).
If topic values represent box sizes, the code should be changed to compute the correct values, or at least to use a fixed size. Using random values is misleading.
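
One possible direction, as a rough sketch rather than a tested patch: count the documents assigned to each topic and use that count as the node value. This assumes a document-topic probability matrix is available for each time slice. The names `doc_topic`, `topic_document_counts`, and `threshold` are hypothetical and do not exist in the current code.

```python
import numpy as np

def topic_document_counts(doc_topic, threshold=None):
    """Count documents per topic from a (n_docs, n_topics) probability matrix.

    With threshold=None, each document goes to its single most likely topic.
    Otherwise a document counts toward every topic whose probability is
    at least `threshold`, so one document may be counted more than once.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    n_topics = doc_topic.shape[1]
    if threshold is None:
        assigned = doc_topic.argmax(axis=1)
        return np.bincount(assigned, minlength=n_topics)
    return (doc_topic >= threshold).sum(axis=0)

# Toy matrix for one time slice: 4 documents, 3 topics (made-up numbers).
doc_topic = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])

i = 0  # index of the time slice, as in the loop over month_list
counts = topic_document_counts(doc_topic)
nodes = [{'name': str(i) + '_' + str(j), 'value': int(c)}
         for j, c in enumerate(counts)]
```

If no document-topic matrix is available for the toy dataset, replacing `np.random.randint(1,100)` with a constant would at least avoid implying that the box sizes carry information.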