Dynamically build clusters #55

Draft
robojumper wants to merge 1 commit into Mijago:master from robojumper:dynamically_cluster


@robojumper

Instead of hardcoding the cluster count and centroids, this uses k-means clustering to build clusters from the selected items only.
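
For readers unfamiliar with the technique, here is a minimal sketch of a k-means loop (Lloyd's algorithm) over stat vectors. It is an illustration only, not the code from this PR: the names `Vec`, `kMeans`, `nearestIndex` and the `random` parameter are hypothetical, and it assumes each item is represented as a plain array of numbers.

```typescript
// Hypothetical names throughout; this is not the PR's implementation.
type Vec = number[];

function squaredDistance(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid closest to `point` (first one wins on ties).
function nearestIndex(point: Vec, centroids: Vec[]): number {
  let best = 0;
  let bestDist = Infinity;
  for (let i = 0; i < centroids.length; i++) {
    const d = squaredDistance(point, centroids[i]);
    if (d < bestDist) {
      bestDist = d;
      best = i;
    }
  }
  return best;
}

function kMeans(
  points: Vec[],
  k: number,
  maxIterations = 50,
  random: () => number = Math.random
): { centroids: Vec[]; assignments: number[] } {
  if (k < 1 || k > points.length) {
    throw new Error("k must be between 1 and points.length");
  }
  // Initialisation: choose k distinct input points via a partial shuffle.
  const indices = points.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(random() * (indices.length - i));
    const tmp = indices[i];
    indices[i] = indices[j];
    indices[j] = tmp;
  }
  let centroids: Vec[] = indices.slice(0, k).map((i) => [...points[i]]);
  let assignments: number[] = new Array<number>(points.length).fill(-1);
  const dims = points[0].length;

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach every point to its nearest centroid.
    const next = points.map((p) => nearestIndex(p, centroids));
    const changed = next.some((c, i) => c !== assignments[i]);
    assignments = next;
    if (!changed) break; // converged

    // Update step: move each centroid to the mean of its members.
    const sums = centroids.map(() => new Array<number>(dims).fill(0));
    const counts = new Array<number>(k).fill(0);
    points.forEach((p, i) => {
      const c = assignments[i];
      counts[c]++;
      for (let d = 0; d < dims; d++) sums[c][d] += p[d];
    });
    // An empty cluster keeps its previous centroid.
    centroids = centroids.map((old, c) =>
      counts[c] === 0 ? old : sums[c].map((s) => s / counts[c])
    );
  }
  return { centroids, assignments };
}
```

Because the centroids are derived from the selected items themselves, every cluster starts out anchored on a real item rather than on a pre-computed point that may have no nearby items.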

This does not include the necessary lockfile (package-lock.json) update because my npm installation insisted on lockfile format 2.


Please consider adding a license to this project.

@Mijago
Owner

Mijago commented Nov 20, 2021

Hi, did you test the accuracy, spread and significance of the generated clusters?
I pre-calculated the clusters using over 5000 armor pieces to gain the most accurate results (I could do the same now with 2 million armor pieces), hence the hard-coded clusters.

@robojumper robojumper marked this pull request as draft November 20, 2021 16:28
@robojumper
Author

Do you have a documented way to obtain these stats to compare? I suspect we have different goals for the clustering feature and the pre-generated clusters simply do something different from what I want to achieve. I'm fine if this doesn't end up getting merged and would love to know more details about what you want the clustering feature to be.

In any case, what prompted this change is that the items in my vault are heavily biased towards particular stats (and stat totals! The pre-generated clusters have mean totals of 57-63, while mine have 63-65), so a large number of centroids either contribute nothing or create clusters with very few items. For example, of the 72 legendary armor pieces on my warlock, the existing centroids produce 10 clusters with 0 items, 7 clusters with 1 item, and 2 clusters with 2 items. This means 85% of my armor (61/72) is assigned to 24% of the buckets (6/25). It tells me that 11 items are close to clusters of potentially existing armor that I categorically tend not to keep (but I kept them for a reason!), while it removes a lot of detail in the 6 crowded buckets and makes it difficult to find pieces similar enough that keeping both isn't worthwhile.
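
To make that kind of occupancy check repeatable, something like the following sketch could be used. It assigns each item to its nearest fixed centroid and counts the bucket sizes. The names `StatVector`, `closestCentroid`, `bucketSizes` and `countBucketsOfSize` are hypothetical, and plain Euclidean distance is an assumption; the existing code may measure distance differently.

```typescript
// Hypothetical helpers; distance metric is assumed to be Euclidean.
type StatVector = number[];

function closestCentroid(item: StatVector, centroids: StatVector[]): number {
  let best = 0;
  let bestDist = Infinity;
  centroids.forEach((c, idx) => {
    let dist = 0;
    for (let d = 0; d < item.length; d++) {
      dist += (item[d] - c[d]) ** 2;
    }
    if (dist < bestDist) {
      bestDist = dist;
      best = idx;
    }
  });
  return best;
}

// Number of items that land in each centroid's bucket.
function bucketSizes(items: StatVector[], centroids: StatVector[]): number[] {
  const sizes = new Array<number>(centroids.length).fill(0);
  for (const item of items) {
    sizes[closestCentroid(item, centroids)]++;
  }
  return sizes;
}

// How many buckets contain exactly `size` items.
function countBucketsOfSize(sizes: number[], size: number): number {
  return sizes.filter((s) => s === size).length;
}
```

With the numbers above, `countBucketsOfSize(sizes, 0)` would be 10, and the buckets holding more than 2 items would be the 6 that contain 61 of the 72 pieces.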

A weakness of this k-means clustering approach is the randomization: the clusters can differ substantially between runs. I don't know which algorithm the cluster pre-generation used or how you chose to deal with the randomness, if any.
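
One possible way to tame the run-to-run variation, sketched here under assumptions, is to seed the random choice of initial centroids so that the same selection of items always starts from the same points. This is not what the PR currently does. `mulberry32` is a small, commonly used 32-bit PRNG; `sampleIndices` and the idea of deriving a fixed seed are hypothetical illustrations.

```typescript
// mulberry32: a small 32-bit PRNG. The same seed yields the same sequence
// of numbers in [0, 1), unlike Math.random, which cannot be seeded.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick k distinct indices out of 0..n-1 with a partial Fisher-Yates shuffle.
// These indices would select the items used as initial centroids.
function sampleIndices(n: number, k: number, random: () => number): number[] {
  if (k < 0 || k > n) {
    throw new Error("k must be between 0 and n");
  }
  const indices = Array.from({ length: n }, (_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(random() * (n - i));
    const tmp = indices[i];
    indices[i] = indices[j];
    indices[j] = tmp;
  }
  return indices.slice(0, k);
}
```

Calling `sampleIndices(72, 25, mulberry32(42))` twice returns the same 25 indices, so repeated runs over an unchanged item list would start from identical centroids. Seeding only makes the result reproducible; it does not make a particular initialisation a good one, so running several seeds and keeping the tightest clustering is another option.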
