Skip to content

Conversation

@luav
Copy link

@luav luav commented Aug 4, 2019

Fix for #8 (ported to the latest gensim), output extended with the .mat format

@luav luav changed the title Ported to the latest gensim, output extended with the .mat format Ported to the latest gensim, line model dimentionality fixed, output format extended Aug 4, 2019
Copy link
Owner

@GTmac GTmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the effort!

Copy link
Owner

@GTmac GTmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

Copy link
Owner

@GTmac GTmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! I actually feel it would be better to set the default number of workers to a smaller value (for example, gensim word2vec uses 3: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py#L660). Sometimes it is not desirable to use up all the CPUs, especially when you are running a model on a shared server. Let me know your thoughts :-)

Copy link
Owner

@GTmac GTmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding this!

src/baseline.py Outdated
@@ -1,7 +1,10 @@
from gensim.models import Word2Vec
from gensim.models.word2vec import Word2Vec
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could you remove that extra space? Thanks!

Copy link
Owner

@GTmac GTmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Copy link
Owner

@GTmac GTmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Copy link
Owner

@GTmac GTmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is touching the model part, so have you tested this change in terms of classification performance? Do you still get similar classification F1 score compared to the numbers in the paper / README file? Thanks!

@luav
Copy link
Author

luav commented Aug 9, 2019

set the default number of workers to a smaller value (for example, gensim word2vec uses 3

Up to you, in all other functions I set the default number of workers to at least half of the available logical cpus: int(cpu_num() / 2) + 1. Any hardcoded value is not desirable, because the host might have 1 or 2 cores then a harcoded value 3 may affect the performance unlike the value dependent on cpu_num.

This commit is touching the model part, so have you tested this change in terms of classification

I have not changed any model parameters or internals except adaptation to the updated gensim API. I did perform training and evaluation of the harp + deepwalk / line embeddings on my datasets and they look fine.

@luav
Copy link
Author

luav commented Aug 9, 2019

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

In the official deepwalk, the walks persistence expects text and not numbers (I made a pull request to the official repository). I'm not sure whether the numerical values necessity there was caused by some bugs that occurred and fixed during Harp porting to the updated Deepwalk and gensim, or the numeric walk items are required by Harp from Deepwalk. Anyway, Harp works fine with the extended version (accepting numerical walk items) of Deepwalk in my repository but I have not tested whether it works with the official repository without that extension.

@luav
Copy link
Author

luav commented Aug 9, 2019

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

In the official deepwalk, the walks persistence expects text and not numbers (I made a pull request to the official repository). I'm not sure whether the numerical values necessity there was caused by some bugs that occurred and fixed during Harp porting to the updated Deepwalk and gensim, or the numeric walk items are required by Harp from Deepwalk. Anyway, Harp works fine with the extended version (accepting numerical walk items) of Deepwalk in my repository but I have not tested whether it works with the official repository without that extension.

I just verified, the original latest Deepwalk lacks support of the numerical walk items to work with HARP, so the specified repository should be used until this Deepwalk pull request is merged.

@luav
Copy link
Author

luav commented Aug 9, 2019

set the default number of workers to a smaller value (for example, gensim word2vec uses 3

Up to you, in all other functions I set the default number of workers to at least half of the available logical cpus: int(cpu_num() / 2) + 1. Any hardcoded value is not desirable, because the host might have 1 or 2 cores then a harcoded value 3 may affect the performance unlike the value dependent on cpu_num.

Workers number is set to 1 by default and to cpu_num for the specified workers = -1 since
71c490c.

@luav luav closed this Aug 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants