Reduce memory usage on loading embedding from txt by yeyinthtoon · Pull Request #191 · facebookresearch/MUSE

yeyinthtoon · 2021-11-26T19:39:50Z

The original implementation of read_txt_embeddings uses a lot of memory. For example, to load an embedding txt file with a vocab size of 2,000,000 and an embedding dimension of 300, the vectors list takes 64 × 300 × 2,000,000 bits = 4.8 GB, np.concatenate takes another 4.8 GB, and torch.from_numpy takes 2.4 GB, for a total of around 12 GB. By knowing vocab_size in advance and setting the dtype of the vectors to np.float32, the memory requirement can be reduced to around 2.4 GB instead of 12 GB.
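The idea described above can be sketched roughly as follows. This is not the diff from this pull request, only a minimal illustration under stated assumptions: the file is assumed to start with a header line of the form "vocab_size dim" (word2vec/fastText text format), and the names load_txt_embeddings and embedding_gigabytes are hypothetical helpers introduced here for illustration.

```python
import io

import numpy as np


def embedding_gigabytes(n_words, dim, itemsize):
    """Size in GB (10**9 bytes) of a dense n_words x dim array."""
    return n_words * dim * itemsize / 1e9


def load_txt_embeddings(path, max_vocab):
    """Hypothetical loader: fill one preallocated float32 array row by row."""
    with io.open(path, "r", encoding="utf-8", newline="\n", errors="ignore") as f:
        # Assumption: the first line is a header "<vocab_size> <dim>".
        n_words, dim = (int(x) for x in f.readline().split())
        n_rows = min(n_words, max_vocab)
        # Allocate the final buffer once, already in float32, instead of
        # building a list of float64 rows and concatenating them later.
        vectors = np.empty((n_rows, dim), dtype=np.float32)
        words = []
        for line in f:
            if len(words) >= n_rows:
                break
            if not line.strip():
                continue  # skip blank lines
            word, vect = line.rstrip().split(" ", 1)
            vectors[len(words)] = [float(x) for x in vect.split()]
            words.append(word)
    # If the file held fewer rows than expected, return only the filled
    # rows (a view on the buffer, so no extra copy is made).
    return words, vectors[: len(words)]
```

With the numbers from the description, embedding_gigabytes(2_000_000, 300, 8) gives 4.8 for float64 and embedding_gigabytes(2_000_000, 300, 4) gives 2.4 for float32. Because the returned array is already float32, torch.from_numpy can wrap it while sharing its memory, so no further copy is needed for the conversion to a tensor.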

facebook-github-bot · 2021-11-26T19:39:54Z

Hi @yeyinthtoon!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot · 2021-11-26T20:13:52Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

yeyinthtoon added 2 commits November 27, 2021 00:03

fix read_txt_embeddings utility function to use less memory

92f4c45

fix bug when vocab size of txt file is smaller than max_vocab parameter

1acc851

facebook-github-bot added the CLA Signed label Nov 26, 2021


yeyinthtoon wants to merge 2 commits into facebookresearch:main from yeyinthtoon:reduce_memory_usage_on_loading_embedding_from_txt
