After the weight refactor, some tie weight might not have gone as expected. If the lm head was removed, then we are kinda fucked when using `from_pretrained`. #42362

ArthurZucker · 2025-11-24T14:44:50Z

What does this PR do?

Tie weights

from transformers import AutoModel, MvpModel, MvpForConditionalGeneration
from transformers import AutoformerModel

MvpForConditionalGeneration._tied_weights_keys = {"model.shared.weight":"lm_head.weight", "model.decoder.embed_tokens.weight": "model.shared.weight", "model.encoder.embed_tokens.weight":"lm_head.weight"}
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

MvpModel._tied_weights_keys = {"model.shared.weight":"lm_head.weight", "model.decoder.embed_tokens.weight": "model.shared.weight", "model.encoder.embed_tokens.weight":"lm_head.weight"}
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

There are a few issues. I think we are going to put the tied weight keys into a weight converter, so that the weights are properly copied even if the source key is unexpected?
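To make the idea concrete, here is a rough sketch of what "copy tied weights even if the source key is missing or unexpected" could look like. It is not the transformers implementation: `resolve_tied_keys` is a hypothetical helper, a plain dict stands in for the checkpoint state dict, and it assumes the tie map is `{target: source}` like the `_tied_weights_keys` dicts above. Ties are treated as links that can be followed in either direction, so a checkpoint that only kept `model.shared.weight` still fills in `lm_head.weight`.

```python
# Hypothetical helper, NOT part of transformers: resolves a {target: source}
# tie map against a plain dict that stands in for a checkpoint state dict.
def resolve_tied_keys(state_dict, tied_keys):
    resolved = dict(state_dict)
    changed = True
    # Repeat until nothing new can be filled in, so chains of ties resolve.
    while changed:
        changed = False
        for target, source in tied_keys.items():
            # Try both directions: the nominal source (e.g. lm_head.weight)
            # may be the key that was dropped when the checkpoint was saved.
            for dst, src in ((target, source), (source, target)):
                if dst not in resolved and src in resolved:
                    resolved[dst] = resolved[src]  # share the same object
                    changed = True
    return resolved


# Tie map copied from the snippet above; the checkpoint content is made up.
tied = {
    "model.shared.weight": "lm_head.weight",
    "model.decoder.embed_tokens.weight": "model.shared.weight",
    "model.encoder.embed_tokens.weight": "lm_head.weight",
}
checkpoint = {"model.shared.weight": [0.1, 0.2, 0.3]}  # lm_head was removed
loaded = resolve_tied_keys(checkpoint, tied)
print(sorted(loaded))
```

With real tensors the same assignment would share storage rather than copy, which is what tying is meant to do. Whether the real converter should follow ties in both directions is exactly the open question here.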

from transformers import UMT5EncoderModel

model = UMT5EncoderModel.from_pretrained("google/umt5-xxl")

fix

47c3028

ArthurZucker changed the title ~~fix~~ After the weight refactor, some tie weight might not have gone as expected. If the lm head was removed, then we are kinda fucked when using from_pretrained. Nov 24, 2025

remi-or mentioned this pull request Nov 24, 2025

Many small fixes for the CI #42364

Merged

ebezzam mentioned this pull request Nov 25, 2025

Remove unnecessary tied weights for Seamless M4T? #42377

Draft
