
Conversation

@ArthurZucker (Collaborator) commented Nov 24, 2025

What does this PR do?

Tie weights

from transformers import MvpModel, MvpForConditionalGeneration

# Override the tied-weights mapping on the head class, then load the checkpoint.
MvpForConditionalGeneration._tied_weights_keys = {"model.shared.weight": "lm_head.weight", "model.decoder.embed_tokens.weight": "model.shared.weight", "model.encoder.embed_tokens.weight": "lm_head.weight"}
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

# Same mapping set on the base class instead (keys still carry the "model." prefix),
# to exercise the case where the mapping does not match the class it is set on.
MvpModel._tied_weights_keys = {"model.shared.weight": "lm_head.weight", "model.decoder.embed_tokens.weight": "model.shared.weight", "model.encoder.embed_tokens.weight": "lm_head.weight"}
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")
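
One way to confirm the tie actually took effect after loading (a minimal sketch, assuming the Mvp attribute layout mirrors BART, i.e. model.shared, model.encoder/decoder.embed_tokens and lm_head): tied parameters share storage, so their data pointers should compare equal.

# Sanity check on the model loaded above: tied tensors must share storage.
assert model.lm_head.weight.data_ptr() == model.model.shared.weight.data_ptr()
assert model.model.encoder.embed_tokens.weight.data_ptr() == model.model.shared.weight.data_ptr()
assert model.model.decoder.embed_tokens.weight.data_ptr() == model.model.shared.weight.data_ptr()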

There are a few issues. I think we are going to put the tied-weight keys into a weight converter, so that they are properly copied even when the source key is unexpected.
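
A rough sketch of what that converter step could look like (a hypothetical helper, not the actual transformers API, and the direction of the _tied_weights_keys mapping is an assumption here): for each tie, materialize whichever key is missing from the checkpoint using the one that is present, even if that key is otherwise unexpected for the target class.

import torch
from typing import Dict

def apply_tied_weight_keys(state_dict: Dict[str, torch.Tensor], tied_weights_keys: Dict[str, str]) -> Dict[str, torch.Tensor]:
    # Hypothetical converter pass: copy the tensor that exists in the checkpoint onto the
    # tied key that is missing, so from_pretrained never leaves a tied parameter randomly
    # initialized just because its twin was pruned from the state dict.
    for target, source in tied_weights_keys.items():
        if target not in state_dict and source in state_dict:
            state_dict[target] = state_dict[source]
        elif source not in state_dict and target in state_dict:
            state_dict[source] = state_dict[target]
    return state_dict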

from transformers import UMT5EncoderModel

model = UMT5EncoderModel.from_pretrained('google/umt5-xxl')
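
One way to surface the encoder-only failure mode (a sketch, reusing the import above, and assuming the checkpoint stores the embedding under a tied key that UMT5EncoderModel no longer declares): ask from_pretrained for the loading report and check whether the shared embedding shows up as missing while its tied twin shows up as unexpected.

model, loading_info = UMT5EncoderModel.from_pretrained('google/umt5-xxl', output_loading_info=True)
print(loading_info["missing_keys"])     # the shared embedding should NOT appear here
print(loading_info["unexpected_keys"])  # e.g. a removed lm_head-style key may land here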

@ArthurZucker changed the title from "fix" to "After the weight refactor, some tie weight might not have gone as expected. If the lm head was removed, then we are kinda fucked when using from_pretrained." on Nov 24, 2025