
Conversation

@martin-marek

Qwen3 0.6B, 1.7B, and 4B use tied embeddings. This PR updates model.py to support tied embeddings. It also fixes a bug in chkpt_utils.py: for whatever reason, the checkpoints of some tied-embedding Qwen3 models store both lm_head and model.embed_tokens even though these are identical tensors (the embeddings are tied). I added a check for this case: when both tensors are present and identical, the redundant lm_head tensor is simply deleted from the checkpoint.
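
A minimal sketch of the checkpoint check (not the actual chkpt_utils.py code; the helper name and the use of a plain PyTorch state dict are assumptions for illustration):

```python
import torch

def dedupe_tied_lm_head(state_dict: dict) -> dict:
    """Hypothetical helper: drop a redundant lm_head tensor from a
    tied-embedding checkpoint so only the embedding table remains."""
    lm_head = state_dict.get("lm_head.weight")
    embed = state_dict.get("model.embed_tokens.weight")
    # Some tied-embedding Qwen3 checkpoints store both tensors even though
    # they are identical; keep only model.embed_tokens in that case.
    if lm_head is not None and embed is not None and torch.equal(lm_head, embed):
        del state_dict["lm_head.weight"]
    return state_dict
```

With tied embeddings, the model then reuses the embedding matrix as the output projection (logits = hidden states @ embed_tokens.T) instead of keeping a separate lm_head weight.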
