Conversation

@molbap
Contributor

molbap commented Nov 25, 2025

What does this PR do?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: mistral

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: mistral3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral3"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • mistral3:
    tests/models/mistral3/test_modeling_mistral3.py::Mistral3ModelTest::test_config

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: mistral, mistral3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral", "models/mistral3"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • mistral3:
    tests/models/mistral3/test_modeling_mistral3.py::Mistral3ModelTest::test_config

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/olmoe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe, mistral, mistral3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral", "models/mistral3", "models/olmoe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral", "models/mistral3", "models/olmoe", "models/phi3"]
quantizations: []

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder, bigbird, bigbird_pegasus

@github-actions
Contributor

CI Results

Workflow Run ⚙️

⚠️ No test being reported (jobs are skipped or cancelled)!

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder, bigbird, bigbird_pegasus, whisper, internvl, llava, llava_next, llava_next_video, qwen, fsmt

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/bigbird_pegasus", "models/fsmt", "models/internvl", "models/llava", "models/llava_next", "models/llava_next_video", "models/mistral", "models/mistral3", "models/olmoe", "models/phi3", "models/whisper"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap
Contributor Author

molbap commented Nov 26, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder, bigbird, bigbird_pegasus, whisper, internvl, llava, llava_next, llava_next_video, qwen, fsmt, video_llava, deepseek_v3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/bigbird_pegasus", "models/deepseek_v3", "models/fsmt", "models/internvl", "models/llava", "models/llava_next", "models/llava_next_video", "models/mistral", "models/mistral3", "models/olmoe", "models/phi3", "models/video_llava", "models/whisper"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt

@molbap
Contributor Author

molbap commented Nov 26, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder, bigbird, bigbird_pegasus, whisper, internvl, llava, llava_next, llava_next_video, qwen, fsmt, video_llava, deepseek_v3, qwen3_vl_moe

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/bigbird_pegasus", "models/deepseek_v3", "models/fsmt", "models/internvl", "models/llava", "models/llava_next", "models/llava_next_video", "models/mistral", "models/mistral3", "models/olmoe", "models/phi3", "models/qwen3_vl_moe", "models/video_llava", "models/whisper"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap mentioned this pull request Nov 27, 2025

@molbap changed the title from "Various fixes" to "Fix weight tying logic between _tied_weights_keys and tie_word_embeddings" Nov 27, 2025

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt

@molbap
Contributor Author

molbap commented Nov 27, 2025

I removed the fallback to the parent config for the case where tie_word_embeddings is not found in the text config. I also attempted to add a test; it might be a bit overkill, let's see.

@molbap
Contributor Author

molbap commented Nov 27, 2025

A remaining head-scratcher is tie_encoder_decoder, which is now redundant. I added a fallback that looks like:

should_tie = tie_encoder_decoder if tie_word_embeddings is None else tie_word_embeddings

It seems to work and keeps the key as the source of authority. LMK!
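
For illustration, here is a minimal standalone sketch of that fallback. The helper name `resolve_should_tie` and its signature are hypothetical and are not the code in this PR; the sketch only assumes that `tie_word_embeddings` may be `None` when unset and that `tie_encoder_decoder` is a plain boolean.

```python
from typing import Optional


def resolve_should_tie(
    tie_word_embeddings: Optional[bool],
    tie_encoder_decoder: bool = False,
) -> bool:
    """Hypothetical helper: decide whether weights should be tied.

    tie_word_embeddings is authoritative whenever it is set (True or False).
    tie_encoder_decoder is consulted only when tie_word_embeddings is None.
    """
    if tie_word_embeddings is None:
        return bool(tie_encoder_decoder)
    return bool(tie_word_embeddings)


# An explicit tie_word_embeddings wins, even when it is False.
print(resolve_should_tie(False, tie_encoder_decoder=True))  # False
# The legacy flag is used only when tie_word_embeddings is unset.
print(resolve_should_tie(None, tie_encoder_decoder=True))   # True
print(resolve_should_tie(True, tie_encoder_decoder=False))  # True
```

The point of testing `is None` rather than truthiness is that an explicit `tie_word_embeddings=False` is not overridden by `tie_encoder_decoder=True`.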

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt, kyutai_speech_to_text, musicgen, musicgen_melody
