
@Rocketknight1
Member

The `test_load_balancing_loss` test is intermittently flaky for me. This PR sets a manual seed to make it reproducible, which hopefully solves the issue!

Rocketknight1 marked this pull request as ready for review on November 25, 2025, 16:35
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: ernie4_5_moe, jamba, minimax, mixtral, qwen2_moe, qwen3_moe

@Rocketknight1
Member Author

cc @ydshieh

@Rocketknight1
Member Author

run-slow: ernie4_5_moe, jamba, minimax, mixtral, qwen2_moe, qwen3_moe

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/ernie4_5_moe", "models/jamba", "models/minimax", "models/mixtral", "models/qwen2_moe", "models/qwen3_moe"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@ydshieh
Collaborator

ydshieh commented Nov 28, 2025

Hi @Rocketknight1

Could you share the job run pages, or the full error log if you are running the tests locally?

Based on what this test actually checks, it doesn't seem to need a seed, so I would like to understand what goes wrong without one.

(The daily CI hasn't shown this test failing, except once for mixtral.)

@Rocketknight1
Member Author

@ydshieh the test builds a model from a config, so all of its weights are randomly initialized. That random initialization is what causes the flakiness, and it's what we need to set a seed for! The test fails randomly about 1% of the time, whenever we happen to get an initialization that behaves badly. People have tried to fix it by setting very wide `atol` and `rtol` values, but I think fixing the seed is a much more sensible approach.
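A minimal, self-contained sketch of the trade-off being discussed, using only the Python standard library as a stand-in for PyTorch. Everything in it is hypothetical: `toy_load_balancing_loss` is a toy top-1 router loss (not the transformers implementation), the router weights and inputs are drawn from a `random.Random` generator to mimic a model built from a config, and the tolerance and sizes are arbitrary. It shows that a fixed seed pins a single initialization, so the check gives the same verdict on every run, while sweeping many seeds estimates how often an unlucky initialization would break the tolerance.

```python
import math
import random
from typing import Optional


def toy_load_balancing_loss(
    rng: random.Random,
    num_tokens: int = 64,
    num_experts: int = 4,
    hidden_size: int = 16,
    init_std: float = 0.02,  # hypothetical init scale
) -> float:
    """Hypothetical top-1 load-balancing loss for a randomly initialized toy router."""
    # Router weights drawn at random, mimicking a model initialized from a config.
    router = [[rng.gauss(0.0, init_std) for _ in range(hidden_size)] for _ in range(num_experts)]
    token_counts = [0] * num_experts  # tokens routed (top-1) to each expert
    prob_sums = [0.0] * num_experts   # summed router probability for each expert
    for _ in range(num_tokens):
        hidden = [rng.gauss(0.0, 1.0) for _ in range(hidden_size)]  # random input token
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in router]
        # Numerically stable softmax over experts.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        token_counts[max(range(num_experts), key=probs.__getitem__)] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    # num_experts * sum_i(f_i * P_i): equals 1.0 when both the routed fractions f_i
    # and the mean probabilities P_i are perfectly uniform.
    return num_experts * sum(
        (count / num_tokens) * (psum / num_tokens)
        for count, psum in zip(token_counts, prob_sums)
    )


def loss_within_tolerance(seed: Optional[int] = None, tol: float = 0.05) -> bool:
    """One toy 'test run'. With seed=None, the generator is seeded from OS entropy or the clock."""
    rng = random.Random(seed)
    return abs(toy_load_balancing_loss(rng) - 1.0) < tol


def failure_rate(num_seeds: int = 200, tol: float = 0.05) -> float:
    """Fraction of seeds (i.e. initializations) for which the tolerance check fails."""
    failures = sum(not loss_within_tolerance(seed, tol) for seed in range(num_seeds))
    return failures / num_seeds


if __name__ == "__main__":
    # Seeded: the same initialization, hence the same verdict, on every run.
    print(loss_within_tolerance(seed=0), loss_within_tolerance(seed=0))
    # Unseeded: each call may see a different initialization.
    print(loss_within_tolerance())
    print(f"failure rate over 200 seeds: {failure_rate():.1%}")
```

In an actual PyTorch test, the analogue of `random.Random(0)` would presumably be calling `torch.manual_seed(...)` before the model is constructed. The seed sweep illustrates the other side of the argument: a fixed seed selects one initialization that passes, rather than showing that the tolerance holds across initializations.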

@ydshieh
Collaborator

ydshieh commented Nov 28, 2025

In my experience dealing with failing tests, using a seed in this scenario is a quick hotfix but not the best way to handle it.
Since it's only about 1% as you said, could we wait until I check them next week?
