Seed test for reproducibility #42400

Rocketknight1 · 2025-11-25T16:35:20Z

The test_load_balancing_loss test is intermittently flaky for me. This PR sets a manual seed to make it more reproducible, which hopefully solves the issue!

github-actions · 2025-11-25T16:36:19Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: ernie4_5_moe, jamba, minimax, mixtral, qwen2_moe, qwen3_moe

Rocketknight1 · 2025-11-25T16:37:38Z

cc @ydshieh

Rocketknight1 · 2025-11-25T16:41:11Z

run-slow: ernie4_5_moe, jamba, minimax, mixtral, qwen2_moe, qwen3_moe

github-actions · 2025-11-25T16:42:29Z

This comment contains run-slow, running the specified jobs:

models: ["models/ernie4_5_moe", "models/jamba", "models/minimax", "models/mixtral", "models/qwen2_moe", "models/qwen3_moe"]
quantizations: []

HuggingFaceDocBuilderDev · 2025-11-25T16:43:41Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2025-11-25T16:59:03Z

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

ydshieh · 2025-11-28T15:15:22Z

Hi @Rocketknight1

Could you share the job run pages, or the full error log if you are running the tests locally?

Judging from what this test actually checks, it doesn't seem to require a seed, so I would like to know what goes wrong without one.

(Also, the daily CI doesn't show this test failing, except once for mixtral.)

Rocketknight1 · 2025-11-28T17:10:53Z

@ydshieh the test has a model initialized from config, so all of the weights are randomly initialized. This is the bit that causes flakiness, and the bit that we need to set a seed for! There's a random failure about 1% of the time, when the initialization we get behaves badly. People have tried to fix it by setting really wide atol and rtol values, but I think fixing the seed is a much more sensible approach.
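To illustrate the point about random initialization, here is a minimal sketch that uses only Python's standard `random` module as a stand-in for framework weight initialization. It is not the actual test from this PR: `init_router_weights` and `build_weights` are hypothetical helpers, and the shapes and standard deviation are arbitrary assumptions. In PyTorch, the analogous step would be calling `torch.manual_seed` before building the model.

```python
import random


def init_router_weights(num_experts, hidden_size, rng):
    """Hypothetical stand-in for random weight init from a config."""
    return [
        [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        for _ in range(num_experts)
    ]


def build_weights(seed=None):
    """Build weights; seed=None draws from system entropy (unseeded)."""
    rng = random.Random(seed)
    return init_router_weights(num_experts=4, hidden_size=8, rng=rng)


# With a fixed seed, every run sees the same initialization, so a test
# either always passes or always fails.
seeded_a = build_weights(seed=42)
seeded_b = build_weights(seed=42)
print(seeded_a == seeded_b)  # True

# Without a seed, each run gets different weights, so a rare "bad"
# initialization can make the test fail intermittently.
unseeded_a = build_weights()
unseeded_b = build_weights()
```

The trade-off raised in this thread still applies: a fixed seed removes the intermittent failure, but it also means the test only ever exercises one initialization.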

ydshieh · 2025-11-28T18:15:22Z

From my experience dealing with failing tests, using a seed in this scenario is a quick hotfix, but not the best way to handle it.
Since it only fails about 1% of the time, as you said, maybe we could wait until I check them next week?

Seed test for reproducibility

c471fec

Rocketknight1 marked this pull request as ready for review November 25, 2025 16:35
