Support Muon optimizer by Kai-46 · Pull Request #1 · Kai-46/minFM

Kai-46 · 2025-08-18T03:45:14Z

Muon has been shown to scale well for LLM training. This PR adds support for it in the DiT training setup under the FSDP2 implementation. The Muon implementation is borrowed from the dion repo, with some modifications to force the bfloat16 dtype in the NewtonSchulz5 iterations.

Early signals from training the FLUX tiny model on ImageNet look very positive: Muon leads to a more rapid drop in validation loss than Adam.
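For readers unfamiliar with the NewtonSchulz5 step mentioned in the description, here is a minimal sketch of a quintic Newton-Schulz orthogonalization that runs in bfloat16. The function name `newton_schulz5` and the coefficients are assumptions taken from the widely circulated Muon reference form; the code actually borrowed from the dion repo and modified in this PR may differ.

```python
import torch


def newton_schulz5(grad: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D matrix with a quintic Newton-Schulz iteration.

    Hypothetical helper for illustration; not the code from this PR.
    """
    assert grad.ndim == 2, "this sketch only handles 2D matrices"
    # Polynomial coefficients (assumed, from the common Muon reference form).
    a, b, c = 3.4445, -4.7750, 2.0315

    # Force bfloat16 for the whole iteration, as the PR description states.
    x = grad.to(torch.bfloat16)

    # Work on the wide orientation so the Gram matrix x @ x.mT is the smaller one.
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.mT

    # Scale so that all singular values are at most 1 before iterating.
    x = x / (x.norm() + eps)

    for _ in range(steps):
        gram = x @ x.mT
        poly = b * gram + c * (gram @ gram)
        x = a * x + poly @ x

    if transposed:
        x = x.mT
    return x
```

The result is only approximately orthogonal: the iteration pushes singular values toward a band around 1 rather than exactly to 1, which is the usual trade-off for a small, fixed number of steps in low precision.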

Totoro97

Excited to see such a performance boost using Muon!! 🚀 🚀 🚀 🚀 (Leaving some minor comments.)

Totoro97 · 2025-08-18T05:47:04Z

configs/flux_inference.yaml

    # AdamW Optimizer Settings
-    max_lr: 0.0001 # Maximum learning rate for AdamW optimizer
-    min_lr: 0.00001 # Minimum learning rate for cosine decay schedule
+    adam_max_lr: 0.0001 # Maximum learning rate for AdamW optimizer


adam_max_lr -> adamw_max_lr?

Totoro97 · 2025-08-18T05:50:52Z

configs/flux_tiny_imagenet_muon.yaml

+    # Optimizer Settings
+    adam_max_lr: 0.0003 # Maximum learning rate for AdamW optimizer
+    adam_betas: [ 0.9, 0.95 ] # Beta coefficients for AdamW momentum terms [beta1, beta2]
+    use_muon: true


Totoro97 · 2025-08-18T06:03:56Z

utils/optim.py

+    matched_param_names: list[str]
+    matched_params: list[nn.Parameter]


Is it better to directly define a dictionary of parameters, such as matched_params_dict: Dict[str, nn.Parameter]?
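The suggestion above could look roughly like the following sketch. The class name `MatchedGroup` and the `add` method are hypothetical, and `Any` stands in for `nn.Parameter` so the snippet runs without PyTorch; only `matched_param_names`, `matched_params`, and `matched_params_dict` come from the review thread.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MatchedGroup:  # hypothetical container name, for illustration only
    # Single source of truth: parameter name -> parameter.
    # In the PR's setting the values would be nn.Parameter objects.
    matched_params_dict: dict[str, Any] = field(default_factory=dict)

    def add(self, name: str, param: Any) -> None:
        """Register a parameter, rejecting duplicate names."""
        if name in self.matched_params_dict:
            raise KeyError(f"parameter already matched: {name}")
        self.matched_params_dict[name] = param

    @property
    def matched_param_names(self) -> list[str]:
        # Derived view; dicts preserve insertion order.
        return list(self.matched_params_dict)

    @property
    def matched_params(self) -> list[Any]:
        # Derived view, aligned index-by-index with matched_param_names.
        return list(self.matched_params_dict.values())
```

Keeping one dictionary means the names and parameters cannot drift out of sync, while the two properties still offer the list views the original fields exposed.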

Properly handle grad scale in the case of gradient accumulation

add muon

deecc98

Totoro97 approved these changes Aug 18, 2025


Kai-46 added 2 commits August 19, 2025 23:53

handle gradient scaling properly

c03d819

Merge pull request #2 from Kai-46/grad-scale

f1b6bf7

Properly handle grad scale in the case of gradient accumulation
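The commits above do not show how the scaling was fixed. As a point of reference, one common convention under gradient accumulation is to scale each micro-batch gradient by the reciprocal of the number of accumulation steps, so the accumulated gradient matches the full-batch one. The toy sketch below checks this for a one-parameter least-squares loss; the function names are hypothetical and equal-sized micro-batches are assumed.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float], num_micro_batches: int) -> float:
    """Accumulate micro-batch gradients, each scaled by 1 / num_micro_batches."""
    size = len(xs) // num_micro_batches
    # Assumption: the batch splits evenly; otherwise per-batch weights are needed.
    assert size * num_micro_batches == len(xs), "equal-sized micro-batches assumed"
    total = 0.0
    for i in range(num_micro_batches):
        chunk = slice(i * size, (i + 1) * size)
        total += grad_mse(w, xs[chunk], ys[chunk]) / num_micro_batches
    return total
```

Without the `1 / num_micro_batches` factor the accumulated gradient would be `num_micro_batches` times larger than the full-batch gradient, which changes the effective learning rate and any gradient-norm clipping threshold.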

Kai-46 wants to merge 3 commits into main from muon


2 participants
