While adding LoRA GEMM support to PaddleFormers, two issues on the fleet side were found and fixed:

1. With EP enabled, grouped_gemm_experts is missing LoRA gradients.
Cause: when `moe_use_fusion_node=true`, the model takes the `fusion_moe_forward` branch, but the fleet side does not handle LoRA gradients there.
Fix: added a `_lora_weight_grad` function and the corresponding handling to `ExpertsGroupGemmContiguousNode`.

2. During LoRA training, Layer 0 of the model never participates in training.
Cause: with recompute enabled, layer 0 incorrectly inherits the `stop_gradient` state of `embed_tokens`.
Fix: ported the analogous fix from `pp_model.py` in formers to the fleet side.
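For context, the math such a `_lora_weight_grad` hook has to supply can be sketched as follows. This is a hypothetical pure-Python illustration, not the actual fleet implementation: for a LoRA layer `y = x @ (W + A @ B)`, the backward pass must produce gradients for the adapter factors `A` and `B` in addition to the base weight, and this is exactly what the fused grouped-GEMM branch was missing.

```python
def matmul(p, q):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def lora_weight_grad(x, grad_out, lora_a, lora_b):
    """Illustrative LoRA weight gradients (names are hypothetical).

    For y = x @ (W + A @ B) with A: [in, r] and B: [r, out], the chain
    rule gives:
        dA = x^T @ dy @ B^T
        dB = (x @ A)^T @ dy
    """
    xt_dy = matmul(transpose(x), grad_out)                   # [in, out]
    grad_a = matmul(xt_dy, transpose(lora_b))                # [in, r]
    grad_b = matmul(transpose(matmul(x, lora_a)), grad_out)  # [r, out]
    return grad_a, grad_b
```

With `x = [[1.0, 0.0]]`, `grad_out = [[1.0]]`, `A = [[1.0], [0.0]]`, and `B = [[2.0]]`, this returns `dA = [[2.0], [0.0]]` and `dB = [[1.0]]`, matching the formulas above.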
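The second bug can be illustrated with a toy model (this is a conceptual sketch, not fleet or pp_model.py code; all names are hypothetical). Under LoRA, the embedding weights are frozen, so the embedding output carries `stop_gradient=True`; if the recompute wrapper lets layer 0's output inherit that flag from its input, layer 0 never receives gradients even though its LoRA adapters are trainable.

```python
class Tensor:
    """Minimal stand-in for a framework tensor with a stop_gradient flag."""
    def __init__(self, stop_gradient=False):
        self.stop_gradient = stop_gradient

def embed_tokens(ids):
    # Embedding weights are frozen under LoRA, so the embedding output
    # carries stop_gradient=True.
    return Tensor(stop_gradient=True)

def buggy_recompute_forward(hidden):
    # Buggy behaviour: the layer output blindly inherits the input's
    # flag, so layer 0 is treated as frozen and skipped in backward.
    return Tensor(stop_gradient=hidden.stop_gradient)

def fixed_recompute_forward(hidden, layer_has_trainable_params=True):
    # Fixed behaviour (in the spirit of the pp_model.py fix): if the
    # layer itself holds trainable (LoRA) parameters, its output must
    # not inherit stop_gradient=True from a frozen input.
    out = Tensor(stop_gradient=hidden.stop_gradient)
    if layer_has_trainable_params:
        out.stop_gradient = False
    return out

hidden = embed_tokens([0, 1, 2])
print(buggy_recompute_forward(hidden).stop_gradient)   # True: layer 0 frozen
print(fixed_recompute_forward(hidden).stop_gradient)   # False: layer 0 trains
```

The key point is that trainability must be decided from the layer's own parameters, not inferred from the `stop_gradient` state of the (frozen) embedding output flowing into it.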