fix moe lora gemm#709

Open
Lcysabcu wants to merge 2 commits into PaddlePaddle:develop from Lcysabcu:fuse_moe_lora
Conversation

@Lcysabcu Lcysabcu commented Apr 1, 2026

While adding LoRA GEMM support to PaddleFormers, two issues on the fleet side were found and fixed:

1. With EP enabled, grouped_gemm_experts was missing the LoRA gradients.
Cause: when moe_use_fusion_node=true, the model takes the fusion_moe_forward branch, but the fleet side did not handle the LoRA gradients there.
Fix: added a _lora_weight_grad function and the corresponding handling to ExpertsGroupGemmContiguousNode.
2. During LoRA training, Layer 0 never participated in training.
Cause: with recompute enabled, layer 0 incorrectly inherited the stop_gradient state of embed_tokens.
Fix: ported the equivalent fix from pp_model.py in formers to the fleet side.
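For fix 1, a minimal NumPy sketch of the math a `_lora_weight_grad`-style function has to produce for one expert's GEMM. The function name, argument layout, and scaling convention here are assumptions for illustration, not the actual fleet implementation:

```python
import numpy as np

def lora_weight_grad(x, grad_out, lora_a, lora_b, scaling):
    """Gradients of y = scaling * (x @ lora_a) @ lora_b w.r.t. the LoRA weights.

    x:        (tokens, in_dim)   tokens routed to this expert
    grad_out: (tokens, out_dim)  gradient flowing back from the expert output
    lora_a:   (in_dim, rank)
    lora_b:   (rank, out_dim)
    """
    # d(loss)/d(lora_a): propagate grad_out back through lora_b, then
    # contract with the expert input.
    grad_a = scaling * x.T @ (grad_out @ lora_b.T)      # (in_dim, rank)
    # d(loss)/d(lora_b): contract the low-rank activation with grad_out.
    grad_b = scaling * (x @ lora_a).T @ grad_out        # (rank, out_dim)
    return grad_a, grad_b
```

In a grouped GEMM this computation runs once per expert over that expert's contiguous token slice; the point of the PR is that the fusion branch was skipping it entirely.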
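For fix 2, a toy sketch of the failure mode and the pp_model.py-style remedy. The `Tensor` stub and `prepare_recompute_input` helper are hypothetical stand-ins, not fleet APIs; they only model the stop_gradient flag:

```python
class Tensor:
    """Minimal stand-in for a framework tensor; models only stop_gradient."""
    def __init__(self, stop_gradient=False):
        self.stop_gradient = stop_gradient

    def detach(self):
        # A detached copy starts out excluded from the autograd graph.
        return Tensor(stop_gradient=True)

def prepare_recompute_input(hidden):
    # Recompute skips the backward pass when every input has
    # stop_gradient=True, so a hidden state inherited from a frozen
    # embed_tokens silently freezes layer 0's LoRA weights. Re-enabling
    # gradients on a detached copy keeps the layer trainable without
    # unfreezing the embedding itself.
    if hidden.stop_gradient:
        hidden = hidden.detach()
        hidden.stop_gradient = False
    return hidden
```

This mirrors the shape of the fix described above: the problem is not the layer's own parameters but the flag inherited from its input.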


@lugimzzz lugimzzz left a comment

LGTM

Collaborator

@From00 From00 left a comment

LGTM
