4 changes: 1 addition & 3 deletions src/paddlefleet/fusions/fused_bias_swiglu.py
@@ -18,7 +18,6 @@
 import logging
 
 import paddle
-import paddle.nn.functional as F
 
 from paddlefleet.jit import jit_fuser
 from paddlefleet.utils import nvtx_decorator
@@ -38,8 +37,7 @@ def swiglu(y):
     Returns:
         paddle.Tensor: Result of SwiGLU activation: SiLU(y1) * y2, where y1, y2 are the split halves.
     """
-    y_1, y_2 = paddle.chunk(y, 2, -1)
-    return F.silu(y_1) * y_2
+    return paddle.nn.functional.swiglu(y)
Comment on lines 38 to +40
Copilot AI Apr 1, 2026

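Now that swiglu() is implemented with paddle.nn.functional.swiglu, the repository's existing single-card unit tests (for example tests/single_card_tests/transformer/test_mlp.py) still default to hidden_act=F.gelu. They therefore do not cover the bias_activation_fusion=True + gated_linear_unit=True + hidden_act=F.silu path, which is the one that reaches swiglu() in this file. Consider adding a unit-test scenario that explicitly sets hidden_act=F.silu and checks the forward output and backward gradients for regression against the previous paddle.chunk + F.silu results.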
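For orientation, here is a minimal, framework-free sketch of the formulation the removed lines computed: split the last dimension into two halves and return SiLU(y1) * y2. It is not the Paddle kernel and says nothing about how paddle.nn.functional.swiglu behaves numerically. The names `silu` and `swiglu_reference` are hypothetical helpers introduced only for illustration, and the sketch works on a single row in double precision.

```python
import math
from typing import List, Sequence


def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v), written to avoid overflow in exp()."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)  # v < 0, so exp(v) is in (0, 1) and cannot overflow
    return v * e / (1.0 + e)


def swiglu_reference(row: Sequence[float]) -> List[float]:
    """Hypothetical reference: split one row in half, return SiLU(y1) * y2."""
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even to split into two halves")
    half = len(row) // 2
    y1, y2 = row[:half], row[half:]
    return [silu(a) * b for a, b in zip(y1, y2)]


if __name__ == "__main__":
    # y1 = [0.0, 1.0], y2 = [2.0, 3.0]
    print(swiglu_reference([0.0, 1.0, 2.0, 3.0]))
```

A regression test along the lines suggested above could compare the fused op's forward output against such a reference (or against the previous paddle.chunk + F.silu code) on the same inputs, and then do the same for the gradients.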

Contributor

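I can't treat this as an "equivalent refactor" for now.

According to this PR's integration logs, the replacement has already introduced observable precision drift in several GLM4.5 scenarios and directly triggered "The precision has been changed and requires approvals.": for example, on H20 multi-card, glm45_lora_multi_card_gt_loss reaches a maximum absolute error of 0.01418428 and a relative error of 1.52807667; on A100, glm45_lora_multi_card_a100_gt_loss reaches a maximum absolute error of 0.975 and a relative error of 0.12721816; even the PT scenarios drift by 3e-05 to 6e-05.

I suggest doing two things before merging:

  1. Explicitly confirm whether the numerical behavior of paddle.nn.functional.swiglu, on the current Paddle version and target hardware, is allowed to differ from the original chunk + silu by these amounts;
  2. Add a forward/backward regression unit test that directly covers hidden_act=F.silu + gated_linear_unit=True + bias_activation_fusion=True, so this path no longer goes without unit-test coverage.

If these precision changes are expected, please also update the baselines and go through the corresponding precision approval process.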
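To make the quoted error figures easier to reason about, here is a small sketch of how a maximum absolute error and a maximum relative error between a baseline loss curve and a new one might be computed. This is an assumption about the metric definitions, not the CI's actual implementation: the function name `max_abs_and_rel_error`, the `eps` guard, and the choice to divide by the baseline value are all hypothetical.

```python
from typing import Sequence, Tuple


def max_abs_and_rel_error(
    baseline: Sequence[float],
    candidate: Sequence[float],
    eps: float = 1e-12,
) -> Tuple[float, float]:
    """Return (max |c - b|, max |c - b| / max(|b|, eps)) over paired values.

    Assumed definition for illustration only; the real precision check
    may normalize differently.
    """
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must have the same length")
    max_abs = 0.0
    max_rel = 0.0
    for b, c in zip(baseline, candidate):
        diff = abs(c - b)
        max_abs = max(max_abs, diff)
        # eps guards against division by zero when the baseline value is 0
        max_rel = max(max_rel, diff / max(abs(b), eps))
    return max_abs, max_rel


if __name__ == "__main__":
    # Made-up values, not taken from the PR logs.
    print(max_abs_and_rel_error([4.0, 0.5], [4.5, 0.625]))  # (0.5, 0.25)
```

Under this definition the two maxima need not come from the same step: in the made-up example the largest absolute gap is at the first value and the largest relative gap is at the second.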



@jit_fuser