[Fusion] use paddle.nn.functional.swiglu to replace manual chunk+silu #707

Open

huangjiyi wants to merge 1 commit into PaddlePaddle:develop from huangjiyi:use-paddle-swiglu

Conversation

@huangjiyi (Member)

Summary

  • Replace manual paddle.chunk + F.silu implementation in swiglu() with the native paddle.nn.functional.swiglu() API for cleaner code and potential performance benefit.
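For reference, the fused API computes the same function the manual code spelled out: split the last axis in half, apply SiLU to the first half, and use it to gate the second half. A minimal NumPy sketch of that definition (NumPy stands in for Paddle here; `silu` and `swiglu_ref` are illustrative names, not part of either codebase):

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ref(y):
    # Reference SwiGLU: split the last axis in half and gate the
    # second half with SiLU of the first half. This is the function
    # both paddle.chunk + F.silu and the fused API are meant to compute.
    y1, y2 = np.split(y, 2, axis=-1)
    return silu(y1) * y2

y = np.array([[1.0, -2.0, 0.5, 3.0]])
out = swiglu_ref(y)  # shape (1, 2): SiLU([1.0, -2.0]) * [0.5, 3.0]
```

A regression test comparing the old and new paths can use a reference like this as the ground truth.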

Test plan

  • Verify existing unit tests pass with the new implementation
  • Confirm paddle.nn.functional.swiglu is available in the target Paddle version

Copilot AI review requested due to automatic review settings April 1, 2026 02:50
Copilot AI (Contributor) left a comment


Pull request overview

This PR replaces the hand-written chunk + silu forward implementation of swiglu() in fused_bias_swiglu.py with Paddle's native paddle.nn.functional.swiglu(), simplifying the code and reusing the framework's built-in implementation (which may also bring better performance / fusion opportunities).

Changes:

  • Remove the import paddle.nn.functional as F dependency.
  • Replace the swiglu(y) implementation with a call to paddle.nn.functional.swiglu(y).

Comment on lines 38 to +40
      paddle.Tensor: Result of SwiGLU activation: SiLU(y1) * y2, where y1, y2 are the split halves.
      """
-     y_1, y_2 = paddle.chunk(y, 2, -1)
-     return F.silu(y_1) * y_2
+     return paddle.nn.functional.swiglu(y)

Copilot AI Apr 1, 2026


After switching the swiglu() implementation to paddle.nn.functional.swiglu, the existing single-card unit tests in this repository (e.g. tests/single_card_tests/transformer/test_mlp.py) still default to hidden_act=F.gelu, so they do not cover the bias_activation_fusion=True + gated_linear_unit=True + hidden_act=F.silu path that reaches swiglu() in this file. Consider adding a test case that explicitly sets hidden_act=F.silu and checks the forward output and backward gradients against the previous paddle.chunk + F.silu results.

@huangjiyi (Member, Author)

@ShigureNyako please review this PR.

@ShigureNyako (Contributor) left a comment


Requesting changes for now.

As I understand it, this PR's goal is to replace the hand-written chunk + silu forward implementation in src/paddlefleet/fusions/fused_bias_swiglu.py with Paddle's native paddle.nn.functional.swiglu, which makes the code more concise.

However, based on the current evidence, it cannot yet be merged as a "no-behavior-change refactor":

  • Unit tests: the existing single-card test tests/single_card_tests/transformer/test_mlp.py (TestBiasFusedGatedMLP) does not explicitly exercise the hidden_act=F.silu + gated_linear_unit=True + bias_activation_fusion=True path, so this change has no direct regression coverage.
  • CI: Unit test (single card) and Unit test (multi-card) passed, but all 3 integration jobs failed. The logs show not just environment errors but changed loss baselines in several GLM4.5 scenarios, triggering "The precision has been changed and requires approvals.". For example:
    • H20 multi-card glm45_lora_multi_card_gt_loss: max abs diff = 0.01418428, max relative diff = 1.52807667
    • A100 glm45_lora_multi_card_a100_gt_loss: max abs diff = 0.975, max relative diff = 0.12721816
    • The H20/A100 PT scenarios also showed drift on the order of 3.242e-05 and 6.294e-05, respectively
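The "max abs diff / max relative diff" numbers quoted above are the usual elementwise comparison of a new loss curve against the stored baseline; a minimal sketch of how such metrics can be computed (NumPy; `loss_diff_metrics` is a hypothetical helper, not the actual CI code):

```python
import numpy as np

def loss_diff_metrics(actual, baseline, eps=1e-12):
    # Elementwise comparison of a new loss curve against the stored baseline.
    actual = np.asarray(actual, dtype=np.float64)
    baseline = np.asarray(baseline, dtype=np.float64)
    abs_diff = np.abs(actual - baseline)
    # Relative diff is scaled by the baseline magnitude; eps guards against /0.
    rel_diff = abs_diff / np.maximum(np.abs(baseline), eps)
    return abs_diff.max(), rel_diff.max()

max_abs, max_rel = loss_diff_metrics([1.001, 2.0], [1.0, 2.0])
```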

I suggest this revision address at least two things:

  1. Add forward/backward regression tests for this fusion path that directly verify the new and old implementations agree;
  2. Explain whether these precision changes are expected. If they are, update the corresponding baselines and go through the required precision approval process before merging.
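The requested forward/backward regression check can be sketched framework-independently: evaluate the manual formulation and verify its analytic gradient against finite differences (NumPy stands in for Paddle here; all names are illustrative, not the repository's actual test code):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_manual(y):
    # Old path: explicit chunk + silu, as in the pre-PR implementation.
    y1, y2 = np.split(y, 2, axis=-1)
    return silu(y1) * y2

def swiglu_grad(y):
    # Analytic gradient of sum(swiglu(y)) w.r.t. y.
    y1, y2 = np.split(y, 2, axis=-1)
    s = 1.0 / (1.0 + np.exp(-y1))        # sigmoid(y1)
    dsilu = s * (1.0 + y1 * (1.0 - s))   # d/dy1 of y1 * sigmoid(y1)
    return np.concatenate([dsilu * y2, silu(y1)], axis=-1)

rng = np.random.default_rng(0)
y = rng.standard_normal((4, 8))

# Central finite-difference check of the backward pass.
eps = 1e-6
g_num = np.zeros_like(y)
for idx in np.ndindex(y.shape):
    yp, ym = y.copy(), y.copy()
    yp[idx] += eps
    ym[idx] -= eps
    g_num[idx] = (swiglu_manual(yp).sum() - swiglu_manual(ym).sum()) / (2 * eps)

grad_ok = np.allclose(swiglu_grad(y), g_num, atol=1e-4)
```

In the actual test, `swiglu_manual` would serve as the reference against which the fused `paddle.nn.functional.swiglu` forward output and backward gradients are compared.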

"""
y_1, y_2 = paddle.chunk(y, 2, -1)
return F.silu(y_1) * y_2
return paddle.nn.functional.swiglu(y)
Contributor


For now I cannot treat this as an "equivalent refactor".

From this PR's integration logs, the replacement has already introduced observable precision drift in several GLM4.5 scenarios and directly triggered "The precision has been changed and requires approvals.": for example, on H20 multi-card, glm45_lora_multi_card_gt_loss reaches a max absolute error of 0.01418428 and a max relative error of 1.52807667; on A100, glm45_lora_multi_card_a100_gt_loss reaches a max absolute error of 0.975 and a max relative error of 0.12721816; even the PT scenarios show drift on the order of 3e-05 ~ 6e-05.

I suggest doing two things before merging:

  1. Confirm whether paddle.nn.functional.swiglu on the current Paddle version / target hardware is expected to differ numerically from the original chunk + silu in these ways;
  2. Add a forward/backward regression unit test that directly covers hidden_act=F.silu + gated_linear_unit=True + bias_activation_fusion=True, so this path no longer goes untested.

If these precision changes are expected, please also update the baselines and go through the corresponding precision approval process.
