Collaborator

@zhupengyang zhupengyang commented Sep 11, 2025

Add XPU wint8 EP support, with accuracy aligned.

Custom operators:

  • moe_topk_select now supports the token_num=0 case
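
For reviewers unfamiliar with the zero-token case: under EP a rank can receive no tokens in a step, so top-k expert selection has to return empty outputs rather than fail. The snippet below is only a pure-Python sketch of that behavior. `topk_select` and its arguments are hypothetical stand-ins, not the signature of the real `moe_topk_select` custom operator.

```python
import heapq
import math


def topk_select(gating_logits, top_k):
    """Pick the top_k experts per token from per-token gating logits.

    Hypothetical stand-in for illustration only; it is not the real
    moe_topk_select operator and makes no claim about its signature.
    Returns (topk_ids, topk_weights), one inner list per token.
    """
    if not gating_logits:
        # token_num == 0: return empty outputs instead of touching row 0
        # (a device kernel would typically skip the launch here).
        return [], []

    topk_ids, topk_weights = [], []
    for row in gating_logits:
        # Numerically stable softmax over the experts of one token.
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Indices of the top_k largest probabilities, largest first.
        ids = heapq.nlargest(top_k, range(len(probs)), key=probs.__getitem__)
        topk_ids.append(ids)
        topk_weights.append([probs[i] for i in ids])
    return topk_ids, topk_weights


# A rank that received no tokens gets empty results.
empty_ids, empty_weights = topk_select([], top_k=2)
# A single token over three experts selects experts 1 and 2.
ids, weights = topk_select([[0.0, 2.0, 1.0]], top_k=2)
```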

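Model network:

  • The XPU-specific fused_moe.py and ep.py are moved to fastdeploy/model_executor/layers/backends/xpu/moe and managed separately there
  • Online weight quantization in weight_only has a dimension limit, so dimensions that are too large are split
  • DeepEPEngineBase is extracted from the shared ep.py so that other backends can inherit from DeepEPEngineBase and extend it
  • EPRunner currently also contains GPU code; this PR makes only the necessary changes (ideally a base class should be extracted later as well) so that EPRunner can be used directly once XPU inherits it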
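
To illustrate the splitting idea for online weight quantization, here is a rough sketch that quantizes a weight in slices and concatenates the results. It assumes per-row abs-max int8 quantization and a split along the row dimension; the actual XPU kernel, its dimension limit, and the axis that this PR splits are not shown here, so `quantize_rows_int8`, `quantize_in_chunks`, and `max_rows` are all hypothetical names.

```python
def quantize_rows_int8(rows):
    """Per-row abs-max int8 quantization (assumed scheme, for illustration).

    Returns (quantized_rows, scales) where row ~= quantized_row * scale.
    """
    q_rows, scales = [], []
    for row in rows:
        amax = max((abs(v) for v in row), default=0.0)
        # Avoid dividing by zero for all-zero rows.
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def quantize_in_chunks(weight, max_rows):
    """Quantize `weight` in slices of at most `max_rows` rows.

    `max_rows` stands in for whatever limit the quantization kernel has.
    Scales are per row, so slicing along rows does not change the result.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    q_all, scale_all = [], []
    for start in range(0, len(weight), max_rows):
        q, s = quantize_rows_int8(weight[start:start + max_rows])
        q_all.extend(q)
        scale_all.extend(s)
    return q_all, scale_all


weight = [[1.0, -2.0], [0.5, 0.25], [0.0, 0.0], [3.0, 1.5]]
q_chunked, scales_chunked = quantize_in_chunks(weight, max_rows=2)
q_full, scales_full = quantize_rows_int8(weight)
```

Because each row carries its own scale in this sketch, the chunked and unchunked results are identical. A split along a dimension that shares a scale would need extra care.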

engine:

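  • EP does not support warmup for now, so warmup is skipped and blocks are allocated from an empirical value
    • With EP, the current warmup run leaves the experts unbalanced, so the memory usage it computes is also inaccurate
  • Fix the wrong position of the is_dummy_run argument: the argument was previously passed in the wrong slot, but some of the affected parameters were unused, so the problem went unnoticed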
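
The is_dummy_run issue is the usual positional-argument misbinding, and it is easy to miss when the parameter that absorbs the value is never read. The sketch below shows the failure mode and one way to guard against it with a keyword-only parameter. The function names and the `num_running_requests` parameter are hypothetical; the real call sites changed in this PR are not reproduced here.

```python
def run_step_positional(batch, num_running_requests=None, is_dummy_run=False):
    """Hypothetical runner step that accepts is_dummy_run positionally."""
    return {
        "num_running_requests": num_running_requests,
        "is_dummy_run": is_dummy_run,
    }


def run_step_keyword_only(batch, num_running_requests=None, *, is_dummy_run=False):
    """Same step, but is_dummy_run can only be passed by keyword."""
    return {
        "num_running_requests": num_running_requests,
        "is_dummy_run": is_dummy_run,
    }


# The caller means "dummy run", but True lands in num_running_requests.
# Nothing fails if that parameter happens to be unused downstream.
misbound = run_step_positional([], True)

# Passing by keyword binds the flag to the intended parameter.
correct = run_step_keyword_only([], is_dummy_run=True)

# With the keyword-only signature, a third positional argument is
# rejected with a TypeError instead of being silently misbound.
try:
    run_step_keyword_only([], None, True)
    rejected = False
except TypeError:
    rejected = True
```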

paddle-bot bot commented Sep 11, 2025

Thanks for your contribution!

@zhupengyang zhupengyang force-pushed the xpu_ep_4 branch 2 times, most recently from 5191cb9 to 0789edc Compare September 12, 2025 03:06
Collaborator

@hong19860320 hong19860320 left a comment

LGTM

Collaborator

iosmers commented Sep 12, 2025

LGTM

@zhupengyang zhupengyang merged commit 9409665 into PaddlePaddle:develop Sep 15, 2025
34 of 39 checks passed
@zhupengyang zhupengyang deleted the xpu_ep_4 branch September 15, 2025 05:53
5 participants