Skip to content

[Cpp API Compatibility] Remove deepep legacy apis#78549

Open
youge325 wants to merge 4 commits intoPaddlePaddle:developfrom
youge325:remove-deepep-legacy-apis
Open

[Cpp API Compatibility] Remove deepep legacy apis#78549
youge325 wants to merge 4 commits intoPaddlePaddle:developfrom
youge325:remove-deepep-legacy-apis

Conversation

@youge325
Copy link
Copy Markdown
Contributor

@youge325 youge325 commented Apr 1, 2026

PR Category

Execute Infrastructure

PR Types

Improvements

Description

拆分自 #78484

是否引起精度变化

Copilot AI review requested due to automatic review settings April 1, 2026 11:11
@paddle-bot
Copy link
Copy Markdown

paddle-bot bot commented Apr 1, 2026

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the C++ API compatibility cleanup by removing DeepEP-specific legacy stream/event APIs from the compat headers, aiming to converge on a PyTorch-shaped surface.

Changes:

  • Tighten CUDA stream pool initialization by adding error checking and align getStreamFromPool overload shape to support an int priority entry point.
  • Remove deprecated legacy APIs that exposed raw CUDA streams (CUDAStream::raw_stream(), Event::record(cudaStream_t), and Tensor::record_stream(cudaStream_t / at::cuda::CUDAStream)).
  • Expose c10::Stream into namespace at to support at::Stream usage.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
paddle/phi/api/include/compat/c10/cuda/CUDAStream.h Adds C10_CUDA_CHECK for priority range init; adds priority-based getStreamFromPool overload and maps bool overload to it.
paddle/phi/api/include/compat/c10/core/Stream.h Introduces at::Stream aliasing/export of c10::Stream.
paddle/phi/api/include/compat/c10/core/Event.h Removes legacy record(cudaStream_t) overload.
paddle/phi/api/include/compat/ATen/ops/record_stream.h Removes legacy record_stream overloads taking at::cuda::CUDAStream and cudaStream_t.
paddle/phi/api/include/compat/ATen/core/TensorBody.h Removes corresponding legacy record_stream declarations from Tensor.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +109 to +112

namespace at {
using c10::Stream;
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding namespace at { using c10::Stream; } here conflicts with an existing at::Stream definition in paddle/phi/api/include/compat/ATen/core/TensorBody.h (currently using Stream = c10::Stream;). Since TensorBody.h includes <c10/core/Stream.h>, this will trigger a redefinition error when compiling any TU that includes ATen/core/TensorBody.h.

To fix: keep a single canonical at::Stream definition (either remove the Stream alias from TensorBody.h, or drop this new at::Stream export and rely on the existing one).

Suggested change
namespace at {
using c10::Stream;
}

Copilot uses AI. Check for mistakes.
@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Apr 1, 2026

/re-run all-failed

1 similar comment
@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Apr 1, 2026

/re-run all-failed

Copy link
Copy Markdown
Contributor

@ShigureNyako ShigureNyako left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这轮清理方向和 PyTorch 对齐本身我认可:

  • c10::cuda::CUDAStream 上游只有 stream(),没有 raw_stream()
  • c10::Event 上游只有 record(const Stream&)
  • getStreamFromPool 也确实有 int priority 形态

但当前时点我还是得先卡住,因为这会造成明确的 BREAKING CHANGE,且下游闭环还没完成:

  1. PaddleFleet develop 目前仍固定 third_party/DeepEP3fed158b4751a4a89c92902aeb20bb8c8384f1e7
  2. 这个版本的 DeepEP 还在直接使用这次要删掉的旧接口:
    • csrc/event.hppevent->record(const cudaStream_t&)at::cuda::getCurrentCUDAStream().raw_stream()
    • csrc/deep_ep.cpp:多处 t.record_stream(comm_stream),其中 comm_stream 仍是 cudaStream_t
  3. DeepEP 侧迁移 PR PFCCLab/DeepEP#10 虽然已经合到了 paddle 分支(merge commit 5f63568c9810b53e999fa56fd6981c40ba4a5d57),但 PaddleFleet 这边还没有把 third_party/DeepEP bump 过去。
  4. 当前 PaddleFleet #712 标题写的是 “update submodule DeepEP”,但 diff 实际只改了 third_party/DeepGEMM,没有更新 third_party/DeepEP,所以还不能作为这个 cleanup 的解锁证据。

因此“暂时不能合入”这个判断我认为仍然成立。
请先把 PaddleFleet / 实际下游切到不依赖 raw_stream() / Event::record(cudaStream_t) / Tensor::record_stream(cudaStream_t) 的版本,并给出对应 cross-reference / CI 证据,再来清理这批兼容接口。

@youge325 youge325 force-pushed the remove-deepep-legacy-apis branch from dc7610a to f71e518 Compare April 3, 2026 03:37
@ShigureNyako
Copy link
Copy Markdown
Contributor

ShigureNyako commented Apr 3, 2026

我重新看了这次 force-push。

当前 diff 相比上一版,主要是:

  • 删除 Tensor::record_stream(cudaStream_t) 的 TODO / 测试;
  • 保留 Tensor::record_stream(at::cuda::CUDAStream),实现改为转发到 record_stream(s.unwrap())
  • 继续移除 Event::record(cudaStream_t)CUDAStream::raw_stream()

这些调整没有改变我上一轮卡住的核心点:下游闭环还没完成。

到目前为止,PaddlePaddle/PaddleFleet#712 的 head 仍然是 9789e7f,diff 也仍只改了 third_party/DeepGEMM,没有把 third_party/DeepEP bump 到兼容这批删除项的版本,所以还不能作为“下游已经切完”的证据。

因此我这边 blocker 维持不变:还是需要先补齐 DeepEP / PaddleFleet 侧的实际切换,以及对应的 cross-reference / CI 证据,再来清理这批兼容接口。

@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Apr 3, 2026

/re-run all-failed

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@cebfafd). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #78549   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         1           
  Lines              ?         1           
  Branches           ?         0           
===========================================
  Hits               ?         1           
  Misses             ?         0           
  Partials           ?         0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants