[Cpp API Compatibility] Align event api #78553
Conversation
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Pull request overview
This PR updates the C++ compat layer to better match PyTorch’s stream/event APIs, including stream-pool signature parity and a more feature-complete c10::Event surface.
Changes:
- Update `c10::cuda::getStreamFromPool` to support the `(int priority, DeviceIndex)` overload and map negative priorities to the high-priority pool.
- Expand `c10::Event` to track device metadata, add PyTorch-like methods (e.g., `recordOnce`, `query`, `eventId`, `synchronize`), and tighten stream/device-type checks.
- Remove the deprecated raw-stream `record_stream` overloads and the legacy `raw_stream()` exposure.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| paddle/phi/api/include/compat/c10/cuda/CUDAStream.h | Adds PyTorch-shaped getStreamFromPool(int priority, ...) and updates priority selection logic. |
| paddle/phi/api/include/compat/c10/core/Stream.h | Exposes c10::Stream into namespace at for API compatibility. |
| paddle/phi/api/include/compat/c10/core/Event.h | Refactors/extends c10::Event interface and behavior to resemble PyTorch. |
| paddle/phi/api/include/compat/ATen/ops/record_stream.h | Removes deprecated record_stream overloads (CUDAStream / cudaStream_t) in favor of at::Stream. |
| paddle/phi/api/include/compat/ATen/core/TensorBody.h | Removes the corresponding record_stream overload declarations. |
```cpp
#include <c10/core/Stream.h>

#ifdef PADDLE_WITH_CUDA
#include <c10/cuda/CUDAStream.h>
```
This header uses std::mutex/std::unique_lock in EventPool, but doesn't include <mutex> directly (it currently relies on transitive includes from other headers). Adding an explicit #include <mutex> under the PADDLE_WITH_CUDA guard would make this file self-contained and less fragile to include-order changes.
Suggested change:

```cpp
#include <c10/cuda/CUDAStream.h>
#include <mutex>
```
```cpp
}

void record_stream(at::Stream s) const;
```
Now that the record_stream(at::cuda::CUDAStream) overload has been removed, the forward declaration of at::cuda::CUDAStream near the top of this file (and its comment) is unused and misleading: the file no longer declares any overload that needs it. Consider removing that forward declaration and its comment to avoid stale documentation and unnecessary namespace pollution.
ShigureNyako left a comment:
I'm requesting changes for this round. There are two main classes of blockers:

- The current diff still carries the legacy API cleanup from #78549 (e.g., `CUDAStream::raw_stream()`, `Event::record(cudaStream_t)`, `record_stream(at::cuda::CUDAStream/cudaStream_t)`). That is not pure Event alignment but a removal from the externally visible compat layer; given the DeepEP / PaddleFleet migration loop already discussed in #78484, this is a hidden BREAKING CHANGE.
- The Event alignment itself does not yet match upstream PyTorch semantics: the `cudaEvent_t` is created from the global pool at construction time and bound to the device that is current at construction, not to the `stream.device()` of the first `record()`, which is a compatibility risk in multi-GPU scenarios. `EventFlag` currently plays no part in event creation, and `elapsedTime()` simply returns `0.0`, which silently breaks timing semantics.

I suggest first splitting the legacy API cleanup out of this PR and regenerating the diff on top of #78549, and continuing only after Event's lazy creation, device constraints, and timing semantics are filled in.
```cpp
#endif
}

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
```
This deletion is not pure Event alignment but an externally visible compat-layer cleanup: `record_stream(at::cuda::CUDAStream)` and `record_stream(cudaStream_t)` are removed along with it.

Per the discussion in #78484, it was already pointed out that cleanup of these DeepEP migration interfaces must wait until the downstream migration loop is complete, and must not be removed in passing in a split PR. The description of #78553 only mentions aligning the Event API, yet the actual diff still carries the legacy API removals from #78549, which makes this a hidden BREAKING CHANGE.

I suggest splitting the compat-layer cleanup out of this PR, or at least deferring the removal until the downstream migration loop is complete; otherwise both the scope and the risk of this split PR are wrong.
ShigureNyako left a comment:
I have re-reviewed this increment:

- The two main issues I raised earlier have been fixed: the legacy stream API removal has been reverted, and Event now uses lazy creation, with the `EventFlag` / `elapsedTime()` semantics and corresponding tests filled in.
- One compatibility risk remains: the Event implementation still only covers `PADDLE_WITH_CUDA`, while the compat interfaces at the same layer and the new tests are exposed under `PADDLE_WITH_CUDA || PADDLE_WITH_HIP`. This leaves the HIP path in a half-aligned state where the interface appears to exist but the behavior/overloads are not actually implemented.

If this PR does not address that point, I cannot switch to approve yet.
/re-run all-failed
Codecov Report

❌ Your patch status has failed because the patch coverage (89.28%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@ Coverage Diff @@
##           develop   #78553   +/-   ##
==========================================
  Coverage         ?   89.28%
==========================================
  Files            ?        1
  Lines            ?       84
  Branches         ?        0
==========================================
  Hits             ?       75
  Misses           ?        9
  Partials         ?        0
```
ShigureNyako left a comment:
I have re-checked the latest revision:

- The two main issues from the previous round have been fixed: the legacy stream API removal has been reverted, and Event's lazy creation plus the `EventFlag` / `elapsedTime()` semantics are filled in.
- The HIP half-alignment issue I additionally raised has also been addressed: `Event`'s backend event / stream handling has been extended to `PADDLE_WITH_CUDA || PADDLE_WITH_HIP`, and the macro scope on the test side has been narrowed/filled in accordingly.

As far as this round of code review goes, I have no new blockers, so I'm switching to approve.

What remains are CI / coverage issues (including the failing coverage threshold); please continue handling those per the CI results. They are not blockers for this round of code review.
/skip-reason: the uncovered branches are all unreachable under normal logic; they were added as hardening to stay aligned with PyTorch's logic
PR Category
Execute Infrastructure
PR Types
Improvements
Description
Split from #78484.

Align Event-related APIs: implement `c10::Event` as a complete cross-platform compatibility class.

Change details
Background

Historically, `c10::Event` had the following issues:
- It was wrapped in `#ifdef PADDLE_WITH_CUDA` and therefore unavailable in non-CUDA builds.
- It lacked the `EventFlag` enum and the corresponding constructors.

Changes (`c10/core/Event.h`)

Refactored `c10::Event` into a complete cross-platform compatibility class:
- Cross-platform support: `c10::Event` is moved out of the `#ifdef PADDLE_WITH_CUDA` conditional compilation; backends without event support throw a `Backend doesn't support events` exception.

Alignment results
Covered by `EventCompatTest.EventDefault` / `EventWithFlag` / `EventRecordThrows` / `EventRecordOnceThrows` / `EventMove` / `EventDevice`.

Related documentation
Does this change numerical precision?

No