[Cpp API Compatibility] Delete useless code and rename test files #78580

Open

youge325 wants to merge 11 commits into PaddlePaddle:develop from youge325:cNorm

Conversation

Contributor

@youge325 youge325 commented Apr 3, 2026

PR Category

Execute Infrastructure

PR Types

Improvements

Description

Unify test file naming under ATen or c10 to make regression testing easier

Handle unused variables to suppress warnings

Delete useless preprocessing files

Whether this causes precision changes

Copilot AI review requested due to automatic review settings April 3, 2026 10:19

paddle-bot bot commented Apr 3, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot bot added the contributor (External developers) label Apr 3, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR cleans up the C++ compat test suite by standardizing test file/target naming to ATen/c10 conventions, removing unused-code paths used to silence warnings, and deleting a shared CUDA-runtime skip utility header.

Changes:

  • Renamed several compat test targets/files (e.g., compat_*/torch_library_test → ATen_*/c10_*) and updated test/cpp/compat/CMakeLists.txt accordingly.
  • Deleted test/cpp/compat/cuda_test_utils.h and removed its include + skip macro usages from CUDA/HIP-related tests.
  • Added/adjusted tests and small warning-suppression tweaks (e.g., (void)state, (void)threw_exception, new assertions).

Reviewed changes

Copilot reviewed 22 out of 26 changed files in this pull request and generated 10 comments.

Show a summary per file
File: Description
test/cpp/compat/cuda_test_utils.h: Removed shared CUDA runtime availability helper + skip macro.
test/cpp/compat/CMakeLists.txt: Renamed several test targets/source filenames to ATen/c10 naming.
test/cpp/compat/c10_Stream_test.cc: Removed CUDA runtime skip macro usage from CUDA stream tests.
test/cpp/compat/c10_Event_test.cc: Removed CUDA runtime skip macro usage from CUDA event tests.
test/cpp/compat/c10_cuda_generator_test.cc: Silenced an unused variable warning.
test/cpp/compat/ATen_Utils_test.cc: Removed CUDA runtime skip macro usage from CUDA tensor backend tests.
test/cpp/compat/ATen_toString_test.cc: Added renamed toString() API tests (CPU + optional CUDA/HIP section).
test/cpp/compat/ATen_torch_library_test.cc: Added renamed torch library/registry API tests.
test/cpp/compat/ATen_to_test.cc: Removed CUDA runtime skip macro usage from CUDA to(...) tests.
test/cpp/compat/ATen_split_test.cc: Removed CUDA runtime skip macro usage from CUDA split tests.
test/cpp/compat/ATen_select_test.cc: Removed CUDA runtime skip macro usage from CUDA select/index_select/masked_select tests.
test/cpp/compat/ATen_record_stream_test.cc: Removed CUDA runtime skip macro usage from record_stream tests.
test/cpp/compat/ATen_pin_memory_creation_test.cc: Removed CUDA runtime skip macro usage from pinned-memory creation tests.
test/cpp/compat/ATen_memory_test.cc: Removed CUDA runtime skip macro usage from CUDA reciprocal/detach tests.
test/cpp/compat/ATen_local_scalar_dense_test.cc: Removed CUDA runtime skip macro usage from CUDA local-scalar tests.
test/cpp/compat/ATen_from_blob_test.cc: Removed CUDA runtime skip macro usage from GPU-pointer from_blob tests.
test/cpp/compat/ATen_eye_test.cc: Removed CUDA runtime skip macro usage from CUDA eye tests.
test/cpp/compat/ATen_equal_test.cc: Removed CUDA runtime skip macro usage from CUDA equality tests.
test/cpp/compat/ATen_empty_test.cc: Removed CUDA runtime skip macro usage from pinned-memory empty tests.
test/cpp/compat/ATen_dense_sparse_conversion_test.cc: Added renamed dense/sparse conversion tests for _PD_ConvertToSparseIfNeeded.
test/cpp/compat/ATen_CUDAContext_test.cc: Removed CUDA runtime skip macro usage from CUDA context light tests.
test/cpp/compat/ATen_CUDABlas_test.cc: Removed CUDA runtime skip macro usage from CUDABlas tests.
test/cpp/compat/ATen_cuda_test.cc: Removed CUDA runtime skip macro usage from Tensor::cuda() tests; adjusted preprocessor structure.
test/cpp/compat/ATen_clamp_test.cc: Silenced an unused variable warning in an edge-case test.
test/cpp/compat/ATen_basic_test.cc: Added renamed "basic" tests; includes additional CUDA/HIP coverage blocks.
test/cpp/compat/ATen_as_strided_test.cc: Added an assertion about data_ptr changes after as_strided_ with offset.
Comments suppressed due to low confidence (7)

test/cpp/compat/ATen_to_test.cc:199

  • This test assumes a CUDA device is available and calls at::tensor(... device=c10::kCUDA). If the binary is run on a machine without GPUs, it will throw and fail the run. Please add a runtime guard (e.g., if !torch::cuda::is_available() then GTEST_SKIP()/return) before the first CUDA tensor creation in each CUDA/HIP test.
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
TEST(TensorToTest, ToDtype_GPU_FloatToDouble) {
  at::Tensor t = at::tensor(
      {1.0f, 2.0f},
      at::TensorOptions().dtype(at::kFloat).device(c10::Device(c10::kCUDA, 0)));
  at::Tensor result = t.to(at::kDouble);

  ASSERT_EQ(result.scalar_type(), at::kDouble);
  ASSERT_EQ(result.device().type(), c10::DeviceType::CUDA);

test/cpp/compat/ATen_pin_memory_creation_test.cc:46

  • Pinned-memory tests are compiled under CUDA/HIP, but now run unconditionally. If the test binary is executed without an available CUDA runtime/device, pinned-memory allocations and/or CUDA device constructs can throw and fail the run. Consider guarding these tests with a runtime availability check (e.g., if !torch::cuda::is_available() then GTEST_SKIP()/return) before exercising pinned-memory behavior that depends on CUDA/HIP.
TEST(ATenPinMemoryCreationTest, FullPinMemory) {
  // Test using TensorOptions with pinned_memory
  auto by_options = at::full(
      {2, 3}, 1.5f, at::TensorOptions().dtype(at::kFloat).pinned_memory(true));
  AssertPinned(by_options);

test/cpp/compat/ATen_memory_test.cc:320

  • These CUDA/HIP tests now allocate CUDA tensors and invoke CUDA ops without checking runtime/device availability. On CUDA/HIP builds executed on machines without a GPU, at::empty(... device=at::kCUDA) / at::arange(... device=at::kCUDA) can throw and fail the suite. Add a runtime guard (e.g., if !torch::cuda::is_available() then GTEST_SKIP()/return) before the first CUDA tensor creation in each CUDA/HIP test.
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
// Test reciprocal on CUDA
TEST(ReciprocalTest, ReciprocalCUDA) {
  auto tensor =
      at::empty({4}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA));
  auto cpu_tensor = at::empty({4}, at::TensorOptions().dtype(at::kFloat));
  cpu_tensor.data_ptr<float>()[0] = 1.0f;

test/cpp/compat/ATen_from_blob_test.cc:125

  • GPU-pointer tests now run without any runtime/device availability guard. On CUDA/HIP builds executed without GPUs (or with an unusable runtime), cudaMalloc/hipMalloc and subsequent from_blob calls will fail and break the suite. Add a runtime guard (e.g., if c10::cuda::device_count() <= 0 or !torch::cuda::is_available() then GTEST_SKIP()/return) before the first GPU allocation in these tests.
// No device specified: GPU pointer → tensor must be on CUDA automatically.
TEST(ATenFromBlobTest, GpuPtrDefaultsToCuda) {
  float* d_data = nullptr;
#if defined(PADDLE_WITH_CUDA)
  cudaMalloc(&d_data, 4 * sizeof(float));
#else
  hipMalloc(&d_data, 4 * sizeof(float));
#endif

test/cpp/compat/ATen_equal_test.cc:46

  • This CUDA/HIP test assumes a CUDA device exists and constructs a CUDA tensor. If the test binary is run without an available GPU, tensor creation will throw and fail the suite. Add a runtime guard (e.g., if !torch::cuda::is_available() then GTEST_SKIP()/return) before creating the CUDA tensor.
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
TEST(TensorEqualTest, DeviceMismatchThrows) {
  at::Tensor cpu = at::ones({2, 2}, at::kFloat);
  at::Tensor gpu =
      at::ones({2, 2}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA));

test/cpp/compat/ATen_CUDABlas_test.cc:82

  • This CUDA test file performs cudaMalloc/cudaMemcpy/cudaDeviceSynchronize without any runtime/device availability guard. If the binary is executed on a CUDA build without GPUs (device_count==0) or with an unusable runtime, these calls will fail and break the test run. Consider adding a check at the start of each test (e.g., if !at::cuda::is_available() then GTEST_SKIP()/return) before running device allocations.
  void Run() {
    std::vector<T> h_a = {T(1), T(3), T(2), T(4)};
    std::vector<T> h_b = {T(5), T(7), T(6), T(8)};
    std::vector<T> h_c(N * N, T(0));

    MathT alpha = static_cast<MathT>(1);
    MathT beta = static_cast<MathT>(0);

    runOnDevice(h_a, h_b, &h_c, [&](T* d_a, T* d_b, T* d_c) {

test/cpp/compat/ATen_cuda_test.cc:40

  • The Tensor::cuda() tests now run without checking CUDA availability. In CUDA/HIP builds executed on machines without any GPU devices, cpu_t.cuda() is expected to throw and will fail the suite. Please add a runtime guard (e.g., if !torch::cuda::is_available() then GTEST_SKIP()/return) before calling cuda() in these tests.
// After cuda(), the tensor should reside on a GPU device.
TEST(TensorCudaTest, CpuTensorMovesToCuda) {
  at::Tensor cpu_t = at::tensor({1.0f, 2.0f, 3.0f}, at::kFloat);
  ASSERT_TRUE(cpu_t.is_cpu());

  at::Tensor cuda_t = cpu_t.cuda();
  ASSERT_TRUE(cuda_t.is_cuda());
  ASSERT_FALSE(cuda_t.is_cpu());
}


Contributor

@ShigureNyako ShigureNyako left a comment


As I understand it, this PR is mainly doing two things:

  1. Unify the naming of some tests under test/cpp/compat (ATen / c10);
  2. Along the way, clean up unused variables and delete cuda_test_utils.h.

I'm not approving yet; there are two issues that need to be handled first:

  1. The test target renames introduce engineering risk.
    The current CMakeLists.txt changes not only file names but also several test target names. The Coverage job already treats these targets as "deleted unit tests" and fails; the log explicitly lists torch_library_test, compat_basic_test, compat_toString_test, cuda_generator_test, and compat_dense_sparse_conversion_test. If the goal is just to unify source file names, keep the original target names; if the targets really need to be renamed, the corresponding deleted-test check / registration chain must be handled together.

  2. With the runtime guards removed, CUDA/HIP tests will hard-fail in environments with no usable device.
    These changes also delete many SKIP_IF_CUDA_RUNTIME_UNAVAILABLE() calls, but #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP) only guarantees that GPU support is enabled at compile time, not that a usable GPU / runtime exists at execution time. Calls like getCurrentCUDAStream(), getNumGPUs(), and CUDA tensor construction will fail outright on machines with no GPU or an unusable driver. The safer approach is to keep the original guards, or to switch uniformly to at::cuda::is_available() + GTEST_SKIP().

Some additional notes after checking CI:

  • Mergeability currently shows only review/approval as missing; required checks are not blocking the merge.
  • Of the two non-passing items, the Coverage failure is directly related to this change.
  • Linux-IXUCA / Build and Test looks like an external patch-apply failure largely unrelated to the files this PR touches, so I'm not using it as a basis for blocking.

Please address the two issues above first, and I'll continue with the next round.

Comment on lines +44 to +50
nv_test(c10_cuda_generator_test SRCS c10_cuda_generator_test.cc)
nv_test(c10_generator_impl_test SRCS c10_generator_impl_test.cc)
nv_test(c10_intrusive_ptr_lifecycle_test
        SRCS c10_intrusive_ptr_lifecycle_test.cc)
nv_test(c10_DispatchKey_test SRCS c10_DispatchKey_test.cc)
nv_test(c10_DispatchKeySet_test SRCS c10_DispatchKeySet_test.cc)
cc_test(torch_library_test SRCS torch_library_test.cc)
cc_test(ATen_torch_library_test SRCS ATen_torch_library_test.cc)
Contributor


This changes not only the file names but also the test target names. The Coverage job already treats these targets as "deleted unit tests" and errors out directly: torch_library_test, compat_basic_test, compat_toString_test, cuda_generator_test, and compat_dense_sparse_conversion_test are all listed in the failure log.

If the main goal of this PR is to unify source file naming, keep the original target names and only change the file names that SRCS points to; if the targets really must be renamed as well, the deleted-test check / registration logic needs to be handled in sync, otherwise this CI check will stay red.

Member


This is fine; it just needs a review.

Contributor


This is fine; it just needs a review.

Got it. I just took another look at the Coverage log, and this does look like an approval-gated deleted-test check rather than a necessary problem with the implementation itself: the target renames are currently detected as deleting torch_library_test / compat_basic_test / compat_toString_test / cuda_generator_test / compat_dense_sparse_conversion_test, and since there is no corresponding review yet, it reports red.

If the maintainers confirm that renaming the targets is acceptable, I won't treat this item as an implementation-level blocker on its own; what I care about more now is the behavioral regression below from deleting the runtime guards.

Comment on lines 30 to 44
TEST(CUDAFunctionsTest, DeviceSynchronize) {
  SKIP_IF_CUDA_RUNTIME_UNAVAILABLE();
  // Exercises the PADDLE_ENFORCE_GPU_SUCCESS(cudaDeviceSynchronize()) branch
  ASSERT_NO_THROW(c10::cuda::device_synchronize());
}

TEST(CUDAFunctionsTest, StreamSynchronize) {
  SKIP_IF_CUDA_RUNTIME_UNAVAILABLE();
  // Exercises phi::backends::gpu::GpuStreamSync()
  auto stream = c10::cuda::getCurrentCUDAStream();
  ASSERT_NO_THROW(c10::cuda::stream_synchronize(stream));
}

TEST(CUDAFunctionsTest, AtNamespaceAliases) {
  SKIP_IF_CUDA_RUNTIME_UNAVAILABLE();
  // Exercises the using aliases in at::cuda namespace
  ASSERT_NO_THROW(at::cuda::device_synchronize());
  auto stream = c10::cuda::getCurrentCUDAStream();
Contributor


With the runtime guard deleted here, #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP) only guarantees "GPU was enabled at compile time"; it does not guarantee "the execution environment actually has a usable device / runtime".

Calls like getCurrentCUDAStream(), getNumGPUs(), and getCurrentDeviceProperties() can all throw outright in environments with no GPU or an unusable driver, whereas the old cuda_test_utils.h actually checked both device_count() and whether cudaFree/hipFree was usable.

I suggest keeping this runtime check, or switching uniformly to if (!at::cuda::is_available()) { GTEST_SKIP() << ...; }. Otherwise this batch of compat tests goes from "skippable" to "hard failure".

Member


@youge325 By the way, why was this added in the first place, and why delete it now? Is it because the tests only run on CUDA? If possible, I'd still prefer that these unit tests run on CPU wherever they can; the WITH_GPU in CMakeLists was originally just an early shortcut to get things passing quickly.

Contributor Author


It was added automatically back when I used an Agent to add tests to raise the coverage rate. My understanding is that if the code compiles successfully, tests shouldn't just be skipped arbitrarily, so I've now deleted it.

Some unit tests can indeed run on CPU; conditionally compiling the test cases is enough, e.g. under #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP) use ASSERT_NO_THROW, and under #else use ASSERT_THROW. I'll look at which test cases can be changed this way.

Contributor Author


@ShigureNyako Please take a look at test/cpp/compat/CMakeLists.txt and see whether the if(WITH_GPU) can be removed.

Member


Auth just expired; @ShigureNyako please take a look!

Contributor Author


@ShigureNyako Please give me an example of a GPU build running with no usable device/runtime; this situation does not occur in CI. A GPU build exists precisely to run in scenarios with a usable device/runtime. My original code was simply wrong, which is why I'm deleting it now; you may have cause and effect reversed.

Contributor


Please give me an example of a GPU build running with no usable device/runtime; this situation does not occur in CI. A GPU build exists precisely to run in scenarios with a usable device/runtime. My original code was simply wrong, which is why I'm deleting it now; you may have cause and effect reversed.

That's a fair correction, so let me narrow my wording: based on this PR's current CI evidence, I do not have an actually observed "GPU build but no device/runtime at execution time" failure sample. The red lights visible right now are the Coverage deleted-test check and the external patch-apply failure on Linux-IXUCA.

My more precise concern is now on a different level: we should not conclude that the whole if(WITH_GPU) block can be removed just because some CPU-only compiling branches were patched in.

The reason is that this batch of tests still mixes in quite a few clearly GPU-only cases, for example:

  • ATen_CUDABlas_test.cc directly depends on cudaMalloc/cudaMemcpy/cudaDeviceSynchronize;
  • ATen_cuda_test.cc directly tests Tensor::cuda();
  • ATen_record_stream_test.cc still creates a cuda_tensor in its fixture SetUp().

So I agree with your point that "current CI has not demonstrated that the runtime-unavailable scenario reaches these tests". But from a code-partitioning standpoint, I'd still limit the conclusion to:

  • The direction of a0be3dee, adding CPU compiling branches to files like ATen_CUDAContext_test.cc / c10_Stream_test.cc / c10_Event_test.cc, is correct;
  • But if more compat tests are to be genuinely moved down to the CPU path later, the CPU-capable cases and GPU-only cases need to be split apart first, rather than deleting if(WITH_GPU) wholesale right now.

I'll keep reviewing under this narrowed judgment, and will no longer use "current CI has already proven a runtime-availability problem" as an argument in itself.

Member


@ShigureNyako Strictly speaking, there are indeed scenarios where WITH_GPU is set at compile time but no GPU exists at runtime. For example, all the machines in our CI build packages on CPU, including the Coverage build; even though the Coverage build itself runs on CPU, its artifacts always run later in a Coverage test stage that does have GPUs at runtime. In other words, CI currently operates under the assumption that if the build enables GPU, the runtime will have a GPU, so this shouldn't be the main consideration here.

The first C++ compatibility PR, #74402, was merged right before the 3.2 release branch cut and temporarily skipped some things; defd6a3 skipped some runtime issues on new hardware (judging from the errors in the previous commit 99f7b88 and that commit message, likely XPU and NPU, though the logs have expired and can no longer be viewed). Please focus on that scenario, analyze the concrete problems that may exist, and whether there is a recommended solution.

Note that even if we can't generalize to all hardware (e.g. XPU and NPU), skipping some hardware is still better than giving up CPU testing outright. Our goal is not to fix the problems on all hardware at once (though that would be even better); we just want to make sure there is monitoring on CPU.

Contributor


Please focus on that scenario, analyze the concrete problems that may exist, and whether there is a recommended solution

Following the leads you gave, I went back through the relevant commits in PR #74402. What I can confirm directly from the code is roughly this:

  1. 99f7b88d321d68d7fe8597c4d5bd31f7b424272b itself only adds a forward declaration for generic_to in ivalue.h; it is mainly a warning / compile-diagnostic fix and has little to do with the XPU/NPU scenario itself.
  2. The commit that directly points at the "new hardware build problem" is defd6a3cc60ca6bc3a2df5f58f2043ee0616a48c (fix build on xpu&npu), which did at least three critical things:
    • In test/cpp/compat/CMakeLists.txt, it moved compat_basic_test and torch_library_test from the outer scope into if(WITH_GPU);
    • In paddle/utils/pybind.h, it wrapped <ATen/core/TensorBody.h> in #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP);
    • The same commit also deleted the old torch_like_api/ATen/core/TensorBase.h.

So converging on that historical context, I now lean toward this reading: what actually had to be worked around back then was not simply "whether these tests can skip at runtime", but rather "CPU-capable test files had CUDA-only headers / link dependencies mixed in, so the XPU/NPU build could only retreat wholesale into WITH_GPU".

In other words, the more concrete problem in this scenario is:

  • Some cases are CPU-testable in themselves;
  • But they are mixed into the same target / same file as cases that genuinely depend on the CUDA runtime / CUDA headers / CUDA libs;
  • As a result, to work around problems on a few heterogeneous backends, whole files had to be locked back into WITH_GPU, and CPU monitoring was lost along with them.

Working toward that goal, the solution I'd recommend is layered handling rather than trying to fix all backends in one step:

  1. First split out the CPU-capable cases to restore CPU monitoring

    • The most obvious are the CPU sections of torch_library_test and compat_basic_test, and the CPU section of ATen_toString_test;
    • These are better off as standalone cc_test / CPU targets, rather than staying bound to GPU-only cases.
  2. Keep "compiles on CPU, but GPU semantics differ" cases as conditional branches

    • For example, the new ASSERT_THROW / ASSERT_FALSE CPU-only branch in ATen_CUDAContext_test.cc in this PR is the right direction;
    • It addresses "the contract of the same API under a CPU-only build".
  3. Cases that genuinely depend on the CUDA runtime stay in if(WITH_GPU) / nv_test

    • E.g. ATen_CUDABlas_test.cc, ATen_cuda_test.cc, ATen_record_stream_test.cc, and the GPU-pointer parts of ATen_from_blob_test.cc;
    • These are not suitable for moving wholesale to the CPU path just to expand CPU coverage.
  4. If individual APIs still fail on XPU / NPU, prefer backend-specific workarounds over abandoning CPU testing wholesale

    • That is, narrow "skip certain hardware" down to specific cases / specific backends, to avoid shutting off a whole batch of CPU-capable monitoring again.

From this angle, I'd further narrow my review stance to: support splitting CPU-capable cases into the CPU pipeline, but do not support removing if(WITH_GPU) wholesale at this step, and do not support treating targets that still mix in GPU-only dependencies as CPU-runnable by default.

That stays closer to the concrete problem defd6a3 was working around back then, and better fits the goal of "first restore what can be monitored on CPU, then gradually converge the differences on XPU/NPU and other heterogeneous hardware".

Contributor


I looked at the latest commit d6e9be4 (fix xpu). I can see you're continuing to converge the CMakeLists for the XPU build, but I'd still like to narrow the conclusion further: I don't recommend pulling this batch of cases back into nv_test wholesale.

Among the files moved back from the CPU path to the GPU path this time, there are at least a few different situations:

  1. Cases that are clearly still CPU-capable

    • ATen_all_test.cc
    • ATen_as_strided_test.cc
    • ATen_index_test.cc

    I re-read these files; the bodies are essentially pure CPU tensor / indexing / view semantics, with no actual use of the CUDA runtime. This looks more like a file-level dependency or target-partitioning problem, not test cases that can only live on the GPU path.

  2. Bodies that are essentially CPU cases, but with unrelated CUDA includes in the file headers

    • ATen_transpose_test.cc
    • ATen_viewAs_test.cc

    Neither of these currently uses any real c10::cuda::* / at::cuda::* logic; it seems more appropriate to first delete the unrelated CUDA headers and keep them monitored on the CPU path.

  3. CPU / GPU cases mixed in the same file, which should be split further

    • ATen_basic_test.cc
    • ATen_from_blob_test.cc

    I can understand why these would break on XPU, but if the source of the problem is GPU sections mixed into the same file, I'd still lean toward splitting the CPU-only parts from the GPU-only parts, rather than rolling the whole files back into nv_test.

So from a review standpoint my judgment remains the same: if the goal is "first preserve what can be monitored on CPU", the priority should be cleaning up unrelated CUDA includes / splitting mixed files, not pulling a batch of compat tests that could be monitored on CPU back into the GPU-only path.

In other words, I understand the fix xpu direction, but I can't yet treat it as the final solution; the more ideal convergence is still to keep CPU-capable cases in cc_test and leave only the parts with genuine CUDA dependencies in nv_test.

Contributor Author

youge325 commented Apr 5, 2026

/re-run all-failed

Contributor Author

youge325 commented Apr 5, 2026

@ShigureNyako This doesn't look right; why would the Mac-CPU build produce this error?

[2487/3112] Building CXX object test/CMakeFiles/ATen_basic_test.dir/cpp/compat/ATen_basic_test.cc.o
FAILED: test/CMakeFiles/ATen_basic_test.dir/cpp/compat/ATen_basic_test.cc.o 
/opt/homebrew/bin/ccache /Library/Developer/CommandLineTools/usr/bin/c++ -DGLOG_NO_ABBREVIATED_SEVERITIES -DHPPL_STUB_FUNC -DLAPACK_FOUND -DPADDLE_DISABLE_PROFILER -DPADDLE_SLEEF_POW_PRECISION=10 -DPADDLE_USE_ACCELERATE -DPADDLE_VERSION=0.0.0 -DPADDLE_VERSION_INTEGER=0 -DPADDLE_WITH_ARM -DPADDLE_WITH_CRYPTO -DPADDLE_WITH_POCKETFFT -DPADDLE_WITH_SLEEF -DPADDLE_WITH_TESTING -DPHI_INNER -DPHI_SHARED -DYAML_CPP_STATIC_DEFINE -I. -I../paddle/fluid/framework/io -Ithird_party/install/zlib/include -Ithird_party/install -Ithird_party/install/gflags/include -Ithird_party/install/glog/include -I../third_party/eigen3 -I../third_party/threadpool -I../third_party/dlpack/include -Ithird_party/install/xxhash/include -Ithird_party/install/warpctc/include -Ithird_party/install/warprnnt/include -Ithird_party/install/utf8proc/include -Ithird_party/install/protobuf/include -I../third_party/nlohmann_json/include -Ithird_party/install/yaml-cpp/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -I/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/_core/include -Ithird_party/pybind/src/extern_pybind/include -Ithird_party/install/gtest/include -Ithird_party/install/libuv/include -Ithird_party/install/cryptopp/include -Ithird_party/pocketfft/src -Ithird_party/install/sleef/include -I../ -I../paddle/phi/api/include/compat -I../paddle/phi/api/include/compat/torch/csrc/api/include -DCRYPTOPP_ARM_CRC32_AVAILABLE=0 -std=c++17 -Wno-deprecated-register -Werror=format -Werror=braced-scalar-init -Werror=uninitialized -Werror=tautological-constant-out-of-range-compare -Werror=literal-conversion -Werror=pragma-pack -Werror=c++17-extensions  -fPIC -O3 -DNDEBUG -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk -mmacosx-version-min=15.1 -MD -MT test/CMakeFiles/ATen_basic_test.dir/cpp/compat/ATen_basic_test.cc.o -MF test/CMakeFiles/ATen_basic_test.dir/cpp/compat/ATen_basic_test.cc.o.d -o 
test/CMakeFiles/ATen_basic_test.dir/cpp/compat/ATen_basic_test.cc.o -c ../test/cpp/compat/ATen_basic_test.cc
In file included from ../test/cpp/compat/ATen_basic_test.cc:15:
In file included from ../paddle/phi/api/include/compat/ATen/Functions.h:17:
In file included from ../paddle/phi/api/include/compat/ATen/ops/_local_scalar_dense.h:21:
In file included from ../paddle/phi/api/include/compat/ATen/core/Tensor.h:17:
In file included from ../paddle/phi/api/include/compat/ATen/core/TensorBody.h:18:
In file included from ../paddle/phi/api/include/compat/ATen/core/TensorBase.h:18:
In file included from ../paddle/phi/api/include/compat/c10/core/Device.h:37:
In file included from ../paddle/phi/core/platform/device_event_base.h:19:
In file included from ../paddle/phi/core/platform/device_context.h:25:
In file included from ../paddle/phi/core/memory/malloc.h:22:
../paddle/phi/core/memory/allocation/allocator.h:293:20: warning: 'AllocateImpl' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
  phi::Allocation* AllocateImpl(size_t UNUSED) { return nullptr; }
                   ^
../paddle/phi/core/memory/allocation/allocator.h:210:28: note: overridden virtual function is here
  virtual phi::Allocation* AllocateImpl(size_t size) = 0;
                           ^
../test/cpp/compat/ATen_basic_test.cc:222:42: error: a type named 'TensorOptions' is hidden by a declaration in a different namespace
        torch::from_blob(pp, {3}, torch::TensorOptions().dtype(torch::kInt64));
                                  ~~~~~~~^
../paddle/phi/api/include/compat/c10/core/TensorOptions.h:55:19: note: type declaration hidden
struct PADDLE_API TensorOptions {
                  ^
../paddle/phi/api/include/compat/ATen/core/TensorBody.h:61:7: note: declaration hides type
using TensorOptions = c10::TensorOptions;
      ^
1 warning and 1 error generated.

Member

SigureMo commented Apr 5, 2026

@ShigureNyako This doesn't look right; why would the Mac-CPU build produce this error?

The automatic handling of these messages still has quite a few problems; this one got no response either. It's still being improved.

PaddlePaddle/Paddle/actions/runs/24000765303/job/69998247025: XPU indeed doesn't support some of the operators

Could we look at whether these can be skipped on XPU first?

Contributor

@youge325 You're right to flag this. After reading the Mac-CPU log you posted, let me narrow my judgment again.

For ATen_basic_test.cc, this is no longer an abstract "XPU may have unsupported operators" issue; it is a confirmed compile-time blocker in a CPU pipeline:

torch::from_blob(pp, {3}, torch::TensorOptions().dtype(torch::kInt64));

In the Mac-CPU log you posted, this line triggers a name lookup/hiding problem on TensorOptions:

  • compat/c10/core/TensorOptions.h declares c10::TensorOptions;
  • ATen/core/TensorBody.h also declares using TensorOptions = c10::TensorOptions;
  • On this compile path, clang on macOS reports torch::TensorOptions() as "a type named 'TensorOptions' is hidden by a declaration in a different namespace".

So with its current contents, ATen_basic_test really cannot stay on the CPU path directly. The Mac-CPU evidence you cited holds; my previous comment lumping it in with the purely CPU-capable files was not rigorous, and I'm correcting that here.

That said, I'd still like to bound the point I'm making:

  1. For ATen_basic_test alone, I can accept either pulling it back to GPU-only or splitting it further

    • Moving it off the CPU path in the short term to keep Mac-CPU from going red is reasonable triage;
    • In the long term, if we still want to restore CPU monitoring, the better approach is to split the CPU-only sections from the GPU/compat-related sections, or to fix the TensorOptions / include problem here on its own.
  2. But this Mac-CPU evidence does not automatically imply that the other files must all be pulled back into nv_test as well

    • For example, ATen_all_test.cc, ATen_as_strided_test.cc, and ATen_index_test.cc, which I called out individually earlier, still read as essentially pure CPU semantics from their file bodies;
    • ATen_transpose_test.cc / ATen_viewAs_test.cc also look more like unrelated CUDA includes in the file headers that need cleaning up.

So my more accurate position now is:

  • For the single file ATen_basic_test, moving it off the CPU path this time based on the Mac-CPU/XPU problems is backed by concrete evidence, and I accept it;
  • I do not recommend extrapolating this reason to the other compat tests that could otherwise retain their CPU monitoring.

I'll continue watching the subsequent CI / splitting work under this more precise framing.

Contributor Author

youge325 commented Apr 5, 2026

@ShigureNyako Let me paste the full log for you as proof that it really is XPU not supporting some operators; after all, CPU has no problems.

  =========================================
  This is the fourth time to re-run
  =========================================
  The following unittest will be re-run:
  ATen_all_test 
  ATen_as_strided_test 
  ATen_from_blob_test 
  ATen_index_test 
  ATen_transpose_test 
  ATen_viewAs_test 
  ATen_basic_test 
  test_compat_slogdet 
  =========================================
  Test project /paddle/build
      Start  207: ATen_all_test
  1/8 Test  #207: ATen_all_test ....................***Failed    0.41 sec
  XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
  [==========] Running 48 tests from 3 test cases.
  [----------] Global test environment set-up.
  [----------] 17 tests from TestAll
  [ RUN      ] TestAll.AllNoDim
  /paddle/test/cpp/compat/ATen_all_test.cc:40: Failure
  Expected equality of these values:
    result.item<bool>()
      Which is: true
    false
  [  FAILED  ] TestAll.AllNoDim (0 ms)
  [ RUN      ] TestAll.AllWithDim
  /paddle/test/cpp/compat/ATen_all_test.cc:56: Failure
  Expected equality of these values:
    result_dim0.data_ptr<bool>()[0]
      Which is: true
    false
  [  FAILED  ] TestAll.AllWithDim (0 ms)
  [ RUN      ] TestAll.AllWithDimKeepdim
  [       OK ] TestAll.AllWithDimKeepdim (0 ms)
  [ RUN      ] TestAll.AllWithOptionalDim
  [       OK ] TestAll.AllWithOptionalDim (0 ms)
  [ RUN      ] TestAll.AllNoDimAllFalse
  [       OK ] TestAll.AllNoDimAllFalse (0 ms)
  [ RUN      ] TestAll.AllNoDimSingleElement
  [       OK ] TestAll.AllNoDimSingleElement (0 ms)
  [ RUN      ] TestAll.AllWithNegativeDim
  /paddle/test/cpp/compat/ATen_all_test.cc:106: Failure
  Expected equality of these values:
    result.data_ptr<bool>()[0]
      Which is: true
    false
  [  FAILED  ] TestAll.AllWithNegativeDim (0 ms)
  [ RUN      ] TestAll.AllWithDimKeepdimTrue
  /paddle/test/cpp/compat/ATen_all_test.cc:117: Failure
  Expected equality of these values:
    result_dim0.data_ptr<bool>()[0]
      Which is: true
    false
  [  FAILED  ] TestAll.AllWithDimKeepdimTrue (1 ms)
  [ RUN      ] TestAll.AllWithOptionalDimNullopt
  [       OK ] TestAll.AllWithOptionalDimNullopt (0 ms)
  [ RUN      ] TestAll.AllWithOptionalDimNulloptHasFalse
  /paddle/test/cpp/compat/ATen_all_test.cc:143: Failure
  Expected equality of these values:
    result.item<bool>()
      Which is: true
    false
  [  FAILED  ] TestAll.AllWithOptionalDimNulloptHasFalse (0 ms)
  [ RUN      ] TestAll.AllWithOptionalDimKeepdim
  [       OK ] TestAll.AllWithOptionalDimKeepdim (0 ms)
  [ RUN      ] TestAll.AllWithOptionalMultipleDims
  [       OK ] TestAll.AllWithOptionalMultipleDims (0 ms)
  [ RUN      ] TestAll.MemberAllWithOptionalNullopt
  [       OK ] TestAll.MemberAllWithOptionalNullopt (0 ms)
  [ RUN      ] TestAll.MemberAllWithOptionalNulloptKeepdim
  [       OK ] TestAll.MemberAllWithOptionalNulloptKeepdim (0 ms)
  [ RUN      ] TestAll.StandaloneFunction
  /paddle/test/cpp/compat/ATen_all_test.cc:188: Failure
  Expected equality of these values:
    result.item<bool>()
      Which is: true
    false
  [  FAILED  ] TestAll.StandaloneFunction (0 ms)
  [ RUN      ] TestAll.StandaloneFunctionWithDim
  /paddle/test/cpp/compat/ATen_all_test.cc:198: Failure
  Expected equality of these values:
    result.data_ptr<bool>()[0]
      Which is: true
    false
  [  FAILED  ] TestAll.StandaloneFunctionWithDim (0 ms)
  [ RUN      ] TestAll.AllWith3DTensor
  /paddle/test/cpp/compat/ATen_all_test.cc:212: Failure
  Expected equality of these values:
    result_all.item<bool>()
      Which is: true
    false
  [  FAILED  ] TestAll.AllWith3DTensor (0 ms)
  [----------] 17 tests from TestAll (1 ms total)
  
  [----------] 24 tests from TestAllclose
  [ RUN      ] TestAllclose.AllcloseBasic
  [       OK ] TestAllclose.AllcloseBasic (0 ms)
  [ RUN      ] TestAllclose.AllcloseNotEqual
  /paddle/test/cpp/compat/ATen_all_test.cc:237: Failure
  Expected equality of these values:
    result
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseNotEqual (0 ms)
  [ RUN      ] TestAllclose.StandaloneFunction
  [       OK ] TestAllclose.StandaloneFunction (0 ms)
  [ RUN      ] TestAllclose.AllcloseWithCustomRtol
  /paddle/test/cpp/compat/ATen_all_test.cc:257: Failure
  Expected equality of these values:
    result_default
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseWithCustomRtol (0 ms)
  [ RUN      ] TestAllclose.AllcloseWithCustomAtol
  /paddle/test/cpp/compat/ATen_all_test.cc:272: Failure
  Expected equality of these values:
    result_default
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseWithCustomAtol (0 ms)
  [ RUN      ] TestAllclose.AllcloseMemberWithAllParams
  [       OK ] TestAllclose.AllcloseMemberWithAllParams (0 ms)
  [ RUN      ] TestAllclose.AllcloseMemberNotClose
  /paddle/test/cpp/compat/ATen_all_test.cc:295: Failure
  Expected equality of these values:
    result
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseMemberNotClose (0 ms)
  [ RUN      ] TestAllclose.AllcloseMemberWithCustomTolerance
  /paddle/test/cpp/compat/ATen_all_test.cc:305: Failure
  Expected equality of these values:
    tensor1.allclose(tensor2)
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseMemberWithCustomTolerance (0 ms)
  [ RUN      ] TestAllclose.AllcloseExactZeros
  [       OK ] TestAllclose.AllcloseExactZeros (0 ms)
  [ RUN      ] TestAllclose.AllcloseHighDim
  [       OK ] TestAllclose.AllcloseHighDim (0 ms)
  [ RUN      ] TestAllclose.AllcloseEqualNanDefaultFalse
  [       OK ] TestAllclose.AllcloseEqualNanDefaultFalse (0 ms)
  [ RUN      ] TestAllclose.AllcloseEqualNanTrue
  [       OK ] TestAllclose.AllcloseEqualNanTrue (0 ms)
  [ RUN      ] TestAllclose.AllcloseEqualNanTrueAllNan
  [       OK ] TestAllclose.AllcloseEqualNanTrueAllNan (0 ms)
  [ RUN      ] TestAllclose.AllcloseMemberEqualNanTrue
  [       OK ] TestAllclose.AllcloseMemberEqualNanTrue (0 ms)
  [ RUN      ] TestAllclose.AllcloseMixedNanAndValues
  [       OK ] TestAllclose.AllcloseMixedNanAndValues (0 ms)
  [ RUN      ] TestAllclose.AllcloseDouble
  /paddle/test/cpp/compat/ATen_all_test.cc:429: Failure
  Expected equality of these values:
    result_diff
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseDouble (0 ms)
  [ RUN      ] TestAllclose.AllcloseDoubleEqualNan
  [       OK ] TestAllclose.AllcloseDoubleEqualNan (0 ms)
  [ RUN      ] TestAllclose.AllcloseStandaloneWithExplicitParams
  [       OK ] TestAllclose.AllcloseStandaloneWithExplicitParams (0 ms)
  [ RUN      ] TestAllclose.AllcloseInfinityValues
  [       OK ] TestAllclose.AllcloseInfinityValues (0 ms)
  [ RUN      ] TestAllclose.AllcloseInt32
  /paddle/test/cpp/compat/ATen_all_test.cc:498: Failure
  Expected equality of these values:
    result_diff
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseInt32 (0 ms)
  [ RUN      ] TestAllclose.AllcloseInt64
  /paddle/test/cpp/compat/ATen_all_test.cc:518: Failure
  Expected equality of these values:
    result_diff
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseInt64 (0 ms)
  [ RUN      ] TestAllclose.AllcloseEmptyTensor
  [       OK ] TestAllclose.AllcloseEmptyTensor (0 ms)
  [ RUN      ] TestAllclose.AllcloseScalarTensor
  [       OK ] TestAllclose.AllcloseScalarTensor (0 ms)
  [ RUN      ] TestAllclose.AllcloseWithDifferentRtolAtolOrder
  /paddle/test/cpp/compat/ATen_all_test.cc:570: Failure
  Expected equality of these values:
    result2
      Which is: true
    false
  [  FAILED  ] TestAllclose.AllcloseWithDifferentRtolAtolOrder (0 ms)
  [----------] 24 tests from TestAllclose (0 ms total)
  
  [----------] 7 tests from TestAbsolute
  [ RUN      ] TestAbsolute.AbsoluteBasic
  [       OK ] TestAbsolute.AbsoluteBasic (0 ms)
  [ RUN      ] TestAbsolute.AbsoluteNegativeOnly
  [       OK ] TestAbsolute.AbsoluteNegativeOnly (0 ms)
  [ RUN      ] TestAbsolute.AbsoluteZero
  [       OK ] TestAbsolute.AbsoluteZero (0 ms)
  [ RUN      ] TestAbsolute.AbsoluteInPlace
  [       OK ] TestAbsolute.AbsoluteInPlace (0 ms)
  [ RUN      ] TestAbsolute.AbsoluteInPlaceNegative
  [       OK ] TestAbsolute.AbsoluteInPlaceNegative (0 ms)
  [ RUN      ] TestAbsolute.AbsoluteDouble
  [       OK ] TestAbsolute.AbsoluteDouble (0 ms)
  [ RUN      ] TestAbsolute.AbsoluteMatchesAbs
  [       OK ] TestAbsolute.AbsoluteMatchesAbs (0 ms)
  [----------] 7 tests from TestAbsolute (0 ms total)
  
  [----------] Global test environment tear-down
  [==========] 48 tests from 3 test cases ran. (1 ms total)
  [  PASSED  ] 31 tests.
  [  FAILED  ] 17 tests, listed below:
  [  FAILED  ] TestAll.AllNoDim
  [  FAILED  ] TestAll.AllWithDim
  [  FAILED  ] TestAll.AllWithNegativeDim
  [  FAILED  ] TestAll.AllWithDimKeepdimTrue
  [  FAILED  ] TestAll.AllWithOptionalDimNulloptHasFalse
  [  FAILED  ] TestAll.StandaloneFunction
  [  FAILED  ] TestAll.StandaloneFunctionWithDim
  [  FAILED  ] TestAll.AllWith3DTensor
  [  FAILED  ] TestAllclose.AllcloseNotEqual
  [  FAILED  ] TestAllclose.AllcloseWithCustomRtol
  [  FAILED  ] TestAllclose.AllcloseWithCustomAtol
  [  FAILED  ] TestAllclose.AllcloseMemberNotClose
  [  FAILED  ] TestAllclose.AllcloseMemberWithCustomTolerance
  [  FAILED  ] TestAllclose.AllcloseDouble
  [  FAILED  ] TestAllclose.AllcloseInt32
  [  FAILED  ] TestAllclose.AllcloseInt64
  [  FAILED  ] TestAllclose.AllcloseWithDifferentRtolAtolOrder
  
  17 FAILED TESTS
  
      Start  209: ATen_as_strided_test
  2/8 Test  #209: ATen_as_strided_test .............***Failed    0.40 sec
  XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
  [==========] Running 11 tests from 1 test case.
  [----------] Global test environment set-up.
  [----------] 11 tests from TensorAsStridedTest
  [ RUN      ] TensorAsStridedTest.AsStridedBasic
  [       OK ] TensorAsStridedTest.AsStridedBasic (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedWithOffset
  [       OK ] TensorAsStridedTest.AsStridedWithOffset (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedWithDifferentStrides
  [       OK ] TensorAsStridedTest.AsStridedWithDifferentStrides (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedInplace
  [       OK ] TensorAsStridedTest.AsStridedInplace (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedInplaceWithOffset
  [       OK ] TensorAsStridedTest.AsStridedInplaceWithOffset (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedInplaceModifiesView
  [       OK ] TensorAsStridedTest.AsStridedInplaceModifiesView (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedScatterBasic
  [       OK ] TensorAsStridedTest.AsStridedScatterBasic (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedScatterOriginalUnchanged
  [       OK ] TensorAsStridedTest.AsStridedScatterOriginalUnchanged (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedScatterWithOffset
  [       OK ] TensorAsStridedTest.AsStridedScatterWithOffset (0 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedTranspose
  unknown file: Failure
  C++ exception with description "
  
  --------------------------------------
  C++ Traceback (most recent call last):
  --------------------------------------
  0   float* phi::DenseTensor::data<float>()
  1   phi::DenseTensor::data()
  2   phi::DenseTensor::check_memory_size() const
  3   common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
  
  ----------------------
  Error Message Summary:
  ----------------------
  FatalError: FLAGS_use_stride_kernel is closed. Not contiguous Tensor found, something wrong has happened! (at /paddle/paddle/phi/core/tensor_meta.cc:221)
  " thrown in the test body.
  [  FAILED  ] TensorAsStridedTest.AsStridedTranspose (4 ms)
  [ RUN      ] TensorAsStridedTest.AsStridedContiguous
  unknown file: Failure
  C++ exception with description "
  
  --------------------------------------
  C++ Traceback (most recent call last):
  --------------------------------------
  0   common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
  
  ----------------------
  Error Message Summary:
  ----------------------
  FatalError: FLAGS_use_stride_kernel is closed. Not contiguous Tensor found, something wrong has happened! (at /paddle/paddle/phi/core/tensor_meta.cc:221)
  " thrown in the test body.
  [  FAILED  ] TensorAsStridedTest.AsStridedContiguous (1 ms)
  [----------] 11 tests from TensorAsStridedTest (5 ms total)
  
  [----------] Global test environment tear-down
  [==========] 11 tests from 1 test case ran. (5 ms total)
  [  PASSED  ] 9 tests.
  [  FAILED  ] 2 tests, listed below:
  [  FAILED  ] TensorAsStridedTest.AsStridedTranspose
  [  FAILED  ] TensorAsStridedTest.AsStridedContiguous
  
   2 FAILED TESTS
  
      Start  221: ATen_from_blob_test
  3/8 Test  #221: ATen_from_blob_test ..............***Failed    0.40 sec
  XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
  [==========] Running 7 tests from 1 test case.
  [----------] Global test environment set-up.
  [----------] 7 tests from ATenFromBlobTest
  [ RUN      ] ATenFromBlobTest.CpuPtrDefaultsToCpu
  [       OK ] ATenFromBlobTest.CpuPtrDefaultsToCpu (1 ms)
  [ RUN      ] ATenFromBlobTest.CpuPtrWithCpuOptions
  [       OK ] ATenFromBlobTest.CpuPtrWithCpuOptions (0 ms)
  [ RUN      ] ATenFromBlobTest.DataPtrPreserved
  [       OK ] ATenFromBlobTest.DataPtrPreserved (0 ms)
  [ RUN      ] ATenFromBlobTest.ShapeAndStrides
  [       OK ] ATenFromBlobTest.ShapeAndStrides (0 ms)
  [ RUN      ] ATenFromBlobTest.ExplicitStrides
  unknown file: Failure
  C++ exception with description "
  
  --------------------------------------
  C++ Traceback (most recent call last):
  --------------------------------------
  0   phi::DenseTensor::ResetHolder(std::shared_ptr<phi::Allocation> const&)
  1   common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
  
  ----------------------
  Error Message Summary:
  ----------------------
  FatalError: FLAGS_use_stride_kernel is closed. Not contiguous Tensor found, something wrong has happened! (at /paddle/paddle/phi/core/tensor_meta.cc:221)
  " thrown in the test body.
  [  FAILED  ] ATenFromBlobTest.ExplicitStrides (2 ms)
  [ RUN      ] ATenFromBlobTest.DeleterCalled
  [       OK ] ATenFromBlobTest.DeleterCalled (0 ms)
  [ RUN      ] ATenFromBlobTest.DeleterWithStrides
  [       OK ] ATenFromBlobTest.DeleterWithStrides (0 ms)
  [----------] 7 tests from ATenFromBlobTest (3 ms total)
  
  [----------] Global test environment tear-down
  [==========] 7 tests from 1 test case ran. (3 ms total)
  [  PASSED  ] 6 tests.
  [  FAILED  ] 1 test, listed below:
  [  FAILED  ] ATenFromBlobTest.ExplicitStrides
  
   1 FAILED TEST
  
      Start  223: ATen_index_test
  4/8 Test  #223: ATen_index_test ..................***Failed    0.49 sec
  XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
  [==========] Running 15 tests from 2 test cases.
  [----------] Global test environment set-up.
  [----------] 9 tests from TensorIndexTest
  [ RUN      ] TensorIndexTest.IndexWithSingleTensor
  [       OK ] TensorIndexTest.IndexWithSingleTensor (0 ms)
  [ RUN      ] TensorIndexTest.SliceKeepsStrideWithoutContiguousCopy
  /paddle/test/cpp/compat/ATen_index_test.cc:58: Failure
  Value of: transposed.is_contiguous()
    Actual: true
  Expected: false
  [  FAILED  ] TensorIndexTest.SliceKeepsStrideWithoutContiguousCopy (0 ms)
  [ RUN      ] TensorIndexTest.IndexWithEmptyInitializerListReturnsSelf
  [       OK ] TensorIndexTest.IndexWithEmptyInitializerListReturnsSelf (0 ms)
  [ RUN      ] TensorIndexTest.IndexWithTensorInitializerList
  [       OK ] TensorIndexTest.IndexWithTensorInitializerList (0 ms)
  [ RUN      ] TensorIndexTest.MemberIndexWithArrayRefTensorIndices
  /paddle/test/cpp/compat/ATen_index_test.cc:107: Failure
  Expected equality of these values:
    sliced.strides()
      Which is: { 3, 1 }
    c10::IntArrayRef({1, 6})
      Which is: { 1, 6 }
  [  FAILED  ] TensorIndexTest.MemberIndexWithArrayRefTensorIndices (0 ms)
  [ RUN      ] TensorIndexTest.MixedSliceAndTensorIndicesThrows
  [       OK ] TensorIndexTest.MixedSliceAndTensorIndicesThrows (0 ms)
  [ RUN      ] TensorIndexTest.IndexWithEmptyList
  [       OK ] TensorIndexTest.IndexWithEmptyList (0 ms)
  [ RUN      ] TensorIndexTest.IndexWithMultipleIndices
  [       OK ] TensorIndexTest.IndexWithMultipleIndices (0 ms)
  [ RUN      ] TensorIndexTest.IndexWithOptionalNone
  [       OK ] TensorIndexTest.IndexWithOptionalNone (0 ms)
  [----------] 9 tests from TensorIndexTest (0 ms total)
  
  [----------] 6 tests from TensorIndexPutTest
  [ RUN      ] TensorIndexPutTest.IndexPutInplaceWithTensor
  [       OK ] TensorIndexPutTest.IndexPutInplaceWithTensor (70 ms)
  [ RUN      ] TensorIndexPutTest.IndexPutInplaceWithScalar
  [       OK ] TensorIndexPutTest.IndexPutInplaceWithScalar (0 ms)
  [ RUN      ] TensorIndexPutTest.IndexPutNonInplace
  [       OK ] TensorIndexPutTest.IndexPutNonInplace (1 ms)
  [ RUN      ] TensorIndexPutTest.IndexPutAccumulate
  [       OK ] TensorIndexPutTest.IndexPutAccumulate (0 ms)
  [ RUN      ] TensorIndexPutTest.IndexPutWith2D
  [       OK ] TensorIndexPutTest.IndexPutWith2D (0 ms)
  [ RUN      ] TensorIndexPutTest.IndexPutNonInplaceAccumulate
  [       OK ] TensorIndexPutTest.IndexPutNonInplaceAccumulate (0 ms)
  [----------] 6 tests from TensorIndexPutTest (71 ms total)
  
  [----------] Global test environment tear-down
  [==========] 15 tests from 2 test cases ran. (71 ms total)
  [  PASSED  ] 13 tests.
  [  FAILED  ] 2 tests, listed below:
  [  FAILED  ] TensorIndexTest.SliceKeepsStrideWithoutContiguousCopy
  [  FAILED  ] TensorIndexTest.MemberIndexWithArrayRefTensorIndices
  
   2 FAILED TESTS
  
      Start  238: ATen_transpose_test
  5/8 Test  #238: ATen_transpose_test ..............***Failed    0.40 sec
  XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
  [==========] Running 6 tests from 1 test case.
  [----------] Global test environment set-up.
  [----------] 6 tests from TensorTransposeInplaceTest
  [ RUN      ] TensorTransposeInplaceTest.Transpose2D_SwapDims
  [       OK ] TensorTransposeInplaceTest.Transpose2D_SwapDims (0 ms)
  [ RUN      ] TensorTransposeInplaceTest.Transpose3D_SwapFirstTwo
  [       OK ] TensorTransposeInplaceTest.Transpose3D_SwapFirstTwo (0 ms)
  [ RUN      ] TensorTransposeInplaceTest.Transpose3D_SwapLastTwo
  [       OK ] TensorTransposeInplaceTest.Transpose3D_SwapLastTwo (0 ms)
  [ RUN      ] TensorTransposeInplaceTest.TransposeInplace_PreservesValues
  /paddle/test/cpp/compat/ATen_transpose_test.cc:81: Failure
  Expected equality of these values:
    t[2][0].item<float>()
      Which is: 3
    2.0f
      Which is: 2
  [  FAILED  ] TensorTransposeInplaceTest.TransposeInplace_PreservesValues (0 ms)
  [ RUN      ] TensorTransposeInplaceTest.TransposeInplace_SameDim_NoOp
  [       OK ] TensorTransposeInplaceTest.TransposeInplace_SameDim_NoOp (0 ms)
  [ RUN      ] TensorTransposeInplaceTest.TransposeInplace_DoubleTranspose_RestoresShape
  [       OK ] TensorTransposeInplaceTest.TransposeInplace_DoubleTranspose_RestoresShape (0 ms)
  [----------] 6 tests from TensorTransposeInplaceTest (0 ms total)
  
  [----------] Global test environment tear-down
  [==========] 6 tests from 1 test case ran. (1 ms total)
  [  PASSED  ] 5 tests.
  [  FAILED  ] 1 test, listed below:
  [  FAILED  ] TensorTransposeInplaceTest.TransposeInplace_PreservesValues
  
   1 FAILED TEST
  
      Start  241: ATen_viewAs_test
  6/8 Test  #241: ATen_viewAs_test .................***Failed    0.40 sec
  XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
  [==========] Running 5 tests from 1 test case.
  [----------] Global test environment set-up.
  [----------] 5 tests from TensorViewAsTest
  [ RUN      ] TensorViewAsTest.ViewAsSameShape
  [       OK ] TensorViewAsTest.ViewAsSameShape (0 ms)
  [ RUN      ] TensorViewAsTest.ViewAsDifferentShape_CompatibleNumel
  /paddle/test/cpp/compat/ATen_viewAs_test.cc:52: Failure
  Expected equality of these values:
    result.dim()
      Which is: 1
    2
  [  FAILED  ] TensorViewAsTest.ViewAsDifferentShape_CompatibleNumel (0 ms)
  [ RUN      ] TensorViewAsTest.ViewAsPreservesData
  unknown file: Failure
  C++ exception with description "
  
  --------------------------------------
  C++ Traceback (most recent call last):
  --------------------------------------
  0   paddle::experimental::slice(paddle::Tensor const&, std::vector<long, std::allocator<long> > const&, paddle::experimental::IntArrayBase<paddle::Tensor> const&, paddle::experimental::IntArrayBase<paddle::Tensor> const&, std::vector<long, std::allocator<long> > const&, std::vector<long, std::allocator<long> > const&, paddle::optional<paddle::Tensor*>)
  1   common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
  
  ----------------------
  Error Message Summary:
  ----------------------
  InvalidArgumentError: The axis value should be less than the rank of input, but received axes[0] = 0, rank of input is 0.
    [Hint: Expected axis < in_dims.size(), but received axis:0 >= in_dims.size():0.] (at /paddle/paddle/phi/kernels/funcs/slice_utils.h:252)
  " thrown in the test body.
  [  FAILED  ] TensorViewAsTest.ViewAsPreservesData (2 ms)
  [ RUN      ] TensorViewAsTest.ViewAs1D_Flattens
  /paddle/test/cpp/compat/ATen_viewAs_test.cc:76: Failure
  Expected equality of these values:
    result.dim()
      Which is: 3
    1
  [  FAILED  ] TensorViewAsTest.ViewAs1D_Flattens (0 ms)
  [ RUN      ] TensorViewAsTest.ViewAs_SameDataPointer
  [       OK ] TensorViewAsTest.ViewAs_SameDataPointer (0 ms)
  [----------] 5 tests from TensorViewAsTest (2 ms total)
  
  [----------] Global test environment tear-down
  [==========] 5 tests from 1 test case ran. (2 ms total)
  [  PASSED  ] 2 tests.
  [  FAILED  ] 3 tests, listed below:
  [  FAILED  ] TensorViewAsTest.ViewAsDifferentShape_CompatibleNumel
  [  FAILED  ] TensorViewAsTest.ViewAsPreservesData
  [  FAILED  ] TensorViewAsTest.ViewAs1D_Flattens
  
   3 FAILED TESTS
  
      Start  243: ATen_basic_test
  7/8 Test  #243: ATen_basic_test ..................***Failed    0.40 sec
  XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
  [==========] Running 18 tests from 8 test cases.
  [----------] Global test environment set-up.
  [----------] 9 tests from TensorBaseTest
  [ RUN      ] TensorBaseTest.DataPtrAPIs
  [       OK ] TensorBaseTest.DataPtrAPIs (0 ms)
  [ RUN      ] TensorBaseTest.TypeDeviceAPIs
  [       OK ] TensorBaseTest.TypeDeviceAPIs (0 ms)
  [ RUN      ] TensorBaseTest.ModifyOperationAPIs
  /paddle/test/cpp/compat/ATen_basic_test.cc:111: Failure
  Expected equality of these values:
    viewed.sizes()
      Which is: { 2, 3 }
    std::vector<int64_t>{6}
      Which is: { 6 }
  [  FAILED  ] TensorBaseTest.ModifyOperationAPIs (0 ms)
  [ RUN      ] TensorBaseTest.LayoutAPI
  [       OK ] TensorBaseTest.LayoutAPI (0 ms)
  [ RUN      ] TensorBaseTest.ResetAPI
  [       OK ] TensorBaseTest.ResetAPI (0 ms)
  [ RUN      ] TensorBaseTest.IsNonOverlappingAndDenseAPI
  /paddle/test/cpp/compat/ATen_basic_test.cc:380: Failure
  Value of: transposed.is_contiguous()
    Actual: true
  Expected: false
  [  FAILED  ] TensorBaseTest.IsNonOverlappingAndDenseAPI (0 ms)
  [ RUN      ] TensorBaseTest.UndefinedAndNonDenseBranchCoverage
  unknown file: Failure
  C++ exception with description "
  
  --------------------------------------
  C++ Traceback (most recent call last):
  --------------------------------------
  0   common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
  
  ----------------------
  Error Message Summary:
  ----------------------
  FatalError: FLAGS_use_stride_kernel is closed. Not contiguous Tensor found, something wrong has happened! (at /paddle/paddle/phi/core/tensor_meta.cc:221)
  " thrown in the test body.
  [  FAILED  ] TensorBaseTest.UndefinedAndNonDenseBranchCoverage (2 ms)
  [ RUN      ] TensorBaseTest.ToDeviceAndMemoryFormatUnsupportedBranches
  [       OK ] TensorBaseTest.ToDeviceAndMemoryFormatUnsupportedBranches (0 ms)
  [ RUN      ] TensorBaseTest.ToDtypeCastsWhenSupported
  [       OK ] TensorBaseTest.ToDtypeCastsWhenSupported (0 ms)
  [----------] 9 tests from TensorBaseTest (2 ms total)
  
  [----------] 1 test from tensor_clone_test
  [ RUN      ] tensor_clone_test.BasicClone
  [       OK ] tensor_clone_test.BasicClone (0 ms)
  [----------] 1 test from tensor_clone_test (0 ms total)
  
  [----------] 1 test from compat_basic_test
  [ RUN      ] compat_basic_test.BasicCase
  Result[0] = 12
  Result[1] = 12
  Result[2] = 12
  Result[3] = 12
  Result[4] = 12
  Result[5] = 12
  10, 20, 30
  [       OK ] compat_basic_test.BasicCase (0 ms)
  [----------] 1 test from compat_basic_test (0 ms total)
  
  [----------] 2 tests from TestDevice
  [ RUN      ] TestDevice.DeviceAPIsOnCUDA
  [       OK ] TestDevice.DeviceAPIsOnCUDA (0 ms)
  [ RUN      ] TestDevice.DeviceAPIsOnCPU
  [       OK ] TestDevice.DeviceAPIsOnCPU (0 ms)
  [----------] 2 tests from TestDevice (0 ms total)
  
  [----------] 1 test from TestTranspose
  [ RUN      ] TestTranspose.TransposeAPI
  [       OK ] TestTranspose.TransposeAPI (0 ms)
  [----------] 1 test from TestTranspose (0 ms total)
  
  [----------] 1 test from TestSize
  [ RUN      ] TestSize.SizeNegativeIndex
  [       OK ] TestSize.SizeNegativeIndex (0 ms)
  [----------] 1 test from TestSize (0 ms total)
  
  [----------] 1 test from TestTensorOperators
  [ RUN      ] TestTensorOperators.SubScriptOperator
  [       OK ] TestTensorOperators.SubScriptOperator (0 ms)
  [----------] 1 test from TestTensorOperators (0 ms total)
  
  [----------] 2 tests from TensorBodyTest
  [ RUN      ] TensorBodyTest.ToBackendUnsupportedBranch
  [       OK ] TensorBodyTest.ToBackendUnsupportedBranch (0 ms)
  [ RUN      ] TensorBodyTest.MetaUnsupportedBranch
  [       OK ] TensorBodyTest.MetaUnsupportedBranch (0 ms)
  [----------] 2 tests from TensorBodyTest (0 ms total)
  
  [----------] Global test environment tear-down
  [==========] 18 tests from 8 test cases ran. (2 ms total)
  [  PASSED  ] 15 tests.
  [  FAILED  ] 3 tests, listed below:
  [  FAILED  ] TensorBaseTest.ModifyOperationAPIs
  [  FAILED  ] TensorBaseTest.IsNonOverlappingAndDenseAPI
  [  FAILED  ] TensorBaseTest.UndefinedAndNonDenseBranchCoverage
  
   3 FAILED TESTS
  
      Start 1272: test_compat_slogdet
  8/8 Test #1272: test_compat_slogdet ..............   Passed    1.62 sec
  
  13% tests passed, 7 tests failed out of 8
  
  Total Test time (real) =   4.59 sec
  
  The following tests FAILED:
  	207 - ATen_all_test (Failed)
  	209 - ATen_as_strided_test (Failed)
  	221 - ATen_from_blob_test (Failed)
  	223 - ATen_index_test (Failed)
  	238 - ATen_transpose_test (Failed)
  	241 - ATen_viewAs_test (Failed)
  	243 - ATen_basic_test (Failed)
  Errors while running CTest

@SigureMo
Member

SigureMo commented Apr 5, 2026

In the Mac-CPU log you pasted, this is where the TensorOptions name lookup/hiding problem gets triggered:

  • compat/c10/core/TensorOptions.h has c10::TensorOptions
  • ATen/core/TensorBody.h also has using TensorOptions = c10::TensorOptions
  • On this compile path, clang on macOS reports torch::TensorOptions() as "a type named 'TensorOptions' is hidden by a declaration in a different namespace"

@ShigureNyako I asked you to analyze the problem, yet you merely restated it and called it done. You are being asked to do the work now, not to make a decision — you are not acting as a reviewer here but as a developer. Please analyze exactly how this problem arises; the log is right there at https://github.com/PaddlePaddle/Paddle/actions/runs/24000029847/job/69994604415?pr=78580 — and give a fix.

(Tasks like this might be better suited to a fresh session.)

@ShigureNyako
Contributor

@SigureMo I went back over the Mac-CPU log you pointed to (69994604415) and the XPU log the author posted afterwards; breaking the problem down by cause:

1. The Mac-CPU issue in ATen_basic_test.cc is, at root, a deterministic compile-time name lookup problem

The triggering line is exactly:

torch::from_blob(pp, {3}, torch::TensorOptions().dtype(torch::kInt64));

This is not an XPU operator being unsupported; along this compat header chain:

  • compat/c10/core/TensorOptions.h defines c10::TensorOptions
  • ATen/core/TensorBody.h contains using TensorOptions = c10::TensorOptions
  • torch/all.h is essentially a using namespace at

Under this implementation, torch::TensorOptions() is not a sufficiently stable explicit type spelling; clang on macOS reports it as a TensorOptions hiding/ambiguity problem under qualified lookup. So the minimal fix here is quite direct:

// original
torch::from_blob(pp, {3}, torch::TensorOptions().dtype(torch::kInt64));

// more robust spellings
at::from_blob(pp, {3}, at::TensorOptions().dtype(at::kLong));
// or
at::from_blob(pp, {3}, c10::TensorOptions().dtype(at::kLong));

That is, ATen_basic_test does contain a hard Mac-CPU compile blocker, but it is a very localized, single-point fix — there is no need to revert the whole batch of CPU tests back to nv_test over it.

2. The batch of XPU failures is not the same class of problem as Mac-CPU

The XPU log the author posted splits into at least a few categories:

2.1 Wrong semantic results, not compile failures

  • ATen_all_test
    • all() / allclose() return true on cases that should be false
    • This looks more like a compat-semantics / backend behavior mismatch, not a file that simply "cannot compile on the CPU path"

2.2 stride/view capabilities that do not hold under the XPU build

  • ATen_as_strided_test
  • ATen_from_blob_test
  • ATen_index_test
  • ATen_transpose_test
  • ATen_viewAs_test
  • part of TensorBaseTest inside ATen_basic_test

The signals in the log here are already fairly consistent:

  • FLAGS_use_stride_kernel is closed. Not contiguous Tensor found
  • views expected to be non-contiguous / stride-preserving are instead treated as contiguous or throw outright
  • view_as / transpose_ behavior diverges from the PyTorch-compatible semantics

So the essence of this batch of cases is more like: under the current XPU build, compat semantics for stride/view/non-contiguous tensors are still incomplete.

3. My more concrete fix suggestions

If the goal is still what you said earlier — "preserve CPU monitoring as much as possible; XPU can skip locally for now" — I think the appropriate approach is:

Layer 1: first fix the hard Mac-CPU compile problem at its single point

Change the torch::TensorOptions() line in ATen_basic_test.cc first, so CPU compilation stops going straight red.

Layer 2: do not revert the whole batch of tests back to nv_test together

d6e9be4 currently moves:

  • ATen_all_test
  • ATen_as_strided_test
  • ATen_basic_test
  • ATen_from_blob_test
  • ATen_index_test
  • ATen_transpose_test
  • ATen_viewAs_test

back to the GPU path in one sweep, and I still think that step is too broad.

Layer 3: for cases confirmed to be XPU backend gaps, apply XPU-specific skips first

That is:

  • keep these files as cc_test, preserving CPU pipeline monitoring
  • for tests/files known to fail on XPU, use #ifdef PADDLE_WITH_XPU + GTEST_SKIP() as a backend-specific workaround for now

From a short-term unblocking angle, I would split it as:

  • ATen_all_test: skip the whole file on XPU first — more in line with "preserve CPU monitoring" than reverting to GPU-only
  • ATen_as_strided_test / ATen_index_test / ATen_transpose_test / ATen_viewAs_test / ATen_from_blob_test: all currently show clear stride/view semantic differences and also suit an XPU skip first
  • ATen_basic_test:
    1. fix the Mac-CPU compile line first;
    2. then skip the few TensorBaseTest sub-cases failing on XPU individually; already-passing cases like compat_basic_test.BasicCase need not be pulled back along with them

Converging things this way keeps CPU monitoring alive and handles XPU as known backend gaps via local skips, with compat semantics filled in case by case afterwards — which I think is closer to this PR's goal than reverting the whole batch to nv_test.
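The backend-specific skip pattern mentioned above can be sketched as follows. PADDLE_WITH_XPU and GTEST_SKIP() are the real Paddle/gtest names; the function below is a self-contained stand-in that emulates only the control flow, so it runs without gtest:

```cpp
#include <string>

// In a real Paddle test file the pattern would read roughly:
//
//   TEST(TensorAsStridedTest, AsStridedTranspose) {
//   #ifdef PADDLE_WITH_XPU
//     GTEST_SKIP() << "stride/view compat semantics incomplete on XPU";
//   #endif
//     ... original assertions ...
//   }
//
// Below we emulate the same control flow without gtest.

#if defined(PADDLE_WITH_XPU)
constexpr bool kBuiltForXpu = true;
#else
constexpr bool kBuiltForXpu = false;
#endif

// Returns what the test runner would record for this case.
std::string run_as_strided_case() {
  if (kBuiltForXpu) {
    return "SKIPPED";  // GTEST_SKIP() would short-circuit the test body here
  }
  // ... original assertions would run here on CPU/GPU builds ...
  return "OK";
}
```

This keeps the file in cc_test for CPU coverage while the XPU build records the case as skipped rather than failed.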

@SigureMo
Member

SigureMo commented Apr 5, 2026

Under this implementation, torch::TensorOptions() is not a sufficiently stable explicit type spelling; clang on macOS reports it as a TensorOptions hiding/ambiguity problem under qualified lookup. So the minimal fix here is quite direct:

Are you sure this isn't actually a hiding problem? Would a unit test like this fail under torch as well? Please determine the root cause and fix that; also, please focus on the macOS problem — the issue right now is not on XPU.

@ShigureNyako
Contributor

@SigureMo That follow-up is fair. I re-verified the macOS case separately, and I'm correcting my earlier statement: this time it really is the hiding problem itself, not some other phenomenon.

I can now pin the root-cause chain down more precisely:

  1. compat/c10/core/TensorOptions.h ends with:
    namespace torch {
    using namespace c10;
    }
  2. compat/torch/csrc/api/include/torch/types.h also has:
    namespace torch {
    using namespace at;
    }
  3. compat/ATen/core/TensorBody.h additionally defines:
    namespace at {
    using TensorOptions = c10::TensorOptions;
    }

So inside the torch namespace, TensorOptions is brought in along two paths at once:

  • one is c10::TensorOptions
  • the other is at::TensorOptions (itself an alias of c10::TensorOptions)

When clang on macOS performs qualified lookup on

torch::TensorOptions()

it reports exactly the error in the current log:

a type named 'TensorOptions' is hidden by a declaration in a different namespace

I have already run this pattern in isolation as a minimal reproduction, and the diagnostic matches the Mac-CPU log. In other words, the problem is not in from_blob itself, but in the way TensorOptions is doubly re-exported under the torch namespace.
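A minimal, self-contained sketch of that pattern (the struct body is a placeholder; the real headers are much larger) looks like this. The problematic qualified lookup is left commented out so the snippet compiles everywhere:

```cpp
#include <type_traits>

namespace c10 {
struct TensorOptions {  // placeholder for the real c10::TensorOptions
  int dtype_ = 0;
  TensorOptions& dtype(int d) { dtype_ = d; return *this; }
};
}  // namespace c10

namespace at {
using TensorOptions = c10::TensorOptions;  // as in compat/ATen/core/TensorBody.h
}  // namespace at

namespace torch {
using namespace c10;  // as at the end of compat/c10/core/TensorOptions.h
using namespace at;   // as in compat/torch/csrc/api/include/torch/types.h
}  // namespace torch

// torch::TensorOptions now reaches the name via two using-directives;
// clang on macOS rejects the qualified lookup with:
//   "a type named 'TensorOptions' is hidden by a declaration in a
//    different namespace"
// so the offending spelling is left commented out:
//   auto bad = torch::TensorOptions().dtype(7);

// Explicit qualification sidesteps the conflict:
inline int make_dtype() { return c10::TensorOptions().dtype(7).dtype_; }

// Both paths name the same type, so the fix is purely a spelling change.
static_assert(std::is_same<at::TensorOptions, c10::TensorOptions>::value,
              "alias and original are the same type");
```

This reproduces only the lookup structure, not Paddle's actual headers.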

For this PR, the minimal fix I suggest is to change the call site directly: keep torch::from_blob untouched, and only spell the third argument explicitly as c10::TensorOptions(), for example:

torch::Tensor t =
    torch::from_blob(pp, {3}, c10::TensorOptions().dtype(torch::kInt64));

This sidesteps the lookup conflict on macOS directly.

If we later want to fix this at the compat layer for good, I think the header export style can be handled separately — e.g. narrowing the broad namespace torch { using namespace c10; } export in compat/c10/core/TensorOptions.h into explicit exports, so it no longer compounds with the using namespace at in torch/types.h.
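The narrowing described here could look roughly like the following — a sketch with placeholder types, not the actual Paddle change. Replacing the using-directive with a using-declaration makes TensorOptions a member of torch itself, and qualified lookup finds that member before consulting namespaces nominated by using-directives:

```cpp
namespace c10 {
struct TensorOptions {  // placeholder for the real class
  int dtype_ = 42;
};
}  // namespace c10

namespace at {
using TensorOptions = c10::TensorOptions;
}  // namespace at

namespace torch {
// Instead of the broad  using namespace c10;  export only what is needed:
using c10::TensorOptions;
using namespace at;  // from torch/types.h, unchanged
}  // namespace torch

// torch::TensorOptions now resolves to the using-declaration, a member of
// torch, so the two-directive ambiguity never arises.
inline int probe() { return torch::TensorOptions().dtype_; }
```

Whether this is the right long-term shape for the compat headers is a separate decision; the sketch only shows that explicit export removes the dual-path lookup.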

@youge325
Contributor Author

youge325 commented Apr 5, 2026

  1. compat/c10/core/TensorOptions.h ends with:
    namespace torch {
    using namespace c10;
    }

You're right — this export should be removed; we must not re-export from c10 directly into torch.

@youge325
Contributor Author

youge325 commented Apr 5, 2026

torch/headeronly/macros/Macros.h 来看,是可以直接在 namespace atusing namespace c10 的,但是并没有直接在 namespace torchusing namespace c10 的用法

(screenshot: torch/headeronly/macros/Macros.h)

