Skip to content

Conversation

@cpcloud
Copy link
Contributor

@cpcloud cpcloud commented Nov 19, 2025

This PR enables testing cufile on a larger variety of filesystems where the root filesystem (/) might have a different mount point than the mount point of the checkout directory, by looking at all of /proc/mounts instead of only the first matching mount point.

@cpcloud cpcloud requested review from leofang and mdboom November 19, 2025 19:07
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Nov 19, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cpcloud cpcloud changed the title fix incorrect cufile skipping logic test: fix overly restrictive cufile skipping logic Nov 19, 2025
@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

/ok to test

@github-actions
Copy link

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

/ok to test

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

/ok to test

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

Turns out these tests are never run on main because they are skipped. See https://github.com/NVIDIA/cuda-python/actions/runs/19510167362/job/55859660002 and search for cuda.bindings, you'll see this:

image

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

The logic I have here is I believe correct, but I'm not sure if these tests are just broken, or this is a latent bug that we've been living with for a while.

@rwgk
Copy link
Collaborator

rwgk commented Nov 19, 2025

Turns out these tests are never run on main because they are skipped.

Tangential:

Yeah, I ran into that twice with the ctk-next testing. SWQA was the first to run in environments where they are not skipped, but I didn't think of checking first if they actually ran in the cuda-python CI and started debugging painstakingly assuming they did.

It'd be awesome to have some sort of automatic flagging, after all jobs finish, to report which tests

  • are skipped everywhere
  • are always skipped on linux-64, linux-aarch64, win-64

@leofang
Copy link
Member

leofang commented Nov 19, 2025

  1. For cuFILE related changes, let us always rope in the GDS team for consultation (cc @sourabgupta3).
  2. Currently lots of things cannot be run in the CI ([FEA]: Improve CI coverage of cuFile #1143), not even in cuFILE's compat mode IIUC.

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

It's really an issue of incorrect skipping logic, not whether a thing runs in CI. This should have been running, because the runners have a compatible file system.

This wasn't running, but it was unintentionally not running.

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

Given all the failures happen on 13.0.2, and the 12.9.x tests all pass, that seems like a good place to start digging.

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

Ah, well those are not running the cuda.bindings tests, so that wouldn't explain the failure.

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

/ok to test

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 19, 2025

/ok to test

@leofang
Copy link
Member

leofang commented Nov 19, 2025

Let's give Sourab some time. I pinged him offline and he's looking into it.

@leofang
Copy link
Member

leofang commented Nov 19, 2025

Let's give Sourab some time. I pinged him offline and he's looking into it.

Ping'd you both in an internal thread.

@leofang leofang added bug Something isn't working triage Needs the team's attention test Improvements or additions to tests cuda.bindings Everything related to the cuda.bindings module labels Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cuda.bindings Everything related to the cuda.bindings module test Improvements or additions to tests triage Needs the team's attention

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants