
Conversation

@cpcloud (Contributor) commented Nov 19, 2025

Add pytest-randomly to cuda_bindings. Tests are run in a randomized order by default; see the pytest-randomly docs.

@copy-pr-bot (bot) commented Nov 19, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@cpcloud (Contributor, Author) commented Nov 19, 2025

/ok to test

cpcloud requested review from kkraus14 and leofang and removed the review request for leofang, November 19, 2025 14:44

@rwgk (Collaborator) commented Nov 19, 2025

Looks like this particular CI run uncovered 3 types of errors.

The failures seem to be random, which is unsurprising I guess: https://github.com/NVIDIA/cuda-python/actions/runs/19505585558?pr=1268 happened to succeed, while one of the errors here was also triggered under https://github.com/NVIDIA/cuda-python/actions/runs/19505615062?pr=1269.

At the initial stage of introducing pytest-randomly, maybe it would help us to start with a fixed seed? For example:

export RANDOMLY_SEED=12345
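
For reference, and worth double-checking against the pytest-randomly docs linked above: the seed can also be pinned via the --randomly-seed command-line option (e.g. pytest --randomly-seed=12345, or the same flag placed in addopts in the pytest configuration). --randomly-seed=last reuses the seed from the previous run, which is handy for reproducing a failing order locally, and -p no:randomly disables the plugin entirely.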

@cpcloud (Contributor, Author) commented Nov 19, 2025

At the initial stage introducing pytest-randomly, maybe it'll help us to start with a fixed seed, e.g.

In practice a fixed seed leads to never changing it, because we'll forget about it. So we should fix the errors and then let the suite continue to run in a random order. The idea is to be robust to the randomness, while acknowledging that this can make things seem flaky.

While I don't like leaving PRs open, this one may need to stay open until it gets to a point where it's worth merging.

@rwgk (Collaborator) commented Nov 19, 2025

so we should fix the errors and then let it continue to run in a random order.

This could be exhausting, and we risk not getting this done in a reasonable timeframe.

It could also be distracting in unfortunate ways, e.g. around releases.

because we'll forget about it,

I think we should do this in a controlled way and create a bug to track the stages, roughly:

  • Get tests working with fixed random seed.
  • After working with that for about a month, change the random seed (but still fixed).
  • After no significant distractions for a month, remove the fixed random seed entirely.

@kkraus14 (Collaborator)

@cpcloud there's an inherent level of statefulness in CUDA, i.e. driver initialization, context creation and setting it to be current, etc. This statefulness is also quite expensive; we absolutely can't afford to create and tear down contexts for each test, for example. Is there a way we can randomize at the module or class level instead of the test level, so that we retain more control over these assumptions and costs?
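
One possible shape for that, sketched below: a plain conftest.py hook that shuffles the order of test modules while keeping the order of tests within each module, so module-scoped setup and any intra-module ordering assumptions stay intact. This is not something the thread settled on, and it is not a pytest-randomly feature; pytest_collection_modifyitems is a standard pytest hook, while the MODULE_SHUFFLE_SEED environment variable name is made up here for illustration.

    # conftest.py (sketch): randomize test order at the module level only.
    import os
    import random

    def pytest_collection_modifyitems(config, items):
        # Group collected items by the file they came from, preserving the
        # original order of tests within each file.
        buckets = {}
        for item in items:
            buckets.setdefault(item.nodeid.split("::")[0], []).append(item)

        # Shuffle only the order of the modules themselves.
        module_ids = list(buckets)
        seed = os.environ.get("MODULE_SHUFFLE_SEED")  # hypothetical knob
        rng = random.Random(int(seed)) if seed else random.Random()
        rng.shuffle(module_ids)

        items[:] = [item for mid in module_ids for item in buckets[mid]]

A class-level variant would bucket on the first two components of the node id instead. If pytest-randomly stays installed alongside something like this, its own reordering would presumably need to be disabled so the two don't fight.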

leofang added the enhancement, triage, test, and cuda.bindings labels on Nov 20, 2025