
Conversation

@justincdavis

Summary

CV-CUDA has its own default stream on which it executes its kernels. This is fine when using the Compose transform API with explicit conversion at the start and end, or when using F.cvcuda_to_tensor and F.tensor_to_cvcuda, because CV-CUDA synchronizes its own stream when sharing memory with an external library. However, there are a few edge cases which I believe give us motivation to have CV-CUDA share the PyTorch current CUDA stream:

  1. When calling torch.cuda.synchronize() after a functional-API call on a cvcuda.Tensor, PyTorch has no work to synchronize on, since the work was queued on a different stream.
  2. If a user selects a specific CUDA stream with the torch.cuda.stream(...) context manager or a similar call, the CV-CUDA work still gets scheduled on a separate stream. In certain scenarios this can degrade performance via context switching, and in general it is non-intuitive behavior.
  3. Should a user want to synchronize while using the functional API and the CV-CUDA backend, they would have to call cvcuda.Stream.current.sync(), which introduces unneeded complexity and library mixing in user code.

I propose we implement a decorator/wrapper function that assigns the current CV-CUDA stream from the current torch.cuda stream at call time. This makes the CV-CUDA kernels in TorchVision behave much more like their PyTorch-tensor counterparts.

Implementation

import functools
from typing import Callable, ParamSpec, TypeVar

import torch

P = ParamSpec("P")
R = TypeVar("R")


def _cvcuda_shared_stream(fn: Callable[P, R]) -> Callable[P, R]:
    # import cvcuda once, at function-wrapping time
    cvcuda = _import_cvcuda()

    @functools.wraps(fn)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        # get the current torch CUDA stream at call time
        stream = torch.cuda.current_stream()

        # cvcuda.Stream supports the context-manager protocol to assign
        # the thread-local current stream
        with cvcuda.as_stream(stream):
            # the wrapped cvcuda operator uses the current stream by default;
            # inside this context manager that is the torch stream above
            result = fn(*args, **kwargs)

        return result

    return wrapper

Example of wrapping the existing vertical_flip kernel for CV-CUDA:

def _vertical_flip_image_cvcuda(image: "cvcuda.Tensor") -> "cvcuda.Tensor":
    return _import_cvcuda().flip(image, flipCode=0)


if CVCUDA_AVAILABLE:
    _register_kernel_internal(vertical_flip, _import_cvcuda().Tensor)(
        _cvcuda_shared_stream(_vertical_flip_image_cvcuda)
    )
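For readers unfamiliar with the kernel registry, `_register_kernel_internal` is a TorchVision-internal helper that maps (functional, input type) pairs to backend kernels. A toy stand-in registry (every name below is hypothetical, not the real TorchVision implementation) illustrates how registering by input type routes `vertical_flip` to the CV-CUDA kernel:

```python
# Toy type-dispatch registry, illustrating (not reproducing) what
# torchvision's _register_kernel_internal does. All names are stand-ins.
_KERNEL_REGISTRY: dict = {}


def register_kernel(functional, input_type):
    def decorator(kernel):
        _KERNEL_REGISTRY[(functional, input_type)] = kernel
        return kernel
    return decorator


def vertical_flip(inpt):
    # Dispatch on the runtime type of the input.
    kernel = _KERNEL_REGISTRY.get((vertical_flip, type(inpt)))
    if kernel is None:
        raise TypeError(f"no vertical_flip kernel for {type(inpt).__name__}")
    return kernel(inpt)


class FakeCvcudaTensor:  # stand-in for cvcuda.Tensor
    def __init__(self, rows):
        self.rows = rows


@register_kernel(vertical_flip, FakeCvcudaTensor)
def _vertical_flip_fake(image):
    # a real backend would call cvcuda.flip(image, flipCode=0) here
    return FakeCvcudaTensor(image.rows[::-1])


out = vertical_flip(FakeCvcudaTensor([1, 2, 3]))
print(out.rows)  # -> [3, 2, 1]
```

In the real code the registered kernel is additionally wrapped with `_cvcuda_shared_stream`, so dispatch and stream assignment compose transparently for the caller.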

Testing

As of right now, there is no testing strategy in place for this change. The naive approach would be to assert, via torch.cuda.synchronize(), that the CV-CUDA kernels do not block without this behavior and do block with it (through the higher-level functional API). An alternative could use torch.cuda.Event.
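A positive/negative check can also be made concrete without a GPU by stubbing the two stream APIs with unittest.mock and asserting that the wrapper threads the current torch stream through to CV-CUDA. A hedged sketch (the `current_stream`/`as_stream` names mirror the wrapper above; the dependency-injected `shared_stream` signature is a stand-in, not the real TorchVision code):

```python
import functools
from unittest import mock


def shared_stream(fn, torch_cuda, cvcuda):
    """Stand-in mirroring _cvcuda_shared_stream, written against injected
    modules so it can run without a GPU (illustrative only)."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        stream = torch_cuda.current_stream()
        with cvcuda.as_stream(stream):
            return fn(*args, **kwargs)

    return wrapper


torch_cuda = mock.Mock()
cvcuda = mock.MagicMock()  # MagicMock so as_stream(...) supports `with`
torch_stream = object()
torch_cuda.current_stream.return_value = torch_stream

kernel = shared_stream(lambda: "ok", torch_cuda, cvcuda)

# Positive: the cvcuda stream context was entered with the torch stream.
assert kernel() == "ok"
cvcuda.as_stream.assert_called_once_with(torch_stream)
cvcuda.as_stream.return_value.__enter__.assert_called_once()

# Negative: without the wrapper, cvcuda never sees the torch stream.
cvcuda.reset_mock()
(lambda: "ok")()
cvcuda.as_stream.assert_not_called()
print("ok")
```

A real GPU test would replace the mocks with actual streams and compare handles, as discussed below in the review thread.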

Feedback

I would love to get feedback on whether this change should be pursued and the testing strategy if this is behavior the team wants in TorchVision.

@pytorch-bot

pytorch-bot bot commented Dec 9, 2025


🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9308



@meta-cla meta-cla bot added the cla signed label Dec 9, 2025
@NicolasHug
Member

NicolasHug commented Dec 10, 2025

Thanks for the PR @justincdavis and for bringing this up. I'll have to think more, but this seems reasonable from a quick look.

Re testing, does CVCUDA expose an API to get the current stream it's working on, something like https://docs.pytorch.org/docs/stable/generated/torch.cuda.current_stream.html ? If it does, maybe a small test like this one would be enough

new_stream = torch.cuda.Stream()

def assert_cvcuda_is_using_torch_stream():
    assert cvcuda.Stream.current.handle == new_stream.cuda_stream

with torch.cuda.stream(new_stream):
    _cvcuda_shared_stream(assert_cvcuda_is_using_torch_stream)()

@justincdavis
Author

justincdavis commented Dec 10, 2025

Hi @NicolasHug, CV-CUDA does expose this! I added a simple positive/negative test that checks the handles of the two streams.
