@Andy-Jost (Contributor)
Description

closes #479

Add peer access control for DeviceMemoryResource

This PR implements peer access control for DeviceMemoryResource, allowing memory pool allocations to be accessed from devices other than the owner device.

Overview

Memory pools created by DeviceMemoryResource can now grant peer access permissions to other GPUs, enabling multi-GPU workflows where allocations on one device need to be accessed by kernels running on another device.

Key Changes

1. New peer_accessible_by property

  • Get/set which devices can access allocations from a memory pool
  • Returns a tuple of sorted device IDs that currently have peer access
  • Setter accepts a sequence of Device objects or device IDs
  • Automatically excludes the owner device from the peer access list
  • Uses incremental updates (only changes permissions for devices being added/removed)

Example usage:

dmr = DeviceMemoryResource(0)
dmr.peer_accessible_by = [1, 2]  # Grant access to devices 1 and 2
assert dmr.peer_accessible_by == (1, 2)
dmr.peer_accessible_by = []      # Revoke all peer access
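
Because the setter applies incremental updates, reassigning the property only changes permissions for devices whose state differs. For example (a hypothetical continuation of the snippet above):

dmr.peer_accessible_by = [1, 2]
dmr.peer_accessible_by = [2, 3]  # Grants device 3, revokes device 1; device 2 is untouched
assert dmr.peer_accessible_by == (2, 3)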

2. Implementation details

  • Uses cuMemPoolSetAccess to modify peer access permissions at runtime
  • Properly allocates C arrays for passing CUmemAccessDesc structures to the driver API
  • Memory-safe with try/finally blocks ensuring allocated memory is freed
  • Supports both granting (CU_MEM_ACCESS_FLAGS_PROT_READWRITE) and revoking (CU_MEM_ACCESS_FLAGS_PROT_NONE) access in a single cuMemPoolSetAccess call (see the sketch after this list)
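
For illustration, here is a minimal Python sketch of this update step using cuda-python's driver bindings. It is not the PR's Cython implementation (which passes a malloc'd C array of CUmemAccessDesc structures to the driver); the helper name set_pool_access and its signature are assumptions.

from cuda.bindings import driver

def set_pool_access(pool, grant_ids, revoke_ids):
    # Hypothetical helper: build one CUmemAccessDesc per device, combining
    # grants and revocations so a single cuMemPoolSetAccess call applies both.
    RW = driver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
    NONE = driver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_NONE
    descs = []
    for dev_id, flags in [(d, RW) for d in grant_ids] + [(d, NONE) for d in revoke_ids]:
        desc = driver.CUmemAccessDesc()
        desc.location.type = driver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
        desc.location.id = dev_id
        desc.flags = flags
        descs.append(desc)
    if descs:
        (err,) = driver.cuMemPoolSetAccess(pool, descs, len(descs))
        if err != driver.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuMemPoolSetAccess failed: {err}")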

3. Debugging support

  • Added DMR_mempool_get_access() helper function to probe actual access permissions via cuMemPoolGetAccess
  • Useful for debugging and verifying expected access states (see the sketch below)
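
A standalone Python sketch of such a probe (the PR's helper is a Cython function; this version and its name are assumptions):

from cuda.bindings import driver

def mempool_get_access(pool, device_id):
    # Query the access flags the pool currently grants to one device.
    loc = driver.CUmemLocation()
    loc.type = driver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    loc.id = device_id
    err, flags = driver.cuMemPoolGetAccess(pool, loc)
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"cuMemPoolGetAccess failed: {err}")
    return flags  # e.g. CU_MEM_ACCESS_FLAGS_PROT_NONE or ..._PROT_READWRITE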

4. Workaround for driver bug (nvbug 5698116)

  • Added cleanup in __dealloc__ to reset peer access before destroying memory pools
  • Works around an issue where recycled memory pool handles inherit peer access state from previous handles (a sketch follows)
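
In Python terms, the cleanup amounts to something like the following (the PR does this in Cython in __dealloc__; close_pool reuses the hypothetical set_pool_access helper sketched above):

def close_pool(pool, current_peer_ids):
    # Revoke all peer access before destroying the pool so a recycled
    # pool handle cannot inherit stale permissions (nvbug 5698116).
    if current_peer_ids:
        set_pool_access(pool, grant_ids=[], revoke_ids=list(current_peer_ids))
    (err,) = driver.cuMemPoolDestroy(pool)
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"cuMemPoolDestroy failed: {err}")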

Technical Notes

  • Required adding malloc/free imports from libc.stdlib for C array allocation
  • The implementation properly handles the conversion from Python collections to C structures required by the low-level CUDA driver API
  • No trailing whitespace in source files (adheres to project code style)

Testing

This feature is designed to work with the existing multi-GPU test suite and enables peer access scenarios for buffer copy operations between devices.

@Andy-Jost self-assigned this on Nov 26, 2025
@Andy-Jost added labels on Nov 26, 2025: P0 (High priority - Must do!), feature (New feature or request), cuda.core (Everything related to the cuda.core module)
@copy-pr-bot (bot) commented on Nov 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost (Contributor, Author)

/ok to test 84378f4


@leofang added this to the cuda.core beta 10 milestone on Nov 27, 2025

Review thread on the following diff:
from .._device import Device

# Convert all devices to device IDs
cdef set target_ids = {Device(dev).device_id for dev in devices}
@Andy-Jost (Contributor, Author) commented:

I think I should filter this set by cudaDeviceCanAccessPeer

@Andy-Jost (Contributor, Author) replied:

Done in the latest. I added a check that raises an error if a non-accessible device is added to the list. Unfortunately, I don't see a good way to test it.
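
A sketch of the kind of validation described here, using the driver-level cuDeviceCanAccessPeer (the helper name and error text are assumptions; the comment above uses the runtime-API spelling cudaDeviceCanAccessPeer):

from cuda.bindings import driver

def check_peer_capable(owner_id, target_ids):
    for dev_id in target_ids:
        # Can device dev_id access memory that resides on owner_id?
        err, can_access = driver.cuDeviceCanAccessPeer(dev_id, owner_id)
        if err != driver.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuDeviceCanAccessPeer failed: {err}")
        if not can_access:
            raise ValueError(f"device {dev_id} cannot access memory on device {owner_id}")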

bint _mempool_owned
IPCData _ipc_data
object _attributes
object _peer_accessible_by
@Andy-Jost (Contributor, Author) commented:

I will address peer access with IPC memory pools in a follow-up change. The peer access attributes are not inherited when an allocation is sent to another process via IPC, but access can be set. It will require a new test and possibly a small code change.

@Andy-Jost (Contributor, Author)

/ok to test b155df7
