@Andy-Jost (Contributor)
Description

closes #479

Add peer access control for DeviceMemoryResource

This PR implements peer access control for DeviceMemoryResource, allowing memory pool allocations to be accessed from devices other than the owner device.

Overview

Memory pools created by DeviceMemoryResource can now grant peer access permissions to other GPUs, enabling multi-GPU workflows where allocations on one device need to be accessed by kernels running on another device.

Key Changes

1. New peer_accessible_by property

  • Get/set which devices can access allocations from a memory pool
  • Returns a tuple of sorted device IDs that currently have peer access
  • Setter accepts a sequence of Device objects or device IDs
  • Automatically excludes the owner device from the peer access list
  • Uses incremental updates (only changes permissions for devices being added/removed)

Example usage:

dmr = DeviceMemoryResource(0)
dmr.peer_accessible_by = [1, 2]  # Grant access to devices 1 and 2
assert dmr.peer_accessible_by == (1, 2)
dmr.peer_accessible_by = []      # Revoke all peer access
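
Because the setter applies incremental updates, reassigning the property only changes permissions for devices whose state differs. For example (a hypothetical continuation of the snippet above):

dmr.peer_accessible_by = [1, 2]
dmr.peer_accessible_by = [2, 3]  # Grants device 3, revokes device 1; device 2 is untouched
assert dmr.peer_accessible_by == (2, 3)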

2. Implementation details

  • Uses cuMemPoolSetAccess to modify peer access permissions at runtime
  • Properly allocates C arrays for passing CUmemAccessDesc structures to the driver API
  • Memory-safe with try/finally blocks ensuring allocated memory is freed
  • Supports both granting (CU_MEM_ACCESS_FLAGS_PROT_READWRITE) and revoking (CU_MEM_ACCESS_FLAGS_PROT_NONE) access in a single cuMemPoolSetAccess call (see the sketch after this list)
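
For illustration, here is a minimal Python sketch of this update step using cuda-python's driver bindings. It is not the PR's Cython implementation (which passes a malloc'd C array of CUmemAccessDesc structures to the driver); the helper name set_pool_access and its signature are assumptions.

from cuda.bindings import driver

def set_pool_access(pool, grant_ids, revoke_ids):
    # Hypothetical helper: build one CUmemAccessDesc per device, combining
    # grants and revocations so a single cuMemPoolSetAccess call applies both.
    RW = driver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
    NONE = driver.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_NONE
    descs = []
    for dev_id, flags in [(d, RW) for d in grant_ids] + [(d, NONE) for d in revoke_ids]:
        desc = driver.CUmemAccessDesc()
        desc.location.type = driver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
        desc.location.id = dev_id
        desc.flags = flags
        descs.append(desc)
    if descs:
        (err,) = driver.cuMemPoolSetAccess(pool, descs, len(descs))
        if err != driver.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuMemPoolSetAccess failed: {err}")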

3. Debugging support

  • Added DMR_mempool_get_access() helper function to probe actual access permissions via cuMemPoolGetAccess
  • Useful for debugging and verifying expected access states (see the sketch below)
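
A standalone Python sketch of such a probe (the PR's helper is a Cython function; this version and its name are assumptions):

from cuda.bindings import driver

def mempool_get_access(pool, device_id):
    # Query the access flags the pool currently grants to one device.
    loc = driver.CUmemLocation()
    loc.type = driver.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    loc.id = device_id
    err, flags = driver.cuMemPoolGetAccess(pool, loc)
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"cuMemPoolGetAccess failed: {err}")
    return flags  # e.g. CU_MEM_ACCESS_FLAGS_PROT_NONE or ..._PROT_READWRITE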

4. Workaround for driver bug (nvbug 5698116)

  • Added cleanup in __dealloc__ to reset peer access before destroying memory pools
  • Works around an issue where recycled memory pool handles inherit peer access state from previous handles (a sketch follows)
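
In Python terms, the cleanup amounts to something like the following (the PR does this in Cython in __dealloc__; close_pool reuses the hypothetical set_pool_access helper sketched above):

def close_pool(pool, current_peer_ids):
    # Revoke all peer access before destroying the pool so a recycled
    # pool handle cannot inherit stale permissions (nvbug 5698116).
    if current_peer_ids:
        set_pool_access(pool, grant_ids=[], revoke_ids=list(current_peer_ids))
    (err,) = driver.cuMemPoolDestroy(pool)
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"cuMemPoolDestroy failed: {err}")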

Technical Notes

  • Required adding malloc/free imports from libc.stdlib for C array allocation
  • The implementation properly handles the conversion from Python collections to C structures required by the low-level CUDA driver API
  • No trailing whitespace in source files (adheres to project code style)

Testing

This feature is designed to work with the existing multi-GPU test suite and enables peer access scenarios for buffer copy operations between devices.

@Andy-Jost self-assigned this on Nov 26, 2025
@Andy-Jost added labels on Nov 26, 2025: P0 (High priority - Must do!), feature (New feature or request), cuda.core (Everything related to the cuda.core module)
@copy-pr-bot (bot) commented on Nov 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost (Contributor, Author)

/ok to test 84378f4


@leofang added this to the cuda.core beta 10 milestone on Nov 27, 2025

Review thread on the following diff:
from .._device import Device

# Convert all devices to device IDs
cdef set target_ids = {Device(dev).device_id for dev in devices}
@Andy-Jost (Contributor, Author) commented:

I think I should filter this set by cudaDeviceCanAccessPeer

@Andy-Jost (Contributor, Author) replied:

Done in the latest. I added a check that raises an error if a non-accessible device is added to the list. Unfortunately, I don't see a good way to test it.
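
A sketch of the kind of validation described here, using the driver-level cuDeviceCanAccessPeer (the helper name and error text are assumptions; the comment above uses the runtime-API spelling cudaDeviceCanAccessPeer):

from cuda.bindings import driver

def check_peer_capable(owner_id, target_ids):
    for dev_id in target_ids:
        # Can device dev_id access memory that resides on owner_id?
        err, can_access = driver.cuDeviceCanAccessPeer(dev_id, owner_id)
        if err != driver.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuDeviceCanAccessPeer failed: {err}")
        if not can_access:
            raise ValueError(f"device {dev_id} cannot access memory on device {owner_id}")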

bint _mempool_owned
IPCData _ipc_data
object _attributes
object _peer_accessible_by
@Andy-Jost (Contributor, Author) commented:

I will address peer access with IPC memory pools in a follow-up change. The peer access attributes are not inherited when an allocation is sent to another process via IPC, but access can be set. It will require a new test and possibly a small code change.

@Andy-Jost (Contributor, Author)

/ok to test b155df7
