Add peer access control for DeviceMemoryResource #1289
base: main
bd48cf9 to 634f931
/ok to test 84378f4
from .._device import Device

# Convert all devices to device IDs
cdef set target_ids = {Device(dev).device_id for dev in devices}
I think I should filter this set by cudaDeviceCanAccessPeer
Done in the latest. I added a check that raises an error if a non-accessible device is added to the list. Unfortunately, I don't see a good way to test it.
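One way the check described above could be exercised without multi-GPU hardware is to pass the accessibility query in as a parameter, so a test can substitute a fake. The following is only a sketch in plain Python, not the code in this PR: `validate_peer_ids` and the `can_access_peer` callable are hypothetical names, with `can_access_peer` standing in for a `cudaDeviceCanAccessPeer`-style query. Dropping the owner device from the set is also an assumption.

```python
from typing import Callable, Iterable, Set


def validate_peer_ids(
    owner_id: int,
    target_ids: Iterable[int],
    can_access_peer: Callable[[int, int], bool],
) -> Set[int]:
    """Return the requested peer IDs, raising if any cannot reach the owner.

    can_access_peer(accessor, owner) is a hypothetical stand-in for a
    cudaDeviceCanAccessPeer-style query; it is injected so that tests
    can run on a machine with fewer GPUs than the scenario needs.
    """
    # Assumption: the owner always has access to its own pool, so it is
    # dropped from the set rather than queried.
    ids = set(target_ids) - {owner_id}

    # Collect every device that fails the query so the error names them all.
    blocked = sorted(d for d in ids if not can_access_peer(d, owner_id))
    if blocked:
        raise ValueError(
            f"device(s) {blocked} cannot access memory on device {owner_id}"
        )
    return ids
```

A test can then supply a small lookup table in place of the real query, for example `lambda accessor, owner: (accessor, owner) in {(1, 0)}`, and assert that an unlisted device raises `ValueError`.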
bint _mempool_owned
IPCData _ipc_data
object _attributes
object _peer_accessible_by
I will address peer access with IPC memory pools in a follow-up change. The peer access attributes are not inherited when an allocation is sent to another process via IPC, but access can be set. It will require a new test and possibly a small code change.
/ok to test b155df7
Description
closes #479
Add peer access control for DeviceMemoryResource
This PR implements peer access control for DeviceMemoryResource, allowing memory pool allocations to be accessed from devices other than the owner device.

Overview

Memory pools created by DeviceMemoryResource can now grant peer access permissions to other GPUs, enabling multi-GPU workflows where allocations on one device need to be accessed by kernels running on another device.

Key Changes

1. New peer_accessible_by property

Example usage:

2. Implementation details

- Uses cuMemPoolSetAccess to modify peer access permissions at runtime
- Passes CUmemAccessDesc structures to the driver API
- Supports both granting (CU_MEM_ACCESS_FLAGS_PROT_READWRITE) and revoking (CU_MEM_ACCESS_FLAGS_PROT_NONE) access in a single transaction

3. Debugging support

- Adds DMR_mempool_get_access() helper function to probe actual access permissions via cuMemPoolGetAccess

4. Workaround for driver bug (nvbug 5698116)

- Updates __dealloc__ to reset peer access before destroying memory pools

Technical Notes

- Adds malloc/free imports from libc.stdlib for C array allocation

Testing
This feature is designed to work with the existing multi-GPU test suite and enables peer access scenarios for buffer copy operations between devices.
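The "single transaction" behavior mentioned under the implementation details can be illustrated with a small sketch that compares the current peer set against the requested one and produces one combined list of grants and revocations. This is not the PR's Cython code: `plan_access_update` is a hypothetical helper, and the two string constants only stand in for the driver's `CU_MEM_ACCESS_FLAGS_PROT_READWRITE` and `CU_MEM_ACCESS_FLAGS_PROT_NONE` enum values. In the real implementation each entry would presumably become a `CUmemAccessDesc` passed to `cuMemPoolSetAccess`.

```python
from typing import Iterable, List, Tuple

# Placeholders for the driver enum values; plain strings keep the sketch
# runnable without CUDA installed.
PROT_NONE = "CU_MEM_ACCESS_FLAGS_PROT_NONE"
PROT_READWRITE = "CU_MEM_ACCESS_FLAGS_PROT_READWRITE"


def plan_access_update(
    current_ids: Iterable[int],
    target_ids: Iterable[int],
) -> List[Tuple[int, str]]:
    """Build one list of (device_id, flag) entries covering grants and revokes.

    Devices present in both sets need no change and are omitted.
    """
    current = set(current_ids)
    target = set(target_ids)

    # Devices newly requested get read/write access.
    grants = [(dev, PROT_READWRITE) for dev in sorted(target - current)]
    # Devices no longer requested have their access removed.
    revokes = [(dev, PROT_NONE) for dev in sorted(current - target)]

    # A single combined list models the single-call update.
    return grants + revokes
```

For instance, moving from peers `{1, 2}` to peers `{2, 3}` yields one entry granting access to device 3 and one revoking it from device 1, while device 2 is left untouched.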