Labels: P1 (Medium priority - Should do), cuda.core (Everything related to the cuda.core module), feature (New feature or request)
Description
Initializing a TMA descriptor through the driver APIs
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html
is really tedious and error-prone. We need a way to abstract it, which aligns well with the mission of cuda.core. It would also make it easier for JIT compilers to consume TMA descriptors and incorporate them into their compilation pipelines.
In my understanding there are two (implicit?) requirements for this to be useful:
- Creating/initializing a TMA object on host
- Passing the object to the cuda.core.launch() API as a kernel arg
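
To make the "tedious and error-prone" part concrete, here is a minimal pure-Python sketch of the host-side bookkeeping that a tiled tensor map needs before the driver call is made. It is not a cuda.core API: `TiledTensorMapSpec` and all of its fields are hypothetical names chosen for illustration, it never calls the driver, and the checks are only a subset of the constraints as I understand them from the driver documentation linked above (assuming no interleave), so they should be verified against that reference.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Limits as understood from the driver documentation (assumption: verify there).
MAX_RANK = 5
ADDRESS_ALIGNMENT = 16      # bytes
STRIDE_ALIGNMENT = 16       # bytes
MAX_GLOBAL_EXTENT = 2**32   # elements per dimension
MAX_GLOBAL_STRIDE = 2**40   # bytes, exclusive upper bound
MAX_BOX_EXTENT = 256        # elements per dimension
MAX_ELEMENT_STRIDE = 8


@dataclass(frozen=True)
class TiledTensorMapSpec:
    """Hypothetical host-side description of a tiled TMA descriptor.

    All dimension tuples are ordered innermost (fastest-varying) first.
    """
    global_address: int                  # device pointer as an integer
    elem_size: int                       # bytes per element
    global_dim: Tuple[int, ...]          # tensor extents, in elements
    box_dim: Tuple[int, ...]             # tile extents, in elements
    element_strides: Optional[Tuple[int, ...]] = None  # defaults to all ones

    def global_strides(self) -> Tuple[int, ...]:
        """Byte strides of dims 1..rank-1 for a densely packed tensor.

        The innermost stride is implied by the element size, so the
        result has rank - 1 entries.
        """
        strides = []
        stride = self.elem_size
        for extent in self.global_dim[:-1]:
            stride *= extent
            strides.append(stride)
        return tuple(strides)

    def validate(self) -> None:
        """Raise ValueError if any of the checked constraints is violated."""
        rank = len(self.global_dim)
        if not 1 <= rank <= MAX_RANK:
            raise ValueError(f"rank must be in [1, {MAX_RANK}], got {rank}")
        if len(self.box_dim) != rank:
            raise ValueError("box_dim must have one entry per dimension")
        elem_strides = self.element_strides or (1,) * rank
        if len(elem_strides) != rank:
            raise ValueError("element_strides must have one entry per dimension")
        if self.global_address % ADDRESS_ALIGNMENT:
            raise ValueError("global_address must be 16-byte aligned")
        if any(not 1 <= d <= MAX_GLOBAL_EXTENT for d in self.global_dim):
            raise ValueError("each global_dim entry must be in [1, 2**32]")
        if any(not 1 <= b <= MAX_BOX_EXTENT for b in self.box_dim):
            raise ValueError("each box_dim entry must be in [1, 256]")
        if any(not 1 <= s <= MAX_ELEMENT_STRIDE for s in elem_strides):
            raise ValueError("each element stride must be in [1, 8]")
        if (self.box_dim[0] * self.elem_size) % 16:
            raise ValueError("inner box extent must span a multiple of 16 bytes")
        for s in self.global_strides():
            if s % STRIDE_ALIGNMENT or s >= MAX_GLOBAL_STRIDE:
                raise ValueError("global strides must be multiples of 16 and < 2**40")


# A 128 x 256 float32 tensor (innermost extent 256) read in 32 x 64 tiles.
spec = TiledTensorMapSpec(
    global_address=0x7F0000000000,  # placeholder pointer value
    elem_size=4,
    global_dim=(256, 128),
    box_dim=(64, 32),
)
spec.validate()
print(spec.global_strides())  # (1024,)
```

An object along these lines could absorb the stride arithmetic and fail early with a readable message instead of an opaque driver error. How the resulting descriptor would then be handed to the launch API as a kernel argument is the second requirement above and is left open here.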