Description
Today, LaunchConfig supports only the cuLaunchKernel driver API for launching kernels on a single GPU. Broader use cases that need inter-SM or multi-GPU synchronization have to launch through cuLaunchCooperativeKernel so that the kernel runs safely and deadlock-free. To support this, LaunchConfig could be extended with an optional launch_attr argument, i.e. LaunchConfig(..., launch_attr=None), that accepts the cuda-python data type equivalent to CUlaunchAttribute.
Background:
This issue came out of the discussion in NVIDIA/numba-cuda#128 (comment). The existing CUDA driver bindings in numba-cuda call either cuLaunchCooperativeKernel or cuLaunchKernel depending on whether the kernel contains grid.sync(). Migrating that code to cuda.core requires the ability to select the launch API variant at runtime based on the LaunchConfig.
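To make the request concrete, here is a rough sketch of how the runtime selection could look. It is pure Python and does not touch cuda.core or the driver. `LaunchConfigSketch`, `LaunchAttrSketch`, and `select_launch_api` are hypothetical names invented for illustration, and representing the attribute as a string id plus an integer value is an assumption, not the actual cuda-python layout of CUlaunchAttribute.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LaunchAttrSketch:
    """Hypothetical stand-in for a CUlaunchAttribute-like (id, value) pair."""
    attr_id: str
    value: int


@dataclass(frozen=True)
class LaunchConfigSketch:
    """Hypothetical stand-in for LaunchConfig; NOT the cuda.core class."""
    grid: Tuple[int, int, int]
    block: Tuple[int, int, int]
    shmem_size: int = 0
    # Proposed optional argument; None keeps today's behaviour.
    launch_attr: Optional[Sequence[LaunchAttrSketch]] = None


# Assumed spelling of the attribute id, used here only as a plain string.
COOPERATIVE = "CU_LAUNCH_ATTRIBUTE_COOPERATIVE"


def select_launch_api(config: LaunchConfigSketch) -> str:
    """Return the name of the driver entry point the launcher would call."""
    attrs = config.launch_attr or ()
    # A non-zero cooperative attribute requests a cooperative launch.
    cooperative = any(a.attr_id == COOPERATIVE and a.value != 0 for a in attrs)
    return "cuLaunchCooperativeKernel" if cooperative else "cuLaunchKernel"


plain = LaunchConfigSketch(grid=(4, 1, 1), block=(128, 1, 1))
coop = LaunchConfigSketch(
    grid=(4, 1, 1),
    block=(128, 1, 1),
    launch_attr=[LaunchAttrSketch(COOPERATIVE, 1)],
)
print(select_launch_api(plain))
print(select_launch_api(coop))
```

With something like this, numba-cuda could set the attribute when it detects grid.sync() in a kernel and leave the choice of launch API to cuda.core. Existing callers that never pass launch_attr would be unaffected.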