softmax-kernels

GPU kernel optimization: Softmax
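
For orientation, the operation being optimized can be sketched in plain Python. This is only a reference definition of a numerically stable row-wise softmax, not code from this repository. The function names `softmax` and `softmax_rows` are hypothetical and chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row (a list of floats)."""
    # Subtracting the row maximum keeps exp() from overflowing on large inputs
    # and does not change the result.
    row_max = max(row)
    exps = [math.exp(x - row_max) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_rows(matrix):
    """Apply softmax independently to each row of a list-of-lists matrix."""
    return [softmax(row) for row in matrix]
```

A GPU kernel typically performs the same three passes per row (maximum, sum of exponentials, normalization), and a reference like this can be useful when checking kernel output against expected values.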

Usage

The following commands were tested using the Docker image nvidia/cuda:12.4.0-devel-ubuntu22.04 on a GeForce RTX 2070:

cd cuda
pip install .

python3 assertions.py

python3 benchmark.py

ncu --set full [-o output_path] python3 -O assertions.py