softmax-kernels

GPU kernel optimization: Softmax

Performance plot

Usage

The following commands were tested in the Docker image nvidia/cuda:12.4.0-devel-ubuntu22.04 on a GeForce RTX 2070.

  • Build Python library with CUDA bindings
cd cuda
pip install .
  • Test both implementations against the PyTorch baseline
python3 assertions.py
  • Run a benchmark
python3 benchmark.py
  • Profile both implementations
ncu --set full [-o output_path] python3 -O assertions.py
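
For readers who want a feel for what a correctness check against a baseline involves, here is a minimal pure-Python sketch. It is not the repository's assertions.py, whose contents are not shown here. The names softmax_reference, naive_softmax, and assert_matches are hypothetical, and naive_softmax merely stands in for a kernel under test. The reference version subtracts the row maximum before exponentiating, which is the usual way to keep softmax numerically stable.

```python
import math


def softmax_reference(row):
    """Numerically stable softmax of one row (a list of floats)."""
    m = max(row)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def naive_softmax(row):
    """Hypothetical stand-in for an implementation under test."""
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def assert_matches(candidate, reference, rows, rel_tol=1e-6, abs_tol=1e-9):
    """Check that candidate agrees with reference on every row, within tolerance."""
    for row in rows:
        got = candidate(row)
        want = reference(row)
        assert len(got) == len(want)
        for g, w in zip(got, want):
            assert math.isclose(g, w, rel_tol=rel_tol, abs_tol=abs_tol), (g, w)


# Small, assumed test inputs; a real check would also cover large rows.
rows = [[1.0, 2.0, 3.0], [-5.0, 0.0, 5.0], [0.0, 0.0, 0.0, 0.0]]
assert_matches(naive_softmax, softmax_reference, rows)
```

The comparison uses a tolerance instead of exact equality because different summation orders, as in a parallel reduction on the GPU, generally produce slightly different floating-point results.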

Articles