A header-only library containing state-of-the-art flash attention kernels, mega kernels, and full model serving. Built from scratch, documenting every step along the way.
So far, all code has been tested only on systems running CUDA >= 12.8 and Ubuntu 22.04.
```
git clone https://github.com/govindansriram/CobraML2.git
cd CobraML2
sudo chmod +x ./runner.sh
```

You can now build the executables by running:

```
./runner.sh
```

And run them using:

```
./runner.sh -r exe_name...
```
- MHA
- Iter 1: 287.925 GFLOPs
- Flash Attention 1
- Iter 1: 6776.64 GFLOPs
- Flash Attention 2
- Flash Attention 3
- Matmul
...
All files must be formatted to follow the style specified by `.clang-format`.
Ensure clang-format is installed by running `clang-format --version`.
Formatting can be applied to all files by running:
```
./runner.sh -f
```

Or to a single file:

```
./runner.
```