CUTLASS Notes

This series of CUTLASS notes begins with a minimal GEMM implementation and gradually expands to cover CuTe, various CUTLASS components, and features of newer architectures such as Hopper and Blackwell, with the end goal of a high-performance fused GEMM operator.

Usage

git clone https://github.com/ArthurinRUC/cutlass-notes.git
cd cutlass-notes
# fetch the CUTLASS submodule
git submodule update --init --recursive

Run sample code

Every example in this repository can be compiled and run by executing the Python script in its directory. For example:

cd 01-minimal-gemm
python minimal_gemm.py

Note list

Notes	Summary	Links
00-Intro	Brief introduction to CUTLASS	intro
01-minimal-gemm	Introduces CuTe fundamentals; implements a 16x8x8 GEMM kernel from scratch using a single MMA instruction; covers Python kernel invocation, precision validation, and performance benchmarking; profiles the kernel with Nsight Compute (ncu)	minimal-gemm
02-mixed-precision-gemm	Implements mixed-precision GEMM with varying input, output, and accumulation precisions; explores the technical details of numerical precision conversion within kernels; demonstrates a custom FP8 GEMM kernel implemented via PTX instructions (for MMA ops not supported by CUTLASS)	mixed-precision-gemm
03-tiled-mma	Introduces the key conceptual model of the GEMM operator, Three-Level Tiling; details the implementation of Tiled MMA operations in CUTLASS CuTe; explains the usage and semantics of the Tiled MMA API parameters; extends the GEMM kernel from a single instruction to a single tile operation	tiled-mma
04-tiled-copy	Coming soon	Stay tuned
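
To make the tiling idea from notes 01 and 03 concrete, here is a minimal pure-Python sketch, not the kernels from this repository. It assumes the three levels are thread-block tile, warp tile, and MMA-instruction tile, which is the usual CUTLASS hierarchy; the notes themselves may define the levels differently. The function names, the block and warp tile sizes, and the test shapes are all hypothetical choices for illustration. Only the 16x8x8 instruction shape comes from the note list above. The result is checked against a naive reference, in the spirit of the precision validation mentioned for note 01.

```python
import random

# Instruction-level tile, matching the 16x8x8 MMA shape named in note 01.
MMA_M, MMA_N, MMA_K = 16, 8, 8


def mma_16x8x8(A, B, C, m0, n0, k0):
    """Emulate one MMA: C[16x8] += A[16x8] @ B[8x8] at the given offsets."""
    for i in range(MMA_M):
        for j in range(MMA_N):
            acc = C[m0 + i][n0 + j]
            for k in range(MMA_K):
                acc += A[m0 + i][k0 + k] * B[k0 + k][n0 + j]
            C[m0 + i][n0 + j] = acc


def tiled_gemm(A, B, M, N, K, block=(32, 16), warp=(16, 8)):
    """Compute C = A @ B with block -> warp -> MMA tiling (sizes are hypothetical)."""
    (BM, BN), (WM, WN) = block, warp
    # This sketch only handles shapes that divide evenly at every level.
    assert M % BM == 0 and N % BN == 0 and K % MMA_K == 0
    assert BM % WM == 0 and BN % WN == 0
    assert WM % MMA_M == 0 and WN % MMA_N == 0
    C = [[0.0] * N for _ in range(M)]
    for bm in range(0, M, BM):                      # level 1: thread-block tiles
        for bn in range(0, N, BN):
            for wm in range(bm, bm + BM, WM):       # level 2: warp tiles
                for wn in range(bn, bn + BN, WN):
                    for im in range(wm, wm + WM, MMA_M):    # level 3: MMA tiles
                        for jn in range(wn, wn + WN, MMA_N):
                            for k0 in range(0, K, MMA_K):   # reduction over K
                                mma_16x8x8(A, B, C, im, jn, k0)
    return C


def reference_gemm(A, B, M, N, K):
    """Naive triple-loop GEMM used as the ground truth."""
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]


def max_abs_error(X, Y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


# Small demo: random inputs, tiled result compared against the reference.
random.seed(0)
M, N, K = 64, 32, 16
A = [[random.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(M)]
B = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(K)]
C = tiled_gemm(A, B, M, N, K)
print("max abs error vs reference:", max_abs_error(C, reference_gemm(A, B, M, N, K)))
```

On a GPU the two outer levels run in parallel across thread blocks and warps rather than as sequential loops, and each innermost call corresponds to a hardware MMA instruction; the loop nest here only shows how the output is partitioned.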

License

This project is licensed under the MIT License - see the LICENSE file for details.
