Ghiora/TritonLearning
Use the same instructions as in ../transformer_translation_python/HowToTrain.README, running transformer_translation_triton.py as the script:
python transformer_translation_triton.py
python transformer_translation_triton.py --train train.en train.de
RTX 4090 optimizations:
Block sizes tuned for the SM89 architecture (BLOCK_M=64, BLOCK_N=64)
Memory-efficient attention avoids materializing the full attention matrix
Fused operations reduce global memory traffic
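
To illustrate what "avoids materializing the full attention matrix" can mean, here is a minimal pure-Python sketch of blockwise attention with a running (online) softmax. It is not the Triton kernel from this repository: the function names `naive_attention` and `tiled_attention` are hypothetical, and the `block_n` argument only mirrors the role of BLOCK_N. For each query row, the sketch keeps a running maximum, a running sum and an output accumulator, so only one block of scores exists at a time.

```python
import math


def naive_attention(q, k, v):
    """Reference version: computes every score for a query row at once."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_n=64):
    """Streams over keys/values in blocks of block_n (assumed analogue of BLOCK_N)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf   # largest score seen so far
        running_sum = 0.0         # softmax denominator, relative to running_max
        acc = [0.0] * dim_v       # unnormalized output, relative to running_max
        for start in range(0, len(k), block_n):
            k_blk = k[start:start + block_n]
            v_blk = v[start:start + block_n]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum
            # (exp(-inf) is 0.0, so the first block starts from zero).
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [a * correction + sum(w * vj[d] for w, vj in zip(weights, v_blk))
                   for d, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Both functions should agree up to floating-point rounding for any `block_n >= 1`. In a GPU kernel the same rescaling trick lets the score block stay in on-chip memory instead of being written to global memory, which is also where fusing the softmax with the two matrix products saves traffic.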