GitHub - Ghiora/TritonLearning: Comments I added when going through the Triton GPU Tutorial code (using Claude/AI).

Comprehensive_Transformer_Architecture_Guide.md
README
Triton-01-vectoraddCUDA.py
src_vocab.json
tgt_vocab.json
train.de
train.en
transformer_translation_triton_annotated.py


Use the same instructions as in 
  ../transformer_translation_python/HowToTrain.README


python transformer_translation_triton.py
python transformer_translation_triton.py --train train.en train.de


RTX 4090 optimizations:

    Block sizes tuned for the SM89 architecture (Ada Lovelace, compute capability 8.9): BLOCK_M=64, BLOCK_N=64
    Memory-efficient attention avoids materializing the full attention matrix
    Fused operations reduce global memory traffic
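
To illustrate the memory-efficient attention idea above, here is a minimal NumPy sketch. It is not the Triton kernel from this repository. It only assumes the usual tiled, online-softmax formulation, processing BLOCK_M query rows against BLOCK_N key/value rows at a time so that the full (seq_q x seq_k) score matrix is never stored. The function names tiled_attention and naive_attention are hypothetical; only the block sizes come from the list above.

```python
import numpy as np

BLOCK_M = 64  # query rows per tile (value from the list above)
BLOCK_N = 64  # key/value rows per tile (value from the list above)


def naive_attention(q, k, v):
    """Reference version: materializes the full (seq_q, seq_k) score matrix."""
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs @ v


def tiled_attention(q, k, v, block_m=BLOCK_M, block_n=BLOCK_N):
    """Hypothetical sketch: attention computed tile by tile with an online softmax.

    Only one (block_m, block_n) tile of scores exists at any time.
    """
    seq_q, d = q.shape
    seq_k = k.shape[0]
    scale = 1.0 / np.sqrt(d)
    out = np.empty((seq_q, v.shape[1]))

    for m0 in range(0, seq_q, block_m):
        q_blk = q[m0:m0 + block_m]
        rows = q_blk.shape[0]
        row_max = np.full(rows, -np.inf)   # running max of scores per query row
        row_sum = np.zeros(rows)           # running softmax denominator
        acc = np.zeros((rows, v.shape[1])) # running unnormalized output

        for n0 in range(0, seq_k, block_n):
            s = (q_blk @ k[n0:n0 + block_n].T) * scale  # one score tile
            new_max = np.maximum(row_max, s.max(axis=1))
            # Rescale earlier partial results to the new running max.
            correction = np.exp(row_max - new_max)
            p = np.exp(s - new_max[:, None])
            row_sum = row_sum * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ v[n0:n0 + block_n]
            row_max = new_max

        out[m0:m0 + rows] = acc / row_sum[:, None]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((130, 32))
    k = rng.standard_normal((200, 32))
    v = rng.standard_normal((200, 32))
    print(np.allclose(tiled_attention(q, k, v), naive_attention(q, k, v)))
```

In a real GPU kernel the inner loop's score tile would typically stay in on-chip memory, and the scaling, softmax, and weighted sum would be fused into one pass; that is the connection to the "fused operations" item, though the details here are an assumption rather than a description of this repository's code.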