A minimal decoder-only Transformer language model trained from scratch to generate Shakespeare-like text. The model is implemented in PyTorch and trained at the character level.
- Architecture: Decoder-only Transformer (GPT-style)
- Parameters: ~10M
- Training Hardware: NVIDIA A100 GPU
- Dataset: Shakespeare text
- Framework: PyTorch
The model captures Shakespearean style, vocabulary, and dialogue structure, but due to its small size and character-level training, the generated text often lacks long-range coherence and semantic consistency. Outputs resemble Shakespeare in form and tone, but may contain repetition, grammatical drift, or nonsensical passages at longer lengths.
The project focuses on clarity and correctness, implementing causal self-attention, multi-head attention, residual connections, and layer normalization.
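To make the causal self-attention part concrete, here is a minimal sketch of one masked attention head in PyTorch. The class and argument names are illustrative assumptions, not copied from this repo's source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One head of causal (masked) self-attention.
    Names and defaults are illustrative, not this repo's actual code."""
    def __init__(self, embed_dim, head_dim, context_len):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_dim, bias=False)
        self.query = nn.Linear(embed_dim, head_dim, bias=False)
        self.value = nn.Linear(embed_dim, head_dim, bias=False)
        # Lower-triangular mask: position t may only attend to positions <= t
        self.register_buffer("tril", torch.tril(torch.ones(context_len, context_len)))

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scaled dot-product attention scores, (B, T, T)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5
        # Block attention to future positions, then normalize
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        return wei @ v  # (B, T, head_dim)
```

The triangular mask is what makes the model autoregressive: each character's representation is computed only from characters to its left.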
- 6 Transformer blocks
- Multi-head self-attention with 6 heads per block
- Embedding size: 384
- Context length: 256 characters
- Feed-forward expansion: 4× embedding size
- Causal self-attention (autoregressive)
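The hyperparameters above can be wired into a single block roughly as follows. This sketch uses PyTorch's built-in `nn.MultiheadAttention` for brevity, whereas a from-scratch implementation would compose per-head attention manually; layer names and the dropout value are assumptions:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Illustrative sketch of one of the 6 decoder blocks described above.
    Uses built-in multi-head attention; not this repo's exact code."""
    def __init__(self, n_embd=384, n_head=6, dropout=0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        # Feed-forward with 4x expansion, as listed above
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True entries are disallowed (future positions)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=x.device), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + a                       # residual around attention
        x = x + self.ffwd(self.ln2(x))  # residual around feed-forward
        return x
```

Stacking 6 such blocks between a character/position embedding and a final linear head gives the ~10M-parameter model described above.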
The model predicts the next character given a fixed-length context in a left-to-right, autoregressive manner.
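Generation then just repeats that one-step prediction: crop the context to the last 256 characters, sample the next character from the softmax distribution, append, and loop. A hedged sketch (the `model` callable returning `(B, T, vocab)` logits is an assumption about the interface):

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=256):
    """Autoregressive sampling loop; `model` is assumed to map a
    (B, T) tensor of character indices to (B, T, vocab) logits."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]        # crop to context length
        logits = model(idx_cond)
        logits = logits[:, -1, :]              # logits for the next character only
        probs = torch.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)  # sample one character
        idx = torch.cat((idx, idx_next), dim=1)
    return idx
```

Sampling (rather than greedy argmax) is what keeps the output varied, at the cost of the occasional nonsensical passage noted above.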
The model is already trained. To test text generation, simply run `python models/Char-Transformer/transformer-generate.py`. This will load the pretrained checkpoint and generate Shakespeare-like text directly in the terminal.
This project follows Andrej Karpathy’s tutorial on building a GPT-style language model from scratch, where he explains the Transformer architecture, self-attention, and training process in a clear and intuitive way.