A minimal decoder-only Transformer language model trained from scratch to generate Shakespeare-like text. The model is implemented in PyTorch and trained at the character level.
- Architecture: Decoder-only Transformer (GPT-style)
- Parameters: ~10M
- Training Hardware: NVIDIA A100 GPU
- Dataset: Shakespeare text
- Framework: PyTorch
The model captures Shakespearean style, vocabulary, and dialogue structure, but due to its small size and character-level training, the generated text often lacks long-range coherence and semantic consistency. Outputs resemble Shakespeare in form and tone, but may contain repetition, grammatical drift, or nonsensical passages at longer lengths.
The project focuses on clarity and correctness, implementing causal self-attention, multi-head attention, residual connections, and layer normalization.
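To make the causal self-attention part concrete, here is a minimal sketch of one masked attention head in PyTorch. The class and argument names are illustrative assumptions, not copied from this repo's source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One head of causal (masked) self-attention.
    Names and defaults are illustrative, not this repo's actual code."""
    def __init__(self, embed_dim, head_dim, context_len):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_dim, bias=False)
        self.query = nn.Linear(embed_dim, head_dim, bias=False)
        self.value = nn.Linear(embed_dim, head_dim, bias=False)
        # Lower-triangular mask: position t may only attend to positions <= t
        self.register_buffer("tril", torch.tril(torch.ones(context_len, context_len)))

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scaled dot-product attention scores, (B, T, T)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5
        # Block attention to future positions, then normalize
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        return wei @ v  # (B, T, head_dim)
```

The triangular mask is what makes the model autoregressive: each character's representation is computed only from characters to its left.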
- 6 Transformer blocks
- Multi-head self-attention with 6 heads per block
- Embedding size: 384
- Context length: 256 characters
- Feed-forward expansion: 4× embedding size
- Causal self-attention (autoregressive)
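The hyperparameters above can be wired into a single block roughly as follows. This sketch uses PyTorch's built-in `nn.MultiheadAttention` for brevity, whereas a from-scratch implementation would compose per-head attention manually; layer names and the dropout value are assumptions:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Illustrative sketch of one of the 6 decoder blocks described above.
    Uses built-in multi-head attention; not this repo's exact code."""
    def __init__(self, n_embd=384, n_head=6, dropout=0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        # Feed-forward with 4x expansion, as listed above
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True entries are disallowed (future positions)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=x.device), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + a                       # residual around attention
        x = x + self.ffwd(self.ln2(x))  # residual around feed-forward
        return x
```

Stacking 6 such blocks between a character/position embedding and a final linear head gives the ~10M-parameter model described above.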
The model predicts the next character given a fixed-length context in a left-to-right, autoregressive manner.
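Generation then just repeats that one-step prediction: crop the context to the last 256 characters, sample the next character from the softmax distribution, append, and loop. A hedged sketch (the `model` callable returning `(B, T, vocab)` logits is an assumption about the interface):

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=256):
    """Autoregressive sampling loop; `model` is assumed to map a
    (B, T) tensor of character indices to (B, T, vocab) logits."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]        # crop to context length
        logits = model(idx_cond)
        logits = logits[:, -1, :]              # logits for the next character only
        probs = torch.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)  # sample one character
        idx = torch.cat((idx, idx_next), dim=1)
    return idx
```

Sampling (rather than greedy argmax) is what keeps the output varied, at the cost of the occasional nonsensical passage noted above.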
The model is already trained. To test text generation, simply run `python models/Char-Transformer/transformer-generate.py`. This will load the pretrained checkpoint and generate Shakespeare-like text directly in the terminal.
This project follows Andrej Karpathy’s tutorial on building a GPT-style language model from scratch, where he explains the Transformer architecture, self-attention, and training process in a clear and intuitive way.