Educational project for learning encoder-decoder Transformers through a simple algorithmic task. This small sequence-to-sequence Transformer is trained to map an unsorted integer sequence to its sorted counterpart.
⚡ This Transformer model can be trained in a few minutes on a consumer-grade GPU. Let's give it a try!
Sort integer sequences by using a Transformer.
```
Input:  [5, 2, 8, 1, 9]
Output: [BOS, 1, 2, 5, 8, 9, EOS]
```
- Vanilla encoder-decoder Transformer (Pre-LN)
- Learned token embeddings + sinusoidal positional encodings
- Greedy decoding during evaluation
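Greedy decoding at evaluation time can be sketched as below. This is a minimal illustration: the `encode`/`decode` methods, tensor shapes, and vocabulary size are assumptions for the sketch, not the project's actual model API.

```python
import torch

PAD, BOS, EOS = 0, 1, 2

def greedy_decode(model, src, max_len=16):
    """Autoregressively pick the argmax token at each step until EOS.

    Assumes a model exposing encode(src) -> memory and
    decode(ys, memory) -> logits of shape (batch, cur_len, vocab).
    """
    memory = model.encode(src)                       # encode the unsorted input once
    ys = torch.tensor([[BOS]], dtype=torch.long)     # decoding starts from BOS
    for _ in range(max_len):
        logits = model.decode(ys, memory)            # (1, cur_len, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy choice
        ys = torch.cat([ys, next_token], dim=1)      # append and continue
        if next_token.item() == EOS:                 # stop once EOS is emitted
            break
    return ys
```

Greedy decoding is sufficient here because sorting has a single correct output; beam search would add complexity without improving accuracy on this task.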
- Integer sequences are generated synthetically at runtime
- Token IDs are offset by `NUM_SPECIAL_TOKENS = 3` to reserve: `PAD = 0`, `BOS = 1`, `EOS = 2`
- Integer values are mapped to tokens `>= 3`
For an unsorted input sequence `[5, 2, 8]`:

- Encoder input: `[5 + NUM_SPECIAL_TOKENS, 2 + NUM_SPECIAL_TOKENS, 8 + NUM_SPECIAL_TOKENS] = [8, 5, 11]`
- Decoder input (teacher forcing): `[BOS] + [2 + NUM_SPECIAL_TOKENS, 5 + NUM_SPECIAL_TOKENS, 8 + NUM_SPECIAL_TOKENS] = [1, 5, 8, 11]`
- Training targets (loss labels): `[2 + NUM_SPECIAL_TOKENS, 5 + NUM_SPECIAL_TOKENS, 8 + NUM_SPECIAL_TOKENS] + [EOS] = [5, 8, 11, 2]`
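The construction above can be written in a few lines of Python. This is a standalone sketch of the token offsetting and teacher-forcing shift; the function name is illustrative and not the project's actual dataset code.

```python
PAD, BOS, EOS = 0, 1, 2
NUM_SPECIAL_TOKENS = 3

def make_example(values):
    """Build encoder input, decoder input, and loss targets for one sequence."""
    src = [v + NUM_SPECIAL_TOKENS for v in values]              # unsorted, offset
    sorted_tokens = [v + NUM_SPECIAL_TOKENS for v in sorted(values)]
    decoder_input = [BOS] + sorted_tokens                       # teacher forcing
    targets = sorted_tokens + [EOS]                             # shifted by one step
    return src, decoder_input, targets

src, dec_in, tgt = make_example([5, 2, 8])
# src    -> [8, 5, 11]
# dec_in -> [1, 5, 8, 11]
# tgt    -> [5, 8, 11, 2]
```

Note that the decoder input and the targets are the same sorted sequence shifted by one position: at each step the decoder sees all previous correct tokens and must predict the next one.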
Cross-entropy loss is used, with PAD tokens ignored.
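In PyTorch, ignoring PAD positions corresponds to `nn.CrossEntropyLoss` with `ignore_index` set to the PAD id. A sketch, assuming logits of shape `(batch, seq_len, vocab)`; the shapes and vocabulary size are illustrative:

```python
import torch
import torch.nn as nn

PAD = 0
criterion = nn.CrossEntropyLoss(ignore_index=PAD)  # PAD targets contribute no loss

# Dummy shapes: batch of 2, target length 4, vocabulary of 13 tokens.
logits = torch.randn(2, 4, 13)             # model outputs
targets = torch.tensor([[5, 8, 11, 2],
                        [6, 9, 2, PAD]])   # second sequence is padded

# CrossEntropyLoss expects the class dimension second, i.e. (N, C, ...),
# so move the vocabulary dimension from position 2 to position 1.
loss = criterion(logits.transpose(1, 2), targets)
```

Without `ignore_index`, the model would be rewarded for predicting PAD at padded positions, skewing training on batches with mixed sequence lengths.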
Make sure you have uv installed on your system. If you don't have it yet, you can install it by following the instructions here.
```
uv sync --extra dev
uv pip install -e .
```

Train:

```
uv run scripts/run_training.py
```

This will train the model and save the checkpoints in the `checkpoints/` directory.
You can then run the model interactively:
```
uv run scripts/play_model.py checkpoints/model_YYYYMMDD_HHMMSS.pt
```

All settings are defined in `config.yaml`:
- `data.*` (sequence length, value range, dataset size, batch size)
- `model.*` (dimensions, heads, layers, dropout)
- `training.*` (optimizer, epochs, eval interval)
- `seed` (reproducibility)
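A `config.yaml` following this layout might look like the sketch below. The key names inside each section and all values are illustrative assumptions; consult the shipped file for the actual settings.

```yaml
seed: 42                # reproducibility

data:
  seq_len: 8            # sequence length
  max_value: 99         # value range
  dataset_size: 100000
  batch_size: 128

model:
  d_model: 128          # dimensions
  num_heads: 4
  num_layers: 2
  dropout: 0.1

training:
  optimizer: adamw
  epochs: 10
  eval_interval: 500
```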
```
.
├── config.yaml                      # Training configuration and random seed
├── checkpoints/                     # Saved model checkpoints
├── logs/                            # Training and evaluation logs
├── scripts/
│   ├── run_training.py              # Training loop with periodic evaluation
│   └── play_model.py                # Interactive inference script
├── transformer_sort/
│   ├── models/
│   │   └── *.py                     # Transformer model components
│   └── utils/
│       └── *.py                     # Dataset, masks, evaluation, seeding
├── tests/
│   └── test_padding_equivalence.py  # Padding equivalence tests
├── pyproject.toml                   # Project metadata and tooling
└── README.md                        # This document
```