A toy project inspired by nanochat and the modded-nanogpt speedrun, where people try to speedrun training a GPT-2-level model with modern improvements.
This is just for my own learning.
- `mini_llm/config.py`: model, training, sampling, and W&B config defaults
- `mini_llm/data.py`: dataset bootstrap, tokenizer, and batch sampling
- `mini_llm/model.py`: GPT blocks, attention, forward pass, generation
- `mini_llm/train.py`: device selection, optimizer setup, loss estimation, training loop, and W&B logging
- `mini_llm/cli.py`: runnable entrypoint that reads config defaults and trains/samples
- `main.py`: thin entrypoint
```shell
uv run python main.py
```

Configs and hyperparameters are in `mini_llm/config.py`.
For Weights & Biases logging, set `TRAIN_CONFIG.wandb.enabled = True` and configure the project, run name, or mode there as needed.
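A minimal sketch of what that toggle could look like in `mini_llm/config.py`, assuming a dataclass-style config; the exact class names and default values here are illustrative, not the project's actual definitions:

```python
from dataclasses import dataclass, field


@dataclass
class WandbConfig:
    enabled: bool = False      # flip to True to turn on W&B logging
    project: str = "mini-llm"  # W&B project name (illustrative default)
    run_name: str = "baseline" # shows up as the run title in the W&B UI
    mode: str = "online"       # "online", "offline", or "disabled"


@dataclass
class TrainConfig:
    wandb: WandbConfig = field(default_factory=WandbConfig)


TRAIN_CONFIG = TrainConfig()

# Enable logging for this run and point it at a project:
TRAIN_CONFIG.wandb.enabled = True
TRAIN_CONFIG.wandb.project = "mini-llm-experiments"
```

With a layout like this, the training loop can check `TRAIN_CONFIG.wandb.enabled` once at startup and skip all logging calls when it is off.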