A toy project inspired by nanochat and the modded-nanogpt speedrun, where people try to speedrun training a GPT-2-level model with modern improvements.
This is just for my own learning.
- `mini_llm/config.py`: model, training, sampling, and W&B config defaults
- `mini_llm/data.py`: dataset bootstrap, tokenizer, and batch sampling
- `mini_llm/model.py`: GPT blocks, attention, forward pass, generation
- `mini_llm/train.py`: device selection, optimizer setup, loss estimation, training loop, and W&B logging
- `mini_llm/cli.py`: runnable entrypoint that reads config defaults and trains/samples
- `main.py`: thin entrypoint
```shell
uv run python main.py
```

Configs and hyperparameters are in `mini_llm/config.py`.
For Weights & Biases logging, set `TRAIN_CONFIG.wandb.enabled = True` and configure the project, run name, or mode there as needed.
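A minimal sketch of what that toggle could look like in `mini_llm/config.py`, assuming a dataclass-style config; the exact class names and default values here are illustrative, not the project's actual definitions:

```python
from dataclasses import dataclass, field


@dataclass
class WandbConfig:
    enabled: bool = False      # flip to True to turn on W&B logging
    project: str = "mini-llm"  # W&B project name (illustrative default)
    run_name: str = "baseline" # shows up as the run title in the W&B UI
    mode: str = "online"       # "online", "offline", or "disabled"


@dataclass
class TrainConfig:
    wandb: WandbConfig = field(default_factory=WandbConfig)


TRAIN_CONFIG = TrainConfig()

# Enable logging for this run and point it at a project:
TRAIN_CONFIG.wandb.enabled = True
TRAIN_CONFIG.wandb.project = "mini-llm-experiments"
```

With a layout like this, the training loop can check `TRAIN_CONFIG.wandb.enabled` once at startup and skip all logging calls when it is off.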