Hi! I'm Ernesto Cedeño, a Software Engineering student, and this is my own mini GPT-style language model, built from scratch using only Python and NumPy.
The goal of this project is educational: to understand how an autoregressive language model works internally, without relying on high-level deep learning frameworks.
- Trains a character-level language model from plain text (`data/input.txt`)
- Uses:
  - Character vocabulary
  - Embeddings
  - Fixed context window (block size)
  - A small MLP (hidden layer with tanh)
  - Cross-entropy loss and gradient descent
- Generates new text character by character, in the style of the training data
This is not meant to compete with GPT-4, of course 🙂
But it helps to understand the core ideas behind large language models.
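As a rough sketch of the data preparation described above (the variable names here are illustrative, not the exact ones in `mini_gpt_v2.py`):

```python
import numpy as np

text = "hola mundo, hola modelo"   # stand-in for the contents of data/input.txt
block_size = 4                     # fixed context window

# Character vocabulary: one integer id per unique character
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

# (context, next_char) pairs via a sliding window over the text
X, Y = [], []
for i in range(len(text) - block_size):
    X.append([stoi[c] for c in text[i:i + block_size]])
    Y.append(stoi[text[i + block_size]])
X, Y = np.array(X), np.array(Y)

print(X.shape, Y.shape)  # one training pair per window position
```

Every position in the text yields one training example: `block_size` character ids as input and the following character id as the target.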
The file `mini_gpt_v2.py` implements:
- `MiniGPTMLP` class:
  - Builds a vocabulary from the training text
  - Creates (context, next_char) pairs using a sliding window
  - Learns embeddings for each character
  - Concatenates embeddings → passes them through an MLP
  - Predicts the probability distribution over the next character
- Training loop:
  - Mini-batch gradient descent
  - Cross-entropy loss
  - Periodic loss logging
- Text generation:
  - Starts from an initial text like `"Hola"`
  - Uses the last `block_size` characters as context
  - Samples the next character from the model's probabilities
  - Repeats autoregressively
- Weight saving:
  - Saves trained parameters to `mini_gpt_v2_weights.npz`
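The forward pass described above (concatenated embeddings → tanh hidden layer → softmax over characters, scored with cross-entropy) can be sketched in plain NumPy. All sizes, names, and initializations below are illustrative assumptions, not the actual ones in `mini_gpt_v2.py`:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, block_size = 30, 4     # illustrative sizes
emb_dim, hidden = 8, 32

# Parameters (small random init)
C  = rng.normal(0, 0.1, (vocab_size, emb_dim))           # embedding table
W1 = rng.normal(0, 0.1, (block_size * emb_dim, hidden))  # hidden layer
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.1, (hidden, vocab_size))            # output layer
b2 = np.zeros(vocab_size)

def forward(X):
    """X: (batch, block_size) integer ids -> (batch, vocab_size) probabilities."""
    emb = C[X]                                           # (batch, block_size, emb_dim)
    h = np.tanh(emb.reshape(len(X), -1) @ W1 + b1)       # concatenate + tanh MLP
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, Y):
    """Mean negative log-likelihood of the true next characters."""
    return -np.log(probs[np.arange(len(Y)), Y] + 1e-9).mean()

X = rng.integers(0, vocab_size, (16, block_size))        # a fake mini-batch
Y = rng.integers(0, vocab_size, 16)
loss = cross_entropy(forward(X), Y)
print(loss)   # near log(vocab_size) for untrained random weights
```

A useful sanity check when training from scratch: before any gradient steps, the loss should sit close to `log(vocab_size)`, since the model's output is roughly uniform.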
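The generation and weight-saving steps (keep the last `block_size` characters as context, sample the next character from the model's probabilities, repeat, then save with `np.savez`) can be sketched as follows. The toy vocabulary, random weights, and file name here are placeholders, not the real trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a small vocabulary and random weights, just to show the sampling loop
chars = list("abcdefghij ")
stoi = {c: i for i, c in enumerate(chars)}
itos = dict(enumerate(chars))
vocab_size, block_size, emb_dim, hidden = len(chars), 4, 8, 16

C  = rng.normal(0, 0.1, (vocab_size, emb_dim))
W1 = rng.normal(0, 0.1, (block_size * emb_dim, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.1, (hidden, vocab_size));           b2 = np.zeros(vocab_size)

def next_char_probs(context_ids):
    emb = C[np.array(context_ids)].reshape(1, -1)        # concatenate embeddings
    h = np.tanh(emb @ W1 + b1)
    logits = (h @ W2 + b2).ravel()
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generate(seed, n_chars):
    out = list(seed)
    for _ in range(n_chars):
        ctx = out[-block_size:]                          # last block_size chars
        ctx = [" "] * (block_size - len(ctx)) + ctx      # left-pad short seeds
        p = next_char_probs([stoi[c] for c in ctx])
        out.append(itos[rng.choice(vocab_size, p=p)])    # sample, don't argmax
    return "".join(out)

print(generate("hi ", 20))

# Weight saving: the same idea behind mini_gpt_v2_weights.npz
np.savez("weights_demo.npz", C=C, W1=W1, b1=b1, W2=W2, b2=b2)
```

Sampling (rather than always taking the most likely character) is what keeps the generated text varied; `np.load` on the saved `.npz` file restores the parameter arrays by name.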
```
mini_gpt_ernesto/
│
├── data/
│   └── input.txt        # Training text dataset
│
├── mini_gpt_v2.py       # Model + training + generation
└── README.md            # Project documentation
```