Letter-Forge symbolizes both the ancient craft of shaping written language
and the modern act of building models that learn linguistic structure from scratch.
A forge where letters learn to think.
Letter-Forge is a from-scratch implementation of Transformer architectures for character-level learning and language modeling.
It explores how attention, memory, and positional structure can emerge from simple sequences of letters, transforming raw symbols into learned meaning.
Letter-Forge is built to craft language understanding from the ground up.
It begins with a minimalist Transformer Encoder that learns counting and pattern recognition at the character level,
and extends to a full Transformer Language Model capable of predicting and generating text sequences.
Each component, from self-attention to positional encoding, is implemented manually to illustrate the inner mechanics of modern deep learning models.
| Component | Description |
|---|---|
| Custom Transformer Encoder | Built from first principles using PyTorch layers (Linear, Softmax, ReLU) — no off-the-shelf Transformer modules. |
| Self-Attention Mechanism | Implements single-head attention using learned queries, keys, and values; visualizes attention maps between character positions (see the sketch after this table). |
| Positional Encoding | Supports both learned and sinusoidal positional embeddings to inject order awareness into the model. |
| Transformer Language Model (LM) | Extends the encoder into a causal language model predicting the next character given context. |
| Visualization & Analysis | Generates heatmaps showing how the model “looks back” over previous symbols while learning structure. |
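The sketch below shows, in miniature, how these pieces fit together: a sinusoidal positional encoding and a single attention head built only from `Linear` layers and a softmax, with an optional causal mask for the language-modeling variant. It is an illustrative sketch, not the repository's exact code; the class names, the unbatched `[seq_len, d_model]` tensor layout, and the even `d_model` are assumptions.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sin/cos encoding added to embeddings to inject order.
    Assumes an even d_model."""
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                    # x: [seq_len, d_model]
        return x + self.pe[: x.size(0)]

class SingleHeadAttention(nn.Module):
    """One attention head from plain Linear layers; no nn.Transformer."""
    def __init__(self, d_model: int, causal: bool = False):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.causal = causal                 # True for the LM in part 2

    def forward(self, x):                    # x: [seq_len, d_model]
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        if self.causal:                      # hide future positions
            future = torch.triu(torch.ones_like(scores), diagonal=1).bool()
            scores = scores.masked_fill(future, float("-inf"))
        weights = torch.softmax(scores, dim=-1)  # each row sums to 1
        return weights @ v, weights          # weights are what gets plotted
```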
```
letter-forge/
│
├── part-1_encoder/                  # Transformer Encoder (character-level)
│   ├── data/
│   │   ├── lettercounting-train.txt
│   │   └── lettercounting-dev.txt
│   ├── letter_counting.py           # Driver script
│   ├── transformer.py               # Core encoder + attention implementation
│   └── utils.py                     # Indexer & helper utilities
│
├── part-2_lm/                       # Transformer Language Model
│   ├── data/
│   │   ├── text8-100k.txt
│   │   ├── text8-dev.txt
│   │   └── text8-test.txt
│   ├── lm.py                        # Driver & evaluation (perplexity, sanity checks)
│   ├── transformer_lm.py            # LM model + training loop
│   └── utils.py
│
├── sandbox_utils/                   # Development & testing scripts
│   ├── data_pipeline_verifier.py
│   ├── attention_pe_module_test.py
│   ├── attention_validation_suite.py
│   └── repro_training_logger.py
│
├── artifacts/                       # Saved models & metadata
├── plots/                           # Attention heatmaps & visual outputs
└── README.md
```
```bash
# Clone the repository
git clone https://github.com/<your-username>/letter-forge.git
cd letter-forge

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate   # or .\venv\Scripts\activate on Windows

# Install dependencies
pip install torch numpy matplotlib
```

Train and test the character-level counting model:
```bash
cd part-1_encoder
python letter_counting.py
```

- Predicts how many times each letter has appeared before in a sequence (see the label sketch below).
- Visualizes attention maps showing how the model “looks back” over earlier tokens.
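For reference, here is one way the counting labels could be derived. This is a sketch of the task, not the repository's exact preprocessing; in particular, the cap on counts is an assumption, chosen so the labels form a small fixed set of classes.

```python
from collections import Counter

def prior_occurrence_counts(text: str, cap: int = 2) -> list[int]:
    """Label each position with how many times its character has
    already appeared earlier in the string (the cap is an assumption)."""
    seen: Counter = Counter()
    labels = []
    for ch in text:
        labels.append(min(seen[ch], cap))
        seen[ch] += 1
    return labels

print(prior_occurrence_counts("aabab"))  # -> [0, 1, 0, 2, 1]
```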
Sample Attention Visualization:
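The saved heatmaps under `plots/` are produced along these lines; a minimal matplotlib sketch, where the function name and save path are illustrative:

```python
import matplotlib.pyplot as plt
import torch

def plot_attention(weights: torch.Tensor, chars: str, path: str = "plots/attn.png"):
    """Render a [seq_len, seq_len] attention map: rows are queries,
    columns are the positions each query attends to."""
    fig, ax = plt.subplots()
    ax.imshow(weights.detach().cpu().numpy(), cmap="viridis")
    ax.set_xticks(range(len(chars)))
    ax.set_xticklabels(list(chars))
    ax.set_yticks(range(len(chars)))
    ax.set_yticklabels(list(chars))
    ax.set_xlabel("attended position (key)")
    ax.set_ylabel("query position")
    fig.savefig(path, bbox_inches="tight")
```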
Train a Transformer to predict the next character in a text sequence:
```bash
cd part-2_lm
python lm.py --model NEURAL
```

- Learns from the text8 dataset (100k character subset).
- Evaluates on perplexity and token-level likelihood (see the sketch below).
- Produces valid probability distributions for every step.
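Perplexity here is the exponential of the mean negative log-likelihood per character, and a "valid probability distribution" means each softmax row sums to 1. A minimal sketch of both checks, assuming the model has already produced next-character logits (the function names are illustrative, not `lm.py`'s actual interface):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """logits: [seq_len, vocab_size]; targets: [seq_len] of next-char ids.
    Perplexity = exp(mean negative log-likelihood per character)."""
    return float(torch.exp(F.cross_entropy(logits, targets)))

def is_valid_distribution(logits: torch.Tensor) -> bool:
    """Sanity check: every softmax row should sum to 1."""
    probs = F.softmax(logits, dim=-1)
    return torch.allclose(probs.sum(dim=-1), torch.ones(probs.size(0)))
```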
| Model | Task | Metric | Result |
|---|---|---|---|
| Transformer Encoder | Character counting (BEFORE) | Accuracy | 98.3% |
| Transformer Encoder | Character counting (BEFOREAFTER) | Accuracy | 97–99% (tuned) |
| Transformer LM | Next-char prediction (text8) | Perplexity | ≤ 7 (target) |
| Attention Visualization | Pattern detection | Qualitative | Highlights same-character attention clusters |
The model successfully learns to identify prior occurrences of letters and extends this ability to generate context-aware text sequences.
- Self-Attention = Contextual Memory: Each token attends to relevant predecessors, forming a learned memory of prior occurrences.
- Positional Encoding Enables Order Awareness: Without positional information, the model treats input as a bag of symbols; with it, order emerges (a quick check follows this list).
- From Counting to Composition: The same underlying structure that counts characters can generate language — showing the continuum from perception to composition.
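The "bag of symbols" point can be checked directly: without positional encoding, self-attention is permutation-equivariant, so shuffling the input merely shuffles the output. A quick test, reusing the `SingleHeadAttention` sketch from earlier:

```python
import torch

attn = SingleHeadAttention(d_model=16)   # sketch class from the table above
x = torch.randn(7, 16)                   # 7 characters, no positions added
perm = torch.randperm(7)
out, _ = attn(x)
out_shuffled, _ = attn(x[perm])
# Outputs are identical up to the same shuffle: order never mattered.
assert torch.allclose(out[perm], out_shuffled, atol=1e-6)
```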
- Checkpoints: `artifacts/model_*.pt` (see the loading sketch below)
- Metadata: `artifacts/run_meta.json`
- Plots: `plots/*.png` (attention heatmaps, loss curves, etc.)
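Loading a saved run might look like the sketch below; the module import, constructor arguments, metadata keys, and checkpoint filename are hypothetical and should be adapted to the actual files under `artifacts/`.

```python
import json
import torch
from transformer_lm import TransformerLM     # hypothetical import

with open("artifacts/run_meta.json") as f:
    meta = json.load(f)                       # hyperparameters of the run

model = TransformerLM(**meta["model_args"])   # hypothetical metadata key
state = torch.load("artifacts/model_lm.pt", map_location="cpu")  # hypothetical filename
model.load_state_dict(state)                  # assumes a saved state_dict
model.eval()
```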
| Script | Purpose |
|---|---|
| `data_pipeline_verifier.py` | Validates dataset shapes and the preprocessing pipeline. |
| `attention_pe_module_test.py` | Unit-tests the positional encoding and attention modules. |
| `attention_validation_suite.py` | Verifies attention masks and tensor consistency. |
| `repro_training_logger.py` | Reproducible 3-epoch experiment logger (saves artifacts). |
Letter-Forge is built on a simple idea:
that language understanding isn’t magic: it’s forged through repeated interaction between memory, order, and meaning.
By crafting each layer manually, we can see how modern intelligence emerges, one letter at a time.
d-senyaka
AI & Data Science Developer · Deep Learning Enthusiast · Language Technology Researcher
This project is released under the MIT License.
You are free to use, modify, and distribute it with attribution.
- Inspired by open research in attention mechanisms and neural sequence modeling.
- Crafted with curiosity, patience, and an appreciation for both language and logic.
“To forge a mind of letters is to understand the art of attention.”

