A comprehensive implementation of the Transformer architecture from scratch, building understanding through first principles and detailed explanations.
- Complete Transformer Implementation: From positional encodings to multi-head attention
- Educational Focus: Detailed explanations of why each component exists and how it works
- From-Scratch Approach: Built using PyTorch primitives without relying on high-level transformer libraries
- Architecture Coverage:
- Positional Embeddings (sinusoidal; see the sketch after this list)
- Scaled Dot-Product Attention
- Multi-Head Attention
- Feed-Forward Networks
- Encoder and Decoder Blocks
- Causal Masking
- Full Encoder-Decoder Transformer
- Full Decoder-Only Transformer, and its core differences from the encoder-decoder design
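As a reference point for the list above, here is a minimal sketch of the sinusoidal positional encoding. The class name and the `d_model` / `max_len` parameters follow common conventions and are not necessarily the exact signatures used in the notebook; even `d_model` is assumed.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Fixed sinusoidal position embeddings, as in 'Attention Is All You Need'."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
        # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
        position = torch.arange(max_len).unsqueeze(1)                       # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # Stored as a buffer: part of the module's state, but not a trainable parameter
        self.register_buffer("pe", pe.unsqueeze(0))                         # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the positional pattern for the first seq_len positions
        return x + self.pe[:, : x.size(1)]
```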
- PositionalEncoding: Fixed sinusoidal embeddings that provide position information
- ScaledDotProductAttention: Core attention mechanism with proper scaling
- MultiHeadAttention: Parallel attention heads with learned projections (both attention modules are sketched after this list)
- FeedForward: Position-wise non-linear transformations
- EncoderBlock: Self-attention + FFN with residual connections
- DecoderBlock: Masked self-attention + cross-attention + FFN
- Transformer: Complete encoder-decoder architecture
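For orientation, here is a minimal sketch of the two attention pieces listed above: scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, wrapped by multi-head projections. The function and parameter names (`num_heads`, `w_q`, and so on) are illustrative and may differ from the notebook's implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)        # (..., seq_q, seq_k)
    if mask is not None:
        # Blocked positions (mask == 0) get -inf, so softmax assigns them zero weight
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Separate learned projections for queries, keys, and values
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)

        # Project, split d_model into (num_heads, d_head), and move heads before the sequence dim
        def split(x):
            return x.view(batch, -1, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v))
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # Concatenate heads back into d_model and apply the output projection
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_head)
        return self.w_o(out)
```

In self-attention the same tensor is passed as `q`, `k`, and `v`; in the decoder's cross-attention, queries come from the decoder and keys/values from the encoder output.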
- Post-LayerNorm: LayerNorm applied after each residual connection, following the original Transformer paper
- Residual Connections: Enable training of deep models
- Causal Masking: Ensures autoregressive behavior in the decoder (see the sketch after this list)
- Learned Projections: Separate Q, K, V projections for flexibility
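To make these design decisions concrete, here is a minimal sketch of a position-wise feed-forward layer and an encoder block using post-LayerNorm residuals, plus a causal-mask helper. It assumes the `MultiHeadAttention` sketch above is in scope; the names (`FeedForward`, `EncoderBlock`, `causal_mask`, `d_ff`) are illustrative, dropout is omitted, and the notebook's actual code may differ.

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise FFN: two linear layers with a non-linearity in between."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        return self.net(x)

class EncoderBlock(nn.Module):
    """Self-attention + FFN, each wrapped in a residual connection with post-LayerNorm."""
    def __init__(self, d_model: int, num_heads: int, d_ff: int):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, num_heads)  # from the sketch above
        self.ffn = FeedForward(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, mask=None):
        # Post-LN: normalize after adding the residual, as in the original paper
        x = self.norm1(x + self.attn(x, x, x, mask))
        x = self.norm2(x + self.ffn(x))
        return x

def causal_mask(seq_len: int) -> torch.Tensor:
    """Lower-triangular mask: position i may attend only to positions <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
```

Passing `causal_mask(seq_len)` into the attention call blocks every position from attending to later positions, which is what makes the decoder (or a decoder-only model) autoregressive.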
GPT_From_Scratch/
├── GPT_From_Scratch (2).ipynb # Main implementation notebook
├── index.html # Static HTML export of notebook
└── README.md # This file
- Python 3.7+
- PyTorch
- Jupyter Notebook
# Clone the repository
git clone <repository-url>
cd GPT_From_Scratch
# Install dependencies
pip install torch torchvision torchaudio
pip install jupyter notebook

# Start Jupyter notebook
jupyter notebook
# Open and run "GPT_From_Scratch (2).ipynb"

This implementation is designed for learning and understanding. Each component includes:
- Detailed Comments: Explaining the "why" behind design choices
- Mathematical Context: References to original papers and theory
- Architectural Rationale: Why certain components are necessary
- Historical Context: Evolution from RNNs to Transformers
This repository is primarily educational. Contributions that improve clarity, fix bugs, or enhance the educational value are welcome.
This project is provided for educational purposes. Please refer to the original papers for proper attribution when using these concepts.