This repository documents my journey of building a Large Language Model (LLM) from scratch
- Studied the basics of Large Language Models (LLMs)
- Revised Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks
- Started watching videos and reading up on the basics
- Implemented a simple tokenizer from scratch and added special tokens → Tokenizer.ipynb
- Implemented Byte Pair Encoding (BPE) using `tiktoken` → Bytepairencoding.ipynb (see sketch below)
- Implemented input-target data pair generation using PyTorch's `DataLoader` → TargetPair.ipynb (see sketch below)
- Explored vector embeddings → Word2Vec Google News (300D)
- Created a token embedder in PyTorch using `torch.nn.Embedding` → TokenEmbedding.ipynb (see sketch below)
- Implemented positional token embedding in PyTorch → PositionalTokenEmbedding.ipynb (covered in the same sketch)
- Read *Attention Is All You Need*
- Explored simplified, self-, causal, and multi-head attention, and why RNNs fall short
- Reviewed the history of RNNs, LSTMs, and Transformers
- Learned about Bahdanau attention
- Implemented a simplified attention mechanism with non-trainable weights from scratch → SimplifiedAttention.ipynb
- Implemented a self-attention mechanism with trainable query, key, and value weight matrices from scratch → SelfAttention.ipynb
- Implemented a causal attention mechanism with dropout from scratch → CasualAttention.ipynb (see sketch below)
- Implemented a multi-head attention mechanism from scratch using a simple implementation → Multihead.ipynb
- Implemented a multi-head attention mechanism from scratch with weight splits in a single class (no wrapper class) → Multihead.ipynb (see sketch below)
- Added boilerplate code for the GPT-2 architecture → BoilerplateCode.ipynb
- Implemented a layer normalization class for the LLM → LayerNorm.ipynb (see sketch below)
- Implemented a feed-forward network with GELU activations for the LLM → Gelu.ipynb (covered in the same sketch)
- Implemented shortcut (skip) connections for the LLM → ShortCutconnection.ipynb
- Implemented the entire LLM transformer block → Transformer.ipynb
- Coded the 124-million-parameter GPT-2 model → GPT2.ipynb (see config sketch below)
- Coded GPT-2 next-token prediction → nextwordprediction.ipynb
- Implemented cross-entropy loss and perplexity for the LLM → Loss.ipynb (see sketch below)
- Evaluated LLM performance on a real dataset → GPT2_RealDataset.ipynb
- Coded the entire LLM pre-training loop → GPT2_entirePretraining.ipynb
- Implemented temperature scaling in the LLM → TemperatureScaling.ipynb (see sketch below)
- Implemented top-k sampling in the LLM → TOP-Ksampling.ipynb (covered in the same sketch)
- Saved and loaded LLM model weights using PyTorch → Save_Load_weights.ipynb (see sketch below)
- Loaded pre-trained OpenAI GPT-2 124M weights → Loading_OPEN-AI_weights.ipynb
- Trained using OpenAI GPT-2 774M weights → 774M_weights.ipynb
- Classification-finetuned the 124M model on a spam classification dataset → SpamClassificationFinetuned.ipynb
- Classification-finetuned the 124M model on the PubMed 20k dataset → MedicalClassificationFinetuned.ipynb
- Instruction-finetuned the 124M model using Alpaca-style prompt formatting → InstructionFinetuned.ipynb
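
The sketches below are minimal illustrations of some of the building blocks listed above. They follow the general approach of the notebooks but are not the notebook code itself; sample text, sizes, and names are illustrative assumptions. First, BPE tokenization with `tiktoken` (Bytepairencoding.ipynb):

```python
import tiktoken

# GPT-2's byte pair encoding vocabulary (50,257 tokens)
tokenizer = tiktoken.get_encoding("gpt2")

text = "Hello, do you like tea? <|endoftext|> In the sunlit terraces"
ids = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
print(ids)                    # list of integer token IDs
print(tokenizer.decode(ids))  # decodes back to the original text
```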
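
A sketch of input-target pair generation with a sliding window over the token IDs, wrapped in a PyTorch `Dataset`/`DataLoader` (TargetPair.ipynb); the class name, window size, and sample sentence are assumptions, not necessarily what the notebook uses:

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Slides a fixed-length window over the token IDs; each target
    sequence is the input sequence shifted one position to the right."""
    def __init__(self, text, tokenizer, max_length=4, stride=1):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

tokenizer = tiktoken.get_encoding("gpt2")
dataset = SlidingWindowDataset("In the heart of the city stood an old library.", tokenizer)
loader = DataLoader(dataset, batch_size=2, shuffle=False)
inputs, targets = next(iter(loader))
print(inputs.shape, targets.shape)   # torch.Size([2, 4]) torch.Size([2, 4])
```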
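
A sketch of token embeddings plus absolute positional embeddings with `torch.nn.Embedding` (TokenEmbedding.ipynb and PositionalTokenEmbedding.ipynb); the dimensions and token IDs here are made up for illustration:

```python
import torch

vocab_size, context_length, emb_dim = 50257, 4, 256   # illustrative sizes

token_emb = torch.nn.Embedding(vocab_size, emb_dim)
pos_emb = torch.nn.Embedding(context_length, emb_dim)

# A batch of token IDs with shape (batch_size, context_length)
token_ids = torch.tensor([[40, 367, 2885, 1464],
                          [1807, 3619, 402, 271]])

token_vectors = token_emb(token_ids)                 # (2, 4, 256)
positions = torch.arange(context_length)             # [0, 1, 2, 3]
input_vectors = token_vectors + pos_emb(positions)   # positions broadcast over the batch
print(input_vectors.shape)                           # torch.Size([2, 4, 256])
```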
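
A sketch of single-head causal attention with trainable query/key/value projections, an upper-triangular mask, and dropout on the attention weights (CasualAttention.ipynb); hyperparameters and names are illustrative:

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """Single-head causal self-attention with dropout on the attention weights."""
    def __init__(self, d_in, d_out, context_length, dropout=0.1, qkv_bias=False):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask blocks attention to future tokens
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2)
        scores.masked_fill_(self.mask[:num_tokens, :num_tokens], float("-inf"))
        weights = torch.softmax(scores / k.shape[-1] ** 0.5, dim=-1)
        weights = self.dropout(weights)
        return weights @ v

x = torch.randn(2, 6, 16)   # (batch, tokens, d_in)
attn = CausalAttention(d_in=16, d_out=16, context_length=6)
print(attn(x).shape)        # torch.Size([2, 6, 16])
```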
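
A sketch of multi-head causal attention in a single class: project once, then split the output dimension across heads instead of wrapping several single-head modules (Multihead.ipynb); again, sizes and names are assumptions:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal multi-head attention with the weight split done inside one class."""
    def __init__(self, d_in, d_out, context_length, num_heads, dropout=0.1, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape

        def split(t):
            # (b, tokens, d_out) -> (b, num_heads, tokens, head_dim)
            return t.view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(self.W_query(x)), split(self.W_key(x)), split(self.W_value(x))
        scores = q @ k.transpose(2, 3)
        scores.masked_fill_(self.mask[:num_tokens, :num_tokens], float("-inf"))
        weights = self.dropout(torch.softmax(scores / self.head_dim ** 0.5, dim=-1))
        # Merge heads back into a single embedding dimension
        context = (weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)

x = torch.randn(2, 6, 32)
mha = MultiHeadAttention(d_in=32, d_out=32, context_length=6, num_heads=4)
print(mha(x).shape)   # torch.Size([2, 6, 32])
```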
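
Sketches of the layer normalization and GELU feed-forward modules used inside the transformer block (LayerNorm.ipynb and Gelu.ipynb); the 4x expansion factor follows the GPT-2 design, the rest is illustrative:

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Normalizes the last dimension to zero mean and unit variance,
    with a learnable scale and shift."""
    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(emb_dim))
        self.shift = nn.Parameter(torch.zeros(emb_dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return self.scale * (x - mean) / torch.sqrt(var + self.eps) + self.shift

class FeedForward(nn.Module):
    """Position-wise feed-forward block: expand 4x, apply GELU, project back."""
    def __init__(self, emb_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )

    def forward(self, x):
        return self.layers(x)

x = torch.randn(2, 6, 768)
print(FeedForward(768)(LayerNorm(768)(x)).shape)   # torch.Size([2, 6, 768])
```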
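
For reference, the published hyperparameters of the 124M-parameter GPT-2 ("small") model that GPT2.ipynb builds, written as a config dict; the dict keys, dropout rate, and `qkv_bias` choice are illustrative rather than the notebook's exact settings:

```python
# Published architecture sizes of the 124M-parameter GPT-2 model
GPT_CONFIG_124M = {
    "vocab_size": 50257,      # BPE vocabulary size
    "context_length": 1024,   # maximum sequence length
    "emb_dim": 768,           # embedding / hidden size
    "n_heads": 12,            # attention heads per layer
    "n_layers": 12,           # transformer blocks
    "drop_rate": 0.1,         # dropout rate (training choice, not fixed by the architecture)
    "qkv_bias": False,        # query/key/value bias; the OpenAI weights use a bias term
}
```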
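
A sketch of the training loss: cross-entropy over flattened (batch, sequence) positions, with perplexity as its exponential (Loss.ipynb); the logits and targets here are random toy tensors:

```python
import torch
import torch.nn.functional as F

# Toy model outputs: (batch, seq_len, vocab_size) logits and the target token IDs
logits = torch.randn(2, 4, 50257)
targets = torch.randint(0, 50257, (2, 4))

# Flatten batch and sequence dimensions before computing cross-entropy
loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

# Perplexity is the exponential of the cross-entropy loss
perplexity = torch.exp(loss)
print(loss.item(), perplexity.item())
```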
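
A sketch of decoding with temperature scaling and top-k filtering (TemperatureScaling.ipynb and TOP-Ksampling.ipynb); the helper function name and default values are assumptions:

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Pick the next token ID from the logits of the last position."""
    if top_k is not None:
        # Keep only the top_k largest logits and mask out the rest
        top_logits, _ = torch.topk(logits, top_k)
        logits = torch.where(logits < top_logits[..., -1, None],
                             torch.tensor(float("-inf")), logits)
    if temperature > 0:
        # Higher temperature flattens the distribution; lower sharpens it
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1)
    # Temperature of 0 falls back to greedy decoding
    return torch.argmax(logits, dim=-1, keepdim=True)

logits = torch.randn(1, 50257)   # logits for the last token position
print(sample_next_token(logits, temperature=0.8, top_k=50))
```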
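
A sketch of saving and loading model and optimizer state with PyTorch (Save_Load_weights.ipynb); the tiny stand-in model, learning rate, and file name are placeholders:

```python
import torch
import torch.nn as nn

# A tiny stand-in model; in the notebook this would be the GPT-2 model
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

checkpoint_path = "checkpoint.pth"   # illustrative file name

# Save model and optimizer state so training can be resumed later
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, checkpoint_path)

# Load the states back
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
model.eval()   # switch to evaluation mode; use model.train() to resume training
```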