AlignmentBank is an ongoing repository of paper notes and selective code reproductions, focused on foundational LLM techniques, reasoning research, and, above all, alignment. It is a “bank” that aims to keep alignment-related knowledge flowing at a zero interest rate 😎.
The mission is to maintain a continuously evolving, high-quality archive of both theoretical and empirical developments in LLM alignment. The broader vision is to grow this initiative into a community-oriented resource.
The repository is actively maintained with weekly updates.
| Theme | Subtopics |
|---|---|
| Attention Mechanisms | MHA, MQA, SWA, GQA |
| LLM Architectures | Fundamentals, Implementations |
| LLM Family | GPT, LLaMA, Qwen, DeepSeek, etc. |
| LLM Alignment | Algorithms, Frameworks, LLM Reasoning, etc. |
- Scaled dot-product self-attention / Multi-Head Attention (MHA): Attention Is All You Need (Vaswani et al., 2017)
- Multi-Query Attention (MQA): Fast Transformer Decoding: One Write-Head is All You Need (Shazeer, 2019)
- Sliding Window Attention (SWA): Longformer: The Long-Document Transformer (Beltagy et al., 2020)
- Grouped-Query Attention (GQA): Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (Ainslie et al., 2023)
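The attention variants above differ mainly in how many key/value heads the query heads share. A minimal NumPy sketch of grouped-query attention (shapes and head counts are illustrative, not taken from any particular paper) makes the relationship explicit: GQA reduces to MHA when the KV head count equals the query head count, and to MQA when there is a single KV head.

```python
# Minimal sketch of Grouped-Query Attention (GQA) in plain NumPy.
# Head counts and dimensions below are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    # q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d)
    n_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_heads // n_kv_heads       # query heads per shared KV head
    # MHA when n_kv_heads == n_heads, MQA when n_kv_heads == 1.
    k = np.repeat(k, group, axis=0)     # broadcast KV heads to query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))         # 8 query heads
k = rng.normal(size=(2, 4, 16))         # 2 shared KV heads
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)                        # (8, 4, 16)
```

The practical appeal of MQA/GQA is the smaller KV cache at decode time: only `n_kv_heads` key/value tensors are stored per layer instead of `n_heads`.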
- Reproducing dense models, with GPT-2 as an example (largely inspired by Andrej Karpathy's video)
- Reproducing MoE models
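At the core of the MoE reproductions above is the router: a learned gate scores every expert per token and only the top-k experts are activated. A toy sketch of top-k routing (the function name and shapes are illustrative, not from a specific implementation):

```python
# Toy top-k MoE router sketch: softmax gating over experts, keep top-k,
# renormalize the kept gates so they sum to 1 per token.
import numpy as np

def top_k_route(logits, k=2):
    # logits: (tokens, n_experts) raw router scores
    idx = np.argsort(logits, axis=-1)[:, -k:]            # indices of the k best experts
    picked = np.take_along_axis(logits, idx, axis=-1)
    picked = picked - picked.max(axis=-1, keepdims=True) # stable softmax
    gates = np.exp(picked)
    gates = gates / gates.sum(axis=-1, keepdims=True)    # per-token mixture weights
    return idx, gates

logits = np.array([[0.1, 2.0, -1.0, 0.5]])               # 1 token, 4 experts
idx, gates = top_k_route(logits, k=2)
print(idx, gates)
```

Each token's output is then the gate-weighted sum of its selected experts' outputs; real systems add load-balancing losses on top of this routing step.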
- GPT: Improving Language Understanding by Generative Pre-Training (Radford et al., 2018)
- GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019)
- GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020)
- InstructGPT: Training Language Models to Follow Instructions with Human Feedback (Ouyang et al., 2022)
- LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023)
- Llama 2: Open Foundation and Fine-Tuned Chat Models (Touvron et al., 2023)
- Llama 3: The Llama 3 Herd of Models (Llama Team, 2024)
- Llama 4
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (DeepSeek-AI, 2024)
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (Dai et al., 2024)
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (DeepSeek-AI, 2024)
- DeepSeek-V3: DeepSeek-V3 Technical Report (DeepSeek-AI, 2024)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek-AI, 2025)
- OLMo: Accelerating the Science of Language Models (Groeneveld et al., 2024)
- OLMo 2: 2 OLMo 2 Furious (OLMo Team, 2024)
- Gemma: Open Models Based on Gemini Research and Technology (Gemma Team, 2024)
- Gemma 2: Improving Open Language Models at a Practical Size (Gemma Team, 2024)
- PPO: Proximal Policy Optimization Algorithms (Schulman et al., 2017)
- GRPO: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Shao et al., 2024)
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale (Yu et al., 2025)
- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks (Yue et al., 2025)
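A distinguishing feature of GRPO relative to PPO is that it replaces the learned value model with a group-relative baseline: several completions are sampled per prompt, and each completion's advantage is its reward normalized within that group. A minimal sketch of that normalization step (variable names illustrative; real trainers combine this with a clipped policy-gradient loss and a KL penalty):

```python
# Sketch of the group-relative advantage used by GRPO-style trainers:
# rewards for completions of the same prompt are standardized within the
# group, so no separate value network is needed.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)   # zero-mean, ~unit-variance per group

# Four sampled completions to one prompt, binary correctness rewards:
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)   # correct answers get positive advantage, wrong ones negative
```

DAPO and VAPO refine this recipe (e.g., dynamic sampling, decoupled clipping, and value-model variants), but the group-normalized advantage above is the shared starting point.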
- Training Verifiers to Solve Math Word Problems (Cobbe et al., 2021)
- Solving Math Word Problems with Process- and Outcome-Based Feedback (Uesato et al., 2022)
- Generative Verifiers: Reward Modeling as Next-Token Prediction (Zhang et al., 2024)