diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 7e187132f..99579901a 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -9,6 +9,8 @@ Rust library for NP-hard problem reductions. Implements computational problems w - [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. - [review-implementation](skills/review-implementation/SKILL.md) -- Review implementation completeness by dispatching parallel subagents (structural + quality) with fresh context. Auto-detects new models/rules from git diff. Called automatically at the end of `add-model`/`add-rule`, after each `executing-plans` batch, or standalone via `/review-implementation`. - [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments (user + Copilot), fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`). +- [write-model-in-paper](skills/write-model-in-paper/SKILL.md) -- Write or improve a problem-def entry in the Typst paper. Covers formal definition, background, example with visualization, and algorithm list. +- [write-rule-in-paper](skills/write-rule-in-paper/SKILL.md) -- Write or improve a reduction-rule entry in the Typst paper. Covers complexity citation, self-contained proof, detailed example, and verification. - [release](skills/release/SKILL.md) -- Create a new crate release. Determines version bump from diff, verifies tests/clippy, then runs `make release`. 
## Commands diff --git a/.claude/skills/add-model/SKILL.md b/.claude/skills/add-model/SKILL.md index aa3ea0df1..d09ccd816 100644 --- a/.claude/skills/add-model/SKILL.md +++ b/.claude/skills/add-model/SKILL.md @@ -120,9 +120,7 @@ Link the test file via `#[cfg(test)] #[path = "..."] mod tests;` at the bottom o ## Step 6: Document in paper -Update `docs/paper/reductions.typ`: -- Add to the `display-name` dictionary: `"ProblemName": [Display Name],` -- Add a `#problem-def("ProblemName")[...]` block with the mathematical definition +Invoke the `/write-model-in-paper` skill to write the problem-def entry in `docs/paper/reductions.typ`. That skill covers the full authoring process: formal definition, background, example with visualization, algorithm list, and verification checklist. ## Step 7: Verify diff --git a/.claude/skills/add-rule/SKILL.md b/.claude/skills/add-rule/SKILL.md index a54e85d87..1ce40cd78 100644 --- a/.claude/skills/add-rule/SKILL.md +++ b/.claude/skills/add-rule/SKILL.md @@ -129,20 +129,7 @@ example_fn!(test__to_, reduction__to_); ## Step 5: Document in paper -Update `docs/paper/reductions.typ`: - -```typst -#reduction-rule("Source", "Target", - example: true, - example-caption: [Caption text], -)[ - Reduction rule statement... -][ - Proof sketch... -] -``` - -Present the example in tutorial style with clear intuition. Reference the KColoring -> QUBO section for style guidance. +Invoke the `/write-rule-in-paper` skill to write the reduction-rule entry in `docs/paper/reductions.typ`. That skill covers the full authoring process: complexity citation, self-contained proof, detailed worked example, and verification checklist. 
## Step 6: Regenerate graph and verify diff --git a/.claude/skills/fix-pr/SKILL.md b/.claude/skills/fix-pr/SKILL.md index f42d7ae96..ed9d8cb9d 100644 --- a/.claude/skills/fix-pr/SKILL.md +++ b/.claude/skills/fix-pr/SKILL.md @@ -23,13 +23,13 @@ Three sources of feedback to check: ```bash # Copilot and user inline review comments (on code lines) -gh api repos/{owner}/{repo}/pulls/$PR/comments --jq '.[] | "[\(.user.login)] \(.path):\(.line // .original_line) — \(.body)"' +gh api repos/{owner}/{repo}/pulls/$PR/comments --jq '.[] | "[" + .user.login + "] " + .path + ":" + ((.line // .original_line) | tostring) + " — " + .body' # Review-level comments (top-level review body) -gh api repos/{owner}/{repo}/pulls/$PR/reviews --jq '.[] | select(.body != "") | "[\(.user.login)] \(.state): \(.body)"' +gh api repos/{owner}/{repo}/pulls/$PR/reviews --jq '.[] | select(.body != "") | "[" + .user.login + "] " + .state + ": " + .body' # Issue-level comments (general discussion) -gh api repos/{owner}/{repo}/issues/$PR/comments --jq '.[] | select(.user.login | test("codecov|copilot") | not) | "[\(.user.login)] \(.body)"' +gh api repos/{owner}/{repo}/issues/$PR/comments --jq '.[] | select(.user.login | test("codecov|copilot") | not) | "[" + .user.login + "] " + .body' ``` ### 1b. Check CI Status @@ -37,7 +37,7 @@ gh api repos/{owner}/{repo}/issues/$PR/comments --jq '.[] | select(.user.login | ```bash # All check runs on the PR head gh api repos/{owner}/{repo}/commits/$HEAD_SHA/check-runs \ - --jq '.check_runs[] | "\(.name): \(.conclusion // .status)"' + --jq '.check_runs[] | .name + ": " + (.conclusion // .status)' ``` ### 1c. 
Check Codecov Report @@ -104,7 +104,7 @@ For detailed line-by-line coverage, use the Codecov API: # Get file-level coverage for the PR gh api repos/{owner}/{repo}/pulls/$PR/comments \ --jq '.[] | select(.user.login == "codecov[bot]") | .body' \ - | grep -oP 'filepath=\K[^&]+' + | sed -n 's/.*filepath=\([^&]*\).*/\1/p' ``` Then read the source files and identify which new/changed lines lack test coverage. diff --git a/.claude/skills/write-model-in-paper/SKILL.md b/.claude/skills/write-model-in-paper/SKILL.md new file mode 100644 index 000000000..9ebaf7986 --- /dev/null +++ b/.claude/skills/write-model-in-paper/SKILL.md @@ -0,0 +1,181 @@ +--- +name: write-model-in-paper +description: Use when writing or improving a problem-def entry in the Typst paper (docs/paper/reductions.typ) +--- + +# Write Problem Model in Paper + +Full authoring guide for writing a `problem-def` entry in `docs/paper/reductions.typ`. Covers formal definition, background, examples with visualization, and verification. + +## Prerequisites + +Before using this skill, ensure: +- The problem model is implemented (`src/models//.rs`) +- The problem is registered with schema and variant metadata +- JSON exports are up to date (`make rust-export && make export-schemas`) + +## Reference Example + +**MaximumIndependentSet** in `docs/paper/reductions.typ` is the gold-standard model example. Search for `problem-def("MaximumIndependentSet")` to see the complete entry. Use it as a template for style, depth, and structure. + +## The `problem-def` Function + +```typst +#problem-def("ProblemName")[ + Formal definition... // parameter 1: def +][ + Background, example, figure... 
// parameter 2: body +] +``` + +**Three parameters:** +- `name` (string) — problem name matching `display-name` dictionary key +- `def` (content) — formal mathematical definition +- `body` (content) — background, examples, figures, algorithm list + +**Auto-generated between `def` and `body`:** +- Variant complexity table (from Rust `declare_variants!` metadata) +- Reduction links (from reduction graph JSON) +- Schema field table (from problem schema JSON) + +## Step 1: Register Display Name + +Add to the `display-name` dictionary near the top of `reductions.typ`: + +```typst +"ProblemName": [Display Name], +``` + +## Step 2: Write the Formal Definition (`def` parameter) + +One self-contained sentence or short paragraph. Requirements: + +1. **Introduce all inputs first** — graph, weights, sets, variables with their domains +2. **State the objective or constraint** — what is being optimized or satisfied +3. **Define all notation before use** — every symbol must be introduced before it appears + +### Pattern for optimization problems + +```typst +Given [inputs with domains], find [solution variable] [maximizing/minimizing] [objective] such that [constraints]. +``` + +### Pattern for satisfaction problems + +```typst +Given [inputs with domains], find [solution variable] such that [constraints]. +``` + +### Example (MIS) + +```typst +Given $G = (V, E)$ with vertex weights $w: V -> RR$, find $S subset.eq V$ +maximizing $sum_(v in S) w(v)$ such that no two vertices in $S$ are +adjacent: $forall u, v in S: (u, v) in.not E$. +``` + +## Step 3: Write the Body + +The body goes AFTER the auto-generated sections (complexity, reductions, schema). It contains four parts in order: + +### 3a. 
Background & Motivation + +1-3 sentences covering: +- Historical context (e.g., "One of Karp's 21 NP-complete problems") +- Applications (e.g., "appears in wireless network scheduling, register allocation") +- Notable structural properties (e.g., "Solvable in polynomial time on bipartite graphs, interval graphs, chordal graphs") + +If the user provides specific justification or motivation, incorporate it here. + +### 3b. Best Known Algorithms + +The entry must clearly state which algorithm achieves the best known complexity and cite a reference. Add a warning footnote if no reliable reference is found. + +Integrate algorithm complexity naturally into the background prose — do NOT append a terse "Best known: $O^*(...)$" at the end: + +```typst +% Good: names the algorithm, cites a reference +The best known algorithm runs in $O^*(1.1996^n)$ time via measure-and-conquer +branching @xiao2017. + +% Good: brute-force with footnote when no better algorithm is known +The best known algorithm runs in $O^*(2^n)$ by brute-force +enumeration#footnote[No algorithm improving on brute-force is known for ...]. + +% Bad: terse appendage, no algorithm name, no reference +Best known: $O^*(2^n)$. +``` + +For problems with multiple notable algorithms or special cases, weave them into the text: +```typst +Solvable in $O(n+m)$ for $k=2$ via bipartiteness testing. For $k=3$, the best +known algorithm runs in $O^*(1.3289^n)$ @beigel2005; in general, inclusion-exclusion +achieves $O^*(2^n)$ @bjorklund2009. 
+``` + +**Citation rules:** +- Every complexity claim MUST have a citation (`@key`) identifying the algorithm +- If the best known is brute-force enumeration with no specialized algorithm, add footnote: `#footnote[No algorithm improving on brute-force enumeration is known for ...]` +- If a reference exists but has not been independently verified, add footnote: `#footnote[Complexity not independently verified from literature.]` +- Include approximation results where relevant (e.g., "0.878-approximation @goemans1995") + +**Consistency note:** The auto-generated complexity table (from `declare_variants!`) also shows complexity per variant. The written text and the auto-generated table may overlap. Keep both — the written text provides references and context; the auto-generated table provides per-variant detail. A future verification step will check consistency between them. + +### 3c. Example with Visualization + +A concrete small instance that illustrates the problem. Requirements: + +1. **Small enough to verify by hand** — readers should be able to check the solution +2. **Include a diagram/graph** using the paper's visualization helpers +3. **Show a valid/optimal solution** and explain why it is valid/optimal +4. **Walk through evaluation** — show how the objective/verifier computes the solution value + +Structure: + +```typst +*Example.* Consider [instance description with concrete numbers]. +[Describe the solution and why it's valid/optimal]. + +#figure({ + // visualization code — see MaximumIndependentSet for graph rendering pattern +}, +caption: [Caption describing the figure with key parameters], +) +``` + +**For graph problems**, use the paper's existing graph helpers: +- `petersen-graph()`, `house-graph()` or define custom vertex/edge lists +- `canvas(length: ..., { ... 
})` with `g-node()` and `g-edge()` +- Highlight solution elements with `graph-colors.at(0)` (blue) and use `white` fill for non-solution + +Refer to the **MaximumIndependentSet** entry for the complete graph rendering pattern. Adapt it to your problem. + +### 3d. Evaluation Explanation + +Explain how a configuration is evaluated — this maps to the Rust `evaluate()` method: +- For optimization: show the cost function computation on the example solution +- For satisfaction: show the verifier check on the example solution + +This can be woven into the example text (as MIS does: "$w(S) = sum_(v in S) w(v) = 4 = alpha(G)$"). + +## Step 4: Build and Verify + +```bash +# Regenerate exports (if not already done) +make rust-export && make export-schemas + +# Build the paper +make paper +``` + +### Verification Checklist + +- [ ] **Display name registered**: entry exists in `display-name` dictionary +- [ ] **Notation self-contained**: every symbol in `def` is defined before first use +- [ ] **Background present**: historical context, applications, or structural properties +- [ ] **Algorithms cited**: every complexity claim has `@citation` or footnote warning +- [ ] **Example present**: concrete small instance with visualization +- [ ] **Evaluation shown**: objective/verifier computed on the example solution +- [ ] **Diagram included**: figure with caption and label for graph/matrix/set visualization +- [ ] **Paper compiles**: `make paper` succeeds without errors +- [ ] **Complexity consistency**: written complexity and auto-generated variant table are compatible (note any discrepancies for later review) diff --git a/.claude/skills/write-rule-in-paper/SKILL.md b/.claude/skills/write-rule-in-paper/SKILL.md new file mode 100644 index 000000000..e08d852c8 --- /dev/null +++ b/.claude/skills/write-rule-in-paper/SKILL.md @@ -0,0 +1,240 @@ +--- +name: write-rule-in-paper +description: Use when writing or improving a reduction-rule entry in the Typst paper (docs/paper/reductions.typ) 
+--- + +# Write Reduction Rule in Paper + +Full authoring guide for writing a `reduction-rule` entry in `docs/paper/reductions.typ`. Covers Typst mechanics, writing quality, and verification. + +## Reference Example + +**KColoring → QUBO** in `docs/paper/reductions.typ` is the gold-standard reduction example. Search for `reduction-rule("KColoring", "QUBO"` to see the complete entry. Use it as a template for style, depth, and structure. + +## Prerequisites + +Before using this skill, ensure: +- The reduction is implemented and tested (`src/rules/_.rs`) +- An example program exists (`examples/reduction__to_.rs`) +- Example JSON is generated (`make examples`) +- The reduction graph is up to date (`make rust-export`) + +## Step 1: Load Example Data + +```typst +#let src_tgt = load-example("_to_") +#let src_tgt_r = load-results("_to_") +#let src_tgt_sol = src_tgt_r.solutions.at(0) +``` + +Where: +- `load-example(name)` loads `examples/{name}.json` — contains source/target problem instances +- `load-results(name)` loads `examples/{name}.result.json` — contains solution configs +- Access fields: `src_tgt.source.instance`, `src_tgt_sol.source_config`, `src_tgt_sol.target_config` + +## Step 2: Write the Theorem Body (Rule Statement) + +The theorem body is a concise block with three parts: + +### 2a. Complexity with Reference + +State the reduction's time complexity with a citation. Examples: + +```typst +% With verified reference: +This $O(n + m)$ reduction @Author2023 constructs ... + +% Without verified reference — add footnote: +This $O(n^2)$ reduction#footnote[Complexity not independently verified from literature.] constructs ... +``` + +**Verification**: Identify the best known reference for this reduction's complexity. If you cannot find a peer-reviewed or textbook source, you MUST add the footnote. + +### 2b. Construction Summary + +One sentence describing what the reduction builds: + +```typst +... constructs an intersection graph $G' = (V', E')$ where ... 
+``` + +### 2c. Overhead Hint + +State target dimensions in terms of source. This complements the auto-derived overhead (which appears automatically from JSON edge data): + +```typst +... ($n k$ variables indexed by $v dot k + c$). +``` + +### Complete theorem body example + +```typst +][ + Given $G = (V, E)$ with $k$ colors, construct upper-triangular + $Q in RR^(n k times n k)$ using one-hot encoding $x_(v,c) in {0,1}$ + ($n k$ variables indexed by $v dot k + c$). +] +``` + +## Step 3: Write the Proof Body + +The proof must be **self-contained** (all notation defined before use) and **reproducible** (enough detail to reimplement the reduction from the proof alone). + +### Structure + +Use these subsections in order. Use italic labels exactly as shown: + +```typst +][ + _Construction._ ... + + _Correctness._ ... + + _Variable mapping._ ... // only if the reduction has a non-trivial variable mapping + + _Solution extraction._ ... +] +``` + +### 3a. Construction + +Full mathematical construction of the target instance. Define all symbols and notation here. + +**For standard reductions** (< 300 LOC): Write the complete construction with enough math to reimplement. + +**For heavy reductions** (300+ LOC): Briefly describe the approach and cite a reference: +```typst +_Construction._ The reduction follows the standard Cook–Levin construction @Cook1971, +encoding each gate as a set of clauses. See @Source for full details. +``` + +### 3b. Correctness + +Bidirectional (iff) argument showing solution correspondence. Use ($arrow.r.double$) and ($arrow.l.double$) for each direction: + +```typst +_Correctness._ ($arrow.r.double$) If $S$ is independent, then ... +($arrow.l.double$) If $C$ is a vertex cover, then ... +``` + +### 3c. Variable Mapping (if applicable) + +Explicitly state how source variables map to target variables. Include this section when the mapping is non-trivial (encoding, expansion, reindexing). Omit for identity mappings or trivial complement operations. 
+ +```typst +_Variable mapping._ Vertices $= {S_1, ..., S_m}$, edges $= {(S_i, S_j) : S_i inter S_j != emptyset}$, $w(v_i) = w(S_i)$. +``` + +### 3d. Solution Extraction + +How to convert a target solution back to a source solution: + +```typst +_Solution extraction._ For each vertex $v$, find $c$ with $x_(v,c) = 1$. +``` + +## Step 4: Write the Worked Example (Extra Block) + +Detailed by default. Only use a brief example for trivially obvious reductions (complement, identity). + +### 4a. Typst Skeleton + +```typst +#reduction-rule("Source", "Target", + example: true, + example-caption: [Description ($n = ...$, $|E| = ...$)], + extra: [ + // Optional: graph visualization + #{ + // canvas code for graph rendering + } + + *Step 1 -- [action].* [description with concrete numbers] + + *Step 2 -- [action].* [construction details] + + // ... more steps as needed + + *Step N -- Verify a solution.* [end-to-end verification] + + *Count:* #src_tgt_r.solutions.len() optimal solutions ... + ], +) +``` + +### 4b. Step-by-Step Content + +Each step should: +1. **Name the action** in bold: `*Step K -- [verb phrase].*` +2. **Show concrete numbers** from the example instance (use Typst expressions to extract from JSON, not hardcoded values) +3. **Explain where overhead comes from** — e.g., "5 vertices x 3 colors = 15 QUBO variables" + +### 4c. Required Steps + +| Step | Content | +|------|---------| +| First | Show the source instance (dimensions, structure). Include graph visualization if applicable. | +| Middle | Walk through the construction. Show intermediate values. Explicitly quantify overhead. | +| Second-to-last | Verify a concrete solution end-to-end (source config → target config, check validity). | +| Last | Solution count: `#src_tgt_r.solutions.len()` with brief combinatorial justification. | + +### 4d. 
Graph Visualization (if applicable) + +```typst +#{ + let fills = src_tgt_sol.source_config.map(c => graph-colors.at(c)) + align(center, canvas(length: 0.8cm, { + for (u, v) in graph.edges { g-edge(graph.vertices.at(u), graph.vertices.at(v)) } + for (k, pos) in graph.vertices.enumerate() { + g-node(pos, name: str(k), fill: fills.at(k), label: str(k)) + } + })) +} +``` + +### 4e. Accessing Solution Data + +```typst +// Source configuration (e.g., color assignments) +#src_tgt_sol.source_config.map(str).join(", ") + +// Target configuration (e.g., binary encoding) +#src_tgt_sol.target_config.map(str).join(", ") + +// Number of optimal solutions +#src_tgt_r.solutions.len() + +// Source instance fields +#src_tgt.source.instance.num_vertices +``` + +## Step 5: Register Display Name (if new problem) + +If this is a new problem not yet in the paper, add to the `display-name` dictionary near the top of `reductions.typ`: + +```typst +"ProblemName": [Display Name], +``` + +## Step 6: Build and Verify + +```bash +# Regenerate example JSON (if not already done) +make examples + +# Build the paper +make paper +``` + +### Verification Checklist + +- [ ] **Notation self-contained**: every symbol is defined before first use within the proof +- [ ] **Complexity cited**: reference exists, or footnote added for unverified claims +- [ ] **Overhead consistent**: prose dimensions match auto-derived overhead from JSON edge data +- [ ] **Example uses JSON data**: concrete values come from `load-example`/`load-results`, not hardcoded +- [ ] **Solution verified**: at least one solution checked end-to-end in the example +- [ ] **Solution count**: `solutions.len()` stated with combinatorial explanation +- [ ] **Paper compiles**: `make paper` succeeds without errors +- [ ] **Completeness check**: no new warnings about missing edges in the paper + +For simpler reductions, see MinimumVertexCover ↔ MaximumIndependentSet as a minimal example. 
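Pulling the steps above together, a minimal entry might look like the following sketch. This is illustrative only — `mis_to_vc` is a hypothetical example name, and the captions, math, and step text must be adapted to the actual reduction and its JSON data:

```typst
// Sketch only: assumes an example program named "mis_to_vc" exists
// and that its JSON has been generated via `make examples`.
#let mis_vc = load-example("mis_to_vc")
#let mis_vc_r = load-results("mis_to_vc")
#let mis_vc_sol = mis_vc_r.solutions.at(0)

#reduction-rule("MaximumIndependentSet", "MinimumVertexCover",
  example: true,
  example-caption: [Complement reduction ($n = #mis_vc.source.instance.num_vertices$)],
  extra: [
    *Step 1 -- Show the source.* The source graph has
    #mis_vc.source.instance.num_vertices vertices.

    *Step 2 -- Verify a solution.* Source config
    #mis_vc_sol.source_config.map(str).join(", "); the target config is its
    bitwise complement.

    *Count:* #mis_vc_r.solutions.len() optimal solutions.
  ],
)[
  This $O(n)$ reduction#footnote[Complexity not independently verified from
  literature.] takes the set complement ($n$ variables, unchanged).
][
  _Construction._ The target instance is the same graph $G = (V, E)$.

  _Correctness._ ($arrow.r.double$) If $S$ is independent, every edge has at
  most one endpoint in $S$, so $V backslash S$ covers all edges.
  ($arrow.l.double$) If $C$ is a vertex cover, no edge lies inside
  $V backslash C$, so $V backslash C$ is independent.

  _Solution extraction._ Complement the indicator vector.
]
```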
diff --git a/docs/paper/lib.typ b/docs/paper/lib.typ index 8652d4f68..c3c33ca51 100644 --- a/docs/paper/lib.typ +++ b/docs/paper/lib.typ @@ -88,6 +88,227 @@ (vertices: vertices, edges: edges) } +// ── Set diagram primitives ────────────────────────────────────── +// For visualizing set packing, set covering, and similar problems. +// Elements are small labeled dots; sets are translucent ellipse regions. + +// Draw a universe element as a labeled dot. +// pos: (x, y) position +// label: content label (e.g., [$1$]) +// name: CetZ element name +// fill: dot fill color +#let selem( + pos, + label: none, + name: none, + fill: black, + radius: 0.06, + label-size: 7pt, +) = { + draw.circle(pos, radius: radius, fill: fill, stroke: none, name: name) + if label != none { + draw.content( + (pos.at(0), pos.at(1) - 0.22), + text(label-size, label), + ) + } +} + +// Draw a set region as an ellipse enclosing given positions. +// positions: array of (x, y) positions the set should enclose +// pad: padding distance around the bounding box +// label: set label (e.g., [$S_1$]), placed above the ellipse +// fill: translucent fill color +// stroke: border stroke +#let sregion( + positions, + pad: 0.3, + label: none, + fill: rgb("#4e79a7").transparentize(80%), + stroke: 0.8pt + rgb("#4e79a7"), + label-size: 8pt, + label-anchor: "south", +) = { + if positions.len() == 0 { return } + + let xs = positions.map(p => p.at(0)) + let ys = positions.map(p => p.at(1)) + let cx = (calc.min(..xs) + calc.max(..xs)) / 2 + let cy = (calc.min(..ys) + calc.max(..ys)) / 2 + let rx = (calc.max(..xs) - calc.min(..xs)) / 2 + pad + let ry = (calc.max(..ys) - calc.min(..ys)) / 2 + pad + + draw.circle((cx, cy), radius: (rx, ry), fill: fill, stroke: stroke) + if label != none { + draw.content( + (cx, cy + ry + 0.15), + text(label-size, label), anchor: label-anchor, + ) + } +} + +// ── High-level graph drawing helpers ───────────────────────────── +// Wrappers around g-node/g-edge for common visualization patterns. 
+ +// Draw graph with a highlighted node subset (blue fill, white for others). +#let draw-node-highlight(vertices, edges, highlights) = canvas(length: 1cm, { + for (u, v) in edges { g-edge(vertices.at(u), vertices.at(v)) } + for (k, pos) in vertices.enumerate() { + let s = highlights.contains(k) + g-node(pos, name: "v" + str(k), + fill: if s { graph-colors.at(0) } else { white }, + label: if s { text(fill: white)[$v_#k$] } else { [$v_#k$] }) + } +}) + +// Draw graph with highlighted edges (bold blue vs gray) and nodes. +#let draw-edge-highlight(vertices, edges, edge-highlights, node-highlights) = canvas(length: 1cm, { + for (u, v) in edges { + let h = edge-highlights.any(e => (e.at(0) == u and e.at(1) == v) or (e.at(0) == v and e.at(1) == u)) + g-edge(vertices.at(u), vertices.at(v), + stroke: if h { 2pt + graph-colors.at(0) } else { 1pt + luma(200) }) + } + for (k, pos) in vertices.enumerate() { + let s = node-highlights.contains(k) + g-node(pos, name: "v" + str(k), + fill: if s { graph-colors.at(0) } else { white }, + label: if s { text(fill: white)[$v_#k$] } else { [$v_#k$] }) + } +}) + +// Draw graph with per-node coloring from a color-index array. +#let draw-node-colors(vertices, edges, colors) = canvas(length: 1cm, { + for (u, v) in edges { g-edge(vertices.at(u), vertices.at(v)) } + for (k, pos) in vertices.enumerate() { + g-node(pos, name: "v" + str(k), + fill: graph-colors.at(colors.at(k)), + label: text(fill: white)[$v_#k$]) + } +}) + +// ── Set region style presets ───────────────────────────────────── +// Spread into sregion() calls: sregion(positions, ..sregion-selected, label: [$S_1$]) +#let sregion-selected = ( + fill: graph-colors.at(0).transparentize(80%), + stroke: 1.2pt + graph-colors.at(0), +) +#let sregion-dimmed = ( + fill: rgb("#999").transparentize(90%), + stroke: 0.8pt + rgb("#999"), +) + +// ── Logic gate primitives ──────────────────────────────────────── +// For circuit diagrams (CircuitSAT examples). 
+// Each gate is a CetZ group with named anchors: in0, in1, ..., out. + +// Cubic bezier point at parameter t ∈ [0, 1] +#let bezier-at(p0, c1, c2, p3, t) = { + let u = 1 - t + let uu = u * u + let tt = t * t + ( + uu * u * p0.at(0) + 3 * uu * t * c1.at(0) + 3 * u * tt * c2.at(0) + tt * t * p3.at(0), + uu * u * p0.at(1) + 3 * uu * t * c1.at(1) + 3 * u * tt * c2.at(1) + tt * t * p3.at(1), + ) +} + +// Concave-curve control points for OR/XOR left edge +#let or-left-curve(w, r, d, dx: 0) = { + let x = -w / 2 - dx + ((x, -r), (x + d, -r / 3), (x + d, r / 3), (x, r)) +} + +// AND gate: D-shape (flat left + semicircular right) +#let gate-and( + pos, + inputs: 2, + w: 0.8, + h: auto, + name: none, + fill: white, + stroke: 0.5pt, +) = { + let h = if h == auto { calc.max(0.5, 0.3 * inputs + 0.1) } else { h } + let r = h / 2 + draw.group(name: name, { + draw.set-origin(pos) + draw.anchor("default", (0, 0)) + draw.merge-path(close: true, fill: fill, stroke: stroke, { + draw.line((-w / 2, -r), (-w / 2, r), (w / 2 - r, r)) + draw.arc((), start: 90deg, stop: -90deg, radius: r) + }) + for i in range(inputs) { + draw.anchor("in" + str(i), (-w / 2, r - (i + 0.5) * h / inputs)) + } + draw.anchor("out", (w / 2, 0)) + }) +} + +// OR gate: curved body with pointed output +#let gate-or( + pos, + inputs: 2, + w: 0.8, + h: auto, + name: none, + fill: white, + stroke: 0.5pt, +) = { + let h = if h == auto { calc.max(0.5, 0.3 * inputs + 0.1) } else { h } + let r = h / 2 + let d = w / 6 + let (bl, lc1, lc2, tl) = or-left-curve(w, r, d) + let tip = (w / 2, 0) + draw.group(name: name, { + draw.set-origin(pos) + draw.anchor("default", (0, 0)) + draw.merge-path(close: true, fill: fill, stroke: stroke, { + draw.bezier(bl, tl, lc1, lc2) + draw.bezier(tl, tip, (-w / 6, r), (w / 4, r / 2)) + draw.bezier(tip, bl, (w / 4, -r / 2), (-w / 6, -r)) + }) + for i in range(inputs) { + let t = 1 - (i + 0.5) / inputs + draw.anchor("in" + str(i), bezier-at(bl, lc1, lc2, tl, t)) + } + draw.anchor("out", tip) + }) 
+} + +// XOR gate: OR shape + extra concave curve on left +#let gate-xor( + pos, + inputs: 2, + w: 0.8, + h: auto, + name: none, + fill: white, + stroke: 0.5pt, +) = { + let h = if h == auto { calc.max(0.5, 0.3 * inputs + 0.1) } else { h } + let r = h / 2 + let d = w / 6 + let gap = 0.15 + let (bl, lc1, lc2, tl) = or-left-curve(w, r, d) + let (ebl, ec1, ec2, etl) = or-left-curve(w, r, d, dx: gap) + let tip = (w / 2, 0) + draw.group(name: name, { + draw.set-origin(pos) + draw.anchor("default", (0, 0)) + draw.merge-path(close: true, fill: fill, stroke: stroke, { + draw.bezier(bl, tl, lc1, lc2) + draw.bezier(tl, tip, (-w / 6, r), (w / 4, r / 2)) + draw.bezier(tip, bl, (w / 4, -r / 2), (-w / 6, -r)) + }) + draw.bezier(ebl, etl, ec1, ec2, stroke: stroke) + for i in range(inputs) { + let t = 1 - (i + 0.5) / inputs + draw.anchor("in" + str(i), bezier-at(ebl, ec1, ec2, etl, t)) + } + draw.anchor("out", tip) + }) +} + // ── Grid graph functions (JSON-driven) ───────────────────────── // Extract positions from JSON, draw with dense styling via g-node/g-edge. 
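The gate primitives above expose named anchors (`in0`, `in1`, ..., `out`), so circuits can be wired by connecting anchors. A minimal usage sketch — positions and wiring here are illustrative, and it assumes `draw` and the gate helpers are in scope as in `reductions.typ`:

```typst
// Illustrative sketch: feed an AND gate's output into an OR gate's
// first input via the gates' named anchors.
#canvas(length: 1cm, {
  gate-and((0, 1), name: "a")
  gate-or((2, 0.5), name: "b")
  draw.line("a.out", "b.in0")   // internal wire: AND output -> OR input 0
  draw.line((-1, 0.3), "b.in1") // external input wire into OR input 1
})
```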
diff --git a/docs/paper/reductions.typ b/docs/paper/reductions.typ index 9b639e589..300415b09 100644 --- a/docs/paper/reductions.typ +++ b/docs/paper/reductions.typ @@ -2,7 +2,7 @@ #let graph-data = json("../src/reductions/reduction_graph.json") #import "@preview/cetz:0.4.2": canvas, draw #import "@preview/ctheorems:1.1.3": thmbox, thmplain, thmproof, thmrules -#import "lib.typ": g-node, g-edge, petersen-graph, house-graph, octahedral-graph, draw-grid-graph, draw-triangular-graph, graph-colors +#import "lib.typ": g-node, g-edge, petersen-graph, house-graph, octahedral-graph, draw-grid-graph, draw-triangular-graph, graph-colors, selem, sregion, draw-node-highlight, draw-edge-highlight, draw-node-colors, sregion-selected, sregion-dimmed, gate-and, gate-or, gate-xor #set page(paper: "a4", margin: (x: 2cm, y: 2.5cm)) #set text(font: "New Computer Modern", size: 10pt) @@ -173,19 +173,23 @@ fill: rgb("#f8f8f8"), stroke: (left: 2pt + rgb("#4a86e8")), inset: (x: 1em, y: 0.8em), + breakable: true, base_level: 1, ) // Problem definition wrapper: auto-adds schema, complexity, reductions list, and label -#let problem-def(name, body) = { +#let problem-def(name, def, body) = { let lbl = label("def:" + name) let title = display-name.at(name) [#definition(title)[ - #body - #render-schema(name) + #def #render-complexity(name) #render-reductions(name) - ] #lbl] + #render-schema(name) + + #body + ] + #lbl] } // Find edge in graph-data by source/target names @@ -328,47 +332,153 @@ In all graph problems below, $G = (V, E)$ denotes an undirected graph with $|V| #problem-def("MaximumIndependentSet")[ Given $G = (V, E)$ with vertex weights $w: V -> RR$, find $S subset.eq V$ maximizing $sum_(v in S) w(v)$ such that no two vertices in $S$ are adjacent: $forall u, v in S: (u, v) in.not E$. - One of Karp's 21 NP-complete problems @karp1972. Best known: $O^*(1.1996^n)$ via measure-and-conquer branching @xiao2017. Solvable in polynomial time on bipartite, interval, and cograph classes. 
+][ +One of Karp's 21 NP-complete problems @karp1972, MIS appears in wireless network scheduling, register allocation, and coding theory @shannon1956. Solvable in polynomial time on bipartite graphs (König's theorem), interval graphs, chordal graphs, and cographs. The best known algorithm runs in $O^*(1.1996^n)$ time via measure-and-conquer branching @xiao2017. + +*Example.* Consider the Petersen graph $G$ with $n = 10$ vertices, $|E| = 15$ edges, and unit weights $w(v) = 1$ for all $v in V$. The graph is 3-regular (every vertex has degree 3). A maximum independent set is $S = {v_1, v_3, v_5, v_9}$ with $w(S) = sum_(v in S) w(v) = 4 = alpha(G)$. No two vertices in $S$ share an edge, and no vertex can be added without violating independence. + +#figure({ + let pg = petersen-graph() + draw-node-highlight(pg.vertices, pg.edges, (1, 3, 5, 9)) +}, +caption: [The Petersen graph with a maximum independent set $S = {v_1, v_3, v_5, v_9}$ shown in blue ($alpha(G) = 4$). Outer vertices $v_0, ..., v_4$ form a pentagon; inner vertices $v_5, ..., v_9$ form a pentagram. Unit weights $w(v_i) = 1$.], +) ] #problem-def("MinimumVertexCover")[ Given $G = (V, E)$ with vertex weights $w: V -> RR$, find $S subset.eq V$ minimizing $sum_(v in S) w(v)$ such that every edge has at least one endpoint in $S$: $forall (u, v) in E: u in S or v in S$. - Best known: $O^*(1.1996^n)$ via MIS complement ($|"VC"| + |"IS"| = n$) @xiao2017. A central problem in parameterized complexity: admits FPT algorithms in $O^*(1.2738^k)$ time parameterized by solution size $k$. +][ +One of Karp's 21 NP-complete problems @karp1972. Vertex Cover is the complement of Independent Set: $S$ is a vertex cover iff $V backslash S$ is an independent set, so $|"VC"| + |"IS"| = n$. Central to parameterized complexity, admitting FPT algorithms in $O^*(1.2738^k)$ time parameterized by solution size $k$. The best known exact algorithm runs in $O^*(1.1996^n)$ via the MIS complement @xiao2017. 
+ +*Example.* Consider the house graph $G$ with $n = 5$ vertices, $|E| = 6$ edges, and unit weights $w(v) = 1$. A minimum vertex cover is $S = {v_0, v_3, v_4}$ with $w(S) = 3$: edge $(v_0, v_1)$ is covered by $v_0$; $(v_0, v_2)$ by $v_0$; $(v_1, v_3)$ by $v_3$; $(v_2, v_3)$ by $v_3$; $(v_2, v_4)$ by $v_4$; $(v_3, v_4)$ by both. The complement ${v_1, v_2}$ is a maximum independent set ($alpha(G) = 2$, confirming $|"VC"| = n - alpha = 3$). + +#figure({ + let hg = house-graph() + draw-node-highlight(hg.vertices, hg.edges, (0, 3, 4)) +}, +caption: [The house graph with a minimum vertex cover $S = {v_0, v_3, v_4}$ shown in blue ($w(S) = 3$). Every edge is incident to at least one blue vertex.], +) ] #problem-def("MaxCut")[ Given $G = (V, E)$ with weights $w: E -> RR$, find partition $(S, overline(S))$ maximizing $sum_((u,v) in E: u in S, v in overline(S)) w(u, v)$. - Best known: $O^*(2^(omega n slash 3))$ via algebraic 2-CSP techniques @williams2005, where $omega < 2.372$ is the matrix multiplication exponent; requires exponential space. Polynomial-time solvable on planar graphs. The Goemans-Williamson SDP relaxation achieves a 0.878-approximation @goemans1995. +][ +Max-Cut is NP-hard on general graphs @barahona1982 but polynomial-time solvable on planar graphs. The Goemans-Williamson SDP relaxation achieves a 0.878-approximation ratio @goemans1995, which is optimal assuming the Unique Games Conjecture. The best known exact algorithm runs in $O^*(2^(omega n slash 3))$ time via algebraic 2-CSP techniques @williams2005, where $omega < 2.372$ is the matrix multiplication exponent. + +*Example.* Consider the house graph $G$ with $n = 5$ vertices, $|E| = 6$ edges, and unit weights $w(e) = 1$. The partition $S = {v_0, v_3}$, $overline(S) = {v_1, v_2, v_4}$ cuts 5 of 6 edges: $(v_0, v_1)$, $(v_0, v_2)$, $(v_1, v_3)$, $(v_2, v_3)$, $(v_3, v_4)$. Only the edge $(v_2, v_4)$ is uncut (both endpoints in $overline(S)$). The cut value is $sum w(e) = 5$. 
+ +#figure({ + let hg = house-graph() + let side-s = (0, 3) + let cut-edges = hg.edges.filter(e => side-s.contains(e.at(0)) != side-s.contains(e.at(1))) + draw-edge-highlight(hg.vertices, hg.edges, cut-edges, side-s) +}, +caption: [The house graph with max cut $S = {v_0, v_3}$ (blue) vs $overline(S) = {v_1, v_2, v_4}$ (white). Cut edges shown in bold blue; 5 of 6 edges are cut.], +) ] - #problem-def("KColoring")[ Given $G = (V, E)$ and $k$ colors, find $c: V -> {1, ..., k}$ minimizing $|{(u, v) in E : c(u) = c(v)}|$. - Deciding $k$-colorability is NP-complete for $k >= 3$ @garey1979. Best known: $O(n+m)$ for $k=2$ (equivalent to bipartiteness testing by BFS); $O^*(1.3289^n)$ for $k=3$ @beigel2005; $O^*(1.7159^n)$ for $k=4$ @wu2024; $O^*((2-epsilon)^n)$ for $k=5$, the first to break the $2^n$ barrier @zamir2021; $O^*(2^n)$ in general via inclusion-exclusion over independent sets @bjorklund2009. -] +][ +Graph coloring arises in register allocation, frequency assignment, and scheduling @garey1979. Deciding $k$-colorability is NP-complete for $k >= 3$ but solvable in $O(n+m)$ for $k=2$ via bipartiteness testing. For $k = 3$, the best known algorithm runs in $O^*(1.3289^n)$ @beigel2005; for $k = 4$ in $O^*(1.7159^n)$ @wu2024; for $k = 5$ in $O^*((2-epsilon)^n)$ @zamir2021. In general, inclusion-exclusion achieves $O^*(2^n)$ @bjorklund2009. + +*Example.* Consider the house graph $G$ with $k = 3$ colors. The coloring $c(v_0) = 1$, $c(v_1) = 2$, $c(v_2) = 2$, $c(v_3) = 1$, $c(v_4) = 3$ is proper: no adjacent pair shares a color, so the number of conflicts is 0. The house graph has chromatic number $chi(G) = 3$ because the triangle $(v_2, v_3, v_4)$ requires 3 colors. +#figure({ + let hg = house-graph() + draw-node-colors(hg.vertices, hg.edges, (0, 1, 1, 0, 2)) +}, +caption: [A proper 3-coloring of the house graph. Colors: $c(v_0) = c(v_3) = 1$ (blue), $c(v_1) = c(v_2) = 2$ (red), $c(v_4) = 3$ (teal). 
Zero conflicts.], +) +] #problem-def("MinimumDominatingSet")[ Given $G = (V, E)$ with weights $w: V -> RR$, find $S subset.eq V$ minimizing $sum_(v in S) w(v)$ s.t. $forall v in V: v in S or exists u in S: (u, v) in E$. - Best known: $O^*(1.4969^n)$ via branch-and-reduce with measure and conquer @vanrooij2011. W[2]-complete when parameterized by solution size, making it strictly harder than Vertex Cover in the parameterized hierarchy. -] +][ +Dominating Set models facility location: each vertex in $S$ "covers" itself and its neighbors. Applications include wireless sensor placement and social network influence maximization. W[2]-complete when parameterized by solution size $k$, making it strictly harder than Vertex Cover in the parameterized hierarchy. The best known exact algorithm runs in $O^*(1.4969^n)$ via measure-and-conquer @vanrooij2011. +*Example.* Consider the house graph $G$ with $n = 5$ vertices and unit weights $w(v) = 1$. The set $S = {v_2, v_3}$ is a minimum dominating set with $w(S) = 2$: vertex $v_2$ dominates ${v_0, v_4}$ and $v_3$ dominates ${v_1}$ (both also dominate each other). No single vertex can dominate all others, so $gamma(G) = 2$. + +#figure({ + let hg = house-graph() + draw-node-highlight(hg.vertices, hg.edges, (2, 3)) +}, +caption: [The house graph with minimum dominating set $S = {v_2, v_3}$ (blue, $gamma(G) = 2$). Every white vertex is adjacent to at least one blue vertex.], +) +] #problem-def("MaximumMatching")[ Given $G = (V, E)$ with weights $w: E -> RR$, find $M subset.eq E$ maximizing $sum_(e in M) w(e)$ s.t. $forall e_1, e_2 in M: e_1 inter e_2 = emptyset$. - Solvable in polynomial time $O(n^3)$ by Edmonds' blossom algorithm @edmonds1965, which introduced the technique of shrinking odd cycles into pseudo-nodes. Unlike most combinatorial optimization problems on general graphs, maximum matching is not NP-hard. 
+][ +Unlike most combinatorial optimization problems on general graphs, maximum matching is solvable in polynomial time $O(n^3)$ by Edmonds' blossom algorithm @edmonds1965, which introduced the technique of shrinking odd cycles into pseudo-nodes. Matching theory underpins assignment problems, network flows, and the Tutte-Berge formula for matching deficiency. + +*Example.* Consider the house graph $G$ with $n = 5$ vertices, $|E| = 6$ edges, and unit weights $w(e) = 1$. A maximum matching is $M = {(v_0, v_1), (v_2, v_4)}$ with $w(M) = 2$. Each matched edge is vertex-disjoint from the others. Vertex $v_3$ is unmatched; since $n$ is odd, no perfect matching exists. + +#figure({ + let hg = house-graph() + draw-edge-highlight(hg.vertices, hg.edges, ((0, 1), (2, 4)), (0, 1, 2, 4)) +}, +caption: [The house graph with a maximum matching $M = {(v_0, v_1), (v_2, v_4)}$ (blue edges, $w(M) = 2$). Matched vertices shown in blue; $v_3$ is unmatched.], +) ] #problem-def("TravelingSalesman")[ Given an undirected graph $G=(V,E)$ with edge weights $w: E -> RR$, find an edge set $C subset.eq E$ that forms a cycle visiting every vertex exactly once and minimizes $sum_(e in C) w(e)$. - Best known: $O^*(2^n)$ via Held-Karp dynamic programming @heldkarp1962, requiring $O^*(2^n)$ space. No $O^*((2-epsilon)^n)$ time algorithm is known. +][ +One of the most intensely studied NP-hard problems, with applications in logistics, circuit board drilling, and DNA sequencing. The best known exact algorithm runs in $O^*(2^n)$ time and space via Held-Karp dynamic programming @heldkarp1962. No $O^*((2-epsilon)^n)$ algorithm is known, and improving the exponential space remains open. + +*Example.* Consider the complete graph $K_4$ with vertices ${v_0, v_1, v_2, v_3}$ and edge weights $w(v_0, v_1) = 1$, $w(v_1, v_2) = 2$, $w(v_2, v_3) = 1$, $w(v_0, v_3) = 2$, $w(v_0, v_2) = 3$, $w(v_1, v_3) = 3$. The optimal tour is $v_0 -> v_1 -> v_2 -> v_3 -> v_0$ with cost $1 + 2 + 1 + 2 = 6$. 
The tour using diagonals, $v_0 -> v_2 -> v_1 -> v_3 -> v_0$, costs $3 + 2 + 3 + 2 = 10$. + +#figure({ + let verts = ((0, 0), (1.5, 0), (1.5, 1.5), (0, 1.5)) + let all-edges = ((0,1),(1,2),(2,3),(0,3),(0,2),(1,3)) + let tour = ((0,1),(1,2),(2,3),(0,3)) + let weights = ("1", "2", "1", "2", "3", "3") + canvas(length: 1cm, { + for (idx, (u, v)) in all-edges.enumerate() { + let on-tour = tour.any(t => (t.at(0) == u and t.at(1) == v) or (t.at(0) == v and t.at(1) == u)) + g-edge(verts.at(u), verts.at(v), + stroke: if on-tour { 2pt + graph-colors.at(0) } else { 1pt + luma(200) }) + let mx = (verts.at(u).at(0) + verts.at(v).at(0)) / 2 + let my = (verts.at(u).at(1) + verts.at(v).at(1)) / 2 + // offset diagonal labels to avoid overlap + let dx = if u == 0 and v == 2 { -0.25 } else if u == 1 and v == 3 { 0.25 } else { 0 } + let dy = if u == 0 and v == 2 { 0.15 } else if u == 1 and v == 3 { 0.15 } else { 0 } + draw.content((mx + dx, my + dy), text(7pt, fill: luma(80))[#weights.at(idx)]) + } + for (k, pos) in verts.enumerate() { + g-node(pos, name: "v" + str(k), + fill: graph-colors.at(0), + label: text(fill: white)[$v_#k$]) + } + }) +}, +caption: [Complete graph $K_4$ with weighted edges. The optimal tour $v_0 -> v_1 -> v_2 -> v_3 -> v_0$ (blue edges) has cost 6.], +) ] - #problem-def("MaximumClique")[ Given $G = (V, E)$, find $K subset.eq V$ maximizing $|K|$ such that all pairs in $K$ are adjacent: $forall u, v in K: (u, v) in E$. Equivalent to MIS on the complement graph $overline(G)$. - Best known: $O^*(1.1996^n)$ via complement reduction to MIS @xiao2017. Robson's direct algorithm achieves $O^*(1.2109^n)$ @robson1986 using exponential space. -] +][ +Maximum Clique arises in social network analysis (finding tightly-connected communities), bioinformatics (protein interaction clusters), and coding theory. The problem is equivalent to Maximum Independent Set on the complement graph $overline(G)$. 
The best known algorithm runs in $O^*(1.1996^n)$ via the complement reduction to MIS @xiao2017. Robson's direct backtracking algorithm achieves $O^*(1.1888^n)$ using exponential space @robson2001. + +*Example.* Consider the house graph $G$ with $n = 5$ vertices and $|E| = 6$ edges. The triangle $K = {v_2, v_3, v_4}$ is a maximum clique of size $omega(G) = 3$: all three pairs $(v_2, v_3)$, $(v_2, v_4)$, $(v_3, v_4)$ are edges. No 4-clique exists because vertices $v_0$ and $v_1$ each have degree 2 and are not adjacent to all of ${v_2, v_3, v_4}$. +#figure({ + let hg = house-graph() + draw-edge-highlight(hg.vertices, hg.edges, ((2,3), (2,4), (3,4)), (2, 3, 4)) +}, +caption: [The house graph with maximum clique $K = {v_2, v_3, v_4}$ (blue, $omega(G) = 3$). All edges within the clique are shown in bold blue.], +) +] #problem-def("MaximalIS")[ Given $G = (V, E)$ with vertex weights $w: V -> RR$, find $S subset.eq V$ maximizing $sum_(v in S) w(v)$ such that $S$ is independent ($forall u, v in S: (u, v) in.not E$) and maximal (no vertex $u in V backslash S$ can be added to $S$ while maintaining independence). - Best known: $O^*(3^(n slash 3))$ for enumerating all maximal independent sets @tomita2006. This bound is tight: Moon and Moser @moonmoser1965 showed that every $n$-vertex graph has at most $3^(n slash 3)$ maximal independent sets, achieved by disjoint triangles. +][ +The maximality constraint (no vertex can be added) distinguishes this from MIS, which only requires maximum weight. Every maximum independent set is maximal, but not vice versa. The enumeration bound of $O^*(3^(n slash 3))$ for listing all maximal independent sets @tomita2006 is tight: Moon and Moser @moonmoser1965 showed every $n$-vertex graph has at most $3^(n slash 3)$ maximal independent sets, achieved by disjoint triangles. + +*Example.* Consider the path graph $P_5$ with $n = 5$ vertices, edges $(v_i, v_(i+1))$ for $i = 0, ..., 3$, and unit weights $w(v) = 1$. 
The set $S = {v_1, v_3}$ is a maximal independent set: no two vertices in $S$ are adjacent, and neither $v_0$ (adjacent to $v_1$), $v_2$ (adjacent to both), nor $v_4$ (adjacent to $v_3$) can be added. However, $S' = {v_0, v_2, v_4}$ with $w(S') = 3$ is a strictly larger maximal IS, illustrating that maximality does not imply maximum weight. + +#figure({ + draw-node-highlight(((0, 0), (1, 0), (2, 0), (3, 0), (4, 0)), ((0,1),(1,2),(2,3),(3,4)), (1, 3)) +}, +caption: [Path $P_5$ with maximal IS $S = {v_1, v_3}$ (blue, $w(S) = 2$). $S$ is maximal — no white vertex can be added — but not maximum: ${v_0, v_2, v_4}$ achieves $w = 3$.], +) ] @@ -376,68 +486,319 @@ In all graph problems below, $G = (V, E)$ denotes an undirected graph with $|V| #problem-def("MaximumSetPacking")[ Given universe $U$, collection $cal(S) = {S_1, ..., S_m}$ with $S_i subset.eq U$, weights $w: cal(S) -> RR$, find $cal(P) subset.eq cal(S)$ maximizing $sum_(S in cal(P)) w(S)$ s.t. $forall S_i, S_j in cal(P): S_i inter S_j = emptyset$. - One of Karp's 21 NP-complete problems @karp1972. Generalizes maximum matching (the special case where all sets have size 2, solvable in polynomial time). The optimization version is as hard to approximate as maximum clique. Best known: $O^*(2^m)$. +][ +One of Karp's 21 NP-complete problems @karp1972. Generalizes maximum matching (the special case where all sets have size 2, solvable in polynomial time). Applications include resource allocation, VLSI design, and frequency assignment. The optimization version is as hard to approximate as maximum clique. The best known exact algorithm runs in $O^*(2^m)$ by brute-force enumeration over the $m$ sets#footnote[No algorithm improving on brute-force enumeration is known for general weighted set packing.]. + +*Example.* Let $U = {1, 2, 3, 4, 5}$ and $cal(S) = {S_1, S_2, S_3, S_4}$ with $S_1 = {1, 2}$, $S_2 = {2, 3}$, $S_3 = {3, 4}$, $S_4 = {4, 5}$, and unit weights $w(S_i) = 1$. 
A maximum packing is $cal(P) = {S_1, S_3}$ with $w(cal(P)) = 2$: $S_1 inter S_3 = emptyset$. Adding $S_2$ would conflict with both ($S_1 inter S_2 = {2}$, $S_2 inter S_3 = {3}$), and $S_4$ conflicts with $S_3$ ($S_3 inter S_4 = {4}$). The alternative packing ${S_2, S_4}$ also achieves weight 2. + +#figure( + canvas(length: 1cm, { + // Element positions along a line + let elems = ((0, 0), (1, 0), (2, 0), (3, 0), (4, 0)) + // Set regions: S1={1,2}, S2={2,3}, S3={3,4}, S4={4,5} + // Selected packing {S1, S3} in blue, others in gray + sregion(((0, 0), (1, 0)), label: [$S_1$], ..sregion-selected) + sregion(((1, 0), (2, 0)), label: [$S_2$], ..sregion-dimmed) + sregion(((2, 0), (3, 0)), label: [$S_3$], ..sregion-selected) + sregion(((3, 0), (4, 0)), label: [$S_4$], ..sregion-dimmed) + // Elements + for (k, pos) in elems.enumerate() { + selem(pos, label: [#(k + 1)], fill: black) + } + }), + caption: [Maximum set packing: $cal(P) = {S_1, S_3}$ (blue) are disjoint; $S_2, S_4$ (gray) conflict with the packing.], +) ] #problem-def("MinimumSetCovering")[ Given universe $U$, collection $cal(S)$ with weights $w: cal(S) -> RR$, find $cal(C) subset.eq cal(S)$ minimizing $sum_(S in cal(C)) w(S)$ s.t. $union.big_(S in cal(C)) S = U$. - Best known: $O^*(2^m)$. The greedy algorithm achieves an $O(ln n)$-approximation where $n = |U|$, which is essentially optimal: cannot be approximated within $(1-o(1)) ln n$ unless P = NP. +][ +One of Karp's 21 NP-complete problems @karp1972. Arises in facility location, crew scheduling, and test suite minimization. The greedy algorithm achieves an $O(ln n)$-approximation where $n = |U|$, which is essentially optimal: cannot be approximated within $(1-o(1)) ln n$ unless P = NP. The best known exact algorithm runs in $O^*(2^m)$ by brute-force enumeration over the $m$ sets#footnote[No algorithm improving on brute-force enumeration is known for general weighted set covering.]. 
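The greedy rule behind the $O(ln n)$-approximation is short enough to state in code. A minimal Python sketch (illustrative only, not part of the Typst source), run on the small instance used in this entry's example:

```python
def greedy_set_cover(universe, sets):
    """Repeatedly pick the set covering the most still-uncovered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(sets, key=lambda s: len(uncovered & s))
        cover.append(best)
        uncovered -= best
    return cover

# Instance from the example: U = {1..5}, S1 = {1,2,3}, S2 = {2,4}, S3 = {3,4,5}
cover = greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4, 5}])
print(cover)  # [{1, 2, 3}, {3, 4, 5}]
```

On this instance greedy happens to find the optimum (two sets); in general it is only guaranteed to be within a $ln n$ factor.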
+ +*Example.* Let $U = {1, 2, 3, 4, 5}$ and $cal(S) = {S_1, S_2, S_3}$ with $S_1 = {1, 2, 3}$, $S_2 = {2, 4}$, $S_3 = {3, 4, 5}$, and unit weights $w(S_i) = 1$. A minimum cover is $cal(C) = {S_1, S_3}$ with $w(cal(C)) = 2$: $S_1 union S_3 = {1, 2, 3, 4, 5} = U$. No single set covers all of $U$, so at least two sets are required. + +#figure( + canvas(length: 1cm, { + // 2D layout: S1={1,2,3} left, S3={3,4,5} right, S2={2,4} bridging bottom + let elems = ( + (-1.2, 0.4), // 1: only S1 + (-0.5, -0.4), // 2: S1 ∩ S2 + (0.3, 0.4), // 3: S1 ∩ S3 + (1.0, -0.4), // 4: S2 ∩ S3 + (1.7, 0.4), // 5: only S3 + ) + // Set regions: S1={1,2,3}, S2={2,4}, S3={3,4,5} + sregion((elems.at(0), elems.at(1), elems.at(2)), pad: 0.4, label: [$S_1$], ..sregion-selected) + sregion((elems.at(1), elems.at(3)), pad: 0.35, label: [$S_2$], ..sregion-dimmed) + sregion((elems.at(2), elems.at(3), elems.at(4)), pad: 0.4, label: [$S_3$], ..sregion-selected) + // Elements + for (k, pos) in elems.enumerate() { + selem(pos, label: [#(k + 1)], fill: black) + } + }), + caption: [Minimum set covering: $cal(C) = {S_1, S_3}$ (blue) cover all of $U$; $S_2$ (gray) is redundant.], +) ] == Optimization Problems #problem-def("SpinGlass")[ Given $n$ spin variables $s_i in {-1, +1}$, pairwise couplings $J_(i j) in RR$, and external fields $h_i in RR$, minimize the Hamiltonian (energy function): $H(bold(s)) = -sum_((i,j)) J_(i j) s_i s_j - sum_i h_i s_i$. - NP-hard on general graphs @barahona1982; best known $O^*(2^n)$. On planar graphs without external field ($h_i = 0$), solvable in polynomial time via reduction to minimum-weight perfect matching. Central to statistical physics and quantum computing. +][ +The Ising spin glass is the canonical model in statistical mechanics for disordered magnetic systems @barahona1982. Ground-state computation is NP-hard on general interaction graphs but polynomial-time solvable on planar graphs without external field ($h_i = 0$) via reduction to minimum-weight perfect matching. 
Central to quantum annealing, where hardware natively encodes spin Hamiltonians. The best known general algorithm runs in $O^*(2^n)$ by brute-force enumeration#footnote[On general interaction graphs, no algorithm improving on brute-force enumeration is known.].
+
+*Example.* Consider $n = 5$ spins on a triangular lattice with uniform antiferromagnetic couplings $J_(i j) = -1$ for all edges and no external field ($h_i = 0$). The Hamiltonian simplifies to $H(bold(s)) = sum_((i,j)) s_i s_j$, which counts parallel pairs minus antiparallel pairs. The lattice contains 7 edges and 3 triangular faces; since each triangle cannot have all three pairs antiparallel, frustration is unavoidable. A ground state is $bold(s) = (+, -, +, +, -)$ achieving $H = -3$: five edges are satisfied (antiparallel) and two are frustrated (parallel). No configuration can satisfy more than 5 of 7 edges.
+
+#figure(
+ canvas(length: 1cm, {
+ let h = calc.sqrt(3) / 2
+ let pos = ((0, h), (1, h), (2, h), (0.5, 0), (1.5, 0))
+ let edges = ((0,1), (1,2), (3,4), (0,3), (1,3), (1,4), (2,4))
+ let spins = (1, -1, 1, 1, -1)
+ // Draw edges: black solid = satisfied, dashed red = frustrated
+ for (u, v) in edges {
+ let sat = spins.at(u) * spins.at(v) < 0
+ g-edge(pos.at(u), pos.at(v),
+ stroke: if sat { 1pt + black } else { (paint: rgb("#cc4444"), thickness: 1.2pt, dash: "dashed") })
+ }
+ // Draw spins: blue = +1, red = −1
+ for (k, p) in pos.enumerate() {
+ let up = spins.at(k) > 0
+ g-node(p, name: "s" + str(k), radius: 0.22,
+ fill: if up { graph-colors.at(0) } else { graph-colors.at(1) },
+ label: text(fill: white, if up { $+$ } else { $-$ }))
+ }
+ }),
+ caption: [Triangular lattice with $n = 5$ spins and antiferromagnetic couplings ($J = -1$). Ground state $bold(s) = (+, -, +, +, -)$ with $H = -3$.
Solid edges: satisfied (antiparallel); dashed red: frustrated (parallel).], +) ] #problem-def("QUBO")[ Given $n$ binary variables $x_i in {0, 1}$, upper-triangular matrix $Q in RR^(n times n)$, minimize $f(bold(x)) = sum_(i=1)^n Q_(i i) x_i + sum_(i < j) Q_(i j) x_i x_j$ (using $x_i^2 = x_i$ for binary variables). - Equivalent to the Ising model via the linear substitution $s_i = 2x_i - 1$. The native formulation for quantum annealing hardware and a standard target for penalty-method reductions @glover2019. Best known: $O^*(2^n)$. +][ +Equivalent to the Ising model via the linear substitution $s_i = 2x_i - 1$. The native formulation for quantum annealing hardware (e.g., D-Wave) and a standard target for penalty-method reductions @glover2019. QUBO unifies many combinatorial problems into a single unconstrained binary framework, making it a universal intermediate representation for quantum and classical optimization. The best known general algorithm runs in $O^*(2^n)$ by brute-force enumeration#footnote[QUBO inherits the Ising model's complexity; no algorithm improving on brute-force is known for the general case.]. + +*Example.* Consider $n = 3$ with $Q = mat(-1, 2, 0; 0, -1, 2; 0, 0, -1)$. The objective is $f(bold(x)) = -x_1 - x_2 - x_3 + 2x_1 x_2 + 2x_2 x_3$. Evaluating all $2^3$ assignments: $f(0,0,0) = 0$, $f(1,0,0) = -1$, $f(0,1,0) = -1$, $f(0,0,1) = -1$, $f(1,1,0) = 0$, $f(0,1,1) = 0$, $f(1,0,1) = -2$, $f(1,1,1) = 1$. The minimum is $f^* = -2$ at $bold(x)^* = (1, 0, 1)$: selecting $x_1$ and $x_3$ avoids the penalty terms $2x_1 x_2$ and $2x_2 x_3$. ] #problem-def("ILP")[ Given $n$ integer variables $bold(x) in ZZ^n$, constraint matrix $A in RR^(m times n)$, bounds $bold(b) in RR^m$, and objective $bold(c) in RR^n$, find $bold(x)$ minimizing $bold(c)^top bold(x)$ subject to $A bold(x) <= bold(b)$ and variable bounds. - Best known: $O^*(n^n)$ @dadush2012. 
When the number of integer variables $n$ is fixed, solvable in polynomial time by Lenstra's algorithm @lenstra1983 using the geometry of numbers, making ILP fixed-parameter tractable in $n$.
+][
+Integer Linear Programming is a universal modeling framework: virtually every NP-hard combinatorial optimization problem admits an ILP formulation. Relaxing integrality to $bold(x) in RR^n$ yields a linear program solvable in polynomial time, forming the basis of branch-and-bound solvers. When the number of integer variables $n$ is fixed, ILP is solvable in polynomial time by Lenstra's algorithm @lenstra1983 using the geometry of numbers, making it fixed-parameter tractable in $n$. The best known general algorithm runs in $O^*(n^n)$ time via lattice-based techniques @dadush2012.
+
+*Example.* Minimize $bold(c)^top bold(x) = -5x_1 - 6x_2$ subject to $x_1 + x_2 <= 5$, $4x_1 + 7x_2 <= 28$, $x_1, x_2 >= 0$, $bold(x) in ZZ^2$. The LP relaxation optimum is $p_1 = (7 slash 3, 8 slash 3) approx (2.33, 2.67)$ with value $approx -27.67$, which is non-integral. Branch-and-bound yields the ILP optimum $bold(x)^* = (3, 2)$ with $bold(c)^top bold(x)^* = -27$.
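The example's optimum is easy to confirm by enumerating the integer points of the bounded feasible region. A minimal Python sketch (illustrative only, separate from the paper source; the range bound 0..7 safely contains the feasible region):

```python
from itertools import product

# Integer points satisfying x1 + x2 <= 5 and 4*x1 + 7*x2 <= 28
feasible = [
    (x1, x2)
    for x1, x2 in product(range(8), repeat=2)
    if x1 + x2 <= 5 and 4 * x1 + 7 * x2 <= 28
]

# Objective c^T x = -5*x1 - 6*x2, minimized over the feasible set
best = min(feasible, key=lambda p: -5 * p[0] - 6 * p[1])
print(best, -5 * best[0] - 6 * best[1])  # (3, 2) -27
```

The argmin is unique: $5x_1 + 6x_2 = 27$ has only one nonnegative integer solution in the region.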
+ +#figure( + canvas(length: 0.8cm, { + // Axes + draw.line((-0.3, 0), (5.5, 0), mark: (end: "straight"), stroke: 0.6pt) + draw.line((0, -0.3), (0, 4.8), mark: (end: "straight"), stroke: 0.6pt) + draw.content((5.7, -0.15), text(8pt)[$x_1$]) + draw.content((-0.15, 5.0), text(8pt)[$x_2$]) + // Tick marks + for i in range(1, 6) { + draw.line((i, -0.08), (i, 0.08), stroke: 0.4pt) + draw.content((i, -0.35), text(6pt)[#i]) + } + for i in range(1, 5) { + draw.line((-0.08, i), (0.08, i), stroke: 0.4pt) + draw.content((-0.35, i), text(6pt)[#i]) + } + // Feasible region polygon: (0,0) → (5,0) → (7/3, 8/3) → (0, 4) + draw.line((0,0), (5,0), (7/3, 8/3), (0, 4), close: true, + fill: green.lighten(70%), stroke: none) + // Constraint lines (extending beyond feasible region) + draw.line((0, 5), (5, 0), stroke: graph-colors.at(0)) // x1 + x2 = 5 + draw.line((0, 4), (5.25, 1), stroke: orange) // 4x1 + 7x2 = 28 + // Objective function level curve (dashed): -5x1 - 6x2 = -23, i.e. x2 = (23 - 5x1)/6 + draw.line((0, 23/6), (23/5, 0), stroke: (paint: luma(80), dash: "dashed")) + // Gradient direction arrow + draw.line((1.5, 2.5), (1.1, 1.9), mark: (end: "straight"), stroke: 1pt + luma(80)) + draw.content((0.7, 1.75), text(6pt, fill: luma(80))[$bold(c)$]) + // Constraint labels + draw.content((4.3, 1.0), text(6pt, fill: graph-colors.at(0))[$x_1 + x_2 = 5$], anchor: "west") + draw.content((4.5, 1.7), text(6pt, fill: orange)[$4x_1 + 7x_2 = 28$], anchor: "west") + draw.content((1.2, 4.3), text(6pt, fill: luma(80))[objective], anchor: "south") + // Integer lattice points (hollow circles) + for x1 in range(6) { + for x2 in range(5) { + draw.circle((x1, x2), radius: 0.06, fill: none, stroke: 0.4pt + luma(120)) + } + } + // LP optimum (fractional, non-integer) + draw.circle((7/3, 8/3), radius: 0.1, fill: graph-colors.at(1), stroke: none) + draw.content((7/3 + 0.3, 8/3 + 0.3), text(7pt)[$p_1$]) + // ILP optimum (integer) + draw.circle((3, 2), radius: 0.1, fill: graph-colors.at(1), stroke: none) + 
draw.content((3.3, 2.3), text(7pt)[$bold(x)^*$]) + }), + caption: [ILP feasible region (green) with constraints $x_1 + x_2 <= 5$ (blue) and $4x_1 + 7x_2 <= 28$ (orange). Hollow circles mark the integer lattice. The LP relaxation optimum $p_1 = (7 slash 3, 8 slash 3)$ is non-integral; the ILP optimum $bold(x)^* = (3, 2)$ gives $bold(c)^top bold(x)^* = -27$.], +) ] == Satisfiability Problems #problem-def("Satisfiability")[ Given a CNF formula $phi = and.big_(j=1)^m C_j$ with $m$ clauses over $n$ Boolean variables, where each clause $C_j = or.big_i ell_(j i)$ is a disjunction of literals, find an assignment $bold(x) in {0, 1}^n$ such that $phi(bold(x)) = 1$ (all clauses satisfied). - Best known: $O^*(2^n)$. The Strong Exponential Time Hypothesis (SETH) @impagliazzo2001 conjectures that no $O^*((2-epsilon)^n)$ algorithm exists for general CNF-SAT. Despite this worst-case hardness, conflict-driven clause learning (CDCL) solvers handle large practical instances efficiently. +][ +The Boolean Satisfiability Problem (SAT) is the first problem proven NP-complete @cook1971. SAT serves as the foundation of NP-completeness theory: showing a new problem NP-hard typically proceeds by reduction from SAT or one of its variants. Despite worst-case hardness, conflict-driven clause learning (CDCL) solvers handle industrial instances with millions of variables. The Strong Exponential Time Hypothesis (SETH) @impagliazzo2001 conjectures that no $O^*((2-epsilon)^n)$ algorithm exists for general CNF-SAT, and the best known algorithm runs in $O^*(2^n)$ by brute-force enumeration#footnote[SETH conjectures this is optimal; no $O^*((2-epsilon)^n)$ algorithm is known.]. + +*Example.* Consider $phi = (x_1 or x_2) and (not x_1 or x_3) and (not x_2 or not x_3)$ with $n = 3$ variables and $m = 3$ clauses. The assignment $(x_1, x_2, x_3) = (1, 0, 1)$ satisfies all clauses: $C_1 = (1 or 0) = 1$, $C_2 = (0 or 1) = 1$, $C_3 = (1 or 0) = 1$. Hence $phi(1, 0, 1) = 1$. 
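For this small formula the full truth table is cheap to enumerate. A minimal Python sketch (illustrative only, not part of the paper source):

```python
from itertools import product

def phi(x1, x2, x3):
    # (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
    return (x1 or x2) and ((not x1) or x3) and ((not x2) or (not x3))

solutions = [x for x in product((0, 1), repeat=3) if phi(*x)]
print(solutions)  # [(0, 1, 0), (1, 0, 1)] -- includes the example assignment
```

The assignment $(1, 0, 1)$ from the example is one of exactly two satisfying assignments.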
] #problem-def("KSatisfiability")[ SAT with exactly $k$ literals per clause. - $O(n+m)$ for $k=2$ via implication graph SCC decomposition @aspvall1979. $O^*(1.307^n)$ for $k=3$ via biased-PPSZ @hansen2019. Under SETH, $k$-SAT requires time $O^*(c_k^n)$ with $c_k -> 2$ as $k -> infinity$. +][ +The restriction of SAT to exactly $k$ literals per clause reveals a sharp complexity transition: 2-SAT is polynomial-time solvable via implication graph SCC decomposition @aspvall1979 in $O(n+m)$, while $k$-SAT for $k >= 3$ is NP-complete. Random $k$-SAT exhibits a satisfiability threshold at clause density $m slash n approx 2^k ln 2$, a key phenomenon in computational phase transitions. The best known algorithm for 3-SAT runs in $O^*(1.307^n)$ via biased-PPSZ @hansen2019. Under SETH, $k$-SAT requires time $O^*(c_k^n)$ with $c_k -> 2$ as $k -> infinity$. + +*Example.* Consider the 3-SAT formula $phi = (x_1 or x_2 or x_3) and (not x_1 or not x_2 or x_3) and (x_1 or not x_2 or not x_3)$ with $n = 3$ variables and $m = 3$ clauses, each containing exactly 3 literals. The assignment $(x_1, x_2, x_3) = (1, 0, 1)$ satisfies all clauses: $C_1 = (1 or 0 or 1) = 1$, $C_2 = (0 or 1 or 1) = 1$, $C_3 = (1 or 1 or 0) = 1$. ] #problem-def("CircuitSAT")[ Given a Boolean circuit $C$ composed of logic gates (AND, OR, NOT, XOR) with $n$ input variables, find an input assignment $bold(x) in {0,1}^n$ such that $C(bold(x)) = 1$. - NP-complete by the Cook-Levin theorem @cook1971, which established NP-completeness by showing any NP computation can be expressed as a boolean circuit. Reducible to CNF-SAT via the Tseitin transformation. Best known: $O^*(2^n)$. +][ +Circuit Satisfiability is the most natural NP-complete problem: the Cook-Levin theorem @cook1971 proves NP-completeness by showing any nondeterministic polynomial-time computation can be encoded as a Boolean circuit. 
CircuitSAT is strictly more succinct than CNF-SAT, since a circuit with $g$ gates may require an exponentially larger CNF formula without auxiliary variables. The Tseitin transformation reduces CircuitSAT to CNF-SAT with only $O(g)$ clauses by introducing one auxiliary variable per gate. The best known algorithm runs in $O^*(2^n)$ by brute-force enumeration#footnote[No algorithm improving on brute-force is known for general circuits.]. + +*Example.* Consider the circuit $C(x_1, x_2) = (x_1 "AND" x_2) "XOR" (x_1 "OR" x_2)$ with $n = 2$ inputs. Evaluating: $C(0,0) = (0) "XOR" (0) = 0$, $C(0,1) = (0) "XOR" (1) = 1$, $C(1,0) = (0) "XOR" (1) = 1$, $C(1,1) = (1) "XOR" (1) = 0$. The satisfying assignments are $(0, 1)$ and $(1, 0)$ -- precisely the inputs where exactly one variable is true. + +#figure( + canvas(length: 1cm, { + // Gate positions: AND/OR vertically stacked, XOR to the right + // With inputs=2, w=0.8: h = max(0.5, 0.7) = 0.7, ports at ±0.175 from center + gate-and((2, 0.8), name: "and") + gate-or((2, -0.8), name: "or") + gate-xor((4.5, 0), name: "xor") + // AND → XOR, OR → XOR (right-angle routing) + draw.line("and.out", (3.5, 0.8), (3.5, 0.175), "xor.in0") + draw.line("or.out", (3.5, -0.8), (3.5, -0.175), "xor.in1") + // Output wire and label + draw.line("xor.out", (5.5, 0), mark: (end: ">")) + draw.content((5.8, 0), text(8pt)[$C$]) + // x1 fork: to and.in0 (y = 0.975) and or.in0 (y = −0.625) + draw.line((0, 0.975), (0.8, 0.975), "and.in0") + draw.line((0.8, 0.975), (0.8, -0.625), "or.in0") + draw.circle((0.8, 0.975), radius: 0.04, fill: black, stroke: none) + // x2 fork: to or.in1 (y = −0.975) and and.in1 (y = 0.625) + draw.line((0, -0.975), (0.5, -0.975), "or.in1") + draw.line((0.5, -0.975), (0.5, 0.625), "and.in1") + draw.circle((0.5, -0.975), radius: 0.04, fill: black, stroke: none) + // Input labels + draw.content((-0.3, 0.975), text(8pt)[$x_1$]) + draw.content((-0.3, -0.975), text(8pt)[$x_2$]) + }), + caption: [Circuit $C(x_1, x_2) = (x_1 and x_2) xor 
(x_1 or x_2)$. Junction dots mark where inputs fork to both gates. Satisfying assignments: $(0,1)$ and $(1,0)$.], +) ] #problem-def("Factoring")[ Given a composite integer $N$ and bit sizes $m, n$, find integers $p in [2, 2^m - 1]$ and $q in [2, 2^n - 1]$ such that $p times q = N$. Here $p$ has $m$ bits and $q$ has $n$ bits. - Sub-exponential classically: $e^(O(b^(1 slash 3)(log b)^(2 slash 3)))$ via the General Number Field Sieve @lenstra1993, where $b$ is the bit length. Solvable in polynomial time on a quantum computer by Shor's algorithm @shor1994. Not known to be NP-complete; factoring lies in NP $inter$ co-NP. +][ +The hardness of integer factorization underpins RSA cryptography and other public-key systems. Unlike most problems in this collection, Factoring is not known to be NP-complete; it lies in NP $inter$ co-NP, suggesting it may be of intermediate complexity. The best classical algorithm is the General Number Field Sieve @lenstra1993 running in sub-exponential time $e^(O(b^(1 slash 3)(log b)^(2 slash 3)))$ where $b$ is the bit length. Shor's algorithm @shor1994 solves Factoring in polynomial time on a quantum computer. + +*Example.* Let $N = 15$ with $m = 2$ bits and $n = 3$ bits, so $p in [2, 3]$ and $q in [2, 7]$. The solution is $p = 3$, $q = 5$, since $3 times 5 = 15 = N$. Note $p = 3$ fits in 2 bits and $q = 5$ fits in 3 bits. The alternative factorization $5 times 3$ requires $m = 3$, $n = 2$. ] == Specialized Problems #problem-def("BMF")[ Given an $m times n$ boolean matrix $A$ and rank $k$, find boolean matrices $B in {0,1}^(m times k)$ and $C in {0,1}^(k times n)$ minimizing the Hamming distance $d_H (A, B circle.tiny C)$, where the boolean product $(B circle.tiny C)_(i j) = or.big_ell (B_(i ell) and C_(ell j))$. - NP-hard, even to approximate. Arises in data mining, text mining, and recommender systems. Best known: $O^*(2^n)$; practical algorithms use greedy rank-1 extraction. 
+][ +Boolean Matrix Factorization decomposes binary data into interpretable boolean factors, unlike real-valued SVD which loses the discrete structure. NP-hard even to approximate, BMF arises in data mining, text classification, and role-based access control where factors correspond to latent binary features. Practical algorithms use greedy rank-1 extraction or alternating fixed-point methods. The best known exact algorithm runs in $O^*(2^(m k + k n))$ by brute-force search over $B$ and $C$#footnote[No algorithm improving on brute-force enumeration is known for general BMF.]. + +*Example.* Let $A = mat(1, 1, 0; 1, 1, 1; 0, 1, 1)$ and $k = 2$. Set $B = mat(1, 0; 1, 1; 0, 1)$ and $C = mat(1, 1, 0; 0, 1, 1)$. Then $B circle.tiny C = mat(1, 1, 0; 1, 1, 1; 0, 1, 1) = A$, achieving Hamming distance $d_H = 0$ (exact factorization). The two boolean factors capture overlapping row/column patterns: factor 1 selects rows ${1, 2}$ and columns ${1, 2}$; factor 2 selects rows ${2, 3}$ and columns ${2, 3}$. + +#figure( + { + let cell(val, x, y, color) = { + let f = if val == 1 { color.transparentize(30%) } else { white } + box(width: 0.45cm, height: 0.45cm, fill: f, stroke: 0.4pt + luma(180), + align(center + horizon, text(7pt, if val == 1 { [1] } else { [0] }))) + } + let mat-grid(data, color) = { + grid(columns: data.at(0).len(), column-gutter: 0pt, row-gutter: 0pt, + ..data.flatten().enumerate().map(((i, v)) => { + cell(v, calc.rem(i, data.at(0).len()), int(i / data.at(0).len()), color) + })) + } + let A = ((1,1,0),(1,1,1),(0,1,1)) + let B = ((1,0),(1,1),(0,1)) + let C = ((1,1,0),(0,1,1)) + set text(8pt) + align(center, stack(dir: ltr, spacing: 0.3cm, + [$A =$], mat-grid(A, graph-colors.at(0)), + [$= B circle.tiny C =$], + mat-grid(B, graph-colors.at(1)), + [$circle.tiny$], + mat-grid(C, rgb("#76b7b2")), + )) + }, + caption: [Boolean matrix factorization: $A = B circle.tiny C$ with rank $k = 2$. 
Factor 1 (red) covers the top-left block; factor 2 (teal) covers the bottom-right block.], +) ] #problem-def("PaintShop")[ Given a sequence of $2n$ positions where each of $n$ cars appears exactly twice, assign a binary color to each car (each car's two occurrences receive opposite colors) to minimize the number of color changes between consecutive positions. - NP-hard and APX-hard @epping2004. Arises in automotive manufacturing where color changes require setup time and increase paint waste. A natural benchmark for quantum annealing. Best known: $O^*(2^n)$. +][ +NP-hard and APX-hard @epping2004. Arises in automotive manufacturing where color changes between consecutive cars on an assembly line require costly purging of paint nozzles. Each car appears twice in the sequence (two coats), and each car's two occurrences must receive opposite colors (one per side). A natural benchmark for quantum annealing due to its binary structure and industrial relevance. The best known algorithm runs in $O^*(2^n)$ by brute-force enumeration#footnote[No algorithm improving on brute-force is known for general Paint Shop.]. + +*Example.* Consider $n = 3$ cars with sequence $(A, B, A, C, B, C)$. Each car gets one occurrence colored 0 and the other colored 1. The assignment $A: 0\/1$, $B: 0\/1$, $C: 1\/0$ yields color sequence $(0, 0, 1, 1, 1, 0)$ with 2 color changes (positions $2 -> 3$ and $5 -> 6$). The alternative $A: 1\/0$, $B: 0\/1$, $C: 0\/1$ yields $(1, 0, 0, 0, 1, 1)$ with 2 changes. The minimum is 2 changes. 
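The change count claimed above can be confirmed by brute force over all $2^n$ first-occurrence color choices. An illustrative Python sketch (independent of the library; `realize` is a hypothetical helper that expands per-car first-occurrence colors into the position sequence):

```python
from itertools import product

seq = ["A", "B", "A", "C", "B", "C"]          # each car appears exactly twice

def realize(first_color):
    """Expand first-occurrence colors into the full position sequence;
    a car's second occurrence gets the opposite color."""
    seen, out = {}, []
    for car in seq:
        if car not in seen:
            seen[car] = first_color[car]
            out.append(first_color[car])
        else:
            out.append(1 - seen[car])
    return out

def color_changes(colors):
    return sum(a != b for a, b in zip(colors, colors[1:]))

assert realize({"A": 0, "B": 0, "C": 1}) == [0, 0, 1, 1, 1, 0]
assert color_changes([0, 0, 1, 1, 1, 0]) == 2
# brute force over all 2^3 first-occurrence choices confirms the optimum
best = min(color_changes(realize(dict(zip("ABC", bits))))
           for bits in product([0, 1], repeat=3))
assert best == 2
```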
+ +#figure( + { + let cars = ("A", "B", "A", "C", "B", "C") + let colors = (0, 0, 1, 1, 1, 0) // optimal assignment + let blue = graph-colors.at(0) + let red = graph-colors.at(1) + align(center, stack(dir: ltr, spacing: 0pt, + ..cars.zip(colors).enumerate().map(((i, (car, c))) => { + let fill = if c == 0 { white } else { blue.transparentize(40%) } + let change = if i > 0 and colors.at(i) != colors.at(i - 1) { + place(dx: -0.08cm, dy: 0.55cm, text(6pt, fill: red, weight: "bold")[×]) + } + stack(dir: ttb, spacing: 0.08cm, + box(width: 0.55cm, height: 0.55cm, fill: fill, stroke: 0.5pt + luma(120), + align(center + horizon, text(8pt, weight: "bold", car))), + text(6pt, fill: luma(100), str(c)), + change, + ) + }))) + }, + caption: [Paint Shop: sequence $(A, B, A, C, B, C)$ with optimal coloring. White = color 0, blue = color 1. Two color changes (marked ×) at positions $2 -> 3$ and $5 -> 6$.], +) ] #problem-def("BicliqueCover")[ Given a bipartite graph $G = (L, R, E)$ and integer $k$, find $k$ bicliques $(L_1, R_1), dots, (L_k, R_k)$ that cover all edges ($E subset.eq union.big_i L_i times R_i$) while minimizing the total size $sum_i (|L_i| + |R_i|)$. - NP-hard; connected to the Boolean rank of binary matrices and nondeterministic communication complexity. Best known: $O^*(2^n)$. +][ +Biclique Cover is equivalent to factoring the biadjacency matrix $M$ of the bipartite graph as a Boolean sum of rank-1 binary matrices, connecting it to Boolean matrix rank and nondeterministic communication complexity. Applications include data compression, database optimization (covering queries with materialized views), and bioinformatics (gene expression biclustering). NP-hard even for fixed $k >= 2$. The best known algorithm runs in $O^*(2^(|L| + |R|))$ by brute-force enumeration#footnote[No algorithm improving on brute-force enumeration is known for general Biclique Cover.]. 
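The connection to Boolean matrix rank can be made concrete: a biclique cover of size $k$ is exactly a decomposition of the biadjacency matrix into a Boolean sum (entrywise OR) of $k$ rank-1 binary matrices. A minimal Python sketch on a hypothetical $2 times 3$ instance (not library code):

```python
def rank1(L_sel, R_sel, nL, nR):
    """Rank-1 binary matrix of the biclique L_sel × R_sel."""
    return [[int(i in L_sel and j in R_sel) for j in range(nR)]
            for i in range(nL)]

def bool_sum(mats):
    """Entrywise OR of a list of equally sized 0/1 matrices."""
    return [[int(any(M[i][j] for M in mats)) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]

M = [[1, 1, 0],    # biadjacency rows: edges of l1
     [0, 1, 1]]    #                   edges of l2
cover = [({0}, {0, 1}), ({1}, {1, 2})]   # two bicliques, k = 2
assert bool_sum([rank1(L, R, 2, 3) for L, R in cover]) == M
```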
+ +*Example.* Consider $G = (L, R, E)$ with $L = {ell_1, ell_2}$, $R = {r_1, r_2, r_3}$, and edges $E = {(ell_1, r_1), (ell_1, r_2), (ell_2, r_2), (ell_2, r_3)}$. A biclique cover with $k = 2$: $(L_1, R_1) = ({ell_1}, {r_1, r_2})$ covering edges ${(ell_1, r_1), (ell_1, r_2)}$, and $(L_2, R_2) = ({ell_2}, {r_2, r_3})$ covering ${(ell_2, r_2), (ell_2, r_3)}$. Total size $= (1+2) + (1+2) = 6$. Merging into a single biclique is impossible since $(ell_1, r_3) in.not E$. + +#figure( + canvas(length: 1cm, { + // Bipartite layout: L on left, R on right + let lpos = ((0, 1), (0, 0)) // l1, l2 + let rpos = ((2.5, 1.5), (2.5, 0.5), (2.5, -0.5)) // r1, r2, r3 + let edges = ((0, 0), (0, 1), (1, 1), (1, 2)) // (li, rj) pairs + // Biclique 1: l1-{r1,r2} in blue; Biclique 2: l2-{r2,r3} in teal + let bc1 = ((0,0), (0,1)) + let bc2 = ((1,1), (1,2)) + for (li, rj) in edges { + let is-bc1 = bc1.contains((li, rj)) + let c = if is-bc1 { graph-colors.at(0) } else { rgb("#76b7b2") } + g-edge(lpos.at(li), rpos.at(rj), stroke: 1.5pt + c) + } + // L nodes + for (k, p) in lpos.enumerate() { + g-node(p, name: "l" + str(k), fill: luma(240), label: $ell_#(k+1)$) + } + // R nodes + for (k, p) in rpos.enumerate() { + g-node(p, name: "r" + str(k), fill: luma(240), label: $r_#(k+1)$) + } + }), + caption: [Biclique cover of a bipartite graph: biclique 1 (blue) $= ({ell_1}, {r_1, r_2})$, biclique 2 (teal) $= ({ell_2}, {r_2, r_3})$. Edge $(ell_1, r_3)$ is absent, preventing a single biclique.], +) ] // Completeness check: warn about problem types in JSON but missing from paper @@ -476,45 +837,74 @@ Each reduction is presented as a *Rule* (with linked problem names and overhead $|"VC"| + |"IS"| = #mvc_mis.source.instance.num_vertices = |V|$ #sym.checkmark ], )[ - $S subset.eq V$ is independent iff $V backslash S$ is a vertex cover, with $|"IS"| + |"VC"| = |V|$. 
+ Vertex cover and independent set are set complements: removing a cover from $V$ leaves vertices with no edges between them (an independent set), and vice versa. Since $|S| + |C| = |V|$ is constant, maximizing one is equivalent to minimizing the other. The reduction preserves the graph and weights unchanged. ][ - ($arrow.r.double$) If $C$ is a vertex cover, for any $u, v in V backslash C$, $(u, v) in.not E$, so $V backslash C$ is independent. ($arrow.l.double$) If $S$ is independent, for any $(u, v) in E$, at most one endpoint is in $S$, so $V backslash S$ covers all edges. _Variable mapping:_ Given VC instance $(G, w)$, create IS instance $(G, w)$ with identical graph and weights. Solution extraction: for IS solution $S$, return $C = V backslash S$. The complement operation preserves optimality since $|S| + |C| = |V|$ is constant. + _Construction._ Given VC instance $(G, bold(w))$, create IS instance $(G, bold(w))$ with identical graph and weights. Variables correspond one-to-one: vertex $v$ in the source maps to vertex $v$ in the target. + + _Correctness._ ($arrow.r.double$) If $C$ is a vertex cover, then for any $u, v in V backslash C$, the edge $(u, v) in.not E$ (otherwise $C$ would miss it), so $V backslash C$ is independent. ($arrow.l.double$) If $S$ is independent, then for any $(u, v) in E$, at most one endpoint lies in $S$, so $V backslash S$ covers every edge. Since $|S| + |C| = |V|$ is constant, a minimum vertex cover corresponds to a maximum independent set. + + _Solution extraction._ For IS solution $S$, return $C = V backslash S$, i.e.\ flip each variable: $c_v = 1 - s_v$. ] #reduction-rule("MaximumIndependentSet", "MinimumVertexCover")[ - The complement $C = V backslash S$ of an independent set is a vertex cover. Same graph and weights; reverse of VC $arrow.r$ IS. + The exact reverse of VC $arrow.r$ IS: complementing an independent set yields a vertex cover. 
The graph and weights are preserved unchanged, and $|"IS"| + |"VC"| = |V|$ ensures optimality carries over. ][ - Identical to the reverse direction: $S$ is independent iff $V backslash S$ is a cover, with $|"IS"| + |"VC"| = |V|$. _Solution extraction:_ for VC solution $C$, return $S = V backslash C$. + _Construction._ Given IS instance $(G, bold(w))$, create VC instance $(G, bold(w))$ with identical graph and weights. + + _Correctness._ ($arrow.r.double$) If $S$ is independent, no edge has both endpoints in $S$, so every edge has at least one endpoint in $V backslash S$, making $V backslash S$ a cover. ($arrow.l.double$) If $C$ is a vertex cover, every edge is incident to some vertex in $C$, so no edge connects two vertices of $V backslash C$, making $V backslash C$ independent. + + _Solution extraction._ For VC solution $C$, return $S = V backslash C$, i.e.\ flip each variable: $s_v = 1 - c_v$. ] #reduction-rule("MaximumIndependentSet", "MaximumSetPacking")[ - Each vertex becomes a singleton set of its incident edges; non-adjacent vertices have disjoint edge sets. Reverse of Set Packing $arrow.r$ IS. + The key insight is that two vertices are adjacent if and only if they share an edge. By representing each vertex $v$ as the set of its incident edges $S_v$, adjacency becomes set overlap: $S_u inter S_v != emptyset$ iff $(u,v) in E$. Thus an independent set (no two adjacent) maps exactly to a packing (no two overlapping). ][ - _Variable mapping:_ Universe $U = E$ (edges), $S_v = {e in E : v in e}$, $w(S_v) = w(v)$. Independent vertices have no shared edges, so their edge sets are disjoint $arrow.r.double$ packing. _Solution extraction:_ for packing ${S_v : v in P}$, return IS $= P$. + _Construction._ Universe $U = E$ (edges, indexed $0, ..., |E|-1$). For each vertex $v$, define $S_v = {e in E : v in e}$ (the set of edge indices incident to $v$), with weight $w(S_v) = w(v)$. Variables correspond one-to-one: vertex $v$ maps to set $S_v$. 
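The claimed correspondence — adjacency equals set overlap, hence independence equals pairwise disjointness — can be verified exhaustively on a small graph. An illustrative Python sketch (the 4-cycle instance is invented for demonstration, not taken from the library):

```python
from itertools import combinations

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # a 4-cycle (illustrative)
n = 4
# S_v = indices of edges incident to v
S = {v: {i for i, e in enumerate(edges) if v in e} for v in range(n)}

def independent(vs):
    return all((u, v) not in edges and (v, u) not in edges
               for u, v in combinations(vs, 2))

def disjoint(vs):
    return all(S[u].isdisjoint(S[v]) for u, v in combinations(vs, 2))

# adjacency ≡ overlap, so independence ≡ pairwise disjointness
for r in range(n + 1):
    for vs in combinations(range(n), r):
        assert independent(vs) == disjoint(vs)
```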
+ + _Correctness._ ($arrow.r.double$) If $I$ is independent, then for any $u, v in I$, edge $(u,v) in.not E$, so $S_u inter S_v = emptyset$ — the sets are mutually disjoint, forming a valid packing. ($arrow.l.double$) If ${S_v : v in P}$ is a packing, then for any $u, v in P$, $S_u inter S_v = emptyset$, meaning $u$ and $v$ share no edge, so $P$ is independent. Weight sums are identical, so optimality is preserved. + + _Solution extraction._ For packing ${S_v : v in P}$, return IS $= P$ (same variable assignment). ] #reduction-rule("MaximumSetPacking", "MaximumIndependentSet")[ - Construct intersection graph $G' = (V', E')$ where $V' = cal(S)$ and $(S_i, S_j) in E'$ iff $S_i inter S_j != emptyset$, with $w(v_i) = w(S_i)$. Max packing $equiv$ Max IS on $G'$. + The _intersection graph_ captures set overlap as adjacency: two sets that share an element become neighbors, so a packing (mutually disjoint sets) corresponds exactly to an independent set (mutually non-adjacent vertices). This is the standard reduction from set packing to independent set. ][ - Overlapping sets become adjacent vertices; disjoint sets become non-adjacent. A packing (mutually disjoint) maps to an IS (mutually non-adjacent). _Variable mapping:_ Vertices $= {S_1, ..., S_m}$, edges $= {(S_i, S_j) : S_i inter S_j != emptyset}$, $w(v_i) = w(S_i)$. Solution extraction: for IS $I subset.eq V'$, return packing $cal(P) = {S_i : v_i in I}$. + _Construction._ Build the intersection graph $G' = (V', E')$: create one vertex $v_i$ per set $S_i$ ($i = 1, ..., m$), and add edge $(v_i, v_j)$ iff $S_i inter S_j != emptyset$. Set $w(v_i) = w(S_i)$. Variables correspond one-to-one: set $S_i$ maps to vertex $v_i$. + + _Correctness._ ($arrow.r.double$) If $cal(P)$ is a packing (all sets mutually disjoint), then for any $S_i, S_j in cal(P)$, $S_i inter S_j = emptyset$, so $(v_i, v_j) in.not E'$, meaning ${v_i : S_i in cal(P)}$ is independent. 
($arrow.l.double$) If $I subset.eq V'$ is independent, then for any $v_i, v_j in I$, $(v_i, v_j) in.not E'$, so $S_i inter S_j = emptyset$, meaning ${S_i : v_i in I}$ is a valid packing. Weight sums match, so optimality is preserved. + + _Solution extraction._ For IS $I subset.eq V'$, return packing $cal(P) = {S_i : v_i in I}$ (same variable assignment). ] #reduction-rule("MinimumVertexCover", "MinimumSetCovering")[ - Construct $U = {0, ..., |E|-1}$, $S_v = {i : e_i "incident to" v}$, $w(S_v) = w(v)$. Then $C$ is a cover iff ${S_v : v in C}$ covers $U$. + A vertex cover must "hit" every edge; set covering must "hit" every universe element. By making each edge a universe element and each vertex the set of its incident edges, the two covering conditions become identical. This is the canonical embedding of vertex cover as a special case of set covering. ][ - Each vertex's edge set becomes a subset; the cover condition (every edge covered) maps to the covering condition (every universe element in some selected set). _Variable mapping:_ Universe $U = {0, ..., |E|-1}$ (edge indices), $S_v = {i : e_i "incident to" v}$, $w(S_v) = w(v)$. Solution extraction: for covering ${S_v : v in C}$, return VC $= C$. + _Construction._ Universe $U = {0, ..., |E|-1}$ (one element per edge). For each vertex $v$, define $S_v = {i : e_i "incident to" v}$ (the indices of edges touching $v$), with weight $w(S_v) = w(v)$. Variables correspond one-to-one: vertex $v$ maps to set $S_v$. + + _Correctness._ ($arrow.r.double$) If $C$ is a vertex cover, every edge $e_i$ has at least one endpoint $v in C$, so $i in S_v$ for some selected set — hence $union.big_(v in C) S_v = U$, a valid covering. ($arrow.l.double$) If ${S_v : v in C}$ covers $U$, then every edge index $i in U$ appears in some $S_v$ with $v in C$, meaning edge $e_i$ is incident to some $v in C$ — hence $C$ is a vertex cover. Weight sums are identical, so optimality is preserved. 
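The equivalence of the two covering conditions can be checked exhaustively over all vertex subsets. A minimal Python sketch, assuming a small hypothetical graph (not library code):

```python
from itertools import combinations

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # triangle plus pendant (illustrative)
n = 4
# S_v = indices of edges incident to v; universe U = edge indices
S = {v: {i for i, e in enumerate(edges) if v in e} for v in range(n)}

def is_vertex_cover(C):
    return all(u in C or v in C for u, v in edges)

def covers_universe(C):
    return set().union(*(S[v] for v in C)) == set(range(len(edges)))

# C is a vertex cover iff {S_v : v in C} covers the universe
for r in range(n + 1):
    for C in combinations(range(n), r):
        assert is_vertex_cover(C) == covers_universe(C)
```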
+ + _Solution extraction._ For covering ${S_v : v in C}$, return VC $= C$ (same variable assignment). ] #reduction-rule("MaximumMatching", "MaximumSetPacking")[ - Construct $U = V$, $S_e = {u, v}$ for $e = (u,v)$, $w(S_e) = w(e)$. Then $M$ is a matching iff ${S_e : e in M}$ is a packing. + A matching selects edges that share no endpoints; set packing selects sets that share no elements. By representing each edge as the 2-element set of its endpoints and using vertices as the universe, two edges conflict (share an endpoint) if and only if their sets overlap. This embeds matching as a special case of set packing where every set has size exactly 2. ][ - Each edge becomes a set of its endpoints; disjoint edges have disjoint endpoint sets. _Variable mapping:_ Universe $U = V$ (vertices), $S_e = {u, v}$ for $e = (u,v)$, $w(S_e) = w(e)$. Solution extraction: for packing ${S_e : e in P}$, return matching $= P$ (the edges whose endpoint sets were packed). + _Construction._ Universe $U = V$ (vertices, indexed $0, ..., |V|-1$). For each edge $e = (u, v)$, define $S_e = {u, v}$ with weight $w(S_e) = w(e)$. Variables correspond one-to-one: edge $e$ maps to set $S_e$. + + _Correctness._ ($arrow.r.double$) If $M$ is a matching, then for any $e_1, e_2 in M$, the edges share no endpoint, so $S_(e_1) inter S_(e_2) = emptyset$ — the sets are mutually disjoint, forming a valid packing. ($arrow.l.double$) If ${S_e : e in P}$ is a packing, then for any $e_1, e_2 in P$, $S_(e_1) inter S_(e_2) = emptyset$, meaning the edges share no vertex, so $P$ is a valid matching. Weight sums are identical, so optimality is preserved. + + _Solution extraction._ For packing ${S_e : e in P}$, return matching $= P$ (same variable assignment). ] #reduction-rule("QUBO", "SpinGlass")[ - The inverse substitution $x_i = (s_i + 1)/2$ converts QUBO to Ising. Reverse of SpinGlass $arrow.r$ QUBO. + QUBO uses binary variables $x_i in {0,1}$; the Ising model uses spin variables $s_i in {-1,+1}$. 
The affine substitution $x_i = (s_i + 1)\/2$ converts between the two encodings. Since every quadratic binary function maps to a quadratic spin function (and vice versa), the two models are polynomially equivalent. This is the reverse of SpinGlass $arrow.r$ QUBO. ][ - Expanding $sum_(i,j) Q_(i j) (s_i+1)(s_j+1)/4$ gives $J_(i j) = -Q_(i j)/4$, $h_i = -(Q_(i i) + sum_j Q_(i j))/2$. _Solution extraction:_ $x_i = (s_i + 1)/2$. + _Construction._ Substitute $x_i = (s_i + 1)\/2$ into $f(bold(x)) = sum_(i <= j) Q_(i j) x_i x_j$ and match against the Ising form $H = -sum_(i,j) J_(i j) s_i s_j - sum_i h_i s_i$; the leading minus signs in $H$ flip each coefficient's sign. For diagonal terms ($i = j$): $Q_(i i) x_i = Q_(i i)(s_i + 1)\/2$, contributing $-Q_(i i)\/2$ to $h_i$. For off-diagonal terms ($i < j$): $Q_(i j) x_i x_j = Q_(i j)(s_i + 1)(s_j + 1)\/4$, contributing $-Q_(i j)\/4$ to $J_(i j)$ and $-Q_(i j)\/4$ to both $h_i$ and $h_j$, plus a constant. Collecting terms: + $ J_(i j) = -Q_(i j) / 4, quad h_i = -1/2 (Q_(i i) + sum_(j != i) Q_(i j) / 2) $ + + _Correctness._ ($arrow.r.double$) Any binary assignment $bold(x)$ maps to a spin assignment $bold(s)$ with $s_i = 2 x_i - 1$, and the QUBO objective equals the Ising energy up to a global constant. ($arrow.l.double$) Any spin ground state maps back to a binary minimizer via $x_i = (s_i + 1)\/2$. The constant offset does not affect the argmin. + + _Solution extraction._ Convert spins to binary: $x_i = (s_i + 1) \/ 2$, i.e.\ $s_i = +1 arrow.r x_i = 1$, $s_i = -1 arrow.r x_i = 0$. ] #let sg_qubo = load-example("spinglass_to_qubo") @@ -529,9 +919,17 @@ Each reduction is presented as a *Rule* (with linked problem names and overhead Ground state ($#sg_qubo_r.solutions.len()$-fold degenerate): $bold(x) = (#sg_qubo_sol.target_config.map(str).join(", "))$ #sym.checkmark ], )[ - The substitution $s_i = 2x_i - 1$ yields $H_"SG"(bold(s)) = H_"QUBO"(bold(x)) + "const"$. + The Ising model and QUBO are both quadratic functions over finite domains: spins ${-1,+1}$ and binary variables ${0,1}$, respectively.
The affine map $s_i = 2x_i - 1$ establishes a bijection between the two domains and preserves the quadratic structure. Substituting into the Ising Hamiltonian yields a QUBO objective that differs from the original energy by a constant, so ground states correspond exactly. ][ - Expanding $-sum_(i,j) J_(i j) (2x_i - 1)(2x_j - 1) - sum_i h_i (2x_i - 1)$ gives $Q_(i j) = -4J_(i j)$, $Q_(i i) = 2sum_j J_(i j) - 2h_i$. _Variable mapping:_ Spin $s_i in {-1, +1}$ maps to binary $x_i in {0, 1}$ via $s_i = 2x_i - 1$. Solution extraction: for QUBO solution $bold(x)$, return spins $s_i = 2x_i - 1$. The reverse maps $x_i = (s_i + 1)/2$. + _Construction._ Substitute $s_i = 2x_i - 1$ into $H = -sum_(i @@ -559,22 +957,28 @@ where $P$ is a penalty weight large enough that any constraint violation costs m }).join(", ") ], )[ - Given $G = (V, E)$ with weights $w$, construct upper-triangular $Q in RR^(n times n)$ with $Q_(i i) = -w_i$ and $Q_(i j) = P$ for $(i,j) in E$ ($i < j$), where $P = 1 + sum_i w_i$. Then minimizing $f(bold(x)) = sum_i Q_(i i) x_i + sum_(i sum_i w_i$, yielding $f > 0 >= f(bold(x))$. ($arrow.l.double$) Among feasible assignments (independent sets), the penalty terms vanish and $f(bold(x)) = -sum_(i in S) w_i$, minimized exactly when $S$ is a maximum-weight IS. Thus QUBO minimizers correspond to maximum-weight independent sets. - _Correctness._ If $bold(x)$ has any adjacent pair $(x_i = 1, x_j = 1)$ with $(i,j) in E$, the penalty $P > sum_i w_i >= -sum_i Q_(i i) x_i$ exceeds the maximum objective gain, so $bold(x)$ is not a minimizer. Among independent sets ($x_i x_j = 0$ for all edges), $f(bold(x)) = -sum_(i in S) w_i$, minimized exactly when $S$ is a maximum-weight IS. + _Solution extraction._ Return $bold(x)$ directly — each $x_i = 1$ indicates vertex $i$ is in the IS. 
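The penalty construction can be sanity-checked by brute-force enumeration of all $2^n$ assignments. An illustrative Python sketch (the four-vertex weighted instance is hypothetical; `f` implements the objective $-sum_i w_i x_i + P sum_((i,j) in E) x_i x_j$ described above):

```python
from itertools import combinations, product

# hypothetical instance: triangle 0-1-2 plus pendant edge 2-3
n, edges, w = 4, [(0, 1), (1, 2), (0, 2), (2, 3)], [1, 2, 3, 2]
P = 1 + sum(w)                       # penalty exceeds the total weight

def f(x):
    """QUBO objective: -sum_i w_i x_i + P * sum_{(i,j) in E} x_i x_j."""
    return (-sum(w[i] * x[i] for i in range(n))
            + P * sum(x[i] * x[j] for i, j in edges))

best_x = min(product([0, 1], repeat=n), key=f)
S = {i for i in range(n) if best_x[i]}

def independent(vs):
    return all((u, v) not in edges and (v, u) not in edges
               for u, v in combinations(sorted(vs), 2))

# exhaustive maximum-weight independent set for comparison
best_w = max(sum(w[i] for i in vs)
             for r in range(n + 1)
             for vs in combinations(range(n), r) if independent(vs))
assert independent(S) and sum(w[i] for i in S) == best_w
```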
] #reduction-rule("MinimumVertexCover", "QUBO")[ - Given $G = (V, E)$ with weights $w$, construct upper-triangular $Q$ with $Q_(i i) = w_i - P dot "deg"(i)$ and $Q_(i j) = P$ for $(i,j) in E$ ($i < j$), where $P = 1 + sum_i w_i$ and $"deg"(i)$ is the degree of vertex $i$. + A vertex cover must include at least one endpoint of every edge. The covering constraint for edge $(i,j)$ — that $x_i = x_j = 0$ is forbidden — translates to the quadratic penalty $(1-x_i)(1-x_j)$, which equals 1 exactly when neither endpoint is selected. The penalty method combines the weight-minimization objective with these coverage penalties into a single QUBO, where diagonal entries reflect the trade-off between vertex cost and coverage benefit, and off-diagonal entries penalize uncovered edges. ][ _Construction._ The VC objective is: minimize $sum_i w_i x_i$ subject to $x_i + x_j >= 1$ for $(i,j) in E$. Applying the penalty method (@sec:penalty-method), the constraint $x_i + x_j >= 1$ is violated iff $x_i = x_j = 0$, with penalty $(1 - x_i)(1 - x_j)$: $ f(bold(x)) = sum_i w_i x_i + P sum_((i,j) in E) (1 - x_i)(1 - x_j) $ - Expanding: $(1 - x_i)(1 - x_j) = 1 - x_i - x_j + x_i x_j$. - Summing over all edges, each vertex $i$ appears in $"deg"(i)$ terms. The QUBO coefficients are: diagonal $Q_(i i) = w_i - P dot "deg"(i)$ (objective plus linear penalty), off-diagonal $Q_(i j) = P$ for edges. The constant $P |E|$ does not affect the minimizer. + with $P = 1 + sum_i w_i$. Expanding: $(1 - x_i)(1 - x_j) = 1 - x_i - x_j + x_i x_j$. + Summing over all edges, each vertex $i$ appears in $"deg"(i)$ penalty terms. The QUBO coefficients are: diagonal $Q_(i i) = w_i - P dot "deg"(i)$ (objective cost minus linear penalty for coverage), off-diagonal $Q_(i j) = P$ for edges (quadratic penalty). The constant $P |E|$ does not affect the minimizer. 
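The expanded coefficients $Q_(i i) = w_i - P dot "deg"(i)$ and $Q_(i j) = P$ can be validated the same way: brute-force the QUBO minimum and compare it with an exhaustive minimum-weight vertex cover. An illustrative Python sketch on a hypothetical four-vertex instance (not library code):

```python
from itertools import combinations, product

# hypothetical instance: triangle 0-1-2 plus pendant edge 2-3
n, edges, w = 4, [(0, 1), (1, 2), (0, 2), (2, 3)], [2, 1, 3, 1]
P = 1 + sum(w)
deg = [sum(v in e for e in edges) for v in range(n)]

def f(x):
    """sum_i (w_i - P*deg(i)) x_i + P * sum_{(i,j) in E} x_i x_j;
    the dropped constant P*|E| does not affect the minimizer."""
    return (sum((w[i] - P * deg[i]) * x[i] for i in range(n))
            + P * sum(x[i] * x[j] for i, j in edges))

best_x = min(product([0, 1], repeat=n), key=f)
C = {i for i in range(n) if best_x[i]}

def is_cover(C):
    return all(u in C or v in C for u, v in edges)

# exhaustive minimum-weight vertex cover for comparison
best_w = min(sum(w[i] for i in C2)
             for r in range(n + 1)
             for C2 in combinations(range(n), r) if is_cover(C2))
assert is_cover(C) and sum(w[i] for i in C) == best_w
```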
+ + _Correctness._ ($arrow.r.double$) If $bold(x)$ encodes a minimum vertex cover, every edge has at least one endpoint selected, so all penalty terms $(1-x_i)(1-x_j) = 0$ vanish and $f(bold(x)) = sum_(i in C) w_i$. ($arrow.l.double$) If some edge $(i,j)$ is uncovered ($x_i = x_j = 0$), the penalty $P > sum_i w_i$ exceeds the entire objective range, so $bold(x)$ cannot be a minimizer. Among valid covers (all penalties zero), $f(bold(x)) = sum_(i in C) w_i$ up to a constant, minimized exactly when $C$ is a minimum-weight vertex cover. + + _Solution extraction._ Return $bold(x)$ directly — each $x_i = 1$ indicates vertex $i$ is in the cover. ] #let kc_qubo = load-example("kcoloring_to_qubo") @@ -595,44 +999,54 @@ where $P$ is a penalty weight large enough that any constraint violation costs m })) } - *Step 1 -- Encode colors as binary variables.* Each vertex $v in {0,...,4}$ gets $k = 3$ binary variables $(x_(v,0), x_(v,1), x_(v,2))$, where $x_(v,c) = 1$ means "vertex $v$ receives color $c$." This gives $n k = 5 times 3 = 15$ QUBO variables total, arranged as: + *Step 1 -- Encode each color choice as a binary variable.* A coloring assigns each vertex one of $k$ colors. To express this in binary, introduce $k$ indicator variables per vertex: $x_(v,c) = 1$ means "vertex $v$ gets color $c$." For the house graph with $k = 3$, this gives $n k = 5 times 3 = 15$ QUBO variables: $ underbrace(x_(0,0) x_(0,1) x_(0,2), "vertex 0") #h(4pt) underbrace(x_(1,0) x_(1,1) x_(1,2), "vertex 1") #h(4pt) dots.c #h(4pt) underbrace(x_(4,0) x_(4,1) x_(4,2), "vertex 4") $ - *Step 2 -- One-hot penalty.* Each vertex must receive _exactly one_ color, i.e.\ $sum_c x_(v,c) = 1$. The penalty $(1 - sum_c x_(v,c))^2$ is zero iff exactly one variable in the group is 1. With weight $P_1 = 1 + n = 6$, this contributes $Q_(v k+c, v k+c) = -6$ on the diagonal and $Q_(v k+c_1, v k+c_2) = 12$ between same-vertex color pairs. 
These are the $5 times 5$ diagonal blocks of $Q$.\ + *Step 2 -- Penalize invalid color assignments (one-hot constraint).* A valid coloring requires each vertex to have _exactly one_ color, i.e.\ $sum_c x_(v,c) = 1$. The penalty $(1 - sum_c x_(v,c))^2$ equals zero when exactly one variable is 1, and is positive otherwise. Weighted by $P_1 = 1 + n = 6$, this contributes diagonal entries $Q_(v k+c, v k+c) = -6$ and off-diagonal entries $Q_(v k+c_1, v k+c_2) = 12$ between colors of the same vertex. These form the five $3 times 3$ diagonal blocks of $Q$.\ - *Step 3 -- Edge conflict penalty.* For each edge $(u,v) in E$ and each color $c$, both endpoints having color $c$ is penalized: $P_2 dot x_(u,c) x_(v,c)$ with $P_2 = P_1 slash 2 = 3$. The house has 6 edges, each contributing 3 color penalties $arrow.r$ 18 off-diagonal entries of value $3$ in $Q$.\ + *Step 3 -- Penalize same-color neighbors (edge conflict).* For each edge $(u,v) in E$ and each color $c$, the product $x_(u,c) x_(v,c) = 1$ iff both endpoints receive color $c$ — exactly the coloring conflict we want to forbid. The penalty $P_2 dot x_(u,c) x_(v,c)$ with $P_2 = P_1 slash 2 = 3$ makes such conflicts costly. The house has 6 edges, each contributing 3 color-conflict penalties $arrow.r$ 18 off-diagonal entries of value $3$ in $Q$.\ *Step 4 -- Verify a solution.* The first valid 3-coloring is $(c_0, ..., c_4) = (#kc_qubo_sol.source_config.map(str).join(", "))$, shown in the figure above. The one-hot encoding is $bold(x) = (#kc_qubo_sol.target_config.map(str).join(", "))$. Check: each 3-bit group has exactly one 1 (valid one-hot #sym.checkmark), and for every edge the two endpoints have different colors (e.g.\ edge $0 dash 1$: colors $#kc_qubo_sol.source_config.at(0), #kc_qubo_sol.source_config.at(1)$ #sym.checkmark).\ *Count:* #kc_qubo_r.solutions.len() valid colorings $= 3! times 3$. The triangle $2 dash 3 dash 4$ forces 3 distinct colors ($3!
= 6$ permutations); for each, the base vertices $0, 1$ each have 2 compatible choices (any color except their triangle neighbor's); edge $0 dash 1$ forbids the single equal-color pair among the $2 times 2 = 4$ combinations, leaving $3$ valid pairs. ], )[ - Given $G = (V, E)$ with $k$ colors, construct upper-triangular $Q in RR^(n k times n k)$ using one-hot encoding $x_(v,c) in {0,1}$ ($n k$ variables indexed by $v dot k + c$). + The $k$-coloring problem has two requirements: each vertex gets exactly one color, and adjacent vertices get different colors. Both can be expressed as quadratic penalties over binary variables. Introduce $n k$ binary variables $x_(v,c) in {0,1}$ (indexed by $v dot k + c$), where $x_(v,c) = 1$ means vertex $v$ receives color $c$. The first requirement becomes a _one-hot constraint_ penalizing vertices with zero or multiple colors; the second becomes an _edge conflict penalty_ penalizing same-color neighbors. The combined QUBO matrix $Q in RR^(n k times n k)$ encodes both penalties. ][ - _Construction._ Applying the penalty method (@sec:penalty-method), the QUBO objective combines a one-hot constraint penalty and an edge conflict penalty: - $ f(bold(x)) = P_1 sum_(v in V) (1 - sum_(c=1)^k x_(v,c))^2 + P_2 sum_((u,v) in E) sum_(c=1)^k x_(u,c) x_(v,c) $ + _Construction._ Applying the penalty method (@sec:penalty-method), the two requirements translate into two penalty terms: + $ f(bold(x)) = underbrace(P_1 sum_(v in V) (1 - sum_(c=1)^k x_(v,c))^2, "one-hot: exactly one color per vertex") + underbrace(P_2 sum_((u,v) in E) sum_(c=1)^k x_(u,c) x_(v,c), "edge conflict: neighbors differ") $ - _One-hot expansion._ For each vertex $v$, using $x_(v,c)^2 = x_(v,c)$: + _One-hot expansion._ The constraint $(1 - sum_c x_(v,c))^2$ penalizes any vertex with $!= 1$ active color. Expanding using $x_(v,c)^2 = x_(v,c)$ (binary variables): $ (1 - sum_c x_(v,c))^2 = 1 - sum_c x_(v,c) + 2 sum_(c_1 < c_2) x_(v,c_1) x_(v,c_2) $ - This yields diagonal $Q_(v k+c, v k+c) = -P_1$ and intra-vertex off-diagonal $Q_(v k+c_1, v k+c_2) = 2 P_1$ for $c_1 < c_2$.
+ Reading off the QUBO coefficients: diagonal $Q_(v k+c, v k+c) = -P_1$ (favors assigning a color) and intra-vertex off-diagonal $Q_(v k+c_1, v k+c_2) = 2 P_1$ for $c_1 < c_2$ (discourages multiple colors). - _Edge penalty._ For each edge $(u,v)$ and color $c$, the term $P_2 x_(u,c) x_(v,c)$ contributes to $Q_(u k+c, v k+c) += P_2$ (with appropriate index ordering). + _Edge conflict._ For each edge $(u,v)$ and color $c$, the product $x_(u,c) x_(v,c)$ equals 1 iff both endpoints share color $c$. The penalty $P_2 x_(u,c) x_(v,c)$ adds $P_2$ to $Q_(u k+c, v k+c)$ (with appropriate index ordering). - In our implementation, $P_1 = P = 1 + n$ and $P_2 = P\/2$. + In our implementation, $P_1 = P = 1 + n$ and $P_2 = P\/2$. The penalty $P_1$ exceeds the number of vertices, ensuring that any constraint violation outweighs any objective gain. + + _Correctness._ ($arrow.r.double$) If $bold(x)$ violates any one-hot constraint (some vertex has 0 or $>= 2$ colors), the penalty $P_1 > n$ exceeds the objective range, so $bold(x)$ is not a minimizer. ($arrow.l.double$) Among valid one-hot encodings, $f$ reduces to the edge conflict term, minimized when no two adjacent vertices share a color — exactly the $k$-coloring objective. _Solution extraction._ For each vertex $v$, find $c$ with $x_(v,c) = 1$. ] #reduction-rule("MaximumSetPacking", "QUBO")[ - Equivalent to IS on the intersection graph: $Q_(i i) = -w_i$ and $Q_(i j) = P$ for overlapping sets $i, j$ ($i < j$), where $P = 1 + sum_i w_i$. + Set packing selects mutually disjoint sets of maximum total weight. Two sets conflict if and only if they share a universe element — the same adjacency structure as an independent set on the _intersection graph_. This reduction builds the intersection graph implicitly and applies the IS penalty method directly: each set becomes a QUBO variable, diagonal entries reward selection, and off-diagonal entries penalize pairs of overlapping sets with a penalty large enough to forbid any overlap. 
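This penalty construction can again be checked by brute force: the QUBO minimizer must be a maximum-weight packing. An illustrative Python sketch on a hypothetical four-set instance (not library code):

```python
from itertools import combinations, product

# hypothetical instance: four sets over universe {0,...,4}
sets = [{0, 1}, {1, 2}, {3}, {2, 3, 4}]
w = [2, 3, 1, 3]
m, P = len(sets), 1 + sum(w)
conflict = [(i, j) for i, j in combinations(range(m), 2) if sets[i] & sets[j]]

def f(x):
    """Diagonal -w_i rewards selection; off-diagonal P penalizes overlap."""
    return (-sum(w[i] * x[i] for i in range(m))
            + P * sum(x[i] * x[j] for i, j in conflict))

best = min(product([0, 1], repeat=m), key=f)
chosen = [i for i in range(m) if best[i]]
assert all(not (sets[i] & sets[j]) for i, j in combinations(chosen, 2))
# exhaustive maximum-weight packing for comparison
best_w = max(sum(w[i] for i in vs)
             for r in range(m + 1)
             for vs in combinations(range(m), r)
             if all(not (sets[i] & sets[j]) for i, j in combinations(vs, 2)))
assert sum(w[i] for i in chosen) == best_w
```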
][ - Two sets conflict iff they share an element. The intersection graph has sets as vertices and edges between conflicting pairs. Applying the penalty method (@sec:penalty-method) yields the same QUBO as IS on this graph: diagonal rewards selection, off-diagonal penalizes overlap. Correctness follows from the IS→QUBO proof. + _Construction._ Given sets $S_1, ..., S_m$ with weights $w_1, ..., w_m$, introduce binary variables $x_i in {0,1}$ for each set. Two sets $S_i, S_j$ _conflict_ iff $S_i inter S_j != emptyset$. The packing objective is: maximize $sum_i w_i x_i$ subject to $x_i x_j = 0$ for every conflicting pair. Applying the penalty method (@sec:penalty-method): + $ f(bold(x)) = -sum_i w_i x_i + P sum_(S_i inter S_j != emptyset, thin i < j) x_i x_j $ + with $P = 1 + sum_i w_i$. The QUBO coefficients are: diagonal $Q_(i i) = -w_i$ (reward for selecting set $S_i$), off-diagonal $Q_(i j) = P$ for each conflicting pair $i < j$ (penalty for overlap). + + _Correctness._ ($arrow.r.double$) If $bold(x)$ encodes a maximum-weight packing, all selected sets are mutually disjoint, so all penalty terms vanish and $f(bold(x)) = -sum_(i in cal(P)) w_i$. Any assignment selecting overlapping sets incurs penalty $P > sum_i w_i$, making it suboptimal. ($arrow.l.double$) Among feasible assignments (no overlapping sets selected), the penalty terms vanish and $f(bold(x)) = -sum_(i in cal(P)) w_i$, minimized exactly when $cal(P)$ is a maximum-weight packing. + + _Solution extraction._ Return $bold(x)$ directly — each $x_i = 1$ indicates set $S_i$ is in the packing. ] #reduction-rule("KSatisfiability", "QUBO")[ - Given a Max-$k$-SAT instance with $m$ clauses over $n$ variables, construct a QUBO that counts unsatisfied clauses. For $k = 2$, $Q in RR^(n times n)$ directly encodes quadratic penalties. For $k = 3$, Rosenberg quadratization introduces $m$ auxiliary variables, giving $Q in RR^((n+m) times (n+m))$. 
+ Each clause in a $k$-SAT formula is falsified by exactly one assignment to its literals. For $k = 2$, this falsifying pattern is a product of two (possibly complemented) binary variables — already quadratic, so each clause maps directly to QUBO terms. For $k = 3$, the falsifying pattern $y_1 y_2 y_3$ is cubic; Rosenberg quadratization replaces the product $y_1 y_2$ with an auxiliary variable $a$, enforced by a penalty that makes $a != y_1 y_2$ suboptimal. The total QUBO counts unsatisfied clauses, so minimizers maximize satisfiability. ][ - *Case $k = 2$.* Applying the penalty method (@sec:penalty-method), each 2-literal clause has exactly one falsifying assignment (both literals false). The penalty for that assignment is a quadratic function of $x_i, x_j$: + *Case $k = 2$.* + + _Construction._ Each 2-literal clause has exactly one falsifying assignment (both literals false). The penalty for that assignment is a quadratic function of $x_i, x_j$: #table( columns: (auto, auto, auto, auto), @@ -645,27 +1059,35 @@ where $P$ is a penalty weight large enough that any constraint violation costs m [$overline(x_i) or overline(x_j)$], [$x_i=1, x_j=1$], [$x_i x_j$], [$Q_(i j) += 1$], ) - Summing over all clauses, $f(bold(x)) = sum_j "penalty"_j (bold(x))$ counts falsified clauses. Minimizers of $f$ maximize satisfied clauses. + Summing over all clauses, $f(bold(x)) = sum_j "penalty"_j (bold(x))$ counts falsified clauses. - *Case $k = 3$ (Rosenberg quadratization).* For each clause $(ell_1 or ell_2 or ell_3)$, define complement variables $y_i = overline(ell_i)$ (so $y_i = x_i$ if the literal is negated, $y_i = 1 - x_i$ if positive). The clause is violated when $y_1 y_2 y_3 = 1$. This cubic penalty is reduced to quadratic form by introducing an auxiliary variable $a$ and the substitution $a = y_1 y_2$, enforced via a Rosenberg penalty with weight $M$: + _Correctness._ ($arrow.r.double$) Each penalty term is non-negative and equals 1 exactly when its clause is falsified. 
If $bold(x)$ satisfies all clauses, $f(bold(x)) = 0$. ($arrow.l.double$) Any minimizer of $f$ achieves the fewest falsified clauses, hence maximizes satisfiability. + + *Case $k = 3$ (Rosenberg quadratization).* + + _Construction._ For each clause $(ell_1 or ell_2 or ell_3)$, define complement variables $y_i = overline(ell_i)$ (so $y_i = x_i$ if the literal is negated, $y_i = 1 - x_i$ if positive). The clause is violated when $y_1 y_2 y_3 = 1$. This cubic penalty is reduced to quadratic form by introducing an auxiliary variable $a$ and the substitution $a = y_1 y_2$, enforced via a Rosenberg penalty with weight $M$: $ H = a dot y_3 + M (y_1 y_2 - 2 y_1 a - 2 y_2 a + 3a) $ - where $M = 2$ suffices. For any binary assignment, if $a = y_1 y_2$ the penalty term vanishes and $H = y_1 y_2 y_3$ counts the clause violation. If $a != y_1 y_2$, the penalty $M(dots.c) >= 1$ makes this suboptimal. + where $M = 2$ suffices. Each clause adds one auxiliary variable (indices $n, n+1, ..., n+m-1$), so the total QUBO has $n + m$ variables. + + _Correctness._ ($arrow.r.double$) If $a = y_1 y_2$, the Rosenberg penalty term vanishes and $H = y_1 y_2 y_3$ counts the clause violation faithfully. ($arrow.l.double$) If $a != y_1 y_2$, the penalty $M(dots.c) >= M = 2$ strictly exceeds the clause-counting contribution (at most 1), so any minimizer must have $a = y_1 y_2$ for every clause. Among such assignments, $H$ counts unsatisfied clauses, and minimizers maximize satisfiability. - Each clause adds one auxiliary variable (indices $n, n+1, ..., n+m-1$), so the total QUBO has $n + m$ variables. Solution extraction discards auxiliary variables: return $bold(x)[0..n]$. + _Solution extraction._ Discard auxiliary variables: return $bold(x)[0..n]$.
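The Rosenberg identity can be verified exhaustively over all binary assignments (a standalone Python sketch, not the library's Rust implementation):

```python
from itertools import product

M = 2  # Rosenberg penalty weight from the text

def penalty(y1, y2, a):
    # the substitution-enforcing term: M * (y1*y2 - 2*y1*a - 2*y2*a + 3*a)
    return M * (y1 * y2 - 2 * y1 * a - 2 * y2 * a + 3 * a)

def H(y1, y2, y3, a):
    # quadratized clause-violation term: a*y3 + penalty
    return a * y3 + penalty(y1, y2, a)

for y1, y2, y3 in product((0, 1), repeat=3):
    cubic = y1 * y2 * y3
    assert penalty(y1, y2, y1 * y2) == 0            # vanishes when a = y1*y2
    assert H(y1, y2, y3, y1 * y2) == cubic          # H counts the violation
    assert penalty(y1, y2, 1 - y1 * y2) >= M        # wrong a pays at least M = 2
    assert min(H(y1, y2, y3, a) for a in (0, 1)) == cubic
```

The last assertion is the key point: minimizing over the auxiliary recovers the cubic product exactly.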
] #reduction-rule("ILP", "QUBO")[ - Given binary ILP: maximize $bold(c)^top bold(x)$ subject to $A bold(x) = bold(b)$, $bold(x) in {0,1}^n$, construct upper-triangular $Q = -"diag"(bold(c) + 2P bold(b)^top A) + P A^top A$ where $P = 1 + ||bold(c)||_1 + ||bold(b)||_1$. + A binary ILP optimizes a linear objective over binary variables subject to linear constraints. The penalty method converts each equality constraint $bold(a)_k^top bold(x) = b_k$ into the quadratic penalty $(bold(a)_k^top bold(x) - b_k)^2$, which is zero if and only if the constraint is satisfied. Inequality constraints are first converted to equalities using binary slack variables with powers-of-two coefficients. The resulting unconstrained quadratic over binary variables is a QUBO whose matrix $Q$ combines the negated objective (as diagonal terms) with the expanded constraint penalties (as a Gram matrix $A^top A$). ][ - _Step 1: Normalize constraints._ Convert inequalities to equalities using slack variables: $bold(a)_k^top bold(x) <= b_k$ becomes $bold(a)_k^top bold(x) + sum_(s=0)^(S_k - 1) 2^s y_(k,s) = b_k$ where $S_k = ceil(log_2 (b_k + 1))$ slack bits. For $>=$ constraints, the slack has a negative sign. The extended system is $A' bold(x)' = bold(b)$ with $bold(x)' = (bold(x), bold(y)) in {0,1}^(n')$. For minimization, negate $bold(c)$ to convert to maximization. + _Construction._ First, normalize all constraints to equalities. Inequalities $bold(a)_k^top bold(x) <= b_k$ become $bold(a)_k^top bold(x) + sum_(s=0)^(S_k - 1) 2^s y_(k,s) = b_k$ where $S_k = ceil(log_2 (b_k + 1))$ binary slack bits. For $>=$ constraints, the slack has a negative sign. The extended system is $A' bold(x)' = bold(b)$ with $bold(x)' = (bold(x), bold(y)) in {0,1}^(n')$. For minimization, negate $bold(c)$ to convert to maximization. 
- _Step 2: QUBO construction._ Applying the penalty method (@sec:penalty-method), combine objective and penalty: + Applying the penalty method (@sec:penalty-method), combine the negated objective with quadratic constraint penalties: $ f(bold(x)') = -bold(c')^top bold(x)' + P sum_(k=1)^m (bold(a)'_k^(top) bold(x)' - b_k)^2 $ - where $bold(c)' = (bold(c), bold(0))$. Expanding the quadratic penalty: - $ = bold(x)'^(top) A'^(top) A' bold(x)' - 2 bold(b)^top A' bold(x)' + ||bold(b)||_2^2 $ - Combining with $-bold(c')^top bold(x)'$ and dropping constants: + where $bold(c)' = (bold(c), bold(0))$ and $P = 1 + ||bold(c)||_1 + ||bold(b)||_1$. Expanding the quadratic penalty: + $ sum_k (bold(a)'_k^(top) bold(x)' - b_k)^2 = bold(x)'^(top) A'^(top) A' bold(x)' - 2 bold(b)^top A' bold(x)' + ||bold(b)||_2^2 $ + Combining with $-bold(c')^top bold(x)'$ and dropping the constant $P ||bold(b)||_2^2$: $ Q = -"diag"(bold(c)' + 2P bold(b)^top A') + P A'^(top) A' $ - The diagonal contains linear terms; the upper triangle of $A'^(top) A'$ gives quadratic terms (doubled for upper-triangular convention). + The diagonal contains linear terms (objective plus constraint); the upper triangle of $A'^(top) A'$ gives quadratic cross-terms (doubled under the upper-triangular convention). + + _Correctness._ ($arrow.r.double$) If $bold(x)'^*$ is an optimal ILP solution, then $A' bold(x)'^* = bold(b)$ and all penalty terms vanish, so $f(bold(x)'^*) = -bold(c')^top bold(x)'^*$. ($arrow.l.double$) If any constraint is violated, $(bold(a)'_k^(top) bold(x)' - b_k)^2 >= 1$ and the penalty $P > ||bold(c)||_1$ exceeds the entire objective range, so $bold(x)'$ cannot be a QUBO minimizer. Among feasible assignments (all penalties zero), $f$ reduces to $-bold(c')^top bold(x)'$, minimized at the ILP optimum. _Solution extraction._ Discard slack variables: return $bold(x)' [0..n]$.
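The matrix formula above can be checked on a made-up toy ILP by brute-force enumeration (a standalone Python sketch in symmetric-matrix form, not the library's Rust implementation):

```python
from itertools import product

# Hypothetical toy ILP: maximize c^T x  subject to  A x = b,  x binary
c = [1, 2, 1]
A = [[1, 1, 1]]   # single constraint: x1 + x2 + x3 = 2
b = [2]
n = len(c)
P = 1 + sum(abs(v) for v in c) + sum(abs(v) for v in b)

# Q = -diag(c + 2P b^T A) + P A^T A  (symmetric form)
bTA = [sum(b[k] * A[k][i] for k in range(len(b))) for i in range(n)]
Q = [[P * sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
     for i in range(n)]
for i in range(n):
    Q[i][i] -= c[i] + 2 * P * bTA[i]

def f(x):  # x^T Q x over the full symmetric matrix
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

best = min(product((0, 1), repeat=n), key=f)

# the QUBO minimizer is ILP-feasible and objective-optimal
assert all(sum(A[k][i] * best[i] for i in range(n)) == b[k] for k in range(len(A)))
feasible = [x for x in product((0, 1), repeat=n)
            if all(sum(A[k][i] * x[i] for i in range(n)) == b[k] for k in range(len(A)))]
assert sum(ci * bi for ci, bi in zip(c, best)) == max(
    sum(ci * xi for ci, xi in zip(c, x)) for x in feasible)
```

Here $P = 7$, and the minimizer picks two items of total objective 3, matching the constrained optimum.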
] @@ -682,16 +1104,19 @@ where $P$ is a penalty weight large enough that any constraint violation costs m Optimal: $bold(x) = (#qubo_ilp_sol.source_config.map(str).join(", "))$ ($#qubo_ilp_r.solutions.len()$-fold degenerate) #sym.checkmark ], )[ - McCormick linearization: for each product $x_i x_j$ ($i < j$) with $Q_(i j) != 0$, introduce auxiliary $y_(i j)$ with three linear constraints. + QUBO minimizes a quadratic form $bold(x)^top Q bold(x)$ over binary variables. Every quadratic term $Q_(i j) x_i x_j$ can be _linearized_ by introducing an auxiliary variable $y_(i j)$ constrained to equal the product $x_i x_j$ via three McCormick inequalities. Diagonal terms $Q_(i i) x_i^2 = Q_(i i) x_i$ are already linear for binary $x_i$. The result is a binary ILP with a linear objective and $3 m$ constraints (where $m$ is the number of non-zero off-diagonal entries), whose minimizer corresponds exactly to the QUBO minimizer. ][ + _Construction._ For $Q in RR^(n times n)$ (upper triangular) with $m$ non-zero off-diagonal entries: + _Diagonal terms._ For binary $x_i$: $Q_(i i) x_i^2 = Q_(i i) x_i$, which is directly linear. _Off-diagonal terms._ For each non-zero $Q_(i j)$ ($i < j$), introduce binary $y_(i j) = x_i dot x_j$ with McCormick constraints: $ y_(i j) <= x_i, quad y_(i j) <= x_j, quad y_(i j) >= x_i + x_j - 1 $ - These constraints enforce $y_(i j) = x_i x_j$ for binary variables. _ILP formulation._ Minimize $sum_i Q_(i i) x_i + sum_(i < j) Q_(i j) y_(i j)$ subject to the McCormick constraints and $x_i, y_(i j) in {0, 1}$. + _Correctness._ ($arrow.r.double$) For binary $x_i, x_j$, the three McCormick inequalities are tight: $y_(i j) = x_i x_j$ is the unique feasible value. Hence the ILP objective equals $bold(x)^top Q bold(x)$, and any ILP minimizer is a QUBO minimizer. ($arrow.l.double$) Given a QUBO minimizer $bold(x)^*$, setting $y_(i j) = x_i^* x_j^*$ satisfies all constraints and achieves the same objective value. 
+ _Solution extraction._ Return the first $n$ variables (discard auxiliary $y_(i j)$). ] @@ -706,9 +1131,9 @@ where $P$ is a penalty weight large enough that any constraint violation costs m #cs_ilp_r.solutions.len() feasible solutions ($= 2^3$ valid input combinations for the full adder) #sym.checkmark ], )[ - Each gate maps to linear constraints over binary variables; any feasible ILP solution is a satisfying circuit assignment. + Each boolean gate (AND, OR, NOT, XOR) has a truth table that can be captured exactly by a small set of linear inequalities over binary variables. By Tseitin-style flattening, each internal expression node gets an auxiliary ILP variable constrained to match its gate's output, so the conjunction of all gate constraints is feasible if and only if the circuit is satisfiable. The ILP has a trivial objective (minimize 0), making it a pure feasibility problem. ][ - _Tseitin flattening._ Recursively assign an ILP variable to each expression node. Named circuit variables keep their identity; internal nodes get auxiliary variables. + _Construction._ Recursively assign an ILP variable to each expression node. Named circuit variables keep their identity; internal nodes get auxiliary variables. _Gate encodings_ (output $c$, inputs $a_1, ..., a_k$, all binary): - NOT: $c + a = 1$ @@ -718,6 +1143,8 @@ where $P$ is a penalty weight large enough that any constraint violation costs m _Objective._ Minimize $0$ (feasibility problem): any feasible solution satisfies the circuit. + _Correctness._ ($arrow.r.double$) Each gate encoding is the convex hull of the gate's truth table rows (viewed as binary vectors), so a satisfying circuit assignment satisfies all constraints. ($arrow.l.double$) Any binary feasible solution respects every gate's input-output relation, and since gates are composed in topological order, the full circuit evaluates to true. + _Solution extraction._ Return values of the named circuit variables. 
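The McCormick constraints from the QUBO$arrow.r$ILP rule above admit a short exhaustive check (a standalone Python sketch):

```python
from itertools import product

# McCormick inequalities for y = x_i * x_j over binary variables
def mccormick_feasible(xi, xj, y):
    return y <= xi and y <= xj and y >= xi + xj - 1

for xi, xj in product((0, 1), repeat=2):
    feasible = [y for y in (0, 1) if mccormick_feasible(xi, xj, y)]
    assert feasible == [xi * xj]  # the product is the unique feasible value
```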
] @@ -735,7 +1162,7 @@ where $P$ is a penalty weight large enough that any constraint violation costs m IS of size #sat_mis.source.instance.num_clauses $= m$: one vertex per clause $arrow.r$ satisfying assignment #sym.checkmark ], )[ - @karp1972 Given CNF $phi$ with $m$ clauses, construct graph $G$ such that $phi$ is satisfiable iff $G$ has an IS of size $m$. + @karp1972 A satisfying assignment must make at least one literal true in every clause, and different clauses cannot assign contradictory values to the same variable. These two requirements map naturally to an independent set problem: _intra-clause cliques_ force exactly one literal per clause to be selected, while _conflict edges_ between complementary literals across clauses enforce consistency. The target IS size equals the number of clauses $m$, so an IS of size $m$ exists iff the formula is satisfiable. ][ _Construction._ For $phi = and.big_(j=1)^m C_j$ with $C_j = (ell_(j,1) or ... or ell_(j,k_j))$: @@ -760,11 +1187,13 @@ where $P$ is a penalty weight large enough that any constraint violation costs m #sat_kc_r.solutions.len() valid 3-colorings (color symmetry of satisfying assignments) #sym.checkmark ], )[ - @garey1979 Given CNF $phi$, construct graph $G$ such that $phi$ is satisfiable iff $G$ is 3-colorable. + @garey1979 A 3-coloring partitions vertices into three classes. The key insight is that three colors suffice to encode Boolean logic: one color represents TRUE, one FALSE, and a third (AUX) serves as a neutral ground. Variable gadgets force each variable's positive and negative literals to receive opposite truth colors, while clause gadgets use an OR-chain that can only receive the TRUE color when at least one input literal is TRUE-colored. Connecting the output of each clause gadget to the FALSE vertex forces it to be TRUE-colored, encoding the requirement that every clause is satisfied. ][ - _Construction._ (1) Base triangle: TRUE, FALSE, AUX vertices with all pairs connected. 
(2) Variable gadget for $x_i$: vertices $"pos"_i$, $"neg"_i$ connected to each other and to AUX. (3) Clause gadget: for $(ell_1 or ... or ell_k)$, apply OR-gadgets iteratively producing output $o$, then connect $o$ to FALSE and AUX. + _Construction._ (1) _Base triangle:_ vertices TRUE, FALSE, AUX, all mutually connected. This fixes three distinct colors and establishes the color semantics. (2) _Variable gadget_ for $x_i$: vertices $"pos"_i$, $"neg"_i$ connected to each other and to AUX. Since $"pos"_i$ and $"neg"_i$ are both adjacent to AUX, neither can receive the AUX color; since they are adjacent to each other, one must be TRUE-colored and the other FALSE-colored. (3) _Clause gadget_ for $(ell_1 or dots or ell_k)$: apply OR-gadgets iteratively --- $o_1 = "OR"(ell_1, ell_2)$, $o_2 = "OR"(o_1, ell_3)$, etc. --- producing final output $o$, then connect $o$ to both FALSE and AUX. + + _OR-gadget$(a, b) arrow.bar o$:_ Introduces five auxiliary vertices with edges arranged so that $o$ can receive the TRUE color iff at least one of $a$, $b$ has the TRUE color. When both inputs have the FALSE color, the gadget's internal constraints force $o$ into the AUX color. - _OR-gadget$(a, b) arrow.bar o$:_ Five vertices encoding $o = a or b$: if both $a, b$ have FALSE color, $o$ cannot have TRUE color. + _Correctness._ ($arrow.r.double$) A satisfying assignment colors $"pos"_i$ as TRUE when $x_i = 1$ and FALSE otherwise. Each clause has at least one TRUE literal, so the OR-chain output receives the TRUE color, which is compatible with edges to FALSE and AUX. ($arrow.l.double$) In any valid 3-coloring, the variable gadgets assign consistent truth values and the clause gadget connections to FALSE force each clause output to be TRUE-colored, meaning at least one literal per clause is TRUE. _Solution extraction._ Set $x_i = 1$ iff $"color"("pos"_i) = "color"("TRUE")$. 
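The variable gadget's behavior can be verified exhaustively (a standalone Python sketch with hypothetical vertex labels; the base colors are fixed without loss of generality by color symmetry):

```python
from itertools import product

# Vertices: 0=TRUE, 1=FALSE, 2=AUX, then pos_i=3, neg_i=4 (hypothetical labels)
TRUE, FALSE, AUX = 0, 1, 2
edges = [(TRUE, FALSE), (TRUE, AUX), (FALSE, AUX),  # base triangle
         (3, 4), (3, AUX), (4, AUX)]                # variable gadget

valid = []
for cp, cn in product(range(3), repeat=2):
    coloring = [TRUE, FALSE, AUX, cp, cn]  # base colors fixed
    if all(coloring[u] != coloring[v] for u, v in edges):
        valid.append((cp, cn))

# pos/neg must take the TRUE and FALSE colors, in one of the two opposite ways
assert set(valid) == {(TRUE, FALSE), (FALSE, TRUE)}
```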
] @@ -781,19 +1210,21 @@ where $P$ is a penalty weight large enough that any constraint violation costs m Dominating set of size $n = #sat_ds.source.instance.num_vars$: one vertex per variable triangle #sym.checkmark ], )[ - @garey1979 Given CNF $phi$ with $n$ variables and $m$ clauses, $phi$ is satisfiable iff the constructed graph has a dominating set of size $n$. + @garey1979 Each variable is represented by a triangle whose three vertices correspond to the positive literal, negative literal, and a dummy. Any dominating set must include at least one vertex from each triangle to dominate the dummy. The clause vertices are connected only to the literal vertices that appear in the clause, so a dominating set of minimum size $n$ (one vertex per triangle) dominates all clause vertices iff the chosen literals satisfy every clause. ][ - _Construction._ (1) Variable triangle for $x_i$: vertices $"pos"_i = 3i$, $"neg"_i = 3i+1$, $"dum"_i = 3i+2$ forming a triangle. (2) Clause vertex $c_j = 3n+j$ connected to $"pos"_i$ if $x_i in C_j$, to $"neg"_i$ if $overline(x_i) in C_j$. + _Construction._ (1) _Variable triangle_ for $x_i$: vertices $"pos"_i = 3i$, $"neg"_i = 3i+1$, $"dum"_i = 3i+2$ forming a triangle. The dummy vertex $"dum"_i$ is adjacent only to $"pos"_i$ and $"neg"_i$, so it can only be dominated by a vertex from its own triangle. (2) _Clause vertex_ $c_j = 3n+j$ connected to $"pos"_i$ if $x_i in C_j$, to $"neg"_i$ if $overline(x_i) in C_j$. - _Correctness._ Each triangle requires at least one vertex in any dominating set. Size-$n$ set must take exactly one per triangle, which dominates clause vertices iff corresponding literals satisfy all clauses. + _Correctness._ ($arrow.r.double$) Given a satisfying assignment, select $"pos"_i$ if $x_i = 1$, else $"neg"_i$. This dominates all triangle vertices (each triangle has one selected vertex adjacent to both others). Each clause $C_j$ has at least one true literal, so $c_j$ is adjacent to at least one selected vertex. 
Total size: $n$. ($arrow.l.double$) Any dominating set needs $>= 1$ vertex per triangle (to dominate $"dum"_i$). A set of size $n$ has exactly one per triangle. If $"dum"_i$ is selected, it does not dominate any clause vertex; but it does dominate $"pos"_i$ and $"neg"_i$, which still need to cover clauses. Since $"dum"_i$ has no clause neighbors, we can swap it for $"pos"_i$ or $"neg"_i$ without losing domination of the triangle. After swapping, each clause vertex $c_j$ must be dominated by some $"pos"_i$ or $"neg"_i$, defining a consistent satisfying assignment. _Solution extraction._ Set $x_i = 1$ if $"pos"_i$ selected; $x_i = 0$ if $"neg"_i$ selected. ] #reduction-rule("KSatisfiability", "Satisfiability")[ - Every $k$-SAT instance is already a SAT instance (clauses happen to have exactly $k$ literals). The embedding is trivial. + Every $k$-SAT instance is already a SAT instance --- clauses happen to have exactly $k$ literals, but SAT places no restriction on clause width. The embedding is the identity. ][ - _Variable mapping:_ Identity — variables and clauses unchanged. _Solution extraction:_ identity. + _Construction._ Variables and clauses are unchanged. + + _Correctness._ ($arrow.r.double$) Any $k$-SAT satisfying assignment satisfies the same clauses under SAT. ($arrow.l.double$) Any SAT satisfying assignment satisfies the same clauses (which all have width $k$). _Solution extraction._ Identity. ] #let sat_ksat = load-example("satisfiability_to_ksatisfiability") @@ -808,23 +1239,31 @@ where $P$ is a penalty weight large enough that any constraint violation costs m First solution: $(x_1, ..., x_5) = (#sat_ksat_sol.source_config.map(str).join(", "))$, auxiliary vars are don't-cares #sym.checkmark ], )[ - @cook1971 @garey1979 Any SAT formula converts to $k$-SAT ($k >= 3$) preserving satisfiability. 
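The dominating-set construction above can be exercised end to end on tiny formulas (a standalone Python sketch; the vertex indexing follows the text, the test formulas are made up):

```python
from itertools import combinations, product

def sat_to_ds(n, clauses):
    """clauses: lists of literals, +i for x_i, -i for not x_i (1-indexed)."""
    # indexing from the text: pos_i = 3i, neg_i = 3i+1, dum_i = 3i+2
    adj = {v: set() for v in range(3 * n + len(clauses))}
    def add(u, v):
        adj[u].add(v); adj[v].add(u)
    for i in range(n):
        add(3*i, 3*i+1); add(3*i, 3*i+2); add(3*i+1, 3*i+2)  # triangle
    for j, cl in enumerate(clauses):
        cj = 3 * n + j
        for lit in cl:
            i = abs(lit) - 1
            add(cj, 3*i if lit > 0 else 3*i + 1)
    return adj

def has_ds_of_size(adj, k):
    verts = list(adj)
    return any(all(v in s or adj[v] & s for v in verts)
               for s in map(set, combinations(verts, k)))

def satisfiable(n, clauses):
    return any(all(any((lit > 0) == bool(a[abs(lit) - 1]) for lit in cl)
                   for cl in clauses)
               for a in product((0, 1), repeat=n))

for n, clauses in [(2, [[1, 2], [-1, 2]]),   # satisfiable
                   (1, [[1], [-1]])]:         # unsatisfiable
    adj = sat_to_ds(n, clauses)
    # a size-n dominating set exists iff the formula is satisfiable
    assert has_ds_of_size(adj, n) == satisfiable(n, clauses)
```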
+ @cook1971 @garey1979 Clauses shorter than $k$ can be widened with fresh auxiliary variables, duplicating the clause over both polarities of each new variable so that no constraint is lost; clauses longer than $k$ can be split into a chain of width-$k$ clauses linked by auxiliary variables that propagate truth values. Both transformations preserve satisfiability while enforcing uniform clause width. ][ - _Small clauses ($|C| < k$):_ Pad $(ell_1 or ... or ell_r)$ with auxiliary $y$: $(ell_1 or ... or ell_r or y or overline(y) or ...)$ to length $k$. + _Construction._ + + _Small clauses ($|C| < k$):_ Widen $(ell_1 or dots or ell_r)$ with a fresh auxiliary $y$ by replacing it with the two clauses $(ell_1 or dots or ell_r or y)$ and $(ell_1 or dots or ell_r or overline(y))$, repeating until width $k$. Both copies hold iff the original clause holds: any assignment makes one of $y$, $overline(y)$ false, so the original literals must carry that copy. (Padding a single clause with the tautological pair $y or overline(y)$ would make it vacuously true and no longer equisatisfiable.) - _Large clauses ($|C| > k$):_ Split $(ell_1 or ... or ell_r)$ with auxiliaries $y_1, ..., y_(r-k)$: - $ (ell_1 or ... or ell_(k-1) or y_1) and (overline(y_1) or ell_k or ... or y_2) and ... and (overline(y_(r-k)) or ell_(r-k+2) or ... or ell_r) $ + _Large clauses ($|C| > k$):_ Split $(ell_1 or dots or ell_r)$ with auxiliaries $y_1, dots, y_(r-k)$: + $ (ell_1 or dots or ell_(k-1) or y_1) and (overline(y_1) or ell_k or dots or y_2) and dots and (overline(y_(r-k)) or ell_(r-k+2) or dots or ell_r) $ - _Correctness._ Original clause true $arrow.l.r$ auxiliary chain can propagate truth through new clauses. + _Correctness._ ($arrow.r.double$) If the original clause is satisfied by some literal $ell_j$, set the auxiliary chain so that $y_i = 1$ for all $i$ before $ell_j$'s sub-clause and $y_i = 0$ after. Each sub-clause then contains either a true original literal or a true auxiliary. ($arrow.l.double$) If all sub-clauses are satisfied but every original literal is false, the first clause forces $y_1 = 1$, which forces $y_2 = 1$ (since $overline(y_1)$ is false), and so on until the last clause has $overline(y_(r-k)) = 0$ and all remaining literals false --- a contradiction.
+ + _Solution extraction._ Discard auxiliary variables; return original variable assignments. ] #reduction-rule("Satisfiability", "CircuitSAT", example: true, example-caption: [3-variable SAT formula to boolean circuit], )[ - Each CNF clause $C_i = (ell_(i 1) or dots or ell_(i m_i))$ becomes an OR gate $g_i$, and a final AND gate computes $g_1 and dots and g_k$, constrained to output _true_. + CNF is inherently an AND-of-ORs structure, which maps directly to a boolean circuit: each clause becomes an OR gate over its literals, and a final AND gate combines all clause outputs. The circuit is constrained to output _true_, so a satisfying circuit assignment exists iff the original formula is satisfiable. ][ - The circuit is satisfiable iff the CNF formula is satisfiable, since the AND-of-ORs structure is preserved exactly. Variable mapping: SAT variable $x_j$ maps to circuit input $x_j$; ancilla variables are the clause gate outputs and the final AND output. + _Construction._ For $phi = C_1 and dots and C_k$ with $C_i = (ell_(i 1) or dots or ell_(i m_i))$: (1) Create circuit inputs $x_1, dots, x_n$ corresponding to SAT variables. (2) For each clause $C_i$, add an OR gate $g_i$ with inputs from the clause's literals (negated inputs use NOT gates). (3) Add a final AND gate with inputs $g_1, dots, g_k$, constrained to output _true_. + + _Correctness._ ($arrow.r.double$) A satisfying assignment makes at least one literal true in each clause, so each OR gate outputs true and the AND gate outputs true. ($arrow.l.double$) A satisfying circuit assignment has all OR gates true (forced by the AND output constraint), meaning at least one literal per clause is true --- exactly a SAT solution. + + _Solution extraction._ Return the values of the circuit input variables $x_1, dots, x_n$. 
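The large-clause splitting scheme above can be checked per assignment (a standalone Python sketch on a made-up width-5 clause, not the library's Rust implementation):

```python
from itertools import product

def split_clause(clause, k, next_var):
    """Split one wide clause (literals: +v / -v, 1-indexed) into width-k
    clauses chained by fresh auxiliary variables, per the scheme above."""
    r = len(clause)
    aux = list(range(next_var, next_var + r - k))
    out = [clause[:k - 1] + [aux[0]]]
    pos = k - 1
    for t in range(len(aux) - 1):
        out.append([-aux[t]] + clause[pos:pos + k - 2] + [aux[t + 1]])
        pos += k - 2
    out.append([-aux[-1]] + clause[pos:])
    return out, next_var + len(aux)

def true_under(lit, assign):  # assign: tuple of 0/1, variable v at index v-1
    return (lit > 0) == bool(assign[abs(lit) - 1])

clause = [1, -2, 3, -4, 5]    # hypothetical width-5 clause, split to width 3
split, total = split_clause(clause, 3, 6)
assert all(len(cl) == 3 for cl in split)

# For every assignment to the original variables, the clause is satisfied
# iff some setting of the auxiliaries satisfies all split clauses.
for a in product((0, 1), repeat=5):
    orig_ok = any(true_under(l, a) for l in clause)
    ext_ok = any(all(any(true_under(l, a + y) for l in cl) for cl in split)
                 for y in product((0, 1), repeat=total - 6))
    assert orig_ok == ext_ok
```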
] #let cs_sg = load-example("circuitsat_to_spinglass") @@ -838,11 +1277,17 @@ where $P$ is a penalty weight large enough that any constraint violation costs m #cs_sg_r.solutions.len() ground states ($= 2^3$ valid input combinations for the full adder) #sym.checkmark ], )[ - @whitfield2012 @lucas2014 Each gate maps to a gadget whose ground states encode valid I/O. + @whitfield2012 @lucas2014 Each logic gate can be represented as an Ising gadget --- a small set of spins with couplings $J_(i j)$ and fields $h_i$ chosen so that the gadget's ground states correspond exactly to the gate's truth table rows. Composing gadgets for all gates in the circuit yields a spin glass whose ground states encode precisely the satisfying assignments of the circuit. The energy gap between valid and invalid I/O patterns ensures that any global ground state respects every gate's logic. ][ - _Spin mapping:_ $sigma in {0,1} arrow.bar s = 2sigma - 1 in {-1, +1}$. + _Construction._ + + _Spin mapping:_ Boolean variables $sigma in {0,1}$ map to Ising spins $s = 2sigma - 1 in {-1, +1}$. Each circuit variable is assigned a unique spin index; gate gadgets reference these indices for their inputs and outputs. + + _Gate gadgets_ (inputs 0,1; output 2; auxiliary 3 for XOR) are listed in @tab:gadgets. For each gate, instantiate the gadget's couplings and fields. The total Hamiltonian is the sum over all gadgets: $H = -sum_(i < j) J_(i j) s_i s_j - sum_i h_i s_i$. - _Gate gadgets_ (inputs 0,1; output 2; auxiliary 3 for XOR) are shown in @tab:gadgets. Allocate spins per variable, instantiate gadgets, sum Hamiltonians. Ground states correspond to satisfying assignments. + _Correctness._ ($arrow.r.double$) A satisfying circuit assignment maps to a spin configuration where every gadget is in a ground state (valid I/O), so the total energy is minimized. ($arrow.l.double$) Any global ground state must minimize each gadget's contribution. 
Since each gadget's ground states match its gate's truth table, the spin configuration encodes a valid circuit evaluation. The output spin is constrained to $+1$ (true), so the circuit is satisfied. + + _Solution extraction._ Map spins back to Boolean: $sigma_i = (s_i + 1) / 2$. Return the circuit input variables. ] #figure( @@ -879,17 +1324,19 @@ where $P$ is a penalty weight large enough that any constraint violation costs m }).join(" and ") #sym.checkmark ], )[ - An array multiplier with output constrained to $N$ is satisfiable iff $N$ factors within bit bounds. _(Folklore; no canonical reference.)_ + Integer multiplication can be implemented as a boolean circuit: an $m times n$ array multiplier computes $p times q$ using only AND, XOR, and OR gates arranged in a grid of full adders. Constraining the output bits to match $N$ turns the circuit into a satisfiability problem --- the circuit is satisfiable iff $N = p times q$ for some $p, q$ within the given bit widths. _(Folklore; no canonical reference.)_ ][ _Construction._ Build $m times n$ array multiplier for $p times q$: - _Full adder $(i,j)$:_ $s_(i,j) + 2c_(i,j) = (p_i and q_j) + s_"prev" + c_"prev"$ via: + _Full adder $(i,j)$:_ Each cell computes one partial product bit $p_i and q_j$ and adds it to the running sum from previous cells. The sum and carry are: $s_(i,j) + 2c_(i,j) = (p_i and q_j) + s_"prev" + c_"prev"$, implemented via: $ a := p_i and q_j, quad t_1 := a xor s_"prev", quad s_(i,j) := t_1 xor c_"prev" $ $ t_2 := t_1 and c_"prev", quad t_3 := a and s_"prev", quad c_(i,j) := t_2 or t_3 $ - _Output constraint:_ $M_k := "bit"_k(N)$ for $k = 1, ..., m+n$. + _Output constraint:_ Fix output wires to the binary representation of $N$: $M_k := "bit"_k(N)$ for $k = 1, dots, m+n$. + + _Correctness._ ($arrow.r.double$) If $N = p times q$ with $p < 2^m$ and $q < 2^n$, setting the input bits to the binary representations of $p$ and $q$ produces output bits matching $N$, satisfying all constraints. 
($arrow.l.double$) Any satisfying assignment to the circuit computes a valid multiplication (the gates enforce arithmetic correctness), and the output constraint ensures the product equals $N$. - _Solution extraction._ $p = sum_i p_i 2^(i-1)$, $q = sum_j q_j 2^(j-1)$. + _Solution extraction._ Read off factor bits: $p = sum_i p_i 2^(i-1)$, $q = sum_j q_j 2^(j-1)$. ] #let mc_sg = load-example("maxcut_to_spinglass") @@ -905,9 +1352,13 @@ where $P$ is a penalty weight large enough that any constraint violation costs m Cut value $= #mc_sg_cut$ ($#mc_sg_r.solutions.len()$-fold degenerate) #sym.checkmark ], )[ - @barahona1982 Set $J_(i j) = w_(i j)$, $h_i = 0$. Maximizing cut equals minimizing $-sum J_(i j) s_i s_j$. + @barahona1982 A maximum cut partitions vertices into two groups to maximize the total weight of edges crossing the partition. In the Ising model, with energy $E = sum J_(i j) s_i s_j$, two spins with opposite signs contribute $-J_(i j)$ to the energy, while same-sign spins contribute $+J_(i j)$. Setting $J_(i j) = w_(i j)$ and $h_i = 0$ makes each cut edge lower the energy by $2 J_(i j)$ relative to an uncut edge, so the Ising ground state corresponds to the maximum cut. ][ - Opposite-partition vertices satisfy $s_i s_j = -1$, contributing $-J_(i j)(-1) = J_(i j)$ to the energy. _Variable mapping:_ $J_(i j) = w_(i j)$, $h_i = 0$, spins $s_i = 2 sigma_i - 1$ where $sigma_i in {0, 1}$ is the partition label. _Solution extraction:_ partition $= {i : s_i = +1}$. + _Construction._ Map each vertex to a spin with $J_(i j) = w_(i j)$ for each edge and $h_i = 0$ (no external field). Spins are $s_i = 2 sigma_i - 1$ where $sigma_i in {0, 1}$ is the partition label. + + _Correctness._ ($arrow.r.double$) A maximum cut assigns partition labels $sigma_i in {0,1}$ and hence spins $s_i$. For cut edges, $s_i s_j = -1$, contributing $-J_(i j)$. For uncut edges, $s_i s_j = +1$, contributing $+J_(i j)$. Maximizing cut weight is therefore equivalent to minimizing $sum J_(i j) s_i s_j$, the Ising energy.
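Assuming the energy convention $E = sum J_(i j) s_i s_j$ with no field (so that cut edges with positive coupling lower the energy), the cut-energy identity $E = W - 2 dot "cut"$ can be checked exhaustively on a made-up graph (a standalone Python sketch, not the library's Rust implementation):

```python
from itertools import product

# Toy weighted graph (made up here): edge list (u, v, w)
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 1.5), (2, 3, 1.0)]
n = 4
W = sum(w for _, _, w in edges)

def ising_energy(s):  # E = sum_ij J_ij s_i s_j with J = w, h = 0
    return sum(w * s[u] * s[v] for u, v, w in edges)

def cut_value(s):  # total weight of edges with endpoints on opposite sides
    return sum(w for u, v, w in edges if s[u] != s[v])

# per-configuration identity: E = W - 2 * cut, so argmin E = argmax cut
for s in product((-1, 1), repeat=n):
    assert abs(ising_energy(s) - (W - 2 * cut_value(s))) < 1e-9

ground = min(product((-1, 1), repeat=n), key=ising_energy)
assert cut_value(ground) == max(cut_value(s) for s in product((-1, 1), repeat=n))
```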
($arrow.l.double$) An Ising ground state minimizes $sum J_(i j) s_i s_j$, which requires opposite-sign pairs (cut edges) to carry the largest possible total weight --- exactly the maximum cut. + + _Solution extraction._ Partition $= {i : s_i = +1}$. ] #let sg_mc = load-example("spinglass_to_maxcut") @@ -922,17 +1373,17 @@ where $P$ is a penalty weight large enough that any constraint violation costs m Ground state ($#sg_mc_r.solutions.len()$-fold degenerate): partition $S = {#sg_mc_sol.source_config.enumerate().filter(((i, x)) => x == 1).map(((i, x)) => str(i)).join(", ")}$ #sym.checkmark ], )[ - @barahona1982 @lucas2014 Ground states of Ising models correspond to maximum cuts. + @barahona1982 @lucas2014 The Ising Hamiltonian $H = sum J_(i j) s_i s_j + sum h_i s_i$ has two types of terms. The pairwise couplings $J_(i j)$ map directly to MaxCut edge weights, since minimizing $J_(i j) s_i s_j$ favors opposite spins (cut edges) when $J_(i j) > 0$. The local fields $h_i$ have no direct MaxCut analogue, but can be absorbed by introducing a single ancilla vertex connected to every spin with weight $h_i$: fixing the ancilla's partition side effectively creates a linear bias on each spin. ][ - _MaxCut $arrow.r$ SpinGlass:_ Set $J_(i j) = w_(i j)$, $h_i = 0$. Maximizing cut equals minimizing $-sum J_(i j) s_i s_j$ since $s_i s_j = -1$ when $s_i != s_j$. + _Construction._ If all $h_i = 0$: set $w_(i j) = J_(i j)$ directly (1:1 mapping, no ancilla). If some $h_i != 0$: add ancilla vertex $a$ with edges $w_(i,a) = h_i$ for each spin $i$. The Ising energy $sum J_(i j) s_i s_j + sum h_i s_i$ equals $sum J_(i j) s_i s_j + sum h_i s_i s_a$ when $s_a = +1$, which is a pure pairwise Hamiltonian on $n + 1$ spins. - _SpinGlass $arrow.r$ MaxCut:_ If $h_i = 0$: direct mapping $w_(i j) = J_(i j)$. Otherwise, add ancilla $a$ with $w_(i,a) = h_i$. + _Correctness._ ($arrow.r.double$) An Ising ground state assigns spins to minimize $H$.
The equivalent MaxCut graph has the same objective (up to a constant), so the spin configuration defines a maximum cut. ($arrow.l.double$) A maximum cut on the constructed graph maximizes $sum_("cut") w_(i j)$, which corresponds to minimizing $sum J_(i j) s_i s_j + sum h_i s_i s_a$. With $s_a$ fixed, this is the Ising energy, so the cut defines a ground state. - _Solution extraction._ Without ancilla: identity. With ancilla: if $sigma_a = 1$, flip all spins before removing ancilla. + _Solution extraction._ Without ancilla: partition labels are the spin values directly. With ancilla: if $sigma_a = 1$ (ancilla on the $+1$ side), the spin values are read directly; if $sigma_a = 0$, flip all spins before reading (since the ancilla should represent $s_a = +1$). ] #reduction-rule("KColoring", "ILP")[ - The $k$-coloring problem reduces to binary ILP with $|V| dot k$ variables and $|V| + |E| dot k$ constraints. + A $k$-coloring assigns each vertex exactly one of $k$ colors such that adjacent vertices differ. Both requirements are naturally linear: the "exactly one color" condition is an equality constraint on $k$ binary indicators per vertex, and the "neighbors differ" condition bounds each color's indicator sum to at most one per edge. The resulting ILP has $|V| k$ variables and $|V| + |E| k$ constraints with a trivial objective. ][ _Construction._ For graph $G = (V, E)$ with $k$ colors: @@ -948,7 +1399,7 @@ where $P$ is a penalty weight large enough that any constraint violation costs m ] #reduction-rule("Factoring", "ILP")[ - Integer factorization reduces to binary ILP using McCormick linearization with $O(m n)$ variables and constraints. + Integer multiplication $p times q = N$ is a system of bilinear equations over binary factor bits with carry propagation. Each bit-product $p_i q_j$ is a bilinear term that McCormick linearization replaces with an auxiliary variable and three inequalities.
The carry-chain equations are already linear, so the full system becomes a binary ILP with $O(m n)$ variables and constraints. ][ _Construction._ For target $N$ with $m$-bit factor $p$ and $n$-bit factor $q$: @@ -973,52 +1424,80 @@ where $P$ is a penalty weight large enough that any constraint violation costs m The following reductions to Integer Linear Programming are straightforward formulations where problem constraints map directly to linear inequalities. #reduction-rule("MaximumIndependentSet", "ILP")[ - The maximum-weight IS problem reduces to binary ILP with $|V|$ variables and $|E|$ constraints. + Each vertex is either selected or not, and each edge forbids selecting both endpoints -- a constraint that is directly linear in binary indicator variables. ][ - _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. Constraints: $x_u + x_v <= 1$ for each $(u, v) in E$. Objective: maximize $sum_v w_v x_v$. _Solution extraction:_ $S = {v : x_v = 1}$. + _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. Constraints: $x_u + x_v <= 1$ for each $(u, v) in E$. Objective: maximize $sum_v w_v x_v$. + + _Correctness._ ($arrow.r.double$) An independent set has no two adjacent vertices selected, so all edge constraints hold. ($arrow.l.double$) Any feasible binary solution selects no two adjacent vertices, forming an independent set; the objective maximizes total weight. + + _Solution extraction._ $S = {v : x_v = 1}$. ] #reduction-rule("MinimumVertexCover", "ILP")[ - The minimum-weight VC problem reduces to binary ILP with $|V|$ variables and $|E|$ constraints. + Every edge must be covered by at least one endpoint -- a lower-bound constraint that is directly linear in binary vertex indicators. ][ - _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. Constraints: $x_u + x_v >= 1$ for each $(u, v) in E$. Objective: minimize $sum_v w_v x_v$. _Solution extraction:_ $C = {v : x_v = 1}$. + _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. 
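The McCormick step cited in the Factoring reduction admits a one-screen check: for binary $x, y$, the three inequalities pin the auxiliary variable to the product. A Python sketch (illustrative, not the library's code):

```python
from itertools import product

def mccormick_feasible(x, y):
    """Binary z values satisfying z <= x, z <= y, z >= x + y - 1."""
    return [z for z in (0, 1) if z <= x and z <= y and z >= x + y - 1]

# For every binary (x, y) the feasible set is exactly {x * y},
# so the linearization is exact on {0, 1} variables.
for x, y in product((0, 1), repeat=2):
    assert mccormick_feasible(x, y) == [x * y]
```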
Constraints: $x_u + x_v >= 1$ for each $(u, v) in E$. Objective: minimize $sum_v w_v x_v$. + + _Correctness._ ($arrow.r.double$) A vertex cover includes at least one endpoint of every edge, satisfying all constraints. ($arrow.l.double$) Any feasible solution covers every edge; the objective minimizes total weight. + + _Solution extraction._ $C = {v : x_v = 1}$. ] #reduction-rule("MaximumMatching", "ILP")[ - The maximum-weight matching reduces to binary ILP with $|E|$ variables and $|V|$ constraints. + Each edge is either selected or not, and each vertex may be incident to at most one selected edge -- a degree-bound constraint that is directly linear in binary edge indicators. ][ - _Construction._ Variables: $x_e in {0, 1}$ for each $e in E$. Constraints: $sum_(e in.rev v) x_e <= 1$ for each $v in V$. Objective: maximize $sum_e w_e x_e$. _Solution extraction:_ $M = {e : x_e = 1}$. + _Construction._ Variables: $x_e in {0, 1}$ for each $e in E$. Constraints: $sum_(e in.rev v) x_e <= 1$ for each $v in V$. Objective: maximize $sum_e w_e x_e$. + + _Correctness._ ($arrow.r.double$) A matching has at most one edge per vertex, so all degree constraints hold. ($arrow.l.double$) Any feasible solution is a matching by construction; the objective maximizes total weight. + + _Solution extraction._ $M = {e : x_e = 1}$. ] #reduction-rule("MaximumSetPacking", "ILP")[ - Set packing reduces to binary ILP with $|cal(S)|$ variables and at most $binom(|cal(S)|, 2)$ constraints. + Two sets conflict if they share a universe element, and at most one of each conflicting pair may be selected -- the same exclusion structure as independent set on the intersection graph, expressible as pairwise linear constraints. ][ - _Construction._ Variables: $x_i in {0, 1}$ for each $S_i in cal(S)$. Constraints: $x_i + x_j <= 1$ for each overlapping pair $S_i, S_j in cal(S)$ with $S_i inter S_j != emptyset$. Objective: maximize $sum_i w_i x_i$. _Solution extraction:_ $cal(P) = {S_i : x_i = 1}$. 
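These edge-constraint formulations can be checked end to end on a toy instance by enumerating all binary assignments. A Python sketch for the independent-set case (hypothetical graph, not the crate's reduction code):

```python
from itertools import product

# Path graph 0-1-2-3 with unit weights.
E = [(0, 1), (1, 2), (2, 3)]

def ilp_feasible(x):
    # Edge constraints x_u + x_v <= 1 of the MIS formulation.
    return all(x[u] + x[v] <= 1 for u, v in E)

# Enumerate the binary ILP: every feasible point is an independent set,
# and the optimum equals the maximum independent set size.
ilp_opt = max(sum(x) for x in product((0, 1), repeat=4) if ilp_feasible(x))
assert ilp_opt == 2  # e.g. {0, 2} or {1, 3}
```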
+ _Construction._ Variables: $x_i in {0, 1}$ for each $S_i in cal(S)$. Constraints: $x_i + x_j <= 1$ for each overlapping pair $S_i, S_j in cal(S)$ with $S_i inter S_j != emptyset$. Objective: maximize $sum_i w_i x_i$. + + _Correctness._ ($arrow.r.double$) A packing has mutually disjoint sets, so no overlapping pair is co-selected. ($arrow.l.double$) Any feasible solution selects only mutually disjoint sets; the objective maximizes total weight. + + _Solution extraction._ $cal(P) = {S_i : x_i = 1}$. ] #reduction-rule("MinimumSetCovering", "ILP")[ - Set covering reduces to binary ILP with $|cal(S)|$ variables and $|U|$ constraints. + Every universe element must be covered by at least one selected set -- a lower-bound constraint on the sum of indicators for sets containing that element, which is directly linear. ][ - _Construction._ Variables: $x_i in {0, 1}$ for each $S_i in cal(S)$. Constraints: $sum_(S_i in.rev u) x_i >= 1$ for each $u in U$. Objective: minimize $sum_i w_i x_i$. _Solution extraction:_ $cal(C) = {S_i : x_i = 1}$. + _Construction._ Variables: $x_i in {0, 1}$ for each $S_i in cal(S)$. Constraints: $sum_(S_i in.rev u) x_i >= 1$ for each $u in U$. Objective: minimize $sum_i w_i x_i$. + + _Correctness._ ($arrow.r.double$) A set cover includes at least one set containing each element, satisfying all constraints. ($arrow.l.double$) Any feasible solution covers every element; the objective minimizes total weight. + + _Solution extraction._ $cal(C) = {S_i : x_i = 1}$. ] #reduction-rule("MinimumDominatingSet", "ILP")[ - Dominating set reduces to binary ILP with $|V|$ variables and $|V|$ constraints. + Every vertex must be dominated -- either selected itself or adjacent to a selected vertex -- which is a lower-bound constraint on the sum of indicators over its closed neighborhood. ][ - _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. Constraints: $x_v + sum_(u in N(v)) x_u >= 1$ for each $v in V$ (each vertex dominated). 
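A tiny enumeration illustrates the covering constraints just described (Python sketch with hypothetical data):

```python
from itertools import product

universe = {1, 2, 3, 4}
subsets = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]  # S_1 .. S_4, unit weights

def covers(x):
    # Constraint: every universe element lies in at least one selected subset.
    return all(any(x[i] for i, s in enumerate(subsets) if u in s) for u in universe)

opt = min(sum(x) for x in product((0, 1), repeat=len(subsets)) if covers(x))
assert opt == 2  # e.g. {1, 2} together with {3, 4}
```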
Objective: minimize $sum_v w_v x_v$. _Solution extraction:_ $D = {v : x_v = 1}$. + _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. Constraints: $x_v + sum_(u in N(v)) x_u >= 1$ for each $v in V$ (each vertex dominated). Objective: minimize $sum_v w_v x_v$. + + _Correctness._ ($arrow.r.double$) A dominating set includes a vertex or one of its neighbors for every vertex, satisfying all constraints. ($arrow.l.double$) Any feasible solution dominates every vertex; the objective minimizes total weight. + + _Solution extraction._ $D = {v : x_v = 1}$. ] #reduction-rule("MaximumClique", "ILP")[ - Maximum clique reduces to binary ILP with $|V|$ variables and $O(|overline(E)|)$ constraints. + A clique requires every pair of selected vertices to be adjacent; equivalently, no two selected vertices may share a _non_-edge. This is the independent set formulation on the complement graph $overline(G)$. ][ - _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. Constraints: $x_u + x_v <= 1$ for each $(u, v) in.not E$ (non-edges). Objective: maximize $sum_v x_v$. Equivalently, IS on the complement graph. _Solution extraction:_ $K = {v : x_v = 1}$. + _Construction._ Variables: $x_v in {0, 1}$ for each $v in V$. Constraints: $x_u + x_v <= 1$ for each $(u, v) in.not E$ (non-edges). Objective: maximize $sum_v x_v$. + + _Correctness._ ($arrow.r.double$) In a clique, every pair of selected vertices is adjacent, so no non-edge constraint is violated. ($arrow.l.double$) Any feasible solution selects only mutually adjacent vertices, forming a clique; the objective maximizes its size. + + _Solution extraction._ $K = {v : x_v = 1}$. ] #reduction-rule("TravelingSalesman", "ILP", example: true, example-caption: [Weighted $K_4$: the optimal tour $0 arrow 1 arrow 3 arrow 2 arrow 0$ with cost 80 is found by position-based ILP.], )[ - The traveling salesman problem reduces to binary ILP with $n^2 + 2 m n$ variables via position-based encoding with McCormick linearization. 
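The complement-graph view of the clique formulation is easy to validate by enumeration (Python sketch, hypothetical graph):

```python
from itertools import combinations, product

# Triangle {0, 1, 2} plus a pendant vertex 3 attached to 2.
V = [0, 1, 2, 3]
E = {(0, 1), (0, 2), (1, 2), (2, 3)}
non_edges = [(u, v) for u, v in combinations(V, 2) if (u, v) not in E]

def feasible(x):
    # Non-edge constraints x_u + x_v <= 1: selected vertices must be pairwise adjacent.
    return all(x[u] + x[v] <= 1 for u, v in non_edges)

max_clique = max(sum(x) for x in product((0, 1), repeat=len(V)) if feasible(x))
assert max_clique == 3  # the triangle {0, 1, 2}
```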
+ A Hamiltonian tour is a permutation of vertices. Position-based encoding assigns each vertex a tour position via binary indicators, with permutation constraints ensuring a valid bijection. The tour cost involves products of position indicators for consecutive positions, which McCormick linearization converts to auxiliary variables with linear constraints. ][ _Construction._ For graph $G = (V, E)$ with $n = |V|$ and $m = |E|$: @@ -1030,17 +1509,19 @@ The following reductions to Integer Linear Programming are straightforward formu _Objective:_ Minimize $sum_((u,v) in E) w(u,v) dot sum_k (y_(u,v,k) + y_(v,u,k))$. + _Correctness._ ($arrow.r.double$) A valid tour defines a permutation matrix $(x_(v,k))$ satisfying constraints (1)--(2); consecutive vertices are adjacent by construction, so (3) holds; McCormick constraints (4) force $y = x_(u,k) x_(v,k+1)$, making the objective equal to the tour cost. ($arrow.l.double$) Any feasible binary solution defines a permutation (by (1)--(2)) where consecutive positions are connected by edges (by (3)), forming a Hamiltonian tour; the linearized objective equals the tour cost. + _Solution extraction._ For each position $k$, find vertex $v$ with $x_(v,k) = 1$ to recover the tour permutation; then select edges between consecutive positions. ] == Unit Disk Mapping #reduction-rule("MaximumIndependentSet", "KingsSubgraph")[ - @nguyen2023 Any MIS problem on a general graph $G$ can be reduced to MIS on a unit disk graph (King's subgraph) with at most quadratic overhead in the number of vertices. + @nguyen2023 The key idea is to represent each vertex of a general graph as a chain of grid nodes (a "copy line") on a King's subgraph, where adjacency is limited to unit-distance neighbors. Edges between vertices in the original graph are encoded by crossing gadgets: when two copy lines cross, the gadget ensures that at most one can be fully selected, mimicking the independence constraint. 
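The claim in the TSP rule that the McCormick-linearized objective equals the tour cost can be verified exhaustively on a small complete graph. A Python sketch (hypothetical weights; $y$ is materialized directly as the product it must equal):

```python
from itertools import permutations

# Weighted K4 (symmetric, hypothetical weights).
w = [
    [0, 10, 25, 30],
    [10, 0, 35, 15],
    [25, 35, 0, 20],
    [30, 15, 20, 0],
]
n = 4

def tour_cost(perm):
    return sum(w[perm[k]][perm[(k + 1) % n]] for k in range(n))

def linearized_cost(perm):
    # x[v][k] = 1 iff vertex v sits at tour position k; y_{u,v,k} = x[u][k] * x[v][k+1].
    x = [[int(perm[k] == v) for k in range(n)] for v in range(n)]
    return sum(
        w[u][v] * x[u][k] * x[v][(k + 1) % n]
        for u in range(n) for v in range(n) if u != v
        for k in range(n)
    )

# The linearized objective reproduces the tour cost for every permutation.
for perm in permutations(range(n)):
    assert linearized_cost(perm) == tour_cost(perm)
```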
The overhead from the copy-line structure is an instance-dependent offset $Delta$ determined during construction, so $"MIS"(G_"grid") = "MIS"(G) + Delta$, and the reduction preserves optimality with at most quadratic blowup. ][ _Construction (Copy-Line Method)._ Given $G = (V, E)$ with $n = |V|$: - 1. _Vertex ordering:_ Compute a path decomposition of $G$ to obtain vertex order $(v_1, ..., v_n)$. The pathwidth determines the grid height. + 1. _Vertex ordering:_ Compute a path decomposition of $G$ to obtain vertex order $(v_1, dots, v_n)$. The pathwidth determines the grid height. 2. _Copy lines:_ For each vertex $v_i$, create an L-shaped "copy line" on the grid: $ "CopyLine"(v_i) = {(r, c_i) : r in [r_"start", r_"stop"]} union {(r_i, c) : c in [c_i, c_"stop"]} $ @@ -1051,9 +1532,9 @@ The following reductions to Integer Linear Programming are straightforward formu 4. _MIS correspondence:_ Each copy line has MIS contribution $approx |"line"|/2$. The gadgets add overhead $Delta$ such that: $ "MIS"(G_"grid") = "MIS"(G) + Delta $ - _Solution extraction._ For each copy line, check if the majority of its vertices are in the grid MIS. Map back: $v_i in S$ iff copy line $i$ is active. + _Correctness._ ($arrow.r.double$) An IS $S$ in $G$ maps to a grid IS by activating copy lines for vertices in $S$ (selecting alternating grid nodes) and deactivating lines for vertices not in $S$. At each crossing gadget between adjacent vertices $v_i$ and $v_j$, the gadget permits at most one fully active line; since $S$ is independent, adjacent vertices are never both in $S$, so no conflict arises. ($arrow.l.double$) A grid MIS determines which copy lines are active (majority of nodes selected). Active lines correspond to an IS in $G$: if two adjacent vertices $v_i, v_j$ were both active, their crossing gadget would prevent both from contributing fully, contradicting optimality. - _Correctness._ ($arrow.r.double$) An IS in $G$ maps to selecting all copy line vertices for included vertices; crossing gadgets ensure no conflicts.
($arrow.l.double$) A grid MIS maps back to an IS by the copy line activity rule. + _Solution extraction._ For each copy line, check if the majority of its vertices are in the grid MIS. Map back: $v_i in S$ iff copy line $i$ is active. ] *Example: Petersen Graph.*#footnote[Generated using `cargo run --example export_petersen_mapping` from the accompanying code repository.] The Petersen graph ($n=10$, MIS$=4$) maps to a $30 times 42$ King's subgraph with 219 nodes and overhead $Delta = 89$. Solving MIS on the grid yields $"MIS"(G_"grid") = 4 + 89 = 93$. The weighted and unweighted KSG mappings share identical grid topology (same node positions and edges); only the vertex weights differ. With triangular lattice encoding @nguyen2023, the same graph maps to a $42 times 60$ grid with 395 nodes and overhead $Delta = 375$, giving $"MIS"(G_"tri") = 4 + 375 = 379$. @@ -1093,9 +1574,13 @@ The following reductions to Integer Linear Programming are straightforward formu ) #reduction-rule("MaximumIndependentSet", "TriangularSubgraph")[ - @nguyen2023 Any MIS problem on a general graph $G$ can be reduced to MIS on a weighted triangular lattice graph with at most quadratic overhead in the number of vertices. + @nguyen2023 The same copy-line principle as the King's subgraph reduction applies, but on a triangular lattice. The triangular geometry has sparser local adjacency (6 neighbors per interior node vs. 8 in the King's grid), which requires redesigned crossing and simplifier gadgets but preserves the same asymptotic overhead. The resulting graph is a unit disk graph under the triangular metric, suitable for hardware architectures based on triangular lattice connectivity. ][ - _Construction._ Same copy-line method as the KSG mapping, but uses a triangular lattice instead of a square grid.
Crossing and simplifier gadgets are adapted for triangular geometry, producing a unit disk graph on a triangular grid where edges connect nodes within unit distance under the triangular metric. + _Construction._ Same copy-line method as the KSG mapping: vertex ordering via path decomposition, L-shaped copy lines, and crossing gadgets at edge intersections. The gadgets are adapted for the triangular lattice geometry, where adjacency is defined by unit distance under the triangular metric (6 neighbors per interior node instead of 8). + + _Correctness._ ($arrow.r.double$) An IS in $G$ maps to an IS on the triangular grid by the same copy-line activation mechanism. ($arrow.l.double$) A grid MIS maps back to an IS by the copy-line activity rule, with the adapted crossing gadgets enforcing the same independence constraints. + + _Solution extraction._ Same as the KSG mapping: determine copy-line activity by majority vote, then map back to the original graph. _Overhead._ Both vertex and edge counts grow as $O(n^2)$ where $n = |V|$, matching the KSG mapping. ] diff --git a/docs/paper/references.bib b/docs/paper/references.bib index add1254c6..e8c6ff5da 100644 --- a/docs/paper/references.bib +++ b/docs/paper/references.bib @@ -101,6 +101,13 @@ @article{robson1986 doi = {10.1016/0196-6774(86)90032-5} } +@techreport{robson2001, + author = {J. M. Robson}, + title = {Finding a Maximum Independent Set in Time $O(2^{n/4})$}, + institution = {LaBRI, Universit\'e Bordeaux I}, + number = {1251-01}, + year = {2001} +} + @article{vanrooij2011, author = {Johan M. M. van Rooij and Hans L. Bodlaender}, title = {Exact algorithms for dominating set}, @@ -300,3 +307,14 @@ @article{epping2004 doi = {10.1016/S0166-218X(03)00442-6} } +@article{shannon1956, + author = {Claude E. Shannon}, + title = {The zero error capacity of a noisy channel}, + journal = {IRE Transactions on Information Theory}, + volume = {2}, + number = {3}, + pages = {8--19}, + year = {1956}, + doi = {10.1109/TIT.1956.1056798} +} +