Record: CROWN-Q + Full GPTQ + SWA/EMA Blend — val_bpb 1.1186 (3-seed mean) #693
# CROWN-Q + Full GPTQ + SWA/EMA Blend

## Summary

- **CROWN-Q**: Curvature-weighted quantization variance penalty applied during warmdown. Encourages weights to settle in flat minima where int6 quantization causes less damage. Penalty: `lambda * mean(h_j) * delta_j^2 / 12` per row, where `h_j = w^2` (curvature proxy) and `delta_j = row_max / 15` (CROWN-Q step size). Note: the GPTQ/QAT quantizer uses clip_range=31; CROWN-Q intentionally uses a larger step size (`row_max/15`) to over-penalize and push weights further into flat basins.
- **Full Cholesky GPTQ**: Hessian-aware quantization with act-order column permutation, block_size=128, and 256-sample calibration from training data. GPTQ runs after the 585 s training phase as part of model export.
- **SWA/EMA 50/50 blend**: Stochastic Weight Averaging (every 50 steps during warmdown) blended 50/50 with EMA (decay=0.997).
- **Architecture**: 11L, 512d, GQA 8H/4KV, MLP 3x LeakyReLU(0.5)^2, XSA on last 4 layers (7-10), VRL, BigramHash 3072, partial RoPE 16/64.
- **Eval**: Sliding-window eval with stride=64. No test-time training.
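The SWA/EMA blend above can be sketched as follows. This is a scalar toy, assuming a plain convex combination of parameter dictionaries; the helper names are hypothetical and the real code blends tensors.

```python
def ema_update(ema, w, decay=0.997):
    # One EMA step (decay value taken from the README);
    # hypothetical helper, scalars instead of tensors.
    return {k: decay * ema[k] + (1.0 - decay) * w[k] for k in ema}

def blend_swa_ema(swa, ema, alpha=0.5):
    # 50/50 convex combination of SWA and EMA parameters, per the README.
    return {k: alpha * swa[k] + (1.0 - alpha) * ema[k] for k in swa}

swa = {"w": 1.0, "b": 3.0}
ema = {"w": 3.0, "b": 5.0}
print(blend_swa_ema(swa, ema))  # {'w': 2.0, 'b': 4.0}
```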
## Configuration

```bash
torchrun --standalone --nproc_per_node=8 train_gpt.py

# Key env vars (all defaults in code):
# CROWNQ_LAMBDA=0.01 — CROWN-Q penalty weight
# CROWNQ_WARMDOWN_ONLY=1 — only apply during warmdown
# LATE_QAT_THRESHOLD=0.15 — QAT activation point
# MAX_WALLCLOCK_SECONDS=585 — training budget
# WARMDOWN_ITERS=4000 — warmdown length
# TTT_ENABLED=0 — TTT disabled for this submission
```
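Reading hyperparameters from the environment with the documented defaults might look like this. The `env_float` helper is hypothetical, not taken from `train_gpt.py`.

```python
import os

def env_float(name, default):
    # Hypothetical helper: read a hyperparameter from the environment,
    # falling back to the documented default when the variable is unset.
    return float(os.environ.get(name, default))

CROWNQ_LAMBDA = env_float("CROWNQ_LAMBDA", "0.01")
LATE_QAT_THRESHOLD = env_float("LATE_QAT_THRESHOLD", "0.15")
MAX_WALLCLOCK_SECONDS = env_float("MAX_WALLCLOCK_SECONDS", "585")
TTT_ENABLED = int(os.environ.get("TTT_ENABLED", "0"))  # disabled by default
print(CROWNQ_LAMBDA, MAX_WALLCLOCK_SECONDS, TTT_ENABLED)
```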
## Results

| Seed | Steps | Post-EMA BPB | Sliding BPB | Artifact (bytes) |
|------|-------|--------------|-------------|------------------|
| 1337 | 6613 | 1.1387 | **1.1189** | 15,945,134 |
| 42 | 6612 | 1.1382 | **1.1189** | 15,947,742 |
| 7 | 6613 | 1.1378 | **1.1179** | 15,938,790 |
| **Mean** | | 1.1382 | **1.1186** | |
| **Std** | | | 0.0006 | |

- Step speed: 87 ms/step (FA3 Hopper)
- Quant gap (roundtrip): ~0.004 BPB
- Sliding-window eval time: ~75 s
- Training time: 585 s (under the 600 s budget)
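As a sanity check, the per-seed val_loss (nats/token) and val_bpb pairs are mutually consistent if one assumes the usual conversion `bpb = loss * tokens_per_byte / ln 2`; that relationship and the implied tokens-per-byte ratio are assumptions, not stated in the submission.

```python
import math

# (val_loss, val_bpb) per seed, from the table and submission.json
seeds = {
    1337: (1.8891, 1.1189),
    42:   (1.8891, 1.1189),
    7:    (1.8876, 1.1179),
}

# Under the assumed relationship bpb = loss * tokens_per_byte / ln(2),
# every seed should imply nearly the same tokens-per-byte ratio.
ratios = [bpb * math.log(2) / loss for loss, bpb in seeds.values()]
print([round(r, 4) for r in ratios])  # all close to ~0.41 tokens/byte
```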
## What is CROWN-Q?

CROWN-Q (Curvature-Regularized Optimization for Weight Noise Quantization) adds a training-time penalty that makes weights more robust to quantization noise:

1. For each weight matrix, compute the per-row quantization step size `delta = row_max / 15`.
2. Compute the quantization noise variance `delta^2 / 12` (uniform rounding noise).
3. Weight it by the per-row curvature proxy `h = mean(w^2)` (mean of squared weights).
4. The penalty `lambda * sum(h * quant_var)` encourages the optimizer to shrink weights in directions where quantization noise is most damaging.

The CROWN-Q step size (`row_max/15`) is intentionally larger than the actual quantizer step size (`row_max/31`, clip_range=31). This over-penalization pushes weights further into flat basins, providing an extra robustness margin against quantization damage.

The penalty is applied only during warmdown, when QAT is active, and adds zero eval-time cost.
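The four steps above can be sketched directly. This is a minimal re-implementation from the prose description, operating on plain floats; the submission's actual tensor code may differ.

```python
def crownq_penalty(W, lam=0.01):
    # Minimal sketch of the CROWN-Q penalty for one weight matrix W
    # (a list of rows of floats), following the four steps in the README.
    total = 0.0
    for row in W:
        row_max = max(abs(w) for w in row)      # per-row dynamic range
        delta = row_max / 15.0                  # step 1: CROWN-Q step size
        quant_var = delta * delta / 12.0        # step 2: uniform rounding-noise variance
        h = sum(w * w for w in row) / len(row)  # step 3: curvature proxy, mean(w^2)
        total += h * quant_var                  # step 4: curvature-weighted variance
    return lam * total

W = [[0.5, -1.5, 0.75], [2.0, 0.25, -0.5]]
print(crownq_penalty(W))  # small positive penalty
```

Because both `quant_var` and `h` grow with weight magnitude, the gradient of this penalty shrinks large weights fastest, which is what pushes rows toward flatter, quantization-friendly basins.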
## Included Files

- `train_gpt.py` — self-contained training script
- `submission.json` — submission metadata
- `README.md` — this file
- `train_seed1337.log` — seed 1337 training log
- `train_seed42.log` — seed 42 training log
- `train_seed7.log` — seed 7 training log
`submission.json`:
```json
{
  "author": "Ethan Yang",
  "github_id": "EthanYangTW",
  "name": "CROWN-Q + Full GPTQ + SWA/EMA Blend",
  "blurb": "Curvature-weighted quantization variance penalty (CROWN-Q) during warmdown reduces quantization damage. Full Cholesky GPTQ with act-order, SWA/EMA 50/50 blend, VRL, XSA last 4 layers, LeakyReLU(0.5)^2. Sliding window eval only, no TTT.",
  "date": "2026-03-25T06:30:00Z",
  "val_loss": 1.8886,
  "val_loss_std": 0.0009,
  "val_bpb": 1.1186,
  "val_bpb_std": 0.0006,
  "seeds": [1337, 42, 7],
  "seed_results": {
    "1337": { "val_bpb": 1.1189, "val_loss": 1.8891, "bytes": 15945134 },
    "42":   { "val_bpb": 1.1189, "val_loss": 1.8891, "bytes": 15947742 },
    "7":    { "val_bpb": 1.1179, "val_loss": 1.8876, "bytes": 15938790 }
  },
  "bytes_total": 15947742,
  "bytes_code": 95390
}
```
Review comment: the README states "No test-time training" (eval is pure sliding-window inference), but the included logs show `ttt:start` / `ttt_sliding:start` being run. Please reconcile this, either by regenerating the logs with TTT disabled, or by clarifying in the README that the TTT section in the logs was a separate diagnostic run and not part of the reported score.