diff --git a/CHANGELOG.md b/CHANGELOG.md
index 17ec76c4..7f62f00a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **`BusinessReport` and `DiagnosticReport` (experimental preview)** - practitioner-ready output layer. `BusinessReport(results, ...)` produces plain-English narrative summaries (`.summary()`, `.full_report()`, `.export_markdown()`, `.to_dict()`) from any of the 16 fitted result types. `DiagnosticReport(results, ...)` orchestrates the existing diagnostic battery (parallel trends, pre-trends power, HonestDiD sensitivity, Goodman-Bacon, heterogeneity, design-effect, EPV) plus estimator-native diagnostics for SyntheticDiD (`pre_treatment_fit`, weight concentration, in-time placebo, zeta sensitivity) and TROP (factor-model fit metrics). Both classes expose an AI-legible `to_dict()` schema (single source of truth; prose renders from the dict). BR auto-constructs DR by default so summaries mention pre-trends, robustness, and design-effect findings in one call. See `docs/methodology/REPORTING.md` for methodology deviations including the no-traffic-light-gates decision, pre-trends verdict thresholds (0.05 / 0.30), and power-aware phrasing driven by `compute_pretrends_power`. **Both schemas are marked experimental in this release** - wording, verdict thresholds, and schema shape will change; do not anchor downstream tooling on them yet.
 
+### Performance
+- **`aggregate_survey` stratum-PSU scaffolding precompute** - the per-cell Taylor-series variance inside `aggregate_survey` no longer rebuilds stratum-PSU scaffolding on every cell. A frozen `_PsuScaffolding` (strata codes, global PSU codes unique across strata, per-stratum counts and FPC ratios, singleton mask, static legitimate-zero counts and variance-computable flag) is precomputed once per design at the top of `aggregate_survey` and threaded through `_cell_mean_variance` to a new `_compute_if_variance_fast` path that replaces the per-stratum pandas groupby with two vectorized `np.bincount` passes. Aggregating BRFSS-shaped 50-state × 10-year × 1M-row microdata to a state-year panel drops from ~24s to ~1.2s under both backends (the path is pure Python, so Python and Rust track each other). Numerical output matches the legacy path to within `1e-14` absolute and relative tolerance: seven equivalence tests (`TestAggregateSurveyScaffolding`) assert `assert_allclose(atol=1e-14, rtol=1e-14)` between fast and legacy paths across stratified+PSU+FPC, stratified no FPC, PSU-only, weights-only, and all three `lonely_psu` modes (remove / certainty / adjust). Replicate-weight designs continue to route through `compute_replicate_if_variance` unchanged. `_compute_stratified_psu_meat` is untouched, so all other TSL callers (DiD / TWFE / CS / etc.) are unaffected.
+
 ### Changed
 - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released.
 - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`).
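Reviewer note on the `aggregate_survey` Performance entry in the CHANGELOG hunk above: the sketch below illustrates the general idea of swapping a per-stratum pandas groupby for `np.bincount` passes over precomputed integer codes. It is not the library's `_compute_if_variance_fast` or `_PsuScaffolding`; the function names (`build_scaffolding`, `variance_groupby`, `variance_bincount`), the variance formula (a textbook stratified between-PSU estimator with FPC) and the handling of singleton strata (skipped) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def build_scaffolding(strata, psu, n_strata):
    """Hypothetical one-off precompute: global PSU codes and per-stratum PSU counts."""
    # Code each (stratum, psu) pair so PSU labels reused across strata stay distinct.
    pairs = np.stack([strata, psu], axis=1)
    _, psu_codes = np.unique(pairs, axis=0, return_inverse=True)
    psu_codes = psu_codes.ravel()
    n_psu = int(psu_codes.max()) + 1
    # Stratum that each global PSU belongs to.
    psu_stratum = np.empty(n_psu, dtype=np.intp)
    psu_stratum[psu_codes] = strata
    n_h = np.bincount(psu_stratum, minlength=n_strata)
    return psu_codes, psu_stratum, n_h


def variance_groupby(psi, strata, psu, fpc_ratio):
    """Reference path: nested pandas groupby, one stratum at a time."""
    df = pd.DataFrame({"psi": psi, "strata": strata, "psu": psu})
    total = 0.0
    for h, block in df.groupby("strata"):
        z = block.groupby("psu")["psi"].sum().to_numpy()
        n = z.size
        if n < 2:
            continue  # assumption: singleton strata contribute nothing
        total += (1.0 - fpc_ratio[h]) * n / (n - 1) * np.sum((z - z.mean()) ** 2)
    return float(total)


def variance_bincount(psi, scaffolding, fpc_ratio):
    """Vectorized path: bincount over observations, then over PSUs."""
    psu_codes, psu_stratum, n_h = scaffolding
    # Observation-level pass: PSU totals of the influence values.
    z = np.bincount(psu_codes, weights=psi, minlength=psu_stratum.size)
    # PSU-level pass: stratum means, then within-stratum sums of squared deviations.
    zbar = np.bincount(psu_stratum, weights=z, minlength=n_h.size) / np.maximum(n_h, 1)
    ss = np.bincount(psu_stratum, weights=(z - zbar[psu_stratum]) ** 2, minlength=n_h.size)
    ok = n_h > 1  # same singleton rule as the reference path
    adj = np.zeros(n_h.size)
    adj[ok] = (1.0 - fpc_ratio[ok]) * n_h[ok] / (n_h[ok] - 1)
    return float(np.sum(adj * ss))


# Synthetic design: 5 strata, 8 PSUs per stratum, PSU labels reused across strata.
rng = np.random.default_rng(0)
n_obs, n_strata = 2000, 5
strata = np.arange(n_obs) % n_strata
psu = (np.arange(n_obs) // n_strata) % 8
fpc_ratio = rng.uniform(0.0, 0.2, size=n_strata)
scaffolding = build_scaffolding(strata, psu, n_strata)  # built once per design

# Each "cell" would reuse the same scaffolding with its own influence values.
psi = rng.normal(size=n_obs)
v_ref = variance_groupby(psi, strata, psu, fpc_ratio)
v_fast = variance_bincount(psi, scaffolding, fpc_ratio)
print(v_ref, v_fast)
```

The point of the split is that `build_scaffolding` depends only on the design, so it can run once, while `variance_bincount` is the only part that has to run per cell.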
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json
index c8eb9108..22ed15d1 100644
--- a/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json
+++ b/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_large",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 1.0910496250000001,
+  "total_seconds": 0.8670909579999999,
   "memory": {
     "available": true,
-    "start_mb": 188.45,
-    "peak_mb": 327.44,
-    "growth_mb": 138.98,
+    "start_mb": 200.7,
+    "peak_mb": 340.16,
+    "growth_mb": 139.45,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.009826500000000182,
+      "seconds": 0.01288558399999995,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.030280333999999964,
+      "seconds": 0.03156662499999996,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.6243122919999999,
+      "seconds": 0.39469687499999995,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.24174716599999968,
+      "seconds": 0.22814783400000005,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.025623749999999834,
+      "seconds": 0.04083812500000006,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.01191299999999984,
+      "seconds": 0.014936375000000002,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.147335875,
+      "seconds": 0.14401216700000008,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json
index a3eb721c..ffcc5060 100644
--- a/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json
+++ b/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_large",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 1.0000031249999999,
+  "total_seconds": 0.9299781670000002,
   "memory": {
     "available": true,
-    "start_mb": 194.03,
-    "peak_mb": 336.08,
-    "growth_mb": 142.05,
+    "start_mb": 190.2,
+    "peak_mb": 347.92,
+    "growth_mb": 157.72,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.013511041000000112,
+      "seconds": 0.01335629100000002,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03037650000000003,
+      "seconds": 0.0316900830000002,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.5431151669999998,
+      "seconds": 0.46433058400000005,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.21752962499999962,
+      "seconds": 0.23703795799999994,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.04399687500000038,
+      "seconds": 0.030673249999999985,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.016433082999999904,
+      "seconds": 0.011707583000000188,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.13501837500000002,
+      "seconds": 0.14117254200000007,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json
index 869c5393..a59f68b4 100644
--- a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json
+++ b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_medium",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.563283334,
+  "total_seconds": 0.529578166,
   "memory": {
     "available": true,
-    "start_mb": 133.69,
-    "peak_mb": 187.7,
-    "growth_mb": 54.02,
+    "start_mb": 137.67,
+    "peak_mb": 182.88,
+    "growth_mb": 45.2,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.010921792000000097,
+      "seconds": 0.01053379199999993,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03732066599999995,
+      "seconds": 0.032504792000000005,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.20805304199999997,
+      "seconds": 0.16178545899999996,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.12622899999999992,
+      "seconds": 0.1744099589999999,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.01834783299999998,
+      "seconds": 0.02328412499999999,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.054030583000000076,
+      "seconds": 0.06313762499999998,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.10836029199999997,
+      "seconds": 0.06389345899999999,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json
index 2ceed1ca..42535c3a 100644
--- a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json
+++ b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_medium",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.5500554579999999,
+  "total_seconds": 0.50248775,
   "memory": {
     "available": true,
-    "start_mb": 135.36,
-    "peak_mb": 184.86,
-    "growth_mb": 49.5,
+    "start_mb": 133.94,
+    "peak_mb": 189.34,
+    "growth_mb": 55.41,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.011186999999999947,
+      "seconds": 0.010962209,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03363270800000007,
+      "seconds": 0.03478112499999997,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.18678066699999996,
+      "seconds": 0.13834324999999992,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.16038787500000007,
+      "seconds": 0.1290292500000001,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.022171542000000155,
+      "seconds": 0.02951112499999997,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.0532650830000001,
+      "seconds": 0.06002304200000008,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08262075000000002,
+      "seconds": 0.09981400000000007,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json
index 699da724..51e34058 100644
--- a/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json
+++ b/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_small",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.19338629200000002,
+  "total_seconds": 0.22668149999999998,
   "memory": {
     "available": true,
-    "start_mb": 115.48,
-    "peak_mb": 127.31,
-    "growth_mb": 11.83,
+    "start_mb": 115.44,
+    "peak_mb": 130.16,
+    "growth_mb": 14.72,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.0014470410000000378,
+      "seconds": 0.00165958300000002,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.0072707499999999925,
+      "seconds": 0.006191999999999975,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.023173292000000068,
+      "seconds": 0.02364570900000007,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.03375529200000005,
+      "seconds": 0.07623400000000002,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.01041325000000004,
+      "seconds": 0.009393082999999969,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.027520249999999913,
+      "seconds": 0.02586829199999996,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08979433299999995,
+      "seconds": 0.08367512499999996,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json
index 006bc684..00cd03e8 100644
--- a/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json
+++ b/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_small",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.19669587500000008,
+  "total_seconds": 0.198891041,
   "memory": {
     "available": true,
-    "start_mb": 114.78,
-    "peak_mb": 127.91,
-    "growth_mb": 13.12,
+    "start_mb": 115.05,
+    "peak_mb": 127.78,
+    "growth_mb": 12.73,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.0016678749999999853,
+      "seconds": 0.0019442080000000583,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.005756874999999995,
+      "seconds": 0.006045499999999926,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.012066042000000055,
+      "seconds": 0.02063908400000003,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.05887395800000006,
+      "seconds": 0.05060483399999993,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.008938375000000054,
+      "seconds": 0.009498208000000008,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.0274049999999999,
+      "seconds": 0.025947834000000003,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08197737500000002,
+      "seconds": 0.08419849999999995,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brfss_panel_large_python.json b/benchmarks/speed_review/baselines/brfss_panel_large_python.json
index 1772355b..9437734c 100644
--- a/benchmarks/speed_review/baselines/brfss_panel_large_python.json
+++ b/benchmarks/speed_review/baselines/brfss_panel_large_python.json
@@ -2,42 +2,42 @@
   "scenario": "brfss_panel_large",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 24.406984582999996,
+  "total_seconds": 1.328024584,
   "memory": {
     "available": true,
-    "start_mb": 401.05,
-    "peak_mb": 418.12,
-    "growth_mb": 17.08,
+    "start_mb": 387.59,
+    "peak_mb": 412.75,
+    "growth_mb": 25.16,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_aggregate_survey_microdata_to_panel": {
-      "seconds": 24.295822291,
+      "seconds": 1.2118086249999998,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_stage2_survey_design": {
-      "seconds": 0.012265292000002148,
+      "seconds": 0.012898916999999788,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.2919999977943917e-06,
+      "seconds": 2.5409999997449972e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_grid": {
-      "seconds": 0.0016812089999973523,
+      "seconds": 0.0018360419999998712,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.09669395799999592,
+      "seconds": 0.10123833299999996,
       "ok": true,
       "error": null
     },
     "6_practitioner_next_steps": {
-      "seconds": 0.0005083750000025589,
+      "seconds": 0.00022966599999962867,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brfss_panel_large_rust.json b/benchmarks/speed_review/baselines/brfss_panel_large_rust.json
index 886c63cc..338bfe61 100644
--- a/benchmarks/speed_review/baselines/brfss_panel_large_rust.json
+++ b/benchmarks/speed_review/baselines/brfss_panel_large_rust.json
@@ -2,42 +2,42 @@
   "scenario": "brfss_panel_large",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 24.936181916,
+  "total_seconds": 1.31504775,
   "memory": {
     "available": true,
-    "start_mb": 396.06,
-    "peak_mb": 429.31,
-    "growth_mb": 33.25,
+    "start_mb": 384.2,
+    "peak_mb": 409.28,
+    "growth_mb": 25.08,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_aggregate_survey_microdata_to_panel": {
-      "seconds": 24.820139083,
+      "seconds": 1.2451636250000002,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_stage2_survey_design": {
-      "seconds": 0.012674374999996019,
+      "seconds": 0.013531541999999952,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.500000000793534e-06,
+      "seconds": 2.916000000130481e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_grid": {
-      "seconds": 0.0015977500000019518,
+      "seconds": 0.001939415999999916,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.10144270800000044,
+      "seconds": 0.054231499999999766,
       "ok": true,
       "error": null
     },
     "6_practitioner_next_steps": {
-      "seconds": 0.00030387500000017553,
+      "seconds": 0.0001666249999998648,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brfss_panel_medium_python.json b/benchmarks/speed_review/baselines/brfss_panel_medium_python.json
index 91e5e648..ea65bf9d 100644
--- a/benchmarks/speed_review/baselines/brfss_panel_medium_python.json
+++ b/benchmarks/speed_review/baselines/brfss_panel_medium_python.json
@@ -2,42 +2,42 @@
   "scenario": "brfss_panel_medium",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 6.096216417,
+  "total_seconds": 0.48709708400000007,
   "memory": {
     "available": true,
-    "start_mb": 193.25,
-    "peak_mb": 209.78,
-    "growth_mb": 16.53,
+    "start_mb": 185.42,
+    "peak_mb": 202.75,
+    "growth_mb": 17.33,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_aggregate_survey_microdata_to_panel": {
-      "seconds": 5.9895347910000005,
+      "seconds": 0.372203458,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_stage2_survey_design": {
-      "seconds": 0.012643416999999602,
+      "seconds": 0.01215470800000018,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.166999999886343e-06,
+      "seconds": 2.5000000001274003e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_grid": {
-      "seconds": 0.0015969160000004479,
+      "seconds": 0.0016202499999999898,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.0921533340000007,
+      "seconds": 0.10084249999999995,
       "ok": true,
       "error": null
     },
     "6_practitioner_next_steps": {
-      "seconds": 0.0002710829999994502,
+      "seconds": 0.000269875000000086,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json b/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json
index 670b3135..7876dd32 100644
--- a/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json
+++ b/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json
@@ -2,42 +2,42 @@
   "scenario": "brfss_panel_medium",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 6.228102207999999,
+  "total_seconds": 0.472971041,
   "memory": {
     "available": true,
-    "start_mb": 197.56,
-    "peak_mb": 212.22,
-    "growth_mb": 14.66,
+    "start_mb": 178.69,
+    "peak_mb": 199.55,
+    "growth_mb": 20.86,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_aggregate_survey_microdata_to_panel": {
-      "seconds": 6.142273,
+      "seconds": 0.4003294999999999,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_stage2_survey_design": {
-      "seconds": 0.012037416000000078,
+      "seconds": 0.0133387920000001,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.1249999999639613e-06,
+      "seconds": 2.4999999999053557e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_grid": {
-      "seconds": 0.0016153329999983868,
+      "seconds": 0.0020148749999999715,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.07184195800000026,
+      "seconds": 0.057244916000000146,
       "ok": true,
       "error": null
     },
     "6_practitioner_next_steps": {
-      "seconds": 0.0003229160000000064,
+      "seconds": 3.6416000000150106e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brfss_panel_small_python.json b/benchmarks/speed_review/baselines/brfss_panel_small_python.json
index 093a7daf..127748c2 100644
--- a/benchmarks/speed_review/baselines/brfss_panel_small_python.json
+++ b/benchmarks/speed_review/baselines/brfss_panel_small_python.json
@@ -2,42 +2,42 @@
   "scenario": "brfss_panel_small",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 1.608562042,
+  "total_seconds": 0.21261929199999996,
   "memory": {
     "available": true,
-    "start_mb": 121.97,
-    "peak_mb": 133.39,
-    "growth_mb": 11.42,
+    "start_mb": 121.34,
+    "peak_mb": 132.62,
+    "growth_mb": 11.28,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_aggregate_survey_microdata_to_panel": {
-      "seconds": 1.523675458,
+      "seconds": 0.08785816700000004,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_stage2_survey_design": {
-      "seconds": 0.015124000000000137,
+      "seconds": 0.016040416999999918,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.165999999803603e-06,
+      "seconds": 2.583000000000446e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_grid": {
-      "seconds": 0.004194041999999953,
+      "seconds": 0.004216333999999988,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.0653021250000001,
+      "seconds": 0.10422679200000007,
       "ok": true,
       "error": null
     },
     "6_practitioner_next_steps": {
-      "seconds": 0.00026012500000005545,
+      "seconds": 0.00026649999999994733,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/brfss_panel_small_rust.json b/benchmarks/speed_review/baselines/brfss_panel_small_rust.json
index a1f19a21..a22692ca 100644
--- a/benchmarks/speed_review/baselines/brfss_panel_small_rust.json
+++ b/benchmarks/speed_review/baselines/brfss_panel_small_rust.json
@@ -2,42 +2,42 @@
   "scenario": "brfss_panel_small",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 1.6610665,
+  "total_seconds": 0.16585016600000002,
   "memory": {
     "available": true,
-    "start_mb": 121.16,
-    "peak_mb": 136.44,
-    "growth_mb": 15.28,
+    "start_mb": 121.91,
+    "peak_mb": 130.25,
+    "growth_mb": 8.34,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_aggregate_survey_microdata_to_panel": {
-      "seconds": 1.5438897920000003,
+      "seconds": 0.084868791,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_stage2_survey_design": {
-      "seconds": 0.01586162499999988,
+      "seconds": 0.016418874999999944,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.4999999999053557e-06,
+      "seconds": 3.124999999992717e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_grid": {
-      "seconds": 0.003953542000000088,
+      "seconds": 0.004238000000000075,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.09701791599999998,
+      "seconds": 0.060278041000000004,
       "ok": true,
       "error": null
     },
     "6_practitioner_next_steps": {
-      "seconds": 0.00032904199999972406,
+      "seconds": 3.820799999998403e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/campaign_staggered_large_python.json b/benchmarks/speed_review/baselines/campaign_staggered_large_python.json
index 0c2dc359..19bf1a59 100644
--- a/benchmarks/speed_review/baselines/campaign_staggered_large_python.json
+++ b/benchmarks/speed_review/baselines/campaign_staggered_large_python.json
@@ -2,52 +2,52 @@
   "scenario": "campaign_staggered_large",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 1.3326843750000001,
+  "total_seconds": 1.321951625,
   "memory": {
     "available": true,
-    "start_mb": 227.28,
-    "peak_mb": 472.22,
-    "growth_mb": 244.94,
+    "start_mb": 235.58,
+    "peak_mb": 486.17,
+    "growth_mb": 250.59,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_bacon_decomposition": {
-      "seconds": 0.019139459000000025,
+      "seconds": 0.019820957999999944,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_covariates_bootstrap999": {
-      "seconds": 0.16680450000000002,
+      "seconds": 0.17604354199999994,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 3.042000000341716e-06,
+      "seconds": 3.4580000001227518e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_M_grid": {
-      "seconds": 0.002607332999999823,
+      "seconds": 0.002394666999999906,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.3669262500000001,
+      "seconds": 0.279372666,
       "ok": true,
       "error": null
     },
     "6_imputation_did_robustness": {
-      "seconds": 0.649511,
+      "seconds": 0.716293292,
       "ok": true,
       "error": null
     },
     "7_cs_without_covariates": {
-      "seconds": 0.12763954200000027,
+      "seconds": 0.12797208299999996,
       "ok": true,
       "error": null
     },
     "8_practitioner_next_steps": {
-      "seconds": 4.033299999983697e-05,
+      "seconds": 3.8041999999904874e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json b/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json
index 6766f7ac..87200f59 100644
--- a/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json
+++ b/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json
@@ -2,52 +2,52 @@
   "scenario": "campaign_staggered_large",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 1.3826507919999997,
+  "total_seconds": 1.310933833,
   "memory": {
     "available": true,
-    "start_mb": 265.8,
-    "peak_mb": 587.92,
-    "growth_mb": 322.12,
+    "start_mb": 254.7,
+    "peak_mb": 581.67,
+    "growth_mb": 326.97,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_bacon_decomposition": {
-      "seconds": 0.019430332999999855,
+      "seconds": 0.01872620799999991,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_covariates_bootstrap999": {
-      "seconds": 0.17791104199999985,
+      "seconds": 0.1628326659999999,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 3.5419999999675156e-06,
+      "seconds": 3.459000000205492e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_M_grid": {
-      "seconds": 0.0025778330000001404,
+      "seconds": 0.00247950000000019,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.5076542499999999,
+      "seconds": 0.4679546669999999,
       "ok": true,
       "error": null
     },
     "6_imputation_did_robustness": {
-      "seconds": 0.5523530000000001,
+      "seconds": 0.539718041,
       "ok": true,
       "error": null
     },
     "7_cs_without_covariates": {
-      "seconds": 0.12266958400000005,
+      "seconds": 0.1191795830000002,
       "ok": true,
       "error": null
     },
     "8_practitioner_next_steps": {
-      "seconds": 4.233299999967244e-05,
+      "seconds": 3.449999999993736e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json b/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json
index 914a09aa..234f2918 100644
--- a/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json
+++ b/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json
@@ -2,52 +2,52 @@
   "scenario": "campaign_staggered_medium",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.7537883749999998,
+  "total_seconds": 0.81063825,
   "memory": {
     "available": true,
-    "start_mb": 147.67,
-    "peak_mb": 226.62,
-    "growth_mb": 78.95,
+    "start_mb": 150.39,
+    "peak_mb": 235.06,
+    "growth_mb": 84.67,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_bacon_decomposition": {
-      "seconds": 0.012091666999999973,
+      "seconds": 0.013887540999999892,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_covariates_bootstrap999": {
-      "seconds": 0.09575774999999997,
+      "seconds": 0.10513504099999982,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.9589999999135586e-06,
+      "seconds": 3.750000000080078e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_M_grid": {
-      "seconds": 0.002356958999999881,
+      "seconds": 0.0026329160000000407,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.276134208,
+      "seconds": 0.2873527090000001,
       "ok": true,
       "error": null
     },
     "6_imputation_did_robustness": {
-      "seconds": 0.2946765,
+      "seconds": 0.3267266660000001,
       "ok": true,
       "error": null
     },
     "7_cs_without_covariates": {
-      "seconds": 0.07270195899999998,
+      "seconds": 0.07484287499999986,
       "ok": true,
       "error": null
     },
     "8_practitioner_next_steps": {
-      "seconds": 5.983399999998085e-05,
+      "seconds": 5.050000000039745e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json b/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json
index 81c02255..55107bbb 100644
--- a/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json
+++ b/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json
@@ -2,52 +2,52 @@
   "scenario": "campaign_staggered_medium",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.756008333,
+  "total_seconds": 0.814152875,
   "memory": {
     "available": true,
-    "start_mb": 154.94,
-    "peak_mb": 254.11,
-    "growth_mb": 99.17,
+    "start_mb": 152.19,
+    "peak_mb": 252.59,
+    "growth_mb": 100.41,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_bacon_decomposition": {
-      "seconds": 0.012925999999999993,
+      "seconds": 0.012288542000000069,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_covariates_bootstrap999": {
-      "seconds": 0.09863954099999983,
+      "seconds": 0.09617150000000008,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 3.1659999999433808e-06,
+      "seconds": 3.084000000042053e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_M_grid": {
-      "seconds": 0.0024457499999999133,
+      "seconds": 0.002409292000000063,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.281516125,
+      "seconds": 0.4186234579999999,
       "ok": true,
       "error": null
     },
     "6_imputation_did_robustness": {
-      "seconds": 0.29128733399999995,
+      "seconds": 0.217003375,
       "ok": true,
       "error": null
     },
     "7_cs_without_covariates": {
-      "seconds": 0.06915141700000005,
+      "seconds": 0.06760054199999987,
       "ok": true,
       "error": null
     },
     "8_practitioner_next_steps": {
-      "seconds": 3.383300000003864e-05,
+      "seconds": 4.71669999999591e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/campaign_staggered_small_python.json b/benchmarks/speed_review/baselines/campaign_staggered_small_python.json
index 44e82483..7fe1a2ac 100644
--- a/benchmarks/speed_review/baselines/campaign_staggered_small_python.json
+++ b/benchmarks/speed_review/baselines/campaign_staggered_small_python.json
@@ -2,52 +2,52 @@
   "scenario": "campaign_staggered_small",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.509287875,
+  "total_seconds": 0.5199064999999999,
   "memory": {
     "available": true,
-    "start_mb": 114.72,
-    "peak_mb": 143.08,
-    "growth_mb": 28.36,
+    "start_mb": 114.66,
+    "peak_mb": 145.62,
+    "growth_mb": 30.97,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_bacon_decomposition": {
-      "seconds": 0.008488708000000011,
+      "seconds": 0.006750833000000012,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_covariates_bootstrap999": {
-      "seconds": 0.06242541699999993,
+      "seconds": 0.06804841700000008,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 3.3329999999942572e-06,
+      "seconds": 4.1669999999438545e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_M_grid": {
-      "seconds": 0.00873587500000006,
+      "seconds": 0.005387375000000083,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.18465104099999996,
+      "seconds": 0.17906933400000002,
       "ok": true,
       "error": null
     },
     "6_imputation_did_robustness": {
-      "seconds": 0.20897954100000016,
+      "seconds": 0.22210808299999996,
       "ok": true,
       "error": null
     },
     "7_cs_without_covariates": {
-      "seconds": 0.03596216600000002,
+      "seconds": 0.038495792000000195,
       "ok": true,
       "error": null
     },
     "8_practitioner_next_steps": {
-      "seconds": 3.28339999999816e-05,
+      "seconds": 3.6332999999943993e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json b/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json
index bfe53aed..edeb195e 100644
--- a/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json
+++ b/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json
@@ -2,52 +2,52 @@
   "scenario": "campaign_staggered_small",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.501876834,
+  "total_seconds": 0.5057707079999999,
   "memory": {
     "available": true,
-    "start_mb": 114.78,
-    "peak_mb": 150.67,
-    "growth_mb": 35.89,
+    "start_mb": 114.27,
+    "peak_mb": 148.09,
+    "growth_mb": 33.83,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_bacon_decomposition": {
-      "seconds": 0.0068224170000000806,
+      "seconds": 0.007045167000000019,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_covariates_bootstrap999": {
-      "seconds": 0.06276566699999997,
+      "seconds": 0.06206424999999993,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.9160000000194586e-06,
+      "seconds": 2.6250000000338503e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_M_grid": {
-      "seconds": 0.004543957999999959,
+      "seconds": 0.004464875000000035,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.14964783299999995,
+      "seconds": 0.19407279099999997,
       "ok": true,
       "error": null
     },
     "6_imputation_did_robustness": {
-      "seconds": 0.241357292,
+      "seconds": 0.2018087919999999,
       "ok": true,
       "error": null
     },
     "7_cs_without_covariates": {
-      "seconds": 0.03669304200000001,
+      "seconds": 0.03626620899999988,
       "ok": true,
       "error": null
     },
     "8_practitioner_next_steps": {
-      "seconds": 3.850000000005238e-05,
+      "seconds": 4.0457999999965466e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/dose_response_python.json b/benchmarks/speed_review/baselines/dose_response_python.json
index 0e576e88..40399067 100644
--- a/benchmarks/speed_review/baselines/dose_response_python.json
+++ b/benchmarks/speed_review/baselines/dose_response_python.json
@@ -2,42 +2,42 @@
   "scenario": "dose_response",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.5912168340000001,
+  "total_seconds": 0.5858542499999999,
   "memory": {
     "available": true,
-    "start_mb": 114.11,
-    "peak_mb": 123.11,
-    "growth_mb": 9.0,
+    "start_mb": 114.7,
+    "peak_mb": 122.31,
+    "growth_mb": 7.61,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_cdid_cubic_spline_bootstrap199": {
-      "seconds": 0.15039274999999996,
+      "seconds": 0.15196441700000007,
       "ok": true,
       "error": null
     },
     "2_extract_dose_response_dataframes": {
-      "seconds": 0.0007435829999999921,
+      "seconds": 0.0008212909999999463,
       "ok": true,
       "error": null
     },
     "3_cdid_event_study_pretrend": {
-      "seconds": 0.14597749999999998,
+      "seconds": 0.14416820900000005,
       "ok": true,
       "error": null
     },
     "4_binarized_did_comparison": {
-      "seconds": 0.0017279590000000011,
+      "seconds": 0.0015125420000000611,
       "ok": true,
       "error": null
     },
     "5_spline_sensitivity_degree1": {
-      "seconds": 0.14600595799999994,
+      "seconds": 0.1431360410000001,
       "ok": true,
       "error": null
     },
     "6_spline_sensitivity_num_knots2": {
-      "seconds": 0.14636520799999997,
+      "seconds": 0.14424499999999996,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/dose_response_rust.json b/benchmarks/speed_review/baselines/dose_response_rust.json
index 51039f15..2c26010b 100644
--- a/benchmarks/speed_review/baselines/dose_response_rust.json
+++ b/benchmarks/speed_review/baselines/dose_response_rust.json
@@ -2,42 +2,42 @@
   "scenario": "dose_response",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.5952834579999999,
+  "total_seconds": 0.6261942910000001,
   "memory": {
     "available": true,
-    "start_mb": 113.73,
-    "peak_mb": 121.34,
-    "growth_mb": 7.61,
+    "start_mb": 113.95,
+    "peak_mb": 123.27,
+    "growth_mb": 9.31,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_cdid_cubic_spline_bootstrap199": {
-      "seconds": 0.15132816700000007,
+      "seconds": 0.1623119999999999,
       "ok": true,
       "error": null
     },
     "2_extract_dose_response_dataframes": {
-      "seconds": 0.0007386659999999434,
+      "seconds": 0.0007812500000000666,
       "ok": true,
       "error": null
     },
     "3_cdid_event_study_pretrend": {
-      "seconds": 0.147476167,
+      "seconds": 0.15469937500000008,
       "ok": true,
       "error": null
     },
     "4_binarized_did_comparison": {
-      "seconds": 0.001677958000000035,
+      "seconds": 0.001991167000000016,
       "ok": true,
       "error": null
     },
     "5_spline_sensitivity_degree1": {
-      "seconds": 0.145152917,
+      "seconds": 0.15138845899999998,
       "ok": true,
       "error": null
     },
     "6_spline_sensitivity_num_knots2": {
-      "seconds": 0.14890500000000007,
+      "seconds": 0.15501741599999996,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json b/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json
index dce42749..637c260d 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_large",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.26079429200000015,
+  "total_seconds": 0.23366233300000006,
   "memory": {
     "available": true,
-    "start_mb": 117.8,
-    "peak_mb": 118.22,
-    "growth_mb": 0.42,
+    "start_mb": 117.77,
+    "peak_mb": 118.11,
+    "growth_mb": 0.34,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.04102845799999999,
+      "seconds": 0.03807345899999992,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.03718729200000004,
+      "seconds": 0.03627791699999994,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 0.07744412499999997,
+      "seconds": 0.06991887500000005,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0008073330000000212,
+      "seconds": 0.0007567080000000503,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 0.10429091600000007,
+      "seconds": 0.08854208299999988,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 3.220799999992252e-05,
+      "seconds": 8.5874999999902e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json b/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json
index 868c0578..283552a2 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_medium",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 3.9883142080000002,
+  "total_seconds": 3.998488124999999,
   "memory": {
     "available": true,
-    "start_mb": 143.86,
-    "peak_mb": 151.53,
-    "growth_mb": 7.67,
+    "start_mb": 140.11,
+    "peak_mb": 148.12,
+    "growth_mb": 8.02,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.35804470799999955,
+      "seconds": 0.35502641700000037,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.36447529099999976,
+      "seconds": 0.36030566600000036,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 1.5563965419999999,
+      "seconds": 1.5716015000000008,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0007229159999999624,
+      "seconds": 0.0007380409999999671,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 1.7086395420000002,
+      "seconds": 1.7107877500000006,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 2.9666999999733434e-05,
+      "seconds": 2.462500000000034e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json b/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json
index bd4471a6..debdccf6 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_medium",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.118741875,
+  "total_seconds": 0.10621941700000004,
   "memory": {
     "available": true,
-    "start_mb": 117.23,
-    "peak_mb": 117.64,
-    "growth_mb": 0.41,
+    "start_mb": 117.05,
+    "peak_mb": 117.36,
+    "growth_mb": 0.31,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.020535375000000022,
+      "seconds": 0.018085625000000105,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.023519291000000053,
+      "seconds": 0.020790666999999985,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 0.02495891699999997,
+      "seconds": 0.025967375000000015,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0006400839999999297,
+      "seconds": 0.0006781249999999739,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 0.049061250000000056,
+      "seconds": 0.04067133299999992,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 2.31669999999351e-05,
+      "seconds": 2.2332999999985503e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_small_python.json b/benchmarks/speed_review/baselines/geo_few_markets_small_python.json
index e0bec083..ed7af335 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_small_python.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_small_python.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_small",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 3.697791375,
+  "total_seconds": 3.7007011660000004,
   "memory": {
     "available": true,
-    "start_mb": 114.09,
-    "peak_mb": 124.02,
-    "growth_mb": 9.92,
+    "start_mb": 114.14,
+    "peak_mb": 124.05,
+    "growth_mb": 9.91,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.593809709,
+      "seconds": 0.5908792500000001,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.584832209,
+      "seconds": 0.593548083,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 1.194314458,
+      "seconds": 1.1894560410000001,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0009036250000002966,
+      "seconds": 0.001243833000000194,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 1.3238487909999996,
+      "seconds": 1.3254739579999995,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 7.791699999959434e-05,
+      "seconds": 9.341699999954045e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json b/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json
index 855eac85..91f9888d 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_small",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.04129770799999999,
+  "total_seconds": 0.04177825000000002,
   "memory": {
     "available": true,
-    "start_mb": 114.56,
-    "peak_mb": 116.05,
-    "growth_mb": 1.48,
+    "start_mb": 114.55,
+    "peak_mb": 115.84,
+    "growth_mb": 1.3,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.008074541000000046,
+      "seconds": 0.008172167000000008,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.012903124999999904,
+      "seconds": 0.013141583000000012,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 0.008189833999999951,
+      "seconds": 0.00833604099999996,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0009220420000000118,
+      "seconds": 0.0008852080000000262,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 0.01117779200000002,
+      "seconds": 0.011213916999999962,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 2.6250000000005436e-05,
+      "seconds": 2.599999999997049e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/reversible_dcdh_python.json b/benchmarks/speed_review/baselines/reversible_dcdh_python.json
index 1cbed394..fff45fd6 100644
--- a/benchmarks/speed_review/baselines/reversible_dcdh_python.json
+++ b/benchmarks/speed_review/baselines/reversible_dcdh_python.json
@@ -2,32 +2,32 @@
   "scenario": "reversible_dcdh",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.718732833,
+  "total_seconds": 0.788816875,
   "memory": {
     "available": true,
-    "start_mb": 113.5,
-    "peak_mb": 135.02,
-    "growth_mb": 21.52,
+    "start_mb": 113.75,
+    "peak_mb": 133.66,
+    "growth_mb": 19.91,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_dcdh_fit_Lmax3_survey_TSL": {
-      "seconds": 0.3450735829999999,
+      "seconds": 0.384559958,
       "ok": true,
       "error": null
     },
     "2_inspect_placebo_and_summary": {
-      "seconds": 1.4160000000318362e-06,
+      "seconds": 1.3329999999367459e-06,
       "ok": true,
       "error": null
     },
     "3_honest_did_on_placebo": {
-      "seconds": 0.004985583999999932,
+      "seconds": 0.003932208000000048,
       "ok": true,
       "error": null
     },
     "4_heterogeneity_refit": {
-      "seconds": 0.36866958299999986,
+      "seconds": 0.400320667,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/reversible_dcdh_rust.json b/benchmarks/speed_review/baselines/reversible_dcdh_rust.json
index 2af530f5..0c073cd6 100644
--- a/benchmarks/speed_review/baselines/reversible_dcdh_rust.json
+++ b/benchmarks/speed_review/baselines/reversible_dcdh_rust.json
@@ -2,32 +2,32 @@
   "scenario": "reversible_dcdh",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.751090292,
+  "total_seconds": 0.7799259999999999,
   "memory": {
     "available": true,
-    "start_mb": 113.7,
-    "peak_mb": 134.89,
-    "growth_mb": 21.19,
+    "start_mb": 113.81,
+    "peak_mb": 134.28,
+    "growth_mb": 20.47,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_dcdh_fit_Lmax3_survey_TSL": {
-      "seconds": 0.36838229199999994,
+      "seconds": 0.38806558299999994,
       "ok": true,
       "error": null
     },
     "2_inspect_placebo_and_summary": {
-      "seconds": 1.3340000000194863e-06,
+      "seconds": 1.4580000000652404e-06,
       "ok": true,
       "error": null
     },
     "3_honest_did_on_placebo": {
-      "seconds": 0.005142916999999914,
+      "seconds": 0.003724375000000002,
       "ok": true,
       "error": null
     },
     "4_heterogeneity_refit": {
-      "seconds": 0.3775615830000001,
+      "seconds": 0.38813170900000005,
       "ok": true,
       "error": null
     }
diff --git a/diff_diff/prep.py b/diff_diff/prep.py
index 01d50653..70201144 100644
--- a/diff_diff/prep.py
+++ b/diff_diff/prep.py
@@ -30,6 +30,9 @@
 from diff_diff.survey import (
     ResolvedSurveyDesign,
     SurveyDesign,
+    _compute_if_variance_fast,
+    _precompute_psu_scaffolding,
+    _PsuScaffolding,
     compute_replicate_if_variance,
     compute_survey_if_variance,
 )
@@ -1318,6 +1321,7 @@ def _cell_mean_variance(
     full_resolved: ResolvedSurveyDesign,
     cell_mask: np.ndarray,
     min_n: int,
+    scaffolding: Optional[_PsuScaffolding] = None,
 ) -> Tuple[float, float, int, bool]:
     """Compute design-based mean and variance of the weighted mean for one cell.
 
@@ -1396,9 +1400,14 @@ def _cell_mean_variance(
     valid_positions = cell_indices[valid]
     psi[valid_positions] = w_valid[valid] * (y_clean[valid] - y_bar) / sum_w
 
-    # Route to TSL or replicate variance using the full design
+    # Route to TSL or replicate variance using the full design.  When a
+    # design-level scaffolding is provided (aggregate_survey's fast path),
+    # use it to skip the per-call pandas groupby / np.unique setup that
+    # otherwise dominates runtime at BRFSS scale.
     if full_resolved.uses_replicate_variance:
         variance, _ = compute_replicate_if_variance(psi, full_resolved)
+    elif scaffolding is not None:
+        variance = _compute_if_variance_fast(psi, scaffolding)
     else:
         variance = compute_survey_if_variance(psi, full_resolved)
 
@@ -1580,6 +1589,17 @@ def aggregate_survey(
     )
     full_resolved = effective_design.resolve(data)
 
+    # Precompute stratum/PSU scaffolding once per design.  Amortizes
+    # per-cell pandas groupby + np.unique + stratum FPC lookup that
+    # otherwise dominate runtime at scale (see _compute_if_variance_fast).
+    # Replicate-weight designs use a different variance surface and stay
+    # on the legacy path.
+    _tsl_scaffolding: Optional[_PsuScaffolding] = (
+        _precompute_psu_scaffolding(full_resolved)
+        if not full_resolved.uses_replicate_variance
+        else None
+    )
+
     # --- Precompute full-length outcome/covariate arrays ---
     n_total = len(data)
     all_vars = outcome_cols + cov_cols
@@ -1635,6 +1655,7 @@ def aggregate_survey(
                 full_resolved,
                 cell_mask,
                 min_n,
+                scaffolding=_tsl_scaffolding,
             )
             se = float(np.sqrt(variance)) if not np.isnan(variance) else np.nan
 
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index 2ee8334e..3d951fb6 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -1304,6 +1304,326 @@ def _compute_stratified_psu_meat(
     return meat, _variance_computed, legitimate_zero_count
 
 
+@dataclass(frozen=True)
+class _PsuScaffolding:
+    """Precomputed stratum/PSU layout for amortized TSL variance.
+
+    Internal helper used by :func:`diff_diff.prep.aggregate_survey` to reuse
+    design-dependent scaffolding across hundreds of per-cell variance calls.
+    Holds integer codes, per-stratum counts, FPC ratios, and static
+    variance-computability flags that depend only on the
+    :class:`ResolvedSurveyDesign` (not on the psi / outcome being collapsed).
+
+    See :func:`_compute_if_variance_fast` for the fast variance path that
+    consumes this scaffolding.  Numerically equivalent to
+    :func:`compute_survey_if_variance` up to sub-ULP reduction-order drift.
+    """
+
+    mode: str  # "no_strata_no_psu" | "psu_only" | "stratified"
+    n: int
+    lonely_psu: str
+    variance_computable: bool
+    legitimate_zero_count: int
+    # stratified-mode fields (None in other modes):
+    psu_codes: Optional[np.ndarray] = None          # (n,) int, global PSU id 0..P-1
+    psu_stratum: Optional[np.ndarray] = None        # (P,) int, stratum of each PSU
+    n_psu_per_stratum: Optional[np.ndarray] = None  # (S,) int
+    singleton_strata: Optional[np.ndarray] = None   # (S,) bool
+    adjustment_h: Optional[np.ndarray] = None       # (S,) float, (1-f_h)*n_h/(n_h-1); 0 for singletons
+    # psu_only-mode fields (None in other modes):
+    psu_codes_only: Optional[np.ndarray] = None     # (n,) int, PSU id 0..P-1
+    n_psu_only: Optional[int] = None
+    adjustment_only: Optional[float] = None         # (1-f)*n_psu/(n_psu-1) or 0
+    # no_strata_no_psu-mode fields (None in other modes):
+    adjustment_direct: Optional[float] = None       # (1-f)*n/(n-1) or 0
+
+
+def _precompute_psu_scaffolding(resolved: "ResolvedSurveyDesign") -> _PsuScaffolding:
+    """Precompute per-design PSU/stratum scaffolding for fast per-cell variance.
+
+    Equivalent in effect to the per-call scaffolding work inside
+    :func:`_compute_stratified_psu_meat`, but done once per design instead of
+    once per output cell.  For the typical BRFSS-scale
+    :func:`~diff_diff.prep.aggregate_survey` workload (~500 cells, ~20 strata),
+    this amortizes the pandas-groupby + ``np.unique`` setup that otherwise
+    dominates the chain runtime.
+
+    Parameters
+    ----------
+    resolved : ResolvedSurveyDesign
+        Resolved survey design.  Must NOT use replicate variance
+        (``resolved.uses_replicate_variance`` False).
+
+    Returns
+    -------
+    _PsuScaffolding
+        Frozen dataclass with mode-appropriate precomputed fields.
+
+    Raises
+    ------
+    ValueError
+        Same FPC-vs-n guards as :func:`_compute_stratified_psu_meat`
+        (FPC must be >= effective PSU count in each stratum).
+    """
+    weights = resolved.weights
+    n = int(len(weights))
+    strata = resolved.strata
+    psu = resolved.psu
+    fpc = resolved.fpc
+    lonely_psu = resolved.lonely_psu
+
+    if strata is None and psu is None:
+        # Implicit per-observation PSUs
+        f = 0.0
+        lz_count = 0
+        if fpc is not None:
+            N = fpc[0]
+            if N < n:
+                raise ValueError(
+                    f"FPC ({N}) is less than the number of observations "
+                    f"({n}). FPC must be >= n_obs for implicit per-observation PSUs."
+                )
+            f = n / N
+            if f >= 1.0:
+                lz_count = 1
+        var_computable = n >= 2
+        adjustment = (1.0 - f) * (n / (n - 1)) if n >= 2 else 0.0
+        return _PsuScaffolding(
+            mode="no_strata_no_psu",
+            n=n,
+            lonely_psu=lonely_psu,
+            variance_computable=var_computable,
+            legitimate_zero_count=lz_count,
+            adjustment_direct=float(adjustment),
+        )
+
+    if strata is None and psu is not None:
+        # Single-stratum cluster-robust
+        psu_arr = np.asarray(psu)
+        codes, uniques = pd.factorize(psu_arr)
+        n_psu = int(len(uniques))
+        f = 0.0
+        lz_count = 0
+        if n_psu >= 2:
+            if fpc is not None:
+                N = fpc[0]
+                if N < n_psu:
+                    raise ValueError(
+                        f"FPC ({N}) is less than the number of effective PSUs "
+                        f"({n_psu}). FPC must be >= n_PSU."
+                    )
+                f = n_psu / N
+                if f >= 1.0:
+                    lz_count = 1
+            adjustment = (1.0 - f) * (n_psu / (n_psu - 1))
+            var_computable = True
+        else:
+            adjustment = 0.0
+            var_computable = False
+        return _PsuScaffolding(
+            mode="psu_only",
+            n=n,
+            lonely_psu=lonely_psu,
+            variance_computable=var_computable,
+            legitimate_zero_count=lz_count,
+            psu_codes_only=codes.astype(np.int64),
+            n_psu_only=n_psu,
+            adjustment_only=float(adjustment),
+        )
+
+    # Stratified branch (with or without PSU)
+    strata_arr = np.asarray(strata)
+    strata_codes, strata_uniques = pd.factorize(strata_arr, sort=True)
+    strata_codes = strata_codes.astype(np.int64)
+    S = int(len(strata_uniques))
+
+    if psu is not None:
+        # Global PSU codes unique across (stratum, psu) pairs — matches the
+        # legacy per-stratum pandas groupby which never aggregated PSU labels
+        # across strata.
+        psu_arr = np.asarray(psu)
+        psu_local_codes, _ = pd.factorize(psu_arr)
+        psu_local_codes = psu_local_codes.astype(np.int64)
+        psu_local_max = int(psu_local_codes.max()) if len(psu_local_codes) > 0 else 0
+        compound = strata_codes * (psu_local_max + 1) + psu_local_codes
+        psu_codes, _ = pd.factorize(compound)
+        psu_codes = psu_codes.astype(np.int64)
+        P = int(psu_codes.max() + 1) if len(psu_codes) > 0 else 0
+        psu_stratum = np.zeros(P, dtype=np.int64)
+        # Safe scatter: by construction, all observations sharing a global
+        # PSU code share a stratum, so repeated writes to the same position
+        # store the same value.
+        if P > 0:
+            psu_stratum[psu_codes] = strata_codes
+    else:
+        # Each observation is its own PSU within its stratum (legacy
+        # behavior when strata is not None and psu is None).
+        psu_codes = np.arange(n, dtype=np.int64)
+        P = n
+        psu_stratum = strata_codes.copy()
+
+    n_psu_per_stratum = np.bincount(psu_stratum, minlength=S).astype(np.int64)
+    singleton_strata = n_psu_per_stratum == 1
+
+    # Per-stratum FPC ratio (stratum-level attribute; read from the first
+    # observation of each stratum, matching legacy ``resolved.fpc[mask_h][0]``).
+    f_h = np.zeros(S, dtype=np.float64)
+    if fpc is not None:
+        fpc_arr = np.asarray(fpc)
+        # "First-in-stratum" FPC lookup: scan observations in input order
+        # and record the first row index seen for each stratum code.  This
+        # is a plain Python loop (not vectorized), but it runs once per
+        # design and exits early once every stratum has been seen.
+        first_idx = np.full(S, -1, dtype=np.int64)
+        seen = np.zeros(S, dtype=bool)
+        for i in range(n):
+            h = strata_codes[i]
+            if not seen[h]:
+                seen[h] = True
+                first_idx[h] = i
+                if seen.all():
+                    break
+        for h in range(S):
+            if first_idx[h] < 0:
+                continue
+            N_h = fpc_arr[first_idx[h]]
+            n_h = n_psu_per_stratum[h]
+            if n_h > 0 and N_h < n_h:
+                raise ValueError(
+                    f"FPC ({N_h}) is less than the number of effective PSUs "
+                    f"({n_h}) in stratum. FPC must be >= n_PSU."
+                )
+            if n_h > 0:
+                f_h[h] = n_h / N_h
+
+    with np.errstate(divide="ignore", invalid="ignore"):
+        adjustment_h = np.where(
+            n_psu_per_stratum >= 2,
+            (1.0 - f_h) * n_psu_per_stratum / np.maximum(n_psu_per_stratum - 1, 1),
+            0.0,
+        )
+
+    # Static legitimate_zero_count (design-dependent only):
+    #   - Non-singleton strata with f_h >= 1.0 contribute (legacy counter).
+    #   - Singleton strata under lonely_psu == "certainty" contribute.
+    fpc_saturated = (n_psu_per_stratum >= 2) & (f_h >= 1.0)
+    legitimate_zero_count = int(fpc_saturated.sum())
+    if lonely_psu == "certainty":
+        legitimate_zero_count += int(singleton_strata.sum())
+
+    # Static variance_computable flag:
+    #   - Any non-singleton stratum (regardless of FPC) → variance_computed=True
+    #     path is exercised.
+    #   - Under "adjust", any singleton stratum also counts (adds V_h even if 0).
+    has_non_singleton = bool(np.any(~singleton_strata))
+    has_singleton = bool(np.any(singleton_strata))
+    variance_computable = has_non_singleton or (
+        lonely_psu == "adjust" and has_singleton
+    )
+
+    return _PsuScaffolding(
+        mode="stratified",
+        n=n,
+        lonely_psu=lonely_psu,
+        variance_computable=variance_computable,
+        legitimate_zero_count=legitimate_zero_count,
+        psu_codes=psu_codes,
+        psu_stratum=psu_stratum,
+        n_psu_per_stratum=n_psu_per_stratum,
+        singleton_strata=singleton_strata,
+        adjustment_h=adjustment_h,
+    )
+
+
+def _compute_if_variance_fast(
+    psi: np.ndarray,
+    scaffolding: _PsuScaffolding,
+) -> float:
+    """Fast TSL variance for aggregate_survey using precomputed scaffolding.
+
+    Numerically equivalent to :func:`compute_survey_if_variance` for any
+    TSL (non-replicate) design, up to sub-ULP reduction-order drift.  The
+    speedup comes from replacing per-cell pandas groupbys and per-stratum
+    Python loops with two ``np.bincount`` passes plus a fully vectorized
+    per-stratum reduction.
+
+    Parameters
+    ----------
+    psi : np.ndarray
+        Per-unit influence function values, shape (n,).
+    scaffolding : _PsuScaffolding
+        Precomputed via :func:`_precompute_psu_scaffolding` for the same
+        resolved design.
+
+    Returns
+    -------
+    float
+        Design-based variance.  Returns ``np.nan`` when variance is
+        unidentified (matches legacy behavior).
+    """
+    psi = np.asarray(psi, dtype=np.float64).ravel()
+
+    def _finalize(meat_scalar: float) -> float:
+        if meat_scalar == 0.0:
+            if scaffolding.variance_computable or scaffolding.legitimate_zero_count > 0:
+                return 0.0
+            return float("nan")
+        return meat_scalar
+
+    if scaffolding.mode == "no_strata_no_psu":
+        if scaffolding.n < 2:
+            return float("nan")
+        psi_mean = psi.mean()
+        centered = psi - psi_mean
+        meat = scaffolding.adjustment_direct * float(centered @ centered)
+        return _finalize(meat)
+
+    if scaffolding.mode == "psu_only":
+        if scaffolding.n_psu_only < 2:
+            if scaffolding.legitimate_zero_count > 0:
+                return 0.0
+            return float("nan")
+        psu_sums = np.bincount(
+            scaffolding.psu_codes_only, weights=psi, minlength=scaffolding.n_psu_only
+        )
+        psu_mean = psu_sums.mean()
+        centered = psu_sums - psu_mean
+        meat = scaffolding.adjustment_only * float(centered @ centered)
+        return _finalize(meat)
+
+    # Stratified
+    S = len(scaffolding.n_psu_per_stratum)
+    P = len(scaffolding.psu_stratum)
+
+    psu_sums = np.bincount(scaffolding.psu_codes, weights=psi, minlength=P)
+    sum_by_h = np.bincount(scaffolding.psu_stratum, weights=psu_sums, minlength=S)
+    sum2_by_h = np.bincount(
+        scaffolding.psu_stratum, weights=psu_sums * psu_sums, minlength=S
+    )
+
+    with np.errstate(divide="ignore", invalid="ignore"):
+        centered_ss = np.where(
+            scaffolding.n_psu_per_stratum >= 2,
+            sum2_by_h - (sum_by_h * sum_by_h) / np.maximum(scaffolding.n_psu_per_stratum, 1),
+            0.0,
+        )
+    meat_per_stratum = scaffolding.adjustment_h * centered_ss
+
+    if np.any(scaffolding.singleton_strata) and scaffolding.lonely_psu == "adjust":
+        # Singleton strata under "adjust": V_h = (psu_sum - global_mean)^2.
+        # For a singleton stratum, the one PSU's sum equals sum_by_h[h].
+        # No FPC, no (n-1) adjustment — matches legacy (survey.py:1276-1281).
+        if P > 0:
+            global_mean = psu_sums.mean()
+            singleton_meat = (sum_by_h - global_mean) ** 2
+            meat_per_stratum = np.where(
+                scaffolding.singleton_strata, singleton_meat, meat_per_stratum
+            )
+
+    meat = float(meat_per_stratum.sum())
+    return _finalize(meat)
+
+
 def _compute_stratified_meat_from_psu_scores(
     psu_scores: np.ndarray,
     psu_strata: np.ndarray,
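Review note on the stratified branch of `_compute_if_variance_fast` above: a minimal sketch, on toy data, of why the `np.bincount` reduction agrees with an explicit per-stratum loop over centered PSU totals. The function names `stratified_meat_loop` and `stratified_meat_bincount` are hypothetical and are not part of `diff_diff`; FPC is assumed absent (so the adjustment is `n_h / (n_h - 1)`) and every stratum has at least two PSUs, so lonely-PSU handling is out of scope.

```python
import numpy as np


def stratified_meat_loop(psi, psu_codes, psu_stratum, adjustment_h):
    """Reference: per-stratum Python loop over centered PSU totals."""
    psu_sums = np.bincount(psu_codes, weights=psi, minlength=len(psu_stratum))
    meat = 0.0
    for h in range(len(adjustment_h)):
        z = psu_sums[psu_stratum == h]
        if len(z) >= 2:
            meat += adjustment_h[h] * float(((z - z.mean()) ** 2).sum())
    return meat


def stratified_meat_bincount(psi, psu_codes, psu_stratum, adjustment_h):
    """Same quantity via bincount: sum(z^2) - sum(z)^2 / n_h per stratum."""
    S = len(adjustment_h)
    psu_sums = np.bincount(psu_codes, weights=psi, minlength=len(psu_stratum))
    n_h = np.bincount(psu_stratum, minlength=S)
    sum_h = np.bincount(psu_stratum, weights=psu_sums, minlength=S)
    sum2_h = np.bincount(psu_stratum, weights=psu_sums * psu_sums, minlength=S)
    centered_ss = np.where(
        n_h >= 2, sum2_h - sum_h * sum_h / np.maximum(n_h, 1), 0.0
    )
    return float((adjustment_h * centered_ss).sum())


# Toy design (assumed): 3 strata, 4 PSUs each, 600 observations.
rng = np.random.default_rng(0)
S, P, n = 3, 12, 600
psu_stratum = np.repeat(np.arange(S), 4)   # stratum of each global PSU
psu_codes = rng.integers(0, P, size=n)     # global PSU code per observation
psi = rng.normal(size=n)                   # per-unit influence values
n_h = np.bincount(psu_stratum, minlength=S)
adjustment_h = n_h / (n_h - 1)             # no FPC in this sketch

meat_loop = stratified_meat_loop(psi, psu_codes, psu_stratum, adjustment_h)
meat_fast = stratified_meat_bincount(psi, psu_codes, psu_stratum, adjustment_h)
```

The two results differ only by floating-point reduction order. The one-pass form `sum2 - sum^2 / n` can lose precision when PSU totals are large relative to their spread, which is presumably why the PR pins equivalence with a tight `assert_allclose` test rather than relying on algebra alone.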
diff --git a/docs/performance-plan.md b/docs/performance-plan.md
index 58f0f017..438a8b56 100644
--- a/docs/performance-plan.md
+++ b/docs/performance-plan.md
@@ -41,32 +41,36 @@ scale. Data-shape details are in `docs/performance-scenarios.md`.
 <!-- TABLE:start scale_sweep_totals -->
 | Scenario | Scale | Python (s) | Rust (s) | Py/Rust |
 |---|---|---:|---:|---:|
-| 1. Staggered campaign | small | 0.51 | 0.50 | 1.0x |
-|  | medium | 0.75 | 0.76 | 1.0x |
-|  | large | 1.33 | 1.38 | 1.0x |
-| 2. Brand awareness survey | small | 0.19 | 0.20 | 1.0x |
-|  | medium | 0.56 | 0.55 | 1.0x |
-|  | large | 1.09 | 1.00 | 1.1x |
-| 3. BRFSS microdata -> CS panel | small | 1.61 | 1.66 | 1.0x |
-|  | medium | 6.10 | 6.23 | 1.0x |
-|  | large | 24.41 | 24.94 | 1.0x |
-| 4. SDiD few markets | small | 3.70 | 0.04 | 89.5x |
-|  | medium | 3.99 | 0.12 | 33.6x |
-|  | large | skip | 0.26 | - |
-| 5. Reversible dCDH | single | 0.72 | 0.75 | 1.0x |
-| 6. Pricing dose-response | single | 0.59 | 0.60 | 1.0x |
+| 1. Staggered campaign | small | 0.52 | 0.51 | 1.0x |
+|  | medium | 0.81 | 0.81 | 1.0x |
+|  | large | 1.32 | 1.31 | 1.0x |
+| 2. Brand awareness survey | small | 0.23 | 0.20 | 1.1x |
+|  | medium | 0.53 | 0.50 | 1.1x |
+|  | large | 0.87 | 0.93 | 0.9x |
+| 3. BRFSS microdata -> CS panel | small | 0.21 | 0.17 | 1.3x |
+|  | medium | 0.49 | 0.47 | 1.0x |
+|  | large | 1.33 | 1.32 | 1.0x |
+| 4. SDiD few markets | small | 3.70 | 0.04 | 88.6x |
+|  | medium | 4.00 | 0.11 | 37.6x |
+|  | large | skip | 0.23 | - |
+| 5. Reversible dCDH | single | 0.79 | 0.78 | 1.0x |
+| 6. Pricing dose-response | single | 0.59 | 0.63 | 0.9x |
 <!-- TABLE:end scale_sweep_totals -->
 
 ### Scaling findings
 
 **Three findings are load-bearing for the optimization priority list:**
 
-1. **BRFSS `aggregate_survey` is the dominant practitioner pain point at
-   realistic pooled-multi-year scale.** Scales near-linearly with microdata
-   row count. At 1M rows (roughly what a 10-year pooled BRFSS analysis
-   looks like) the full chain takes ~24 seconds and essentially all of it
-   is inside `_compute_stratified_psu_meat`. Rust does not touch it
-   (`aggregate_survey` is entirely Python).
+1. **BRFSS `aggregate_survey` is now practitioner-fast at every measured
+   scale.** Prior to the precompute-scaffolding fix (see "Optimization
+   landed" below), the full chain at 1M rows took ~24 seconds, essentially
+   all of it inside `_compute_stratified_psu_meat`. After the fix, the
+   chain is sub-2s at every measured scale. `aggregate_survey` is still
+   the largest phase of its (now-cheap) chain, but in absolute time the
+   entire workflow is well under a practitioner-perceptible threshold
+   at realistic pooled-multi-year BRFSS volume.
+   The path is entirely Python, so Python and Rust backends track each
+   other within noise.
 2. **Staggered CS chain stays cheap across scales.** A 10x unit increase
    (150 -> 1,500) is a small-single-digit multiplier on total time.
    ImputationDiD and SunAbraham together consistently account for
@@ -96,18 +100,18 @@ scale. Data-shape details are in `docs/performance-scenarios.md`.
 <!-- TABLE:start top_phases_by_scenario -->
 | Scenario | Scale | Backend | Top phase (%) | 2nd phase (%) | 3rd phase (%) |
 |---|---|---|---|---|---|
-| 1. Staggered campaign | large | python | `6_imputation_did_robustness` (49%) | `5_sun_abraham_robustness` (28%) | `2_cs_fit_with_covariates_bootstrap999` (13%) |
-| 1. Staggered campaign | large | rust | `6_imputation_did_robustness` (40%) | `5_sun_abraham_robustness` (37%) | `2_cs_fit_with_covariates_bootstrap999` (13%) |
-| 2. Brand awareness survey | large | python | `3_replicate_weights_jk1` (57%) | `4_multi_outcome_loop_3_metrics` (22%) | `7_event_study_plus_honest_did` (14%) |
-| 2. Brand awareness survey | large | rust | `3_replicate_weights_jk1` (54%) | `4_multi_outcome_loop_3_metrics` (22%) | `7_event_study_plus_honest_did` (14%) |
-| 3. BRFSS microdata -> CS panel | large | python | `1_aggregate_survey_microdata_to_panel` (100%) | `5_sun_abraham_robustness` (0%) | `2_cs_fit_with_stage2_survey_design` (0%) |
-| 3. BRFSS microdata -> CS panel | large | rust | `1_aggregate_survey_microdata_to_panel` (100%) | `5_sun_abraham_robustness` (0%) | `2_cs_fit_with_stage2_survey_design` (0%) |
+| 1. Staggered campaign | large | python | `6_imputation_did_robustness` (54%) | `5_sun_abraham_robustness` (21%) | `2_cs_fit_with_covariates_bootstrap999` (13%) |
+| 1. Staggered campaign | large | rust | `6_imputation_did_robustness` (41%) | `5_sun_abraham_robustness` (36%) | `2_cs_fit_with_covariates_bootstrap999` (12%) |
+| 2. Brand awareness survey | large | python | `3_replicate_weights_jk1` (46%) | `4_multi_outcome_loop_3_metrics` (26%) | `7_event_study_plus_honest_did` (17%) |
+| 2. Brand awareness survey | large | rust | `3_replicate_weights_jk1` (50%) | `4_multi_outcome_loop_3_metrics` (25%) | `7_event_study_plus_honest_did` (15%) |
+| 3. BRFSS microdata -> CS panel | large | python | `1_aggregate_survey_microdata_to_panel` (91%) | `5_sun_abraham_robustness` (8%) | `2_cs_fit_with_stage2_survey_design` (1%) |
+| 3. BRFSS microdata -> CS panel | large | rust | `1_aggregate_survey_microdata_to_panel` (95%) | `5_sun_abraham_robustness` (4%) | `2_cs_fit_with_stage2_survey_design` (1%) |
 | 4. SDiD few markets | medium | python | `5_sensitivity_to_zeta_omega` (43%) | `3_in_time_placebo` (39%) | `2_sdid_bootstrap_variance_200` (9%) |
-| 4. SDiD few markets | large | rust | `5_sensitivity_to_zeta_omega` (40%) | `3_in_time_placebo` (30%) | `1_sdid_jackknife_variance` (16%) |
-| 5. Reversible dCDH | single | python | `4_heterogeneity_refit` (51%) | `1_dcdh_fit_Lmax3_survey_TSL` (48%) | `3_honest_did_on_placebo` (1%) |
-| 5. Reversible dCDH | single | rust | `4_heterogeneity_refit` (50%) | `1_dcdh_fit_Lmax3_survey_TSL` (49%) | `3_honest_did_on_placebo` (1%) |
-| 6. Pricing dose-response | single | python | `1_cdid_cubic_spline_bootstrap199` (25%) | `6_spline_sensitivity_num_knots2` (25%) | `5_spline_sensitivity_degree1` (25%) |
-| 6. Pricing dose-response | single | rust | `1_cdid_cubic_spline_bootstrap199` (25%) | `6_spline_sensitivity_num_knots2` (25%) | `3_cdid_event_study_pretrend` (25%) |
+| 4. SDiD few markets | large | rust | `5_sensitivity_to_zeta_omega` (38%) | `3_in_time_placebo` (30%) | `1_sdid_jackknife_variance` (16%) |
+| 5. Reversible dCDH | single | python | `4_heterogeneity_refit` (51%) | `1_dcdh_fit_Lmax3_survey_TSL` (49%) | `3_honest_did_on_placebo` (0%) |
+| 5. Reversible dCDH | single | rust | `4_heterogeneity_refit` (50%) | `1_dcdh_fit_Lmax3_survey_TSL` (50%) | `3_honest_did_on_placebo` (0%) |
+| 6. Pricing dose-response | single | python | `1_cdid_cubic_spline_bootstrap199` (26%) | `6_spline_sensitivity_num_knots2` (25%) | `3_cdid_event_study_pretrend` (25%) |
+| 6. Pricing dose-response | single | rust | `1_cdid_cubic_spline_bootstrap199` (26%) | `6_spline_sensitivity_num_knots2` (25%) | `3_cdid_event_study_pretrend` (25%) |
 <!-- TABLE:end top_phases_by_scenario -->
 
 Per-scenario phase narrative (cross-check against the table above after
@@ -129,9 +133,11 @@ any rerun):
   see scale-sweep table); the JK1 replicate-fit loop is not
   Rust-accelerated, so the backends neither help nor hurt each other
   meaningfully on this chain.
-- **BRFSS.** `aggregate_survey` share of total grows with scale and is
-  effectively 100% of runtime at 1M rows. Downstream phases (CS fit,
-  SunAbraham, HonestDiD) are a fraction of a second combined.
+- **BRFSS.** `aggregate_survey` remains the single largest phase of the
+  chain under both backends at every scale, but the absolute chain total
+  is sub-2s at 1M rows after the precompute-scaffolding fix. Downstream
+  phases (CS fit, SunAbraham, HonestDiD) are a fraction of a second
+  combined - see the scale-sweep table for the current totals.
 - **SDiD few markets.** `sensitivity_to_zeta_omega` and
   `in_time_placebo` are the two largest phases under Python at every
   scale and under Rust at medium/large (together ~70% of the chain).
@@ -156,7 +162,7 @@ any rerun):
 
 | # | Location | Scenario + scale | Signal | Recommended action |
 |---|---|---|---|---|
-| 1 | `diff_diff/survey.py:1160` `_compute_stratified_psu_meat` | BRFSS @ 1M rows | dominates BRFSS chain at all scales, ~100% at 1M rows | **Algorithmic fix, highest priority.** Function called once per (state, year) cell (500 calls); per-call work rebuilds stratum-PSU scaffolding every time. Precompute stratum indexes once at `aggregate_survey` top-level and reuse. |
+| 1 | `diff_diff/survey.py` `_compute_stratified_psu_meat` + `aggregate_survey` | BRFSS @ 1M rows | previously dominated BRFSS chain at all scales (~100% at 1M rows) | **LANDED** (this PR). Precompute stratum-PSU scaffolding once per design at `aggregate_survey` top level; replace per-cell pandas groupby with two vectorized `np.bincount` passes. BRFSS-large chain drops from ~24s to sub-2s across both backends. See "Optimization landed" below. |
 | 2 | `diff_diff/imputation.py` ImputationDiD fit (+ `diff_diff/sun_abraham.py` SunAbraham fit) | Staggered CS @ 1,500 units | together consistently ~70-80% of the chain at every scale; either can be the top phase at a given (scale, backend) cell | **Investigate only after BRFSS fix lands.** Total chain is well under practitioner-perceptible threshold; candidate follow-up. Either phase is a legitimate target. |
 | 3 | `diff_diff/utils.py:1434` `_sc_weight_fw_numpy` | SDiD python @ any scale | dominates Python SDiD at all scales | **Already ported to Rust.** Python fallback acceptable as a teaching/safety path; non-production for n > 100. Python skipped at n=500 (jackknife cost would exceed 4 minutes per run). |
 | 4 | `diff_diff/chaisemartin_dhaultfoeuille.py` dCDH fit + heterogeneity | Reversible (single scale) | main fit and survey-aware heterogeneity refit each rebuild TSL scaffolding; heterogeneity phase is as expensive as the main fit | **Cache/precompute** - heterogeneity refit duplicates the main fit's TSL setup under the same `SurveyDesign`. Not P0; newer code path (v3.1) never optimization-reviewed. |
@@ -174,20 +180,20 @@ in `benchmarks/speed_review/baselines/mem_profile_brfss_large_<backend>.txt`.
 <!-- TABLE:start memory_by_scenario -->
 | Scenario | Scale | Py peak RSS (MB) | Py growth (MB) | Rust peak RSS (MB) | Rust growth (MB) |
 |---|---|---:|---:|---:|---:|
-| 1. Staggered campaign | small | 143 | 28 | 151 | 36 |
-|  | medium | 227 | 79 | 254 | 99 |
-|  | large | 472 | 245 | 588 | 322 |
-| 2. Brand awareness survey | small | 127 | 12 | 128 | 13 |
-|  | medium | 188 | 54 | 185 | 50 |
-|  | large | 327 | 139 | 336 | 142 |
-| 3. BRFSS microdata -> CS panel | small | 133 | 11 | 136 | 15 |
-|  | medium | 210 | 17 | 212 | 15 |
-|  | large | 418 | 17 | 429 | 33 |
+| 1. Staggered campaign | small | 146 | 31 | 148 | 34 |
+|  | medium | 235 | 85 | 253 | 100 |
+|  | large | 486 | 251 | 582 | 327 |
+| 2. Brand awareness survey | small | 130 | 15 | 128 | 13 |
+|  | medium | 183 | 45 | 189 | 55 |
+|  | large | 340 | 139 | 348 | 158 |
+| 3. BRFSS microdata -> CS panel | small | 133 | 11 | 130 | 8 |
+|  | medium | 203 | 17 | 200 | 21 |
+|  | large | 413 | 25 | 409 | 25 |
 | 4. SDiD few markets | small | 124 | 10 | 116 | 1 |
-|  | medium | 152 | 8 | 118 | 0 |
+|  | medium | 148 | 8 | 117 | 0 |
 |  | large | skip | skip | 118 | 0 |
-| 5. Reversible dCDH | single | 135 | 22 | 135 | 21 |
-| 6. Pricing dose-response | single | 123 | 9 | 121 | 8 |
+| 5. Reversible dCDH | single | 134 | 20 | 134 | 20 |
+| 6. Pricing dose-response | single | 122 | 8 | 123 | 9 |
 <!-- TABLE:end memory_by_scenario -->
 
 The ~115-130 MB floor is the Python + diff-diff + numpy import footprint;
@@ -195,16 +201,15 @@ the "growth" columns are the practitioner-meaningful numbers.
 
 ### Memory findings
 
-1. **BRFSS `aggregate_survey` is compute-bound, not memory-bound.** At
-   20x data growth (50K -> 1M rows), working-memory growth stays in the
-   low tens of MB. The tracemalloc pass confirms: net retained allocation
-   after `aggregate_survey` returns is well under 1 MB; the top
-   allocation site is `tracemalloc`'s own linecache overhead (a smoking
-   gun that nothing else is allocating meaningfully). **The BRFSS cost
-   is pure CPU; the function is already memory-efficient.** This
-   strengthens the case for the precompute-scaffolding fix: low-risk,
-   pure CPU win, fits in any deployment environment including 512 MB
-   Lambda.
+1. **BRFSS `aggregate_survey` was compute-bound, not memory-bound - and
+   the compute side is now addressed.** Working-memory growth stayed in
+   the low tens of MB across the 20x data-growth sweep (50K -> 1M rows);
+   the pre-fix tracemalloc pass confirmed net retained allocation under
+   1 MB and identified `tracemalloc`'s own linecache overhead as the
+   top allocation site (smoking gun that nothing else was allocating
+   meaningfully). The precompute-scaffolding fix in this PR is a pure
+   CPU win - no change to the function's memory profile, which was
+   already Lambda-friendly.
 2. **Staggered CS chain is memory-heavier than wall-clock suggested.** At
    1,500 units the chain's peak RSS sits in the high-400s to high-500s
    MB depending on backend. Fine for workstations, tight for 512 MB
@@ -229,16 +234,32 @@ the "growth" columns are the practitioner-meaningful numbers.
 
 | # | Opportunity | Time upside | Memory upside | Risk | Priority |
 |---|---|---|---|---|---|
-| 1 | `aggregate_survey` precompute stratum scaffolding | ~-20s at 1M rows | none (already memory-efficient) | Low | **High** |
+| 1 | `aggregate_survey` precompute stratum scaffolding | ~-20s at 1M rows | none (already memory-efficient) | Low | **LANDED** (this PR) |
 | 2 | Staggered CS chain working-memory audit (Lambda-oriented) | none | ~200-300 MB at 1,500 units (peak RSS crosses 512 MB Lambda line under Rust) | Medium | Low (bump to Medium if Lambda deployment becomes a concrete ask) |
 | 3 | dCDH: cache TSL scaffolding across main fit + heterogeneity refit | ~0.2s per chain | ~20 MB per chain | Low | Low |
 | 4 | ImputationDiD fit-loop vectorization audit | ~0.1-0.3s at 1,500 units | unknown | Low | Low |
 | 5 | Rust-port JK1 replicate fit loop | ~0.5s at 160 replicates | ~140 MB at 160 replicates | Medium | Low (demoted: Rust is no longer slower than Python on this path after rerun, so the "fix-a-Rust-regression" leg of the original rationale is gone) |
 
-**Bottom line: one clear priority, four optional.** #1 is the single
-practitioner-perceptible win identified by this analysis and should be
-the next PR. #2-5 are optional polish that should be prioritized by
-concrete deployment-environment signal (Lambda OOMs, practitioner
+### Optimization landed
+
+**#1 shipped in this PR.** `diff_diff/survey.py` now precomputes a
+per-design `_PsuScaffolding` (strata codes, global PSU codes,
+per-stratum counts and FPC ratios, singleton mask, lonely-PSU-aware
+variance-computable flag).  `aggregate_survey` builds it once per call
+and threads it through `_cell_mean_variance` so each per-cell variance
+reduction uses two vectorized `np.bincount` passes instead of a
+per-stratum pandas groupby loop.  Numerics are preserved to sub-ULP
+tolerance; equivalence tests across nine design cases
+(`TestAggregateSurveyScaffolding`) enforce `assert_allclose(atol=1e-14,
+rtol=1e-14)` between fast and legacy paths.
+
+Replicate-weight designs (JK1 etc.) continue to use the legacy
+`compute_replicate_if_variance` code path and are unaffected.
+
+**Bottom line: no practitioner-perceptible bottleneck remains in the
+six measured workflows; four optional items stand by.** Items #2-5
+above should be prioritized by concrete deployment-environment signal
+(Lambda OOMs, practitioner
 reports of slowness at specific shapes), not proactively.
 
 ### Correctness-adjacent observations (not P0, route separately)
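Review note on the "Optimization landed" paragraph in `docs/performance-plan.md`: the amortization it describes can be illustrated with a small, self-contained sketch. PSU labels are factorized once for the whole design, and each cell then needs only one `np.bincount` call. This is not the library's code. The column names, the helper names `psu_totals_fast` / `psu_totals_groupby`, and the convention that a cell's influence values are zero outside the cell are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame(
    {
        "cell": rng.integers(0, 5, n),   # stand-in for a (state, year) cell id
        "psu": rng.integers(0, 20, n),   # PSU label per observation
        "psi": rng.normal(size=n),       # per-unit influence values
    }
)

# Design-level work, done once: integer PSU codes for every row.
psu_codes, psu_labels = pd.factorize(df["psu"])
n_psu = len(psu_labels)


def psu_totals_fast(psi_cell):
    """Per-cell PSU totals using the precomputed integer codes."""
    return np.bincount(psu_codes, weights=psi_cell, minlength=n_psu)


def psu_totals_groupby(psi_cell):
    """Per-cell PSU totals rebuilt from scratch with a pandas groupby."""
    totals = pd.Series(psi_cell).groupby(df["psu"].to_numpy()).sum()
    return totals.reindex(psu_labels).to_numpy()


cell_ids = df["cell"].to_numpy()
psi_all = df["psi"].to_numpy()
max_abs_diff = 0.0
for c in np.unique(cell_ids):
    # Assumed convention: influence values are zero outside the cell.
    psi_cell = np.where(cell_ids == c, psi_all, 0.0)
    diff = np.abs(psu_totals_fast(psi_cell) - psu_totals_groupby(psi_cell))
    max_abs_diff = max(max_abs_diff, float(diff.max()))
```

In this sketch the groupby variant re-derives the grouping on every call, while the `bincount` variant reuses codes computed once. That is the same trade the paragraph describes for the per-cell variance reduction.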
diff --git a/tests/test_prep.py b/tests/test_prep.py
index 3c96626b..a9818e4a 100644
--- a/tests/test_prep.py
+++ b/tests/test_prep.py
@@ -3440,3 +3440,217 @@ def test_pweight_retains_zero_precision_geo(self):
             )
         assert 0 not in panel_a["state"].values
         assert len(panel_a) == 6  # 3 states x 2 periods
+
+
+class TestAggregateSurveyScaffolding:
+    """Tests for the amortized TSL variance fast path in aggregate_survey.
+
+    Equivalence tests verify that ``_compute_if_variance_fast`` produces
+    numerically identical ``_mean`` / ``_se`` / ``_precision`` outputs
+    (assert_allclose atol=1e-14 rtol=1e-14) relative to the legacy
+    ``compute_survey_if_variance`` path across every supported design
+    mode and ``lonely_psu`` policy.  Reduction-order drift is expected
+    to be sub-ULP because the formulas are identical and only the
+    order of summation changes (np.bincount passes vs per-stratum
+    pandas groupby).
+    """
+
+    def _build_microdata(self, mode, seed=42):
+        """Per-case microdata plus a SurveyDesign that exercises that mode."""
+        rng = np.random.default_rng(seed)
+        n_per_cell = 80
+        state = np.repeat(["A", "B", "C"], 2 * n_per_cell)
+        year = np.tile(np.repeat([2019, 2020], n_per_cell), 3)
+        n = len(state)
+        wt = rng.uniform(0.5, 2.5, n)
+        y = rng.normal(5.0, 1.5, n)
+        df_base = pd.DataFrame(
+            {"state": state, "year": year, "wt": wt, "y": y}
+        )
+
+        if mode == "stratified_fpc":
+            df = df_base.copy()
+            df["stratum"] = rng.integers(0, 4, n)
+            df["psu"] = df["stratum"] * 10 + rng.integers(0, 4, n)
+            df["fpc"] = 200.0  # comfortably above per-stratum n_psu
+            sd = SurveyDesign(weights="wt", strata="stratum", psu="psu", fpc="fpc")
+            return df, sd
+
+        if mode == "stratified_no_fpc":
+            df = df_base.copy()
+            df["stratum"] = rng.integers(0, 4, n)
+            df["psu"] = df["stratum"] * 10 + rng.integers(0, 4, n)
+            sd = SurveyDesign(weights="wt", strata="stratum", psu="psu")
+            return df, sd
+
+        if mode == "stratified_no_psu":
+            # strata present, psu absent — each observation is its own
+            # PSU within its stratum.  This is a distinct scaffolding
+            # branch (survey.py:_precompute_psu_scaffolding, else clause
+            # of the `if psu is not None` block).
+            df = df_base.copy()
+            df["stratum"] = rng.integers(0, 4, n)
+            sd = SurveyDesign(weights="wt", strata="stratum")
+            return df, sd
+
+        if mode == "stratified_no_psu_fpc":
+            # Same branch as above plus stratum-level FPC lookup.
+            df = df_base.copy()
+            df["stratum"] = rng.integers(0, 4, n)
+            df["fpc"] = 1000.0  # well above per-stratum obs count
+            sd = SurveyDesign(weights="wt", strata="stratum", fpc="fpc")
+            return df, sd
+
+        if mode == "psu_only":
+            df = df_base.copy()
+            df["psu"] = rng.integers(0, 12, n)
+            sd = SurveyDesign(weights="wt", psu="psu")
+            return df, sd
+
+        if mode == "weights_only":
+            return df_base.copy(), SurveyDesign(weights="wt")
+
+        if mode.startswith("lonely_"):
+            # Singleton stratum: stratum 0 has exactly one PSU; strata 1..3
+            # each have 4 PSUs.  Forces every lonely_psu branch to engage.
+            df = df_base.copy()
+            strata = rng.integers(1, 4, n)
+            psu = strata * 10 + rng.integers(0, 4, n)
+            sentinel = rng.choice(n, size=n // 8, replace=False)
+            strata[sentinel] = 0
+            psu[sentinel] = 999
+            df["stratum"] = strata
+            df["psu"] = psu
+            policy = mode.split("_", 1)[1]
+            sd = SurveyDesign(
+                weights="wt", strata="stratum", psu="psu", lonely_psu=policy,
+            )
+            return df, sd
+
+        raise ValueError(f"Unknown mode: {mode}")
+
+    @staticmethod
+    def _assert_panels_equivalent(p_fast, p_legacy, outcome="y"):
+        assert len(p_fast) == len(p_legacy)
+        assert list(p_fast.columns) == list(p_legacy.columns)
+        for suffix in ("_mean", "_se", "_precision"):
+            col = f"{outcome}{suffix}"
+            a = p_fast[col].to_numpy(dtype=np.float64)
+            b = p_legacy[col].to_numpy(dtype=np.float64)
+            nan_a, nan_b = np.isnan(a), np.isnan(b)
+            assert np.array_equal(nan_a, nan_b), f"NaN pattern mismatch in {col}"
+            np.testing.assert_allclose(
+                a[~nan_a], b[~nan_b],
+                atol=1e-14, rtol=1e-14,
+                err_msg=f"{col} diverges between fast and legacy paths",
+            )
+
+    @pytest.mark.parametrize(
+        "mode",
+        [
+            "stratified_fpc",
+            "stratified_no_fpc",
+            "stratified_no_psu",
+            "stratified_no_psu_fpc",
+            "psu_only",
+            "weights_only",
+            "lonely_remove",
+            "lonely_certainty",
+            "lonely_adjust",
+        ],
+    )
+    def test_fast_path_equals_legacy(self, mode, monkeypatch):
+        """Fast and legacy paths produce numerically identical panels."""
+        from diff_diff import prep
+
+        data, sd = self._build_microdata(mode)
+        panel_fast, _ = aggregate_survey(
+            data, by=["state", "year"], outcomes="y", survey_design=sd,
+        )
+        # Force the legacy code path by disabling the scaffolding precompute.
+        # _cell_mean_variance falls back to compute_survey_if_variance when
+        # scaffolding is None.
+        monkeypatch.setattr(
+            prep, "_precompute_psu_scaffolding", lambda resolved: None,
+        )
+        panel_legacy, _ = aggregate_survey(
+            data, by=["state", "year"], outcomes="y", survey_design=sd,
+        )
+        self._assert_panels_equivalent(panel_fast, panel_legacy)
+
+    def test_scaffolding_stratified_shape(self):
+        from diff_diff.survey import _precompute_psu_scaffolding
+
+        data, sd = self._build_microdata("stratified_fpc")
+        resolved = sd.resolve(data)
+        scf = _precompute_psu_scaffolding(resolved)
+        assert scf.mode == "stratified"
+        assert scf.n == len(data)
+        assert scf.psu_codes.shape == (len(data),)
+        assert scf.psu_stratum.ndim == 1
+        assert scf.n_psu_per_stratum.ndim == 1
+        assert len(scf.psu_stratum) == int(scf.psu_codes.max() + 1)
+        # adjustment_h is zero for any singleton stratum by construction
+        if scf.singleton_strata.any():
+            assert np.all(scf.adjustment_h[scf.singleton_strata] == 0.0)
+
+    def test_scaffolding_weights_only_shape(self):
+        from diff_diff.survey import _precompute_psu_scaffolding
+
+        data, sd = self._build_microdata("weights_only")
+        resolved = sd.resolve(data)
+        scf = _precompute_psu_scaffolding(resolved)
+        assert scf.mode == "no_strata_no_psu"
+        assert scf.adjustment_direct is not None
+        assert scf.psu_codes is None
+        assert scf.psu_codes_only is None
+
+    def test_scaffolding_psu_only_shape(self):
+        from diff_diff.survey import _precompute_psu_scaffolding
+
+        data, sd = self._build_microdata("psu_only")
+        resolved = sd.resolve(data)
+        scf = _precompute_psu_scaffolding(resolved)
+        assert scf.mode == "psu_only"
+        assert scf.psu_codes_only is not None
+        assert scf.n_psu_only is not None and scf.n_psu_only >= 2
+        assert scf.adjustment_only is not None
+        assert scf.psu_codes is None
+        assert scf.adjustment_direct is None
+
+    def test_lonely_psu_certainty_counts_singletons(self):
+        """Under lonely_psu='certainty', singletons contribute to legitimate_zero_count."""
+        from diff_diff.survey import _precompute_psu_scaffolding
+
+        data, sd = self._build_microdata("lonely_certainty")
+        resolved = sd.resolve(data)
+        scf = _precompute_psu_scaffolding(resolved)
+        n_singletons = int(scf.singleton_strata.sum())
+        assert n_singletons >= 1  # sanity: fixture does plant a singleton
+        assert scf.legitimate_zero_count >= n_singletons
+
+    def test_scaffolding_fpc_saturation_counts(self):
+        """f_h >= 1.0 increments legitimate_zero_count independent of singletons."""
+        from diff_diff.survey import _precompute_psu_scaffolding
+
+        rng = np.random.default_rng(7)
+        n = 200
+        stratum = rng.integers(0, 2, n)
+        # Build exactly 4 unique PSUs per stratum so FPC = n_psu exactly.
+        psu = np.empty(n, dtype=np.int64)
+        for h in range(2):
+            idx = np.where(stratum == h)[0]
+            psu[idx] = np.arange(len(idx)) % 4 + h * 10
+        df = pd.DataFrame(
+            {
+                "wt": rng.uniform(1, 2, n),
+                "stratum": stratum,
+                "psu": psu,
+                "y": rng.normal(size=n),
+                "fpc": 4.0,  # f_h = 4/4 = 1.0
+            }
+        )
+        sd = SurveyDesign(weights="wt", strata="stratum", psu="psu", fpc="fpc")
+        resolved = sd.resolve(df)
+        scf = _precompute_psu_scaffolding(resolved)
+        assert scf.legitimate_zero_count >= 1
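Review note on `test_scaffolding_fpc_saturation_counts` and `test_lonely_psu_certainty_counts_singletons`: a worked example of the arithmetic these tests rely on, using the same formulas that appear in `_precompute_psu_scaffolding` earlier in this diff. The input arrays and the helper `legitimate_zero_count` are illustrative assumptions, not values or functions from the test suite.

```python
import numpy as np

# Assumed toy design: two multi-PSU strata and one singleton stratum.
n_psu_per_stratum = np.array([4, 4, 1])
N_h = np.array([4.0, 8.0, 10.0])        # population PSU counts (FPC column)
f_h = n_psu_per_stratum / N_h           # sampling fractions: 1.0, 0.5, 0.1

# (1 - f_h) * n_h / (n_h - 1) for strata with >= 2 PSUs, else 0.
adjustment_h = np.where(
    n_psu_per_stratum >= 2,
    (1.0 - f_h) * n_psu_per_stratum / np.maximum(n_psu_per_stratum - 1, 1),
    0.0,
)

singleton_strata = n_psu_per_stratum == 1
fpc_saturated = (n_psu_per_stratum >= 2) & (f_h >= 1.0)


def legitimate_zero_count(lonely_psu):
    """Strata whose zero variance contribution is expected by design."""
    count = int(fpc_saturated.sum())
    if lonely_psu == "certainty":
        count += int(singleton_strata.sum())
    return count


count_remove = legitimate_zero_count("remove")
count_certainty = legitimate_zero_count("certainty")
```

Stratum 0 is fully sampled (`f_h = 1.0`), so its adjustment is exactly zero and it counts as a legitimate zero under any policy. The singleton stratum is added to the count only under `"certainty"`.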