igerber · igerber · Apr 20, 2026 · Apr 19, 2026 · Apr 19, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **`BusinessReport` and `DiagnosticReport` (experimental preview)** - practitioner-ready output layer. `BusinessReport(results, ...)` produces plain-English narrative summaries (`.summary()`, `.full_report()`, `.export_markdown()`, `.to_dict()`) from any of the 16 fitted result types. `DiagnosticReport(results, ...)` orchestrates the existing diagnostic battery (parallel trends, pre-trends power, HonestDiD sensitivity, Goodman-Bacon, heterogeneity, design-effect, EPV) plus estimator-native diagnostics for SyntheticDiD (`pre_treatment_fit`, weight concentration, in-time placebo, zeta sensitivity) and TROP (factor-model fit metrics). Both classes expose an AI-legible `to_dict()` schema (single source of truth; prose renders from the dict). BR auto-constructs DR by default so summaries mention pre-trends, robustness, and design-effect findings in one call. See `docs/methodology/REPORTING.md` for methodology deviations including the no-traffic-light-gates decision, pre-trends verdict thresholds (0.05 / 0.30), and power-aware phrasing driven by `compute_pretrends_power`. **Both schemas are marked experimental in this release** - wording, verdict thresholds, and schema shape will change; do not anchor downstream tooling on them yet.
 
+### Performance
+- **`aggregate_survey` stratum-PSU scaffolding precompute** — the per-cell Taylor-series variance inside `aggregate_survey` no longer rebuilds stratum-PSU scaffolding on every cell. A frozen `_PsuScaffolding` (strata codes, global PSU codes unique across strata, per-stratum counts and FPC ratios, singleton mask, static legitimate-zero counts and variance-computable flag) is precomputed once per design at the top of `aggregate_survey` and threaded through `_cell_mean_variance` to a new `_compute_if_variance_fast` path that replaces the per-stratum pandas groupby with two vectorized `np.bincount` passes. BRFSS-shaped 50-state × 10-year × 1M-row microdata → state-year panel drops from ~24s to sub-2s under both backends (the path is pure Python, so Python and Rust track each other). Numerical output is preserved to sub-ULP tolerance; seven-case equivalence tests (`TestAggregateSurveyScaffolding`) assert `assert_allclose(atol=1e-14, rtol=1e-14)` between fast and legacy paths across stratified+PSU+FPC, stratified no FPC, PSU-only, weights-only, and all three `lonely_psu` modes (remove / certainty / adjust). Replicate-weight designs continue to route through `compute_replicate_if_variance` unchanged. `_compute_stratified_psu_meat` is untouched — all other TSL callers (DiD / TWFE / CS / etc.) are unaffected.
+
 ### Changed
 - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released.
 - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`).
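
Reviewer note on the allocator entry above: a minimal sketch, not the library's code, of why the two allocators agree under PSU=group but can differ under replicate weights. All arrays, and the assumption that observations outside the post-period cell receive zero mass under the cell-period allocator, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: two groups, two periods, per-observation weights.
group = np.array([0, 0, 0, 0, 1, 1, 1])
period = np.array([0, 0, 1, 1, 0, 1, 1])
w = np.array([1.0, 2.0, 1.5, 0.5, 1.0, 3.0, 1.0])
psi_g = np.array([0.8, -0.3])   # group-level influence values
out_idx = np.array([1, 1])      # post-period of each group (assumed)

# Legacy group-level allocator: psi_i = psi_g * (w_i / W_g).
W_g = np.bincount(group, weights=w)
psi_group = psi_g[group] * w / W_g[group]

# Cell-period allocator: psi_i = psi_g * (w_i / W_{g, out_idx}) on the
# post-period cell; zero elsewhere is an assumption of this sketch.
in_cell = period == out_idx[group]
W_cell = np.bincount(group, weights=w * in_cell)
psi_cell = np.where(in_cell, psi_g[group] * w / W_cell[group], 0.0)

# Under PSU=group, PSU-level totals telescope to psi_g for both allocators.
tot_group = np.bincount(group, weights=psi_group)
tot_cell = np.bincount(group, weights=psi_cell)

# A replicate statistic theta_r = sum_i ratio_ir * psi_i works at observation
# level, so it differs once the ratios vary within a group.
ratio = np.array([1.2, 0.9, 1.0, 1.1, 0.8, 1.0, 1.3])  # hypothetical column
theta_group = float(np.sum(ratio * psi_group))
theta_cell = float(np.sum(ratio * psi_cell))
```

In this toy case both sets of group totals equal `psi_g`, while `theta_group` and `theta_cell` differ, which is consistent with the entry's rationale for keeping the group-level allocator on the replicate branch.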

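Reviewer note on the Performance entry: the sketch below illustrates the general idea of replacing a per-stratum loop or groupby with `np.bincount` aggregation for a stratified between-PSU variance. It is not the body of `_compute_if_variance_fast`; the function names, argument names, and the choice to let singleton strata contribute zero are assumptions of this example, and it does not try to reproduce the exact two-pass structure or the `lonely_psu` handling described in the changelog.

```python
import numpy as np

def psu_variance_bincount(values, psu_codes, psu_stratum, fpc_ratio=None):
    """Stratified between-PSU variance of a total using np.bincount.

    values      : per-observation contributions (e.g. weighted influence values)
    psu_codes   : integer PSU code per observation, unique across strata
    psu_stratum : integer stratum code of each PSU (length = number of PSUs)
    fpc_ratio   : optional sampling fraction f_h per stratum
    """
    n_psu = psu_stratum.shape[0]
    n_strata = int(psu_stratum.max()) + 1
    # Static per design; this is the kind of quantity one would precompute.
    n_h = np.bincount(psu_stratum, minlength=n_strata)

    # Aggregate observations to PSU totals.
    z = np.bincount(psu_codes, weights=values, minlength=n_psu)
    # Aggregate PSU totals to stratum means, then centered sums of squares.
    sum_h = np.bincount(psu_stratum, weights=z, minlength=n_strata)
    mean_h = np.divide(sum_h, n_h, out=np.zeros(n_strata), where=n_h > 0)
    ss_h = np.bincount(psu_stratum, weights=(z - mean_h[psu_stratum]) ** 2,
                       minlength=n_strata)

    # n_h / (n_h - 1) scaling; singleton strata contribute zero (assumption).
    ok = n_h > 1
    scale = np.zeros(n_strata)
    scale[ok] = n_h[ok] / (n_h[ok] - 1)
    if fpc_ratio is not None:
        scale = scale * (1.0 - fpc_ratio)
    return float(np.sum(scale * ss_h))

def psu_variance_loop(values, psu_codes, psu_stratum, fpc_ratio=None):
    """Reference implementation with an explicit per-stratum loop."""
    total = 0.0
    for h in np.unique(psu_stratum):
        psus = np.flatnonzero(psu_stratum == h)
        if psus.size < 2:
            continue
        z = np.array([values[psu_codes == j].sum() for j in psus])
        f = 0.0 if fpc_ratio is None else fpc_ratio[h]
        total += (1.0 - f) * psus.size / (psus.size - 1) * np.sum((z - z.mean()) ** 2)
    return float(total)

# Hypothetical design: 3 strata, 6 PSUs (stratum 2 is a singleton).
rng = np.random.default_rng(0)
psu_stratum = np.array([0, 0, 0, 1, 1, 2])
psu_codes = rng.integers(0, 6, size=200)
values = rng.normal(size=200)
fpc = np.array([0.1, 0.2, 0.0])

fast = psu_variance_bincount(values, psu_codes, psu_stratum, fpc)
slow = psu_variance_loop(values, psu_codes, psu_stratum, fpc)
```

The two functions agree on this toy design up to floating-point rounding. The vectorized version needs no Python-level loop over strata, which is where a per-cell speedup of the kind reported above would plausibly come from.
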
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_large",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 1.0910496250000001,
+  "total_seconds": 0.8670909579999999,
   "memory": {
     "available": true,
-    "start_mb": 188.45,
-    "peak_mb": 327.44,
-    "growth_mb": 138.98,
+    "start_mb": 200.7,
+    "peak_mb": 340.16,
+    "growth_mb": 139.45,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.009826500000000182,
+      "seconds": 0.01288558399999995,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.030280333999999964,
+      "seconds": 0.03156662499999996,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.6243122919999999,
+      "seconds": 0.39469687499999995,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.24174716599999968,
+      "seconds": 0.22814783400000005,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.025623749999999834,
+      "seconds": 0.04083812500000006,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.01191299999999984,
+      "seconds": 0.014936375000000002,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.147335875,
+      "seconds": 0.14401216700000008,
       "ok": true,
       "error": null
     }

diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_large",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 1.0000031249999999,
+  "total_seconds": 0.9299781670000002,
   "memory": {
     "available": true,
-    "start_mb": 194.03,
-    "peak_mb": 336.08,
-    "growth_mb": 142.05,
+    "start_mb": 190.2,
+    "peak_mb": 347.92,
+    "growth_mb": 157.72,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.013511041000000112,
+      "seconds": 0.01335629100000002,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03037650000000003,
+      "seconds": 0.0316900830000002,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.5431151669999998,
+      "seconds": 0.46433058400000005,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.21752962499999962,
+      "seconds": 0.23703795799999994,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.04399687500000038,
+      "seconds": 0.030673249999999985,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.016433082999999904,
+      "seconds": 0.011707583000000188,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.13501837500000002,
+      "seconds": 0.14117254200000007,
       "ok": true,
       "error": null
     }

diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_medium",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.563283334,
+  "total_seconds": 0.529578166,
   "memory": {
     "available": true,
-    "start_mb": 133.69,
-    "peak_mb": 187.7,
-    "growth_mb": 54.02,
+    "start_mb": 137.67,
+    "peak_mb": 182.88,
+    "growth_mb": 45.2,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.010921792000000097,
+      "seconds": 0.01053379199999993,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03732066599999995,
+      "seconds": 0.032504792000000005,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.20805304199999997,
+      "seconds": 0.16178545899999996,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.12622899999999992,
+      "seconds": 0.1744099589999999,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.01834783299999998,
+      "seconds": 0.02328412499999999,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.054030583000000076,
+      "seconds": 0.06313762499999998,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.10836029199999997,
+      "seconds": 0.06389345899999999,
       "ok": true,
       "error": null
     }

diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_medium",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.5500554579999999,
+  "total_seconds": 0.50248775,
   "memory": {
     "available": true,
-    "start_mb": 135.36,
-    "peak_mb": 184.86,
-    "growth_mb": 49.5,
+    "start_mb": 133.94,
+    "peak_mb": 189.34,
+    "growth_mb": 55.41,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.011186999999999947,
+      "seconds": 0.010962209,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03363270800000007,
+      "seconds": 0.03478112499999997,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.18678066699999996,
+      "seconds": 0.13834324999999992,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.16038787500000007,
+      "seconds": 0.1290292500000001,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.022171542000000155,
+      "seconds": 0.02951112499999997,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.0532650830000001,
+      "seconds": 0.06002304200000008,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08262075000000002,
+      "seconds": 0.09981400000000007,
       "ok": true,
       "error": null
     }

diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_small",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.19338629200000002,
+  "total_seconds": 0.22668149999999998,
   "memory": {
     "available": true,
-    "start_mb": 115.48,
-    "peak_mb": 127.31,
-    "growth_mb": 11.83,
+    "start_mb": 115.44,
+    "peak_mb": 130.16,
+    "growth_mb": 14.72,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.0014470410000000378,
+      "seconds": 0.00165958300000002,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.0072707499999999925,
+      "seconds": 0.006191999999999975,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.023173292000000068,
+      "seconds": 0.02364570900000007,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.03375529200000005,
+      "seconds": 0.07623400000000002,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.01041325000000004,
+      "seconds": 0.009393082999999969,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.027520249999999913,
+      "seconds": 0.02586829199999996,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08979433299999995,
+      "seconds": 0.08367512499999996,
       "ok": true,
       "error": null
     }

diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_small",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.19669587500000008,
+  "total_seconds": 0.198891041,
   "memory": {
     "available": true,
-    "start_mb": 114.78,
-    "peak_mb": 127.91,
-    "growth_mb": 13.12,
+    "start_mb": 115.05,
+    "peak_mb": 127.78,
+    "growth_mb": 12.73,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.0016678749999999853,
+      "seconds": 0.0019442080000000583,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.005756874999999995,
+      "seconds": 0.006045499999999926,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.012066042000000055,
+      "seconds": 0.02063908400000003,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.05887395800000006,
+      "seconds": 0.05060483399999993,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.008938375000000054,
+      "seconds": 0.009498208000000008,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.0274049999999999,
+      "seconds": 0.025947834000000003,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08197737500000002,
+      "seconds": 0.08419849999999995,
       "ok": true,
       "error": null
     }