diff --git a/CHANGELOG.md b/CHANGELOG.md index 17ec76c4..7f62f00a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **`BusinessReport` and `DiagnosticReport` (experimental preview)** - practitioner-ready output layer. `BusinessReport(results, ...)` produces plain-English narrative summaries (`.summary()`, `.full_report()`, `.export_markdown()`, `.to_dict()`) from any of the 16 fitted result types. `DiagnosticReport(results, ...)` orchestrates the existing diagnostic battery (parallel trends, pre-trends power, HonestDiD sensitivity, Goodman-Bacon, heterogeneity, design-effect, EPV) plus estimator-native diagnostics for SyntheticDiD (`pre_treatment_fit`, weight concentration, in-time placebo, zeta sensitivity) and TROP (factor-model fit metrics). Both classes expose an AI-legible `to_dict()` schema (single source of truth; prose renders from the dict). BR auto-constructs DR by default so summaries mention pre-trends, robustness, and design-effect findings in one call. See `docs/methodology/REPORTING.md` for methodology deviations including the no-traffic-light-gates decision, pre-trends verdict thresholds (0.05 / 0.30), and power-aware phrasing driven by `compute_pretrends_power`. **Both schemas are marked experimental in this release** - wording, verdict thresholds, and schema shape will change; do not anchor downstream tooling on them yet. +### Performance +- **`aggregate_survey` stratum-PSU scaffolding precompute** - the per-cell Taylor-series variance inside `aggregate_survey` no longer rebuilds stratum-PSU scaffolding on every cell.
A frozen `_PsuScaffolding` (strata codes, global PSU codes unique across strata, per-stratum counts and FPC ratios, a singleton mask, static legitimate-zero counts, and a variance-computable flag) is precomputed once per design at the top of `aggregate_survey` and passed through `_cell_mean_variance` to a new `_compute_if_variance_fast` path. That path replaces the per-stratum pandas groupby with two vectorized `np.bincount` passes. Aggregating BRFSS-shaped 50-state × 10-year × 1M-row microdata to a state-year panel now takes ~1.2s instead of ~24s under both backends (the path is pure Python, so the Python and Rust timings track each other). Numerical output agrees with the legacy path to round-off: seven-case equivalence tests (`TestAggregateSurveyScaffolding`) assert `assert_allclose(atol=1e-14, rtol=1e-14)` between the fast and legacy paths. The cases are stratified+PSU+FPC, stratified without FPC, PSU-only, weights-only, and all three `lonely_psu` modes (remove / certainty / adjust). Replicate-weight designs continue to route through `compute_replicate_if_variance` unchanged. `_compute_stratified_psu_meat` is untouched, so all other TSL callers (DiD / TWFE / CS / etc.) are unaffected. + ### Changed - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released. - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group.
On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). 
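The precompute-then-bincount idea in the Performance entry can be illustrated with a small standalone sketch. It is not the project's `_compute_if_variance_fast`. The names `PsuScaffolding`, `build_scaffolding`, `tsl_variance_fast`, and `tsl_variance_groupby` are hypothetical, and the sketch makes several assumptions:

- `fpc` is the per-stratum population PSU count, repeated on every row.
- Only the "remove" `lonely_psu` convention is handled, so singleton strata contribute zero.
- The variance is the textbook stratified-PSU Taylor form `V = sum_h (1 - f_h) * n_h / (n_h - 1) * sum_j (z_hj - zbar_h)^2`, where `z_hj` is the PSU total of the influence values `ψ_i`.

The sketch uses three `np.bincount` calls rather than two. It computes the squared deviations directly instead of using the cancellation-prone sum-of-squares shortcut.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class PsuScaffolding:
    """Design-level index arrays built once and reused for every cell (hypothetical)."""
    obs_psu: np.ndarray      # global PSU code per observation, unique across strata
    psu_stratum: np.ndarray  # stratum code of each global PSU
    n_psu: np.ndarray        # n_h: number of sampled PSUs per stratum
    scale: np.ndarray        # (1 - f_h) * n_h / (n_h - 1); 0 for singleton strata


def build_scaffolding(strata, psu, fpc: Optional[np.ndarray] = None) -> PsuScaffolding:
    _, s_inv = np.unique(np.asarray(strata), return_inverse=True)
    _, p_inv = np.unique(np.asarray(psu), return_inverse=True)
    n_p = int(p_inv.max()) + 1
    # Combine (stratum, psu) into one key so PSU labels reused across strata stay distinct.
    key = s_inv.astype(np.int64) * n_p + p_inv
    keys, obs_psu = np.unique(key, return_inverse=True)
    psu_stratum = keys // n_p
    n_strata = int(s_inv.max()) + 1
    n_psu = np.bincount(psu_stratum, minlength=n_strata)
    if fpc is None:
        f_h = np.zeros(n_strata)
    else:
        pop = np.empty(n_strata)
        pop[s_inv] = np.asarray(fpc, dtype=float)  # assumes fpc is constant within a stratum
        f_h = n_psu / pop
    # Singleton strata ("remove" convention) get a zero multiplier.
    scale = np.where(n_psu > 1, (1.0 - f_h) * n_psu / np.maximum(n_psu - 1, 1), 0.0)
    return PsuScaffolding(obs_psu, psu_stratum, n_psu, scale)


def tsl_variance_fast(psi, scaf: PsuScaffolding) -> float:
    n_strata = len(scaf.n_psu)
    # Pass over observations: PSU totals z_hj.
    z = np.bincount(scaf.obs_psu, weights=psi, minlength=len(scaf.psu_stratum))
    # Passes over PSUs: stratum means, then within-stratum squared deviations.
    mean_h = np.bincount(scaf.psu_stratum, weights=z, minlength=n_strata) / scaf.n_psu
    dev = z - mean_h[scaf.psu_stratum]
    ss_h = np.bincount(scaf.psu_stratum, weights=dev ** 2, minlength=n_strata)
    return float(np.sum(scaf.scale * ss_h))


def tsl_variance_groupby(psi, strata, psu, fpc=None) -> float:
    """Reference per-stratum groupby version of the same formula."""
    df = pd.DataFrame({"h": strata, "j": psu, "psi": psi,
                       "N": fpc if fpc is not None else np.inf})
    total = 0.0
    for _, g in df.groupby("h"):
        z = g.groupby("j")["psi"].sum().to_numpy()
        n_h = len(z)
        if n_h < 2:
            continue  # lonely PSU: "remove" convention
        f_h = n_h / g["N"].iloc[0]
        total += (1 - f_h) * n_h / (n_h - 1) * np.sum((z - z.mean()) ** 2)
    return total
```

In this sketch, `build_scaffolding` runs once per design and `tsl_variance_fast` runs once per cell with that cell's `psi`. Hoisting the design-level work out of the per-cell loop in this way is what the entry credits for the speedup.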
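The allocator argument in the `ChaisemartinDHaultfoeuille` entry can be checked on a toy example. It uses one group observed in a pre-period and a post-period cell, with two rows each. The function names are hypothetical. Observations outside the post-period cell are assumed to receive zero mass under the cell-period allocator.

Both allocators sum to `ψ_g` within the group, so a PSU=group Taylor variance sees identical PSU totals. A replicate contrast `θ_r = sum_i ratio_ir * ψ_i` is evaluated row by row, so it agrees across allocators only when the replicate ratios are constant within the group.

```python
import numpy as np

# Toy data: one group (g = 0), rows 0-1 in the pre-period cell, rows 2-3 in the post-period cell.
group = np.array([0, 0, 0, 0])
period = np.array([0, 0, 1, 1])   # 1 marks the post-period cell (out_idx)
w = np.array([1.0, 2.0, 3.0, 4.0])
psi_g = np.array([0.5])           # group-level influence value


def allocate_group(psi_g, group, w):
    """Legacy group-level allocator: psi_i = psi_g * w_i / W_g."""
    W_g = np.bincount(group, weights=w)
    return psi_g[group] * w / W_g[group]


def allocate_post_cell(psi_g, group, period, w, post=1):
    """Cell-period allocator: psi_i = psi_g * w_i / W_{g,out_idx} on the post cell, else 0."""
    in_post = period == post
    W_post = np.bincount(group, weights=np.where(in_post, w, 0.0))
    return np.where(in_post, psi_g[group] * w / W_post[group], 0.0)


a = allocate_group(psi_g, group, w)                 # [0.05, 0.1, 0.15, 0.2]
b = allocate_post_cell(psi_g, group, period, w)     # [0, 0, 1.5/7, 2/7]
print(np.bincount(group, weights=a), np.bincount(group, weights=b))  # both telescope to psi_g

per_row = np.array([0.0, 0.0, 2.0, 2.0])            # replicate ratios varying within the group
print(per_row @ a, per_row @ b)                     # 0.7 vs 1.0 (up to float rounding)
```

This divergence under per-row ratios is the reason the entry keeps the group-level allocator on the replicate branch.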
diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json index c8eb9108..22ed15d1 100644 --- a/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json +++ b/benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json @@ -2,47 +2,47 @@ "scenario": "brand_awareness_survey_large", "backend": "python", "has_rust_backend": false, - "total_seconds": 1.0910496250000001, + "total_seconds": 0.8670909579999999, "memory": { "available": true, - "start_mb": 188.45, - "peak_mb": 327.44, - "growth_mb": 138.98, + "start_mb": 200.7, + "peak_mb": 340.16, + "growth_mb": 139.45, "sampler_interval_s": 0.01 }, "phases": { "1_naive_fit_no_survey_design": { - "seconds": 0.009826500000000182, + "seconds": 0.01288558399999995, "ok": true, "error": null }, "2_tsl_strata_psu_fpc": { - "seconds": 0.030280333999999964, + "seconds": 0.03156662499999996, "ok": true, "error": null }, "3_replicate_weights_jk1": { - "seconds": 0.6243122919999999, + "seconds": 0.39469687499999995, "ok": true, "error": null }, "4_multi_outcome_loop_3_metrics": { - "seconds": 0.24174716599999968, + "seconds": 0.22814783400000005, "ok": true, "error": null }, "5_check_parallel_trends": { - "seconds": 0.025623749999999834, + "seconds": 0.04083812500000006, "ok": true, "error": null }, "6_placebo_refit_pre_period": { - "seconds": 0.01191299999999984, + "seconds": 0.014936375000000002, "ok": true, "error": null }, "7_event_study_plus_honest_did": { - "seconds": 0.147335875, + "seconds": 0.14401216700000008, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json index a3eb721c..ffcc5060 100644 --- a/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json +++ b/benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json @@ 
-2,47 +2,47 @@ "scenario": "brand_awareness_survey_large", "backend": "rust", "has_rust_backend": true, - "total_seconds": 1.0000031249999999, + "total_seconds": 0.9299781670000002, "memory": { "available": true, - "start_mb": 194.03, - "peak_mb": 336.08, - "growth_mb": 142.05, + "start_mb": 190.2, + "peak_mb": 347.92, + "growth_mb": 157.72, "sampler_interval_s": 0.01 }, "phases": { "1_naive_fit_no_survey_design": { - "seconds": 0.013511041000000112, + "seconds": 0.01335629100000002, "ok": true, "error": null }, "2_tsl_strata_psu_fpc": { - "seconds": 0.03037650000000003, + "seconds": 0.0316900830000002, "ok": true, "error": null }, "3_replicate_weights_jk1": { - "seconds": 0.5431151669999998, + "seconds": 0.46433058400000005, "ok": true, "error": null }, "4_multi_outcome_loop_3_metrics": { - "seconds": 0.21752962499999962, + "seconds": 0.23703795799999994, "ok": true, "error": null }, "5_check_parallel_trends": { - "seconds": 0.04399687500000038, + "seconds": 0.030673249999999985, "ok": true, "error": null }, "6_placebo_refit_pre_period": { - "seconds": 0.016433082999999904, + "seconds": 0.011707583000000188, "ok": true, "error": null }, "7_event_study_plus_honest_did": { - "seconds": 0.13501837500000002, + "seconds": 0.14117254200000007, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json index 869c5393..a59f68b4 100644 --- a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json +++ b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json @@ -2,47 +2,47 @@ "scenario": "brand_awareness_survey_medium", "backend": "python", "has_rust_backend": false, - "total_seconds": 0.563283334, + "total_seconds": 0.529578166, "memory": { "available": true, - "start_mb": 133.69, - "peak_mb": 187.7, - "growth_mb": 54.02, + "start_mb": 137.67, + "peak_mb": 182.88, + "growth_mb": 45.2, 
"sampler_interval_s": 0.01 }, "phases": { "1_naive_fit_no_survey_design": { - "seconds": 0.010921792000000097, + "seconds": 0.01053379199999993, "ok": true, "error": null }, "2_tsl_strata_psu_fpc": { - "seconds": 0.03732066599999995, + "seconds": 0.032504792000000005, "ok": true, "error": null }, "3_replicate_weights_jk1": { - "seconds": 0.20805304199999997, + "seconds": 0.16178545899999996, "ok": true, "error": null }, "4_multi_outcome_loop_3_metrics": { - "seconds": 0.12622899999999992, + "seconds": 0.1744099589999999, "ok": true, "error": null }, "5_check_parallel_trends": { - "seconds": 0.01834783299999998, + "seconds": 0.02328412499999999, "ok": true, "error": null }, "6_placebo_refit_pre_period": { - "seconds": 0.054030583000000076, + "seconds": 0.06313762499999998, "ok": true, "error": null }, "7_event_study_plus_honest_did": { - "seconds": 0.10836029199999997, + "seconds": 0.06389345899999999, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json index 2ceed1ca..42535c3a 100644 --- a/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json +++ b/benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json @@ -2,47 +2,47 @@ "scenario": "brand_awareness_survey_medium", "backend": "rust", "has_rust_backend": true, - "total_seconds": 0.5500554579999999, + "total_seconds": 0.50248775, "memory": { "available": true, - "start_mb": 135.36, - "peak_mb": 184.86, - "growth_mb": 49.5, + "start_mb": 133.94, + "peak_mb": 189.34, + "growth_mb": 55.41, "sampler_interval_s": 0.01 }, "phases": { "1_naive_fit_no_survey_design": { - "seconds": 0.011186999999999947, + "seconds": 0.010962209, "ok": true, "error": null }, "2_tsl_strata_psu_fpc": { - "seconds": 0.03363270800000007, + "seconds": 0.03478112499999997, "ok": true, "error": null }, "3_replicate_weights_jk1": { - "seconds": 0.18678066699999996, + 
"seconds": 0.13834324999999992, "ok": true, "error": null }, "4_multi_outcome_loop_3_metrics": { - "seconds": 0.16038787500000007, + "seconds": 0.1290292500000001, "ok": true, "error": null }, "5_check_parallel_trends": { - "seconds": 0.022171542000000155, + "seconds": 0.02951112499999997, "ok": true, "error": null }, "6_placebo_refit_pre_period": { - "seconds": 0.0532650830000001, + "seconds": 0.06002304200000008, "ok": true, "error": null }, "7_event_study_plus_honest_did": { - "seconds": 0.08262075000000002, + "seconds": 0.09981400000000007, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json b/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json index 699da724..51e34058 100644 --- a/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json +++ b/benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json @@ -2,47 +2,47 @@ "scenario": "brand_awareness_survey_small", "backend": "python", "has_rust_backend": false, - "total_seconds": 0.19338629200000002, + "total_seconds": 0.22668149999999998, "memory": { "available": true, - "start_mb": 115.48, - "peak_mb": 127.31, - "growth_mb": 11.83, + "start_mb": 115.44, + "peak_mb": 130.16, + "growth_mb": 14.72, "sampler_interval_s": 0.01 }, "phases": { "1_naive_fit_no_survey_design": { - "seconds": 0.0014470410000000378, + "seconds": 0.00165958300000002, "ok": true, "error": null }, "2_tsl_strata_psu_fpc": { - "seconds": 0.0072707499999999925, + "seconds": 0.006191999999999975, "ok": true, "error": null }, "3_replicate_weights_jk1": { - "seconds": 0.023173292000000068, + "seconds": 0.02364570900000007, "ok": true, "error": null }, "4_multi_outcome_loop_3_metrics": { - "seconds": 0.03375529200000005, + "seconds": 0.07623400000000002, "ok": true, "error": null }, "5_check_parallel_trends": { - "seconds": 0.01041325000000004, + "seconds": 0.009393082999999969, "ok": true, "error": null }, 
"6_placebo_refit_pre_period": { - "seconds": 0.027520249999999913, + "seconds": 0.02586829199999996, "ok": true, "error": null }, "7_event_study_plus_honest_did": { - "seconds": 0.08979433299999995, + "seconds": 0.08367512499999996, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json b/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json index 006bc684..00cd03e8 100644 --- a/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json +++ b/benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json @@ -2,47 +2,47 @@ "scenario": "brand_awareness_survey_small", "backend": "rust", "has_rust_backend": true, - "total_seconds": 0.19669587500000008, + "total_seconds": 0.198891041, "memory": { "available": true, - "start_mb": 114.78, - "peak_mb": 127.91, - "growth_mb": 13.12, + "start_mb": 115.05, + "peak_mb": 127.78, + "growth_mb": 12.73, "sampler_interval_s": 0.01 }, "phases": { "1_naive_fit_no_survey_design": { - "seconds": 0.0016678749999999853, + "seconds": 0.0019442080000000583, "ok": true, "error": null }, "2_tsl_strata_psu_fpc": { - "seconds": 0.005756874999999995, + "seconds": 0.006045499999999926, "ok": true, "error": null }, "3_replicate_weights_jk1": { - "seconds": 0.012066042000000055, + "seconds": 0.02063908400000003, "ok": true, "error": null }, "4_multi_outcome_loop_3_metrics": { - "seconds": 0.05887395800000006, + "seconds": 0.05060483399999993, "ok": true, "error": null }, "5_check_parallel_trends": { - "seconds": 0.008938375000000054, + "seconds": 0.009498208000000008, "ok": true, "error": null }, "6_placebo_refit_pre_period": { - "seconds": 0.0274049999999999, + "seconds": 0.025947834000000003, "ok": true, "error": null }, "7_event_study_plus_honest_did": { - "seconds": 0.08197737500000002, + "seconds": 0.08419849999999995, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brfss_panel_large_python.json 
b/benchmarks/speed_review/baselines/brfss_panel_large_python.json index 1772355b..9437734c 100644 --- a/benchmarks/speed_review/baselines/brfss_panel_large_python.json +++ b/benchmarks/speed_review/baselines/brfss_panel_large_python.json @@ -2,42 +2,42 @@ "scenario": "brfss_panel_large", "backend": "python", "has_rust_backend": false, - "total_seconds": 24.406984582999996, + "total_seconds": 1.328024584, "memory": { "available": true, - "start_mb": 401.05, - "peak_mb": 418.12, - "growth_mb": 17.08, + "start_mb": 387.59, + "peak_mb": 412.75, + "growth_mb": 25.16, "sampler_interval_s": 0.01 }, "phases": { "1_aggregate_survey_microdata_to_panel": { - "seconds": 24.295822291, + "seconds": 1.2118086249999998, "ok": true, "error": null }, "2_cs_fit_with_stage2_survey_design": { - "seconds": 0.012265292000002148, + "seconds": 0.012898916999999788, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 2.2919999977943917e-06, + "seconds": 2.5409999997449972e-06, "ok": true, "error": null }, "4_honest_did_grid": { - "seconds": 0.0016812089999973523, + "seconds": 0.0018360419999998712, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.09669395799999592, + "seconds": 0.10123833299999996, "ok": true, "error": null }, "6_practitioner_next_steps": { - "seconds": 0.0005083750000025589, + "seconds": 0.00022966599999962867, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brfss_panel_large_rust.json b/benchmarks/speed_review/baselines/brfss_panel_large_rust.json index 886c63cc..338bfe61 100644 --- a/benchmarks/speed_review/baselines/brfss_panel_large_rust.json +++ b/benchmarks/speed_review/baselines/brfss_panel_large_rust.json @@ -2,42 +2,42 @@ "scenario": "brfss_panel_large", "backend": "rust", "has_rust_backend": true, - "total_seconds": 24.936181916, + "total_seconds": 1.31504775, "memory": { "available": true, - "start_mb": 396.06, - "peak_mb": 429.31, - "growth_mb": 33.25, + "start_mb": 384.2, + "peak_mb": 409.28, + 
"growth_mb": 25.08, "sampler_interval_s": 0.01 }, "phases": { "1_aggregate_survey_microdata_to_panel": { - "seconds": 24.820139083, + "seconds": 1.2451636250000002, "ok": true, "error": null }, "2_cs_fit_with_stage2_survey_design": { - "seconds": 0.012674374999996019, + "seconds": 0.013531541999999952, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 2.500000000793534e-06, + "seconds": 2.916000000130481e-06, "ok": true, "error": null }, "4_honest_did_grid": { - "seconds": 0.0015977500000019518, + "seconds": 0.001939415999999916, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.10144270800000044, + "seconds": 0.054231499999999766, "ok": true, "error": null }, "6_practitioner_next_steps": { - "seconds": 0.00030387500000017553, + "seconds": 0.0001666249999998648, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brfss_panel_medium_python.json b/benchmarks/speed_review/baselines/brfss_panel_medium_python.json index 91e5e648..ea65bf9d 100644 --- a/benchmarks/speed_review/baselines/brfss_panel_medium_python.json +++ b/benchmarks/speed_review/baselines/brfss_panel_medium_python.json @@ -2,42 +2,42 @@ "scenario": "brfss_panel_medium", "backend": "python", "has_rust_backend": false, - "total_seconds": 6.096216417, + "total_seconds": 0.48709708400000007, "memory": { "available": true, - "start_mb": 193.25, - "peak_mb": 209.78, - "growth_mb": 16.53, + "start_mb": 185.42, + "peak_mb": 202.75, + "growth_mb": 17.33, "sampler_interval_s": 0.01 }, "phases": { "1_aggregate_survey_microdata_to_panel": { - "seconds": 5.9895347910000005, + "seconds": 0.372203458, "ok": true, "error": null }, "2_cs_fit_with_stage2_survey_design": { - "seconds": 0.012643416999999602, + "seconds": 0.01215470800000018, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 2.166999999886343e-06, + "seconds": 2.5000000001274003e-06, "ok": true, "error": null }, "4_honest_did_grid": { - "seconds": 0.0015969160000004479, + 
"seconds": 0.0016202499999999898, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.0921533340000007, + "seconds": 0.10084249999999995, "ok": true, "error": null }, "6_practitioner_next_steps": { - "seconds": 0.0002710829999994502, + "seconds": 0.000269875000000086, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json b/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json index 670b3135..7876dd32 100644 --- a/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json +++ b/benchmarks/speed_review/baselines/brfss_panel_medium_rust.json @@ -2,42 +2,42 @@ "scenario": "brfss_panel_medium", "backend": "rust", "has_rust_backend": true, - "total_seconds": 6.228102207999999, + "total_seconds": 0.472971041, "memory": { "available": true, - "start_mb": 197.56, - "peak_mb": 212.22, - "growth_mb": 14.66, + "start_mb": 178.69, + "peak_mb": 199.55, + "growth_mb": 20.86, "sampler_interval_s": 0.01 }, "phases": { "1_aggregate_survey_microdata_to_panel": { - "seconds": 6.142273, + "seconds": 0.4003294999999999, "ok": true, "error": null }, "2_cs_fit_with_stage2_survey_design": { - "seconds": 0.012037416000000078, + "seconds": 0.0133387920000001, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 2.1249999999639613e-06, + "seconds": 2.4999999999053557e-06, "ok": true, "error": null }, "4_honest_did_grid": { - "seconds": 0.0016153329999983868, + "seconds": 0.0020148749999999715, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.07184195800000026, + "seconds": 0.057244916000000146, "ok": true, "error": null }, "6_practitioner_next_steps": { - "seconds": 0.0003229160000000064, + "seconds": 3.6416000000150106e-05, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brfss_panel_small_python.json b/benchmarks/speed_review/baselines/brfss_panel_small_python.json index 093a7daf..127748c2 100644 --- 
a/benchmarks/speed_review/baselines/brfss_panel_small_python.json +++ b/benchmarks/speed_review/baselines/brfss_panel_small_python.json @@ -2,42 +2,42 @@ "scenario": "brfss_panel_small", "backend": "python", "has_rust_backend": false, - "total_seconds": 1.608562042, + "total_seconds": 0.21261929199999996, "memory": { "available": true, - "start_mb": 121.97, - "peak_mb": 133.39, - "growth_mb": 11.42, + "start_mb": 121.34, + "peak_mb": 132.62, + "growth_mb": 11.28, "sampler_interval_s": 0.01 }, "phases": { "1_aggregate_survey_microdata_to_panel": { - "seconds": 1.523675458, + "seconds": 0.08785816700000004, "ok": true, "error": null }, "2_cs_fit_with_stage2_survey_design": { - "seconds": 0.015124000000000137, + "seconds": 0.016040416999999918, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 2.165999999803603e-06, + "seconds": 2.583000000000446e-06, "ok": true, "error": null }, "4_honest_did_grid": { - "seconds": 0.004194041999999953, + "seconds": 0.004216333999999988, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.0653021250000001, + "seconds": 0.10422679200000007, "ok": true, "error": null }, "6_practitioner_next_steps": { - "seconds": 0.00026012500000005545, + "seconds": 0.00026649999999994733, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/brfss_panel_small_rust.json b/benchmarks/speed_review/baselines/brfss_panel_small_rust.json index a1f19a21..a22692ca 100644 --- a/benchmarks/speed_review/baselines/brfss_panel_small_rust.json +++ b/benchmarks/speed_review/baselines/brfss_panel_small_rust.json @@ -2,42 +2,42 @@ "scenario": "brfss_panel_small", "backend": "rust", "has_rust_backend": true, - "total_seconds": 1.6610665, + "total_seconds": 0.16585016600000002, "memory": { "available": true, - "start_mb": 121.16, - "peak_mb": 136.44, - "growth_mb": 15.28, + "start_mb": 121.91, + "peak_mb": 130.25, + "growth_mb": 8.34, "sampler_interval_s": 0.01 }, "phases": { 
"1_aggregate_survey_microdata_to_panel": { - "seconds": 1.5438897920000003, + "seconds": 0.084868791, "ok": true, "error": null }, "2_cs_fit_with_stage2_survey_design": { - "seconds": 0.01586162499999988, + "seconds": 0.016418874999999944, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 2.4999999999053557e-06, + "seconds": 3.124999999992717e-06, "ok": true, "error": null }, "4_honest_did_grid": { - "seconds": 0.003953542000000088, + "seconds": 0.004238000000000075, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.09701791599999998, + "seconds": 0.060278041000000004, "ok": true, "error": null }, "6_practitioner_next_steps": { - "seconds": 0.00032904199999972406, + "seconds": 3.820799999998403e-05, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/campaign_staggered_large_python.json b/benchmarks/speed_review/baselines/campaign_staggered_large_python.json index 0c2dc359..19bf1a59 100644 --- a/benchmarks/speed_review/baselines/campaign_staggered_large_python.json +++ b/benchmarks/speed_review/baselines/campaign_staggered_large_python.json @@ -2,52 +2,52 @@ "scenario": "campaign_staggered_large", "backend": "python", "has_rust_backend": false, - "total_seconds": 1.3326843750000001, + "total_seconds": 1.321951625, "memory": { "available": true, - "start_mb": 227.28, - "peak_mb": 472.22, - "growth_mb": 244.94, + "start_mb": 235.58, + "peak_mb": 486.17, + "growth_mb": 250.59, "sampler_interval_s": 0.01 }, "phases": { "1_bacon_decomposition": { - "seconds": 0.019139459000000025, + "seconds": 0.019820957999999944, "ok": true, "error": null }, "2_cs_fit_with_covariates_bootstrap999": { - "seconds": 0.16680450000000002, + "seconds": 0.17604354199999994, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 3.042000000341716e-06, + "seconds": 3.4580000001227518e-06, "ok": true, "error": null }, "4_honest_did_M_grid": { - "seconds": 0.002607332999999823, + "seconds": 0.002394666999999906, "ok": 
true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.3669262500000001, + "seconds": 0.279372666, "ok": true, "error": null }, "6_imputation_did_robustness": { - "seconds": 0.649511, + "seconds": 0.716293292, "ok": true, "error": null }, "7_cs_without_covariates": { - "seconds": 0.12763954200000027, + "seconds": 0.12797208299999996, "ok": true, "error": null }, "8_practitioner_next_steps": { - "seconds": 4.033299999983697e-05, + "seconds": 3.8041999999904874e-05, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json b/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json index 6766f7ac..87200f59 100644 --- a/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json +++ b/benchmarks/speed_review/baselines/campaign_staggered_large_rust.json @@ -2,52 +2,52 @@ "scenario": "campaign_staggered_large", "backend": "rust", "has_rust_backend": true, - "total_seconds": 1.3826507919999997, + "total_seconds": 1.310933833, "memory": { "available": true, - "start_mb": 265.8, - "peak_mb": 587.92, - "growth_mb": 322.12, + "start_mb": 254.7, + "peak_mb": 581.67, + "growth_mb": 326.97, "sampler_interval_s": 0.01 }, "phases": { "1_bacon_decomposition": { - "seconds": 0.019430332999999855, + "seconds": 0.01872620799999991, "ok": true, "error": null }, "2_cs_fit_with_covariates_bootstrap999": { - "seconds": 0.17791104199999985, + "seconds": 0.1628326659999999, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 3.5419999999675156e-06, + "seconds": 3.459000000205492e-06, "ok": true, "error": null }, "4_honest_did_M_grid": { - "seconds": 0.0025778330000001404, + "seconds": 0.00247950000000019, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.5076542499999999, + "seconds": 0.4679546669999999, "ok": true, "error": null }, "6_imputation_did_robustness": { - "seconds": 0.5523530000000001, + "seconds": 0.539718041, "ok": true, "error": null }, 
"7_cs_without_covariates": { - "seconds": 0.12266958400000005, + "seconds": 0.1191795830000002, "ok": true, "error": null }, "8_practitioner_next_steps": { - "seconds": 4.233299999967244e-05, + "seconds": 3.449999999993736e-05, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json b/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json index 914a09aa..234f2918 100644 --- a/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json +++ b/benchmarks/speed_review/baselines/campaign_staggered_medium_python.json @@ -2,52 +2,52 @@ "scenario": "campaign_staggered_medium", "backend": "python", "has_rust_backend": false, - "total_seconds": 0.7537883749999998, + "total_seconds": 0.81063825, "memory": { "available": true, - "start_mb": 147.67, - "peak_mb": 226.62, - "growth_mb": 78.95, + "start_mb": 150.39, + "peak_mb": 235.06, + "growth_mb": 84.67, "sampler_interval_s": 0.01 }, "phases": { "1_bacon_decomposition": { - "seconds": 0.012091666999999973, + "seconds": 0.013887540999999892, "ok": true, "error": null }, "2_cs_fit_with_covariates_bootstrap999": { - "seconds": 0.09575774999999997, + "seconds": 0.10513504099999982, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 2.9589999999135586e-06, + "seconds": 3.750000000080078e-06, "ok": true, "error": null }, "4_honest_did_M_grid": { - "seconds": 0.002356958999999881, + "seconds": 0.0026329160000000407, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.276134208, + "seconds": 0.2873527090000001, "ok": true, "error": null }, "6_imputation_did_robustness": { - "seconds": 0.2946765, + "seconds": 0.3267266660000001, "ok": true, "error": null }, "7_cs_without_covariates": { - "seconds": 0.07270195899999998, + "seconds": 0.07484287499999986, "ok": true, "error": null }, "8_practitioner_next_steps": { - "seconds": 5.983399999998085e-05, + "seconds": 5.050000000039745e-05, "ok": true, "error": null } diff 
--git a/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json b/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json index 81c02255..55107bbb 100644 --- a/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json +++ b/benchmarks/speed_review/baselines/campaign_staggered_medium_rust.json @@ -2,52 +2,52 @@ "scenario": "campaign_staggered_medium", "backend": "rust", "has_rust_backend": true, - "total_seconds": 0.756008333, + "total_seconds": 0.814152875, "memory": { "available": true, - "start_mb": 154.94, - "peak_mb": 254.11, - "growth_mb": 99.17, + "start_mb": 152.19, + "peak_mb": 252.59, + "growth_mb": 100.41, "sampler_interval_s": 0.01 }, "phases": { "1_bacon_decomposition": { - "seconds": 0.012925999999999993, + "seconds": 0.012288542000000069, "ok": true, "error": null }, "2_cs_fit_with_covariates_bootstrap999": { - "seconds": 0.09863954099999983, + "seconds": 0.09617150000000008, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 3.1659999999433808e-06, + "seconds": 3.084000000042053e-06, "ok": true, "error": null }, "4_honest_did_M_grid": { - "seconds": 0.0024457499999999133, + "seconds": 0.002409292000000063, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.281516125, + "seconds": 0.4186234579999999, "ok": true, "error": null }, "6_imputation_did_robustness": { - "seconds": 0.29128733399999995, + "seconds": 0.217003375, "ok": true, "error": null }, "7_cs_without_covariates": { - "seconds": 0.06915141700000005, + "seconds": 0.06760054199999987, "ok": true, "error": null }, "8_practitioner_next_steps": { - "seconds": 3.383300000003864e-05, + "seconds": 4.71669999999591e-05, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/campaign_staggered_small_python.json b/benchmarks/speed_review/baselines/campaign_staggered_small_python.json index 44e82483..7fe1a2ac 100644 --- a/benchmarks/speed_review/baselines/campaign_staggered_small_python.json +++ 
b/benchmarks/speed_review/baselines/campaign_staggered_small_python.json @@ -2,52 +2,52 @@ "scenario": "campaign_staggered_small", "backend": "python", "has_rust_backend": false, - "total_seconds": 0.509287875, + "total_seconds": 0.5199064999999999, "memory": { "available": true, - "start_mb": 114.72, - "peak_mb": 143.08, - "growth_mb": 28.36, + "start_mb": 114.66, + "peak_mb": 145.62, + "growth_mb": 30.97, "sampler_interval_s": 0.01 }, "phases": { "1_bacon_decomposition": { - "seconds": 0.008488708000000011, + "seconds": 0.006750833000000012, "ok": true, "error": null }, "2_cs_fit_with_covariates_bootstrap999": { - "seconds": 0.06242541699999993, + "seconds": 0.06804841700000008, "ok": true, "error": null }, "3_inspect_pretrends": { - "seconds": 3.3329999999942572e-06, + "seconds": 4.1669999999438545e-06, "ok": true, "error": null }, "4_honest_did_M_grid": { - "seconds": 0.00873587500000006, + "seconds": 0.005387375000000083, "ok": true, "error": null }, "5_sun_abraham_robustness": { - "seconds": 0.18465104099999996, + "seconds": 0.17906933400000002, "ok": true, "error": null }, "6_imputation_did_robustness": { - "seconds": 0.20897954100000016, + "seconds": 0.22210808299999996, "ok": true, "error": null }, "7_cs_without_covariates": { - "seconds": 0.03596216600000002, + "seconds": 0.038495792000000195, "ok": true, "error": null }, "8_practitioner_next_steps": { - "seconds": 3.28339999999816e-05, + "seconds": 3.6332999999943993e-05, "ok": true, "error": null } diff --git a/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json b/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json index bfe53aed..edeb195e 100644 --- a/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json +++ b/benchmarks/speed_review/baselines/campaign_staggered_small_rust.json @@ -2,52 +2,52 @@ "scenario": "campaign_staggered_small", "backend": "rust", "has_rust_backend": true, - "total_seconds": 0.501876834, + "total_seconds": 0.5057707079999999, 
   "memory": {
     "available": true,
-    "start_mb": 114.78,
-    "peak_mb": 150.67,
-    "growth_mb": 35.89,
+    "start_mb": 114.27,
+    "peak_mb": 148.09,
+    "growth_mb": 33.83,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_bacon_decomposition": {
-      "seconds": 0.0068224170000000806,
+      "seconds": 0.007045167000000019,
       "ok": true,
       "error": null
     },
     "2_cs_fit_with_covariates_bootstrap999": {
-      "seconds": 0.06276566699999997,
+      "seconds": 0.06206424999999993,
       "ok": true,
       "error": null
     },
     "3_inspect_pretrends": {
-      "seconds": 2.9160000000194586e-06,
+      "seconds": 2.6250000000338503e-06,
       "ok": true,
       "error": null
     },
     "4_honest_did_M_grid": {
-      "seconds": 0.004543957999999959,
+      "seconds": 0.004464875000000035,
       "ok": true,
       "error": null
     },
     "5_sun_abraham_robustness": {
-      "seconds": 0.14964783299999995,
+      "seconds": 0.19407279099999997,
       "ok": true,
       "error": null
     },
     "6_imputation_did_robustness": {
-      "seconds": 0.241357292,
+      "seconds": 0.2018087919999999,
       "ok": true,
       "error": null
     },
     "7_cs_without_covariates": {
-      "seconds": 0.03669304200000001,
+      "seconds": 0.03626620899999988,
       "ok": true,
       "error": null
     },
     "8_practitioner_next_steps": {
-      "seconds": 3.850000000005238e-05,
+      "seconds": 4.0457999999965466e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/dose_response_python.json b/benchmarks/speed_review/baselines/dose_response_python.json
index 0e576e88..40399067 100644
--- a/benchmarks/speed_review/baselines/dose_response_python.json
+++ b/benchmarks/speed_review/baselines/dose_response_python.json
@@ -2,42 +2,42 @@
   "scenario": "dose_response",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.5912168340000001,
+  "total_seconds": 0.5858542499999999,
   "memory": {
     "available": true,
-    "start_mb": 114.11,
-    "peak_mb": 123.11,
-    "growth_mb": 9.0,
+    "start_mb": 114.7,
+    "peak_mb": 122.31,
+    "growth_mb": 7.61,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_cdid_cubic_spline_bootstrap199": {
-      "seconds": 0.15039274999999996,
+      "seconds": 0.15196441700000007,
       "ok": true,
       "error": null
     },
     "2_extract_dose_response_dataframes": {
-      "seconds": 0.0007435829999999921,
+      "seconds": 0.0008212909999999463,
       "ok": true,
       "error": null
     },
     "3_cdid_event_study_pretrend": {
-      "seconds": 0.14597749999999998,
+      "seconds": 0.14416820900000005,
       "ok": true,
       "error": null
     },
     "4_binarized_did_comparison": {
-      "seconds": 0.0017279590000000011,
+      "seconds": 0.0015125420000000611,
       "ok": true,
       "error": null
     },
     "5_spline_sensitivity_degree1": {
-      "seconds": 0.14600595799999994,
+      "seconds": 0.1431360410000001,
       "ok": true,
       "error": null
     },
     "6_spline_sensitivity_num_knots2": {
-      "seconds": 0.14636520799999997,
+      "seconds": 0.14424499999999996,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/dose_response_rust.json b/benchmarks/speed_review/baselines/dose_response_rust.json
index 51039f15..2c26010b 100644
--- a/benchmarks/speed_review/baselines/dose_response_rust.json
+++ b/benchmarks/speed_review/baselines/dose_response_rust.json
@@ -2,42 +2,42 @@
   "scenario": "dose_response",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.5952834579999999,
+  "total_seconds": 0.6261942910000001,
   "memory": {
     "available": true,
-    "start_mb": 113.73,
-    "peak_mb": 121.34,
-    "growth_mb": 7.61,
+    "start_mb": 113.95,
+    "peak_mb": 123.27,
+    "growth_mb": 9.31,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_cdid_cubic_spline_bootstrap199": {
-      "seconds": 0.15132816700000007,
+      "seconds": 0.1623119999999999,
       "ok": true,
       "error": null
     },
     "2_extract_dose_response_dataframes": {
-      "seconds": 0.0007386659999999434,
+      "seconds": 0.0007812500000000666,
       "ok": true,
       "error": null
     },
     "3_cdid_event_study_pretrend": {
-      "seconds": 0.147476167,
+      "seconds": 0.15469937500000008,
       "ok": true,
       "error": null
     },
     "4_binarized_did_comparison": {
-      "seconds": 0.001677958000000035,
+      "seconds": 0.001991167000000016,
       "ok": true,
       "error": null
     },
     "5_spline_sensitivity_degree1": {
-      "seconds": 0.145152917,
+      "seconds": 0.15138845899999998,
       "ok": true,
       "error": null
     },
     "6_spline_sensitivity_num_knots2": {
-      "seconds": 0.14890500000000007,
+      "seconds": 0.15501741599999996,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json b/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json
index dce42749..637c260d 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_large_rust.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_large",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.26079429200000015,
+  "total_seconds": 0.23366233300000006,
   "memory": {
     "available": true,
-    "start_mb": 117.8,
-    "peak_mb": 118.22,
-    "growth_mb": 0.42,
+    "start_mb": 117.77,
+    "peak_mb": 118.11,
+    "growth_mb": 0.34,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.04102845799999999,
+      "seconds": 0.03807345899999992,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.03718729200000004,
+      "seconds": 0.03627791699999994,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 0.07744412499999997,
+      "seconds": 0.06991887500000005,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0008073330000000212,
+      "seconds": 0.0007567080000000503,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 0.10429091600000007,
+      "seconds": 0.08854208299999988,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 3.220799999992252e-05,
+      "seconds": 8.5874999999902e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json b/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json
index 868c0578..283552a2 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_medium_python.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_medium",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 3.9883142080000002,
+  "total_seconds": 3.998488124999999,
   "memory": {
     "available": true,
-    "start_mb": 143.86,
-    "peak_mb": 151.53,
-    "growth_mb": 7.67,
+    "start_mb": 140.11,
+    "peak_mb": 148.12,
+    "growth_mb": 8.02,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.35804470799999955,
+      "seconds": 0.35502641700000037,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.36447529099999976,
+      "seconds": 0.36030566600000036,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 1.5563965419999999,
+      "seconds": 1.5716015000000008,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0007229159999999624,
+      "seconds": 0.0007380409999999671,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 1.7086395420000002,
+      "seconds": 1.7107877500000006,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 2.9666999999733434e-05,
+      "seconds": 2.462500000000034e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json b/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json
index bd4471a6..debdccf6 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_medium_rust.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_medium",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.118741875,
+  "total_seconds": 0.10621941700000004,
   "memory": {
     "available": true,
-    "start_mb": 117.23,
-    "peak_mb": 117.64,
-    "growth_mb": 0.41,
+    "start_mb": 117.05,
+    "peak_mb": 117.36,
+    "growth_mb": 0.31,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.020535375000000022,
+      "seconds": 0.018085625000000105,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.023519291000000053,
+      "seconds": 0.020790666999999985,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 0.02495891699999997,
+      "seconds": 0.025967375000000015,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0006400839999999297,
+      "seconds": 0.0006781249999999739,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 0.049061250000000056,
+      "seconds": 0.04067133299999992,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 2.31669999999351e-05,
+      "seconds": 2.2332999999985503e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_small_python.json b/benchmarks/speed_review/baselines/geo_few_markets_small_python.json
index e0bec083..ed7af335 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_small_python.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_small_python.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_small",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 3.697791375,
+  "total_seconds": 3.7007011660000004,
   "memory": {
     "available": true,
-    "start_mb": 114.09,
-    "peak_mb": 124.02,
-    "growth_mb": 9.92,
+    "start_mb": 114.14,
+    "peak_mb": 124.05,
+    "growth_mb": 9.91,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.593809709,
+      "seconds": 0.5908792500000001,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.584832209,
+      "seconds": 0.593548083,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 1.194314458,
+      "seconds": 1.1894560410000001,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0009036250000002966,
+      "seconds": 0.001243833000000194,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 1.3238487909999996,
+      "seconds": 1.3254739579999995,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 7.791699999959434e-05,
+      "seconds": 9.341699999954045e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json b/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json
index 855eac85..91f9888d 100644
--- a/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json
+++ b/benchmarks/speed_review/baselines/geo_few_markets_small_rust.json
@@ -2,42 +2,42 @@
   "scenario": "geo_few_markets_small",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.04129770799999999,
+  "total_seconds": 0.04177825000000002,
   "memory": {
     "available": true,
-    "start_mb": 114.56,
-    "peak_mb": 116.05,
-    "growth_mb": 1.48,
+    "start_mb": 114.55,
+    "peak_mb": 115.84,
+    "growth_mb": 1.3,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_sdid_jackknife_variance": {
-      "seconds": 0.008074541000000046,
+      "seconds": 0.008172167000000008,
       "ok": true,
       "error": null
     },
     "2_sdid_bootstrap_variance_200": {
-      "seconds": 0.012903124999999904,
+      "seconds": 0.013141583000000012,
       "ok": true,
       "error": null
     },
     "3_in_time_placebo": {
-      "seconds": 0.008189833999999951,
+      "seconds": 0.00833604099999996,
       "ok": true,
       "error": null
     },
     "4_get_loo_effects_df": {
-      "seconds": 0.0009220420000000118,
+      "seconds": 0.0008852080000000262,
       "ok": true,
       "error": null
     },
     "5_sensitivity_to_zeta_omega": {
-      "seconds": 0.01117779200000002,
+      "seconds": 0.011213916999999962,
       "ok": true,
       "error": null
     },
     "6_weight_concentration": {
-      "seconds": 2.6250000000005436e-05,
+      "seconds": 2.599999999997049e-05,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/reversible_dcdh_python.json b/benchmarks/speed_review/baselines/reversible_dcdh_python.json
index 1cbed394..fff45fd6 100644
--- a/benchmarks/speed_review/baselines/reversible_dcdh_python.json
+++ b/benchmarks/speed_review/baselines/reversible_dcdh_python.json
@@ -2,32 +2,32 @@
   "scenario": "reversible_dcdh",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.718732833,
+  "total_seconds": 0.788816875,
   "memory": {
     "available": true,
-    "start_mb": 113.5,
-    "peak_mb": 135.02,
-    "growth_mb": 21.52,
+    "start_mb": 113.75,
+    "peak_mb": 133.66,
+    "growth_mb": 19.91,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_dcdh_fit_Lmax3_survey_TSL": {
-      "seconds": 0.3450735829999999,
+      "seconds": 0.384559958,
       "ok": true,
       "error": null
     },
     "2_inspect_placebo_and_summary": {
-      "seconds": 1.4160000000318362e-06,
+      "seconds": 1.3329999999367459e-06,
       "ok": true,
       "error": null
     },
     "3_honest_did_on_placebo": {
-      "seconds": 0.004985583999999932,
+      "seconds": 0.003932208000000048,
       "ok": true,
       "error": null
     },
     "4_heterogeneity_refit": {
-      "seconds": 0.36866958299999986,
+      "seconds": 0.400320667,
       "ok": true,
       "error": null
     }
diff --git a/benchmarks/speed_review/baselines/reversible_dcdh_rust.json b/benchmarks/speed_review/baselines/reversible_dcdh_rust.json
index 2af530f5..0c073cd6 100644
--- a/benchmarks/speed_review/baselines/reversible_dcdh_rust.json
+++ b/benchmarks/speed_review/baselines/reversible_dcdh_rust.json
@@ -2,32 +2,32 @@
   "scenario": "reversible_dcdh",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.751090292,
+  "total_seconds": 0.7799259999999999,
   "memory": {
     "available": true,
-    "start_mb": 113.7,
-    "peak_mb": 134.89,
-    "growth_mb": 21.19,
+    "start_mb": 113.81,
+    "peak_mb": 134.28,
+    "growth_mb": 20.47,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_dcdh_fit_Lmax3_survey_TSL": {
-      "seconds": 0.36838229199999994,
+      "seconds": 0.38806558299999994,
       "ok": true,
       "error": null
     },
     "2_inspect_placebo_and_summary": {
-      "seconds": 1.3340000000194863e-06,
+      "seconds": 1.4580000000652404e-06,
       "ok": true,
       "error": null
     },
     "3_honest_did_on_placebo": {
-      "seconds": 0.005142916999999914,
+      "seconds": 0.003724375000000002,
       "ok": true,
       "error": null
     },
     "4_heterogeneity_refit": {
-      "seconds": 0.3775615830000001,
+      "seconds": 0.38813170900000005,
       "ok": true,
       "error": null
     }
diff --git a/diff_diff/prep.py b/diff_diff/prep.py
index 01d50653..70201144 100644
--- a/diff_diff/prep.py
+++ b/diff_diff/prep.py
@@ -30,6 +30,9 @@ from diff_diff.survey import (
     ResolvedSurveyDesign,
     SurveyDesign,
+    _compute_if_variance_fast,
+    _precompute_psu_scaffolding,
+    _PsuScaffolding,
     compute_replicate_if_variance,
     compute_survey_if_variance,
 )
@@ -1318,6 +1321,7 @@ def _cell_mean_variance(
     full_resolved: ResolvedSurveyDesign,
     cell_mask: np.ndarray,
     min_n: int,
+    scaffolding: Optional[_PsuScaffolding] = None,
 ) -> Tuple[float, float, int, bool]:
     """Compute design-based mean and variance of the weighted mean for one cell.
@@ -1396,9 +1400,14 @@ def _cell_mean_variance(
     valid_positions = cell_indices[valid]
     psi[valid_positions] = w_valid[valid] * (y_clean[valid] - y_bar) / sum_w

-    # Route to TSL or replicate variance using the full design
+    # Route to TSL or replicate variance using the full design. When a
+    # design-level scaffolding is provided (aggregate_survey's fast path),
+    # use it to skip the per-call pandas groupby / np.unique setup that
+    # otherwise dominates runtime at BRFSS scale.
     if full_resolved.uses_replicate_variance:
         variance, _ = compute_replicate_if_variance(psi, full_resolved)
+    elif scaffolding is not None:
+        variance = _compute_if_variance_fast(psi, scaffolding)
     else:
         variance = compute_survey_if_variance(psi, full_resolved)
@@ -1580,6 +1589,17 @@ def aggregate_survey(
     )
     full_resolved = effective_design.resolve(data)

+    # Precompute stratum/PSU scaffolding once per design. Amortizes
+    # per-cell pandas groupby + np.unique + stratum FPC lookup that
+    # otherwise dominate runtime at scale (see _compute_if_variance_fast).
+    # Replicate-weight designs use a different variance surface and stay
+    # on the legacy path.
+    _tsl_scaffolding: Optional[_PsuScaffolding] = (
+        _precompute_psu_scaffolding(full_resolved)
+        if not full_resolved.uses_replicate_variance
+        else None
+    )
+
     # --- Precompute full-length outcome/covariate arrays ---
     n_total = len(data)
     all_vars = outcome_cols + cov_cols
@@ -1635,6 +1655,7 @@ def aggregate_survey(
                 full_resolved,
                 cell_mask,
                 min_n,
+                scaffolding=_tsl_scaffolding,
             )
             se = float(np.sqrt(variance)) if not np.isnan(variance) else np.nan
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index 2ee8334e..3d951fb6 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -1304,6 +1304,326 @@ def _compute_stratified_psu_meat(
     return meat, _variance_computed, legitimate_zero_count


+@dataclass(frozen=True)
+class _PsuScaffolding:
+    """Precomputed stratum/PSU layout for amortized TSL variance.
+
+    Internal helper used by :func:`diff_diff.prep.aggregate_survey` to reuse
+    design-dependent scaffolding across hundreds of per-cell variance calls.
+    Holds integer codes, per-stratum counts, FPC ratios, and static
+    variance-computability flags that depend only on the
+    :class:`ResolvedSurveyDesign` (not on the psi / outcome being collapsed).
+
+    See :func:`_compute_if_variance_fast` for the fast variance path that
+    consumes this scaffolding. Numerically equivalent to
+    :func:`compute_survey_if_variance` up to floating-point
+    reduction-order drift.
+    """
+
+    mode: str  # "no_strata_no_psu" | "psu_only" | "stratified"
+    n: int
+    lonely_psu: str
+    variance_computable: bool
+    legitimate_zero_count: int
+    # stratified-mode fields (None in other modes):
+    psu_codes: Optional[np.ndarray] = None  # (n,) int, global PSU id 0..P-1
+    psu_stratum: Optional[np.ndarray] = None  # (P,) int, stratum of each PSU
+    n_psu_per_stratum: Optional[np.ndarray] = None  # (S,) int
+    singleton_strata: Optional[np.ndarray] = None  # (S,) bool
+    adjustment_h: Optional[np.ndarray] = None  # (S,) float, (1-f_h)*n_h/(n_h-1); 0 for singletons
+    # psu_only-mode fields (None in other modes):
+    psu_codes_only: Optional[np.ndarray] = None  # (n,) int, PSU id 0..P-1
+    n_psu_only: Optional[int] = None
+    adjustment_only: Optional[float] = None  # (1-f)*n_psu/(n_psu-1) or 0
+    # no_strata_no_psu-mode fields (None in other modes):
+    adjustment_direct: Optional[float] = None  # (1-f)*n/(n-1) or 0
+
+
+def _precompute_psu_scaffolding(resolved: "ResolvedSurveyDesign") -> _PsuScaffolding:
+    """Precompute per-design PSU/stratum scaffolding for fast per-cell variance.
+
+    Equivalent in effect to the per-call scaffolding work inside
+    :func:`_compute_stratified_psu_meat`, but done once per design instead of
+    once per output cell. For the typical BRFSS-scale
+    :func:`~diff_diff.prep.aggregate_survey` workload (~500 cells, ~20 strata),
+    this amortizes the pandas-groupby + ``np.unique`` setup that otherwise
+    dominates the chain runtime.
+
+    Parameters
+    ----------
+    resolved : ResolvedSurveyDesign
+        Resolved survey design. Must NOT use replicate variance
+        (``resolved.uses_replicate_variance`` False).
+
+    Returns
+    -------
+    _PsuScaffolding
+        Frozen dataclass with mode-appropriate precomputed fields.
+
+    Raises
+    ------
+    ValueError
+        Same FPC-vs-n guards as :func:`_compute_stratified_psu_meat`
+        (FPC must be >= effective PSU count in each stratum).
+    """
+    weights = resolved.weights
+    n = int(len(weights))
+    strata = resolved.strata
+    psu = resolved.psu
+    fpc = resolved.fpc
+    lonely_psu = resolved.lonely_psu
+
+    if strata is None and psu is None:
+        # Implicit per-observation PSUs
+        f = 0.0
+        lz_count = 0
+        if fpc is not None:
+            N = fpc[0]
+            if N < n:
+                raise ValueError(
+                    f"FPC ({N}) is less than the number of observations "
+                    f"({n}). FPC must be >= n_obs for implicit per-observation PSUs."
+                )
+            f = n / N
+            if f >= 1.0:
+                lz_count = 1
+        var_computable = n >= 2
+        adjustment = (1.0 - f) * (n / (n - 1)) if n >= 2 else 0.0
+        return _PsuScaffolding(
+            mode="no_strata_no_psu",
+            n=n,
+            lonely_psu=lonely_psu,
+            variance_computable=var_computable,
+            legitimate_zero_count=lz_count,
+            adjustment_direct=float(adjustment),
+        )
+
+    if strata is None and psu is not None:
+        # Single-stratum cluster-robust
+        psu_arr = np.asarray(psu)
+        codes, uniques = pd.factorize(psu_arr)
+        n_psu = int(len(uniques))
+        f = 0.0
+        lz_count = 0
+        if n_psu >= 2:
+            if fpc is not None:
+                N = fpc[0]
+                if N < n_psu:
+                    raise ValueError(
+                        f"FPC ({N}) is less than the number of effective PSUs "
+                        f"({n_psu}). FPC must be >= n_PSU."
+                    )
+                f = n_psu / N
+                if f >= 1.0:
+                    lz_count = 1
+            adjustment = (1.0 - f) * (n_psu / (n_psu - 1))
+            var_computable = True
+        else:
+            adjustment = 0.0
+            var_computable = False
+        return _PsuScaffolding(
+            mode="psu_only",
+            n=n,
+            lonely_psu=lonely_psu,
+            variance_computable=var_computable,
+            legitimate_zero_count=lz_count,
+            psu_codes_only=codes.astype(np.int64),
+            n_psu_only=n_psu,
+            adjustment_only=float(adjustment),
+        )
+
+    # Stratified branch (with or without PSU)
+    strata_arr = np.asarray(strata)
+    strata_codes, strata_uniques = pd.factorize(strata_arr, sort=True)
+    strata_codes = strata_codes.astype(np.int64)
+    S = int(len(strata_uniques))
+
+    if psu is not None:
+        # Global PSU codes unique across (stratum, psu) pairs — matches the
+        # legacy per-stratum pandas groupby which never aggregated PSU labels
+        # across strata.
+        psu_arr = np.asarray(psu)
+        psu_local_codes, _ = pd.factorize(psu_arr)
+        psu_local_codes = psu_local_codes.astype(np.int64)
+        psu_local_max = int(psu_local_codes.max()) if len(psu_local_codes) > 0 else 0
+        compound = strata_codes * (psu_local_max + 1) + psu_local_codes
+        psu_codes, _ = pd.factorize(compound)
+        psu_codes = psu_codes.astype(np.int64)
+        P = int(psu_codes.max() + 1) if len(psu_codes) > 0 else 0
+        psu_stratum = np.zeros(P, dtype=np.int64)
+        # Safe scatter: by construction, all observations sharing a global
+        # PSU code share a stratum, so repeated writes to the same position
+        # store the same value.
+        if P > 0:
+            psu_stratum[psu_codes] = strata_codes
+    else:
+        # Each observation is its own PSU within its stratum (legacy
+        # behavior when strata is not None and psu is None).
+        psu_codes = np.arange(n, dtype=np.int64)
+        P = n
+        psu_stratum = strata_codes.copy()
+
+    n_psu_per_stratum = np.bincount(psu_stratum, minlength=S).astype(np.int64)
+    singleton_strata = n_psu_per_stratum == 1
+
+    # Per-stratum FPC ratio (stratum-level attribute; read from the first
+    # observation of each stratum, matching legacy ``resolved.fpc[mask_h][0]``).
+    f_h = np.zeros(S, dtype=np.float64)
+    if fpc is not None:
+        fpc_arr = np.asarray(fpc)
+        # "First-in-stratum" FPC lookup: scan observations in input order and
+        # record the first row index seen for each stratum code, stopping
+        # early once every stratum has been seen. That row is the reference
+        # row used by the legacy per-stratum lookup.
+        first_idx = np.full(S, -1, dtype=np.int64)
+        seen = np.zeros(S, dtype=bool)
+        for i in range(n):
+            h = strata_codes[i]
+            if not seen[h]:
+                seen[h] = True
+                first_idx[h] = i
+                if seen.all():
+                    break
+        for h in range(S):
+            if first_idx[h] < 0:
+                continue
+            N_h = fpc_arr[first_idx[h]]
+            n_h = n_psu_per_stratum[h]
+            if n_h > 0 and N_h < n_h:
+                raise ValueError(
+                    f"FPC ({N_h}) is less than the number of effective PSUs "
+                    f"({n_h}) in stratum. FPC must be >= n_PSU."
+                )
+            if n_h > 0:
+                f_h[h] = n_h / N_h
+
+    with np.errstate(divide="ignore", invalid="ignore"):
+        adjustment_h = np.where(
+            n_psu_per_stratum >= 2,
+            (1.0 - f_h) * n_psu_per_stratum / np.maximum(n_psu_per_stratum - 1, 1),
+            0.0,
+        )
+
+    # Static legitimate_zero_count (design-dependent only):
+    #   - Non-singleton strata with f_h >= 1.0 contribute (legacy counter).
+    #   - Singleton strata under lonely_psu == "certainty" contribute.
+    fpc_saturated = (n_psu_per_stratum >= 2) & (f_h >= 1.0)
+    legitimate_zero_count = int(fpc_saturated.sum())
+    if lonely_psu == "certainty":
+        legitimate_zero_count += int(singleton_strata.sum())
+
+    # Static variance_computable flag:
+    #   - Any non-singleton stratum (regardless of FPC) → variance_computed=True
+    #     path is exercised.
+    #   - Under "adjust", any singleton stratum also counts (adds V_h even if 0).
+    has_non_singleton = bool(np.any(~singleton_strata))
+    has_singleton = bool(np.any(singleton_strata))
+    variance_computable = has_non_singleton or (
+        lonely_psu == "adjust" and has_singleton
+    )
+
+    return _PsuScaffolding(
+        mode="stratified",
+        n=n,
+        lonely_psu=lonely_psu,
+        variance_computable=variance_computable,
+        legitimate_zero_count=legitimate_zero_count,
+        psu_codes=psu_codes,
+        psu_stratum=psu_stratum,
+        n_psu_per_stratum=n_psu_per_stratum,
+        singleton_strata=singleton_strata,
+        adjustment_h=adjustment_h,
+    )
+
+
+def _compute_if_variance_fast(
+    psi: np.ndarray,
+    scaffolding: _PsuScaffolding,
+) -> float:
+    """Fast TSL variance for aggregate_survey using precomputed scaffolding.
+
+    Numerically equivalent to :func:`compute_survey_if_variance` for any
+    TSL (non-replicate) design, up to floating-point reduction-order drift.
+    The speedup comes from replacing per-cell pandas groupbys and
+    per-stratum Python loops with two ``np.bincount`` passes plus a fully
+    vectorized per-stratum reduction.
+
+    Parameters
+    ----------
+    psi : np.ndarray
+        Per-unit influence function values, shape (n,).
+    scaffolding : _PsuScaffolding
+        Precomputed via :func:`_precompute_psu_scaffolding` for the same
+        resolved design.
+
+    Returns
+    -------
+    float
+        Design-based variance. Returns ``np.nan`` when variance is
+        unidentified (matches legacy behavior).
+    """
+    psi = np.asarray(psi, dtype=np.float64).ravel()
+
+    def _finalize(meat_scalar: float) -> float:
+        if meat_scalar == 0.0:
+            if scaffolding.variance_computable or scaffolding.legitimate_zero_count > 0:
+                return 0.0
+            return float("nan")
+        return meat_scalar
+
+    if scaffolding.mode == "no_strata_no_psu":
+        if scaffolding.n < 2:
+            return float("nan")
+        psi_mean = psi.mean()
+        centered = psi - psi_mean
+        meat = scaffolding.adjustment_direct * float(centered @ centered)
+        return _finalize(meat)
+
+    if scaffolding.mode == "psu_only":
+        if scaffolding.n_psu_only < 2:
+            if scaffolding.legitimate_zero_count > 0:
+                return 0.0
+            return float("nan")
+        psu_sums = np.bincount(
+            scaffolding.psu_codes_only, weights=psi, minlength=scaffolding.n_psu_only
+        )
+        psu_mean = psu_sums.mean()
+        centered = psu_sums - psu_mean
+        meat = scaffolding.adjustment_only * float(centered @ centered)
+        return _finalize(meat)
+
+    # Stratified
+    S = len(scaffolding.n_psu_per_stratum)
+    P = len(scaffolding.psu_stratum)
+
+    psu_sums = np.bincount(scaffolding.psu_codes, weights=psi, minlength=P)
+    sum_by_h = np.bincount(scaffolding.psu_stratum, weights=psu_sums, minlength=S)
+    sum2_by_h = np.bincount(
+        scaffolding.psu_stratum, weights=psu_sums * psu_sums, minlength=S
+    )
+
+    with np.errstate(divide="ignore", invalid="ignore"):
+        # Per-stratum centered sum of squares of PSU totals via the one-pass
+        # identity sum(x^2) - (sum x)^2 / n_h; strata with < 2 PSUs get 0.
+        centered_ss = np.where(
+            scaffolding.n_psu_per_stratum >= 2,
+            sum2_by_h - (sum_by_h * sum_by_h) / np.maximum(scaffolding.n_psu_per_stratum, 1),
+            0.0,
+        )
+    meat_per_stratum = scaffolding.adjustment_h * centered_ss
+
+    if np.any(scaffolding.singleton_strata) and scaffolding.lonely_psu == "adjust":
+        # Singleton strata under "adjust": V_h = (psu_sum - global_mean)^2.
+        # For a singleton stratum, the one PSU's sum equals sum_by_h[h].
+        # No FPC, no (n-1) adjustment — matches legacy (survey.py:1276-1281).
+        if P > 0:
+            global_mean = psu_sums.mean()
+            singleton_meat = (sum_by_h - global_mean) ** 2
+            meat_per_stratum = np.where(
+                scaffolding.singleton_strata, singleton_meat, meat_per_stratum
+            )
+
+    meat = float(meat_per_stratum.sum())
+    return _finalize(meat)
+
+
 def _compute_stratified_meat_from_psu_scores(
     psu_scores: np.ndarray,
     psu_strata: np.ndarray,
diff --git a/docs/performance-plan.md b/docs/performance-plan.md
index 58f0f017..438a8b56 100644
--- a/docs/performance-plan.md
+++ b/docs/performance-plan.md
@@ -41,32 +41,36 @@ scale. Data-shape details are in `docs/performance-scenarios.md`.

 | Scenario | Scale | Python (s) | Rust (s) | Py/Rust |
 |---|---|---:|---:|---:|
-| 1. Staggered campaign | small | 0.51 | 0.50 | 1.0x |
-| | medium | 0.75 | 0.76 | 1.0x |
-| | large | 1.33 | 1.38 | 1.0x |
-| 2. Brand awareness survey | small | 0.19 | 0.20 | 1.0x |
-| | medium | 0.56 | 0.55 | 1.0x |
-| | large | 1.09 | 1.00 | 1.1x |
-| 3. BRFSS microdata -> CS panel | small | 1.61 | 1.66 | 1.0x |
-| | medium | 6.10 | 6.23 | 1.0x |
-| | large | 24.41 | 24.94 | 1.0x |
-| 4. SDiD few markets | small | 3.70 | 0.04 | 89.5x |
-| | medium | 3.99 | 0.12 | 33.6x |
-| | large | skip | 0.26 | - |
-| 5. Reversible dCDH | single | 0.72 | 0.75 | 1.0x |
-| 6. Pricing dose-response | single | 0.59 | 0.60 | 1.0x |
+| 1. Staggered campaign | small | 0.52 | 0.51 | 1.0x |
+| | medium | 0.81 | 0.81 | 1.0x |
+| | large | 1.32 | 1.31 | 1.0x |
+| 2. Brand awareness survey | small | 0.23 | 0.20 | 1.1x |
+| | medium | 0.53 | 0.50 | 1.1x |
+| | large | 0.87 | 0.93 | 0.9x |
+| 3. BRFSS microdata -> CS panel | small | 0.21 | 0.17 | 1.3x |
+| | medium | 0.49 | 0.47 | 1.0x |
+| | large | 1.33 | 1.32 | 1.0x |
+| 4. SDiD few markets | small | 3.70 | 0.04 | 88.6x |
+| | medium | 4.00 | 0.11 | 37.6x |
+| | large | skip | 0.23 | - |
+| 5. Reversible dCDH | single | 0.79 | 0.78 | 1.0x |
+| 6. Pricing dose-response | single | 0.59 | 0.63 | 0.9x |

 ### Scaling findings

 **Three findings are load-bearing for the optimization priority list:**

-1. **BRFSS `aggregate_survey` is the dominant practitioner pain point at
-   realistic pooled-multi-year scale.** Scales near-linearly with microdata
-   row count. At 1M rows (roughly what a 10-year pooled BRFSS analysis
-   looks like) the full chain takes ~24 seconds and essentially all of it
-   is inside `_compute_stratified_psu_meat`. Rust does not touch it
-   (`aggregate_survey` is entirely Python).
+1. **BRFSS `aggregate_survey` is now practitioner-fast at every measured
+   scale.** Before the precompute-scaffolding fix (see "Optimization
+   landed" below), the full chain at 1M rows took ~24 seconds, almost
+   all of it inside `_compute_stratified_psu_meat`. After the fix, the
+   chain is sub-2s at every measured scale. `aggregate_survey` is still
+   the largest share of that now-cheap chain (91% Python / 95% Rust at
+   1M rows), but the whole workflow stays well under a
+   practitioner-perceptible threshold at realistic pooled-multi-year
+   BRFSS volume. The path is entirely Python, so the Python and Rust
+   backends track each other within noise.

 2. **Staggered CS chain stays cheap across scales.** A 10x unit increase
    (150 -> 1,500) is a small-single-digit multiplier on total time.
    ImputationDiD and SunAbraham together consistently account for
@@ -96,18 +100,18 @@ scale. Data-shape details are in `docs/performance-scenarios.md`.

 | Scenario | Scale | Backend | Top phase (%) | 2nd phase (%) | 3rd phase (%) |
 |---|---|---|---|---|---|
-| 1. Staggered campaign | large | python | `6_imputation_did_robustness` (49%) | `5_sun_abraham_robustness` (28%) | `2_cs_fit_with_covariates_bootstrap999` (13%) |
-| 1. Staggered campaign | large | rust | `6_imputation_did_robustness` (40%) | `5_sun_abraham_robustness` (37%) | `2_cs_fit_with_covariates_bootstrap999` (13%) |
-| 2. Brand awareness survey | large | python | `3_replicate_weights_jk1` (57%) | `4_multi_outcome_loop_3_metrics` (22%) | `7_event_study_plus_honest_did` (14%) |
-| 2. Brand awareness survey | large | rust | `3_replicate_weights_jk1` (54%) | `4_multi_outcome_loop_3_metrics` (22%) | `7_event_study_plus_honest_did` (14%) |
-| 3. BRFSS microdata -> CS panel | large | python | `1_aggregate_survey_microdata_to_panel` (100%) | `5_sun_abraham_robustness` (0%) | `2_cs_fit_with_stage2_survey_design` (0%) |
-| 3. BRFSS microdata -> CS panel | large | rust | `1_aggregate_survey_microdata_to_panel` (100%) | `5_sun_abraham_robustness` (0%) | `2_cs_fit_with_stage2_survey_design` (0%) |
+| 1. Staggered campaign | large | python | `6_imputation_did_robustness` (54%) | `5_sun_abraham_robustness` (21%) | `2_cs_fit_with_covariates_bootstrap999` (13%) |
+| 1. Staggered campaign | large | rust | `6_imputation_did_robustness` (41%) | `5_sun_abraham_robustness` (36%) | `2_cs_fit_with_covariates_bootstrap999` (12%) |
+| 2. Brand awareness survey | large | python | `3_replicate_weights_jk1` (46%) | `4_multi_outcome_loop_3_metrics` (26%) | `7_event_study_plus_honest_did` (17%) |
+| 2. Brand awareness survey | large | rust | `3_replicate_weights_jk1` (50%) | `4_multi_outcome_loop_3_metrics` (25%) | `7_event_study_plus_honest_did` (15%) |
+| 3. BRFSS microdata -> CS panel | large | python | `1_aggregate_survey_microdata_to_panel` (91%) | `5_sun_abraham_robustness` (8%) | `2_cs_fit_with_stage2_survey_design` (1%) |
+| 3. BRFSS microdata -> CS panel | large | rust | `1_aggregate_survey_microdata_to_panel` (95%) | `5_sun_abraham_robustness` (4%) | `2_cs_fit_with_stage2_survey_design` (1%) |
 | 4. SDiD few markets | medium | python | `5_sensitivity_to_zeta_omega` (43%) | `3_in_time_placebo` (39%) | `2_sdid_bootstrap_variance_200` (9%) |
-| 4. SDiD few markets | large | rust | `5_sensitivity_to_zeta_omega` (40%) | `3_in_time_placebo` (30%) | `1_sdid_jackknife_variance` (16%) |
-| 5. Reversible dCDH | single | python | `4_heterogeneity_refit` (51%) | `1_dcdh_fit_Lmax3_survey_TSL` (48%) | `3_honest_did_on_placebo` (1%) |
-| 5. Reversible dCDH | single | rust | `4_heterogeneity_refit` (50%) | `1_dcdh_fit_Lmax3_survey_TSL` (49%) | `3_honest_did_on_placebo` (1%) |
-| 6. Pricing dose-response | single | python | `1_cdid_cubic_spline_bootstrap199` (25%) | `6_spline_sensitivity_num_knots2` (25%) | `5_spline_sensitivity_degree1` (25%) |
-| 6. Pricing dose-response | single | rust | `1_cdid_cubic_spline_bootstrap199` (25%) | `6_spline_sensitivity_num_knots2` (25%) | `3_cdid_event_study_pretrend` (25%) |
+| 4. SDiD few markets | large | rust | `5_sensitivity_to_zeta_omega` (38%) | `3_in_time_placebo` (30%) | `1_sdid_jackknife_variance` (16%) |
+| 5. Reversible dCDH | single | python | `4_heterogeneity_refit` (51%) | `1_dcdh_fit_Lmax3_survey_TSL` (49%) | `3_honest_did_on_placebo` (0%) |
+| 5. Reversible dCDH | single | rust | `4_heterogeneity_refit` (50%) | `1_dcdh_fit_Lmax3_survey_TSL` (50%) | `3_honest_did_on_placebo` (0%) |
+| 6. Pricing dose-response | single | python | `1_cdid_cubic_spline_bootstrap199` (26%) | `6_spline_sensitivity_num_knots2` (25%) | `3_cdid_event_study_pretrend` (25%) |
+| 6. Pricing dose-response | single | rust | `1_cdid_cubic_spline_bootstrap199` (26%) | `6_spline_sensitivity_num_knots2` (25%) | `3_cdid_event_study_pretrend` (25%) |

 Per-scenario phase narrative (cross-check against the table above after
@@ -129,9 +133,11 @@ any rerun):
   see scale-sweep table); the JK1 replicate-fit loop is not
   Rust-accelerated, so the backends neither help nor hurt each other
   meaningfully on this chain.
-- **BRFSS.** `aggregate_survey` share of total grows with scale and is
-  effectively 100% of runtime at 1M rows. Downstream phases (CS fit,
-  SunAbraham, HonestDiD) are a fraction of a second combined.
+- **BRFSS.** `aggregate_survey` remains the single largest chain share + under both backends at every scale, but the absolute chain total is + sub-2s at 1M rows after the precompute-scaffolding fix. Downstream + phases (CS fit, SunAbraham, HonestDiD) are a fraction of a second + combined - see the scale-sweep table for the current totals. - **SDiD few markets.** `sensitivity_to_zeta_omega` and `in_time_placebo` are the two largest phases under Python at every scale and under Rust at medium/large (together ~70% of the chain). @@ -156,7 +162,7 @@ any rerun): | # | Location | Scenario + scale | Signal | Recommended action | |---|---|---|---|---| -| 1 | `diff_diff/survey.py:1160` `_compute_stratified_psu_meat` | BRFSS @ 1M rows | dominates BRFSS chain at all scales, ~100% at 1M rows | **Algorithmic fix, highest priority.** Function called once per (state, year) cell (500 calls); per-call work rebuilds stratum-PSU scaffolding every time. Precompute stratum indexes once at `aggregate_survey` top-level and reuse. | +| 1 | `diff_diff/survey.py` `_compute_stratified_psu_meat` + `aggregate_survey` | BRFSS @ 1M rows | previously dominated BRFSS chain at all scales (~100% at 1M rows) | **LANDED** (this PR). Precompute stratum-PSU scaffolding once per design at `aggregate_survey` top level; replace per-cell pandas groupby with two vectorized `np.bincount` passes. BRFSS-large chain drops from ~24s to sub-2s across both backends. See "Optimization landed" below. | | 2 | `diff_diff/imputation.py` ImputationDiD fit (+ `diff_diff/sun_abraham.py` SunAbraham fit) | Staggered CS @ 1,500 units | together consistently ~70-80% of the chain at every scale; either can be the top phase at a given (scale, backend) cell | **Investigate only after BRFSS fix lands.** Total chain is well under practitioner-perceptible threshold; candidate follow-up. Either phase is a legitimate target. 
| | 3 | `diff_diff/utils.py:1434` `_sc_weight_fw_numpy` | SDiD python @ any scale | dominates Python SDiD at all scales | **Already ported to Rust.** Python fallback acceptable as a teaching/safety path; non-production for n > 100. Python skipped at n=500 (jackknife cost would exceed 4 minutes per run). | | 4 | `diff_diff/chaisemartin_dhaultfoeuille.py` dCDH fit + heterogeneity | Reversible (single scale) | main fit and survey-aware heterogeneity refit each rebuild TSL scaffolding; heterogeneity phase is as expensive as the main fit | **Cache/precompute** - heterogeneity refit duplicates the main fit's TSL setup under the same `SurveyDesign`. Not P0; newer code path (v3.1) never optimization-reviewed. | @@ -174,20 +180,20 @@ in `benchmarks/speed_review/baselines/mem_profile_brfss_large_.txt`. | Scenario | Scale | Py peak RSS (MB) | Py growth (MB) | Rust peak RSS (MB) | Rust growth (MB) | |---|---|---:|---:|---:|---:| -| 1. Staggered campaign | small | 143 | 28 | 151 | 36 | -| | medium | 227 | 79 | 254 | 99 | -| | large | 472 | 245 | 588 | 322 | -| 2. Brand awareness survey | small | 127 | 12 | 128 | 13 | -| | medium | 188 | 54 | 185 | 50 | -| | large | 327 | 139 | 336 | 142 | -| 3. BRFSS microdata -> CS panel | small | 133 | 11 | 136 | 15 | -| | medium | 210 | 17 | 212 | 15 | -| | large | 418 | 17 | 429 | 33 | +| 1. Staggered campaign | small | 146 | 31 | 148 | 34 | +| | medium | 235 | 85 | 253 | 100 | +| | large | 486 | 251 | 582 | 327 | +| 2. Brand awareness survey | small | 130 | 15 | 128 | 13 | +| | medium | 183 | 45 | 189 | 55 | +| | large | 340 | 139 | 348 | 158 | +| 3. BRFSS microdata -> CS panel | small | 133 | 11 | 130 | 8 | +| | medium | 203 | 17 | 200 | 21 | +| | large | 413 | 25 | 409 | 25 | | 4. SDiD few markets | small | 124 | 10 | 116 | 1 | -| | medium | 152 | 8 | 118 | 0 | +| | medium | 148 | 8 | 117 | 0 | | | large | skip | skip | 118 | 0 | -| 5. Reversible dCDH | single | 135 | 22 | 135 | 21 | -| 6. 
Pricing dose-response | single | 123 | 9 | 121 | 8 | +| 5. Reversible dCDH | single | 134 | 20 | 134 | 20 | +| 6. Pricing dose-response | single | 122 | 8 | 123 | 9 | The ~115-130 MB floor is the Python + diff-diff + numpy import footprint; @@ -195,16 +201,15 @@ the "growth" columns are the practitioner-meaningful numbers. ### Memory findings -1. **BRFSS `aggregate_survey` is compute-bound, not memory-bound.** At - 20x data growth (50K -> 1M rows), working-memory growth stays in the - low tens of MB. The tracemalloc pass confirms: net retained allocation - after `aggregate_survey` returns is well under 1 MB; the top - allocation site is `tracemalloc`'s own linecache overhead (a smoking - gun that nothing else is allocating meaningfully). **The BRFSS cost - is pure CPU; the function is already memory-efficient.** This - strengthens the case for the precompute-scaffolding fix: low-risk, - pure CPU win, fits in any deployment environment including 512 MB - Lambda. +1. **BRFSS `aggregate_survey` was compute-bound, not memory-bound - and + the compute side is now addressed.** Working-memory growth stayed in + the low tens of MB across the 20x data-growth sweep (50K -> 1M rows); + the pre-fix tracemalloc pass confirmed net retained allocation under + 1 MB and identified `tracemalloc`'s own linecache overhead as the + top allocation site (evidence that nothing else was allocating + meaningfully). The precompute-scaffolding fix in this PR is a pure + CPU win - no material change to the function's memory profile, which + was already Lambda-friendly. 2. **Staggered CS chain is memory-heavier than wall-clock suggested.** At 1,500 units the chain's peak RSS sits in the high-400s to high-500s MB depending on backend. Fine for workstations, tight for 512 MB @@ -229,16 +234,32 @@ the "growth" columns are the practitioner-meaningful numbers.
| # | Opportunity | Time upside | Memory upside | Risk | Priority | |---|---|---|---|---|---| -| 1 | `aggregate_survey` precompute stratum scaffolding | ~-20s at 1M rows | none (already memory-efficient) | Low | **High** | +| 1 | `aggregate_survey` precompute stratum scaffolding | ~-20s at 1M rows | none (already memory-efficient) | Low | **LANDED** (this PR) | | 2 | Staggered CS chain working-memory audit (Lambda-oriented) | none | ~200-300 MB at 1,500 units (peak RSS crosses 512 MB Lambda line under Rust) | Medium | Low (bump to Medium if Lambda deployment becomes a concrete ask) | | 3 | dCDH: cache TSL scaffolding across main fit + heterogeneity refit | ~0.2s per chain | ~20 MB per chain | Low | Low | | 4 | ImputationDiD fit-loop vectorization audit | ~0.1-0.3s at 1,500 units | unknown | Low | Low | | 5 | Rust-port JK1 replicate fit loop | ~0.5s at 160 replicates | ~140 MB at 160 replicates | Medium | Low (demoted: Rust is no longer slower than Python on this path after rerun, so the "fix-a-Rust-regression" leg of the original rationale is gone) | -**Bottom line: one clear priority, four optional.** #1 is the single -practitioner-perceptible win identified by this analysis and should be -the next PR. #2-5 are optional polish that should be prioritized by -concrete deployment-environment signal (Lambda OOMs, practitioner +### Optimization landed + +**#1 shipped in this PR.** `diff_diff/survey.py` now precomputes a +per-design `_PsuScaffolding` (strata codes, global PSU codes, +per-stratum counts and FPC ratios, singleton mask, lonely-PSU-aware +variance-computable flag). `aggregate_survey` builds it once per call +and threads it through `_cell_mean_variance` so each per-cell variance +reduction uses two vectorized `np.bincount` passes instead of a +per-stratum pandas groupby loop.
Numerics agree with the legacy +path up to floating-point rounding; equivalence tests across nine +design cases (`TestAggregateSurveyScaffolding`) enforce +`assert_allclose(atol=1e-14, rtol=1e-14)` between fast and legacy paths. + +Replicate-weight designs (JK1 etc.) continue to use the legacy +`compute_replicate_if_variance` code path and are unaffected. + +**Bottom line: no practitioner-perceptible bottleneck remains in the +six measured workflows; four optional items are left open.** Items #2-5 +above should be prioritized by concrete deployment-environment signal +(Lambda OOMs, practitioner reports of slowness at specific shapes), not proactively. ### Correctness-adjacent observations (not P0, route separately) diff --git a/tests/test_prep.py b/tests/test_prep.py index 3c96626b..a9818e4a 100644 --- a/tests/test_prep.py +++ b/tests/test_prep.py @@ -3440,3 +3440,217 @@ def test_pweight_retains_zero_precision_geo(self): ) assert 0 not in panel_a["state"].values assert len(panel_a) == 6 # 3 states x 2 periods + + +class TestAggregateSurveyScaffolding: + """Tests for the amortized TSL variance fast path in aggregate_survey. + + Equivalence tests verify that ``_compute_if_variance_fast`` produces + numerically equivalent ``_mean`` / ``_se`` / ``_precision`` outputs + (assert_allclose atol=1e-14 rtol=1e-14) relative to the legacy + ``compute_survey_if_variance`` path across every supported design + mode and ``lonely_psu`` policy. Reduction-order drift is expected + to stay at floating-point rounding level because the formulas are + identical and only the order of summation changes (single + np.bincount vs per-stratum pandas groupby).
+ """ + + def _build_microdata(self, mode, seed=42): + """Per-case microdata plus a SurveyDesign that exercises that mode.""" + rng = np.random.default_rng(seed) + n_per_cell = 80 + state = np.repeat(["A", "B", "C"], 2 * n_per_cell) + year = np.tile(np.repeat([2019, 2020], n_per_cell), 3) + n = len(state) + wt = rng.uniform(0.5, 2.5, n) + y = rng.normal(5.0, 1.5, n) + df_base = pd.DataFrame( + {"state": state, "year": year, "wt": wt, "y": y} + ) + + if mode == "stratified_fpc": + df = df_base.copy() + df["stratum"] = rng.integers(0, 4, n) + df["psu"] = df["stratum"] * 10 + rng.integers(0, 4, n) + df["fpc"] = 200.0 # comfortably above per-stratum n_psu + sd = SurveyDesign(weights="wt", strata="stratum", psu="psu", fpc="fpc") + return df, sd + + if mode == "stratified_no_fpc": + df = df_base.copy() + df["stratum"] = rng.integers(0, 4, n) + df["psu"] = df["stratum"] * 10 + rng.integers(0, 4, n) + sd = SurveyDesign(weights="wt", strata="stratum", psu="psu") + return df, sd + + if mode == "stratified_no_psu": + # strata present, psu absent — each observation is its own + # PSU within its stratum. This is a distinct scaffolding + # branch (survey.py:_precompute_psu_scaffolding, else clause + # of the `if psu is not None` block). + df = df_base.copy() + df["stratum"] = rng.integers(0, 4, n) + sd = SurveyDesign(weights="wt", strata="stratum") + return df, sd + + if mode == "stratified_no_psu_fpc": + # Same branch as above plus stratum-level FPC lookup. 
+ df = df_base.copy() + df["stratum"] = rng.integers(0, 4, n) + df["fpc"] = 1000.0 # well above per-stratum obs count + sd = SurveyDesign(weights="wt", strata="stratum", fpc="fpc") + return df, sd + + if mode == "psu_only": + df = df_base.copy() + df["psu"] = rng.integers(0, 12, n) + sd = SurveyDesign(weights="wt", psu="psu") + return df, sd + + if mode == "weights_only": + return df_base.copy(), SurveyDesign(weights="wt") + + if mode.startswith("lonely_"): + # Singleton stratum: stratum 0 has exactly one PSU; strata 1..3 + # each have 4 PSUs. Forces every lonely_psu branch to engage. + df = df_base.copy() + strata = rng.integers(1, 4, n) + psu = strata * 10 + rng.integers(0, 4, n) + sentinel = rng.choice(n, size=n // 8, replace=False) + strata[sentinel] = 0 + psu[sentinel] = 999 + df["stratum"] = strata + df["psu"] = psu + policy = mode.split("_", 1)[1] + sd = SurveyDesign( + weights="wt", strata="stratum", psu="psu", lonely_psu=policy, + ) + return df, sd + + raise ValueError(f"Unknown mode: {mode}") + + @staticmethod + def _assert_panels_equivalent(p_fast, p_legacy, outcome="y"): + assert len(p_fast) == len(p_legacy) + assert list(p_fast.columns) == list(p_legacy.columns) + for suffix in ("_mean", "_se", "_precision"): + col = f"{outcome}{suffix}" + a = p_fast[col].to_numpy(dtype=np.float64) + b = p_legacy[col].to_numpy(dtype=np.float64) + nan_a, nan_b = np.isnan(a), np.isnan(b) + assert np.array_equal(nan_a, nan_b), f"NaN pattern mismatch in {col}" + np.testing.assert_allclose( + a[~nan_a], b[~nan_b], + atol=1e-14, rtol=1e-14, + err_msg=f"{col} diverges between fast and legacy paths", + ) + + @pytest.mark.parametrize( + "mode", + [ + "stratified_fpc", + "stratified_no_fpc", + "stratified_no_psu", + "stratified_no_psu_fpc", + "psu_only", + "weights_only", + "lonely_remove", + "lonely_certainty", + "lonely_adjust", + ], + ) + def test_fast_path_equals_legacy(self, mode, monkeypatch): + """Fast and legacy paths produce numerically identical panels.""" + from 
diff_diff import prep + + data, sd = self._build_microdata(mode) + panel_fast, _ = aggregate_survey( + data, by=["state", "year"], outcomes="y", survey_design=sd, + ) + # Force the legacy code path by disabling the scaffolding precompute. + # _cell_mean_variance falls back to compute_survey_if_variance when + # scaffolding is None. + monkeypatch.setattr( + prep, "_precompute_psu_scaffolding", lambda resolved: None, + ) + panel_legacy, _ = aggregate_survey( + data, by=["state", "year"], outcomes="y", survey_design=sd, + ) + self._assert_panels_equivalent(panel_fast, panel_legacy) + + def test_scaffolding_stratified_shape(self): + from diff_diff.survey import _precompute_psu_scaffolding + + data, sd = self._build_microdata("stratified_fpc") + resolved = sd.resolve(data) + scf = _precompute_psu_scaffolding(resolved) + assert scf.mode == "stratified" + assert scf.n == len(data) + assert scf.psu_codes.shape == (len(data),) + assert scf.psu_stratum.ndim == 1 + assert scf.n_psu_per_stratum.ndim == 1 + assert len(scf.psu_stratum) == int(scf.psu_codes.max() + 1) + # adjustment_h is zero for any singleton stratum by construction + if scf.singleton_strata.any(): + assert np.all(scf.adjustment_h[scf.singleton_strata] == 0.0) + + def test_scaffolding_weights_only_shape(self): + from diff_diff.survey import _precompute_psu_scaffolding + + data, sd = self._build_microdata("weights_only") + resolved = sd.resolve(data) + scf = _precompute_psu_scaffolding(resolved) + assert scf.mode == "no_strata_no_psu" + assert scf.adjustment_direct is not None + assert scf.psu_codes is None + assert scf.psu_codes_only is None + + def test_scaffolding_psu_only_shape(self): + from diff_diff.survey import _precompute_psu_scaffolding + + data, sd = self._build_microdata("psu_only") + resolved = sd.resolve(data) + scf = _precompute_psu_scaffolding(resolved) + assert scf.mode == "psu_only" + assert scf.psu_codes_only is not None + assert scf.n_psu_only is not None and scf.n_psu_only >= 2 + assert 
scf.adjustment_only is not None + assert scf.psu_codes is None + assert scf.adjustment_direct is None + + def test_lonely_psu_certainty_counts_singletons(self): + """Under lonely_psu='certainty', singletons contribute to legitimate_zero_count.""" + from diff_diff.survey import _precompute_psu_scaffolding + + data, sd = self._build_microdata("lonely_certainty") + resolved = sd.resolve(data) + scf = _precompute_psu_scaffolding(resolved) + n_singletons = int(scf.singleton_strata.sum()) + assert n_singletons >= 1 # sanity: fixture does plant a singleton + assert scf.legitimate_zero_count >= n_singletons + + def test_scaffolding_fpc_saturation_counts(self): + """f_h >= 1.0 increments legitimate_zero_count independent of singletons.""" + from diff_diff.survey import _precompute_psu_scaffolding + + rng = np.random.default_rng(7) + n = 200 + stratum = rng.integers(0, 2, n) + # Build exactly 4 unique PSUs per stratum so FPC = n_psu exactly. + psu = np.empty(n, dtype=np.int64) + for h in range(2): + idx = np.where(stratum == h)[0] + psu[idx] = np.arange(len(idx)) % 4 + h * 10 + df = pd.DataFrame( + { + "wt": rng.uniform(1, 2, n), + "stratum": stratum, + "psu": psu, + "y": rng.normal(size=n), + "fpc": 4.0, # f_h = 4/4 = 1.0 + } + ) + sd = SurveyDesign(weights="wt", strata="stratum", psu="psu", fpc="fpc") + resolved = sd.resolve(df) + scf = _precompute_psu_scaffolding(resolved) + assert scf.legitimate_zero_count >= 1
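The "Optimization landed" fast path can be shown with a minimal NumPy sketch. This is not the `diff_diff` implementation. `precompute_scaffolding`, `stratified_psu_variance` and `groupby_reference` are hypothetical helpers.

The sketch rests on several assumptions:

- The variance uses the standard stratified-cluster linearization form `sum_h (1 - f_h) * n_h / (n_h - 1) * sum_i (z_hi - zbar_h)^2`. Here `z_hi` is the influence-function total of PSU `i` in stratum `h`.
- Singleton strata contribute zero. This roughly matches `lonely_psu="remove"`; `certainty` and `adjust` are not modeled.

The design-level index arrays are built once. Each influence-function vector then costs one `np.bincount` to collapse observations into PSU totals, plus stratum-level `np.bincount` reductions. No per-stratum Python loop is needed.

```python
import numpy as np
import pandas as pd


def precompute_scaffolding(strata, psu):
    """Hypothetical helper: build design-level index arrays once per design."""
    # Stratum codes 0..H-1.
    _, stratum_codes = np.unique(strata, return_inverse=True)
    # PSU labels may repeat across strata, so encode each (stratum, PSU) pair
    # as a single integer key before assigning global PSU codes.
    psu_labels, psu_local = np.unique(psu, return_inverse=True)
    n_labels = len(psu_labels)
    key = stratum_codes.astype(np.int64) * n_labels + psu_local
    uniq_keys, psu_codes = np.unique(key, return_inverse=True)
    psu_stratum = uniq_keys // n_labels  # stratum code of each global PSU
    n_psu_per_stratum = np.bincount(psu_stratum)
    return psu_codes, psu_stratum, n_psu_per_stratum


def stratified_psu_variance(psi, psu_codes, psu_stratum, n_psu_per_stratum, f_h=None):
    """Hypothetical helper: stratified-cluster linearization variance of sum(psi)."""
    n_h = n_psu_per_stratum.astype(np.float64)
    n_strata = len(n_h)
    # Collapse observations into PSU totals z_hi.
    z = np.bincount(psu_codes, weights=psi, minlength=len(psu_stratum))
    # Stratum means of the PSU totals, then centered sums of squares per stratum.
    z_bar = np.bincount(psu_stratum, weights=z, minlength=n_strata) / n_h
    ss = np.bincount(
        psu_stratum, weights=(z - z_bar[psu_stratum]) ** 2, minlength=n_strata
    )
    fpc = np.ones(n_strata) if f_h is None else 1.0 - np.asarray(f_h, dtype=np.float64)
    # Singleton strata get weight 0 (a "remove"-style simplification).
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = np.where(n_h > 1, n_h / (n_h - 1.0), 0.0)
    return float(np.sum(fpc * factor * ss))


def groupby_reference(psi, strata, psu):
    """Per-stratum pandas loop: the pattern the fast path avoids."""
    df = pd.DataFrame({"h": strata, "i": psu, "psi": psi})
    z = df.groupby(["h", "i"])["psi"].sum()  # PSU totals
    total = 0.0
    for _, z_h in z.groupby(level="h"):
        n_h = len(z_h)
        if n_h > 1:
            total += n_h / (n_h - 1) * float(((z_h - z_h.mean()) ** 2).sum())
    return total


rng = np.random.default_rng(0)
n = 1_000
strata = rng.integers(0, 4, n)
psu = rng.integers(0, 5, n)  # labels 0..4 reused in every stratum
scaffold = precompute_scaffolding(strata, psu)  # built once per design

psi = rng.normal(size=n)
fast = stratified_psu_variance(psi, *scaffold)
ref = groupby_reference(psi, strata, psu)
assert np.isclose(fast, ref, rtol=1e-10)

# The same scaffolding serves every per-cell influence vector.
cell_variances = [stratified_psu_variance(rng.normal(size=n), *scaffold) for _ in range(3)]

# Hand-checkable case: one stratum, PSU totals 3 and 7 -> 2/1 * (4 + 4).
print(stratified_psu_variance(
    np.array([1.0, 2.0, 3.0, 4.0]),
    *precompute_scaffolding(np.zeros(4), np.array([0, 0, 1, 1])),
))
# 16.0
```

The scaffolding depends only on the design, not on the outcome. A loop over (state, year) cells can therefore reuse it unchanged and swap in only the influence-function vector. That reuse is the amortization the fix relies on.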