Merged
16 commits
86d503f
Add weighted Frank-Wolfe kernel with per-unit regularization weights
igerber Apr 24, 2026
4bef0ec
Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu compo…
igerber Apr 24, 2026
e59cb60
Extend SDID coverage MC with stratified-survey DGP; regenerate artifact
igerber Apr 24, 2026
c3c7361
Address PR #355 R1: weighted λ centering + weights=None survey designs
igerber Apr 24, 2026
7325ee0
Address PR #355 R2 P0: retry zero-treated-mass draws on pweight-only …
igerber Apr 24, 2026
2f1beb7
Address PR #355 R3 P3: clarify hybrid bootstrap docs + pin boot_idx s…
igerber Apr 24, 2026
d324c6d
Address PR #355 R4 P0 + P3: resolve-normalize pweight-only weights + …
igerber Apr 24, 2026
46e50aa
Address PR #355 R5 P2 + P3: unify reg_weights shape failure + doc sweep
igerber Apr 24, 2026
6f7eb8e
Address PR #355 R6 P3: refresh three stale REGISTRY bullets post-PR #352
igerber Apr 24, 2026
5515fbe
Address PR #355 R7 P1 + P3: fit-time positive-mass guard + doc wordin…
igerber Apr 24, 2026
f60ece2
Address PR #355 R8 P1: front-door FPC validation for implicit-PSU SDI…
igerber Apr 24, 2026
1a20c16
Document R7 P1 + R8 P1 fit-time guards in SyntheticDiD.fit() docstring
igerber Apr 24, 2026
c29febf
Address PR #355 R11 P3: emit null instead of NaN in coverage artifact
igerber Apr 24, 2026
fb2dd90
Address PR #355 R12 P1: guard against zero effective-control omega_eff
igerber Apr 24, 2026
08056e4
Address PR #355 R13 P1: stratified_survey DGP off-by-one on post_periods
igerber Apr 24, 2026
18219c5
Move stratified_survey DGP test to benchmarks/ (CI isolated-install fix)
igerber Apr 24, 2026
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -22,7 +22,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed
- **`did_had_pretest_workflow(aggregate="event_study")` verdict no longer emits the "paper step 2 deferred to Phase 3 follow-up" caveat** — the joint pre-trends Stute test closes that gap. The two-period `aggregate="overall"` path retains the existing caveat since the joint variant does not apply to single-pre-period panels. Downstream code that greps verdict strings for the Phase 3 caveat will see it suppressed on the event-study path.
-- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the new paper-faithful refit bootstrap rejects all survey designs (including pweight-only) with `NotImplementedError`. Pweight-only users can switch to `variance_method="placebo"` or `"jackknife"`. Strata/PSU/FPC users have no SDID variance option on this release. Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation requires a separate derivation (weighted FW solver); sketch and reusable scaffolding pointers are in `docs/methodology/REGISTRY.md` §SyntheticDiD and `TODO.md`.
+- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression in PR #351, **restored in PR #352** — see Added/Changed entries directly below). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the PR #351 paper-faithful refit bootstrap initially rejected all survey designs (including pweight-only) with `NotImplementedError`. PR #352 restores the capability via a weighted-FW + Rao-Wu composition; the lock-out window applies only to the v3.2.x line that ships PR #351 alone (without PR #352). Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation: see `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + bootstrap composition)`.
+
+### Added (PR #352)
+- **SDID `variance_method="bootstrap"` survey support restored** via a hybrid pairs-bootstrap + Rao-Wu rescaling composed with a weighted Frank-Wolfe kernel. Each bootstrap draw first performs the unit-level pairs-bootstrap resampling specified by Arkhangelsky et al. (2021) Algorithm 2 (`boot_idx = rng.choice(n_total)`), and *then* applies Rao-Wu rescaled per-unit weights (Rao & Wu 1988) sliced over the resampled units — NOT a standalone Rao-Wu bootstrap. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; weighted column-centering + row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` survey branch composes the per-draw `rw` (Rao-Wu rescaling for full designs, constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only succeeds-tests, zero-treated-mass retry, deterministic Rao-Wu × boot_idx slice) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE differs assertion). See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)`` for the full objective and the argmin-set caveat.
+
+### Changed (PR #352)
+- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` (separate methodology gap; tracked in TODO.md as a follow-up PR).

## [3.2.0] - 2026-04-19

2 changes: 1 addition & 1 deletion TODO.md
@@ -104,7 +104,7 @@ Deferred items from PR reviews that were not addressed before merge.
| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
-| **SDID + survey designs** (capability regression in this release; both pweight-only AND strata/PSU/FPC). The previous release's fixed-weight bootstrap accepted strata/PSU/FPC via Rao-Wu rescaled bootstrap; the new paper-faithful refit bootstrap rejects all survey designs because Rao-Wu composed with Frank-Wolfe re-estimation requires its own derivation. The follow-up needs a **weighted Frank-Wolfe** variant of `_sc_weight_fw` accepting per-unit weights in the loss and regularization (`Σ rw_i ω_i Y_i,pre` / `ζ² Σ rw_i ω_i²`), threaded through `compute_sdid_unit_weights` / `compute_time_weights`. Reusable scaffolding (`generate_rao_wu_weights`, split into `rw_control` / `rw_treated`, degenerate-retry, treated-mean weighting) is recoverable from the pre-rewrite `_bootstrap_se` body via `git show 91082e5:diff_diff/synthetic_did.py` (PR #351 "Replace SDID fixed-weight bootstrap with paper-faithful refit"). Compose-after-unweighted-FW does not work — silently reproduces the fixed-weight Rao-Wu behavior we removed. Validation: re-use the coverage MC harness with a stratified DGP, confirm near-nominal rejection rates against placebo-SE tracking. See REGISTRY.md §SyntheticDiD `Note (deferred survey + bootstrap composition)` for the sketch. | `synthetic_did.py::fit`, `synthetic_did.py::_bootstrap_se`, `utils.py::_sc_weight_fw` | follow-up | Medium |
+| **SDID + placebo/jackknife + strata/PSU/FPC** (capability gap remaining after PR #352). PR #352 restored survey-bootstrap support via weighted Frank-Wolfe + Rao-Wu composition; the same composition for `placebo` (which permutes control indices) and `jackknife` (which leaves out one unit at a time) requires its own derivations: placebo's allocator needs a weighted permutation distribution that respects PSU clustering; jackknife needs PSU-level LOO + stratum aggregation. Both reuse the weighted-FW kernel from PR #352 (`_sc_weight_fw(reg_weights=)`); the genuinely new work is the per-method allocator. Tracked but no concrete sketch yet — defer until user demand surfaces. | `synthetic_did.py::_placebo_variance_se`, `synthetic_did.py::_jackknife_se` | follow-up | Low |
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |

#### Performance
61 changes: 49 additions & 12 deletions benchmarks/data/sdid_coverage.json
@@ -4,8 +4,8 @@
"n_bootstrap": 200,
"library_version": "3.2.0",
"backend": "rust",
-"generated_at": "2026-04-22T20:48:18.361220+00:00",
-"total_elapsed_sec": 2424.92,
+"generated_at": "2026-04-24T13:01:54.876774+00:00",
+"total_elapsed_sec": 2420.61,
"methods": [
"placebo",
"bootstrap",
@@ -20,7 +20,8 @@
"dgps": {
"balanced": "Balanced / exchangeable: N_co=20, N_tr=3, T_pre=8, T_post=4",
"unbalanced": "Unbalanced: N_co=30, N_tr=8, heterogeneous unit-FE variance",
-"aer63": "Arkhangelsky et al. (2021) AER \u00a76.3: N=100, N1=20, T=120, T1=5, rank=2, \u03c3=2"
+"aer63": "Arkhangelsky et al. (2021) AER \u00a76.3: N=100, N1=20, T=120, T1=5, rank=2, \u03c3=2",
+"stratified_survey": "BRFSS-style: N=40, strata=2, PSU=2/stratum, psu_re_sd=1.5 (PR #352)"
},
"per_dgp": {
"balanced": {
@@ -42,9 +43,9 @@
"0.05": 0.078,
"0.10": 0.116
},
-"mean_se": 0.21962976414466187,
+"mean_se": 0.2195984748876297,
"true_sd_tau_hat": 0.2093529148687405,
-"se_over_truesd": 1.0490886371578094
+"se_over_truesd": 1.0489391801652868
},
"jackknife": {
"n_successful_fits": 500,
Expand All @@ -57,7 +58,7 @@
"true_sd_tau_hat": 0.2093529148687405,
"se_over_truesd": 1.0756639338270981
},
"_elapsed_sec": 78.62
"_elapsed_sec": 71.24
},
"unbalanced": {
"placebo": {
@@ -78,9 +79,9 @@
"0.05": 0.038,
"0.10": 0.08
},
-"mean_se": 0.15072674925763238,
+"mean_se": 0.15070173940119225,
"true_sd_tau_hat": 0.135562270427217,
-"se_over_truesd": 1.1118635648593473
+"se_over_truesd": 1.1116790750572711
},
"jackknife": {
"n_successful_fits": 500,
Expand All @@ -93,7 +94,7 @@
"true_sd_tau_hat": 0.135562270427217,
"se_over_truesd": 0.990639682456852
},
"_elapsed_sec": 90.61
"_elapsed_sec": 78.91
},
"aer63": {
"placebo": {
@@ -114,9 +115,9 @@
"0.05": 0.04,
"0.10": 0.078
},
-"mean_se": 0.28291769703671454,
+"mean_se": 0.28265726432861016,
"true_sd_tau_hat": 0.2696262336703088,
-"se_over_truesd": 1.0492958833622181
+"se_over_truesd": 1.0483299806584672
},
"jackknife": {
"n_successful_fits": 500,
Expand All @@ -129,7 +130,43 @@
"true_sd_tau_hat": 0.2696262336703088,
"se_over_truesd": 0.9015870263136688
},
"_elapsed_sec": 2255.69
"_elapsed_sec": 2237.29
},
"stratified_survey": {
"placebo": {
"n_successful_fits": 0,
"rejection_rate": {
"0.01": null,
"0.05": null,
"0.10": null
},
"mean_se": null,
"true_sd_tau_hat": null,
"se_over_truesd": null
},
"bootstrap": {
"n_successful_fits": 500,
"rejection_rate": {
"0.01": 0.024,
"0.05": 0.058,
"0.10": 0.094
},
"mean_se": 0.5097482138251239,
"true_sd_tau_hat": 0.4512243070193919,
"se_over_truesd": 1.1297002530566618
},
"jackknife": {
"n_successful_fits": 0,
"rejection_rate": {
"0.01": null,
"0.05": null,
"0.10": null
},
"mean_se": null,
"true_sd_tau_hat": null,
"se_over_truesd": null
},
"_elapsed_sec": 16.48
}
}
}
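Reviewer sketch (not part of the diff): the CHANGELOG entry gives the validation gate for the `stratified_survey` bootstrap row as a rejection rate in [0.02, 0.10] at α=0.05, and this artifact stores `null` for methods with no successful fits. The snippet below shows one way such a gate could be read off the artifact. `gate_report` is a hypothetical helper, not part of the benchmark harness, and applying the same interval to rows other than the bootstrap row is an assumption of the sketch. The sample is a trimmed copy of the `stratified_survey` block above.

```python
import json


def gate_report(artifact, alpha="0.05", lo=0.02, hi=0.10):
    """Map (dgp, method) to True/False for the gate, or None when the
    rejection rate is null (JSON null is parsed as Python None)."""
    report = {}
    for dgp, methods in artifact["per_dgp"].items():
        for method, row in methods.items():
            if not isinstance(row, dict):
                # Skip scalar bookkeeping entries such as "_elapsed_sec".
                continue
            rate = row["rejection_rate"].get(alpha)
            report[(dgp, method)] = None if rate is None else lo <= rate <= hi
    return report


sample = json.loads("""
{"per_dgp": {"stratified_survey": {
  "placebo":   {"n_successful_fits": 0,
                "rejection_rate": {"0.01": null, "0.05": null, "0.10": null}},
  "bootstrap": {"n_successful_fits": 500,
                "rejection_rate": {"0.01": 0.024, "0.05": 0.058, "0.10": 0.094},
                "mean_se": 0.5097482138251239,
                "true_sd_tau_hat": 0.4512243070193919,
                "se_over_truesd": 1.1297002530566618},
  "jackknife": {"n_successful_fits": 0,
                "rejection_rate": {"0.01": null, "0.05": null, "0.10": null}},
  "_elapsed_sec": 16.48
}}}
""")

report = gate_report(sample)
boot = sample["per_dgp"]["stratified_survey"]["bootstrap"]
ratio = boot["mean_se"] / boot["true_sd_tau_hat"]
```

Here the bootstrap row (0.058 at α=0.05) falls inside the interval, while the placebo and jackknife rows come back as `None` rather than failing, which matches their zero successful fits. In this row `se_over_truesd` agrees with `mean_se / true_sd_tau_hat` to within rounding.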