Merged
19 commits
239202e
Add SyntheticDiD variance_method='bootstrap_refit' and coverage MC study
igerber Apr 22, 2026
26c5695
Validate variance_method and n_bootstrap in SyntheticDiD.set_params
igerber Apr 22, 2026
86a3956
Make SyntheticDiD.set_params rollback on validation failure
igerber Apr 22, 2026
6a03326
Address PR #351 CI review: FW warning per-draw, docstrings, scale reg…
igerber Apr 22, 2026
3fa6d8d
Address PR #351 CI review R2 P1: document fixed-weight bootstrap as R…
igerber Apr 22, 2026
91082e5
Replace SDID fixed-weight bootstrap with paper-faithful refit
igerber Apr 22, 2026
4767244
Refresh SDID docs & tutorials for single-bootstrap surface
igerber Apr 22, 2026
15481d9
Regenerate SDID coverage MC artifact for 3-method surface
igerber Apr 22, 2026
105cf73
Warm-start Frank-Wolfe to match R's vcov shape; pin artifact cadence
igerber Apr 22, 2026
f49a19f
Propagate SDID refit-only bootstrap to survey tests and cross-ref docs
igerber Apr 22, 2026
ec92d39
Resolve PR #351 R4 P3 cleanup
igerber Apr 22, 2026
826b1a8
Address PR #351 R5 P3: test silent-column-pickup contract
igerber Apr 22, 2026
c0d089b
Fix silent Rust FW non-convergence in SDID bootstrap aggregate warning
igerber Apr 22, 2026
9280cdd
Address R7 P2 + P3: rewrite null-calibration test; refresh REGISTRY F…
igerber Apr 22, 2026
710f966
Address PR #351 R8 P3 cleanup
igerber Apr 22, 2026
dc2045f
Address PR #351 R9 P3: unify SDID bootstrap slowdown wording
igerber Apr 22, 2026
b123c2b
Skip test_bootstrap_se_tracks_placebo_se_exchangeable under pure-Python
igerber Apr 23, 2026
5eadcb6
Address R10 P2 + P3: FW non-conv denominator; survey docs SDID Rao-Wu
igerber Apr 23, 2026
ec34beb
Address PR #351 R11 P3: document return_convergence tuple return
igerber Apr 23, 2026
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -9,12 +9,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0)** — `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change per the REPORTING.md stability policy. BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
- SyntheticDiD coverage Monte Carlo calibration table added to `docs/methodology/REGISTRY.md` §SyntheticDiD — rejection rates at α ∈ {0.01, 0.05, 0.10} across `placebo` / `bootstrap` / `jackknife` on 3 representative DGPs (balanced / exchangeable, unbalanced, and Arkhangelsky et al. (2021) AER §6.3 non-exchangeable). Artifact at `benchmarks/data/sdid_coverage.json` (500 seeds × B=200), regenerable via `benchmarks/python/coverage_sdid.py`.
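
The rejection rates in the calibration table above are, per method and DGP, the share of seeds whose p-value falls below each nominal level. A minimal sketch of that tabulation follows, assuming a stand-in null simulation (a standard-normal z-statistic per seed) rather than actual SyntheticDiD fits; `rejection_rates` and `simulate_null_p_values` are hypothetical names and are not taken from `benchmarks/python/coverage_sdid.py`.

```python
import random
from statistics import NormalDist

ALPHAS = (0.01, 0.05, 0.10)


def rejection_rates(p_values, alphas=ALPHAS):
    """Share of seeds whose p-value falls below each nominal level."""
    n = len(p_values)
    return {a: sum(p < a for p in p_values) / n for a in alphas}


def simulate_null_p_values(n_seeds=500, seed=0):
    """Stand-in for one estimator fit per seed on a null DGP (assumption).

    Each seed draws a z-statistic under H0 and converts it to a
    two-sided normal p-value; a real study would fit the estimator.
    """
    rng = random.Random(seed)
    norm = NormalDist()
    return [
        2.0 * (1.0 - norm.cdf(abs(rng.gauss(0.0, 1.0))))
        for _ in range(n_seeds)
    ]


rates = rejection_rates(simulate_null_p_values())
for alpha, rate in rates.items():
    print(f"alpha={alpha:.2f}  rejection rate={rate:.3f}")
```

A well-calibrated variance method should produce rates close to each α on a null DGP; rates well above α indicate anti-conservative inference.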

### Fixed
- SyntheticDiD `variance_method="bootstrap"` now computes p-values from the analytical normal-theory formula using the bootstrap SE (matching R's `synthdid::vcov()` convention), rather than an empirical null-distribution formula that is not valid for bootstrap draws. `is_significant` and `significance_stars` are derived from `p_value` and will also change for bootstrap fits. Placebo and jackknife are unchanged. Point estimates and standard errors are unaffected.
- **SyntheticDiD `variance_method="bootstrap"` now runs the paper-faithful refit bootstrap** with R-default warm-start. Re-estimates ω̂_b and λ̂_b via two-pass sparsified Frank-Wolfe on each pairs-bootstrap draw using the fit-time normalized-scale zeta — Arkhangelsky et al. (2021) Algorithm 2 step 2, matching the behavior of R's default `synthdid::vcov(method="bootstrap")` (which rebinds `attr(estimate, "opts")` so the renormalized ω serves as Frank-Wolfe initialization). The Python path threads that warm-start through `compute_sdid_unit_weights(..., init_weights=_sum_normalize(ω̂[boot_control_idx]))` and `compute_time_weights(..., init_weights=λ̂)` on each bootstrap draw. `compute_sdid_unit_weights` and `compute_time_weights` gain a new `init_weights` kwarg; when provided, the Rust top-level fast-path is skipped in favor of the Python two-pass dispatcher (whose inner FW calls still dispatch to Rust). Without this kwarg both helpers remain backward-compatible and keep the Rust fast-path. The previous fixed-weight bootstrap path is removed entirely — it was not paper-faithful and, despite prior documentation claiming otherwise, also did not match R's default bootstrap (the previous R-parity test fixture invoked `synthdid_estimate(weights=...)` without rebinding `opts`, which silently runs fixed-weight, so the 1e-10 parity was between two paths both wrong in the same direction). Coverage MC at the new artifact above quantifies the correctness fix on 3 representative null DGPs. **Users' existing `variance_method="bootstrap"` fits will return materially different SE / p-value / CI values on the next release** — same enum name, corrected semantics. Bootstrap is now ~5–30× slower per fit than the old fixed-weight shortcut (panel-size dependent; warm-start converges faster than cold-start so the slowdown is less than the 10–100× prior estimate). 
The PR #349 follow-on bullets below (analytical p-value dispatch, sqrt((r-1)/r) SE formula, retry-to-B contract) all carry over to the refit path unchanged.
- SyntheticDiD `variance_method="bootstrap"` now computes p-values from the analytical normal-theory formula using the bootstrap SE (matching R's `synthdid::vcov()` convention), rather than an empirical null-distribution formula that is not valid for bootstrap draws. `is_significant` and `significance_stars` are derived from `p_value` and will also change for bootstrap fits. Placebo and jackknife are unchanged. Point estimates are unaffected.
- SyntheticDiD bootstrap SE formula applies the `sqrt((r-1)/r)` correction matching R's synthdid and the placebo SE formula.
- SyntheticDiD bootstrap now retries degenerate resamples (all-control or all-treated, or non-finite `τ_b`) until exactly `n_bootstrap` valid replicates are accumulated, matching R's `synthdid::bootstrap_sample` and Arkhangelsky et al. (2021) Algorithm 2. Previously the Python path counted attempts (with degenerate draws silently dropped), producing fewer valid replicates than requested. A bounded-attempt guard (`20 × n_bootstrap`) prevents pathological-input hangs.
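
The three follow-on behaviors listed above (retry-to-B with a bounded-attempt guard, the `sqrt((r-1)/r)` correction, and the analytical normal-theory p-value) can be sketched together. This is an illustration under stated assumptions, not the library's `_bootstrap_se`: the per-draw estimator is a plain difference in means instead of the Frank-Wolfe refit, the correction is assumed to multiply the sample standard deviation, and raising `RuntimeError` when the guard trips is a guess at the failure mode. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def diff_in_means(outcomes, treated):
    """Hypothetical per-draw estimator (NOT the SDID refit)."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return fmean(t) - fmean(c)


def bootstrap_se(outcomes, treated, estimate_tau, n_bootstrap=200, seed=0):
    """Pairs-bootstrap SE that retries until n_bootstrap valid draws exist."""
    rng = random.Random(seed)
    n = len(outcomes)
    max_attempts = 20 * n_bootstrap  # bounded-attempt guard
    taus = []
    attempts = 0
    while len(taus) < n_bootstrap and attempts < max_attempts:
        attempts += 1
        idx = [rng.randrange(n) for _ in range(n)]
        n_treated = sum(treated[i] for i in idx)
        if n_treated == 0 or n_treated == n:
            continue  # degenerate: all-control or all-treated resample
        tau_b = estimate_tau(
            [outcomes[i] for i in idx], [treated[i] for i in idx]
        )
        if math.isfinite(tau_b):
            taus.append(tau_b)  # only valid replicates count toward B
    if len(taus) < n_bootstrap:
        # Assumption: the real failure behavior is not shown in the diff.
        raise RuntimeError("too many degenerate bootstrap draws")
    r = len(taus)
    mean = fmean(taus)
    sample_sd = math.sqrt(sum((t - mean) ** 2 for t in taus) / (r - 1))
    return math.sqrt((r - 1) / r) * sample_sd


def analytical_p_value(att, se):
    """Two-sided normal-theory p-value from a point estimate and its SE."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(att / se)))


outcomes = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.2]
treated = [False] * 6 + [True] * 4
att = diff_in_means(outcomes, treated)
se = bootstrap_se(outcomes, treated, diff_in_means)
print(f"att={att:.3f}  se={se:.3f}  p={analytical_p_value(att, se):.4f}")
```

Counting valid replicates rather than attempts is what keeps `r` equal to the requested `n_bootstrap` in the SE formula.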

### Changed
- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the new paper-faithful refit bootstrap rejects all survey designs (including pweight-only) with `NotImplementedError`. Pweight-only users can switch to `variance_method="placebo"` or `"jackknife"`. Strata/PSU/FPC users have no SDID variance option on this release. Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation requires a separate derivation (weighted FW solver); sketch and reusable scaffolding pointers are in `docs/methodology/REGISTRY.md` §SyntheticDiD and `TODO.md`.

## [3.2.0] - 2026-04-19

### Added
27 changes: 20 additions & 7 deletions METHODOLOGY_REVIEW.md
@@ -501,13 +501,26 @@ variables appear to the left of the `|` separator.
`zeta=1.0`). Regularization parameters `zeta_omega` and `zeta_lambda` are now
computed automatically from the data noise level (N_tr * sigma^2) as specified in
Appendix D of Arkhangelsky et al. (2021), matching R's default behavior.
4. **Bootstrap SE uses fixed weights matching R's `bootstrap_sample`** (was
re-estimating all weights). The bootstrap variance procedure now holds unit and time
weights fixed at their point estimates and only re-estimates the treatment effect,
matching the approach in R's `synthdid::bootstrap_sample()`.
5. **Default `variance_method` changed to `"placebo"`** matching R's default. The R
package uses placebo variance by default (`synthdid_estimate` returns an object whose
`vcov()` uses the placebo method); our default now matches.
4. **Bootstrap SE is paper-faithful refit (Algorithm 2 step 2), matching R's default
`synthdid::vcov(method="bootstrap")` including its warm-start shape.** On each
pairs-bootstrap draw, ω and λ are re-estimated via Frank-Wolfe on the resampled
panel using the fit-time normalized-scale zeta. The Frank-Wolfe first pass is
warm-started from the fit-time ω (renormalized over the resampled controls via
`_sum_normalize`) and the fit-time λ (unchanged), matching R's `bootstrap_sample`
which rebinds `attr(estimate, "opts")` so those weights serve as the FW
initialization per `update.omega=TRUE` / `update.lambda=TRUE`.
*(Historical note: an earlier release shipped a fixed-weight shortcut here
that matched neither the paper nor R's default vcov; that path was removed
in PR #351 along with its R-parity fixture, which had also been mis-anchored.
The same PR added the warm-start plumbing to `compute_sdid_unit_weights` /
`compute_time_weights` via new `init_weights=` kwargs.)*
5. **Default `variance_method` changed to `"placebo"`** — intentional deviation from
R's default (R's `synthdid::vcov()` defaults to `"bootstrap"`). The library default
is placebo for two reasons: (a) placebo is unconditionally available on pweight-only
survey designs, whereas refit bootstrap rejects every survey design in this release;
(b) placebo sidesteps the ~5–30× slowdown of per-draw Frank-Wolfe re-estimation in
refit bootstrap. See REGISTRY.md §SyntheticDiD `Note (default variance_method
deviation from R)` for details.
6. **Deprecated `lambda_reg` and `zeta` params; new params are `zeta_omega` and
`zeta_lambda`**. The old parameters had unclear semantics and did not correspond to
the paper's notation. The new parameters directly match the paper and R package
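
The warm-start described in item 4 above (fit-time ω renormalized over the resampled controls, fit-time λ passed unchanged) can be illustrated with a small sketch. The body of the library's `_sum_normalize` is not shown in this diff, so the version below is an assumption: it divides by the sum and falls back to uniform weights when the resampled weights sum to zero. `sum_normalize` and `warm_start_unit_weights` are hypothetical names.

```python
def sum_normalize(weights):
    """Rescale weights to sum to one.

    Assumption: fall back to uniform weights when the input sums to
    zero, which can happen if a draw picks only zero-weight controls.
    """
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def warm_start_unit_weights(omega_hat, boot_control_idx):
    """Frank-Wolfe initialization for one bootstrap draw.

    omega_hat: fit-time unit weights over the original controls.
    boot_control_idx: indices of controls drawn (with replacement).
    """
    return sum_normalize([omega_hat[i] for i in boot_control_idx])


omega_hat = [0.5, 0.3, 0.2, 0.0]

# Control 0 drawn twice, control 1 not drawn at all.
init = warm_start_unit_weights(omega_hat, [0, 0, 2, 3])
print(init)

# Only the zero-weight control drawn: uniform fallback.
print(warm_start_unit_weights(omega_hat, [3, 3]))
```

In the PR's description the resulting vector is what gets passed as `init_weights=` to `compute_sdid_unit_weights`; the time weights need no renormalization because the bootstrap resamples units, not periods.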
6 changes: 3 additions & 3 deletions TODO.md
@@ -100,7 +100,8 @@ Deferred items from PR reviews that were not addressed before merge.
| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
| SyntheticDiD: ship paper-faithful refit bootstrap (Arkhangelsky et al. 2021 Algorithm 2, re-estimating ω and λ per draw) as an opt-in `bootstrap_weights="refit"` kwarg. Current bootstrap matches R's fixed-weight shortcut. | `synthetic_did.py::_bootstrap_se` | follow-up | Low |
| **SDID + survey designs** (capability regression in this release; both pweight-only AND strata/PSU/FPC). The previous release's fixed-weight bootstrap accepted strata/PSU/FPC via Rao-Wu rescaled bootstrap; the new paper-faithful refit bootstrap rejects all survey designs because Rao-Wu composed with Frank-Wolfe re-estimation requires its own derivation. The follow-up needs a **weighted Frank-Wolfe** variant of `_sc_weight_fw` accepting per-unit weights in the loss and regularization (`Σ rw_i ω_i Y_i,pre` / `ζ² Σ rw_i ω_i²`), threaded through `compute_sdid_unit_weights` / `compute_time_weights`. Reusable scaffolding (`generate_rao_wu_weights`, split into `rw_control` / `rw_treated`, degenerate-retry, treated-mean weighting) is recoverable from the pre-rewrite `_bootstrap_se` body via `git show 91082e5:diff_diff/synthetic_did.py` (PR #351 "Replace SDID fixed-weight bootstrap with paper-faithful refit"). Compose-after-unweighted-FW does not work — silently reproduces the fixed-weight Rao-Wu behavior we removed. Validation: re-use the coverage MC harness with a stratified DGP, confirm near-nominal rejection rates against placebo-SE tracking. See REGISTRY.md §SyntheticDiD `Note (deferred survey + bootstrap composition)` for the sketch. | `synthetic_did.py::fit`, `synthetic_did.py::_bootstrap_se`, `utils.py::_sc_weight_fw` | follow-up | Medium |
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |

#### Performance

@@ -126,8 +127,7 @@ Deferred items from PR reviews that were not addressed before merge.
| `EDiDBootstrapResults` cross-reference is ambiguous — class is exported from both `diff_diff` and `diff_diff.efficient_did_bootstrap`, producing 3 "more than one target found" warnings. Add `:noindex:` to one source or use full-path refs | `diff_diff/efficient_did_results.py`, `docs/api/efficient_did.rst` | — | Low |
| Tracked Sphinx autosummary stubs in `docs/api/_autosummary/*.rst` are stale — every sphinx build regenerates them with new attributes (e.g., `coef_var`, `survey_metadata`) that have been added to result classes. Either commit a refresh or move the directory to `.gitignore` and treat as build output. Also 6 untracked stubs exist for newer estimators (`WooldridgeDiD`, `SimulationMDEResults`, etc.) that have never been committed. | `docs/api/_autosummary/` | — | Low |
| HonestDiD `test_m0_short_circuit` uses wall-clock `elapsed < 0.5s` as a proxy for "short-circuit path taken" instead of calling the full optimizer. Replace with a direct correctness signal (mock/spy the optimizer or check a state flag) so the test doesn't depend on CI timing. Not flaky today at 500ms, but load-bearing correctness on a timing proxy is brittle. | `tests/test_methodology_honest_did.py:246` | — | Low |
| SyntheticDiD: coverage Monte Carlo study — empirical 95% CI coverage for placebo / fixed-bootstrap / jackknife on representative DGPs; document in REGISTRY.md to support the fixed-weight deviation label and calibrate user expectations. | `benchmarks/`, `docs/methodology/REGISTRY.md` | follow-up | Low |
| SyntheticDiD: rename internal `placebo_effects` variable to `null_or_bootstrap_effects` (or `variance_effects`). Misleading name across the bootstrap/placebo/jackknife dispatch paths; low-risk refactor. | `synthetic_did.py`, `synthetic_did_results.py` | follow-up | Low |
| SyntheticDiD: rename internal `placebo_effects` variable to `variance_effects` (or `resampled_effects`). Misleading name across the placebo/bootstrap/jackknife dispatch paths — holds three different contents depending on variance method. Low-risk refactor; user-facing field rename should preserve `placebo_effects` as a deprecated alias for one release. | `synthetic_did.py`, `results.py` | follow-up | Medium |
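
The rename row above asks that `placebo_effects` survive as a deprecated alias for one release. One common way to do that on a results object is a read-only property that warns and forwards to the new field. The sketch below assumes a dataclass-style results class; `ExampleResults` is a hypothetical stand-in, not the library's results class.

```python
import warnings
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExampleResults:
    """Hypothetical results container used only for illustration."""

    # New canonical name for the per-replicate effects.
    variance_effects: List[float] = field(default_factory=list)

    @property
    def placebo_effects(self) -> List[float]:
        """Deprecated alias kept for one release."""
        warnings.warn(
            "placebo_effects is deprecated; use variance_effects",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self.variance_effects


res = ExampleResults(variance_effects=[0.1, -0.2, 0.05])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = res.placebo_effects

print(legacy, [str(w.message) for w in caught])
```

Because the property is not an annotated field, it stays out of the dataclass `__init__` and `repr`, so only the new name appears in constructed objects.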

---
