5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0):** `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` are bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (used for the dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change under the REPORTING.md stability policy. BusinessReport (BR) and DiagnosticReport (DR) now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), and `reference` (citation pointer). BR's summary emits the short `name` right after the headline, DR's overall-interpretation paragraph does the same, and both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); the others emit a fixed tag, because the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar (disambiguating horizon / group tables is tracked under gap #9). See the "Target parameter" section of `docs/methodology/REPORTING.md`.

### Fixed
- SyntheticDiD `variance_method="bootstrap"` now computes p-values from the analytical normal-theory formula using the bootstrap SE (matching R's `synthdid::vcov()` convention), rather than an empirical null-distribution formula that is not valid for bootstrap draws. `is_significant` and `significance_stars` are derived from `p_value` and will also change for bootstrap fits. Placebo and jackknife are unchanged. Point estimates and standard errors are unaffected.
- SyntheticDiD bootstrap SE now scales the sample standard deviation of the `r` bootstrap replicates by `sqrt((r-1)/r)`, matching R's synthdid and the placebo SE formula.
- SyntheticDiD bootstrap now retries degenerate resamples (all-control or all-treated, or non-finite `τ_b`) until exactly `n_bootstrap` valid replicates are accumulated, matching R's `synthdid::bootstrap_sample` and Arkhangelsky et al. (2021) Algorithm 2. Previously the Python path counted attempts (with degenerate draws silently dropped), producing fewer valid replicates than requested. A bounded-attempt guard (`20 × n_bootstrap`) prevents pathological-input hangs.
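The three bootstrap fixes above can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not the `diff_diff` implementation: `bootstrap_se`, `normal_p_value`, and the `estimate_fn(indices)` callback are hypothetical names; units `0..n_control-1` are assumed to be controls; `estimate_fn` stands in for the fixed-weight SDID re-estimate; and the sketch assumes the `20 × n_bootstrap` guard raises a `RuntimeError` when exhausted (the entry does not say what the library does).

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def bootstrap_se(
    estimate_fn: Callable[[Sequence[int]], float],
    n_units: int,
    n_control: int,
    n_bootstrap: int,
    seed: int = 0,
    max_attempts_factor: int = 20,
) -> Tuple[float, List[float]]:
    """Hypothetical unit-resampling bootstrap that counts valid replicates, not attempts.

    Units 0..n_control-1 are controls, the rest treated. Requires n_bootstrap >= 2.
    """
    rng = random.Random(seed)
    draws: List[float] = []
    max_attempts = max_attempts_factor * n_bootstrap  # bounded-attempt guard
    attempts = 0
    while len(draws) < n_bootstrap:
        if attempts >= max_attempts:
            # Assumption: the guard raises; the real library may behave differently.
            raise RuntimeError("too many degenerate bootstrap draws")
        attempts += 1
        idx = [rng.randrange(n_units) for _ in range(n_units)]
        n_co = sum(1 for i in idx if i < n_control)
        if n_co == 0 or n_co == n_units:
            continue  # degenerate (all-treated or all-control) resample: retry
        tau_b = estimate_fn(idx)
        if not math.isfinite(tau_b):
            continue  # non-finite replicate: retry
        draws.append(tau_b)
    r = len(draws)
    # sqrt((r-1)/r) times the sample SD equals the population SD of the replicates.
    se = math.sqrt((r - 1) / r) * statistics.stdev(draws)
    return se, draws


def normal_p_value(att: float, se: float) -> float:
    """Two-sided normal-theory p-value, 2 * (1 - Phi(|att| / se))."""
    z = abs(att) / se
    return math.erfc(z / math.sqrt(2.0))


# Toy stand-in for the SDID re-estimate: treated mean minus control mean.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0]  # units 0-3 control, 4-5 treated


def diff_in_means(idx: Sequence[int]) -> float:
    treated = [values[i] for i in idx if i >= 4]
    control = [values[i] for i in idx if i < 4]
    return statistics.mean(treated) - statistics.mean(control)


se, draws = bootstrap_se(diff_in_means, n_units=6, n_control=4, n_bootstrap=200, seed=42)
p = normal_p_value(diff_in_means(range(6)), se)
```

Counting valid replicates rather than attempts is what guarantees `len(draws) == n_bootstrap`; an attempt-counting loop returns fewer draws whenever a resample is degenerate. The p-value uses only the bootstrap SE, so it no longer depends on the shape of the empirical bootstrap distribution.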

## [3.2.0] - 2026-04-19

### Added
3 changes: 3 additions & 0 deletions TODO.md
@@ -99,6 +99,7 @@ Deferred items from PR reviews that were not addressed before merge.
| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` staggered-timing reduction: Phase 2a requires exactly 2 time periods and raises on `>2` periods with or without `first_treat_col`. A "last-cohort subgroup" reduction scheme (slice to max-cohort's 2-period window) could lift this in a targeted follow-up PR before full Phase 2b multi-period aggregation. | `diff_diff/had.py::_validate_had_panel` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
| SyntheticDiD: ship paper-faithful refit bootstrap (Arkhangelsky et al. 2021 Algorithm 2, re-estimating ω and λ per draw) as an opt-in `bootstrap_weights="refit"` kwarg. Current bootstrap matches R's fixed-weight shortcut. | `synthetic_did.py::_bootstrap_se` | follow-up | Low |

#### Performance

@@ -124,6 +125,8 @@ Deferred items from PR reviews that were not addressed before merge.
| `EDiDBootstrapResults` cross-reference is ambiguous — class is exported from both `diff_diff` and `diff_diff.efficient_did_bootstrap`, producing 3 "more than one target found" warnings. Add `:noindex:` to one source or use full-path refs | `diff_diff/efficient_did_results.py`, `docs/api/efficient_did.rst` | — | Low |
| Tracked Sphinx autosummary stubs in `docs/api/_autosummary/*.rst` are stale — every sphinx build regenerates them with new attributes (e.g., `coef_var`, `survey_metadata`) that have been added to result classes. Either commit a refresh or move the directory to `.gitignore` and treat as build output. Also 6 untracked stubs exist for newer estimators (`WooldridgeDiD`, `SimulationMDEResults`, etc.) that have never been committed. | `docs/api/_autosummary/` | — | Low |
| HonestDiD `test_m0_short_circuit` uses wall-clock `elapsed < 0.5s` as a proxy for "short-circuit path taken" instead of calling the full optimizer. Replace with a direct correctness signal (mock/spy the optimizer or check a state flag) so the test doesn't depend on CI timing. Not flaky today at 500ms, but load-bearing correctness on a timing proxy is brittle. | `tests/test_methodology_honest_did.py:246` | — | Low |
| SyntheticDiD: coverage Monte Carlo study — empirical 95% CI coverage for placebo / fixed-bootstrap / jackknife on representative DGPs; document in REGISTRY.md to support the fixed-weight deviation label and calibrate user expectations. | `benchmarks/`, `docs/methodology/REGISTRY.md` | follow-up | Low |
| SyntheticDiD: rename internal `placebo_effects` variable to `null_or_bootstrap_effects` (or `variance_effects`). Misleading name across the bootstrap/placebo/jackknife dispatch paths; low-risk refactor. | `synthetic_did.py`, `synthetic_did_results.py` | follow-up | Low |

---

153 changes: 153 additions & 0 deletions benchmarks/R/generate_sdid_bootstrap_parity_fixture.R
@@ -0,0 +1,153 @@
#!/usr/bin/env Rscript
# Generate an R-parity fixture for SyntheticDiD bootstrap SE.
#
# The fixture pins B non-degenerate bootstrap indices and the resulting
# R-computed bootstrap SE so that the Python R-parity test can feed the
# same indices through `_bootstrap_se` and expect bit-identical SE.
#
# Usage:
# Rscript benchmarks/R/generate_sdid_bootstrap_parity_fixture.R
#
# Output:
# tests/data/sdid_bootstrap_indices_r.json

library(synthdid)
library(jsonlite)

# Panel data — must match TestJackknifeSERParity.Y_FLAT in
# tests/test_methodology_sdid.py (23 units × 8 periods = 184 values).
Y_flat <- c(
12.459567808595292, 13.223481099962006, 13.658348196773856,
13.844051055863837, 13.888854636247594, 14.997677893012806,
14.494587375086788, 15.851128751231856, 10.527006629006900,
11.317894498245712, 9.780141451338988, 10.635177418486473,
11.007911133698329, 11.692547000930196, 11.532445341187122,
10.646344091442769, 5.779122815714058, 5.265746845809725,
4.828411925858962, 5.933107464969151, 6.926403492435262,
7.566662873481445, 6.703831577045862, 7.090431451464497,
6.703722507026075, 6.453676391630379, 7.301398891231049,
7.726092498224848, 8.191225590595401, 7.669210641906834,
8.526151391259425, 7.715169490073769, 8.005628186152748,
7.523978158267692, 9.049143286687135, 9.434081283341134,
9.450553333966674, 10.310163601090766, 9.867729569702721,
9.846941461031360, 10.459939463684098, 11.887686682638062,
11.249912950470762, 12.093459993478538, 12.226598684379407,
11.973716581337246, 13.453499811673423, 13.287085704636093,
10.317796666844943, 10.819165701226847, 10.824437736488752,
9.582976251622744, 11.521962769964540, 11.495903971828724,
12.072136575632017, 12.570433156881965, 12.435827624848123,
13.750744970607428, 13.567397714461393, 14.218726703934166,
14.459837938730677, 14.659912736018788, 14.077914185301429,
14.854380461280002, 10.770274645112915, 11.275621916712160,
12.137534572839927, 12.531125692916383, 12.678920118269170,
12.304148175294246, 12.497145874675160, 14.103389828901550,
10.560062989643855, 10.755394606294518, 10.518678427483797,
11.721841324084256, 11.607272952190801, 11.924464521898100,
12.782516039349641, 13.026729430318186, 12.546145790341205,
13.409407032231695, 14.079787980063543, 13.128838312144593,
13.553836458429620, 13.718363411441658, 13.854625752117343,
14.924224028489123, 11.906891367097627, 12.128784222882244,
11.404804355878456, 13.130649630134753, 12.173021974919472,
12.859165585526416, 12.895280738363951, 13.345233593320895,
10.435966548001499, 10.663839793569295, 11.030422432974012,
11.033668451079661, 11.324277503659044, 11.045836529045589,
11.985219205566086, 12.220060940064094, 14.722723885094736,
15.772410109968900, 15.256969467031452, 15.568564129971197,
16.666133193788099, 16.405462433247578, 17.202870693537243,
17.289652559976691, 7.760317864391456, 8.460282811921017,
9.462415007659978, 9.956467084312777, 9.726218110324272,
10.272688229133685, 11.134101608790994, 11.592584658589104,
7.747112683063268, 8.706521663648207, 8.170907672905205,
8.679537720718859, 8.962718814069811, 8.861932954235140,
9.383430460745986, 9.891050023644237, 9.728955313568255,
9.231765881057163, 9.555677785583788, 10.420693590160205,
9.844078095298698, 10.651913064308546, 10.196489890710358,
11.855847076501993, 9.218785934915712, 9.133582433258733,
10.048827580363175, 9.952567508276010, 10.385962432276619,
11.596546220044132, 11.164945662130776, 11.016817405176500,
10.145044557120791, 10.921420538928436, 11.642624728800259,
10.730067509380019, 11.753738913724906, 11.868862794274008,
12.574196556067037, 12.311524695461632, 10.800710206252880,
12.817967597577915, 12.705627126180516, 12.497850142478354,
12.148734571851643, 13.494742486942219, 13.714835068828613,
13.770060323710533, 10.010857300549947, 10.787315152039971,
11.050238955584605, 11.063282099053561, 10.834793458278272,
17.153286194944865, 17.380010096861866, 16.984758489324143,
6.913302966281331, 6.938279687001069, 7.537129527669741,
7.063822443245238, 7.531238453797332, 13.853711102827464,
13.812711128345372, 14.204067444347162, 13.694867606609098,
12.929992273442151, 14.397345491024691, 15.116119455987304,
15.860226513457558, 19.442026093187646, 19.855029109494353,
20.377546194927845
)
stopifnot(length(Y_flat) == 184L)

N0 <- 20L # controls
N1 <- 3L # treated
T0 <- 5L # pre
T1 <- 3L # post
N <- N0 + N1
T <- T0 + T1

Y <- matrix(Y_flat, nrow = N, ncol = T, byrow = TRUE)

# Fit once to obtain the omega / lambda that the bootstrap holds fixed
tau_hat <- synthdid_estimate(Y, N0, T0)
weights <- attr(tau_hat, "weights")

# Bootstrap loop — record indices and compute tau_b with fixed weights,
# mimicking synthdid::vcov(method="bootstrap") per the package source.
# Retry on degenerate draws (no controls or no treated) so the fixture
# contains B non-degenerate rows and Python's `_bootstrap_indices` seam
# consumes them all without skipping.
set.seed(42L)
B <- 200L

# Rescale weights to sum to 1; fall back to uniform weights if the sum is not positive.
sum_normalize <- function(v) {
  s <- sum(v)
  if (s > 0) v / s else rep(1 / length(v), length(v))
}

indices_matrix <- matrix(0L, nrow = B, ncol = N)
tau_boot <- numeric(B)
b <- 1L
while (b <= B) {
  ind <- sample(seq_len(N), replace = TRUE)
  n_co_b <- sum(ind <= N0)
  if (n_co_b == 0L || n_co_b == N) next  # degenerate: retry
  weights_boot <- weights
  weights_boot$omega <- sum_normalize(weights$omega[sort(ind[ind <= N0])])
  tau_boot_b <- synthdid_estimate(
    Y[sort(ind), ], sum(ind <= N0), T0,
    weights = weights_boot
  )
  indices_matrix[b, ] <- ind
  tau_boot[b] <- as.numeric(tau_boot_b)
  b <- b + 1L
}

se <- sqrt((B - 1) / B) * sd(tau_boot)

output_path <- file.path("tests", "data", "sdid_bootstrap_indices_r.json")
dir.create(dirname(output_path), recursive = TRUE, showWarnings = FALSE)
write_json(
list(
indices = indices_matrix,
seed = 42L,
n_bootstrap = B,
se = se,
att = as.numeric(tau_hat),
metadata = list(
r_version = R.version.string,
synthdid_version = as.character(packageVersion("synthdid")),
panel_N = N,
panel_T = T,
N0 = N0, T0 = T0
)
),
output_path,
pretty = TRUE,
auto_unbox = TRUE,
digits = NA # preserve full float64 precision
)
cat(sprintf("Wrote %s (B=%d, SE=%.15g)\n", output_path, B, se))
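On the Python side, the JSON written by this script could be consumed roughly as follows. This is a hypothetical sketch: `load_sdid_bootstrap_fixture` is not the name used by the actual test, and shifting R's 1-based positions (`sample(seq_len(N))` draws values in `1..N`) to 0-based positions is an assumption about what the `_bootstrap_indices` seam expects. The sketch re-checks the invariants the script promises: exactly `n_bootstrap` rows, each a resample of `panel_N` units, and no all-control or all-treated row.

```python
import json
from typing import List, Tuple


def load_sdid_bootstrap_fixture(path: str) -> Tuple[List[List[int]], float, float]:
    """Load sdid_bootstrap_indices_r.json and validate it (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        fx = json.load(fh)
    meta = fx["metadata"]
    n_units, n_control = int(meta["panel_N"]), int(meta["N0"])
    # R indices are 1-based unit positions; shift them to 0-based for Python.
    indices = [[int(i) - 1 for i in row] for row in fx["indices"]]
    if len(indices) != int(fx["n_bootstrap"]):
        raise ValueError("row count does not match n_bootstrap")
    for b, row in enumerate(indices):
        if len(row) != n_units or not all(0 <= i < n_units for i in row):
            raise ValueError(f"row {b} is not a resample of {n_units} units")
        n_co = sum(1 for i in row if i < n_control)
        if n_co in (0, n_units):
            raise ValueError(f"row {b} is degenerate (all-treated or all-control)")
    return indices, float(fx["se"]), float(fx["att"])
```

Failing loudly on a degenerate row matters because the script's retry loop is what guarantees the Python side can consume every row without skipping; a regenerated fixture that broke that invariant would otherwise desynchronize the two bootstraps silently.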