From 5b76c37b02b18e22629a3cad68cd8ea663a9ba63 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 19 Apr 2026 20:05:14 -0400 Subject: [PATCH 1/2] Release 3.2.0: BusinessReport preview, dCDH survey completion, silent-failures audit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 18 +++++++++++++++++- CITATION.cff | 4 ++-- diff_diff/__init__.py | 2 +- diff_diff/guides/llms-full.txt | 2 +- pyproject.toml | 2 +- rust/Cargo.toml | 2 +- 6 files changed, 23 insertions(+), 7 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 17ec76c4..ad6ec0ac 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,13 +7,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [3.2.0] - 2026-04-19 + ### Added -- **`BusinessReport` and `DiagnosticReport` (experimental preview)** - practitioner-ready output layer. `BusinessReport(results, ...)` produces plain-English narrative summaries (`.summary()`, `.full_report()`, `.export_markdown()`, `.to_dict()`) from any of the 16 fitted result types. `DiagnosticReport(results, ...)` orchestrates the existing diagnostic battery (parallel trends, pre-trends power, HonestDiD sensitivity, Goodman-Bacon, heterogeneity, design-effect, EPV) plus estimator-native diagnostics for SyntheticDiD (`pre_treatment_fit`, weight concentration, in-time placebo, zeta sensitivity) and TROP (factor-model fit metrics). Both classes expose an AI-legible `to_dict()` schema (single source of truth; prose renders from the dict). BR auto-constructs DR by default so summaries mention pre-trends, robustness, and design-effect findings in one call. See `docs/methodology/REPORTING.md` for methodology deviations including the no-traffic-light-gates decision, pre-trends verdict thresholds (0.05 / 0.30), and power-aware phrasing driven by `compute_pretrends_power`. **Both schemas are marked experimental in this release** - wording, verdict thresholds, and schema shape will change; do not anchor downstream tooling on them yet. +- **`BusinessReport` and `DiagnosticReport` (experimental preview)** (PR #318) - practitioner-ready output layer. `BusinessReport(results, ...)` produces plain-English narrative summaries (`.summary()`, `.full_report()`, `.export_markdown()`, `.to_dict()`) from any of the 16 fitted result types. `DiagnosticReport(results, ...)` orchestrates the existing diagnostic battery (parallel trends, pre-trends power, HonestDiD sensitivity, Goodman-Bacon, heterogeneity, design-effect, EPV) plus estimator-native diagnostics for SyntheticDiD (`pre_treatment_fit`, weight concentration, in-time placebo, zeta sensitivity) and TROP (factor-model fit metrics). Both classes expose an AI-legible `to_dict()` schema (single source of truth; prose renders from the dict). BR auto-constructs DR by default so summaries mention pre-trends, robustness, and design-effect findings in one call. See `docs/methodology/REPORTING.md` for methodology deviations including the no-traffic-light-gates decision, pre-trends verdict thresholds (0.05 / 0.30), and power-aware phrasing driven by `compute_pretrends_power`. **Both schemas are marked experimental in this release** - wording, verdict thresholds, and schema shape will change; do not anchor downstream tooling on them yet. +- **Kernel / local-linear / nonparametric infrastructure** (PRs #327, #335) - bandwidth selector, local linear regression, HC2 / Bell-McCaffrey variance helpers, and a port of R `nprobust`'s point-estimate path. Foundation for the upcoming `HeterogeneousAdoptionDiD` estimator (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2024 — "DiD with no untreated group"). Released as internal modules with full test coverage (`tests/test_bandwidth_selector.py`, `tests/test_local_linear.py`, `tests/test_linalg_hc2_bm.py`, `tests/test_nprobust_port.py`); the user-facing estimator ships in a later phase. +- **Cell-period IF allocator for dCDH survey variance (Class A contract)** (PR #323) - replaces the group-level allocator `ψ_i = ψ_g * (w_i / W_g)` with a cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell for the DID_l replicate-weight ATT path. Is the allocator shape that the v3.2.0 heterogeneity and bootstrap extensions below build on. Documents the post-period attribution convention in REGISTRY.md with a hand-computed row-sum identity test. ### Changed - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released. - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). - **`ChaisemartinDHaultfoeuille.fit(survey_design=..., n_bootstrap > 0)` now supports within-group-varying PSU** — the PSU-level Hall-Mammen wild multiplier bootstrap has been extended from a group-level PSU map (one multiplier per group) to a cell-level PSU map (one multiplier per `(g, t)` cell's PSU). A dispatcher in `_compute_dcdh_bootstrap` detects PSU-within-group-constant regimes (including PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE is bit-identical to the previous release (guarded by the new `test_bootstrap_se_matches_pre_pr4_baseline` and the pre-existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to multiple PSUs receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. Multi-horizon bootstraps draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution. Closes the last `NotImplementedError` gate in the dCDH survey contract; replicate-weight variance and `n_bootstrap > 0` remain mutually exclusive by construction. **Scope note:** panels with *terminal missingness* where the terminally-missing group is in a cohort whose other groups still contribute at the missing period now raise a targeted `ValueError` on every survey variance path that uses the cell-period allocator: Binder TSL with within-group-varying PSU, Rao-Wu replicate-weight ATT (which always uses the cell allocator per the Class A contract shipped in PR #323), and the cell-level wild PSU bootstrap. Cohort-recentering leaks centered IF mass onto cells with no positive-weight observations, which the cell-period allocator cannot attach to any observation/PSU. This closes a silent mass-drop bug the cell-period allocator introduced across all three paths in v3.1.x; pre-process the panel to remove terminal missingness (drop late-exit groups or trim to a balanced sub-panel) as the documented workaround. For Binder TSL only, using an explicit `psu=` routes through the legacy group-level allocator where the row-sum identity makes the two allocators statistically equivalent. Replicate-weight ATT and within-group-varying-PSU bootstrap have no such allocator fallback — the panel itself must be pre-processed. PSU-within-group-constant Binder TSL (including PSU=group auto-inject) is unaffected. +- **Performance review: practitioner-scale scenarios + benchmark harness extension** (PR #333) - new `docs/performance-scenarios.md` documents 5-7 realistic practitioner workflows (marketing lift, geo-experiment, BRFSS state-policy, dCDH reversible treatment) grounded in the practitioner docs and the paper literature, not cookie-cutter textbook data. `benchmarks/speed_review/` extended with practitioner-scale scripts and per-backend bit-identity baselines. Baselines refreshed against current main. Finding: the biggest leverage areas are bootstrap resampling loops and per-replicate survey-design rebuilds in the bootstrap path; documented in `docs/performance-plan.md` for follow-up optimization PRs. +- **Wall-clock timing tests excluded from default CI** (PRs #330, #336) - `TestCallawaySantAnnaSEAccuracy.test_timing_performance` and `TestPerformanceRegression` marked `@pytest.mark.slow`, removing false-positive CI failures from runner-noise variance (BLAS path variation, neighbor VM contention). Tests remain runnable via `pytest -m slow` for ad-hoc local benchmarking; the perf-review harness above is the principled replacement for CI-gated performance tracking. + +### Fixed +- **Silent-failures audit: axis A** (PR #334) — minor solver paths numerical-precision / scale-fragility closeouts, completing the SDID extreme-Y-scale work started in v3.1.2. +- **Silent-failures audit: axis C & J** (PR #339) — B-spline derivative warning scope broadened; `SurveyPowerConfig` stale-cache wording narrowed. +- **Silent-failures audit: axis E** (PR #331) — row-drop counters surfaced across estimator paths so silent validator row-drops leave an explicit count on the result. +- **Silent-failures audit: axis G** (PR #337) — Rust vs Python backend edge-case parity tests added for rank-deficient, extreme-scale, and constant-column inputs. +- **SyntheticDiD diagnostic Y-normalization parity** (PR #328) — extends the PR #312 catastrophic-cancellation fix from the main fit path into `SyntheticDiDResults.in_time_placebo()` and `.sensitivity_to_zeta_omega()`. Diagnostics now apply the same `Y_shift / Y_scale` normalization the main fit uses, pass `zeta / Y_scale` and a normalized `min_decrease` into Frank-Wolfe, then rescale `att` / `pre_fit_rmse` back to original-Y units. +- **TROP bootstrap failure-rate guards** (PR #324) — alternating-minimization bootstrap loops now emit a `UserWarning` on silent high-failure-rate runs (LOOCV and bootstrap aggregation paths both covered); attempt-count-based warning replaces the previous observation-count denominator that could silently mask sparse runs. +- **`simulate_power()` failure-count surface + narrow except clause** (PR #326) — power-simulation replicate loop narrows the exception whitelist from `except Exception` to estimation/data-path failures (`TypeError` and friends now propagate, not silently absorb), and surfaces `n_simulation_failures` on `SimulationPowerResults`. Failure count included in `summary()` and `to_dict()`. ## [3.1.3] - 2026-04-18 @@ -1327,6 +1342,7 @@ for the full feature history leading to this release. [2.1.2]: https://github.com/igerber/diff-diff/compare/v2.1.1...v2.1.2 [2.1.1]: https://github.com/igerber/diff-diff/compare/v2.1.0...v2.1.1 [2.1.0]: https://github.com/igerber/diff-diff/compare/v2.0.3...v2.1.0 +[3.2.0]: https://github.com/igerber/diff-diff/compare/v3.1.3...v3.2.0 [3.1.3]: https://github.com/igerber/diff-diff/compare/v3.1.2...v3.1.3 [3.1.2]: https://github.com/igerber/diff-diff/compare/v3.1.1...v3.1.2 [3.1.1]: https://github.com/igerber/diff-diff/compare/v3.1.0...v3.1.1 diff --git a/CITATION.cff b/CITATION.cff index 8e0d8a4b..5e94da02 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -7,8 +7,8 @@ authors: family-names: Gerber orcid: "https://orcid.org/0009-0009-3275-5591" license: MIT -version: "3.1.3" -date-released: "2026-04-18" +version: "3.2.0" +date-released: "2026-04-19" doi: "10.5281/zenodo.19646175" identifiers: - type: doi diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index 8145f1da..3241ec92 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -252,7 +252,7 @@ ETWFE = WooldridgeDiD DCDH = ChaisemartinDHaultfoeuille -__version__ = "3.1.3" +__version__ = "3.2.0" __all__ = [ # Estimators "DifferenceInDifferences", diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index 40bb5ae2..26b02514 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -2,7 +2,7 @@ > A Python library for Difference-in-Differences causal inference analysis. Provides sklearn-like estimators with statsmodels-style output for econometric analysis. -- Version: 3.1.3 +- Version: 3.2.0 - Repository: https://github.com/igerber/diff-diff - License: MIT - Dependencies: numpy, pandas, scipy (no statsmodels dependency) diff --git a/pyproject.toml b/pyproject.toml index 22753813..c977c396 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "maturin" [project] name = "diff-diff" -version = "3.1.3" +version = "3.2.0" description = "Difference-in-Differences causal inference with sklearn-like API. Callaway-Sant'Anna, Synthetic DiD, Honest DiD, event studies, parallel trends." readme = "README.md" license = "MIT" diff --git a/rust/Cargo.toml b/rust/Cargo.toml index 35fa0c6d..1aad72a2 100644 --- a/rust/Cargo.toml +++ b/rust/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "diff_diff_rust" -version = "3.1.3" +version = "3.2.0" edition = "2021" rust-version = "1.84" description = "Rust backend for diff-diff DiD library" From 852953b61b44f2d8e541ffd7e968f4f262e68b81 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 19 Apr 2026 20:19:09 -0400 Subject: [PATCH 2/2] Simplify CITATION.cff to concept DOI only; drop identifiers list MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drop the CITATION.cff `identifiers:` block. The top-level `doi:` field already carries the concept DOI (10.5281/zenodo.19646175), and Zenodo auto-mints a versioned DOI for every release regardless of what CITATION.cff says. Listing versioned DOIs in the CFF file was redundant with Zenodo's own record index and required a follow-up CFF edit per release — this change eliminates both. Tools that read CFF (GitHub's "Cite this repository" button, reference managers, JOSS workflow) read the top-level `doi:` field first, so the concept DOI is still the primary citation path. Anyone wanting a version-specific DOI can look it up on Zenodo's project page, where every release's versioned DOI is listed. Also update the CHANGELOG entry for the Zenodo DOI work to describe the simplified file shape. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- CITATION.cff | 7 ------- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ad6ec0ac..34c1fb24 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,7 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **Cell-period IF allocator for dCDH survey variance (Class A contract)** (PR #323) - replaces the group-level allocator `ψ_i = ψ_g * (w_i / W_g)` with a cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell for the DID_l replicate-weight ATT path. Is the allocator shape that the v3.2.0 heterogeneity and bootstrap extensions below build on. Documents the post-period attribution convention in REGISTRY.md with a hand-computed row-sum identity test. ### Changed -- Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released. +- Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). `CITATION.cff` carries the concept DOI as its top-level `doi:` field — Zenodo auto-mints a versioned DOI for every release, but the CFF file tracks the concept DOI only so it doesn't need a follow-up edit per release. DOI was minted by Zenodo when v3.1.3 was released. - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). - **`ChaisemartinDHaultfoeuille.fit(survey_design=..., n_bootstrap > 0)` now supports within-group-varying PSU** — the PSU-level Hall-Mammen wild multiplier bootstrap has been extended from a group-level PSU map (one multiplier per group) to a cell-level PSU map (one multiplier per `(g, t)` cell's PSU). A dispatcher in `_compute_dcdh_bootstrap` detects PSU-within-group-constant regimes (including PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE is bit-identical to the previous release (guarded by the new `test_bootstrap_se_matches_pre_pr4_baseline` and the pre-existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to multiple PSUs receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. Multi-horizon bootstraps draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution. Closes the last `NotImplementedError` gate in the dCDH survey contract; replicate-weight variance and `n_bootstrap > 0` remain mutually exclusive by construction. **Scope note:** panels with *terminal missingness* where the terminally-missing group is in a cohort whose other groups still contribute at the missing period now raise a targeted `ValueError` on every survey variance path that uses the cell-period allocator: Binder TSL with within-group-varying PSU, Rao-Wu replicate-weight ATT (which always uses the cell allocator per the Class A contract shipped in PR #323), and the cell-level wild PSU bootstrap. Cohort-recentering leaks centered IF mass onto cells with no positive-weight observations, which the cell-period allocator cannot attach to any observation/PSU. This closes a silent mass-drop bug the cell-period allocator introduced across all three paths in v3.1.x; pre-process the panel to remove terminal missingness (drop late-exit groups or trim to a balanced sub-panel) as the documented workaround. For Binder TSL only, using an explicit `psu=` routes through the legacy group-level allocator where the row-sum identity makes the two allocators statistically equivalent. Replicate-weight ATT and within-group-varying-PSU bootstrap have no such allocator fallback — the panel itself must be pre-processed. PSU-within-group-constant Binder TSL (including PSU=group auto-inject) is unaffected. - **Performance review: practitioner-scale scenarios + benchmark harness extension** (PR #333) - new `docs/performance-scenarios.md` documents 5-7 realistic practitioner workflows (marketing lift, geo-experiment, BRFSS state-policy, dCDH reversible treatment) grounded in the practitioner docs and the paper literature, not cookie-cutter textbook data. `benchmarks/speed_review/` extended with practitioner-scale scripts and per-backend bit-identity baselines. Baselines refreshed against current main. Finding: the biggest leverage areas are bootstrap resampling loops and per-replicate survey-design rebuilds in the bootstrap path; documented in `docs/performance-plan.md` for follow-up optimization PRs. diff --git a/CITATION.cff b/CITATION.cff index 5e94da02..f254fba4 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -10,13 +10,6 @@ license: MIT version: "3.2.0" date-released: "2026-04-19" doi: "10.5281/zenodo.19646175" -identifiers: - - type: doi - value: "10.5281/zenodo.19646175" - description: "Concept DOI — always resolves to the latest release" - - type: doi - value: "10.5281/zenodo.19646176" - description: "Versioned DOI for v3.1.3" url: "https://github.com/igerber/diff-diff" repository-code: "https://github.com/igerber/diff-diff" keywords: