Extend dCDH PSU-level wild bootstrap to cell granularity#332
Conversation
Lifts the last NotImplementedError gate in the dCDH survey contract: n_bootstrap > 0 combined with within-group-varying PSU is now supported via a cell-level Hall-Mammen wild multiplier bootstrap. Approach: the bootstrap mixin dispatches on whether PSU varies across cells of a group, inspecting a new (n_eligible, n_periods) PSU tensor (sentinel -1 for zero-weight cells) built alongside the existing group-level map. Under PSU-within-group-constant regimes (including PSU=group auto-inject and strictly-coarser PSU with within-group constancy) the dispatcher routes to the legacy group-level bootstrap so the SE is bit-identical to the previous release. Under within-group-varying PSU each (g, t) cell draws its multiplier via the per-cell PSU map and a group contributing cells to multiple PSUs receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. Multi-horizon bootstraps draw a single shared (n_bootstrap, n_psu) PSU-level weight matrix once per block and broadcast per-horizon via each horizon's cell-to-PSU map (via psu_codes_per_cell raveled and sentinel-masked), so the sup-t simultaneous confidence band remains a valid joint distribution across horizons. The cohort-recentered per-horizon tensors U_centered_pp_l and U_centered_pp_pl_l are persisted to _mh_pp_cache and _pl_pp_cache dicts inside the existing analytical-SE loops so the bootstrap block can reuse them without recomputing cohort-recentering. Tests: - Flip the bootstrap gating test to assert finite SE + CI under within-group-varying PSU at both the overall and per-horizon event-study surfaces (the multi-horizon path is the highest-risk surface for shared-weight regressions). - Pin a numeric baseline SE captured from pre-PR-4 code (origin/main at SHA ac181b7) under PSU=group and assert bit- identity: test_bootstrap_se_matches_pre_pr4_baseline. Dispatcher regressions that silently drift away from the legacy path fail this guard. - Add zero-weight-eligible-group smoke test. - Add slow-tier MC coverage test (test_dcdh_bootstrap_cell_period_coverage.py) validating empirical 95% coverage at both overall and horizon-1 surfaces on a DGP with within-group-varying PSU. REGISTRY.md Survey + bootstrap contract Note rewritten to describe the cell-level allocator, the dispatcher's bit-identity routing, and the shared-PSU-weight multi-horizon machinery. Cluster contract Note updated; no scope limitation remains in the Survey IF expansion Note. Replicate-weight variance and n_bootstrap > 0 remain mutually exclusive by SurveyDesign construction — no change there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local AI review P1: the dispatcher previously silently fell back to the legacy group-level bootstrap whenever `psu_varies=True` but a target's per-cell IF tensor was `None`. Under within-group- varying PSU that would silently under-cluster (collapsing multi-PSU group contributions to one PSU). Fix: - Raise ValueError in every target branch (overall, joiners, leavers, multi-horizon, placebo-horizon) when `psu_varies=True` and the target's per-cell tensor is missing. Each raise names the target so the error localizes the missed wiring. - Add shape-check in _unroll_target_to_cells: require `u_per_period_target.shape == psu_codes_per_cell.shape` before flattening, so a silent cell/PSU misalignment fails loudly. - Add three negative-contract tests in TestBootstrapCellPeriod: varying PSU + missing overall tensor, shape mismatch, and varying PSU + missing multi-horizon tensor. - Update stale `_strata_psu_vary_within_group` docstring (the helper no longer gates any out-of-scope combo at fit() time). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment ⛔ Blocker Executive Summary
Methodology Affected method: dCDH survey multiplier bootstrap. The papers cited in the module docstrings justify the analytical plug-in variance; this bootstrap path is a documented library extension, so I am reviewing it against the REGISTRY/bootstrap contract rather than treating the existence of the extension as a defect.
Code Quality No separate code-quality findings beyond the two statistical-correctness blockers above. Performance No material performance findings. The new Monte Carlo coverage test is marked Maintainability No separate maintainability finding beyond the PSU-code contract bug above; that bug is itself a sign that the dense-code invariant is currently split across too many call sites. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
|
Addresses two P0 correctness regressions in the PR-4 bootstrap PSU-map plumbing flagged by CI review. **P0 #1 - valid_map gate discarded the per-cell tensor too eagerly.** When any variance-eligible group had no positive-weight cells (all- sentinel row in psu_codes_per_cell), the old code set valid_map=False and left BOTH group_id_to_psu_code_bootstrap AND psu_codes_per_cell_bootstrap as None. The bootstrap then silently dropped to unclustered group-level instead of excluding only that group's empty row. Fix: always populate psu_codes_per_cell_bootstrap once the tensor is built; the cell-level path already masks out -1 cells at unroll time. Always populate group_id_to_psu_code_bootstrap with a per-group code (use placeholder 0 for all-sentinel rows since those groups have no IF mass and the multiplier they receive is irrelevant on either the legacy or the cell-level path). **P0 #2 - dense PSU codes factorized over non-eligible subset.** `np.unique(obs_psu_codes[pos_mask_boot])` previously included PSU labels from groups that were filtered out of _eligible_group_ids (e.g., singleton-baseline-excluded groups). The excluded groups' PSUs contributed dense codes that formed gaps in the eligible subset's map. Downstream `_generate_psu_or_group_weights` computes `n_psu = max(code) + 1` and triggers the identity fast path when `n_psu >= n_groups_target`. A gapped map like `[1, 1]` or `[0, 2, 2]` silently activated independent-draws clustering for eligible groups that should have shared a multiplier. Fix: restrict the np.unique factorization to the eligible-subset positive-weight obs only (`elig_obs_mask = pos_mask_boot & (g_idx_arr >= 0) & (t_idx_arr >= 0)`), so the dense code domain exactly matches the PSUs actually used by variance-eligible groups. Tests: - `test_bootstrap_zero_weight_group_equivalent_to_removing_it`: fit with vs without an all-zero-weight eligible group must produce byte-identical bootstrap SE at the same seed (byte- identity would have failed before P0 #1 fix because valid_map flipped the PSU-aware path off for the with-zero-group fit). - `test_bootstrap_dense_codes_under_singleton_baseline_excluded_group`: spies on the group_id_to_psu_code dict passed to `_compute_dcdh_bootstrap` under a fixture with an always-treated singleton-baseline group and strictly-coarser PSU among eligible groups. Asserts the dict's values form a contiguous `[0, n_unique-1]` range (no gaps from the excluded group's PSU), and that eligible groups sharing a PSU label receive the same dense code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…rying-PSU equivalence test Addresses the three P2 findings from the CI re-review (all P0s cleared). 1. **Warning prepass assumed one PSU per group** (`chaisemartin_dhaultfoeuille.py:2111-2148`). The old code collected `labels[0]` per eligible group, so a within-group-varying PSU design was mis-counted as having one PSU per group and emitted a misleading "strictly-coarser PSU" UserWarning. Rewrite counts unique PSU labels across all positive-weight obs of eligible groups (not just the first label per group); under PSU=group unchanged, under varying-PSU no false warning. 2. **REGISTRY heterogeneity Note still claimed NotImplementedError** (`REGISTRY.md:618`, "Combining heterogeneity= with n_bootstrap > 0 and within-group-varying PSU still raises NotImplementedError"). That gate was removed in the current PR. Update to clarify that heterogeneity inference stays analytical when bootstrap runs on the main ATT surfaces — the two inference paths are independent. 3. **Zero-weight-equivalence test used `psu=group`** (`test_bootstrap_zero_weight_group_equivalent_to_removing_it`). Under PSU=group both the buggy and correct code paths collapse to the same identity-draw structure, so the test didn't actually exercise the P0 #1 regression. Switch the fixture to within-group-varying PSU (period parity per group) so the cell-level dispatcher is invoked and the before-fix silent-dropback bug would fail this test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings beyond the methodology blocker above. Tech Debt
Security No findings. Documentation/Tests
Verification note: static review only; Path to Approval
|
…entinel cells **P1 (methodology):** Under terminal missingness, `_cohort_recenter_per_period` subtracts cohort column means across the full period grid, so a group with no observation at period t acquires non-zero centered mass at that cell. The PR-4 cell-level bootstrap builds `psu_codes_per_cell` with -1 sentinels for such cells and `_unroll_target_to_cells` drops them — silently losing that centered mass. Under within-group-varying PSU + terminal missingness, this would under-cluster the bootstrap SE/CI/p-values. Conservative guard: `_unroll_target_to_cells` now raises `ValueError` when any sentinel cell (-1 PSU) carries non-zero cohort-recentered IF mass (|u| > 1e-12). The error message points users to `n_bootstrap=0` for analytical TSL on such panels. The analytical path has the same mass-leakage behavior under this regime but was shipped in PR #323; documenting the bootstrap- specific guard here avoids advertising a broken combination. Regression test: `test_bootstrap_cell_level_raises_on_sentinel_mass_leak` constructs a per-cell IF tensor with non-zero mass at a -1-PSU cell and asserts `_compute_dcdh_bootstrap` raises with the documented error message. **P2 (tests):** The slow MC bootstrap coverage test previously ran at `L_max=1`, which collapsed the multi-horizon block to a single target and never exercised the cross-horizon shared-weight path described in its own docstring. Bumped to `L_max=2` so the shared (n_bootstrap, n_psu) PSU-level weight matrix is drawn once and broadcast across horizons via each horizon's cell-to-PSU map. Added three assertions: - Horizon-1 bootstrap CI coverage in [0.925, 0.975]. - Horizon-2 bootstrap CI coverage in [0.910, 0.975]. Tolerance is wider than h-1 because finite-sample analytical TSL coverage on this DGP is itself ~0.93 at l=2 (measured offline: analytical h-1 = 0.94, h-2 = 0.926 at n_groups=40). An observed bootstrap coverage within 1pp of the analytical baseline is consistent with correct clustering; a drop to ≤ 0.90 would indicate a real shared-weight broadcast regression. - `cband_crit_value` finite in ≥ 90% of reps — validates that the shared (n_bootstrap, n_psu) weight matrix produces a coherent joint distribution across horizons (required for a valid sup-t simultaneous band). Bumped n_bootstrap to 1000 (from 500) to keep internal bootstrap MC noise below ~0.3pp per CI endpoint at horizon-2's slightly wider percentile-CI spread. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Verification note: static review only; |
…ession Addresses the CI round-3 P1 (docs overclaiming support relative to the newly-added `_unroll_target_to_cells` guard) and P2 (no end-to-end fit() regression for terminal missingness + varying PSU + bootstrap). **P1 (docs contract consistency):** The cell-level bootstrap now hard-fails on the terminal-missingness mass-leak regime, but the surrounding docs still advertised full support. Added a "Bootstrap + terminal-missingness scope note" to: - `REGISTRY.md` "terminal missingness retained" Note — describes the cohort-recentering leakage mechanism and directs users to `n_bootstrap=0` for affected panels. - `REGISTRY.md` Survey + bootstrap contract Note — same carve-out, also clarifies that PSU-within-group-constant regimes are unaffected (dispatcher routes to the legacy path). - `CHANGELOG.md` (PR-4 entry) — explicit scope note after the "closes the last NotImplementedError gate" claim. - `fit()` docstring `survey_design` paragraph — scope note directs users to `n_bootstrap=0` as the documented workaround. **P2 (end-to-end fit() regression):** Added `test_bootstrap_fit_raises_on_terminal_missingness_with_varying_psu` in TestBootstrapCellPeriod. Fixture: 10 groups with joiners cohort (at period 3), leavers cohort (at period 4), and never-treated controls; group 2 is terminally missing at periods 4-5. At period 4 the other joiners serve as stable_1 controls for the leavers, producing non-zero cohort mean in cohort A — `_cohort_recenter_per_period` leaks `-col_mean` onto group 2's missing cell. Varying PSU (period parity per group) routes the bootstrap to the cell-level path. Test asserts: - `fit(..., n_bootstrap=50)` raises `ValueError` with the documented "no positive-weight observations" message. - `fit(..., n_bootstrap=0)` succeeds on the same panel — analytical TSL supports this regime (the contract the scope note preserves). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only; |
Addresses the P0 escalation: recommending `n_bootstrap=0` as a workaround for terminal missingness + within-group-varying PSU was incorrect — the analytical TSL path uses the SAME cell-period allocator and has the same silent mass-drop bug. **Analytical guard parity.** `_survey_se_from_group_if` now computes `W_cell` (per-(g,t) weight totals) and, before the cell-to-obs expansion, checks whether any cell has `W_cell == 0` while the corresponding cohort-recentered IF mass is non-zero. If so, raises a targeted `ValueError` mirroring the bootstrap-side `_unroll_target_to_cells` guard. The message text is aligned across the two paths (same "no positive-weight observations" phrasing) so the regression test matches both. **Docs cleanup.** Removed the "use n_bootstrap=0 as workaround" language from REGISTRY, CHANGELOG, and the fit() docstring. Replaced with the correct workaround: pre-process the panel (drop late-exit groups / trim to a balanced sub-panel), or use an explicit `psu=<group_col>` so the dispatcher routes through the legacy group-level path (which does not use the cell-period allocator and is not affected by the mass-leak). **Regression test update.** The end-to-end fit() regression now asserts `ValueError` on BOTH `n_bootstrap=0` and `n_bootstrap > 0` under the terminally-missing + within-group-varying PSU fixture. This is technically a behavior change for panels previously covered silently by PR #323's cell-period analytical allocator — those panels used to produce finite (but silently mass-dropped) SEs and now raise. The change closes a real silent-correctness bug; the analytical path never had a principled treatment for the leaked mass in the first place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only. |
The round-5 analytical guard in _survey_se_from_group_if was over- scoped: it fired on any panel with leaked centered mass, including PSU-within-group-constant regimes (PSU=group auto-inject, strictly- coarser-PSU-within-group-constant). But the docs said those regimes are the intended workaround. Panels with terminal missingness and PSU=group are the documented supported path and must not raise. Fix: add an analytical dispatcher inside `_survey_se_from_group_if` mirroring the bootstrap's `_psu_varies_within_group` routing. When PSU is within-group-constant on the eligible subset, fall back to the legacy group-level allocator (which uses U_centered[g] directly via the row-sum identity and does not trigger the sentinel-mass guard). Only when PSU actually varies within group does the cell allocator run — and only then can the sentinel-mass guard fire. Byte-identity: under PSU=group the row-sum identity makes the cell and group allocators statistically equivalent, but the legacy branch was the one in use before PR #323 introduced the cell allocator. Rerouting to it under within-group-constant regimes is a regression-free fallback (it's what PR #323 aimed to preserve via byte-identity in the first place). Test coverage: added `test_fit_succeeds_on_terminal_missingness_with_psu_group` — same fixture as the varying-PSU failure regression but with auto-inject `psu=<group>`, asserts both `n_bootstrap=0` and `n_bootstrap > 0` succeed with finite SE. Also updated the cell-level bootstrap `ValueError` text (and the _unroll_target_to_cells docstring) to no longer advertise `n_bootstrap=0` as a workaround — both paths now fail consistently on the varying-PSU carve-out, and the correct workaround is pre-processing or using an explicit `psu=<group_col>`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
Static review only: |
Round-6's constant-PSU fallback in `_survey_se_from_group_if` silently disabled the cell-period allocator for replicate-weight designs, because replicate designs have `resolved.psu is None` and the fallback routes `psu_arr is None` to the legacy group-level path. That's an allocator change on a Class A surface (overall DID_M, joiners, leavers, dynamic placebos) under per-row-varying replicate ratios, where cell and legacy allocators produce materially different Rao-Wu variances (same non-invariance PR #329 established for heterogeneity). Fix: restrict the round-6 fallback to the TSL branch by gating on `not resolved.uses_replicate_variance`. Replicate designs retain the cell-period allocator (the PR #323 Class A contract), and the sentinel-mass guard still fires on mass leakage when it applies. Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios` constructs legacy and cell-level psi_obs on the fitted replicate design, feeds both through `compute_replicate_if_variance`, and asserts they produce materially different variances — locking the allocator contract so a future refactor that switches Class A to the legacy allocator would be detectable. Mirrors the heterogeneity non-invariance guard from PR #329's CI review round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…ss A regression **P1 (docs scope carve-out missing for replicate ATT):** the sentinel-mass `ValueError` fires BEFORE the replicate-vs-TSL dispatch inside `_survey_se_from_group_if`, so replicate-weight ATT fits (which always use the cell-period allocator per the PR #323 Class A contract) also raise on terminal missingness. The docs previously scoped the carve-out to "within-group-varying PSU / analytical TSL / bootstrap" and silently omitted the replicate path. Rewrite the scope note in REGISTRY.md (both the balanced- baseline Note and the Survey + bootstrap contract Note), CHANGELOG, and the fit() docstring to list all three affected paths explicitly: Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT, and the cell-level wild PSU bootstrap. Clarify that the `psu=<group_col>` workaround applies ONLY to Binder TSL — replicate ATT and the varying-PSU bootstrap have no allocator fallback; the only workaround for those paths is pre-processing the panel to remove terminal missingness. **P2 (synthetic allocator test didn't pin fitted SE):** the previous `test_att_cell_allocator_with_varying_replicate_ratios` showed that synthetic cell and legacy allocators differ, but never tied the difference back to the fit's `overall_se`. A refactor switching Class A replicate to the legacy allocator could have slipped through unnoticed. Rewrite using a pytest monkeypatch spy on `compute_replicate_if_variance` to capture the psi_obs the fit actually passes, then reconstruct the legacy-equivalent psi_obs via the row-sum identity on the SAME captured `U_centered` and verify: - fit's `overall_se**2` equals the recomputed variance on the captured cell-allocator psi_obs (sanity / dispatch alignment); - the legacy-reconstructed psi_obs produces a materially different Rao-Wu variance on the same fixture — so a silent Class A allocator regression would flip overall_se and fail this guard. No code behavior changed in round-8 beyond the doc updates — only the test and documentation now lock the contract explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0 or P1 findings. Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability
Tech Debt
Security No findings. Documentation/Tests
|
…string Round-8 cleared all P0/P1 findings. Addressing the two remaining P2/P3s with concrete fixes. **P2 (tests/docs): replicate ATT terminal-missingness failure mode lacked an end-to-end regression.** The round-8 docs extended the carve-out to replicate ATT, but no fit()-level test asserted it. Added `TestReplicateClassA::test_att_raises_on_terminal_missingness_replicate_path`: fixture has a terminally-missing joiner group in a cohort whose surviving peers serve as stable_1 controls at the missing period (leaking non-zero centered IF mass onto the missing cells). With SDR replicate weights and per-row-varying ratios, the fit must raise the documented `ValueError` from `_survey_se_from_group_if`'s sentinel-mass guard. **P3 (maintainability): `_compute_dcdh_bootstrap` docstring was stale.** The docstring still described 3-tuple inputs and a purely per-group weight-matrix flow, even though PR 4 changed every per-target input to a 4-tuple and added the cell-level dispatch path. Rewrote the Parameters section to enumerate every kwarg with correct shapes, and added explicit entries for the PR-4 additions (multi_horizon_inputs, placebo_horizon_inputs, group_id_to_psu_code, eligible_group_ids, u_per_period_overall, psu_codes_per_cell) including the dispatcher's routing rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…ch modes Round-9 cleared every finding except one P3: the bootstrap helper's opening summary still read as a purely per-group W @ u_centered / divisor flow, which no longer matches the cell-level dispatch under within-group-varying PSU. Rewrote the introductory numbered steps to describe both paths explicitly — (a) legacy group-level path under PSU-within-group- constant regimes and (b) cell-level wild PSU path under within- group-varying PSU — and cross-reference `_unroll_target_to_cells`, the terminal-missingness guard, and the shared-PSU-weights multi- horizon machinery. Also re-flagged the row-sum identity that makes the two paths statistically equivalent under PSU=group. No code behavior change. Targeted tests (269) all pass unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good. No unmitigated P0 or P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
The shared terminal-missingness ValueError in `_survey_se_from_group_if` said "Analytical survey SE" and attributed the failure to "within- group-varying PSU" — misleading for replicate ATT users, since replicate designs carry no PSU structure and still hit the same guard. Branch the wording on `resolved.uses_replicate_variance`: - Replicate ATT: "Rao-Wu replicate-weight ATT variance" + note that replicate ATT unconditionally uses the cell-period allocator per the Class A contract, PSU is not involved. - Binder TSL: keeps the original "Analytical Binder TSL survey SE" wording + the within-group-varying-PSU trigger explanation. Shared mechanic and pre-processing workaround kept the same. The regression tests still match on "no positive-weight observations" which appears in both branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…= workaround text **P3 #1 (warning predicate inconsistent with "strictly coarser PSU" contract):** the new bootstrap warning block's comment said the warning fires only on strictly-coarser PSU designs, but the predicate `n_psu_eff_warn < n_groups_eff_warn` could also fire on supported varying-PSU designs whose eligible groups happened to share PSU labels across groups. Detect within-group-varying PSU explicitly (`.groupby("g")["p"].nunique().gt(1).any()`) and suppress the warning in that regime. Under auto-inject PSU=group and under within-group-varying PSU the warning now stays silent, matching the stated contract. **P3 #2 (`_unroll_target_to_cells` suggested `psu=<group_col>` as a bootstrap workaround):** the Registry / CHANGELOG already clarified that `psu=<group_col>` is ONLY a Binder TSL workaround; the cell- level wild PSU bootstrap has no allocator fallback. The helper's docstring and `ValueError` message still advertised it as a bootstrap-path workaround. Dropped that suggestion and explicitly clarified: the varying-PSU bootstrap IS the cell-level path, so there is no legacy-allocator alternative to fall back to — pre-processing the panel is the only workaround on the bootstrap side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment✅ Looks good — no unmitigated P0/P1 findings. Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Recaptured the baseline on origin/main@ac181b7f (the stated reference SHA). Current dispatcher reproduces this value bit-for-bit, confirming the PSU-within-group-constant legacy-path routing is intact. The prior pinned value was captured on an intermediate dev checkpoint rather than the merged pre-PR-4 SHA. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust and pure-Python backends take different numeric paths through `_generate_bootstrap_weights_batch` and `solve_ols`, so the pinned pre-PR-4 SE differs between them. Both values were captured on origin/main@ac181b7f under each backend; bit-identity holds within each backend independently. Select the baseline at test time based on the resolved backend (mirrors `_PURE_PYTHON_MODE` detection in conftest.py). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-failures audit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
NotImplementedErrorgate in the dCDH survey contract:n_bootstrap > 0combined with within-group-varying PSU is now supported via a cell-level Hall-Mammen wild multiplier bootstrap._compute_dcdh_bootstrapdetects PSU-within-group-constant regimes (PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE stays bit-identical to the previous release.(n_bootstrap, n_psu)PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution.Methodology references (required if estimator / math changes)
ChaisemartinDHaultfoeuillesurvey multiplier bootstrap (_compute_dcdh_bootstrap). Cell-level wild PSU Hall-Mammen extension; cohort-recentered per-cell IFU_centered_per_period(from PR Add cell-period IF allocator for dCDH survey variance #323).REGISTRY.md.DIDmultiplegtDYNdoes not support survey designs, so "deviation from R" does not apply.Validation
tests/test_survey_dcdh.py: flipped the bootstrap gating test to assert finite SE + CI under within-group-varying PSU at both the overall and per-horizon event-study surfaces. AddedTestBootstrapCellPeriodwith (a) pinned numeric baseline SE captured from pre-PR-4 code (origin/mainatac181b7f) as the bit-identity regression guard; (b) zero-weight-eligible-group smoke test; (c) three negative-contract tests assertingValueErrorwhenpsu_variesbut the required per-cell tensor is missing or misshaped.tests/test_dcdh_bootstrap_cell_period_coverage.py(new,@pytest.mark.slow): 500-rep MC coverage test validating empirical 95% coverage at both the overall bootstrap CI and the horizon-1 event-study bootstrap CI on a DGP with within-group-varying PSU. Horizon-specific assertion guards the shared-PSU-weight multi-horizon path.test_auto_inject_bit_identical_to_group_level(auto-inject invariance),test_coarser_psu_produces_finite_se(strictly-coarser PSU), and the ATT + heterogeneity slow MC coverage tests all pass unchanged.Security / privacy