diff --git a/CHANGELOG.md b/CHANGELOG.md index a84732c7..ef26f511 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,7 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released. -- **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). 
Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). `n_bootstrap > 0` combined with within-group-varying PSU remains gated with `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map and will be extended in a follow-up PR. +- **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). 
Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). +- **`ChaisemartinDHaultfoeuille.fit(survey_design=..., n_bootstrap > 0)` now supports within-group-varying PSU** — the PSU-level Hall-Mammen wild multiplier bootstrap has been extended from a group-level PSU map (one multiplier per group) to a cell-level PSU map (one multiplier per `(g, t)` cell's PSU). A dispatcher in `_compute_dcdh_bootstrap` detects PSU-within-group-constant regimes (including PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE is bit-identical to the previous release (guarded by the new `test_bootstrap_se_matches_pre_pr4_baseline` and the pre-existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to multiple PSUs receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. Multi-horizon bootstraps draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution. Closes the last `NotImplementedError` gate in the dCDH survey contract; replicate-weight variance and `n_bootstrap > 0` remain mutually exclusive by construction. 
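The allocator contrast in the entries above can be sketched with toy numbers (illustrative names only, not library API; `psi_g` stands for a group-level IF value ψ_g):

```python
import numpy as np

# Toy sketch of the two observation-level IF allocators for one
# group g with IF value psi_g, row weights w, and a boolean mask
# marking the post-period cell of the transition.
psi_g = 2.0
w = np.array([1.0, 3.0, 2.0, 4.0])           # four rows of group g
post = np.array([False, False, True, True])  # last two rows: post cell

# Legacy group-level allocator: psi_i = psi_g * w_i / W_g.
psi_group = psi_g * w / w.sum()

# Cell-period allocator: psi_i = psi_g * w_i / W_{g, post}, with
# all mass placed on the post-period cell.
psi_cell = np.where(post, psi_g * w / w[post].sum(), 0.0)

# Row-sum identity: both allocators preserve total mass, so a PSU
# containing the whole group telescopes to psi_g either way, which
# is why PSU=group fits stay byte-identical across releases.
assert np.isclose(psi_group.sum(), psi_g)
assert np.isclose(psi_cell.sum(), psi_g)
```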
**Scope note:** panels with *terminal missingness* where the terminally-missing group is in a cohort whose other groups still contribute at the missing period now raise a targeted `ValueError` on every survey variance path that uses the cell-period allocator: Binder TSL with within-group-varying PSU, Rao-Wu replicate-weight ATT (which always uses the cell allocator per the Class A contract shipped in PR #323), and the cell-level wild PSU bootstrap. Cohort-recentering leaks centered IF mass onto cells with no positive-weight observations, which the cell-period allocator cannot attach to any observation/PSU. This closes a silent mass-drop bug the cell-period allocator introduced across all three paths in v3.1.x; pre-process the panel to remove terminal missingness (drop late-exit groups or trim to a balanced sub-panel) as the documented workaround. For Binder TSL only, using an explicit `psu=` routes through the legacy group-level allocator where the row-sum identity makes the two allocators statistically equivalent. Replicate-weight ATT and within-group-varying-PSU bootstrap have no such allocator fallback — the panel itself must be pre-processed. PSU-within-group-constant Binder TSL (including PSU=group auto-inject) is unaffected. ## [3.1.3] - 2026-04-18 diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py index 846c9506..1416d5ed 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille.py +++ b/diff_diff/chaisemartin_dhaultfoeuille.py @@ -662,14 +662,32 @@ def fit( design is supplied, the multiplier bootstrap operates at the PSU level (Hall-Mammen wild PSU bootstrap) — under the default auto-inject this collapses to a group-level - clustered bootstrap. 
**Out-of-scope combinations raise - ``NotImplementedError``**: (a) replicate weights with - ``n_bootstrap > 0`` (replicate variance is closed-form; - bootstrap would double-count variance); (b) ``n_bootstrap > - 0`` with PSU that varies within group (PSU-level bootstrap - still uses the legacy group-level PSU map; follow-up PR - extends it). See REGISTRY.md - ``ChaisemartinDHaultfoeuille`` Notes for the full contract. + clustered bootstrap. Under within-group-varying PSU the + bootstrap uses a cell-level wild PSU allocator — a group + contributing cells to multiple PSUs receives independent + multiplier draws per PSU (see the Survey + bootstrap + contract Note in REGISTRY.md). **Scope note (terminal + missingness under any cell-period-allocator path):** on + panels where a terminally-missing group is in a cohort + whose other groups still contribute at the missing + period, every survey variance path that uses the cell- + period allocator raises a targeted ``ValueError``: + Binder TSL with within-group-varying PSU, Rao-Wu + replicate-weight ATT (which always uses the cell + allocator), and the cell-level wild PSU bootstrap. + Cohort-recentering leaks centered IF mass onto cells + with no positive-weight obs, which the cell-period + allocator cannot allocate to any observation or PSU. + Pre-process the panel (drop late-exit groups or trim to + a balanced sub-panel), or — for Binder TSL only — use + an explicit ``psu=`` so the analytical path + routes through the legacy group-level allocator. + Replicate ATT and within-group-varying-PSU bootstrap + have no such allocator fallback. **Replicate weights with ``n_bootstrap > 0`` + raises ``NotImplementedError``** (replicate variance is + closed-form; bootstrap would double-count variance). See + REGISTRY.md ``ChaisemartinDHaultfoeuille`` Notes for the + full contract. 
Returns ------- @@ -825,25 +843,12 @@ def fit( # Cell-period IF allocator contract: strata and PSU must be # constant within each (g, t) cell, a strict relaxation of - # the previous within-group constancy rule. One out-of-scope - # combination remains gated with NotImplementedError until - # the corresponding follow-up PR extends it: - # - n_bootstrap > 0 + within-group-varying PSU - # (PR 4: cell-level Hall-Mammen wild bootstrap) - _, psu_varies = _strata_psu_vary_within_group( - resolved_survey, data, group, survey_weights, - ) - if psu_varies and self.n_bootstrap > 0: - raise NotImplementedError( - "n_bootstrap > 0 is not supported under a " - "survey design whose PSU varies within group. " - "The PSU-level Hall-Mammen wild bootstrap uses " - "the legacy group-level PSU map and will be " - "extended to cell-level PSU in a follow-up PR. " - "For now, use n_bootstrap=0 (analytical TSL " - "variance, which fully supports within-group-" - "varying PSU via the cell-period allocator)." - ) + # the previous within-group constancy rule. Both the + # analytical TSL path and the PSU-level wild bootstrap now + # honor within-group-varying PSU via the cell-period + # allocator (the bootstrap dispatcher routes PSU-within- + # group-constant regimes through the legacy group-level + # path for bit-identity with prior releases). _validate_cell_constant_strata_psu( resolved_survey, data, group, time, survey_weights, ) @@ -1693,6 +1698,14 @@ def fit( multi_horizon_se = {} multi_horizon_inference = {} + # Cache the cohort-recentered per-cell IF tensors built + # inside this loop so the bootstrap block below can reuse + # them without recomputing cohort-recentering. Keyed by + # horizon; value shape (n_eligible_var, n_periods). Populated + # only when a survey design is active (else the per-period + # tensor is None and the bootstrap has no cell-level path + # to take). 
+ _mh_pp_cache: Dict[int, np.ndarray] = {} # Compute inference for ALL horizons 1..L_max (including l=1) # so the event_study_effects dict uses a consistent estimand # (per-group DID_{g,l}) across all horizons. @@ -1736,6 +1749,7 @@ def fit( U_centered_pp_l: Optional[np.ndarray] = _cohort_recenter_per_period( U_pp_l[eligible_mask_var], cid_elig ) + _mh_pp_cache[l_h] = U_centered_pp_l else: U_centered_pp_l = None N_l_h = multi_horizon_dids[l_h]["N_l"] @@ -1842,6 +1856,10 @@ def fit( [g not in singleton_baseline_set_pl for g in all_groups], dtype=bool, ) + # Cache per-placebo-horizon cohort-recentered per-cell + # IF tensors for the bootstrap block below (same + # pattern as _mh_pp_cache for positive horizons). + _pl_pp_cache: Dict[int, np.ndarray] = {} for lag_l in range(1, L_max + 1): pl_data = multi_horizon_placebos.get(lag_l) if pl_data is None or pl_data["N_pl_l"] == 0: @@ -1874,6 +1892,7 @@ def fit( U_centered_pp_pl_l: Optional[np.ndarray] = _cohort_recenter_per_period( U_pp_pl[eligible_mask_pl], cid_elig_pl ) + _pl_pp_cache[lag_l] = U_centered_pp_pl_l else: U_centered_pp_pl_l = None _elig_groups_pl = [all_groups[g] for g in range(len(all_groups)) if eligible_mask_pl[g]] @@ -2106,21 +2125,27 @@ def fit( "instead of replicate weights." ) - # Warning fires only when PSU is strictly coarser than - # group. Under auto-inject psu=group (or explicit - # psu=), groups and PSUs coincide so - # Hall-Mammen wild PSU bootstrap equals group-level - # multiplier bootstrap — no need to warn. - # Count groups/PSUs on the POST-FILTER eligible set - # (`_eligible_group_ids`) — the same set the bootstrap - # map is built from below. Using raw positive-weight - # groups here would emit a misleading warning on - # panels where upstream dCDH filtering drops groups - # that happen to share PSUs with kept groups. 
+ # Warning fires only when PSU is **strictly coarser + # than group** on an otherwise within-group-constant + # design (multiple eligible groups share a PSU label + # but no group spans more than one PSU). Two regimes + # are explicitly excluded: + # - PSU=group (auto-inject default): identity-map + # fast path — no warning needed. + # - Within-group-varying PSU: the cell-level + # allocator honors the per-cell PSU structure; + # "n_psu < n_groups" is expected whenever cells + # of a group share a PSU with cells of another + # group, which does not indicate coarser-than-group + # clustering in the Hall-Mammen sense. + # Count unique PSUs across ALL positive-weight obs of + # eligible groups AND detect within-group-varying + # PSU; suppress the warning in that regime. psu_arr_warn = getattr(resolved_survey, "psu", None) if psu_arr_warn is None or _obs_survey_info is None: # No PSU info — can't compare to group count. n_psu_eff_warn, n_groups_eff_warn = -1, -1 + psu_varies_within_warn = False else: obs_gids_warn = np.asarray(_obs_survey_info["group_ids"]) obs_ws_warn = np.asarray( @@ -2128,25 +2153,37 @@ def fit( ) pos_mask_warn = obs_ws_warn > 0 psu_codes_warn = np.asarray(psu_arr_warn) - # Collect the PSU label for each variance-eligible - # group. PSU that varies within group is rejected - # for the bootstrap path upstream (NotImplementedError - # gate), so the first positive-weight label - # represents the whole group here. 
- eligible_psu_labels: List[Any] = [] - for gid in _eligible_group_ids: - mask_g = (obs_gids_warn == gid) & pos_mask_warn - if mask_g.any(): - eligible_psu_labels.append( - psu_codes_warn[mask_g][0] - ) - n_groups_eff_warn = len(eligible_psu_labels) - n_psu_eff_warn = ( - int(len(np.unique(np.asarray(eligible_psu_labels)))) - if eligible_psu_labels - else -1 + eligible_gid_set = set(_eligible_group_ids) + elig_obs_mask_warn = pos_mask_warn & np.array( + [g in eligible_gid_set for g in obs_gids_warn], + dtype=bool, ) - if 0 <= n_psu_eff_warn < n_groups_eff_warn: + if elig_obs_mask_warn.any(): + elig_psu_labels_arr = psu_codes_warn[elig_obs_mask_warn] + n_psu_eff_warn = int( + len(np.unique(elig_psu_labels_arr)) + ) + n_groups_eff_warn = len(_eligible_group_ids) + # Detect within-group-varying PSU on the + # eligible subset so we can suppress the + # "strictly coarser PSU" warning there. + psu_varies_within_warn = bool( + pd.DataFrame({ + "g": obs_gids_warn[elig_obs_mask_warn], + "p": elig_psu_labels_arr, + }) + .groupby("g")["p"] + .nunique() + .gt(1) + .any() + ) + else: + n_psu_eff_warn, n_groups_eff_warn = -1, -1 + psu_varies_within_warn = False + if ( + 0 <= n_psu_eff_warn < n_groups_eff_warn + and not psu_varies_within_warn + ): warnings.warn( f"Bootstrap with survey_design uses Hall-Mammen " f"wild multiplier weights at the PSU level " @@ -2163,10 +2200,12 @@ def fit( stacklevel=2, ) joiners_inputs = ( - (U_centered_joiners, joiner_total, joiners_att) if joiners_available else None + (U_centered_joiners, joiner_total, joiners_att, U_centered_pp_joiners) + if joiners_available else None ) leavers_inputs = ( - (U_centered_leavers, leaver_total, leavers_att) if leavers_available else None + (U_centered_leavers, leaver_total, leavers_att, U_centered_pp_leavers) + if leavers_available else None ) # Phase 1 placebo bootstrap: the Phase 1 per-period placebo # DID_M^pl still uses NaN SE (no IF derivation for the @@ -2220,6 +2259,7 @@ def fit( U_centered_pl_b, 
pl_data["N_pl_l"], pl_data["placebo_l"], + _pl_pp_cache.get(lag_l), ) # Phase 2: build multi-horizon bootstrap inputs from the @@ -2275,23 +2315,28 @@ def fit( U_centered_h, h_data["N_l"], h_data["did_l"], + _mh_pp_cache.get(l_h), ) - # Under a survey design with PSU information, build a - # `group_id_to_psu_code` dict so the bootstrap mixin can - # derive each target's PSU map by ID lookup rather than by - # positional reuse. PSU that varies within group is - # rejected for the bootstrap path by a NotImplementedError - # gate in fit() (the cell-period allocator used by the - # analytical TSL path does NOT require within-group PSU - # constancy, but the Hall-Mammen wild PSU bootstrap still - # uses a group-level PSU map — PR 4 extends it). Each - # variance-eligible group therefore has exactly one PSU - # label here. Under auto-inject psu=group each group has a - # unique PSU code and the bootstrap mixin's identity-map - # fast path reproduces the pre-PSU behavior bit-for-bit. + # Under a survey design with PSU information, build both + # (a) a group-level `group_id_to_psu_code` dict (one PSU + # code per eligible group) and (b) a per-cell PSU tensor + # `psu_codes_per_cell` of shape (n_eligible, n_periods). + # The bootstrap mixin's dispatcher inspects (b) to decide + # whether PSU is within-group-constant: when constant, it + # runs the legacy group-level bootstrap via (a) for + # bit-identity with pre-release behavior; when varying, + # it switches to the cell-level wild PSU bootstrap that + # draws one multiplier per (g, t)'s PSU. Under auto-inject + # `psu=group` each group has a unique PSU code and every + # cell of a group shares it — the dispatcher routes to + # the legacy path and the identity-map fast path + # reproduces the pre-PSU behavior bit-for-bit. See + # REGISTRY.md ChaisemartinDHaultfoeuille Survey + + # bootstrap contract Note. 
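The dispatcher's routing decision described in the comment above can be sketched as follows (hypothetical standalone helper mirroring the `_psu_varies_within_group` behavior the diff references; the `-1` sentinel convention is the one documented for `psu_codes_per_cell`):

```python
import numpy as np

def psu_varies_within_group(psu_codes_per_cell: np.ndarray) -> bool:
    """Sketch of the dispatcher test: True when any eligible group
    (row) spans more than one PSU across its cells, ignoring -1
    sentinel entries (ineligible / zero-weight cells)."""
    for row in psu_codes_per_cell:
        valid = row[row >= 0]
        if valid.size > 0 and np.unique(valid).size > 1:
            return True
    return False

# PSU constant within each group (e.g. PSU=group auto-inject):
# route to the legacy group-level path for bit-identity.
constant = np.array([[0, 0, -1],
                     [1, 1, 1]])
# One group contributes cells to two PSUs: cell-level path.
varying = np.array([[0, 2, -1],
                    [1, 1, 1]])
assert not psu_varies_within_group(constant)
assert psu_varies_within_group(varying)
```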
group_id_to_psu_code_bootstrap: Optional[Dict[Any, int]] = None eligible_group_ids_bootstrap: Optional[np.ndarray] = None + psu_codes_per_cell_bootstrap: Optional[np.ndarray] = None if ( resolved_survey is not None and getattr(resolved_survey, "psu", None) is not None @@ -2299,33 +2344,95 @@ def fit( ): obs_psu_codes = np.asarray(resolved_survey.psu) obs_gids_boot = np.asarray(_obs_survey_info["group_ids"]) + obs_tids_boot = np.asarray(_obs_survey_info["time_ids"]) obs_weights_boot = np.asarray( _obs_survey_info["weights"], dtype=np.float64 ) pos_mask_boot = obs_weights_boot > 0 - group_psu_labels: List[Any] = [] - valid_map = True - for gid in _eligible_group_ids: - mask_g = (obs_gids_boot == gid) & pos_mask_boot - if not mask_g.any(): - valid_map = False - break - labels = obs_psu_codes[mask_g] - # Within-group-constant PSU is validated upstream; - # the first label represents the whole group. - group_psu_labels.append(labels[0]) - if valid_map and len(group_psu_labels) == n_groups_for_overall_var: - # Factor PSU labels to dense integer codes. - _, dense_codes = np.unique( - np.asarray(group_psu_labels), - return_inverse=True, + gid_to_idx = { + gid: i for i, gid in enumerate(_eligible_group_ids) + } + tid_to_idx = {t: i for i, t in enumerate(all_periods)} + n_elig_boot = len(_eligible_group_ids) + n_per_boot = len(all_periods) + g_idx_arr = np.array( + [gid_to_idx.get(g, -1) for g in obs_gids_boot], + dtype=np.int64, + ) + t_idx_arr = np.array( + [tid_to_idx.get(t, -1) for t in obs_tids_boot], + dtype=np.int64, + ) + # Factor PSU labels to dense int codes over the + # **eligible-subset** positive-weight observations only + # (not the full positive-weight population). 
Restricting + # to eligible obs ensures the resulting dense codes + # range ONLY over PSUs actually used by variance- + # eligible groups, so downstream n_psu = max(code) + 1 + # is exact: no gaps from singleton-baseline-excluded + # groups that would silently trigger the identity + # fast path in `_generate_psu_or_group_weights`. + elig_obs_mask = ( + pos_mask_boot & (g_idx_arr >= 0) & (t_idx_arr >= 0) + ) + elig_psu_labels = obs_psu_codes[elig_obs_mask] + dense_per_row: Optional[np.ndarray] = None + if elig_psu_labels.size > 0: + _, elig_dense_codes = np.unique( + elig_psu_labels, return_inverse=True, + ) + elig_dense_codes = np.asarray(elig_dense_codes, dtype=np.int64) + dense_per_row = np.full( + len(obs_psu_codes), -1, dtype=np.int64, + ) + dense_per_row[elig_obs_mask] = elig_dense_codes + + # Per-cell PSU tensor: (n_eligible, n_periods), -1 sentinel + # for ineligible / zero-weight cells. Populated + # unconditionally when `dense_per_row` exists — a row + # that ends up all-sentinel (eligible group with no + # positive-weight obs) is masked out at unroll time, + # not by discarding the entire tensor. See also the + # dispatcher's `_psu_varies_within_group` helper which + # ignores sentinel entries row-wise. + if dense_per_row is not None: + psu_codes_per_cell = np.full( + (n_elig_boot, n_per_boot), -1, dtype=np.int64, ) - dense_codes = np.asarray(dense_codes, dtype=np.int64) + psu_codes_per_cell[ + g_idx_arr[elig_obs_mask], + t_idx_arr[elig_obs_mask], + ] = dense_per_row[elig_obs_mask] + psu_codes_per_cell_bootstrap = psu_codes_per_cell + + # Group-level dict: one PSU code per eligible + # group. For rows that are all-sentinel (eligible + # group has no positive-weight obs), assign code + # `0` as a harmless placeholder — the group's IF + # mass is zero, so the bootstrap multiplier it + # receives is irrelevant on either the legacy or + # the cell-level path. 
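The eligible-subset factoring and scatter described above can be sketched with hypothetical toy arrays (the real code works on resolved survey attributes; these names are illustrative):

```python
import numpy as np

# Factor PSU labels to dense integer codes over eligible
# positive-weight observations only, then scatter them into a
# (n_eligible_groups, n_periods) tensor with a -1 sentinel.
obs_psu = np.array(["A", "A", "B", "B", "C"])  # raw PSU labels
g_idx = np.array([0, 0, 1, 1, -1])             # -1: ineligible group
t_idx = np.array([0, 1, 0, 1, 0])
weights = np.array([1.0, 2.0, 1.0, 0.0, 3.0])

elig = (weights > 0) & (g_idx >= 0) & (t_idx >= 0)
_, dense = np.unique(obs_psu[elig], return_inverse=True)

psu_codes_per_cell = np.full((2, 2), -1, dtype=np.int64)
psu_codes_per_cell[g_idx[elig], t_idx[elig]] = dense

# "C" belongs to an ineligible group and gets no code, so
# n_psu = dense.max() + 1 is exact with no gaps; the zero-weight
# (1, 1) cell keeps its -1 sentinel.
assert psu_codes_per_cell.tolist() == [[0, 0], [1, -1]]
assert int(dense.max()) + 1 == 2
```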
Always populate the dict + # so the legacy group-level path keeps clustering + # correctly when psu_varies=False even if some + # eligible groups happen to have no positive- + # weight obs. + group_psu_labels: List[int] = [] + for i in range(n_elig_boot): + row = psu_codes_per_cell[i] + valid = row[row >= 0] + if valid.size == 0: + group_psu_labels.append(0) + else: + group_psu_labels.append(int(valid[0])) group_id_to_psu_code_bootstrap = { - gid: int(code) - for gid, code in zip(_eligible_group_ids, dense_codes) + gid: code + for gid, code in zip( + _eligible_group_ids, group_psu_labels + ) } - eligible_group_ids_bootstrap = np.asarray(_eligible_group_ids) + eligible_group_ids_bootstrap = np.asarray( + _eligible_group_ids + ) br = self._compute_dcdh_bootstrap( n_groups_for_overall=n_groups_for_overall_var, @@ -2339,6 +2446,8 @@ def fit( placebo_horizon_inputs=pl_boot_inputs, group_id_to_psu_code=group_id_to_psu_code_bootstrap, eligible_group_ids=eligible_group_ids_bootstrap, + u_per_period_overall=U_centered_pp_overall, + psu_codes_per_cell=psu_codes_per_cell_bootstrap, ) bootstrap_results = br @@ -5464,14 +5573,14 @@ def _strata_psu_vary_within_group( ) -> Tuple[bool, bool]: """Return (strata_varies_within_group, psu_varies_within_group). - Diagnostic helper used at ``fit()`` time to gate the remaining - out-of-scope combination for the cell-period IF allocator: - ``n_bootstrap > 0`` still uses a group-level PSU map and raises - ``NotImplementedError`` when PSU varies within group. The - heterogeneity WLS path supports within-group-varying PSU/strata - via the cell-period allocator (shipped in the PR that lifted the - previous gate). Zero-weight rows are excluded from the check - (subpopulation contract). + Diagnostic helper. 
No longer used to gate any out-of-scope + combination in ``fit()`` — the analytical TSL path, the + heterogeneity WLS path, and the PSU-level wild multiplier + bootstrap all support within-group-varying PSU/strata via the + cell-period allocator. The helper is retained for callers that + need to branch on the regime (e.g., documentation, diagnostic + warnings). Zero-weight rows are excluded (subpopulation + contract). """ if resolved is None: return False, False @@ -5778,6 +5887,53 @@ def _survey_se_from_group_if( and U_centered_per_period.size > 0 ) + # Binder TSL fallback: when the cell allocator would apply but + # PSU is within-group-constant (PSU=group, strictly-coarser PSU + # within-group-constant, or the auto-inject default), the cell + # and group allocators are equivalent at PSU-level aggregation + # via the row-sum identity `sum_{c in g} u_cell[c] == u_centered[g]`. + # Prefer the legacy group-level path in that regime: it sidesteps + # the sentinel-mass guard (below) that would otherwise fire + # spuriously on terminally-missing panels whose PSU structure + # does not require cell-level resolution. Matches the bootstrap + # dispatcher's routing rule (`_compute_dcdh_bootstrap` + + # `_psu_varies_within_group`). + # + # **Replicate-weight carve-out:** replicate designs do not carry + # `resolved.psu`, but the Class A replicate contract shipped in + # PR #323 applies `compute_replicate_if_variance` to the cell- + # allocator `psi_obs` — per-row-varying replicate ratios are + # allocator-sensitive, so forcing replicate fits onto the legacy + # group-level allocator would silently change their SE. The + # fallback therefore skips replicate fits; the sentinel-mass + # guard still fires on mass leakage when it applies. The Class + # A replicate allocator contract is asserted by + # `tests/test_survey_dcdh_replicate_psu.py::TestReplicateClassA:: + # test_att_cell_allocator_with_varying_replicate_ratios`. 
+ if use_cell_allocator and not getattr( + resolved, "uses_replicate_variance", False + ): + psu_arr = getattr(resolved, "psu", None) + if psu_arr is None: + use_cell_allocator = False + else: + psu_eff = np.asarray(psu_arr)[pos_mask] + eligible_set = set(eligible_groups) + elig_row_mask = np.array( + [g in eligible_set for g in gids_eff], dtype=bool + ) + if elig_row_mask.any(): + psu_varies_within = bool( + pd.DataFrame({ + "g": gids_eff[elig_row_mask], + "p": psu_eff[elig_row_mask], + }).groupby("g")["p"].nunique().gt(1).any() + ) + if not psu_varies_within: + use_cell_allocator = False + else: + use_cell_allocator = False + if use_cell_allocator: tids_eff = np.asarray(time_ids)[pos_mask] # Map row's group to an index in eligible_groups (−1 when the @@ -5805,6 +5961,61 @@ def _survey_se_from_group_if( (elig_idx_eff[valid_cell], col_idx_eff[valid_cell]), w_eff[valid_cell], ) + # Sentinel-mass guard (mirror of `_unroll_target_to_cells` on + # the bootstrap path). Under terminal missingness, + # `_cohort_recenter_per_period` subtracts cohort column means + # across the full period grid, so a group with no observation + # at period t can acquire non-zero centered mass at that cell. + # The cell-level expansion `psi_i = U[g,t] * (w_i / W_{g,t})` + # has no observation to attach that mass to (W_{g,t} = 0), so + # silently dropping it would understate the SE. Raise a + # targeted ValueError instead (consistent with the cell-level + # bootstrap's `_unroll_target_to_cells` guard). + missing_cell_mask = W_cell == 0 + if missing_cell_mask.any(): + leaked = U_centered_per_period[missing_cell_mask] + if leaked.size > 0 and bool( + np.any(np.abs(leaked) > 1e-12) + ): + # Branch the wording on replicate-vs-TSL so replicate + # ATT users debugging this error are not misdirected + # to "within-group-varying PSU" (replicate designs + # have no PSU structure). Core mechanics and the + # pre-processing workaround are shared. 
+ is_replicate = bool( + getattr(resolved, "uses_replicate_variance", False) + ) + path_name = ( + "Rao-Wu replicate-weight ATT variance" + if is_replicate + else "Analytical Binder TSL survey SE" + ) + trigger_detail = ( + "terminal missingness on a replicate-weight " + "design (replicate ATT unconditionally uses the " + "cell-period allocator per the Class A contract; " + "PSU structure is not involved)" + if is_replicate + else "terminal missingness combined with within-" + "group-varying PSU (the Binder TSL cell-period " + "allocator cannot allocate leaked mass to any " + "observation)" + ) + raise ValueError( + f"{path_name} cannot be computed on this panel: " + "cohort-recentered IF mass landed on (g, t) cells " + "with no positive-weight observations " + f"(W_{{g, t}} = 0). This typically occurs with " + f"{trigger_detail}: _cohort_recenter_per_period " + "subtracts cohort column means across the full " + "period grid, so a group with no observation at " + "period t acquires non-zero centered mass there, " + "which the cell-period allocator cannot allocate " + "to any observation. Pre-process the panel to " + "remove terminal missingness (drop late-exit " + "groups or trim to a balanced sub-panel) before " + "fitting." + ) # Lookup U_centered_per_period and W_cell per row. u_obs_cell = np.zeros(w_eff.shape[0], dtype=np.float64) u_obs_cell[valid_cell] = U_centered_per_period[ diff --git a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py index 27552296..b1734ddd 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py +++ b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py @@ -9,19 +9,42 @@ ``CallawaySantAnna``, ``ImputationDiD``, and ``TwoStageDiD``). 
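The sentinel-mass guard above can be sketched in isolation (hypothetical free-standing function; the shipped guard is inline and branches its message on replicate-vs-TSL):

```python
import numpy as np

def check_no_leaked_mass(u_centered_pp, w_cell, tol=1e-12):
    """Sketch of the sentinel-mass guard: cohort recentering can
    place non-zero centered IF mass on a (g, t) cell with no
    positive-weight observations (W_cell == 0); the cell allocator
    has no row to attach it to, so raising beats silently dropping
    mass and understating the SE."""
    leaked = u_centered_pp[w_cell == 0]
    if leaked.size > 0 and np.any(np.abs(leaked) > tol):
        raise ValueError(
            "cohort-recentered IF mass landed on cells with no "
            "positive-weight observations; pre-process the panel "
            "to remove terminal missingness"
        )

u = np.array([[0.5, -0.5],
              [0.3, 0.2]])
w = np.array([[2.0, 1.0],
              [1.0, 0.0]])   # the (1, 1) cell is empty

check_no_leaked_mass(u, np.ones((2, 2)))  # fully observed: no error
try:
    check_no_leaked_mass(u, w)            # 0.2 leaks onto empty cell
    raised = False
except ValueError:
    raised = True
assert raised
```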
Under ``survey_design`` with an explicitly-coarser PSU, the bootstrap switches to PSU-level Hall-Mammen wild clustering: each PSU draws a single -multiplier and all groups within that PSU share it -(see ``_generate_psu_or_group_weights`` and ``_map_for_target`` below, -plus the REGISTRY.md ``ChaisemartinDHaultfoeuille`` Note on survey + +multiplier and all groups within that PSU share it (see +``_generate_psu_or_group_weights`` and ``_map_for_target`` below, plus +the REGISTRY.md ``ChaisemartinDHaultfoeuille`` Note on survey + bootstrap). Under the default auto-inject ``psu=group`` each group is its own PSU and the identity-map fast path reproduces the original -group-level behavior bit-for-bit. The bootstrap is a library extension, -not a paper requirement, and is documented as such in ``REGISTRY.md``. +group-level behavior bit-for-bit. + +**Cell-level wild PSU bootstrap (within-group-varying PSU):** when a +survey design's PSU varies across the cells of a group, the group-level +PSU map cannot represent per-PSU contributions (a group with cells in +PSUs ``{p1, p2}`` would collapse to the first PSU, under-clustering the +bootstrap). This module's dispatcher (``_psu_varies_within_group``) +detects the regime and switches to a cell-level allocator: each +observation-level ``psi_i`` is attributed to its ``(g, t)`` cell's PSU +via ``psu_codes_per_cell`` (shape ``(n_eligible_groups, n_periods)``, +-1 sentinel for zero-weight cells), and the bootstrap statistic becomes +``theta_r = sum_c multiplier[psu(c)] * u_centered_pp[c] / divisor`` +(using the cohort-recentered per-cell IF ``U_centered_per_period``). +Under PSU-within-group-constant the row-sum identity +``sum_{c in g} u_cell[c] == u_centered[g]`` (enforced by +``_cohort_recenter_per_period``) makes the cell-level and group-level +bootstraps statistically equivalent, and the dispatcher routes to the +legacy group-level path for bit-identity with pre-cell-level releases. 
+Multi-horizon bootstraps draw a single shared ``(n_bootstrap, n_psu)`` +PSU-level weight matrix per block and broadcast per-horizon via each +horizon's cell-to-PSU map, so the sup-t simultaneous band remains a +valid joint distribution. The bootstrap is a library extension, not a +paper requirement, and is documented in ``REGISTRY.md``. The mixin operates on **pre-computed cohort-centered influence-function values**: the main estimator class computes per-group ``U^G_g`` values -during the analytical variance calculation, recenters them by their -cohort means (using the ``(D_{g,1}, F_g, S_g)`` triple), and stores the -recentered vector. The bootstrap then multiplies this vector by random +(and, under survey designs, per-``(g, t)``-cell attributions +``U_centered_per_period[g, t]``) during the analytical variance +calculation, recenters them by their cohort means (using the +``(D_{g,1}, F_g, S_g)`` triple), and stores the recentered vector / +tensor. The bootstrap then multiplies this structure by random multiplier weights (Rademacher / Mammen / Webb) and re-aggregates to produce a bootstrap distribution per target. 
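A single cell-level replicate of the statistic described above can be sketched with toy values (illustrative arrays, not library internals; the real path draws the full `(n_bootstrap, n_psu)` matrix at once):

```python
import numpy as np

rng = np.random.default_rng(0)

# One cell-level wild PSU bootstrap replicate:
# theta_r = sum_c multiplier[psu(c)] * u_centered_pp[c] / divisor.
u_pp = np.array([[0.4, -0.1],
                 [0.2, 0.0]])    # (n_groups, n_periods) IF tensor
psu = np.array([[0, 1],
                [1, -1]])        # -1: sentinel, dropped at unroll
divisor = 3

valid = psu >= 0                 # sentinel cell carries zero mass
u_cell = u_pp[valid]             # unrolled 1-D cell-level IF
psu_cell = psu[valid]            # each cell's PSU code

# Rademacher multipliers drawn once per PSU and broadcast to cells,
# so cells sharing a PSU share a draw (wild PSU clustering).
m = rng.choice([-1.0, 1.0], size=psu_cell.max() + 1)
theta_r = (m[psu_cell] * u_cell).sum() / divisor
```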
""" @@ -71,15 +94,28 @@ def _compute_dcdh_bootstrap( u_centered_overall: np.ndarray, divisor_overall: int, original_overall: float, - joiners_inputs: Optional[Tuple[np.ndarray, int, float]] = None, - leavers_inputs: Optional[Tuple[np.ndarray, int, float]] = None, - placebo_inputs: Optional[Tuple[np.ndarray, int, float]] = None, + joiners_inputs: Optional[ + Tuple[np.ndarray, int, float, Optional[np.ndarray]] + ] = None, + leavers_inputs: Optional[ + Tuple[np.ndarray, int, float, Optional[np.ndarray]] + ] = None, + placebo_inputs: Optional[ + Tuple[np.ndarray, int, float, Optional[np.ndarray]] + ] = None, # --- Phase 2: multi-horizon inputs --- - multi_horizon_inputs: Optional[Dict[int, Tuple[np.ndarray, int, float]]] = None, - placebo_horizon_inputs: Optional[Dict[int, Tuple[np.ndarray, int, float]]] = None, + multi_horizon_inputs: Optional[ + Dict[int, Tuple[np.ndarray, int, float, Optional[np.ndarray]]] + ] = None, + placebo_horizon_inputs: Optional[ + Dict[int, Tuple[np.ndarray, int, float, Optional[np.ndarray]]] + ] = None, # --- Survey: PSU-level bootstrap under survey designs --- group_id_to_psu_code: Optional[Dict[Any, int]] = None, eligible_group_ids: Optional[np.ndarray] = None, + # --- Survey: cell-level wild PSU bootstrap --- + u_per_period_overall: Optional[np.ndarray] = None, + psu_codes_per_cell: Optional[np.ndarray] = None, ) -> DCDHBootstrapResults: """ Compute multiplier-bootstrap inference for all dCDH targets. @@ -98,13 +134,22 @@ def _compute_dcdh_bootstrap( - the original point estimate of ``T`` (used as the centering point for the percentile p-value) - For each target, this method: + For each target, this method dispatches between two + allocator paths based on ``psu_codes_per_cell`` (via the + ``_psu_varies_within_group`` helper): + + **Legacy group-level path** — runs when + ``psu_codes_per_cell`` is ``None`` or PSU is within-group- + constant (PSU=group auto-inject, strictly-coarser PSU with + within-group constancy, or no survey design). 
For each target: 1. Generates an ``(n_bootstrap, n_groups_target)`` matrix of multiplier weights via :func:`~diff_diff.bootstrap_utils.generate_bootstrap_weights_batch`, where ``n_groups_target`` is the IF vector length (one - weight per contributing group). + weight per contributing group). When a coarser PSU map is + passed, weights are drawn once per PSU and broadcast to + groups so all groups in the same PSU share a multiplier. 2. Computes the bootstrap distribution as ``W @ u_centered / divisor`` (one bootstrap replicate per row), where ``divisor`` is the switching-cell count @@ -116,6 +161,34 @@ def _compute_dcdh_bootstrap( :func:`~diff_diff.bootstrap_utils.compute_effect_bootstrap_stats` to obtain ``(SE, CI, p_value)``. + **Cell-level wild PSU path** — runs when PSU varies within + group (any row of ``psu_codes_per_cell`` has > 1 unique non- + sentinel code). For each target: + + 1. Unrolls the target's ``u_per_period`` tensor and + ``psu_codes_per_cell`` to 1-D cell-level arrays via + :func:`_unroll_target_to_cells`, dropping cells with + sentinel (-1) PSU. Raises ``ValueError`` if cohort- + recentered mass lands on a sentinel cell (the terminal- + missingness guard — see REGISTRY.md). + 2. For **single-target** branches (overall, joiners, leavers, + placebo-horizon-per-l), generates an + ``(n_bootstrap, n_cells_in_target)`` weight matrix where + each cell gets the multiplier of its PSU code, via + ``psu_weights[:, psu_cell]`` broadcast. Bootstrap + distribution becomes ``W_cell @ u_cell / divisor``. + 3. For the **multi-horizon** block, draws a SINGLE shared + ``(n_bootstrap, n_psu)`` PSU-level weight matrix once and + broadcasts per-horizon via each horizon's cell-to-PSU map. + Preserves the sup-t simultaneous confidence band as a + valid joint distribution across horizons. 
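Under stated assumptions (toy shapes, illustrative variable names), steps 1-2 of the cell-level path reduce to indexing a PSU-level weight matrix by each cell's PSU code, so cells sharing a PSU share a multiplier within every replicate:

```python
import numpy as np

rng = np.random.default_rng(1)
n_bootstrap, n_psu = 500, 3

# Per-cell IF tensor and per-cell PSU codes (-1 = sentinel, zero-weight cell).
u_per_period = np.array([[0.2, -0.1,  0.0],
                         [0.3,  0.1, -0.4]])
psu_codes = np.array([[0, 0, -1],
                      [1, 2,  2]])

# Unroll to 1-D cell arrays, dropping sentinel cells.
mask = psu_codes.ravel() >= 0
u_cell = u_per_period.ravel()[mask]
psu_cell = psu_codes.ravel()[mask]          # [0, 0, 1, 2, 2]

# Draw weights at PSU granularity, then broadcast to cells: all cells in
# the same PSU receive the same multiplier within a replicate.
psu_weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_psu))
W_cell = psu_weights[:, psu_cell]           # (n_bootstrap, n_valid_cells)

divisor = 2
deviations = (W_cell @ u_cell) / divisor
```

The two cells of the first group both carry PSU code 0, so their weight columns are identical replicate by replicate, which is exactly the clustering the group-level map would lose when PSU varies within a group.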
+ + Under PSU=group, the row-sum identity + ``sum_{c in g} u_cell[c] == u_centered[g]`` makes the two + paths statistically equivalent; the dispatcher prefers the + legacy path in that regime for bit-identity with pre-PR-4 + releases via the identity-map fast path. + Parameters ---------- n_groups_for_overall : int @@ -135,17 +208,50 @@ def _compute_dcdh_bootstrap( :func:`compute_effect_bootstrap_stats` for the percentile p-value computation. joiners_inputs : tuple, optional - ``(u_centered, divisor, original_effect)`` triple for the - joiners-only ``DID_+`` target. The ``divisor`` is the - joiner switching-cell total ``sum_t N_{1,0,t}``, NOT the - joiner group count. ``None`` when no joiners exist. + ``(u_centered, divisor, original_effect, u_per_period)`` + 4-tuple for the joiners-only ``DID_+`` target. The + ``divisor`` is the joiner switching-cell total + ``sum_t N_{1,0,t}`` (NOT the joiner group count). + ``u_per_period`` is the cohort-recentered per-cell IF + tensor of shape ``(n_eligible_groups, n_periods)`` (or + ``None`` when survey is inactive); it is consumed only + by the cell-level dispatch described below. leavers_inputs : tuple, optional - Same triple for the leavers-only ``DID_-`` target. The - ``divisor`` is the leaver switching-cell total - ``sum_t N_{0,1,t}``. + Same 4-tuple for the leavers-only ``DID_-`` target. + ``divisor`` is ``sum_t N_{0,1,t}``. placebo_inputs : tuple, optional - Same triple for the Phase 1 per-period placebo ``DID_M^pl``. - ``None`` when ``L_max=None`` (per-period placebo has no IF). + Same 4-tuple for the Phase 1 per-period placebo + ``DID_M^pl``. ``None`` when ``L_max=None`` (per-period + placebo has no IF). The 4th slot is always ``None`` here + because the Phase 1 placebo has no per-cell attribution. + multi_horizon_inputs : dict, optional + Per-horizon 4-tuples keyed by horizon ``l``. 
Same layout + as ``joiners_inputs``; the per-horizon per-cell tensor + drives the shared-PSU-weights broadcast under the cell- + level path. + placebo_horizon_inputs : dict, optional + Same shape as ``multi_horizon_inputs`` for the Phase 2 + placebo-horizon targets. + group_id_to_psu_code : dict, optional + Group-ID → dense PSU code map for the legacy group-level + bootstrap path. ``None`` when no PSU info is available. + eligible_group_ids : np.ndarray, optional + Ordered group IDs matching the ``u_centered_*`` vectors + for the legacy group-level path. + u_per_period_overall : np.ndarray, optional + Cohort-recentered per-cell IF tensor for the overall + ``DID_M`` target, shape ``(n_eligible_groups, n_periods)``. + Consumed only by the cell-level path. + psu_codes_per_cell : np.ndarray, optional + Per-cell PSU code tensor, shape + ``(n_eligible_groups, n_periods)``, with ``-1`` sentinel + for ineligible / zero-weight cells. Used by + ``_psu_varies_within_group`` to pick between the legacy + group-level bootstrap path (PSU-within-group-constant, + bit-identical to pre-PR-4 behavior via the identity-map + fast path) and the cell-level wild PSU bootstrap + (within-group-varying PSU, cell-level multipliers and + shared PSU-level weights across multi-horizon blocks). Returns ------- @@ -178,13 +284,53 @@ def _compute_dcdh_bootstrap( # group-IDs parameter for it rather than reusing the overall # eligible list. + # Dispatcher for the cell-level wild PSU bootstrap. When PSU + # varies across cells of a group, a group-level PSU map + # collapses within-group PSU variation and the bootstrap + # under-clusters. The cell-level path draws multipliers at + # PSU granularity and applies them per (g, t) cell via + # `psu_codes_per_cell`. 
Under PSU-within-group-constant + # regimes (including PSU=group and strictly-coarser PSU + # within-group-constant), the row-sum identity + # `sum_{c in g} u_cell[c] == u_centered[g]` makes the two + # paths statistically equivalent, and the dispatcher routes + # to the legacy path for bit-identity with pre-release + # behavior. See REGISTRY.md survey + bootstrap contract Note. + psu_varies = _psu_varies_within_group(psu_codes_per_cell) + # --- Overall DID_M --- # Skip the scalar DID_M bootstrap when divisor_overall <= 0 # (e.g., pure non-binary panels where N_S=0), but continue # to process multi_horizon_inputs and placebo_horizon_inputs. if divisor_overall > 0: + if psu_varies: + # Contract: when the cell-level path is required + # (psu_varies=True), the caller MUST provide the + # target's per-cell IF tensor. Silent fallback to the + # group-level allocator would under-cluster the + # bootstrap by collapsing multi-PSU group + # contributions to one PSU. + if u_per_period_overall is None: + raise ValueError( + "Cell-level bootstrap requires " + "u_per_period_overall when PSU varies " + "within group, got None. Caller must " + "thread the cohort-recentered per-cell IF " + "tensor (U_centered_pp_overall) from the " + "analytical TSL path." 
+ ) + u_boot_overall, map_boot_overall = _unroll_target_to_cells( + u_per_period_overall, psu_codes_per_cell, + ) + else: + u_boot_overall = u_centered_overall + map_boot_overall = _map_for_target( + u_centered_overall.shape[0], + group_id_to_psu_code, + eligible_group_ids, + ) overall_se, overall_ci, overall_p, overall_dist = _bootstrap_one_target( - u_centered=u_centered_overall, + u_centered=u_boot_overall, divisor=divisor_overall, original=original_overall, n_bootstrap=self.n_bootstrap, @@ -193,11 +339,7 @@ def _compute_dcdh_bootstrap( rng=rng, context="dCDH overall DID_M bootstrap", return_distribution=True, - group_to_psu_map=_map_for_target( - u_centered_overall.shape[0], - group_id_to_psu_code, - eligible_group_ids, - ), + group_to_psu_map=map_boot_overall, ) else: overall_se = np.nan @@ -217,10 +359,25 @@ def _compute_dcdh_bootstrap( # --- Joiners (DID_+) --- if joiners_inputs is not None: - u_j, n_j, eff_j = joiners_inputs + u_j, n_j, eff_j, u_pp_j = joiners_inputs if u_j.size > 0 and n_j > 0: + if psu_varies: + if u_pp_j is None: + raise ValueError( + "Cell-level bootstrap requires joiners' " + "per-cell IF tensor (U_centered_pp_joiners) " + "when PSU varies within group, got None." 
+ ) + u_boot_j, map_boot_j = _unroll_target_to_cells( + u_pp_j, psu_codes_per_cell, + ) + else: + u_boot_j = u_j + map_boot_j = _map_for_target( + u_j.size, group_id_to_psu_code, eligible_group_ids, + ) se_j, ci_j, p_j, _ = _bootstrap_one_target( - u_centered=u_j, + u_centered=u_boot_j, divisor=n_j, original=eff_j, n_bootstrap=self.n_bootstrap, @@ -229,9 +386,7 @@ def _compute_dcdh_bootstrap( rng=rng, context="dCDH joiners DID_+ bootstrap", return_distribution=False, - group_to_psu_map=_map_for_target( - u_j.size, group_id_to_psu_code, eligible_group_ids, - ), + group_to_psu_map=map_boot_j, ) results.joiners_se = se_j results.joiners_ci = ci_j @@ -239,10 +394,25 @@ def _compute_dcdh_bootstrap( # --- Leavers (DID_-) --- if leavers_inputs is not None: - u_l, n_l, eff_l = leavers_inputs + u_l, n_l, eff_l, u_pp_l = leavers_inputs if u_l.size > 0 and n_l > 0: + if psu_varies: + if u_pp_l is None: + raise ValueError( + "Cell-level bootstrap requires leavers' " + "per-cell IF tensor (U_centered_pp_leavers) " + "when PSU varies within group, got None." + ) + u_boot_l, map_boot_l = _unroll_target_to_cells( + u_pp_l, psu_codes_per_cell, + ) + else: + u_boot_l = u_l + map_boot_l = _map_for_target( + u_l.size, group_id_to_psu_code, eligible_group_ids, + ) se_l, ci_l, p_l, _ = _bootstrap_one_target( - u_centered=u_l, + u_centered=u_boot_l, divisor=n_l, original=eff_l, n_bootstrap=self.n_bootstrap, @@ -251,17 +421,17 @@ def _compute_dcdh_bootstrap( rng=rng, context="dCDH leavers DID_- bootstrap", return_distribution=False, - group_to_psu_map=_map_for_target( - u_l.size, group_id_to_psu_code, eligible_group_ids, - ), + group_to_psu_map=map_boot_l, ) results.leavers_se = se_l results.leavers_ci = ci_l results.leavers_p_value = p_l # --- Placebo (DID_M^pl) --- + # Phase 1 placebo has no per-cell IF (unpack tolerates None in + # the fourth slot — callers always pass None for this target). 
if placebo_inputs is not None: - u_pl, n_pl, eff_pl = placebo_inputs + u_pl, n_pl, eff_pl, _u_pp_pl_unused = placebo_inputs if u_pl.size > 0 and n_pl > 0: se_pl, ci_pl, p_pl, _ = _bootstrap_one_target( u_centered=u_pl, @@ -282,32 +452,58 @@ def _compute_dcdh_bootstrap( results.placebo_p_value = p_pl # --- Phase 2: Multi-horizon bootstrap with shared weight matrix --- - # Generate ONE shared (n_bootstrap, n_groups) weight matrix so all - # horizons use the same bootstrap draw, making the sup-t statistic - # a valid joint multiplier-bootstrap band. + # Generate ONE shared weight matrix so all horizons use the same + # bootstrap draw, making the sup-t statistic a valid joint + # multiplier-bootstrap band. Under PSU-within-group-constant the + # shared draws live at group granularity (bit-identical to + # pre-cell-level); under within-group-varying PSU the shared + # draws live at PSU granularity and are broadcast per-horizon to + # the horizon's cells via `psu_codes_per_cell`. if multi_horizon_inputs is not None: es_ses: Dict[int, float] = {} es_cis: Dict[int, Tuple[float, float]] = {} es_pvals: Dict[int, float] = {} es_dists: Dict[int, np.ndarray] = {} - # Shared weight matrix sized for the group set. Under PSU-level - # bootstrap (Hall-Mammen wild PSU), weights are drawn once per - # PSU and broadcast to groups so all groups in the same PSU - # share a multiplier within a single bootstrap replicate — - # preserving the sup-t joint distribution across horizons. n_groups_mh = n_groups_for_overall - shared_weights = _generate_psu_or_group_weights( - n_bootstrap=self.n_bootstrap, - n_groups_target=n_groups_mh, - weight_type=self.bootstrap_weights, - rng=rng, - group_to_psu_map=_map_for_target( - n_groups_mh, group_id_to_psu_code, eligible_group_ids, - ), - ) + if psu_varies: + # Draw ONE shared (n_bootstrap, n_psu) PSU-level weight + # matrix. Broadcast per-horizon via each horizon's + # cell-to-PSU map inside the loop. 
PSU count derived + # from the dense code domain of psu_codes_per_cell. + assert psu_codes_per_cell is not None + valid_psu_codes = psu_codes_per_cell[psu_codes_per_cell >= 0] + n_psu_mh = int(valid_psu_codes.max()) + 1 if valid_psu_codes.size > 0 else 0 + shared_psu_weights: Optional[np.ndarray] + if n_psu_mh > 0: + shared_psu_weights = _generate_bootstrap_weights_batch( + n_bootstrap=self.n_bootstrap, + n_units=n_psu_mh, + weight_type=self.bootstrap_weights, + rng=rng, + ) + else: + shared_psu_weights = None + shared_weights = None # not used on the cell path + else: + # Shared weight matrix sized for the group set. Under + # PSU-within-group-constant (Hall-Mammen wild PSU), + # weights are drawn once per PSU and broadcast to groups + # so all groups in the same PSU share a multiplier + # within a single bootstrap replicate — preserving the + # sup-t joint distribution across horizons. + shared_weights = _generate_psu_or_group_weights( + n_bootstrap=self.n_bootstrap, + n_groups_target=n_groups_mh, + weight_type=self.bootstrap_weights, + rng=rng, + group_to_psu_map=_map_for_target( + n_groups_mh, group_id_to_psu_code, eligible_group_ids, + ), + ) + shared_psu_weights = None - for l_h, (u_h, n_h, eff_h) in sorted(multi_horizon_inputs.items()): + for l_h, (u_h, n_h, eff_h, u_pp_h) in sorted(multi_horizon_inputs.items()): if u_h.size > 0 and n_h > 0: # Under the current contract every horizon's IF # vector uses the variance-eligible group ordering @@ -331,8 +527,27 @@ def _compute_dcdh_bootstrap( f"shared PSU draws onto the horizon's own " f"ordering via `_map_for_target`." ) - w_h = shared_weights - deviations = (w_h @ u_h) / n_h + if psu_varies: + if u_pp_h is None: + raise ValueError( + f"Cell-level bootstrap requires " + f"per-cell IF tensor for multi-horizon " + f"target l={l_h} when PSU varies " + f"within group, got None." + ) + assert shared_psu_weights is not None + # Cell-level: unroll this horizon's cells and + # broadcast the shared PSU weights. 
+ u_cell_h, psu_cell_h = _unroll_target_to_cells( + u_pp_h, psu_codes_per_cell, + ) + if u_cell_h.size == 0: + continue + w_cell_h = shared_psu_weights[:, psu_cell_h] + deviations = (w_cell_h @ u_cell_h) / n_h + else: + assert shared_weights is not None + deviations = (shared_weights @ u_h) / n_h dist_h = deviations + eff_h se_h, ci_h, p_h = _compute_effect_bootstrap_stats( @@ -367,15 +582,38 @@ def _compute_dcdh_bootstrap( results.cband_crit_value = cband_crit # --- Phase 2: Placebo horizon bootstrap --- + # Note: placebo-horizons are treated as independent single-target + # draws (no sup-t joint-distribution requirement across placebo + # horizons), so each horizon gets its own RNG draw via + # `_bootstrap_one_target`. Under within-group-varying PSU the + # per-horizon cell unroll is used. if placebo_horizon_inputs is not None: pl_ses: Dict[int, float] = {} pl_cis: Dict[int, Tuple[float, float]] = {} pl_pvals: Dict[int, float] = {} - for l_h, (u_h, n_h, eff_h) in sorted(placebo_horizon_inputs.items()): + for l_h, (u_h, n_h, eff_h, u_pp_h) in sorted(placebo_horizon_inputs.items()): if u_h.size > 0 and n_h > 0: + if psu_varies: + if u_pp_h is None: + raise ValueError( + f"Cell-level bootstrap requires " + f"per-cell IF tensor for placebo-" + f"horizon target l={l_h} when PSU " + f"varies within group, got None." 
+ ) + u_boot_plh, map_boot_plh = _unroll_target_to_cells( + u_pp_h, psu_codes_per_cell, + ) + if u_boot_plh.size == 0: + continue + else: + u_boot_plh = u_h + map_boot_plh = _map_for_target( + u_h.size, group_id_to_psu_code, eligible_group_ids, + ) se_h, ci_h, p_h, _ = _bootstrap_one_target( - u_centered=u_h, + u_centered=u_boot_plh, divisor=n_h, original=eff_h, n_bootstrap=self.n_bootstrap, @@ -384,9 +622,7 @@ def _compute_dcdh_bootstrap( rng=rng, context=f"dCDH placebo l={l_h} bootstrap", return_distribution=False, - group_to_psu_map=_map_for_target( - u_h.size, group_id_to_psu_code, eligible_group_ids, - ), + group_to_psu_map=map_boot_plh, ) pl_ses[l_h] = se_h pl_cis[l_h] = ci_h @@ -404,6 +640,117 @@ def _compute_dcdh_bootstrap( # ============================================================================= +def _psu_varies_within_group( + psu_codes_per_cell: Optional[np.ndarray], +) -> bool: + """True when any row of ``psu_codes_per_cell`` has more than one + unique PSU label (ignoring -1 sentinel entries). + + When ``False`` — including the ``None`` case for non-survey fits — + the legacy group-level bootstrap path is invoked. The row-sum + identity ``sum_{c in g} u_cell[c] == u_centered[g]`` established + by ``_cohort_recenter_per_period`` makes the cell-level and + group-level bootstraps statistically equivalent under this regime, + and the group-level path is bit-identical to pre-cell-level + releases through the existing identity-map fast path. + """ + if psu_codes_per_cell is None: + return False + for row in psu_codes_per_cell: + valid = row[row >= 0] + if valid.size > 1 and np.unique(valid).size > 1: + return True + return False + + +def _unroll_target_to_cells( + u_per_period_target: np.ndarray, + psu_codes_per_cell: Optional[np.ndarray], +) -> Tuple[np.ndarray, np.ndarray]: + """Flatten a target's cohort-recentered per-cell IF tensor + its + per-cell PSU map into 1-D arrays, dropping cells with sentinel + PSU code (-1 — zero-weight cells). 
+ + Both inputs must have shape ``(n_eligible_groups, n_periods)``; + the dCDH bootstrap contract guarantees all targets share the + variance-eligible group ordering, so no per-target row subset + is needed. + + Raises ``ValueError`` when any sentinel cell (-1 PSU) carries + non-zero cohort-recentered IF mass. This is a supported-edge- + case guard: under terminal missingness, ``_cohort_recenter_per_period`` + subtracts column means across the full period grid, so a group + with no observation at period ``t`` can acquire non-zero centered + mass at that sentinel cell. The cell-level bootstrap cannot + allocate that mass to any PSU (the cell has no positive-weight + obs), so silently dropping it would under-weight the group's + bootstrap contribution. The analytical TSL path shares the same + cell-period allocator and fires a matching guard in + ``_survey_se_from_group_if``, so both paths reject this regime + consistently. **Documented workaround (bootstrap path):** + pre-process the panel to remove terminal missingness (drop + late-exit groups or trim to a balanced sub-panel). The within- + group-varying-PSU bootstrap has no allocator fallback — unlike + Binder TSL, where using an explicit ``psu=`` routes + the analytical path through the legacy group-level allocator; + that fallback is not available on the bootstrap side because + the cell-level wild PSU bootstrap IS the varying-PSU regime. + + Returns ``(u_cell, psu_cell)`` of shape + ``(n_valid_cells_in_target,)`` each. + """ + if psu_codes_per_cell is None: + raise ValueError( + "_unroll_target_to_cells requires psu_codes_per_cell; " + "caller should only invoke this on the cell-level path." + ) + if u_per_period_target.shape != psu_codes_per_cell.shape: + raise ValueError( + "Cell-level bootstrap shape mismatch: target per-period " + f"IF tensor has shape {u_per_period_target.shape} but " + f"psu_codes_per_cell has shape {psu_codes_per_cell.shape}. 
" + "The dCDH bootstrap contract requires every target's " + "per-cell tensor to align with the (n_eligible_groups, " + "n_periods) layout of psu_codes_per_cell." + ) + flat_u = u_per_period_target.ravel() + flat_psu = psu_codes_per_cell.ravel() + mask = flat_psu >= 0 + # Sentinel-mass guard: reject terminal-missingness + within-group- + # varying PSU + bootstrap. The cohort-recentering column-subtraction + # at `_cohort_recenter_per_period` can leak non-zero centered mass + # onto cells with no positive-weight obs (missing-cell rows in the + # cohort still get -col_mean added when other rows contribute at + # that column). Dropping that mass silently would under-cluster the + # bootstrap in a supported panel regime. + sentinel_mass = flat_u[~mask] + if sentinel_mass.size > 0 and bool( + np.any(np.abs(sentinel_mass) > 1e-12) + ): + raise ValueError( + "Cell-level bootstrap cannot be computed on this survey " + "panel: cohort-recentered IF mass landed on cells with " + "no positive-weight observations (psu_codes_per_cell == " + "-1). This typically occurs when terminal missingness " + "(groups observed only through some period) combines with " + "within-group-varying PSU: `_cohort_recenter_per_period` " + "subtracts column means across the full period grid, so a " + "group with no observation at period t acquires non-zero " + "centered mass there, which the cell-level allocator " + "cannot allocate to any PSU. The analytical TSL path " + "(`_survey_se_from_group_if`) fires a matching guard on " + "the same regime, so both paths reject this panel " + "consistently. Pre-process the panel to remove terminal " + "missingness (drop late-exit groups or trim to a balanced " + "sub-panel). The within-group-varying-PSU bootstrap has " + "no allocator fallback — unlike Binder TSL, switching to " + "`psu=` does not help here because the varying-" + "PSU bootstrap IS the cell-level path, not an analytical " + "surface with a legacy-allocator alternative." 
+ ) + return flat_u[mask], flat_psu[mask].astype(np.int64, copy=False) + + def _map_for_target( target_size: int, group_id_to_psu_code: Optional[Dict[Any, int]], @@ -423,14 +770,22 @@ def _map_for_target( Returns ``None`` when no PSU information is available (plain multiplier-bootstrap path — identity across targets). + **Group-level granularity only.** This helper is invoked on the + legacy group-level bootstrap path; on the cell-level path + (within-group-varying PSU), callers build the cell-to-PSU map + directly via ``_unroll_target_to_cells`` and do not route through + ``_map_for_target``, so the size-mismatch raise below does not + fire. + Raises ``ValueError`` if ``target_size`` does not match ``len(eligible_group_ids)``: every current dCDH bootstrap target - uses the variance-eligible group ordering, so any size mismatch - signals that a caller introduced a target whose group subset - diverges and should pass its own ``target_group_ids`` rather than - reusing the overall eligible list. Also raises ``ValueError`` if - any group ID is missing from the dict (signaling misalignment - between the target's IF vector and the map's keys). + uses the variance-eligible group ordering on the group-level path, + so any size mismatch signals that a caller introduced a target + whose group subset diverges and should pass its own + ``target_group_ids`` rather than reusing the overall eligible + list. Also raises ``ValueError`` if any group ID is missing from + the dict (signaling misalignment between the target's IF vector + and the map's keys). 
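The sentinel-mass guard can be reproduced with a minimal stand-in for the unroll step. `unroll_cells` below is a hypothetical simplification of `_unroll_target_to_cells`, keeping only the mask-and-check logic: sentinel cells carrying (numerically) zero mass are dropped, while non-zero mass on a sentinel cell raises.

```python
import numpy as np

def unroll_cells(u_pp, psu_codes, atol=1e-12):
    """Toy stand-in for the unroll step: flatten, drop sentinel (-1) cells,
    and reject non-zero IF mass on cells with no positive-weight obs."""
    flat_u, flat_psu = u_pp.ravel(), psu_codes.ravel()
    mask = flat_psu >= 0
    if np.any(np.abs(flat_u[~mask]) > atol):
        raise ValueError("cohort-recentered mass on sentinel cell")
    return flat_u[mask], flat_psu[mask]

psu = np.array([[0, 1], [1, -1]])

u_ok = np.array([[0.1, -0.1], [0.2, 0.0]])
u_cell, psu_cell = unroll_cells(u_ok, psu)    # sentinel carries 0.0: allowed

u_bad = np.array([[0.1, -0.1], [0.2, 0.05]])  # mass leaked onto the sentinel
try:
    unroll_cells(u_bad, psu)
except ValueError as exc:
    print(exc)
```

In the real module the leaked mass arises from the column-mean subtraction in `_cohort_recenter_per_period` under terminal missingness; here it is injected directly for brevity.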
""" if group_id_to_psu_code is None or eligible_group_ids is None: return None diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index bdcfef34..7bba1215 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -589,7 +589,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - **Note:** When every variance-eligible group forms its own `(D_{g,1}, F_g, S_g)` cohort (a degenerate small-panel case where the cohort framework has zero degrees of freedom), the cohort-recentered plug-in formula is unidentified: cohort recentering subtracts the cohort mean from each group's `U^G_g`, and for singleton cohorts the centered value is exactly zero, so the centered influence function vector collapses to all zeros. The estimator returns `overall_se = NaN` with a `UserWarning` rather than silently collapsing to `0.0` (which would falsely imply infinite precision). The `DID_M` point estimate remains well-defined. The bootstrap path inherits the same degeneracy on these panels — the multiplier weights act on an all-zero vector, so the bootstrap distribution is also degenerate. **Deviation from R `DIDmultiplegtDYN`:** R returns a non-zero SE on the canonical 4-group worked example via small-sample sandwich machinery that Python does not implement. Both responses are valid for a degenerate case; Python's `NaN`+warning is the safer default. To get a non-degenerate SE, include more groups so cohorts have peers (real-world panels typically have `G >> K`). -- **Note (cluster contract):** `ChaisemartinDHaultfoeuille` clusters at the group level by default. The analytical SE plug-in operates on per-group influence-function values (one `U^G_g` per group) and, under the cell-period allocator, on their per-cell decomposition `U[g, t]` which telescopes back to `U^G_g` at the PSU-level sum. The multiplier bootstrap generates one weight per group. 
The user-facing `cluster=` kwarg is not supported: the constructor accepts `cluster=None` (the default and only supported value); passing any non-`None` value raises `NotImplementedError` at construction time (and the same gate fires from `set_params`) — custom user-specified clustering is reserved for a future phase. **Automatic PSU-level clustering under `survey_design`:** the analytical TSL path supports PSU labels that vary across cells of a group (within-cell constancy required); the multiplier bootstrap still uses the legacy group-level PSU map and raises `NotImplementedError` when combined with within-group-varying PSU (PR 4 lifts this). Under the default auto-inject `psu=group`, PSU coincides with group and the group-level and cell-level paths are identical at PSU-sum granularity (bit-for-bit via the identity-map fast path). The matching test for the `cluster=` gate is `test_cluster_parameter_raises_not_implemented` in `tests/test_chaisemartin_dhaultfoeuille.py::TestForwardCompatGates`. +- **Note (cluster contract):** `ChaisemartinDHaultfoeuille` clusters at the group level by default. The analytical SE plug-in operates on per-group influence-function values (one `U^G_g` per group) and, under the cell-period allocator, on their per-cell decomposition `U[g, t]` which telescopes back to `U^G_g` at the PSU-level sum. The multiplier bootstrap generates one weight per group. The user-facing `cluster=` kwarg is not supported: the constructor accepts `cluster=None` (the default and only supported value); passing any non-`None` value raises `NotImplementedError` at construction time (and the same gate fires from `set_params`) — custom user-specified clustering is reserved for a future phase. 
**Automatic PSU-level clustering under `survey_design`:** the analytical TSL path supports PSU labels that vary across cells of a group (within-cell constancy required); the multiplier bootstrap supports the same regime via the cell-level wild PSU bootstrap documented in the survey + bootstrap contract Note below. Under PSU-within-group-constant regimes (including the default auto-inject `psu=group` and strictly-coarser PSU with within-group constancy), the bootstrap dispatcher routes through the legacy group-level path so the SE is bit-identical to pre-cell-level releases via the identity-map fast path. The matching test for the `cluster=` gate is `test_cluster_parameter_raises_not_implemented` in `tests/test_chaisemartin_dhaultfoeuille.py::TestForwardCompatGates`. - **Note (bootstrap inference surface):** When `n_bootstrap > 0`, the top-level `results.overall_p_value` / `results.overall_conf_int` (and joiners/leavers analogues) hold **percentile-based bootstrap inference** computed by the multiplier bootstrap, NOT normal-theory recomputations from the bootstrap SE. The t-stat (`overall_t_stat`, etc.) is computed from the SE via `safe_inference()[0]` to satisfy the project's anti-pattern rule (never compute `t = effect / se` inline) — bootstrap does not define an alternative t-stat semantic for percentile bootstrap, so the SE-based t-stat is the natural choice. `event_study_effects[1]`, `summary()`, `to_dataframe()`, `is_significant`, and `significance_stars` all read from these top-level fields and therefore reflect the bootstrap inference automatically. The library precedent for this propagation is `imputation.py:790-805`, `two_stage.py:778-787`, and `efficient_did.py:1009-1013`. The single-period placebo (`L_max=None`) still has NaN bootstrap fields; multi-horizon placebos (`L_max >= 1`) have valid bootstrap SE/CI/p via `placebo_horizon_ses/cis/p_values` on the bootstrap results object. 
The matching test is `test_bootstrap_p_value_and_ci_propagated_to_top_level` in `tests/test_chaisemartin_dhaultfoeuille.py::TestBootstrap`. @@ -607,7 +607,12 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - **Note (deviation from R DIDmultiplegtDYN):** Python uses **period-based** stable-control sets — `stable_0(t)` is any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}`, and similarly for `stable_1(t)`. R `DIDmultiplegtDYN` uses **cohort-based** stable-control sets that additionally require `D_{g,1}` to match the side. Python's definition matches the AER 2020 Theorem 3 cell-count notation `N_{0,0,t}` and `N_{1,1,t}` literally; R's definition matches the dynamic companion paper's cohort `(D_{g,1}, F_g, S_g)` framework. The two definitions agree exactly on (a) panels containing only joiners, (b) panels containing only leavers, (c) the hand-calculable 4-group worked example, or (d) any panel where no joiner's post-switch state overlaps a period when leavers are switching. They disagree by O(1%) on the **point estimate** when both joiners and leavers exist AND some joiners' post-switch cells could serve as leavers' controls (or vice versa). After the Round 2 fix that implemented the full `Lambda^G_{g,l=1}` influence function, the **standard error** parity gap on pure-direction scenarios narrowed from ~18% to ~3%. The R parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` use a tight `1e-4` tolerance for pure-direction point estimates, 10% rtol for multi-horizon SEs (15% for L_max=5 long panels where the cell-count weighting deviation compounds), 5% rtol for single-horizon SEs, and a 2.5% tolerance for mixed-direction point estimates (with the SE check skipped on mixed scenarios because the period-vs-cohort point-estimate deviation cascades into the variance). 
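The period-based stable-control definition above can be checked directly from a treatment matrix. A sketch on a toy 3-group, 4-period panel (not the estimator's internals); per the note, membership in `stable_0(t)` ignores the baseline `D_{g,1}`:

```python
import numpy as np

# D[g, t]: binary treatment, 3 groups x 4 periods.
D = np.array([[0, 0, 1, 1],    # joiner at t=2
              [0, 0, 0, 0],    # never-treated
              [1, 1, 1, 0]])   # leaver at t=3

def stable_0(t):
    # Period-based: D_{g,t-1} = D_{g,t} = 0, regardless of baseline D_{g,1}.
    return np.where((D[:, t - 1] == 0) & (D[:, t] == 0))[0]

def stable_1(t):
    # Period-based: D_{g,t-1} = D_{g,t} = 1.
    return np.where((D[:, t - 1] == 1) & (D[:, t] == 1))[0]
```

At `t=2` the never-treated group is the only `stable_0` control while the (not-yet-switched) leaver is the only `stable_1` control; a cohort-based definition would additionally filter on `D_{g,1}`.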
-- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. The Step 5b validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Workaround for unbalanced panels:** pre-process your data to back-fill the baseline (or drop late-entry groups before fitting), or use R `DIDmultiplegtDYN` until a future phase lifts the restriction. The Step 5b `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly. +- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. 
The Step 5b validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported at the POINT-ESTIMATE level because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Scope limitation (terminal missingness under any cell-period-allocator path):** under any survey variance path that uses the cell-period allocator, a targeted `ValueError` is raised when cohort-recentering leaks non-zero centered IF mass onto cells with no positive-weight observations. Affected paths: +- **Binder TSL with within-group-varying PSU** (`n_bootstrap=0`, explicit `psu=` that varies within group). +- **Rao-Wu replicate-weight ATT** (`compute_replicate_if_variance` always reads the cell-allocator `psi_obs` per the Class A contract shipped in PR #323, regardless of PSU structure). +- **Cell-level wild PSU bootstrap** (`n_bootstrap > 0` with within-group-varying PSU). + +The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and by `_unroll_target_to_cells` (bootstrap). 
**Unaffected paths**: Binder TSL under PSU-within-group-constant regimes (including PSU=group auto-inject) falls back to the legacy group-level allocator where the row-sum identity `sum_{c in g} U_centered_per_period[g, t] == U_centered[g]` makes the two statistically equivalent, and the bootstrap dispatcher routes the same regimes through the legacy group-level path. **Workaround:** pre-process the panel to remove terminal missingness (drop late-exit groups or trim to a balanced sub-panel). For Binder TSL, using an explicit `psu=` routes through the legacy group allocator. For replicate ATT and within-group-varying-PSU bootstrap, there is no allocator fallback — the panel itself must be pre-processed. The broader unbalanced-panel workaround (back-fill the baseline or drop late-entry groups before fitting, or use R `DIDmultiplegtDYN`) also applies. The Step 5b `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly. - **Note (Phase 3 DID^X covariate adjustment):** When `controls` is set, `per_period_effects` (the Phase 1 per-period DID_M decomposition) remains **unadjusted** (computed on raw outcomes). The covariate residualization applies only to the per-group `DID_{g,l}` path (`L_max >= 1`), which produces `event_study_effects` and `overall_att`. This means `per_period_effects` and `event_study_effects[1]` may diverge when controls are active - by design (the per-period path uses binary joiner/leaver categorization and is not part of the DID^X contract). Implements the residualization-style covariate adjustment from Web Appendix Section 1.2 (Assumption 11). For each baseline treatment value `d`, estimates `theta_hat_d` via OLS of first-differenced outcomes on first-differenced covariates with time FEs, restricted to not-yet-treated observations. Residualizes at levels: `Y_tilde[g,t] = Y[g,t] - X[g,t] @ theta_hat_d`. All downstream DID computations use residualized outcomes. 
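A minimal numpy sketch of the residualization just described, under a hypothetical DGP (time fixed effects and the per-baseline-`d` stratification are omitted for brevity; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
G, T, K = 40, 5, 2

# Hypothetical panel: time-varying covariates X and outcomes Y.
X = rng.normal(size=(G, T, K))
theta_true = np.array([1.5, -0.5])
Y = X @ theta_true + rng.normal(scale=0.1, size=(G, T))

# Not-yet-treated mask (here all groups hypothetically switch at period 3).
not_yet = np.zeros((G, T), dtype=bool)
not_yet[:, :3] = True

# First-stage OLS of first-differenced outcomes on first-differenced
# covariates, restricted to differences with both endpoints not yet treated.
dY = Y[:, 1:] - Y[:, :-1]
dX = X[:, 1:, :] - X[:, :-1, :]
keep = not_yet[:, 1:] & not_yet[:, :-1]
theta_hat, *_ = np.linalg.lstsq(dX[keep], dY[keep], rcond=None)

# Residualize at LEVELS, as the note specifies.
Y_tilde = Y - X @ theta_hat
```

All downstream DID computations would then run on `Y_tilde` in place of `Y`.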
This is NOT doubly-robust, NOT IPW, NOT Callaway-Sant'Anna-style. Plug-in IF (treating `theta_hat` as fixed) is valid by FWL theorem. **Deviation from R `DIDmultiplegtDYN`:** The first-stage OLS uses equal cell weights (one observation per `(g,t)` cell), consistent with the library's cell-count weighting convention documented in Phase 1. R weights by `N_gt` (observation count per cell). On panels with 1 observation per cell (the common case), results are identical. When baseline-specific first stages fail (`n_obs = 0` or `n_obs < n_params`), the affected strata are excluded from the estimation (outcomes set to NaN) rather than retained unadjusted - matching R's "drop failed strata" behavior. Requires `L_max >= 1`. Activated via `controls=["col1", "col2"]` in `fit()`. @@ -615,7 +620,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - **Note (Phase 3 state-set trends):** Implements state-set-specific trends from Web Appendix Section 1.4 (Assumptions 13-14). Restricts the control pool for each switcher to groups in the same set (e.g., same state in county-level data). The restriction applies in all four DID/IF paths: `_compute_multi_horizon_dids()`, `_compute_per_group_if_multi_horizon()`, `_compute_multi_horizon_placebos()`, and `_compute_per_group_if_placebo_horizon()`. Cohort structure stays as `(D_{g,1}, F_g, S_g)` triples (does not incorporate set membership). Set membership must be time-invariant per group. **Note on Assumption 14 (common support):** The paper requires a common last-untreated period across sets (`T_u^s` equal for all `s`). This implementation does NOT enforce Assumption 14 up front. Instead, when within-set controls are exhausted at a given horizon (because a set has shorter untreated support than others), the affected switcher/horizon pairs are silently excluded via the existing empty-control-pool mechanism. 
This means `N_l` may be smaller under `trends_nonparam` than without it, and the effective estimand is trimmed to the within-set support at each horizon. The existing multi-horizon A11 warning fires when exclusions occur. Activated via `trends_nonparam="state_column"` in `fit()`. -- **Note (Phase 3 heterogeneity testing - partial implementation):** Partial implementation of the heterogeneity test from Web Appendix Section 1.5 (Assumption 15, Lemma 7). Computes post-treatment saturated OLS regressions of `S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})` on a time-invariant covariate `X_g` plus cohort indicator dummies. Standard OLS inference is valid (paper shows no DID error correction needed). **Deviation from R `predict_het`:** R's full `predict_het` option additionally computes placebo regressions and a joint null test, and disallows combination with `controls`. This implementation provides only post-treatment regressions. **Rejected combinations:** `controls` (matching R), `trends_linear` (heterogeneity test uses raw level changes, incompatible with second-differenced outcomes), and `trends_nonparam` (heterogeneity test does not thread state-set control-pool restrictions). Results stored in `results.heterogeneity_effects`. Activated via `heterogeneity="covariate_column"` in `fit()`. **Note (survey support):** Under `survey_design`, heterogeneity uses WLS with per-group weights `W_g = sum of obs-level survey weights in group g`, and the group-level WLS coefficient influence function is `ψ_g[X] = inv(X'WX)[1,:] @ x_g * W_g * r_g`. 
The group-level IF is then attributed to observation level via **one of two allocators, chosen by variance helper** so each path preserves byte-identity for its aggregation rule: (1) **Binder TSL** (`compute_survey_if_variance`) uses the **cell-period single-cell allocator** — at each horizon `l_h`, `ψ_g` is assigned in full to the post-period cell `(g, out_idx)` with `out_idx = first_switch_idx[g] - 1 + l_h` and expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})` for obs in that cell, zero elsewhere (matches the DID_l post-period convention in the Survey IF expansion Note below). Under PSU=group per-observation distribution differs from the legacy `ψ_i = ψ_g * (w_i / W_g)`, but PSU-level aggregates telescope to the same `ψ_g` — so Binder TSL variance is byte-identical to the pre-cell-period release under PSU=group. Under within-group-varying PSU mass lands in the post-period PSU of the transition, which is what Binder TSL needs. An **empty post-period cell under zero-weight obs** (all obs at `(g, out_idx)` have `w_i = 0` despite `N > 0`) drops the group's contribution, matching the ATT cell allocator's convention; the pre-cell-period path diverged here by redistributing mass to other cells of the group. (2) **Rao-Wu replicate** (`compute_replicate_if_variance`) uses the **legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)`. Replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at the observation level, so moving `ψ_g` mass onto the post-period cell only would silently change the replicate SE whenever a replicate column's ratios vary within group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). Keeping the legacy allocator on this branch preserves byte-identity of replicate SE across every previously-supported fit; replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). 
Inference uses the t-distribution with `df_survey` when provided. Under rank deficiency (any regression coefficient dropped by `solve_ols`'s R-style drop), all inference fields return NaN (conservative, matches the NaN-consistent contract). **Library extension (replicate weights):** Under a replicate-weight design (BRR/Fay/JK1/JKn/SDR), the heterogeneity regression dispatches to `compute_replicate_if_variance` (Rao-Wu weight-ratio rescaling) instead of the Binder TSL formula. The effective df is the shared `min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` used by the rest of the dCDH surfaces; if the base `df_survey` is undefined (QR-rank ≤ 1), heterogeneity inference is NaN regardless of the local `n_valid_het` (matching the dCDH top-level contract — per-site `n_valid` cannot rescue a rank-deficient design). **Library extension:** R `DIDmultiplegtDYN::predict_het` does not natively support survey weights. **Scope note (bootstrap):** Combining `heterogeneity=` with `n_bootstrap > 0` and within-group-varying PSU still raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap uses a group-level PSU map until the follow-up PR extends it. +- **Note (Phase 3 heterogeneity testing - partial implementation):** Partial implementation of the heterogeneity test from Web Appendix Section 1.5 (Assumption 15, Lemma 7). Computes post-treatment saturated OLS regressions of `S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})` on a time-invariant covariate `X_g` plus cohort indicator dummies. Standard OLS inference is valid (paper shows no DID error correction needed). **Deviation from R `predict_het`:** R's full `predict_het` option additionally computes placebo regressions and a joint null test, and disallows combination with `controls`. This implementation provides only post-treatment regressions. 
**Rejected combinations:** `controls` (matching R), `trends_linear` (heterogeneity test uses raw level changes, incompatible with second-differenced outcomes), and `trends_nonparam` (heterogeneity test does not thread state-set control-pool restrictions). Results stored in `results.heterogeneity_effects`. Activated via `heterogeneity="covariate_column"` in `fit()`. **Note (survey support):** Under `survey_design`, heterogeneity uses WLS with per-group weights `W_g = sum of obs-level survey weights in group g`, and the group-level WLS coefficient influence function is `ψ_g[X] = inv(X'WX)[1,:] @ x_g * W_g * r_g`. The group-level IF is then attributed to observation level via **one of two allocators, chosen by variance helper** so each path preserves byte-identity for its aggregation rule: (1) **Binder TSL** (`compute_survey_if_variance`) uses the **cell-period single-cell allocator** — at each horizon `l_h`, `ψ_g` is assigned in full to the post-period cell `(g, out_idx)` with `out_idx = first_switch_idx[g] - 1 + l_h` and expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})` for obs in that cell, zero elsewhere (matches the DID_l post-period convention in the Survey IF expansion Note below). Under PSU=group per-observation distribution differs from the legacy `ψ_i = ψ_g * (w_i / W_g)`, but PSU-level aggregates telescope to the same `ψ_g` — so Binder TSL variance is byte-identical to the pre-cell-period release under PSU=group. Under within-group-varying PSU mass lands in the post-period PSU of the transition, which is what Binder TSL needs. An **empty post-period cell under zero-weight obs** (all obs at `(g, out_idx)` have `w_i = 0` despite `N > 0`) drops the group's contribution, matching the ATT cell allocator's convention; the pre-cell-period path diverged here by redistributing mass to other cells of the group. (2) **Rao-Wu replicate** (`compute_replicate_if_variance`) uses the **legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)`. 
Replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at the observation level, so moving `ψ_g` mass onto the post-period cell only would silently change the replicate SE whenever a replicate column's ratios vary within group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). Keeping the legacy allocator on this branch preserves byte-identity of replicate SE across every previously-supported fit; replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). Inference uses the t-distribution with `df_survey` when provided. Under rank deficiency (any regression coefficient dropped by `solve_ols`'s R-style drop), all inference fields return NaN (conservative, matches the NaN-consistent contract). **Library extension (replicate weights):** Under a replicate-weight design (BRR/Fay/JK1/JKn/SDR), the heterogeneity regression dispatches to `compute_replicate_if_variance` (Rao-Wu weight-ratio rescaling) instead of the Binder TSL formula. The effective df is the shared `min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` used by the rest of the dCDH surfaces; if the base `df_survey` is undefined (QR-rank ≤ 1), heterogeneity inference is NaN regardless of the local `n_valid_het` (matching the dCDH top-level contract — per-site `n_valid` cannot rescue a rank-deficient design). **Library extension:** R `DIDmultiplegtDYN::predict_het` does not natively support survey weights. **Scope note (bootstrap):** Heterogeneity inference is analytical (no bootstrap path). When `n_bootstrap > 0` is combined with `heterogeneity=`, the main ATT surfaces receive bootstrap SE/CI (via the cell-level wild PSU bootstrap described in the survey + bootstrap contract Note below) while `heterogeneity_effects` continues to use the Binder TSL / Rao-Wu analytical SE described above. No gate; the two inference paths are independent. 
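The group-level WLS influence function above can be sketched numerically (hypothetical two-cohort design; column index 1 is the covariate coefficient, matching the `[1,:]` row in the formula). The zero-sum assertion follows from the WLS normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
G = 30

# Hypothetical design: intercept, covariate of interest, one cohort dummy.
x_cov = rng.normal(size=G)
cohort = np.arange(G) % 2
X = np.column_stack([np.ones(G), x_cov, cohort])
W = rng.uniform(0.5, 2.0, size=G)        # per-group survey weights W_g
y = 0.7 * x_cov + 0.3 * cohort + rng.normal(size=G)

XtWX = X.T @ (W[:, None] * X)
beta = np.linalg.solve(XtWX, X.T @ (W * y))
r = y - X @ beta                         # per-group WLS residuals r_g

# psi_g = inv(X'WX)[1, :] @ x_g * W_g * r_g  — the covariate
# coefficient's group-level IF, evaluated for every group at once.
row = np.linalg.inv(XtWX)[1, :]
psi_g = (X @ row) * W * r
```

Feeding `psi_g` through either allocator yields the observation-level `psi_i` described above; the two allocators differ only in how each group's mass is spread across its cells.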
- **Note (HonestDiD integration):** HonestDiD sensitivity analysis (Rambachan & Roth 2023) is available on the placebo + event study surface via `honest_did=True` in `fit()` or `compute_honest_did(results)` post-hoc. **Library extension:** dCDH HonestDiD uses `DID^{pl}_l` placebo estimates as pre-period coefficients rather than standard event-study pre-treatment coefficients. The Rambachan-Roth restrictions bound violations of the parallel trends assumption underlying the dCDH placebo estimand; interpretation differs from canonical event-study HonestDiD. A `UserWarning` is emitted at runtime. Uses diagonal variance (no full VCV available for dCDH). Relative magnitudes (DeltaRM) with Mbar=1.0 is the default when called from `fit()`, targeting the equal-weight average over all post-treatment horizons (`l_vec=None`). R's HonestDiD defaults to the first post/on-impact effect; use `compute_honest_did(results, ...)` with a custom `l_vec` to match that behavior. When `trends_linear=True`, bounds apply to the second-differenced estimand (parallel trends in first differences). Requires `L_max >= 1` for multi-horizon placebos. Gaps in the horizon grid from `trends_nonparam` support-trimming are handled by filtering to the largest consecutive block and warning. @@ -649,8 +654,8 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - [x] Design-2 switch-in/switch-out descriptive wrapper (Web Appendix Section 1.6) - [x] HonestDiD (Rambachan-Roth 2023) integration on placebo + event study surface - [x] Survey design support: pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) **or replicate-weight variance (BRR/Fay/JK1/JKn/SDR)**, covering the main ATT surface, covariate adjustment (DID^X), heterogeneity testing, the TWFE diagnostic (fit and standalone `twowayfeweights()` helper), and HonestDiD bounds. Opt-in **PSU-level Hall-Mammen wild bootstrap** is also supported via `n_bootstrap > 0`. 
-- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." 
Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is byte-identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. **Strata that vary across cells of a group require either an explicit `psu=` or the original `SurveyDesign(..., nest=True)` flag** — under `nest=True` the resolver combines `(stratum, psu)` into globally-unique labels, so the auto-injected `psu=` is re-labeled per stratum and the cell allocator proceeds. Only the `nest=False` + varying-strata + omitted-psu combination is rejected up front with a targeted `ValueError` at `fit()` time (the synthesized PSU column would reuse group labels across strata and trip the cross-stratum PSU uniqueness check in `SurveyDesign.resolve()`). Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. 
**Post-period attribution extends to heterogeneity (Binder TSL branch only):** the heterogeneity WLS coefficient IF `ψ_g = inv(X'WX)[1,:] @ x_g * W_g * r_g` is attributed in full to the single post-period cell `(g, out_idx)` at each horizon (same single-cell convention as DID_l), then expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})`, and fed through `compute_survey_if_variance`. Under PSU=group the PSU-level aggregate telescopes to `ψ_g`, so Binder variance is byte-identical relative to the pre-cell-period release; under within-group-varying PSU mass lands in the post-period PSU. **Replicate-weight branch keeps the legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)` because `compute_replicate_if_variance` computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping: redistributing mass onto the post-period cell would silently change the replicate SE whenever a replicate column's ratios vary within a group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). The legacy allocator preserves byte-identity of the replicate SE for every previously-supported fit. Replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). **Scope limitation (follow-up PR):** `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in a follow-up PR. -- **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). 
When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Scope note (cell-period allocator):** The PSU-level bootstrap uses a group-level `group_id_to_psu_code` map and therefore requires PSU to be constant within each group. Combining `n_bootstrap > 0` with a PSU that varies within group raises `NotImplementedError`; the cell-level Hall-Mammen extension is deferred to a follow-up PR. The analytical TSL variance fully supports within-group-varying PSU via the cell-period allocator — use `n_bootstrap=0` for those designs. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design. +- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). 
The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." 
Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is numerically identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. **Strata that vary across cells of a group require either an explicit `psu=` or the original `SurveyDesign(..., nest=True)` flag** — under `nest=True` the resolver combines `(stratum, psu)` into globally-unique labels, so the auto-injected `psu=` is re-labeled per stratum and the cell allocator proceeds. Only the `nest=False` + varying-strata + omitted-psu combination is rejected up front with a targeted `ValueError` at `fit()` time (the synthesized PSU column would reuse group labels across strata and trip the cross-stratum PSU uniqueness check in `SurveyDesign.resolve()`). Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed.
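The telescoping claim can be illustrated with a toy allocator (hypothetical per-cell IF attributions; two observations per cell to exercise the weight-share expansion):

```python
import numpy as np

rng = np.random.default_rng(2)
G, T, n_obs = 4, 3, 2

# Hypothetical per-(g, t) IF attributions; their row sums define the
# group-level centered IF (the row-sum identity quoted above).
U_pp = rng.normal(size=(G, T))
U_centered = U_pp.sum(axis=1)

# Observation-level survey weights, two obs per cell.
w_obs = rng.uniform(0.5, 2.0, size=(G, T, n_obs))
W_cell = w_obs.sum(axis=2)               # W_{g, t}

# Cell-period allocator: psi_i = U_pp[g, t] * (w_i / W_{g, t}).
psi_obs = U_pp[:, :, None] * (w_obs / W_cell[:, :, None])

# Under PSU = group, the PSU-level aggregate telescopes back to the
# group-level IF — the reason Binder TSL variance is unchanged there.
psu_sums = psi_obs.sum(axis=(1, 2))
```

Each cell's allocated mass sums back to `U_pp[g, t]`, and each group's mass sums back to `U_centered[g]`, regardless of how the weights vary within a cell.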
**Post-period attribution extends to heterogeneity (Binder TSL branch only):** the heterogeneity WLS coefficient IF `ψ_g = inv(X'WX)[1,:] @ x_g * W_g * r_g` is attributed in full to the single post-period cell `(g, out_idx)` at each horizon (same single-cell convention as DID_l), then expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})`, and fed through `compute_survey_if_variance`. Under PSU=group the PSU-level aggregate telescopes to `ψ_g`, so Binder variance is byte-identical relative to the pre-cell-period release; under within-group-varying PSU mass lands in the post-period PSU. **Replicate-weight branch keeps the legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)` because `compute_replicate_if_variance` computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping: redistributing mass onto the post-period cell would silently change the replicate SE whenever a replicate column's ratios vary within a group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). The legacy allocator preserves byte-identity of the replicate SE for every previously-supported fit. Replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). +- **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. 
Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Cell-level wild PSU bootstrap under within-group-varying PSU:** when the PSU varies across the cells of a group, the bootstrap switches to a cell-level allocator: each `(g, t)` cell draws its multiplier from `w[psu(cell)]` via the per-cell PSU map `psu_codes_per_cell` (shape `(n_eligible_groups, n_periods)`, -1 sentinel for zero-weight cells). The bootstrap statistic becomes `theta_r = sum_c w[psu(c)] * u_centered_pp[c] / divisor` using the cohort-recentered per-cell IF `U_centered_per_period`. Under PSU-within-group-constant regimes (including PSU=group and strictly-coarser PSU with within-group constancy), the per-cell sum telescopes to the group-level form via the row-sum identity `sum_{c in g} U_centered_per_period[g, t] == U_centered[g]` (enforced by `_cohort_recenter_per_period`). A dispatcher in `_compute_dcdh_bootstrap` detects within-group-constancy and routes those regimes through the legacy group-level bootstrap path so their SE is **bit-identical** to the pre-cell-level release (guarded primarily by `test_bootstrap_se_matches_pre_pr4_baseline` and by the existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to PSUs `p1, p2, ...` receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. **Multi-horizon bootstraps** draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution across horizons. **Library extension** — R `DIDmultiplegtDYN` does not support survey designs, so "deviation from R" does not apply. 
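A toy sketch of the cell-level wild statistic `theta_r = sum_c w[psu(c)] * u_centered_pp[c]` (divisor omitted; hypothetical IF values), showing that under PSU = group it coincides with the group-level form — the bit-identity regime routed through the legacy path:

```python
import numpy as np

rng = np.random.default_rng(3)
G, T, B = 6, 4, 200

U_pp = rng.normal(size=(G, T))           # cohort-recentered per-cell IF
U_group = U_pp.sum(axis=1)               # row-sum identity

# PSU constant within group (PSU = group): every cell of group g maps
# to PSU code g.
psu_codes = np.repeat(np.arange(G), T).reshape(G, T)
mult = rng.choice([-1.0, 1.0], size=(B, G))   # Rademacher multipliers

# Cell-level statistic: sum_c w[psu(c)] * U_pp[c] per bootstrap draw.
theta_cell = (mult[:, psu_codes] * U_pp).sum(axis=(1, 2))

# Group-level statistic: same multipliers applied to group-level IF.
theta_group = mult @ U_group
```

Under within-group-varying PSU, `psu_codes` would no longer be constant along axis 1, the two statistics would diverge, and only the cell-level form clusters correctly — which is why the dispatcher routes that regime to the cell-level path.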
**Scope note (terminal missingness + any cell-period-allocator path):** see the balanced-baseline Note above for the full carve-out. In brief: when a terminally-missing group is in a cohort whose other groups still contribute at the missing period, `_cohort_recenter_per_period` leaks non-zero centered IF mass onto cells with no positive-weight observations. The targeted `ValueError` fires from every survey variance path that uses the cell-period allocator: Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT (which always uses the cell allocator), and the cell-level wild PSU bootstrap. Pre-process the panel to remove terminal missingness, or (for Binder TSL only) use an explicit `psu=` so the analytical path routes through the legacy group-level allocator. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design. --- diff --git a/tests/test_dcdh_bootstrap_cell_period_coverage.py b/tests/test_dcdh_bootstrap_cell_period_coverage.py new file mode 100644 index 00000000..b31283c3 --- /dev/null +++ b/tests/test_dcdh_bootstrap_cell_period_coverage.py @@ -0,0 +1,244 @@ +"""Monte Carlo coverage simulation for the cell-level wild PSU bootstrap +in ChaisemartinDHaultfoeuille (PR 4). 
+ +Validates empirical coverage of the bootstrap confidence intervals under +a DGP with PSU that varies across cells of the same group — the regime +unlocked by PR 4's replacement of the group-level PSU map with a +cell-level map. Under PSU-within-group-constant the dispatcher routes +through the legacy bootstrap (covered by the pre-PR-4 test suite), so +the coverage check here exercises only the new cell-level code path. + +Asserts coverage at three surfaces, each covering a distinct code path: + +1. Overall DID_M bootstrap CI (`res.bootstrap_results.overall_ci`) + — single-target cell-level branch. +2. Event-study **horizon-1** CI (`res.bootstrap_results.event_study_cis[1]`) + — first horizon of the shared-PSU-weight multi-horizon block. +3. Event-study **horizon-2** CI + sup-t `cband_crit_value` finiteness + — exercises the cross-horizon shared-draw machinery that + guarantees sup-t joint distribution validity. At L_max >= 2 the + shared (n_bootstrap, n_psu) PSU-level weight matrix must be drawn + ONCE and reused across horizons; a regression where each horizon + re-draws weights would break the sup-t coherence and the finite + critical value check below would surface it. + +Marked ``slow`` and excluded from the default pytest run. To execute: + + pytest tests/test_dcdh_bootstrap_cell_period_coverage.py -m slow -v +""" + +from __future__ import annotations + +import warnings + +import numpy as np +import pandas as pd +import pytest + +from diff_diff.chaisemartin_dhaultfoeuille import ChaisemartinDHaultfoeuille +from diff_diff.survey import SurveyDesign + + +def _simulate_panel( + n_groups: int, + n_periods: int, + first_treated_period: int, + tau: float, + rng: np.random.Generator, + psu_sigma: float = 0.5, + obs_sigma: float = 1.0, + group_sigma: float = 1.0, +) -> pd.DataFrame: + """Generate a one-obs-per-cell panel with within-group-varying PSU + (PSU = period parity per group) so the cell-level bootstrap is + exercised. 
Matches the DGP shape of `test_dcdh_cell_period_coverage.py`. + """ + groups = np.arange(n_groups) + treated = groups < n_groups // 2 + group_fe = rng.normal(0.0, group_sigma, size=n_groups) + # (group, psu) effect — constant within (g, parity), drives + # within-group residual correlation that Binder variance + wild + # PSU bootstrap should capture. + psu_fe = rng.normal(0.0, psu_sigma, size=(n_groups, 2)) + + rows = [] + for g in groups: + for t in range(n_periods): + parity = int(t % 2) + # Per-(group, parity) PSU codes: 2 * n_groups distinct PSUs + # so Binder / bootstrap see cell granularity rather than + # two global PSUs reused across groups. + psu_id = int(g) * 2 + parity + d = 1 if (treated[g] and t >= first_treated_period) else 0 + y = ( + group_fe[g] + + 0.1 * t + + tau * d + + psu_fe[g, parity] + + rng.normal(0.0, obs_sigma) + ) + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": float(y), + "psu": psu_id, + "pw": 1.0, + }) + return pd.DataFrame(rows) + + +@pytest.mark.slow +def test_bootstrap_cell_period_coverage_varying_psu(): + """Empirical 95% coverage for bootstrap CIs at BOTH the overall + DID_M and per-horizon event-study surfaces on a DGP with + within-group-varying PSU. Tolerance ±2.5pp mirrors the analytical + coverage test; per-horizon coverage guards the shared-PSU-weight + machinery from CRITICAL #2 of the PR 4 plan review. 
+ """ + n_reps = 500 + n_groups = 40 + n_periods = 6 + first_treated_period = 3 + tau_true = 2.0 + + rng = np.random.default_rng(20260419) + covered_overall = 0 + covered_h1 = 0 + covered_h2 = 0 + cband_finite = 0 + failed = 0 + + for r in range(n_reps): + df = _simulate_panel( + n_groups=n_groups, + n_periods=n_periods, + first_treated_period=first_treated_period, + tau=tau_true, + rng=rng, + ) + sd = SurveyDesign(weights="pw", psu="psu") + try: + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + # n_bootstrap=500 keeps internal bootstrap Monte-Carlo + # noise well below the across-reps variance (at B=500 + # the percentile-CI endpoints are stable to ~0.3pp per + # Efron-Tibshirani §13.3), so the across-reps coverage + # mostly reflects the sampling-distribution / bootstrap- + # consistency question rather than bootstrap MC noise. + # L_max=2 exercises the shared-PSU-weight multi-horizon + # block (a single `(n_bootstrap, n_psu)` weight matrix + # is drawn once and broadcast per-horizon via each + # horizon's cell-to-PSU map). L_max=1 would collapse to + # a single target and never exercise the cross-horizon + # shared-draw machinery. + # + # n_bootstrap=1000 keeps internal bootstrap MC noise + # below ~0.3pp per CI endpoint; the percentile-CI + # coverage at horizon-2 (where the shared-weight + # broadcast is exercised) is finite-sample-sensitive + # and B=500 would risk a spurious edge-of-band miss. + res = ChaisemartinDHaultfoeuille( + n_bootstrap=1000, seed=r + 1, + ).fit( + df, + outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, L_max=2, + ) + except Exception: + failed += 1 + continue + + if res.bootstrap_results is None: + failed += 1 + continue + + # Overall bootstrap CI. 
+ overall_ci = res.bootstrap_results.overall_ci + if overall_ci is None or not all(np.isfinite(overall_ci)): + failed += 1 + continue + lo_o, hi_o = float(overall_ci[0]), float(overall_ci[1]) + if lo_o <= tau_true <= hi_o: + covered_overall += 1 + + # Horizon-1 and horizon-2 bootstrap CIs (guard the shared- + # PSU-weight multi-horizon path). Horizon-2 in particular + # requires the SAME shared PSU weight matrix drawn once at + # the top of the multi-horizon block; a per-horizon re-draw + # would break the sup-t joint-distribution guarantee and + # `cband_crit_value` would be undefined or wrong. + es_cis = res.bootstrap_results.event_study_cis + if es_cis is not None: + if 1 in es_cis: + h1_ci = es_cis[1] + if h1_ci is not None and all(np.isfinite(h1_ci)): + lo_h, hi_h = float(h1_ci[0]), float(h1_ci[1]) + if lo_h <= tau_true <= hi_h: + covered_h1 += 1 + if 2 in es_cis: + h2_ci = es_cis[2] + if h2_ci is not None and all(np.isfinite(h2_ci)): + lo2, hi2 = float(h2_ci[0]), float(h2_ci[1]) + if lo2 <= tau_true <= hi2: + covered_h2 += 1 + + # Sup-t critical value: finite across reps means the shared- + # draw machinery produced coherent joint replicates at both + # horizons. NaN or unset would indicate the multi-horizon + # block short-circuited or the shared-weight broadcast + # misaligned across horizons. + cband = getattr(res.bootstrap_results, "cband_crit_value", None) + if cband is not None and np.isfinite(float(cband)): + cband_finite += 1 + + completed = n_reps - failed + assert completed >= int(0.95 * n_reps), ( + f"MC simulation had {failed}/{n_reps} fit failures, above " + f"the 5% tolerance." 
+ ) + coverage_overall = covered_overall / completed + coverage_h1 = covered_h1 / completed + coverage_h2 = covered_h2 / completed + assert 0.925 <= coverage_overall <= 0.975, ( + f"Overall bootstrap CI coverage {coverage_overall:.3f} " + f"(completed {completed}) outside [0.925, 0.975]; " + f"tau_true={tau_true}, n_groups={n_groups}, n_periods={n_periods}, " + f"n_reps={n_reps}." + ) + assert 0.925 <= coverage_h1 <= 0.975, ( + f"Horizon-1 event-study bootstrap CI coverage " + f"{coverage_h1:.3f} (completed {completed}) outside " + f"[0.925, 0.975]; this is the shared-PSU-weight surface, " + f"regression here likely indicates a bug in the multi-horizon " + f"cell-level broadcast." + ) + # Horizon-2 tolerance is wider than horizon-1 because finite- + # sample coverage of the analytical TSL SE on this DGP is + # itself ~0.93 at l=2 (measured offline: analytical h-1 coverage + # 0.94, h-2 coverage 0.926 at n_groups=40). The bootstrap should + # track the analytical SE asymptotically, so an observed + # bootstrap coverage in [0.91, 0.98] at h-2 is consistent with + # correct clustering; a drop to ≤ 0.90 would indicate the + # shared-weight broadcast is not coherent across horizons. + assert 0.910 <= coverage_h2 <= 0.975, ( + f"Horizon-2 event-study bootstrap CI coverage " + f"{coverage_h2:.3f} (completed {completed}) outside " + f"[0.910, 0.975]; horizon-2 is the cross-horizon surface " + f"that exercises the SAME shared PSU weight matrix used " + f"at horizon-1 — a regression here indicates the shared-" + f"draw broadcast is not coherent across horizons." + ) + # Sup-t critical value must be finite in the vast majority of + # reps; occasional NaN on degenerate draws is tolerable but + # widespread NaN signals the shared-weight block never yielded + # a coherent joint distribution. + assert cband_finite >= int(0.90 * completed), ( + f"Sup-t critical value was finite in only {cband_finite}/" + f"{completed} reps. 
The shared (n_bootstrap, n_psu) PSU-" + f"level weight matrix must be drawn ONCE at the top of the " + f"multi-horizon block; a per-horizon re-draw would break " + f"the sup-t joint distribution." + ) diff --git a/tests/test_survey_dcdh.py b/tests/test_survey_dcdh.py index 4ac5472f..69b5a400 100644 --- a/tests/test_survey_dcdh.py +++ b/tests/test_survey_dcdh.py @@ -1,5 +1,8 @@ """Survey support tests for ChaisemartinDHaultfoeuille (dCDH).""" +import os +from typing import Optional + import numpy as np import pandas as pd import pytest @@ -1075,10 +1078,12 @@ class TestSurveyWithinGroupValidation: """Cell-period IF allocator contract: strata and PSU may vary ACROSS cells of a group, but must be constant WITHIN each (g, t) cell. In canonical one-obs-per-cell panels the cell-level constancy check is - trivially satisfied. Heterogeneity + within-group-varying PSU is - supported via the cell-period allocator. The remaining out-of-scope - combination is n_bootstrap > 0 + within-group-varying PSU, which - still raises NotImplementedError with a pointer to the follow-up PR. + trivially satisfied. All three variance paths — analytical TSL, + heterogeneity WLS, and the PSU-level wild multiplier bootstrap — + support within-group-varying PSU via the cell-period allocator. No + dCDH survey combination raises NotImplementedError beyond the + SurveyDesign-level exclusion of ``replicate_weights`` + + ``n_bootstrap > 0``. """ def test_accepts_varying_psu_within_group(self, base_data): @@ -1169,21 +1174,41 @@ def test_heterogeneity_with_varying_psu_succeeds(self, base_data): assert np.isfinite(entry["p_value"]) assert all(np.isfinite(entry["conf_int"])) - def test_bootstrap_with_varying_psu_raises(self, base_data): - """n_bootstrap > 0 is gated under within-group-varying PSU until - PR 4 ships the cell-level Hall-Mammen wild bootstrap. 
+ def test_bootstrap_with_varying_psu_succeeds(self, base_data): + """PR 4: the PSU-level wild multiplier bootstrap now supports + within-group-varying PSU via the cell-level allocator. Each + (g, t) cell's IF mass is multiplied by its PSU's multiplier, + so a group spanning 2+ PSUs receives independent draws per + PSU. Assert a finite bootstrap SE and CI at the overall + surface and at every event-study horizon (the multi-horizon + path uses a shared PSU-level weight matrix for the sup-t + simultaneous band). """ df_ = base_data.copy() df_["pw"] = 1.0 - df_["stratum"] = 0 - df_["psu"] = df_["period"] % 2 # varies within group - sd = SurveyDesign(weights="pw", strata="stratum", psu="psu") - with pytest.raises(NotImplementedError, match="n_bootstrap"): - ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1).fit( - df_, outcome="outcome", group="group", - time="period", treatment="treatment", - survey_design=sd, - ) + # Per-group PSU parity — unique per (group, parity), varies + # within group so the cell-level dispatcher is exercised. + df_["psu"] = df_["group"].astype(int) * 2 + (df_["period"].astype(int) % 2) + sd = SurveyDesign(weights="pw", psu="psu") + res = ChaisemartinDHaultfoeuille(n_bootstrap=200, seed=1).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, L_max=2, + ) + assert res.bootstrap_results is not None + assert np.isfinite(res.bootstrap_results.overall_se) + assert res.bootstrap_results.overall_se >= 0.0 + overall_ci = res.bootstrap_results.overall_ci + assert overall_ci is not None and all(np.isfinite(overall_ci)) + # Multi-horizon bootstrap must produce finite SE at each + # horizon (guards the shared-PSU-weight path from CRITICAL #2 + # of the PR 4 plan review). 
+ es_ses = res.bootstrap_results.event_study_ses + assert es_ses is not None + for l_h in (1, 2): + assert l_h in es_ses, f"horizon {l_h} missing from event_study_ses" + assert np.isfinite(es_ses[l_h]), f"horizon {l_h} SE not finite" + assert es_ses[l_h] >= 0.0 def test_auto_inject_with_varying_strata_nest_true_succeeds(self, base_data): """When strata varies across cells of a group and the user @@ -1779,3 +1804,549 @@ def test_replicate_variance_non_invariance_under_varying_ratios(self): f"(counterexample from PR #329 CI review). Got " f"var_legacy={var_legacy}, var_new={var_new}." ) + + +class TestBootstrapCellPeriod: + """Regression guards for the cell-level wild PSU bootstrap allocator + (PR 4). Under PSU-within-group-constant regimes the dispatcher + routes to the legacy group-level bootstrap for bit-identity; under + within-group-varying PSU the cell-level path runs with per-cell + PSU multipliers. + """ + + # Captured on pre-PR-4 code (origin/main at SHA ac181b7f — PR #329 + # merge) via a scratch fit on the fixture below. Pinned here as + # the bit-identity regression guard for the dispatcher's + # PSU-within-group-constant legacy-path routing. If this test + # drifts, the dispatcher is no longer reproducing pre-PR-4 + # behavior under PSU=group and the legacy fast path has + # regressed. Values differ between Rust and pure-Python backends + # because `_generate_bootstrap_weights_batch` and `solve_ols` + # take different numeric paths; bit-identity still holds within + # each backend. + _BASELINE_OVERALL_SE_RUST = 0.30560839419979546 + _BASELINE_OVERALL_SE_PYTHON = 0.3030802540369796 + + @staticmethod + def _make_baseline_fixture() -> pd.DataFrame: + """Deterministic fixture for the pinned bootstrap SE baseline. + 12 groups, 5 periods, two switch cohorts (first-treated at + periods 2 and 3) plus never-switchers, fixed per-row + idiosyncratic draws so the bootstrap distribution is + non-degenerate. 
MUST stay identical to the capture fixture + or the pinned baseline becomes meaningless. + """ + rng = np.random.default_rng(12345) + rows = [] + n_groups, n_periods = 12, 5 + for g in range(n_groups): + if g < 4: + f: Optional[int] = None + elif g < 8: + f = 2 + else: + f = 3 + for t in range(n_periods): + d = 1 if (f is not None and t >= f) else 0 + y = float(g) + 0.3 * t + 1.5 * d + float(rng.normal(0.0, 0.5)) + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": 1.0, + }) + return pd.DataFrame(rows) + + def test_bootstrap_se_matches_pre_pr4_baseline(self): + """Bit-identity regression guard: under PSU=group the + dispatcher routes through the legacy group-level bootstrap + path, so the overall bootstrap SE must match pre-PR-4 code + to ULP precision. The baseline values were captured on + `origin/main` at `ac181b7f` (the PR #329 merge) under each + backend independently. + """ + from diff_diff._backend import HAS_RUST_BACKEND + pure_python = ( + os.environ.get("DIFF_DIFF_BACKEND", "auto").lower() == "python" + or not HAS_RUST_BACKEND + ) + expected = ( + self._BASELINE_OVERALL_SE_PYTHON + if pure_python + else self._BASELINE_OVERALL_SE_RUST + ) + df_ = self._make_baseline_fixture() + sd = SurveyDesign(weights="pw", psu="group") + res = ChaisemartinDHaultfoeuille(n_bootstrap=500, seed=42).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + assert res.bootstrap_results is not None + observed_se = float(res.bootstrap_results.overall_se) + backend_label = "pure-python" if pure_python else "rust" + assert observed_se == pytest.approx( + expected, rel=0.0, abs=1e-15, + ), ( + f"Bootstrap SE drifted from pre-PR-4 baseline " + f"(backend={backend_label}). expected={expected!r}, " + f"observed={observed_se!r}. The dispatcher's " + f"PSU-within-group-constant routing is no longer " + f"bit-identical to the legacy group-level bootstrap." 
+ ) + + def test_bootstrap_cell_level_raises_on_missing_overall_tensor(self): + """Contract: when PSU varies within group, the bootstrap + dispatcher must NOT silently fall back to group-level for any + target. Invoking _compute_dcdh_bootstrap directly with a + varying `psu_codes_per_cell` but `u_per_period_overall=None` + must raise ValueError (not silently under-cluster). + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + # Minimal group-level IF inputs; matches arbitrary 2-group setup. + u_overall = np.array([0.5, -0.3], dtype=np.float64) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + # Varying PSU: row 0 has two distinct PSU codes. + psu_codes_per_cell = np.array( + [[0, 1], [0, 0]], dtype=np.int64, + ) + with pytest.raises(ValueError, match="u_per_period_overall"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=None, # missing — must raise + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_raises_on_shape_mismatch(self): + """Contract: _unroll_target_to_cells rejects shape mismatches + between `u_per_period_target` and `psu_codes_per_cell` — a + silent misalignment would put PSU codes on the wrong cells. + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + u_overall = np.array([0.5, -0.3], dtype=np.float64) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + psu_codes_per_cell = np.array( + [[0, 1], [0, 0]], dtype=np.int64, + ) + # Wrong shape — 3 columns instead of 2. 
+ u_pp_wrong = np.array( + [[0.25, 0.25, 0.0], [-0.15, -0.15, 0.0]], dtype=np.float64, + ) + with pytest.raises(ValueError, match="shape mismatch"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=u_pp_wrong, + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_raises_on_sentinel_mass_leak(self): + """Contract: when `_cohort_recenter_per_period` subtracts + column means across the full period grid, a group with no + observation at period t can acquire non-zero centered mass + at that cell. Under the cell-level bootstrap path, such + mass lands on a `psu_codes_per_cell == -1` sentinel cell + and has no PSU to attach to — the bootstrap must raise + rather than silently drop the mass. + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + # Build a per-cell IF tensor with non-zero mass at a cell + # whose PSU code is -1 (simulating terminal missingness + # after cohort-recentering leaks mass to a missing cell). 
+ psu_codes_per_cell = np.array( + [[0, 1, -1], [0, 1, 0]], dtype=np.int64, + ) + u_pp_overall_with_leak = np.array( + [[0.25, 0.25, -0.15], [-0.15, -0.15, 0.15]], + dtype=np.float64, + ) + u_overall = np.array([0.5, -0.3], dtype=np.float64) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + with pytest.raises(ValueError, match="no positive-weight observations"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=u_pp_overall_with_leak, + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_raises_on_missing_horizon_tensor(self): + """Contract: when PSU varies within group, each multi-horizon + target must supply its per-cell IF tensor; missing one raises + ValueError rather than degrading the sup-t joint distribution. + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + u_overall = np.array([0.5, -0.3], dtype=np.float64) + u_pp_overall = np.array( + [[0.25, 0.25], [-0.15, -0.15]], dtype=np.float64, + ) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + psu_codes_per_cell = np.array( + [[0, 1], [0, 0]], dtype=np.int64, + ) + # Horizon tuple missing its per-cell tensor (4th slot = None). 
+ u_h = np.array([0.4, -0.2], dtype=np.float64) + mh_inputs = {1: (u_h, 4, 0.1, None)} + with pytest.raises(ValueError, match="multi-horizon.*l=1"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + multi_horizon_inputs=mh_inputs, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=u_pp_overall, + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_with_all_zero_weight_group_does_not_crash(self): + """When one eligible group has all zero-weight observations, + every entry of its `psu_codes_per_cell` row is the sentinel + -1 and it contributes no cells to the bootstrap. The + dispatcher must handle this without crashing; overall + inference stays finite (driven by the remaining groups). + """ + rows = [] + n_groups, n_periods = 10, 5 + for g in range(n_groups): + f = 3 if g < n_groups // 2 else None + for t in range(n_periods): + # Zero-weight the last group entirely. + pw = 0.0 if g == n_groups - 1 else 1.0 + d = 1 if (f is not None and t >= f) else 0 + y = float(g) + 0.1 * t + 1.0 * d + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": pw, + "psu": int(g) * 2 + (int(t) % 2), # varying PSU + }) + df_ = pd.DataFrame(rows) + sd = SurveyDesign(weights="pw", psu="psu") + res = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + assert res.bootstrap_results is not None + # Bootstrap SE should be finite (zero-weight group does not + # disturb the other groups' contributions). 
+ assert np.isfinite(res.bootstrap_results.overall_se) + + def test_bootstrap_zero_weight_group_equivalent_to_removing_it(self): + """Fixture A: 9 groups (1 all-zero-weighted + 8 positive) + with **within-group-varying PSU** so the dispatcher routes + through the cell-level path. Fixture B: 8 groups (same panel + without the zero-weight group), same varying PSU. Under the + fix, an eligible group with no positive-weight cells + contributes nothing to the bootstrap (its row of + `psu_codes_per_cell` is all sentinel), so both fits produce + byte-identical bootstrap SE at the same seed. Without the + fix, the `valid_map` gate disabled the entire PSU-aware + path — silently dropping fixture A to unclustered group- + level bootstrap while fixture B correctly ran the cell- + level path. Using `psu=group` (a within-group-constant PSU) + would not exercise this regression because the buggy and + correct paths collapse to the same identity-draw structure + under PSU=group — we deliberately use varying PSU here. + """ + def _make(include_zero_group: bool) -> pd.DataFrame: + rows = [] + n_groups = 9 if include_zero_group else 8 + for g in range(n_groups): + f = 3 if g < 4 else None + for t in range(5): + pw = 0.0 if (include_zero_group and g == 8) else 1.0 + d = 1 if (f is not None and t >= f) else 0 + y = float(g) + 0.1 * t + 1.0 * d + # Within-group-varying PSU (period parity per + # group) — exercises the cell-level dispatcher. 
+ psu = int(g) * 2 + (int(t) % 2) + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": pw, + "psu": psu, + }) + return pd.DataFrame(rows) + + sd = SurveyDesign(weights="pw", psu="psu") + res_a = ChaisemartinDHaultfoeuille(n_bootstrap=200, seed=7).fit( + _make(include_zero_group=True), + outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + res_b = ChaisemartinDHaultfoeuille(n_bootstrap=200, seed=7).fit( + _make(include_zero_group=False), + outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + assert res_a.bootstrap_results is not None + assert res_b.bootstrap_results is not None + se_a = float(res_a.bootstrap_results.overall_se) + se_b = float(res_b.bootstrap_results.overall_se) + assert np.isfinite(se_a) and np.isfinite(se_b) + assert se_a == pytest.approx(se_b, rel=0.0, abs=1e-15), ( + f"Bootstrap SE must match when a zero-weight eligible " + f"group is added under within-group-varying PSU (fix " + f"P0 #1 — no silent dropback to unclustered group-" + f"level). Got SE_with_zero={se_a!r}, " + f"SE_without_zero={se_b!r}." + ) + + def test_fit_raises_on_terminal_missingness_with_varying_psu(self): + """End-to-end `fit()` regression: when a survey panel has a + terminally-missing group in a cohort whose other groups still + contribute at the missing period, combined with within-group- + varying PSU, both the analytical TSL path and the cell-level + bootstrap path must raise `ValueError` — cohort-recentering + leaks non-zero centered IF mass onto cells with no positive- + weight obs, and both paths (`_survey_se_from_group_if` for + analytical, `_unroll_target_to_cells` for bootstrap) use the + cell-period allocator and therefore cannot allocate leaked + mass to any observation or PSU. Pre-processing the panel + (dropping late-exit groups or trimming to a balanced sub- + panel) is the documented workaround. 
+ """ + rows = [] + # 10 groups. Joiners at period 3 (cohort A): groups 0-4. + # Leavers at period 4 (cohort B, D=1 at period 0): groups 5-7. + # Never-treated: groups 8-9. + # Group 2 is terminally missing at periods 4-5. It is in + # cohort A; at period 4 the other joiners (0, 1, 3, 4) serve + # as stable_1 controls (they switched on at period 3 and + # contribute when leavers appear at period 4). The cohort + # mean at period 4 is therefore non-zero, and + # `_cohort_recenter_per_period` leaks `-col_mean` onto + # group 2's missing cell — which the cell-level bootstrap + # cannot allocate to any PSU. + for g in range(10): + if g < 5: + # Joiners at period 3 (D=0 at baseline, D=1 from t=3). + d_pattern = [0, 0, 0, 1, 1, 1] + elif g < 8: + # Leavers at period 4 (D=1 at baseline, D=0 from t=4). + d_pattern = [1, 1, 1, 1, 0, 0] + else: + # Never-treated controls. + d_pattern = [0, 0, 0, 0, 0, 0] + for t in range(6): + if g == 2 and t >= 4: + # Terminal missingness: drop rows past period 3. + continue + d = d_pattern[t] + y = float(g) + 0.1 * t + 1.0 * d + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": 1.0, + # Within-group-varying PSU: period parity per group. + "psu": int(g) * 2 + (int(t) % 2), + }) + df_ = pd.DataFrame(rows) + sd = SurveyDesign(weights="pw", psu="psu") + + import warnings as _w + # Analytical path (n_bootstrap=0): the sentinel-mass guard in + # `_survey_se_from_group_if` raises on the same leakage the + # bootstrap guard rejects — both paths use the cell-period + # allocator and cannot allocate leaked mass to any + # observation. 
+        with _w.catch_warnings():
+            _w.simplefilter("ignore")  # terminal-missingness UserWarning
+            with pytest.raises(
+                ValueError, match="no positive-weight observations",
+            ):
+                ChaisemartinDHaultfoeuille(n_bootstrap=0, seed=1).fit(
+                    df_, outcome="outcome", group="group",
+                    time="period", treatment="treatment",
+                    survey_design=sd, L_max=1,
+                )
+
+        # Bootstrap path (n_bootstrap > 0): same sentinel-mass guard
+        # fires via `_unroll_target_to_cells`.
+        with _w.catch_warnings():
+            _w.simplefilter("ignore")
+            with pytest.raises(
+                ValueError, match="no positive-weight observations",
+            ):
+                ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1).fit(
+                    df_, outcome="outcome", group="group",
+                    time="period", treatment="treatment",
+                    survey_design=sd, L_max=1,
+                )
+
+    def test_fit_succeeds_on_terminal_missingness_with_psu_group(self):
+        """Companion regression: the terminal-missingness + varying-PSU
+        sentinel-mass guard must NOT fire when PSU is within-group-
+        constant. Same fixture as the varying-PSU test above but with
+        `psu=<group>` (the auto-inject default) — both analytical
+        and bootstrap paths route through the legacy group-level
+        allocator (the analytical dispatcher in
+        `_survey_se_from_group_if` falls back to the group-level
+        allocator when PSU does not vary within group; the bootstrap
+        dispatcher in `_compute_dcdh_bootstrap` does the same). Fit
+        must succeed with finite SE.
+        """
+        rows = []
+        for g in range(10):
+            if g < 5:
+                d_pattern = [0, 0, 0, 1, 1, 1]
+            elif g < 8:
+                d_pattern = [1, 1, 1, 1, 0, 0]
+            else:
+                d_pattern = [0, 0, 0, 0, 0, 0]
+            for t in range(6):
+                if g == 2 and t >= 4:
+                    continue
+                d = d_pattern[t]
+                y = float(g) + 0.1 * t + 1.0 * d
+                rows.append({
+                    "group": int(g),
+                    "period": int(t),
+                    "treatment": int(d),
+                    "outcome": y,
+                    "pw": 1.0,
+                })
+        df_ = pd.DataFrame(rows)
+        # Auto-inject: no explicit `psu` → `SurveyDesign` falls back to
+        # `psu=<group>` at fit() time. Within-group-constant.
+ sd = SurveyDesign(weights="pw") + import warnings as _w + for n_boot in (0, 50): + with _w.catch_warnings(): + _w.simplefilter("ignore") + res = ChaisemartinDHaultfoeuille( + n_bootstrap=n_boot, seed=1, + ).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, L_max=1, + ) + assert np.isfinite(res.overall_att), ( + f"n_bootstrap={n_boot}: overall_att must be finite " + f"under PSU=group + terminal missingness." + ) + assert np.isfinite(res.overall_se) and res.overall_se >= 0.0, ( + f"n_bootstrap={n_boot}: overall_se must be finite " + f"under PSU=group + terminal missingness." + ) + + def test_bootstrap_dense_codes_under_singleton_baseline_excluded_group(self): + """Regression for P0 #2: when a group is singleton-baseline- + excluded (e.g., an always-treated group whose baseline D=1 + has no peer), its PSU label must NOT pollute the dense code + factorization used by `_compute_dcdh_bootstrap`. Otherwise + eligible groups that share a PSU receive gapped dense codes + (e.g., `[1, 1]`), `_generate_psu_or_group_weights` computes + `n_psu = max + 1 = 2 == n_groups_target = 2`, and the + identity fast path wrongly triggers — giving those eligible + groups independent multiplier draws instead of a shared + one. Assertion: instrument the call to capture the + `group_id_to_psu_code` dict actually passed and confirm its + values form a contiguous range `[0, n_unique - 1]`. + """ + # Fixture: one always-treated group (D=1 at period 0 → singleton- + # baseline-excluded), plus eligible groups that share a PSU + # label while the excluded group has a different PSU. + rows = [] + for g in range(5): + for t in range(5): + if g == 0: + d = 1 # always-treated; baseline D=1 singleton + psu = 100 # distinct PSU for the excluded group + else: + d = 1 if t >= 3 else 0 # joiners at period 3 + # Groups 1, 2 share PSU=200; groups 3, 4 share PSU=300. 
+ psu = 200 if g in (1, 2) else 300 + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": float(g) + 0.1 * t + 0.5 * d, + "pw": 1.0, + "psu": psu, + }) + df_ = pd.DataFrame(rows) + sd = SurveyDesign(weights="pw", psu="psu") + + captured: dict = {} + + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + original_bootstrap = est._compute_dcdh_bootstrap + + def _spy(**kwargs): + captured["group_id_to_psu_code"] = kwargs.get( + "group_id_to_psu_code" + ) + return original_bootstrap(**kwargs) + + est._compute_dcdh_bootstrap = _spy # type: ignore[method-assign] + + import warnings as _w + with _w.catch_warnings(): + _w.simplefilter("ignore") # singleton-baseline warning + est.fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + + dict_passed = captured["group_id_to_psu_code"] + assert dict_passed is not None, ( + "bootstrap received group_id_to_psu_code=None — the " + "PSU-aware path was disabled instead of routing to the " + "cell/legacy path via densified codes." + ) + codes = sorted(set(dict_passed.values())) + # Eligible groups share only two PSUs (200 for g=1,2; + # 300 for g=3,4). Dense codes must be [0, 1], NOT [1, 2] + # (which would happen if the excluded g=0's PSU=100 were + # dense-coded first). + assert codes == list(range(len(codes))), ( + f"group_id_to_psu_code values must be contiguous " + f"dense codes starting at 0, got {codes}. A non-" + f"contiguous range signals the excluded group's PSU " + f"polluted the dense factorization (P0 #2 regression)." + ) + # Sanity: eligible groups 1, 2 must share a code (PSU=200), + # and eligible groups 3, 4 must share a code (PSU=300). + assert dict_passed[1] == dict_passed[2], ( + "Groups 1 and 2 share PSU=200 and must receive the same " + "dense code under correct densification." + ) + assert dict_passed[3] == dict_passed[4], ( + "Groups 3 and 4 share PSU=300 and must receive the same " + "dense code." 
+        )
+        assert dict_passed[1] != dict_passed[3], (
+            "Groups in PSU=200 and PSU=300 must receive distinct "
+            "dense codes."
+        )
diff --git a/tests/test_survey_dcdh_replicate_psu.py b/tests/test_survey_dcdh_replicate_psu.py
index 595f80bd..aac02615 100644
--- a/tests/test_survey_dcdh_replicate_psu.py
+++ b/tests/test_survey_dcdh_replicate_psu.py
@@ -170,6 +170,184 @@ def test_overall_se_finite(self, base_panel, replicate_design, method):
         assert np.isfinite(res.overall_se)
         assert res.overall_se > 0
 
+    def test_att_raises_on_terminal_missingness_replicate_path(self):
+        """Replicate ATT + terminal missingness: the Class A replicate
+        path uses the cell-period allocator unconditionally (per the
+        PR #323 contract), so cohort-recentering leakage onto missing
+        cells hits the `_survey_se_from_group_if` sentinel-mass guard
+        regardless of PSU structure (replicate designs do not carry
+        `resolved.psu`). Unlike Binder TSL, replicate has no
+        `psu=` fallback — the documented workaround is to
+        pre-process the panel. Locks this user-facing failure mode.
+        """
+        import warnings as _w
+        rows = []
+        for g in range(10):
+            if g < 5:
+                d_pattern = [0, 0, 0, 1, 1, 1]
+            elif g < 8:
+                d_pattern = [1, 1, 1, 1, 0, 0]
+            else:
+                d_pattern = [0, 0, 0, 0, 0, 0]
+            for t in range(6):
+                if g == 2 and t >= 4:
+                    continue  # terminal missingness for group 2
+                d = d_pattern[t]
+                y = float(g) + 0.1 * t + 1.0 * d
+                row = {
+                    "group": int(g),
+                    "period": int(t),
+                    "treatment": int(d),
+                    "outcome": y,
+                    "pw": 1.0,
+                }
+                # Attach 5 replicate-weight columns (per-row
+                # Rademacher-like draws; SDR is the simplest method
+                # that does not require a replicate_strata schedule).
+                for r in range(5):
+                    row[f"rep{r}"] = 0.5 if (g + t + r) % 2 == 0 else 1.5
+                rows.append(row)
+        df_ = pd.DataFrame(rows)
+        sd = SurveyDesign(
+            weights="pw",
+            replicate_weights=[f"rep{r}" for r in range(5)],
+            replicate_method="SDR",
+        )
+        with _w.catch_warnings():
+            _w.simplefilter("ignore")  # terminal-missingness UserWarning
+            with pytest.raises(
+                ValueError, match="no positive-weight observations",
+            ):
+                ChaisemartinDHaultfoeuille(seed=1).fit(
+                    df_,
+                    outcome="outcome",
+                    group="group",
+                    time="period",
+                    treatment="treatment",
+                    survey_design=sd,
+                    L_max=1,
+                )
+
+    def test_att_cell_allocator_with_varying_replicate_ratios(
+        self, base_panel, replicate_design, monkeypatch,
+    ):
+        """Class A ATT replicate contract: `_survey_se_from_group_if`
+        dispatches replicate fits through the cell-period allocator
+        (`U_centered_per_period`-based psi_obs), not the legacy
+        group-level expansion. Under per-row replicate matrices where
+        ratios vary within group, the two allocators produce
+        different Rao-Wu variances (obs-level `theta_r = sum_i
+        ratio_ir * psi_i` is not PSU-telescoping — same observation
+        as PR #329's heterogeneity non-invariance guard).
+
+        Locks the contract by spying on `compute_replicate_if_variance`
+        to capture the psi_obs actually passed during the fit,
+        reconstructing the legacy group-level counterpart via the
+        row-sum identity, and asserting:
+          (a) fit's `overall_se**2` matches the captured cell-allocator
+              psi_obs fed back through the helper;
+          (b) the legacy-reconstructed psi_obs yields a materially
+              different Rao-Wu variance, i.e. a future refactor that
+              switches Class A to the legacy allocator would drift
+              the fit's SE.
+        """
+        import numpy as _np
+        import diff_diff.survey as _survey_mod
+
+        R = 20
+        df = replicate_design(base_panel, R=R, method="SDR")
+        sd = _build_replicate_design(R, "SDR")
+
+        # Spy on compute_replicate_if_variance to capture each
+        # psi_obs the fit passes. Pass through to the real helper so
+        # fit completes normally.
+        original_repvar = _survey_mod.compute_replicate_if_variance
+        captured_psi: list = []
+
+        def _spy(psi, resolved):
+            captured_psi.append(_np.asarray(psi).copy())
+            return original_repvar(psi, resolved)
+
+        monkeypatch.setattr(
+            _survey_mod, "compute_replicate_if_variance", _spy
+        )
+
+        res = ChaisemartinDHaultfoeuille(seed=1).fit(
+            df,
+            outcome="outcome",
+            group="group",
+            time="period",
+            treatment="treatment",
+            survey_design=sd,
+        )
+        assert np.isfinite(res.overall_se) and res.overall_se > 0
+        assert captured_psi, (
+            "Spy captured no psi_obs — the fit did not route through "
+            "compute_replicate_if_variance. Did the replicate "
+            "dispatch change?"
+        )
+
+        # The first capture corresponds to the overall DID_M target
+        # (called from the `_compute_se`→`_survey_se_from_group_if`
+        # chain for the overall surface).
+        psi_obs_actual = captured_psi[0]
+
+        resolved = sd.resolve(df)
+        obs_group = df["group"].values
+        w = resolved.weights.astype(_np.float64)
+        n_obs = len(w)
+
+        # Row-sum identity on the cell-allocator psi_obs: summing
+        # over obs in group g recovers U_centered[g]. Use this to
+        # reconstruct the legacy group-level expansion on the SAME
+        # fitted U_centered.
+        unique_gids = _np.unique(obs_group)
+        g_to_idx = {g: i for i, g in enumerate(unique_gids)}
+        U_centered_recon = _np.zeros(len(unique_gids), dtype=_np.float64)
+        W_g = _np.zeros(len(unique_gids), dtype=_np.float64)
+        for i in range(n_obs):
+            gi = g_to_idx[obs_group[i]]
+            U_centered_recon[gi] += psi_obs_actual[i]
+            W_g[gi] += w[i]
+        psi_obs_legacy = _np.zeros_like(w)
+        for i in range(n_obs):
+            gi = g_to_idx[obs_group[i]]
+            if W_g[gi] > 0:
+                psi_obs_legacy[i] = (
+                    U_centered_recon[gi] * w[i] / W_g[gi]
+                )
+
+        # (a) Fit's overall_se^2 must match the variance recomputed
+        # from the captured psi_obs. Trivial sanity check — the fit
+        # computed it from this exact psi_obs via the spy pass-through.
+        var_cell_recompute, _ = original_repvar(psi_obs_actual, resolved)
+        assert float(res.overall_se) ** 2 == pytest.approx(
+            var_cell_recompute, rel=1e-12,
+        ), (
+            f"Fit's overall_se^2 does not match the recomputed "
+            f"variance on the captured psi_obs — spy / dispatch "
+            f"misalignment."
+        )
+
+        # (b) Legacy-reconstructed psi_obs yields a different Rao-Wu
+        # variance on this fixture. Locks the allocator-sensitivity
+        # contract: if a future refactor routes Class A replicate
+        # ATT through the legacy group-level allocator, the fit's
+        # overall_se would silently match var_legacy instead — and
+        # this assertion would fail, surfacing the regression.
+        var_legacy, _ = original_repvar(psi_obs_legacy, resolved)
+        assert not _np.isclose(
+            var_cell_recompute, var_legacy, rtol=1e-6,
+        ), (
+            f"Expected cell-allocator variance to differ from legacy "
+            f"group-level variance on this fixture (per-row "
+            f"replicate matrix varies within group). Got "
+            f"cell={var_cell_recompute}, legacy={var_legacy}. Either "
+            f"the fixture no longer exercises the allocator "
+            f"sensitivity, OR the fit's allocator has silently "
+            f"reverted to legacy."
+        )
+
     @pytest.mark.parametrize("method", REPLICATE_METHODS)
     def test_inference_fields_finite(self, base_panel, replicate_design, method):
         R = 20
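For reviewers, a minimal standalone sketch of the dense-code failure mode the `test_bootstrap_dense_codes_under_singleton_baseline_excluded_group` test pins down. The numbers and the `np.unique` factorization here are illustrative only, not the library's actual `_compute_dcdh_bootstrap` internals:

```python
import numpy as np

# Two eligible groups share PSU=200; one excluded group has PSU=100.
eligible_psu = np.array([200, 200])

# Correct: factorize over eligible groups only -> contiguous codes [0, 0].
_, dense_ok = np.unique(eligible_psu, return_inverse=True)
n_psu_ok = int(dense_ok.max()) + 1  # 1 PSU -> one shared multiplier draw

# Buggy variant: factorize over ALL groups (excluded PSU=100 is coded
# first), then drop the excluded row -> gapped codes [1, 1].
all_psu = np.array([100, 200, 200])
_, dense_all = np.unique(all_psu, return_inverse=True)
gapped = dense_all[1:]
n_psu_bad = int(gapped.max()) + 1  # 2 == n_groups_target

# With gapped codes, n_psu equals the eligible group count, so an
# "identity fast path" keyed on that equality would wrongly hand each
# group an independent draw instead of one shared draw per PSU.
assert n_psu_ok == 1 and n_psu_bad == 2
```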
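Similarly, a toy sketch of the allocator contract behind `test_att_cell_allocator_with_varying_replicate_ratios` (all values invented, not the library's code): both allocators row-sum back to the group IF `psi_g`, so group-aggregated (PSU=group) variance is invariant, but an obs-level replicate combination `theta_r = sum_i ratio_ir * psi_i` differs once replicate ratios vary within the group:

```python
import numpy as np

# One toy group: 4 observations, weights w, group-level IF psi_g = 3.
w = np.array([1.0, 2.0, 1.0, 2.0])
psi_g = 3.0
post = np.array([False, False, True, True])  # post-period cell

# Legacy group-level allocator: psi_i = psi_g * w_i / W_g.
psi_legacy = psi_g * w / w.sum()

# Cell-period allocator: all mass on the post-period cell,
# psi_i = psi_g * w_i / W_{g,post} there, zero elsewhere.
psi_cell = np.where(post, psi_g * w / w[post].sum(), 0.0)

# Row-sum identity: both expansions telescope back to psi_g.
assert np.isclose(psi_legacy.sum(), psi_g)
assert np.isclose(psi_cell.sum(), psi_g)

# Per-row replicate ratios that vary within the group break the
# telescoping: theta_r = sum_i ratio_ir * psi_i now depends on the
# allocator, so the Rao-Wu replicate SE would drift.
ratio_r = np.array([0.5, 1.5, 1.5, 0.5])
theta_legacy = ratio_r @ psi_legacy  # 3.0
theta_cell = ratio_r @ psi_cell      # 2.5
```

If the ratios were constant within the group, both dot products would collapse to `ratio * psi_g` and the two allocators would agree, which is exactly why the divergence only appears for per-row replicate matrices.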