diff --git a/CHANGELOG.md b/CHANGELOG.md index a84732c7..ef26f511 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,7 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released. -- **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). 
Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). `n_bootstrap > 0` combined with within-group-varying PSU remains gated with `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map and will be extended in a follow-up PR. +- **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). 
Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). +- **`ChaisemartinDHaultfoeuille.fit(survey_design=..., n_bootstrap > 0)` now supports within-group-varying PSU** — the PSU-level Hall-Mammen wild multiplier bootstrap has been extended from a group-level PSU map (one multiplier per group) to a cell-level PSU map (one multiplier per `(g, t)` cell's PSU). A dispatcher in `_compute_dcdh_bootstrap` detects PSU-within-group-constant regimes (including PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE is bit-identical to the previous release (guarded by the new `test_bootstrap_se_matches_pre_pr4_baseline` and the pre-existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to multiple PSUs receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. Multi-horizon bootstraps draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution. Closes the last `NotImplementedError` gate in the dCDH survey contract; replicate-weight variance and `n_bootstrap > 0` remain mutually exclusive by construction. 
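The allocator contrast in the entries above can be sketched with toy numbers (illustrative names only, not library API; `psi_g` stands for a group-level IF value ψ_g):

```python
import numpy as np

# Toy sketch of the two observation-level IF allocators for one
# group g with IF value psi_g, row weights w, and a boolean mask
# marking the post-period cell of the transition.
psi_g = 2.0
w = np.array([1.0, 3.0, 2.0, 4.0])           # four rows of group g
post = np.array([False, False, True, True])  # last two rows: post cell

# Legacy group-level allocator: psi_i = psi_g * w_i / W_g.
psi_group = psi_g * w / w.sum()

# Cell-period allocator: psi_i = psi_g * w_i / W_{g, post}, with
# all mass placed on the post-period cell.
psi_cell = np.where(post, psi_g * w / w[post].sum(), 0.0)

# Row-sum identity: both allocators preserve total mass, so a PSU
# containing the whole group telescopes to psi_g either way, which
# is why PSU=group fits stay byte-identical across releases.
assert np.isclose(psi_group.sum(), psi_g)
assert np.isclose(psi_cell.sum(), psi_g)
```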
**Scope note:** panels with *terminal missingness* where the terminally-missing group is in a cohort whose other groups still contribute at the missing period now raise a targeted `ValueError` on every survey variance path that uses the cell-period allocator: Binder TSL with within-group-varying PSU, Rao-Wu replicate-weight ATT (which always uses the cell allocator per the Class A contract shipped in PR #323), and the cell-level wild PSU bootstrap. Cohort-recentering leaks centered IF mass onto cells with no positive-weight observations, which the cell-period allocator cannot attach to any observation/PSU. This closes a silent mass-drop bug the cell-period allocator introduced across all three paths in v3.1.x; pre-process the panel to remove terminal missingness (drop late-exit groups or trim to a balanced sub-panel) as the documented workaround. For Binder TSL only, using an explicit `psu=` routes through the legacy group-level allocator where the row-sum identity makes the two allocators statistically equivalent. Replicate-weight ATT and within-group-varying-PSU bootstrap have no such allocator fallback — the panel itself must be pre-processed. PSU-within-group-constant Binder TSL (including PSU=group auto-inject) is unaffected. ## [3.1.3] - 2026-04-18 diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py index 846c9506..1416d5ed 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille.py +++ b/diff_diff/chaisemartin_dhaultfoeuille.py @@ -662,14 +662,32 @@ def fit( design is supplied, the multiplier bootstrap operates at the PSU level (Hall-Mammen wild PSU bootstrap) — under the default auto-inject this collapses to a group-level - clustered bootstrap. 
**Out-of-scope combinations raise - ``NotImplementedError``**: (a) replicate weights with - ``n_bootstrap > 0`` (replicate variance is closed-form; - bootstrap would double-count variance); (b) ``n_bootstrap > - 0`` with PSU that varies within group (PSU-level bootstrap - still uses the legacy group-level PSU map; follow-up PR - extends it). See REGISTRY.md - ``ChaisemartinDHaultfoeuille`` Notes for the full contract. + clustered bootstrap. Under within-group-varying PSU the + bootstrap uses a cell-level wild PSU allocator — a group + contributing cells to multiple PSUs receives independent + multiplier draws per PSU (see the Survey + bootstrap + contract Note in REGISTRY.md). **Scope note (terminal + missingness under any cell-period-allocator path):** on + panels where a terminally-missing group is in a cohort + whose other groups still contribute at the missing + period, every survey variance path that uses the cell- + period allocator raises a targeted ``ValueError``: + Binder TSL with within-group-varying PSU, Rao-Wu + replicate-weight ATT (which always uses the cell + allocator), and the cell-level wild PSU bootstrap. + Cohort-recentering leaks centered IF mass onto cells + with no positive-weight obs, which the cell-period + allocator cannot allocate to any observation or PSU. + Pre-process the panel (drop late-exit groups or trim to + a balanced sub-panel), or — for Binder TSL only — use + an explicit ``psu=`` so the analytical path + routes through the legacy group-level allocator. + Replicate ATT and within-group-varying-PSU bootstrap + have no such allocator fallback. **Replicate weights with ``n_bootstrap > 0`` + raises ``NotImplementedError``** (replicate variance is + closed-form; bootstrap would double-count variance). See + REGISTRY.md ``ChaisemartinDHaultfoeuille`` Notes for the + full contract. 
Returns ------- @@ -825,25 +843,12 @@ def fit( # Cell-period IF allocator contract: strata and PSU must be # constant within each (g, t) cell, a strict relaxation of - # the previous within-group constancy rule. One out-of-scope - # combination remains gated with NotImplementedError until - # the corresponding follow-up PR extends it: - # - n_bootstrap > 0 + within-group-varying PSU - # (PR 4: cell-level Hall-Mammen wild bootstrap) - _, psu_varies = _strata_psu_vary_within_group( - resolved_survey, data, group, survey_weights, - ) - if psu_varies and self.n_bootstrap > 0: - raise NotImplementedError( - "n_bootstrap > 0 is not supported under a " - "survey design whose PSU varies within group. " - "The PSU-level Hall-Mammen wild bootstrap uses " - "the legacy group-level PSU map and will be " - "extended to cell-level PSU in a follow-up PR. " - "For now, use n_bootstrap=0 (analytical TSL " - "variance, which fully supports within-group-" - "varying PSU via the cell-period allocator)." - ) + # the previous within-group constancy rule. Both the + # analytical TSL path and the PSU-level wild bootstrap now + # honor within-group-varying PSU via the cell-period + # allocator (the bootstrap dispatcher routes PSU-within- + # group-constant regimes through the legacy group-level + # path for bit-identity with prior releases). _validate_cell_constant_strata_psu( resolved_survey, data, group, time, survey_weights, ) @@ -1693,6 +1698,14 @@ def fit( multi_horizon_se = {} multi_horizon_inference = {} + # Cache the cohort-recentered per-cell IF tensors built + # inside this loop so the bootstrap block below can reuse + # them without recomputing cohort-recentering. Keyed by + # horizon; value shape (n_eligible_var, n_periods). Populated + # only when a survey design is active (else the per-period + # tensor is None and the bootstrap has no cell-level path + # to take). 
+ _mh_pp_cache: Dict[int, np.ndarray] = {} # Compute inference for ALL horizons 1..L_max (including l=1) # so the event_study_effects dict uses a consistent estimand # (per-group DID_{g,l}) across all horizons. @@ -1736,6 +1749,7 @@ def fit( U_centered_pp_l: Optional[np.ndarray] = _cohort_recenter_per_period( U_pp_l[eligible_mask_var], cid_elig ) + _mh_pp_cache[l_h] = U_centered_pp_l else: U_centered_pp_l = None N_l_h = multi_horizon_dids[l_h]["N_l"] @@ -1842,6 +1856,10 @@ def fit( [g not in singleton_baseline_set_pl for g in all_groups], dtype=bool, ) + # Cache per-placebo-horizon cohort-recentered per-cell + # IF tensors for the bootstrap block below (same + # pattern as _mh_pp_cache for positive horizons). + _pl_pp_cache: Dict[int, np.ndarray] = {} for lag_l in range(1, L_max + 1): pl_data = multi_horizon_placebos.get(lag_l) if pl_data is None or pl_data["N_pl_l"] == 0: @@ -1874,6 +1892,7 @@ def fit( U_centered_pp_pl_l: Optional[np.ndarray] = _cohort_recenter_per_period( U_pp_pl[eligible_mask_pl], cid_elig_pl ) + _pl_pp_cache[lag_l] = U_centered_pp_pl_l else: U_centered_pp_pl_l = None _elig_groups_pl = [all_groups[g] for g in range(len(all_groups)) if eligible_mask_pl[g]] @@ -2106,21 +2125,27 @@ def fit( "instead of replicate weights." ) - # Warning fires only when PSU is strictly coarser than - # group. Under auto-inject psu=group (or explicit - # psu=), groups and PSUs coincide so - # Hall-Mammen wild PSU bootstrap equals group-level - # multiplier bootstrap — no need to warn. - # Count groups/PSUs on the POST-FILTER eligible set - # (`_eligible_group_ids`) — the same set the bootstrap - # map is built from below. Using raw positive-weight - # groups here would emit a misleading warning on - # panels where upstream dCDH filtering drops groups - # that happen to share PSUs with kept groups. 
+ # Warning fires only when PSU is **strictly coarser + # than group** on an otherwise within-group-constant + # design (multiple eligible groups share a PSU label + # but no group spans more than one PSU). Two regimes + # are explicitly excluded: + # - PSU=group (auto-inject default): identity-map + # fast path — no warning needed. + # - Within-group-varying PSU: the cell-level + # allocator honors the per-cell PSU structure; + # "n_psu < n_groups" is expected whenever cells + # of a group share a PSU with cells of another + # group, which does not indicate coarser-than-group + # clustering in the Hall-Mammen sense. + # Count unique PSUs across ALL positive-weight obs of + # eligible groups AND detect within-group-varying + # PSU; suppress the warning in that regime. psu_arr_warn = getattr(resolved_survey, "psu", None) if psu_arr_warn is None or _obs_survey_info is None: # No PSU info — can't compare to group count. n_psu_eff_warn, n_groups_eff_warn = -1, -1 + psu_varies_within_warn = False else: obs_gids_warn = np.asarray(_obs_survey_info["group_ids"]) obs_ws_warn = np.asarray( @@ -2128,25 +2153,37 @@ def fit( ) pos_mask_warn = obs_ws_warn > 0 psu_codes_warn = np.asarray(psu_arr_warn) - # Collect the PSU label for each variance-eligible - # group. PSU that varies within group is rejected - # for the bootstrap path upstream (NotImplementedError - # gate), so the first positive-weight label - # represents the whole group here. 
- eligible_psu_labels: List[Any] = [] - for gid in _eligible_group_ids: - mask_g = (obs_gids_warn == gid) & pos_mask_warn - if mask_g.any(): - eligible_psu_labels.append( - psu_codes_warn[mask_g][0] - ) - n_groups_eff_warn = len(eligible_psu_labels) - n_psu_eff_warn = ( - int(len(np.unique(np.asarray(eligible_psu_labels)))) - if eligible_psu_labels - else -1 + eligible_gid_set = set(_eligible_group_ids) + elig_obs_mask_warn = pos_mask_warn & np.array( + [g in eligible_gid_set for g in obs_gids_warn], + dtype=bool, ) - if 0 <= n_psu_eff_warn < n_groups_eff_warn: + if elig_obs_mask_warn.any(): + elig_psu_labels_arr = psu_codes_warn[elig_obs_mask_warn] + n_psu_eff_warn = int( + len(np.unique(elig_psu_labels_arr)) + ) + n_groups_eff_warn = len(_eligible_group_ids) + # Detect within-group-varying PSU on the + # eligible subset so we can suppress the + # "strictly coarser PSU" warning there. + psu_varies_within_warn = bool( + pd.DataFrame({ + "g": obs_gids_warn[elig_obs_mask_warn], + "p": elig_psu_labels_arr, + }) + .groupby("g")["p"] + .nunique() + .gt(1) + .any() + ) + else: + n_psu_eff_warn, n_groups_eff_warn = -1, -1 + psu_varies_within_warn = False + if ( + 0 <= n_psu_eff_warn < n_groups_eff_warn + and not psu_varies_within_warn + ): warnings.warn( f"Bootstrap with survey_design uses Hall-Mammen " f"wild multiplier weights at the PSU level " @@ -2163,10 +2200,12 @@ def fit( stacklevel=2, ) joiners_inputs = ( - (U_centered_joiners, joiner_total, joiners_att) if joiners_available else None + (U_centered_joiners, joiner_total, joiners_att, U_centered_pp_joiners) + if joiners_available else None ) leavers_inputs = ( - (U_centered_leavers, leaver_total, leavers_att) if leavers_available else None + (U_centered_leavers, leaver_total, leavers_att, U_centered_pp_leavers) + if leavers_available else None ) # Phase 1 placebo bootstrap: the Phase 1 per-period placebo # DID_M^pl still uses NaN SE (no IF derivation for the @@ -2220,6 +2259,7 @@ def fit( U_centered_pl_b, 
pl_data["N_pl_l"], pl_data["placebo_l"], + _pl_pp_cache.get(lag_l), ) # Phase 2: build multi-horizon bootstrap inputs from the @@ -2275,23 +2315,28 @@ def fit( U_centered_h, h_data["N_l"], h_data["did_l"], + _mh_pp_cache.get(l_h), ) - # Under a survey design with PSU information, build a - # `group_id_to_psu_code` dict so the bootstrap mixin can - # derive each target's PSU map by ID lookup rather than by - # positional reuse. PSU that varies within group is - # rejected for the bootstrap path by a NotImplementedError - # gate in fit() (the cell-period allocator used by the - # analytical TSL path does NOT require within-group PSU - # constancy, but the Hall-Mammen wild PSU bootstrap still - # uses a group-level PSU map — PR 4 extends it). Each - # variance-eligible group therefore has exactly one PSU - # label here. Under auto-inject psu=group each group has a - # unique PSU code and the bootstrap mixin's identity-map - # fast path reproduces the pre-PSU behavior bit-for-bit. + # Under a survey design with PSU information, build both + # (a) a group-level `group_id_to_psu_code` dict (one PSU + # code per eligible group) and (b) a per-cell PSU tensor + # `psu_codes_per_cell` of shape (n_eligible, n_periods). + # The bootstrap mixin's dispatcher inspects (b) to decide + # whether PSU is within-group-constant: when constant, it + # runs the legacy group-level bootstrap via (a) for + # bit-identity with pre-release behavior; when varying, + # it switches to the cell-level wild PSU bootstrap that + # draws one multiplier per (g, t)'s PSU. Under auto-inject + # `psu=group` each group has a unique PSU code and every + # cell of a group shares it — the dispatcher routes to + # the legacy path and the identity-map fast path + # reproduces the pre-PSU behavior bit-for-bit. See + # REGISTRY.md ChaisemartinDHaultfoeuille Survey + + # bootstrap contract Note. 
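The dispatcher's routing decision described in the comment above can be sketched as follows (hypothetical standalone helper mirroring the `_psu_varies_within_group` behavior the diff references; the `-1` sentinel convention is the one documented for `psu_codes_per_cell`):

```python
import numpy as np

def psu_varies_within_group(psu_codes_per_cell: np.ndarray) -> bool:
    """Sketch of the dispatcher test: True when any eligible group
    (row) spans more than one PSU across its cells, ignoring -1
    sentinel entries (ineligible / zero-weight cells)."""
    for row in psu_codes_per_cell:
        valid = row[row >= 0]
        if valid.size > 0 and np.unique(valid).size > 1:
            return True
    return False

# PSU constant within each group (e.g. PSU=group auto-inject):
# route to the legacy group-level path for bit-identity.
constant = np.array([[0, 0, -1],
                     [1, 1, 1]])
# One group contributes cells to two PSUs: cell-level path.
varying = np.array([[0, 2, -1],
                    [1, 1, 1]])
assert not psu_varies_within_group(constant)
assert psu_varies_within_group(varying)
```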
group_id_to_psu_code_bootstrap: Optional[Dict[Any, int]] = None eligible_group_ids_bootstrap: Optional[np.ndarray] = None + psu_codes_per_cell_bootstrap: Optional[np.ndarray] = None if ( resolved_survey is not None and getattr(resolved_survey, "psu", None) is not None @@ -2299,33 +2344,95 @@ def fit( ): obs_psu_codes = np.asarray(resolved_survey.psu) obs_gids_boot = np.asarray(_obs_survey_info["group_ids"]) + obs_tids_boot = np.asarray(_obs_survey_info["time_ids"]) obs_weights_boot = np.asarray( _obs_survey_info["weights"], dtype=np.float64 ) pos_mask_boot = obs_weights_boot > 0 - group_psu_labels: List[Any] = [] - valid_map = True - for gid in _eligible_group_ids: - mask_g = (obs_gids_boot == gid) & pos_mask_boot - if not mask_g.any(): - valid_map = False - break - labels = obs_psu_codes[mask_g] - # Within-group-constant PSU is validated upstream; - # the first label represents the whole group. - group_psu_labels.append(labels[0]) - if valid_map and len(group_psu_labels) == n_groups_for_overall_var: - # Factor PSU labels to dense integer codes. - _, dense_codes = np.unique( - np.asarray(group_psu_labels), - return_inverse=True, + gid_to_idx = { + gid: i for i, gid in enumerate(_eligible_group_ids) + } + tid_to_idx = {t: i for i, t in enumerate(all_periods)} + n_elig_boot = len(_eligible_group_ids) + n_per_boot = len(all_periods) + g_idx_arr = np.array( + [gid_to_idx.get(g, -1) for g in obs_gids_boot], + dtype=np.int64, + ) + t_idx_arr = np.array( + [tid_to_idx.get(t, -1) for t in obs_tids_boot], + dtype=np.int64, + ) + # Factor PSU labels to dense int codes over the + # **eligible-subset** positive-weight observations only + # (not the full positive-weight population). 
Restricting + # to eligible obs ensures the resulting dense codes + # range ONLY over PSUs actually used by variance- + # eligible groups, so downstream n_psu = max(code) + 1 + # is exact: no gaps from singleton-baseline-excluded + # groups that would silently trigger the identity + # fast path in `_generate_psu_or_group_weights`. + elig_obs_mask = ( + pos_mask_boot & (g_idx_arr >= 0) & (t_idx_arr >= 0) + ) + elig_psu_labels = obs_psu_codes[elig_obs_mask] + dense_per_row: Optional[np.ndarray] = None + if elig_psu_labels.size > 0: + _, elig_dense_codes = np.unique( + elig_psu_labels, return_inverse=True, + ) + elig_dense_codes = np.asarray(elig_dense_codes, dtype=np.int64) + dense_per_row = np.full( + len(obs_psu_codes), -1, dtype=np.int64, + ) + dense_per_row[elig_obs_mask] = elig_dense_codes + + # Per-cell PSU tensor: (n_eligible, n_periods), -1 sentinel + # for ineligible / zero-weight cells. Populated + # unconditionally when `dense_per_row` exists — a row + # that ends up all-sentinel (eligible group with no + # positive-weight obs) is masked out at unroll time, + # not by discarding the entire tensor. See also the + # dispatcher's `_psu_varies_within_group` helper which + # ignores sentinel entries row-wise. + if dense_per_row is not None: + psu_codes_per_cell = np.full( + (n_elig_boot, n_per_boot), -1, dtype=np.int64, ) - dense_codes = np.asarray(dense_codes, dtype=np.int64) + psu_codes_per_cell[ + g_idx_arr[elig_obs_mask], + t_idx_arr[elig_obs_mask], + ] = dense_per_row[elig_obs_mask] + psu_codes_per_cell_bootstrap = psu_codes_per_cell + + # Group-level dict: one PSU code per eligible + # group. For rows that are all-sentinel (eligible + # group has no positive-weight obs), assign code + # `0` as a harmless placeholder — the group's IF + # mass is zero, so the bootstrap multiplier it + # receives is irrelevant on either the legacy or + # the cell-level path. 
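The eligible-subset factoring and scatter described above can be sketched with hypothetical toy arrays (the real code works on resolved survey attributes; these names are illustrative):

```python
import numpy as np

# Factor PSU labels to dense integer codes over eligible
# positive-weight observations only, then scatter them into a
# (n_eligible_groups, n_periods) tensor with a -1 sentinel.
obs_psu = np.array(["A", "A", "B", "B", "C"])  # raw PSU labels
g_idx = np.array([0, 0, 1, 1, -1])             # -1: ineligible group
t_idx = np.array([0, 1, 0, 1, 0])
weights = np.array([1.0, 2.0, 1.0, 0.0, 3.0])

elig = (weights > 0) & (g_idx >= 0) & (t_idx >= 0)
_, dense = np.unique(obs_psu[elig], return_inverse=True)

psu_codes_per_cell = np.full((2, 2), -1, dtype=np.int64)
psu_codes_per_cell[g_idx[elig], t_idx[elig]] = dense

# "C" belongs to an ineligible group and gets no code, so
# n_psu = dense.max() + 1 is exact with no gaps; the zero-weight
# (1, 1) cell keeps its -1 sentinel.
assert psu_codes_per_cell.tolist() == [[0, 0], [1, -1]]
assert int(dense.max()) + 1 == 2
```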
Always populate the dict + # so the legacy group-level path keeps clustering + # correctly when psu_varies=False even if some + # eligible groups happen to have no positive- + # weight obs. + group_psu_labels: List[int] = [] + for i in range(n_elig_boot): + row = psu_codes_per_cell[i] + valid = row[row >= 0] + if valid.size == 0: + group_psu_labels.append(0) + else: + group_psu_labels.append(int(valid[0])) group_id_to_psu_code_bootstrap = { - gid: int(code) - for gid, code in zip(_eligible_group_ids, dense_codes) + gid: code + for gid, code in zip( + _eligible_group_ids, group_psu_labels + ) } - eligible_group_ids_bootstrap = np.asarray(_eligible_group_ids) + eligible_group_ids_bootstrap = np.asarray( + _eligible_group_ids + ) br = self._compute_dcdh_bootstrap( n_groups_for_overall=n_groups_for_overall_var, @@ -2339,6 +2446,8 @@ def fit( placebo_horizon_inputs=pl_boot_inputs, group_id_to_psu_code=group_id_to_psu_code_bootstrap, eligible_group_ids=eligible_group_ids_bootstrap, + u_per_period_overall=U_centered_pp_overall, + psu_codes_per_cell=psu_codes_per_cell_bootstrap, ) bootstrap_results = br @@ -5464,14 +5573,14 @@ def _strata_psu_vary_within_group( ) -> Tuple[bool, bool]: """Return (strata_varies_within_group, psu_varies_within_group). - Diagnostic helper used at ``fit()`` time to gate the remaining - out-of-scope combination for the cell-period IF allocator: - ``n_bootstrap > 0`` still uses a group-level PSU map and raises - ``NotImplementedError`` when PSU varies within group. The - heterogeneity WLS path supports within-group-varying PSU/strata - via the cell-period allocator (shipped in the PR that lifted the - previous gate). Zero-weight rows are excluded from the check - (subpopulation contract). + Diagnostic helper. 
No longer used to gate any out-of-scope + combination in ``fit()`` — the analytical TSL path, the + heterogeneity WLS path, and the PSU-level wild multiplier + bootstrap all support within-group-varying PSU/strata via the + cell-period allocator. The helper is retained for callers that + need to branch on the regime (e.g., documentation, diagnostic + warnings). Zero-weight rows are excluded (subpopulation + contract). """ if resolved is None: return False, False @@ -5778,6 +5887,53 @@ def _survey_se_from_group_if( and U_centered_per_period.size > 0 ) + # Binder TSL fallback: when the cell allocator would apply but + # PSU is within-group-constant (PSU=group, strictly-coarser PSU + # within-group-constant, or the auto-inject default), the cell + # and group allocators are equivalent at PSU-level aggregation + # via the row-sum identity `sum_{c in g} u_cell[c] == u_centered[g]`. + # Prefer the legacy group-level path in that regime: it sidesteps + # the sentinel-mass guard (below) that would otherwise fire + # spuriously on terminally-missing panels whose PSU structure + # does not require cell-level resolution. Matches the bootstrap + # dispatcher's routing rule (`_compute_dcdh_bootstrap` + + # `_psu_varies_within_group`). + # + # **Replicate-weight carve-out:** replicate designs do not carry + # `resolved.psu`, but the Class A replicate contract shipped in + # PR #323 applies `compute_replicate_if_variance` to the cell- + # allocator `psi_obs` — per-row-varying replicate ratios are + # allocator-sensitive, so forcing replicate fits onto the legacy + # group-level allocator would silently change their SE. The + # fallback therefore skips replicate fits; the sentinel-mass + # guard still fires on mass leakage when it applies. The Class + # A replicate allocator contract is asserted by + # `tests/test_survey_dcdh_replicate_psu.py::TestReplicateClassA:: + # test_att_cell_allocator_with_varying_replicate_ratios`. 
+ if use_cell_allocator and not getattr( + resolved, "uses_replicate_variance", False + ): + psu_arr = getattr(resolved, "psu", None) + if psu_arr is None: + use_cell_allocator = False + else: + psu_eff = np.asarray(psu_arr)[pos_mask] + eligible_set = set(eligible_groups) + elig_row_mask = np.array( + [g in eligible_set for g in gids_eff], dtype=bool + ) + if elig_row_mask.any(): + psu_varies_within = bool( + pd.DataFrame({ + "g": gids_eff[elig_row_mask], + "p": psu_eff[elig_row_mask], + }).groupby("g")["p"].nunique().gt(1).any() + ) + if not psu_varies_within: + use_cell_allocator = False + else: + use_cell_allocator = False + if use_cell_allocator: tids_eff = np.asarray(time_ids)[pos_mask] # Map row's group to an index in eligible_groups (−1 when the @@ -5805,6 +5961,61 @@ def _survey_se_from_group_if( (elig_idx_eff[valid_cell], col_idx_eff[valid_cell]), w_eff[valid_cell], ) + # Sentinel-mass guard (mirror of `_unroll_target_to_cells` on + # the bootstrap path). Under terminal missingness, + # `_cohort_recenter_per_period` subtracts cohort column means + # across the full period grid, so a group with no observation + # at period t can acquire non-zero centered mass at that cell. + # The cell-level expansion `psi_i = U[g,t] * (w_i / W_{g,t})` + # has no observation to attach that mass to (W_{g,t} = 0), so + # silently dropping it would understate the SE. Raise a + # targeted ValueError instead (consistent with the cell-level + # bootstrap's `_unroll_target_to_cells` guard). + missing_cell_mask = W_cell == 0 + if missing_cell_mask.any(): + leaked = U_centered_per_period[missing_cell_mask] + if leaked.size > 0 and bool( + np.any(np.abs(leaked) > 1e-12) + ): + # Branch the wording on replicate-vs-TSL so replicate + # ATT users debugging this error are not misdirected + # to "within-group-varying PSU" (replicate designs + # have no PSU structure). Core mechanics and the + # pre-processing workaround are shared. 
+ is_replicate = bool( + getattr(resolved, "uses_replicate_variance", False) + ) + path_name = ( + "Rao-Wu replicate-weight ATT variance" + if is_replicate + else "Analytical Binder TSL survey SE" + ) + trigger_detail = ( + "terminal missingness on a replicate-weight " + "design (replicate ATT unconditionally uses the " + "cell-period allocator per the Class A contract; " + "PSU structure is not involved)" + if is_replicate + else "terminal missingness combined with within-" + "group-varying PSU (the Binder TSL cell-period " + "allocator cannot allocate leaked mass to any " + "observation)" + ) + raise ValueError( + f"{path_name} cannot be computed on this panel: " + "cohort-recentered IF mass landed on (g, t) cells " + "with no positive-weight observations " + f"(W_{{g, t}} = 0). This typically occurs with " + f"{trigger_detail}: _cohort_recenter_per_period " + "subtracts cohort column means across the full " + "period grid, so a group with no observation at " + "period t acquires non-zero centered mass there, " + "which the cell-period allocator cannot allocate " + "to any observation. Pre-process the panel to " + "remove terminal missingness (drop late-exit " + "groups or trim to a balanced sub-panel) before " + "fitting." + ) # Lookup U_centered_per_period and W_cell per row. u_obs_cell = np.zeros(w_eff.shape[0], dtype=np.float64) u_obs_cell[valid_cell] = U_centered_per_period[ diff --git a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py index 27552296..b1734ddd 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py +++ b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py @@ -9,19 +9,42 @@ ``CallawaySantAnna``, ``ImputationDiD``, and ``TwoStageDiD``). 
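The sentinel-mass guard above can be sketched in isolation (hypothetical free-standing function; the shipped guard is inline and branches its message on replicate-vs-TSL):

```python
import numpy as np

def check_no_leaked_mass(u_centered_pp, w_cell, tol=1e-12):
    """Sketch of the sentinel-mass guard: cohort recentering can
    place non-zero centered IF mass on a (g, t) cell with no
    positive-weight observations (W_cell == 0); the cell allocator
    has no row to attach it to, so raising beats silently dropping
    mass and understating the SE."""
    leaked = u_centered_pp[w_cell == 0]
    if leaked.size > 0 and np.any(np.abs(leaked) > tol):
        raise ValueError(
            "cohort-recentered IF mass landed on cells with no "
            "positive-weight observations; pre-process the panel "
            "to remove terminal missingness"
        )

u = np.array([[0.5, -0.5],
              [0.3, 0.2]])
w = np.array([[2.0, 1.0],
              [1.0, 0.0]])   # the (1, 1) cell is empty

check_no_leaked_mass(u, np.ones((2, 2)))  # fully observed: no error
try:
    check_no_leaked_mass(u, w)            # 0.2 leaks onto empty cell
    raised = False
except ValueError:
    raised = True
assert raised
```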
Under ``survey_design`` with an explicitly-coarser PSU, the bootstrap switches to PSU-level Hall-Mammen wild clustering: each PSU draws a single -multiplier and all groups within that PSU share it -(see ``_generate_psu_or_group_weights`` and ``_map_for_target`` below, -plus the REGISTRY.md ``ChaisemartinDHaultfoeuille`` Note on survey + +multiplier and all groups within that PSU share it (see +``_generate_psu_or_group_weights`` and ``_map_for_target`` below, plus +the REGISTRY.md ``ChaisemartinDHaultfoeuille`` Note on survey + bootstrap). Under the default auto-inject ``psu=group`` each group is its own PSU and the identity-map fast path reproduces the original -group-level behavior bit-for-bit. The bootstrap is a library extension, -not a paper requirement, and is documented as such in ``REGISTRY.md``. +group-level behavior bit-for-bit. + +**Cell-level wild PSU bootstrap (within-group-varying PSU):** when a +survey design's PSU varies across the cells of a group, the group-level +PSU map cannot represent per-PSU contributions (a group with cells in +PSUs ``{p1, p2}`` would collapse to the first PSU, under-clustering the +bootstrap). This module's dispatcher (``_psu_varies_within_group``) +detects the regime and switches to a cell-level allocator: each +observation-level ``psi_i`` is attributed to its ``(g, t)`` cell's PSU +via ``psu_codes_per_cell`` (shape ``(n_eligible_groups, n_periods)``, +-1 sentinel for zero-weight cells), and the bootstrap statistic becomes +``theta_r = sum_c multiplier[psu(c)] * u_centered_pp[c] / divisor`` +(using the cohort-recentered per-cell IF ``U_centered_per_period``). +Under PSU-within-group-constant the row-sum identity +``sum_{c in g} u_cell[c] == u_centered[g]`` (enforced by +``_cohort_recenter_per_period``) makes the cell-level and group-level +bootstraps statistically equivalent, and the dispatcher routes to the +legacy group-level path for bit-identity with pre-cell-level releases. 
+Multi-horizon bootstraps draw a single shared ``(n_bootstrap, n_psu)`` +PSU-level weight matrix per block and broadcast per-horizon via each +horizon's cell-to-PSU map, so the sup-t simultaneous band remains a +valid joint distribution. The bootstrap is a library extension, not a +paper requirement, and is documented in ``REGISTRY.md``. The mixin operates on **pre-computed cohort-centered influence-function values**: the main estimator class computes per-group ``U^G_g`` values -during the analytical variance calculation, recenters them by their -cohort means (using the ``(D_{g,1}, F_g, S_g)`` triple), and stores the -recentered vector. The bootstrap then multiplies this vector by random +(and, under survey designs, per-``(g, t)``-cell attributions +``U_centered_per_period[g, t]``) during the analytical variance +calculation, recenters them by their cohort means (using the +``(D_{g,1}, F_g, S_g)`` triple), and stores the recentered vector / +tensor. The bootstrap then multiplies this structure by random multiplier weights (Rademacher / Mammen / Webb) and re-aggregates to produce a bootstrap distribution per target. 
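A single cell-level replicate of the statistic described above can be sketched with toy values (illustrative arrays, not library internals; the real path draws the full `(n_bootstrap, n_psu)` matrix at once):

```python
import numpy as np

rng = np.random.default_rng(0)

# One cell-level wild PSU bootstrap replicate:
# theta_r = sum_c multiplier[psu(c)] * u_centered_pp[c] / divisor.
u_pp = np.array([[0.4, -0.1],
                 [0.2, 0.0]])    # (n_groups, n_periods) IF tensor
psu = np.array([[0, 1],
                [1, -1]])        # -1: sentinel, dropped at unroll
divisor = 3

valid = psu >= 0                 # sentinel cell carries zero mass
u_cell = u_pp[valid]             # unrolled 1-D cell-level IF
psu_cell = psu[valid]            # each cell's PSU code

# Rademacher multipliers drawn once per PSU and broadcast to cells,
# so cells sharing a PSU share a draw (wild PSU clustering).
m = rng.choice([-1.0, 1.0], size=psu_cell.max() + 1)
theta_r = (m[psu_cell] * u_cell).sum() / divisor
```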
""" @@ -71,15 +94,28 @@ def _compute_dcdh_bootstrap( u_centered_overall: np.ndarray, divisor_overall: int, original_overall: float, - joiners_inputs: Optional[Tuple[np.ndarray, int, float]] = None, - leavers_inputs: Optional[Tuple[np.ndarray, int, float]] = None, - placebo_inputs: Optional[Tuple[np.ndarray, int, float]] = None, + joiners_inputs: Optional[ + Tuple[np.ndarray, int, float, Optional[np.ndarray]] + ] = None, + leavers_inputs: Optional[ + Tuple[np.ndarray, int, float, Optional[np.ndarray]] + ] = None, + placebo_inputs: Optional[ + Tuple[np.ndarray, int, float, Optional[np.ndarray]] + ] = None, # --- Phase 2: multi-horizon inputs --- - multi_horizon_inputs: Optional[Dict[int, Tuple[np.ndarray, int, float]]] = None, - placebo_horizon_inputs: Optional[Dict[int, Tuple[np.ndarray, int, float]]] = None, + multi_horizon_inputs: Optional[ + Dict[int, Tuple[np.ndarray, int, float, Optional[np.ndarray]]] + ] = None, + placebo_horizon_inputs: Optional[ + Dict[int, Tuple[np.ndarray, int, float, Optional[np.ndarray]]] + ] = None, # --- Survey: PSU-level bootstrap under survey designs --- group_id_to_psu_code: Optional[Dict[Any, int]] = None, eligible_group_ids: Optional[np.ndarray] = None, + # --- Survey: cell-level wild PSU bootstrap --- + u_per_period_overall: Optional[np.ndarray] = None, + psu_codes_per_cell: Optional[np.ndarray] = None, ) -> DCDHBootstrapResults: """ Compute multiplier-bootstrap inference for all dCDH targets. @@ -98,13 +134,22 @@ def _compute_dcdh_bootstrap( - the original point estimate of ``T`` (used as the centering point for the percentile p-value) - For each target, this method: + For each target, this method dispatches between two + allocator paths based on ``psu_codes_per_cell`` (via the + ``_psu_varies_within_group`` helper): + + **Legacy group-level path** — runs when + ``psu_codes_per_cell`` is ``None`` or PSU is within-group- + constant (PSU=group auto-inject, strictly-coarser PSU with + within-group constancy, or no survey design). 
For each target: 1. Generates an ``(n_bootstrap, n_groups_target)`` matrix of multiplier weights via :func:`~diff_diff.bootstrap_utils.generate_bootstrap_weights_batch`, where ``n_groups_target`` is the IF vector length (one - weight per contributing group). + weight per contributing group). When a coarser PSU map is + passed, weights are drawn once per PSU and broadcast to + groups so all groups in the same PSU share a multiplier. 2. Computes the bootstrap distribution as ``W @ u_centered / divisor`` (one bootstrap replicate per row), where ``divisor`` is the switching-cell count @@ -116,6 +161,34 @@ def _compute_dcdh_bootstrap( :func:`~diff_diff.bootstrap_utils.compute_effect_bootstrap_stats` to obtain ``(SE, CI, p_value)``. + **Cell-level wild PSU path** — runs when PSU varies within + group (any row of ``psu_codes_per_cell`` has > 1 unique non- + sentinel code). For each target: + + 1. Unrolls the target's ``u_per_period`` tensor and + ``psu_codes_per_cell`` to 1-D cell-level arrays via + :func:`_unroll_target_to_cells`, dropping cells with + sentinel (-1) PSU. Raises ``ValueError`` if cohort- + recentered mass lands on a sentinel cell (the terminal- + missingness guard — see REGISTRY.md). + 2. For **single-target** branches (overall, joiners, leavers, + placebo-horizon-per-l), generates an + ``(n_bootstrap, n_cells_in_target)`` weight matrix where + each cell gets the multiplier of its PSU code, via + ``psu_weights[:, psu_cell]`` broadcast. Bootstrap + distribution becomes ``W_cell @ u_cell / divisor``. + 3. For the **multi-horizon** block, draws a SINGLE shared + ``(n_bootstrap, n_psu)`` PSU-level weight matrix once and + broadcasts per-horizon via each horizon's cell-to-PSU map. + Preserves the sup-t simultaneous confidence band as a + valid joint distribution across horizons. 
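Under stated assumptions (toy shapes, illustrative variable names), steps 1-2 of the cell-level path reduce to indexing a PSU-level weight matrix by each cell's PSU code, so cells sharing a PSU share a multiplier within every replicate:

```python
import numpy as np

rng = np.random.default_rng(1)
n_bootstrap, n_psu = 500, 3

# Per-cell IF tensor and per-cell PSU codes (-1 = sentinel, zero-weight cell).
u_per_period = np.array([[0.2, -0.1,  0.0],
                         [0.3,  0.1, -0.4]])
psu_codes = np.array([[0, 0, -1],
                      [1, 2,  2]])

# Unroll to 1-D cell arrays, dropping sentinel cells.
mask = psu_codes.ravel() >= 0
u_cell = u_per_period.ravel()[mask]
psu_cell = psu_codes.ravel()[mask]          # [0, 0, 1, 2, 2]

# Draw weights at PSU granularity, then broadcast to cells: all cells in
# the same PSU receive the same multiplier within a replicate.
psu_weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_psu))
W_cell = psu_weights[:, psu_cell]           # (n_bootstrap, n_valid_cells)

divisor = 2
deviations = (W_cell @ u_cell) / divisor
```

The two cells of the first group both carry PSU code 0, so their weight columns are identical replicate by replicate, which is exactly the clustering the group-level map would lose when PSU varies within a group.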
+ + Under PSU=group, the row-sum identity + ``sum_{c in g} u_cell[c] == u_centered[g]`` makes the two + paths statistically equivalent; the dispatcher prefers the + legacy path in that regime for bit-identity with pre-PR-4 + releases via the identity-map fast path. + Parameters ---------- n_groups_for_overall : int @@ -135,17 +208,50 @@ def _compute_dcdh_bootstrap( :func:`compute_effect_bootstrap_stats` for the percentile p-value computation. joiners_inputs : tuple, optional - ``(u_centered, divisor, original_effect)`` triple for the - joiners-only ``DID_+`` target. The ``divisor`` is the - joiner switching-cell total ``sum_t N_{1,0,t}``, NOT the - joiner group count. ``None`` when no joiners exist. + ``(u_centered, divisor, original_effect, u_per_period)`` + 4-tuple for the joiners-only ``DID_+`` target. The + ``divisor`` is the joiner switching-cell total + ``sum_t N_{1,0,t}`` (NOT the joiner group count). + ``u_per_period`` is the cohort-recentered per-cell IF + tensor of shape ``(n_eligible_groups, n_periods)`` (or + ``None`` when survey is inactive); it is consumed only + by the cell-level dispatch described below. leavers_inputs : tuple, optional - Same triple for the leavers-only ``DID_-`` target. The - ``divisor`` is the leaver switching-cell total - ``sum_t N_{0,1,t}``. + Same 4-tuple for the leavers-only ``DID_-`` target. + ``divisor`` is ``sum_t N_{0,1,t}``. placebo_inputs : tuple, optional - Same triple for the Phase 1 per-period placebo ``DID_M^pl``. - ``None`` when ``L_max=None`` (per-period placebo has no IF). + Same 4-tuple for the Phase 1 per-period placebo + ``DID_M^pl``. ``None`` when ``L_max=None`` (per-period + placebo has no IF). The 4th slot is always ``None`` here + because the Phase 1 placebo has no per-cell attribution. + multi_horizon_inputs : dict, optional + Per-horizon 4-tuples keyed by horizon ``l``. 
Same layout + as ``joiners_inputs``; the per-horizon per-cell tensor + drives the shared-PSU-weights broadcast under the cell- + level path. + placebo_horizon_inputs : dict, optional + Same shape as ``multi_horizon_inputs`` for the Phase 2 + placebo-horizon targets. + group_id_to_psu_code : dict, optional + Group-ID → dense PSU code map for the legacy group-level + bootstrap path. ``None`` when no PSU info is available. + eligible_group_ids : np.ndarray, optional + Ordered group IDs matching the ``u_centered_*`` vectors + for the legacy group-level path. + u_per_period_overall : np.ndarray, optional + Cohort-recentered per-cell IF tensor for the overall + ``DID_M`` target, shape ``(n_eligible_groups, n_periods)``. + Consumed only by the cell-level path. + psu_codes_per_cell : np.ndarray, optional + Per-cell PSU code tensor, shape + ``(n_eligible_groups, n_periods)``, with ``-1`` sentinel + for ineligible / zero-weight cells. Used by + ``_psu_varies_within_group`` to pick between the legacy + group-level bootstrap path (PSU-within-group-constant, + bit-identical to pre-PR-4 behavior via the identity-map + fast path) and the cell-level wild PSU bootstrap + (within-group-varying PSU, cell-level multipliers and + shared PSU-level weights across multi-horizon blocks). Returns ------- @@ -178,13 +284,53 @@ def _compute_dcdh_bootstrap( # group-IDs parameter for it rather than reusing the overall # eligible list. + # Dispatcher for the cell-level wild PSU bootstrap. When PSU + # varies across cells of a group, a group-level PSU map + # collapses within-group PSU variation and the bootstrap + # under-clusters. The cell-level path draws multipliers at + # PSU granularity and applies them per (g, t) cell via + # `psu_codes_per_cell`. 
Under PSU-within-group-constant + # regimes (including PSU=group and strictly-coarser PSU + # within-group-constant), the row-sum identity + # `sum_{c in g} u_cell[c] == u_centered[g]` makes the two + # paths statistically equivalent, and the dispatcher routes + # to the legacy path for bit-identity with pre-release + # behavior. See REGISTRY.md survey + bootstrap contract Note. + psu_varies = _psu_varies_within_group(psu_codes_per_cell) + # --- Overall DID_M --- # Skip the scalar DID_M bootstrap when divisor_overall <= 0 # (e.g., pure non-binary panels where N_S=0), but continue # to process multi_horizon_inputs and placebo_horizon_inputs. if divisor_overall > 0: + if psu_varies: + # Contract: when the cell-level path is required + # (psu_varies=True), the caller MUST provide the + # target's per-cell IF tensor. Silent fallback to the + # group-level allocator would under-cluster the + # bootstrap by collapsing multi-PSU group + # contributions to one PSU. + if u_per_period_overall is None: + raise ValueError( + "Cell-level bootstrap requires " + "u_per_period_overall when PSU varies " + "within group, got None. Caller must " + "thread the cohort-recentered per-cell IF " + "tensor (U_centered_pp_overall) from the " + "analytical TSL path." 
+ ) + u_boot_overall, map_boot_overall = _unroll_target_to_cells( + u_per_period_overall, psu_codes_per_cell, + ) + else: + u_boot_overall = u_centered_overall + map_boot_overall = _map_for_target( + u_centered_overall.shape[0], + group_id_to_psu_code, + eligible_group_ids, + ) overall_se, overall_ci, overall_p, overall_dist = _bootstrap_one_target( - u_centered=u_centered_overall, + u_centered=u_boot_overall, divisor=divisor_overall, original=original_overall, n_bootstrap=self.n_bootstrap, @@ -193,11 +339,7 @@ def _compute_dcdh_bootstrap( rng=rng, context="dCDH overall DID_M bootstrap", return_distribution=True, - group_to_psu_map=_map_for_target( - u_centered_overall.shape[0], - group_id_to_psu_code, - eligible_group_ids, - ), + group_to_psu_map=map_boot_overall, ) else: overall_se = np.nan @@ -217,10 +359,25 @@ def _compute_dcdh_bootstrap( # --- Joiners (DID_+) --- if joiners_inputs is not None: - u_j, n_j, eff_j = joiners_inputs + u_j, n_j, eff_j, u_pp_j = joiners_inputs if u_j.size > 0 and n_j > 0: + if psu_varies: + if u_pp_j is None: + raise ValueError( + "Cell-level bootstrap requires joiners' " + "per-cell IF tensor (U_centered_pp_joiners) " + "when PSU varies within group, got None." 
+ ) + u_boot_j, map_boot_j = _unroll_target_to_cells( + u_pp_j, psu_codes_per_cell, + ) + else: + u_boot_j = u_j + map_boot_j = _map_for_target( + u_j.size, group_id_to_psu_code, eligible_group_ids, + ) se_j, ci_j, p_j, _ = _bootstrap_one_target( - u_centered=u_j, + u_centered=u_boot_j, divisor=n_j, original=eff_j, n_bootstrap=self.n_bootstrap, @@ -229,9 +386,7 @@ def _compute_dcdh_bootstrap( rng=rng, context="dCDH joiners DID_+ bootstrap", return_distribution=False, - group_to_psu_map=_map_for_target( - u_j.size, group_id_to_psu_code, eligible_group_ids, - ), + group_to_psu_map=map_boot_j, ) results.joiners_se = se_j results.joiners_ci = ci_j @@ -239,10 +394,25 @@ def _compute_dcdh_bootstrap( # --- Leavers (DID_-) --- if leavers_inputs is not None: - u_l, n_l, eff_l = leavers_inputs + u_l, n_l, eff_l, u_pp_l = leavers_inputs if u_l.size > 0 and n_l > 0: + if psu_varies: + if u_pp_l is None: + raise ValueError( + "Cell-level bootstrap requires leavers' " + "per-cell IF tensor (U_centered_pp_leavers) " + "when PSU varies within group, got None." + ) + u_boot_l, map_boot_l = _unroll_target_to_cells( + u_pp_l, psu_codes_per_cell, + ) + else: + u_boot_l = u_l + map_boot_l = _map_for_target( + u_l.size, group_id_to_psu_code, eligible_group_ids, + ) se_l, ci_l, p_l, _ = _bootstrap_one_target( - u_centered=u_l, + u_centered=u_boot_l, divisor=n_l, original=eff_l, n_bootstrap=self.n_bootstrap, @@ -251,17 +421,17 @@ def _compute_dcdh_bootstrap( rng=rng, context="dCDH leavers DID_- bootstrap", return_distribution=False, - group_to_psu_map=_map_for_target( - u_l.size, group_id_to_psu_code, eligible_group_ids, - ), + group_to_psu_map=map_boot_l, ) results.leavers_se = se_l results.leavers_ci = ci_l results.leavers_p_value = p_l # --- Placebo (DID_M^pl) --- + # Phase 1 placebo has no per-cell IF (unpack tolerates None in + # the fourth slot — callers always pass None for this target). 
if placebo_inputs is not None: - u_pl, n_pl, eff_pl = placebo_inputs + u_pl, n_pl, eff_pl, _u_pp_pl_unused = placebo_inputs if u_pl.size > 0 and n_pl > 0: se_pl, ci_pl, p_pl, _ = _bootstrap_one_target( u_centered=u_pl, @@ -282,32 +452,58 @@ def _compute_dcdh_bootstrap( results.placebo_p_value = p_pl # --- Phase 2: Multi-horizon bootstrap with shared weight matrix --- - # Generate ONE shared (n_bootstrap, n_groups) weight matrix so all - # horizons use the same bootstrap draw, making the sup-t statistic - # a valid joint multiplier-bootstrap band. + # Generate ONE shared weight matrix so all horizons use the same + # bootstrap draw, making the sup-t statistic a valid joint + # multiplier-bootstrap band. Under PSU-within-group-constant the + # shared draws live at group granularity (bit-identical to + # pre-cell-level); under within-group-varying PSU the shared + # draws live at PSU granularity and are broadcast per-horizon to + # the horizon's cells via `psu_codes_per_cell`. if multi_horizon_inputs is not None: es_ses: Dict[int, float] = {} es_cis: Dict[int, Tuple[float, float]] = {} es_pvals: Dict[int, float] = {} es_dists: Dict[int, np.ndarray] = {} - # Shared weight matrix sized for the group set. Under PSU-level - # bootstrap (Hall-Mammen wild PSU), weights are drawn once per - # PSU and broadcast to groups so all groups in the same PSU - # share a multiplier within a single bootstrap replicate — - # preserving the sup-t joint distribution across horizons. n_groups_mh = n_groups_for_overall - shared_weights = _generate_psu_or_group_weights( - n_bootstrap=self.n_bootstrap, - n_groups_target=n_groups_mh, - weight_type=self.bootstrap_weights, - rng=rng, - group_to_psu_map=_map_for_target( - n_groups_mh, group_id_to_psu_code, eligible_group_ids, - ), - ) + if psu_varies: + # Draw ONE shared (n_bootstrap, n_psu) PSU-level weight + # matrix. Broadcast per-horizon via each horizon's + # cell-to-PSU map inside the loop. 
PSU count derived + # from the dense code domain of psu_codes_per_cell. + assert psu_codes_per_cell is not None + valid_psu_codes = psu_codes_per_cell[psu_codes_per_cell >= 0] + n_psu_mh = int(valid_psu_codes.max()) + 1 if valid_psu_codes.size > 0 else 0 + shared_psu_weights: Optional[np.ndarray] + if n_psu_mh > 0: + shared_psu_weights = _generate_bootstrap_weights_batch( + n_bootstrap=self.n_bootstrap, + n_units=n_psu_mh, + weight_type=self.bootstrap_weights, + rng=rng, + ) + else: + shared_psu_weights = None + shared_weights = None # not used on the cell path + else: + # Shared weight matrix sized for the group set. Under + # PSU-within-group-constant (Hall-Mammen wild PSU), + # weights are drawn once per PSU and broadcast to groups + # so all groups in the same PSU share a multiplier + # within a single bootstrap replicate — preserving the + # sup-t joint distribution across horizons. + shared_weights = _generate_psu_or_group_weights( + n_bootstrap=self.n_bootstrap, + n_groups_target=n_groups_mh, + weight_type=self.bootstrap_weights, + rng=rng, + group_to_psu_map=_map_for_target( + n_groups_mh, group_id_to_psu_code, eligible_group_ids, + ), + ) + shared_psu_weights = None - for l_h, (u_h, n_h, eff_h) in sorted(multi_horizon_inputs.items()): + for l_h, (u_h, n_h, eff_h, u_pp_h) in sorted(multi_horizon_inputs.items()): if u_h.size > 0 and n_h > 0: # Under the current contract every horizon's IF # vector uses the variance-eligible group ordering @@ -331,8 +527,27 @@ def _compute_dcdh_bootstrap( f"shared PSU draws onto the horizon's own " f"ordering via `_map_for_target`." ) - w_h = shared_weights - deviations = (w_h @ u_h) / n_h + if psu_varies: + if u_pp_h is None: + raise ValueError( + f"Cell-level bootstrap requires " + f"per-cell IF tensor for multi-horizon " + f"target l={l_h} when PSU varies " + f"within group, got None." + ) + assert shared_psu_weights is not None + # Cell-level: unroll this horizon's cells and + # broadcast the shared PSU weights. 
+ u_cell_h, psu_cell_h = _unroll_target_to_cells( + u_pp_h, psu_codes_per_cell, + ) + if u_cell_h.size == 0: + continue + w_cell_h = shared_psu_weights[:, psu_cell_h] + deviations = (w_cell_h @ u_cell_h) / n_h + else: + assert shared_weights is not None + deviations = (shared_weights @ u_h) / n_h dist_h = deviations + eff_h se_h, ci_h, p_h = _compute_effect_bootstrap_stats( @@ -367,15 +582,38 @@ def _compute_dcdh_bootstrap( results.cband_crit_value = cband_crit # --- Phase 2: Placebo horizon bootstrap --- + # Note: placebo-horizons are treated as independent single-target + # draws (no sup-t joint-distribution requirement across placebo + # horizons), so each horizon gets its own RNG draw via + # `_bootstrap_one_target`. Under within-group-varying PSU the + # per-horizon cell unroll is used. if placebo_horizon_inputs is not None: pl_ses: Dict[int, float] = {} pl_cis: Dict[int, Tuple[float, float]] = {} pl_pvals: Dict[int, float] = {} - for l_h, (u_h, n_h, eff_h) in sorted(placebo_horizon_inputs.items()): + for l_h, (u_h, n_h, eff_h, u_pp_h) in sorted(placebo_horizon_inputs.items()): if u_h.size > 0 and n_h > 0: + if psu_varies: + if u_pp_h is None: + raise ValueError( + f"Cell-level bootstrap requires " + f"per-cell IF tensor for placebo-" + f"horizon target l={l_h} when PSU " + f"varies within group, got None." 
+ ) + u_boot_plh, map_boot_plh = _unroll_target_to_cells( + u_pp_h, psu_codes_per_cell, + ) + if u_boot_plh.size == 0: + continue + else: + u_boot_plh = u_h + map_boot_plh = _map_for_target( + u_h.size, group_id_to_psu_code, eligible_group_ids, + ) se_h, ci_h, p_h, _ = _bootstrap_one_target( - u_centered=u_h, + u_centered=u_boot_plh, divisor=n_h, original=eff_h, n_bootstrap=self.n_bootstrap, @@ -384,9 +622,7 @@ def _compute_dcdh_bootstrap( rng=rng, context=f"dCDH placebo l={l_h} bootstrap", return_distribution=False, - group_to_psu_map=_map_for_target( - u_h.size, group_id_to_psu_code, eligible_group_ids, - ), + group_to_psu_map=map_boot_plh, ) pl_ses[l_h] = se_h pl_cis[l_h] = ci_h @@ -404,6 +640,117 @@ def _compute_dcdh_bootstrap( # ============================================================================= +def _psu_varies_within_group( + psu_codes_per_cell: Optional[np.ndarray], +) -> bool: + """True when any row of ``psu_codes_per_cell`` has more than one + unique PSU label (ignoring -1 sentinel entries). + + When ``False`` — including the ``None`` case for non-survey fits — + the legacy group-level bootstrap path is invoked. The row-sum + identity ``sum_{c in g} u_cell[c] == u_centered[g]`` established + by ``_cohort_recenter_per_period`` makes the cell-level and + group-level bootstraps statistically equivalent under this regime, + and the group-level path is bit-identical to pre-cell-level + releases through the existing identity-map fast path. + """ + if psu_codes_per_cell is None: + return False + for row in psu_codes_per_cell: + valid = row[row >= 0] + if valid.size > 1 and np.unique(valid).size > 1: + return True + return False + + +def _unroll_target_to_cells( + u_per_period_target: np.ndarray, + psu_codes_per_cell: Optional[np.ndarray], +) -> Tuple[np.ndarray, np.ndarray]: + """Flatten a target's cohort-recentered per-cell IF tensor + its + per-cell PSU map into 1-D arrays, dropping cells with sentinel + PSU code (-1 — zero-weight cells). 
+ + Both inputs must have shape ``(n_eligible_groups, n_periods)``; + the dCDH bootstrap contract guarantees all targets share the + variance-eligible group ordering, so no per-target row subset + is needed. + + Raises ``ValueError`` when any sentinel cell (-1 PSU) carries + non-zero cohort-recentered IF mass. This is a supported-edge- + case guard: under terminal missingness, ``_cohort_recenter_per_period`` + subtracts column means across the full period grid, so a group + with no observation at period ``t`` can acquire non-zero centered + mass at that sentinel cell. The cell-level bootstrap cannot + allocate that mass to any PSU (the cell has no positive-weight + obs), so silently dropping it would under-weight the group's + bootstrap contribution. The analytical TSL path shares the same + cell-period allocator and fires a matching guard in + ``_survey_se_from_group_if``, so both paths reject this regime + consistently. **Documented workaround (bootstrap path):** + pre-process the panel to remove terminal missingness (drop + late-exit groups or trim to a balanced sub-panel). The within- + group-varying-PSU bootstrap has no allocator fallback — unlike + Binder TSL, where using an explicit ``psu=`` routes + the analytical path through the legacy group-level allocator; + that fallback is not available on the bootstrap side because + the cell-level wild PSU bootstrap IS the varying-PSU regime. + + Returns ``(u_cell, psu_cell)`` of shape + ``(n_valid_cells_in_target,)`` each. + """ + if psu_codes_per_cell is None: + raise ValueError( + "_unroll_target_to_cells requires psu_codes_per_cell; " + "caller should only invoke this on the cell-level path." + ) + if u_per_period_target.shape != psu_codes_per_cell.shape: + raise ValueError( + "Cell-level bootstrap shape mismatch: target per-period " + f"IF tensor has shape {u_per_period_target.shape} but " + f"psu_codes_per_cell has shape {psu_codes_per_cell.shape}. 
" + "The dCDH bootstrap contract requires every target's " + "per-cell tensor to align with the (n_eligible_groups, " + "n_periods) layout of psu_codes_per_cell." + ) + flat_u = u_per_period_target.ravel() + flat_psu = psu_codes_per_cell.ravel() + mask = flat_psu >= 0 + # Sentinel-mass guard: reject terminal-missingness + within-group- + # varying PSU + bootstrap. The cohort-recentering column-subtraction + # at `_cohort_recenter_per_period` can leak non-zero centered mass + # onto cells with no positive-weight obs (missing-cell rows in the + # cohort still get -col_mean added when other rows contribute at + # that column). Dropping that mass silently would under-cluster the + # bootstrap in a supported panel regime. + sentinel_mass = flat_u[~mask] + if sentinel_mass.size > 0 and bool( + np.any(np.abs(sentinel_mass) > 1e-12) + ): + raise ValueError( + "Cell-level bootstrap cannot be computed on this survey " + "panel: cohort-recentered IF mass landed on cells with " + "no positive-weight observations (psu_codes_per_cell == " + "-1). This typically occurs when terminal missingness " + "(groups observed only through some period) combines with " + "within-group-varying PSU: `_cohort_recenter_per_period` " + "subtracts column means across the full period grid, so a " + "group with no observation at period t acquires non-zero " + "centered mass there, which the cell-level allocator " + "cannot allocate to any PSU. The analytical TSL path " + "(`_survey_se_from_group_if`) fires a matching guard on " + "the same regime, so both paths reject this panel " + "consistently. Pre-process the panel to remove terminal " + "missingness (drop late-exit groups or trim to a balanced " + "sub-panel). The within-group-varying-PSU bootstrap has " + "no allocator fallback — unlike Binder TSL, switching to " + "`psu=` does not help here because the varying-" + "PSU bootstrap IS the cell-level path, not an analytical " + "surface with a legacy-allocator alternative." 
+ ) + return flat_u[mask], flat_psu[mask].astype(np.int64, copy=False) + + def _map_for_target( target_size: int, group_id_to_psu_code: Optional[Dict[Any, int]], @@ -423,14 +770,22 @@ def _map_for_target( Returns ``None`` when no PSU information is available (plain multiplier-bootstrap path — identity across targets). + **Group-level granularity only.** This helper is invoked on the + legacy group-level bootstrap path; on the cell-level path + (within-group-varying PSU), callers build the cell-to-PSU map + directly via ``_unroll_target_to_cells`` and do not route through + ``_map_for_target``, so the size-mismatch raise below does not + fire. + Raises ``ValueError`` if ``target_size`` does not match ``len(eligible_group_ids)``: every current dCDH bootstrap target - uses the variance-eligible group ordering, so any size mismatch - signals that a caller introduced a target whose group subset - diverges and should pass its own ``target_group_ids`` rather than - reusing the overall eligible list. Also raises ``ValueError`` if - any group ID is missing from the dict (signaling misalignment - between the target's IF vector and the map's keys). + uses the variance-eligible group ordering on the group-level path, + so any size mismatch signals that a caller introduced a target + whose group subset diverges and should pass its own + ``target_group_ids`` rather than reusing the overall eligible + list. Also raises ``ValueError`` if any group ID is missing from + the dict (signaling misalignment between the target's IF vector + and the map's keys). 
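The sentinel-mass guard can be reproduced with a minimal stand-in for the unroll step. `unroll_cells` below is a hypothetical simplification of `_unroll_target_to_cells`, keeping only the mask-and-check logic: sentinel cells carrying (numerically) zero mass are dropped, while non-zero mass on a sentinel cell raises.

```python
import numpy as np

def unroll_cells(u_pp, psu_codes, atol=1e-12):
    """Toy stand-in for the unroll step: flatten, drop sentinel (-1) cells,
    and reject non-zero IF mass on cells with no positive-weight obs."""
    flat_u, flat_psu = u_pp.ravel(), psu_codes.ravel()
    mask = flat_psu >= 0
    if np.any(np.abs(flat_u[~mask]) > atol):
        raise ValueError("cohort-recentered mass on sentinel cell")
    return flat_u[mask], flat_psu[mask]

psu = np.array([[0, 1], [1, -1]])

u_ok = np.array([[0.1, -0.1], [0.2, 0.0]])
u_cell, psu_cell = unroll_cells(u_ok, psu)    # sentinel carries 0.0: allowed

u_bad = np.array([[0.1, -0.1], [0.2, 0.05]])  # mass leaked onto the sentinel
try:
    unroll_cells(u_bad, psu)
except ValueError as exc:
    print(exc)
```

In the real module the leaked mass arises from the column-mean subtraction in `_cohort_recenter_per_period` under terminal missingness; here it is injected directly for brevity.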
""" if group_id_to_psu_code is None or eligible_group_ids is None: return None diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index bdcfef34..7bba1215 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -589,7 +589,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - **Note:** When every variance-eligible group forms its own `(D_{g,1}, F_g, S_g)` cohort (a degenerate small-panel case where the cohort framework has zero degrees of freedom), the cohort-recentered plug-in formula is unidentified: cohort recentering subtracts the cohort mean from each group's `U^G_g`, and for singleton cohorts the centered value is exactly zero, so the centered influence function vector collapses to all zeros. The estimator returns `overall_se = NaN` with a `UserWarning` rather than silently collapsing to `0.0` (which would falsely imply infinite precision). The `DID_M` point estimate remains well-defined. The bootstrap path inherits the same degeneracy on these panels — the multiplier weights act on an all-zero vector, so the bootstrap distribution is also degenerate. **Deviation from R `DIDmultiplegtDYN`:** R returns a non-zero SE on the canonical 4-group worked example via small-sample sandwich machinery that Python does not implement. Both responses are valid for a degenerate case; Python's `NaN`+warning is the safer default. To get a non-degenerate SE, include more groups so cohorts have peers (real-world panels typically have `G >> K`). -- **Note (cluster contract):** `ChaisemartinDHaultfoeuille` clusters at the group level by default. The analytical SE plug-in operates on per-group influence-function values (one `U^G_g` per group) and, under the cell-period allocator, on their per-cell decomposition `U[g, t]` which telescopes back to `U^G_g` at the PSU-level sum. The multiplier bootstrap generates one weight per group. 
The user-facing `cluster=` kwarg is not supported: the constructor accepts `cluster=None` (the default and only supported value); passing any non-`None` value raises `NotImplementedError` at construction time (and the same gate fires from `set_params`) — custom user-specified clustering is reserved for a future phase. **Automatic PSU-level clustering under `survey_design`:** the analytical TSL path supports PSU labels that vary across cells of a group (within-cell constancy required); the multiplier bootstrap still uses the legacy group-level PSU map and raises `NotImplementedError` when combined with within-group-varying PSU (PR 4 lifts this). Under the default auto-inject `psu=group`, PSU coincides with group and the group-level and cell-level paths are identical at PSU-sum granularity (bit-for-bit via the identity-map fast path). The matching test for the `cluster=` gate is `test_cluster_parameter_raises_not_implemented` in `tests/test_chaisemartin_dhaultfoeuille.py::TestForwardCompatGates`. +- **Note (cluster contract):** `ChaisemartinDHaultfoeuille` clusters at the group level by default. The analytical SE plug-in operates on per-group influence-function values (one `U^G_g` per group) and, under the cell-period allocator, on their per-cell decomposition `U[g, t]` which telescopes back to `U^G_g` at the PSU-level sum. The multiplier bootstrap generates one weight per group. The user-facing `cluster=` kwarg is not supported: the constructor accepts `cluster=None` (the default and only supported value); passing any non-`None` value raises `NotImplementedError` at construction time (and the same gate fires from `set_params`) — custom user-specified clustering is reserved for a future phase. 
**Automatic PSU-level clustering under `survey_design`:** the analytical TSL path supports PSU labels that vary across cells of a group (within-cell constancy required); the multiplier bootstrap supports the same regime via the cell-level wild PSU bootstrap documented in the survey + bootstrap contract Note below. Under PSU-within-group-constant regimes (including the default auto-inject `psu=group` and strictly-coarser PSU with within-group constancy), the bootstrap dispatcher routes through the legacy group-level path so the SE is bit-identical to pre-cell-level releases via the identity-map fast path. The matching test for the `cluster=` gate is `test_cluster_parameter_raises_not_implemented` in `tests/test_chaisemartin_dhaultfoeuille.py::TestForwardCompatGates`. - **Note (bootstrap inference surface):** When `n_bootstrap > 0`, the top-level `results.overall_p_value` / `results.overall_conf_int` (and joiners/leavers analogues) hold **percentile-based bootstrap inference** computed by the multiplier bootstrap, NOT normal-theory recomputations from the bootstrap SE. The t-stat (`overall_t_stat`, etc.) is computed from the SE via `safe_inference()[0]` to satisfy the project's anti-pattern rule (never compute `t = effect / se` inline) — bootstrap does not define an alternative t-stat semantic for percentile bootstrap, so the SE-based t-stat is the natural choice. `event_study_effects[1]`, `summary()`, `to_dataframe()`, `is_significant`, and `significance_stars` all read from these top-level fields and therefore reflect the bootstrap inference automatically. The library precedent for this propagation is `imputation.py:790-805`, `two_stage.py:778-787`, and `efficient_did.py:1009-1013`. The single-period placebo (`L_max=None`) still has NaN bootstrap fields; multi-horizon placebos (`L_max >= 1`) have valid bootstrap SE/CI/p via `placebo_horizon_ses/cis/p_values` on the bootstrap results object. 
The matching test is `test_bootstrap_p_value_and_ci_propagated_to_top_level` in `tests/test_chaisemartin_dhaultfoeuille.py::TestBootstrap`. @@ -607,7 +607,12 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - **Note (deviation from R DIDmultiplegtDYN):** Python uses **period-based** stable-control sets — `stable_0(t)` is any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}`, and similarly for `stable_1(t)`. R `DIDmultiplegtDYN` uses **cohort-based** stable-control sets that additionally require `D_{g,1}` to match the side. Python's definition matches the AER 2020 Theorem 3 cell-count notation `N_{0,0,t}` and `N_{1,1,t}` literally; R's definition matches the dynamic companion paper's cohort `(D_{g,1}, F_g, S_g)` framework. The two definitions agree exactly on (a) panels containing only joiners, (b) panels containing only leavers, (c) the hand-calculable 4-group worked example, or (d) any panel where no joiner's post-switch state overlaps a period when leavers are switching. They disagree by O(1%) on the **point estimate** when both joiners and leavers exist AND some joiners' post-switch cells could serve as leavers' controls (or vice versa). After the Round 2 fix that implemented the full `Lambda^G_{g,l=1}` influence function, the **standard error** parity gap on pure-direction scenarios narrowed from ~18% to ~3%. The R parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` use a tight `1e-4` tolerance for pure-direction point estimates, 10% rtol for multi-horizon SEs (15% for L_max=5 long panels where the cell-count weighting deviation compounds), 5% rtol for single-horizon SEs, and a 2.5% tolerance for mixed-direction point estimates (with the SE check skipped on mixed scenarios because the period-vs-cohort point-estimate deviation cascades into the variance). 
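The period-based stable-control definition above can be checked directly from a treatment matrix. A sketch on a toy 3-group, 4-period panel (not the estimator's internals); per the note, membership in `stable_0(t)` ignores the baseline `D_{g,1}`:

```python
import numpy as np

# D[g, t]: binary treatment, 3 groups x 4 periods.
D = np.array([[0, 0, 1, 1],    # joiner at t=2
              [0, 0, 0, 0],    # never-treated
              [1, 1, 1, 0]])   # leaver at t=3

def stable_0(t):
    # Period-based: D_{g,t-1} = D_{g,t} = 0, regardless of baseline D_{g,1}.
    return np.where((D[:, t - 1] == 0) & (D[:, t] == 0))[0]

def stable_1(t):
    # Period-based: D_{g,t-1} = D_{g,t} = 1.
    return np.where((D[:, t - 1] == 1) & (D[:, t] == 1))[0]
```

At `t=2` the never-treated group is the only `stable_0` control while the (not-yet-switched) leaver is the only `stable_1` control; a cohort-based definition would additionally filter on `D_{g,1}`.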
-- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. The Step 5b validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Workaround for unbalanced panels:** pre-process your data to back-fill the baseline (or drop late-entry groups before fitting), or use R `DIDmultiplegtDYN` until a future phase lifts the restriction. The Step 5b `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly. +- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. 
The Step 5b validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported at the POINT-ESTIMATE level because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Scope limitation (terminal missingness under any cell-period-allocator path):** under any survey variance path that uses the cell-period allocator, a targeted `ValueError` is raised when cohort-recentering leaks non-zero centered IF mass onto cells with no positive-weight observations. Affected paths: +- **Binder TSL with within-group-varying PSU** (`n_bootstrap=0`, explicit `psu=` that varies within group). +- **Rao-Wu replicate-weight ATT** (`compute_replicate_if_variance` always reads the cell-allocator `psi_obs` per the Class A contract shipped in PR #323, regardless of PSU structure). +- **Cell-level wild PSU bootstrap** (`n_bootstrap > 0` with within-group-varying PSU). + +The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and by `_unroll_target_to_cells` (bootstrap). 
**Unaffected paths**: Binder TSL under PSU-within-group-constant regimes (including PSU=group auto-inject) falls back to the legacy group-level allocator where the row-sum identity `sum_{c in g} U_centered_per_period[g, t] == U_centered[g]` makes the two statistically equivalent, and the bootstrap dispatcher routes the same regimes through the legacy group-level path. **Workaround:** pre-process the panel to remove terminal missingness (drop late-exit groups or trim to a balanced sub-panel). For Binder TSL, using an explicit `psu=` routes through the legacy group allocator. For replicate ATT and within-group-varying-PSU bootstrap, there is no allocator fallback — the panel itself must be pre-processed. The broader unbalanced-panel workaround (back-fill the baseline or drop late-entry groups before fitting, or use R `DIDmultiplegtDYN`) also applies. The Step 5b `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly. - **Note (Phase 3 DID^X covariate adjustment):** When `controls` is set, `per_period_effects` (the Phase 1 per-period DID_M decomposition) remains **unadjusted** (computed on raw outcomes). The covariate residualization applies only to the per-group `DID_{g,l}` path (`L_max >= 1`), which produces `event_study_effects` and `overall_att`. This means `per_period_effects` and `event_study_effects[1]` may diverge when controls are active - by design (the per-period path uses binary joiner/leaver categorization and is not part of the DID^X contract). Implements the residualization-style covariate adjustment from Web Appendix Section 1.2 (Assumption 11). For each baseline treatment value `d`, estimates `theta_hat_d` via OLS of first-differenced outcomes on first-differenced covariates with time FEs, restricted to not-yet-treated observations. Residualizes at levels: `Y_tilde[g,t] = Y[g,t] - X[g,t] @ theta_hat_d`. All downstream DID computations use residualized outcomes. 
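A minimal numpy sketch of the residualization just described, under a hypothetical DGP (time fixed effects and the per-baseline-`d` stratification are omitted for brevity; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
G, T, K = 40, 5, 2

# Hypothetical panel: time-varying covariates X and outcomes Y.
X = rng.normal(size=(G, T, K))
theta_true = np.array([1.5, -0.5])
Y = X @ theta_true + rng.normal(scale=0.1, size=(G, T))

# Not-yet-treated mask (here all groups hypothetically switch at period 3).
not_yet = np.zeros((G, T), dtype=bool)
not_yet[:, :3] = True

# First-stage OLS of first-differenced outcomes on first-differenced
# covariates, restricted to differences with both endpoints not yet treated.
dY = Y[:, 1:] - Y[:, :-1]
dX = X[:, 1:, :] - X[:, :-1, :]
keep = not_yet[:, 1:] & not_yet[:, :-1]
theta_hat, *_ = np.linalg.lstsq(dX[keep], dY[keep], rcond=None)

# Residualize at LEVELS, as the note specifies.
Y_tilde = Y - X @ theta_hat
```

All downstream DID computations would then run on `Y_tilde` in place of `Y`.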
This is NOT doubly-robust, NOT IPW, NOT Callaway-Sant'Anna-style. Plug-in IF (treating `theta_hat` as fixed) is valid by FWL theorem. **Deviation from R `DIDmultiplegtDYN`:** The first-stage OLS uses equal cell weights (one observation per `(g,t)` cell), consistent with the library's cell-count weighting convention documented in Phase 1. R weights by `N_gt` (observation count per cell). On panels with 1 observation per cell (the common case), results are identical. When baseline-specific first stages fail (`n_obs = 0` or `n_obs < n_params`), the affected strata are excluded from the estimation (outcomes set to NaN) rather than retained unadjusted - matching R's "drop failed strata" behavior. Requires `L_max >= 1`. Activated via `controls=["col1", "col2"]` in `fit()`. @@ -615,7 +620,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - **Note (Phase 3 state-set trends):** Implements state-set-specific trends from Web Appendix Section 1.4 (Assumptions 13-14). Restricts the control pool for each switcher to groups in the same set (e.g., same state in county-level data). The restriction applies in all four DID/IF paths: `_compute_multi_horizon_dids()`, `_compute_per_group_if_multi_horizon()`, `_compute_multi_horizon_placebos()`, and `_compute_per_group_if_placebo_horizon()`. Cohort structure stays as `(D_{g,1}, F_g, S_g)` triples (does not incorporate set membership). Set membership must be time-invariant per group. **Note on Assumption 14 (common support):** The paper requires a common last-untreated period across sets (`T_u^s` equal for all `s`). This implementation does NOT enforce Assumption 14 up front. Instead, when within-set controls are exhausted at a given horizon (because a set has shorter untreated support than others), the affected switcher/horizon pairs are silently excluded via the existing empty-control-pool mechanism. 
This means `N_l` may be smaller under `trends_nonparam` than without it, and the effective estimand is trimmed to the within-set support at each horizon. The existing multi-horizon A11 warning fires when exclusions occur. Activated via `trends_nonparam="state_column"` in `fit()`. -- **Note (Phase 3 heterogeneity testing - partial implementation):** Partial implementation of the heterogeneity test from Web Appendix Section 1.5 (Assumption 15, Lemma 7). Computes post-treatment saturated OLS regressions of `S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})` on a time-invariant covariate `X_g` plus cohort indicator dummies. Standard OLS inference is valid (paper shows no DID error correction needed). **Deviation from R `predict_het`:** R's full `predict_het` option additionally computes placebo regressions and a joint null test, and disallows combination with `controls`. This implementation provides only post-treatment regressions. **Rejected combinations:** `controls` (matching R), `trends_linear` (heterogeneity test uses raw level changes, incompatible with second-differenced outcomes), and `trends_nonparam` (heterogeneity test does not thread state-set control-pool restrictions). Results stored in `results.heterogeneity_effects`. Activated via `heterogeneity="covariate_column"` in `fit()`. **Note (survey support):** Under `survey_design`, heterogeneity uses WLS with per-group weights `W_g = sum of obs-level survey weights in group g`, and the group-level WLS coefficient influence function is `ψ_g[X] = inv(X'WX)[1,:] @ x_g * W_g * r_g`. 
The group-level IF is then attributed to observation level via **one of two allocators, chosen by variance helper** so each path preserves byte-identity for its aggregation rule: (1) **Binder TSL** (`compute_survey_if_variance`) uses the **cell-period single-cell allocator** — at each horizon `l_h`, `ψ_g` is assigned in full to the post-period cell `(g, out_idx)` with `out_idx = first_switch_idx[g] - 1 + l_h` and expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})` for obs in that cell, zero elsewhere (matches the DID_l post-period convention in the Survey IF expansion Note below). Under PSU=group per-observation distribution differs from the legacy `ψ_i = ψ_g * (w_i / W_g)`, but PSU-level aggregates telescope to the same `ψ_g` — so Binder TSL variance is byte-identical to the pre-cell-period release under PSU=group. Under within-group-varying PSU mass lands in the post-period PSU of the transition, which is what Binder TSL needs. An **empty post-period cell under zero-weight obs** (all obs at `(g, out_idx)` have `w_i = 0` despite `N > 0`) drops the group's contribution, matching the ATT cell allocator's convention; the pre-cell-period path diverged here by redistributing mass to other cells of the group. (2) **Rao-Wu replicate** (`compute_replicate_if_variance`) uses the **legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)`. Replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at the observation level, so moving `ψ_g` mass onto the post-period cell only would silently change the replicate SE whenever a replicate column's ratios vary within group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). Keeping the legacy allocator on this branch preserves byte-identity of replicate SE across every previously-supported fit; replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). 
Inference uses the t-distribution with `df_survey` when provided. Under rank deficiency (any regression coefficient dropped by `solve_ols`'s R-style drop), all inference fields return NaN (conservative, matches the NaN-consistent contract). **Library extension (replicate weights):** Under a replicate-weight design (BRR/Fay/JK1/JKn/SDR), the heterogeneity regression dispatches to `compute_replicate_if_variance` (Rao-Wu weight-ratio rescaling) instead of the Binder TSL formula. The effective df is the shared `min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` used by the rest of the dCDH surfaces; if the base `df_survey` is undefined (QR-rank ≤ 1), heterogeneity inference is NaN regardless of the local `n_valid_het` (matching the dCDH top-level contract — per-site `n_valid` cannot rescue a rank-deficient design). **Library extension:** R `DIDmultiplegtDYN::predict_het` does not natively support survey weights. **Scope note (bootstrap):** Combining `heterogeneity=` with `n_bootstrap > 0` and within-group-varying PSU still raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap uses a group-level PSU map until the follow-up PR extends it. +- **Note (Phase 3 heterogeneity testing - partial implementation):** Partial implementation of the heterogeneity test from Web Appendix Section 1.5 (Assumption 15, Lemma 7). Computes post-treatment saturated OLS regressions of `S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})` on a time-invariant covariate `X_g` plus cohort indicator dummies. Standard OLS inference is valid (paper shows no DID error correction needed). **Deviation from R `predict_het`:** R's full `predict_het` option additionally computes placebo regressions and a joint null test, and disallows combination with `controls`. This implementation provides only post-treatment regressions. 
**Rejected combinations:** `controls` (matching R), `trends_linear` (heterogeneity test uses raw level changes, incompatible with second-differenced outcomes), and `trends_nonparam` (heterogeneity test does not thread state-set control-pool restrictions). Results stored in `results.heterogeneity_effects`. Activated via `heterogeneity="covariate_column"` in `fit()`. **Note (survey support):** Under `survey_design`, heterogeneity uses WLS with per-group weights `W_g = sum of obs-level survey weights in group g`, and the group-level WLS coefficient influence function is `ψ_g[X] = inv(X'WX)[1,:] @ x_g * W_g * r_g`. The group-level IF is then attributed to observation level via **one of two allocators, chosen by variance helper** so each path preserves byte-identity for its aggregation rule: (1) **Binder TSL** (`compute_survey_if_variance`) uses the **cell-period single-cell allocator** — at each horizon `l_h`, `ψ_g` is assigned in full to the post-period cell `(g, out_idx)` with `out_idx = first_switch_idx[g] - 1 + l_h` and expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})` for obs in that cell, zero elsewhere (matches the DID_l post-period convention in the Survey IF expansion Note below). Under PSU=group per-observation distribution differs from the legacy `ψ_i = ψ_g * (w_i / W_g)`, but PSU-level aggregates telescope to the same `ψ_g` — so Binder TSL variance is byte-identical to the pre-cell-period release under PSU=group. Under within-group-varying PSU mass lands in the post-period PSU of the transition, which is what Binder TSL needs. An **empty post-period cell under zero-weight obs** (all obs at `(g, out_idx)` have `w_i = 0` despite `N > 0`) drops the group's contribution, matching the ATT cell allocator's convention; the pre-cell-period path diverged here by redistributing mass to other cells of the group. (2) **Rao-Wu replicate** (`compute_replicate_if_variance`) uses the **legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)`. 
Replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at the observation level, so moving `ψ_g` mass onto the post-period cell only would silently change the replicate SE whenever a replicate column's ratios vary within group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). Keeping the legacy allocator on this branch preserves byte-identity of replicate SE across every previously-supported fit; replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). Inference uses the t-distribution with `df_survey` when provided. Under rank deficiency (any regression coefficient dropped by `solve_ols`'s R-style drop), all inference fields return NaN (conservative, matches the NaN-consistent contract). **Library extension (replicate weights):** Under a replicate-weight design (BRR/Fay/JK1/JKn/SDR), the heterogeneity regression dispatches to `compute_replicate_if_variance` (Rao-Wu weight-ratio rescaling) instead of the Binder TSL formula. The effective df is the shared `min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` used by the rest of the dCDH surfaces; if the base `df_survey` is undefined (QR-rank ≤ 1), heterogeneity inference is NaN regardless of the local `n_valid_het` (matching the dCDH top-level contract — per-site `n_valid` cannot rescue a rank-deficient design). **Library extension:** R `DIDmultiplegtDYN::predict_het` does not natively support survey weights. **Scope note (bootstrap):** Heterogeneity inference is analytical (no bootstrap path). When `n_bootstrap > 0` is combined with `heterogeneity=`, the main ATT surfaces receive bootstrap SE/CI (via the cell-level wild PSU bootstrap described in the survey + bootstrap contract Note below) while `heterogeneity_effects` continues to use the Binder TSL / Rao-Wu analytical SE described above. No gate; the two inference paths are independent. 
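The group-level WLS influence function above can be sketched numerically (hypothetical two-cohort design; column index 1 is the covariate coefficient, matching the `[1,:]` row in the formula). The zero-sum assertion follows from the WLS normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
G = 30

# Hypothetical design: intercept, covariate of interest, one cohort dummy.
x_cov = rng.normal(size=G)
cohort = np.arange(G) % 2
X = np.column_stack([np.ones(G), x_cov, cohort])
W = rng.uniform(0.5, 2.0, size=G)        # per-group survey weights W_g
y = 0.7 * x_cov + 0.3 * cohort + rng.normal(size=G)

XtWX = X.T @ (W[:, None] * X)
beta = np.linalg.solve(XtWX, X.T @ (W * y))
r = y - X @ beta                         # per-group WLS residuals r_g

# psi_g = inv(X'WX)[1, :] @ x_g * W_g * r_g  — the covariate
# coefficient's group-level IF, evaluated for every group at once.
row = np.linalg.inv(XtWX)[1, :]
psi_g = (X @ row) * W * r
```

Feeding `psi_g` through either allocator yields the observation-level `psi_i` described above; the two allocators differ only in how each group's mass is spread across its cells.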
- **Note (HonestDiD integration):** HonestDiD sensitivity analysis (Rambachan & Roth 2023) is available on the placebo + event study surface via `honest_did=True` in `fit()` or `compute_honest_did(results)` post-hoc. **Library extension:** dCDH HonestDiD uses `DID^{pl}_l` placebo estimates as pre-period coefficients rather than standard event-study pre-treatment coefficients. The Rambachan-Roth restrictions bound violations of the parallel trends assumption underlying the dCDH placebo estimand; interpretation differs from canonical event-study HonestDiD. A `UserWarning` is emitted at runtime. Uses diagonal variance (no full VCV available for dCDH). Relative magnitudes (DeltaRM) with Mbar=1.0 is the default when called from `fit()`, targeting the equal-weight average over all post-treatment horizons (`l_vec=None`). R's HonestDiD defaults to the first post/on-impact effect; use `compute_honest_did(results, ...)` with a custom `l_vec` to match that behavior. When `trends_linear=True`, bounds apply to the second-differenced estimand (parallel trends in first differences). Requires `L_max >= 1` for multi-horizon placebos. Gaps in the horizon grid from `trends_nonparam` support-trimming are handled by filtering to the largest consecutive block and warning. @@ -649,8 +654,8 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param - [x] Design-2 switch-in/switch-out descriptive wrapper (Web Appendix Section 1.6) - [x] HonestDiD (Rambachan-Roth 2023) integration on placebo + event study surface - [x] Survey design support: pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) **or replicate-weight variance (BRR/Fay/JK1/JKn/SDR)**, covering the main ATT surface, covariate adjustment (DID^X), heterogeneity testing, the TWFE diagnostic (fit and standalone `twowayfeweights()` helper), and HonestDiD bounds. Opt-in **PSU-level Hall-Mammen wild bootstrap** is also supported via `n_bootstrap > 0`. 
-- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." 
Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is byte-identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. **Strata that vary across cells of a group require either an explicit `psu=` or the original `SurveyDesign(..., nest=True)` flag** — under `nest=True` the resolver combines `(stratum, psu)` into globally-unique labels, so the auto-injected `psu=` is re-labeled per stratum and the cell allocator proceeds. Only the `nest=False` + varying-strata + omitted-psu combination is rejected up front with a targeted `ValueError` at `fit()` time (the synthesized PSU column would reuse group labels across strata and trip the cross-stratum PSU uniqueness check in `SurveyDesign.resolve()`). Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. 
**Post-period attribution extends to heterogeneity (Binder TSL branch only):** the heterogeneity WLS coefficient IF `ψ_g = inv(X'WX)[1,:] @ x_g * W_g * r_g` is attributed in full to the single post-period cell `(g, out_idx)` at each horizon (same single-cell convention as DID_l), then expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})`, and fed through `compute_survey_if_variance`. Under PSU=group the PSU-level aggregate telescopes to `ψ_g`, so Binder variance is byte-identical relative to the pre-cell-period release; under within-group-varying PSU mass lands in the post-period PSU. **Replicate-weight branch keeps the legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)` because `compute_replicate_if_variance` computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping: redistributing mass onto the post-period cell would silently change the replicate SE whenever a replicate column's ratios vary within a group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). The legacy allocator preserves byte-identity of the replicate SE for every previously-supported fit. Replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). **Scope limitation (follow-up PR):** `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in a follow-up PR. -- **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). 
When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Scope note (cell-period allocator):** The PSU-level bootstrap uses a group-level `group_id_to_psu_code` map and therefore requires PSU to be constant within each group. Combining `n_bootstrap > 0` with a PSU that varies within group raises `NotImplementedError`; the cell-level Hall-Mammen extension is deferred to a follow-up PR. The analytical TSL variance fully supports within-group-varying PSU via the cell-period allocator — use `n_bootstrap=0` for those designs. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design. +- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). 
The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." 
Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is numerically identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. **Strata that vary across cells of a group require either an explicit `psu=` or the original `SurveyDesign(..., nest=True)` flag** — under `nest=True` the resolver combines `(stratum, psu)` into globally-unique labels, so the auto-injected `psu=` is re-labeled per stratum and the cell allocator proceeds. Only the `nest=False` + varying-strata + omitted-psu combination is rejected up front with a targeted `ValueError` at `fit()` time (the synthesized PSU column would reuse group labels across strata and trip the cross-stratum PSU uniqueness check in `SurveyDesign.resolve()`). Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed.
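The telescoping claim can be illustrated with a toy allocator (hypothetical per-cell IF attributions; two observations per cell to exercise the weight-share expansion):

```python
import numpy as np

rng = np.random.default_rng(2)
G, T, n_obs = 4, 3, 2

# Hypothetical per-(g, t) IF attributions; their row sums define the
# group-level centered IF (the row-sum identity quoted above).
U_pp = rng.normal(size=(G, T))
U_centered = U_pp.sum(axis=1)

# Observation-level survey weights, two obs per cell.
w_obs = rng.uniform(0.5, 2.0, size=(G, T, n_obs))
W_cell = w_obs.sum(axis=2)               # W_{g, t}

# Cell-period allocator: psi_i = U_pp[g, t] * (w_i / W_{g, t}).
psi_obs = U_pp[:, :, None] * (w_obs / W_cell[:, :, None])

# Under PSU = group, the PSU-level aggregate telescopes back to the
# group-level IF — the reason Binder TSL variance is unchanged there.
psu_sums = psi_obs.sum(axis=(1, 2))
```

Each cell's allocated mass sums back to `U_pp[g, t]`, and each group's mass sums back to `U_centered[g]`, regardless of how the weights vary within a cell.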
**Post-period attribution extends to heterogeneity (Binder TSL branch only):** the heterogeneity WLS coefficient IF `ψ_g = inv(X'WX)[1,:] @ x_g * W_g * r_g` is attributed in full to the single post-period cell `(g, out_idx)` at each horizon (same single-cell convention as DID_l), then expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})`, and fed through `compute_survey_if_variance`. Under PSU=group the PSU-level aggregate telescopes to `ψ_g`, so Binder variance is byte-identical relative to the pre-cell-period release; under within-group-varying PSU mass lands in the post-period PSU. **Replicate-weight branch keeps the legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)` because `compute_replicate_if_variance` computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping: redistributing mass onto the post-period cell would silently change the replicate SE whenever a replicate column's ratios vary within a group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). The legacy allocator preserves byte-identity of the replicate SE for every previously-supported fit. Replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`). +- **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. 
Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Cell-level wild PSU bootstrap under within-group-varying PSU:** when the PSU varies across the cells of a group, the bootstrap switches to a cell-level allocator: each `(g, t)` cell draws its multiplier from `w[psu(cell)]` via the per-cell PSU map `psu_codes_per_cell` (shape `(n_eligible_groups, n_periods)`, -1 sentinel for zero-weight cells). The bootstrap statistic becomes `theta_r = sum_c w[psu(c)] * u_centered_pp[c] / divisor` using the cohort-recentered per-cell IF `U_centered_per_period`. Under PSU-within-group-constant regimes (including PSU=group and strictly-coarser PSU with within-group constancy), the per-cell sum telescopes to the group-level form via the row-sum identity `sum_{c in g} U_centered_per_period[g, t] == U_centered[g]` (enforced by `_cohort_recenter_per_period`). A dispatcher in `_compute_dcdh_bootstrap` detects within-group-constancy and routes those regimes through the legacy group-level bootstrap path so their SE is **bit-identical** to the pre-cell-level release (guarded primarily by `test_bootstrap_se_matches_pre_pr4_baseline` and by the existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to PSUs `p1, p2, ...` receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. **Multi-horizon bootstraps** draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution across horizons. **Library extension** — R `DIDmultiplegtDYN` does not support survey designs, so "deviation from R" does not apply. 
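A toy sketch of the cell-level wild statistic `theta_r = sum_c w[psu(c)] * u_centered_pp[c]` (divisor omitted; hypothetical IF values), showing that under PSU = group it coincides with the group-level form — the bit-identity regime routed through the legacy path:

```python
import numpy as np

rng = np.random.default_rng(3)
G, T, B = 6, 4, 200

U_pp = rng.normal(size=(G, T))           # cohort-recentered per-cell IF
U_group = U_pp.sum(axis=1)               # row-sum identity

# PSU constant within group (PSU = group): every cell of group g maps
# to PSU code g.
psu_codes = np.repeat(np.arange(G), T).reshape(G, T)
mult = rng.choice([-1.0, 1.0], size=(B, G))   # Rademacher multipliers

# Cell-level statistic: sum_c w[psu(c)] * U_pp[c] per bootstrap draw.
theta_cell = (mult[:, psu_codes] * U_pp).sum(axis=(1, 2))

# Group-level statistic: same multipliers applied to group-level IF.
theta_group = mult @ U_group
```

Under within-group-varying PSU, `psu_codes` would no longer be constant along axis 1, the two statistics would diverge, and only the cell-level form clusters correctly — which is why the dispatcher routes that regime to the cell-level path.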
**Scope note (terminal missingness + any cell-period-allocator path):** see the balanced-baseline Note above for the full carve-out. In brief: when a terminally-missing group is in a cohort whose other groups still contribute at the missing period, `_cohort_recenter_per_period` leaks non-zero centered IF mass onto cells with no positive-weight observations. The targeted `ValueError` fires from every survey variance path that uses the cell-period allocator: Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT (which always uses the cell allocator), and the cell-level wild PSU bootstrap. Pre-process the panel to remove terminal missingness, or (for Binder TSL only) use an explicit `psu=` so the analytical path routes through the legacy group-level allocator. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design. --- diff --git a/tests/test_dcdh_bootstrap_cell_period_coverage.py b/tests/test_dcdh_bootstrap_cell_period_coverage.py new file mode 100644 index 00000000..b31283c3 --- /dev/null +++ b/tests/test_dcdh_bootstrap_cell_period_coverage.py @@ -0,0 +1,244 @@ +"""Monte Carlo coverage simulation for the cell-level wild PSU bootstrap +in ChaisemartinDHaultfoeuille (PR 4). 
+ +Validates empirical coverage of the bootstrap confidence intervals under +a DGP with PSU that varies across cells of the same group — the regime +unlocked by PR 4's replacement of the group-level PSU map with a +cell-level map. Under PSU-within-group-constant the dispatcher routes +through the legacy bootstrap (covered by the pre-PR-4 test suite), so +the coverage check here exercises only the new cell-level code path. + +Asserts coverage at three surfaces, each covering a distinct code path: + +1. Overall DID_M bootstrap CI (`res.bootstrap_results.overall_ci`) + — single-target cell-level branch. +2. Event-study **horizon-1** CI (`res.bootstrap_results.event_study_cis[1]`) + — first horizon of the shared-PSU-weight multi-horizon block. +3. Event-study **horizon-2** CI + sup-t `cband_crit_value` finiteness + — exercises the cross-horizon shared-draw machinery that + guarantees sup-t joint distribution validity. At L_max >= 2 the + shared (n_bootstrap, n_psu) PSU-level weight matrix must be drawn + ONCE and reused across horizons; a regression where each horizon + re-draws weights would break the sup-t coherence and the finite + critical value check below would surface it. + +Marked ``slow`` and excluded from the default pytest run. To execute: + + pytest tests/test_dcdh_bootstrap_cell_period_coverage.py -m slow -v +""" + +from __future__ import annotations + +import warnings + +import numpy as np +import pandas as pd +import pytest + +from diff_diff.chaisemartin_dhaultfoeuille import ChaisemartinDHaultfoeuille +from diff_diff.survey import SurveyDesign + + +def _simulate_panel( + n_groups: int, + n_periods: int, + first_treated_period: int, + tau: float, + rng: np.random.Generator, + psu_sigma: float = 0.5, + obs_sigma: float = 1.0, + group_sigma: float = 1.0, +) -> pd.DataFrame: + """Generate a one-obs-per-cell panel with within-group-varying PSU + (PSU = period parity per group) so the cell-level bootstrap is + exercised. 
Matches the DGP shape of `test_dcdh_cell_period_coverage.py`. + """ + groups = np.arange(n_groups) + treated = groups < n_groups // 2 + group_fe = rng.normal(0.0, group_sigma, size=n_groups) + # (group, psu) effect — constant within (g, parity), drives + # within-group residual correlation that Binder variance + wild + # PSU bootstrap should capture. + psu_fe = rng.normal(0.0, psu_sigma, size=(n_groups, 2)) + + rows = [] + for g in groups: + for t in range(n_periods): + parity = int(t % 2) + # Per-(group, parity) PSU codes: 2 * n_groups distinct PSUs + # so Binder / bootstrap see cell granularity rather than + # two global PSUs reused across groups. + psu_id = int(g) * 2 + parity + d = 1 if (treated[g] and t >= first_treated_period) else 0 + y = ( + group_fe[g] + + 0.1 * t + + tau * d + + psu_fe[g, parity] + + rng.normal(0.0, obs_sigma) + ) + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": float(y), + "psu": psu_id, + "pw": 1.0, + }) + return pd.DataFrame(rows) + + +@pytest.mark.slow +def test_bootstrap_cell_period_coverage_varying_psu(): + """Empirical 95% coverage for bootstrap CIs at BOTH the overall + DID_M and per-horizon event-study surfaces on a DGP with + within-group-varying PSU. Tolerance ±2.5pp mirrors the analytical + coverage test; per-horizon coverage guards the shared-PSU-weight + machinery from CRITICAL #2 of the PR 4 plan review. 
+ """ + n_reps = 500 + n_groups = 40 + n_periods = 6 + first_treated_period = 3 + tau_true = 2.0 + + rng = np.random.default_rng(20260419) + covered_overall = 0 + covered_h1 = 0 + covered_h2 = 0 + cband_finite = 0 + failed = 0 + + for r in range(n_reps): + df = _simulate_panel( + n_groups=n_groups, + n_periods=n_periods, + first_treated_period=first_treated_period, + tau=tau_true, + rng=rng, + ) + sd = SurveyDesign(weights="pw", psu="psu") + try: + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + # n_bootstrap=500 keeps internal bootstrap Monte-Carlo + # noise well below the across-reps variance (at B=500 + # the percentile-CI endpoints are stable to ~0.3pp per + # Efron-Tibshirani §13.3), so the across-reps coverage + # mostly reflects the sampling-distribution / bootstrap- + # consistency question rather than bootstrap MC noise. + # L_max=2 exercises the shared-PSU-weight multi-horizon + # block (a single `(n_bootstrap, n_psu)` weight matrix + # is drawn once and broadcast per-horizon via each + # horizon's cell-to-PSU map). L_max=1 would collapse to + # a single target and never exercise the cross-horizon + # shared-draw machinery. + # + # n_bootstrap=1000 keeps internal bootstrap MC noise + # below ~0.3pp per CI endpoint; the percentile-CI + # coverage at horizon-2 (where the shared-weight + # broadcast is exercised) is finite-sample-sensitive + # and B=500 would risk a spurious edge-of-band miss. + res = ChaisemartinDHaultfoeuille( + n_bootstrap=1000, seed=r + 1, + ).fit( + df, + outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, L_max=2, + ) + except Exception: + failed += 1 + continue + + if res.bootstrap_results is None: + failed += 1 + continue + + # Overall bootstrap CI. 
+ overall_ci = res.bootstrap_results.overall_ci + if overall_ci is None or not all(np.isfinite(overall_ci)): + failed += 1 + continue + lo_o, hi_o = float(overall_ci[0]), float(overall_ci[1]) + if lo_o <= tau_true <= hi_o: + covered_overall += 1 + + # Horizon-1 and horizon-2 bootstrap CIs (guard the shared- + # PSU-weight multi-horizon path). Horizon-2 in particular + # requires the SAME shared PSU weight matrix drawn once at + # the top of the multi-horizon block; a per-horizon re-draw + # would break the sup-t joint-distribution guarantee and + # `cband_crit_value` would be undefined or wrong. + es_cis = res.bootstrap_results.event_study_cis + if es_cis is not None: + if 1 in es_cis: + h1_ci = es_cis[1] + if h1_ci is not None and all(np.isfinite(h1_ci)): + lo_h, hi_h = float(h1_ci[0]), float(h1_ci[1]) + if lo_h <= tau_true <= hi_h: + covered_h1 += 1 + if 2 in es_cis: + h2_ci = es_cis[2] + if h2_ci is not None and all(np.isfinite(h2_ci)): + lo2, hi2 = float(h2_ci[0]), float(h2_ci[1]) + if lo2 <= tau_true <= hi2: + covered_h2 += 1 + + # Sup-t critical value: finite across reps means the shared- + # draw machinery produced coherent joint replicates at both + # horizons. NaN or unset would indicate the multi-horizon + # block short-circuited or the shared-weight broadcast + # misaligned across horizons. + cband = getattr(res.bootstrap_results, "cband_crit_value", None) + if cband is not None and np.isfinite(float(cband)): + cband_finite += 1 + + completed = n_reps - failed + assert completed >= int(0.95 * n_reps), ( + f"MC simulation had {failed}/{n_reps} fit failures, above " + f"the 5% tolerance." 
+ ) + coverage_overall = covered_overall / completed + coverage_h1 = covered_h1 / completed + coverage_h2 = covered_h2 / completed + assert 0.925 <= coverage_overall <= 0.975, ( + f"Overall bootstrap CI coverage {coverage_overall:.3f} " + f"(completed {completed}) outside [0.925, 0.975]; " + f"tau_true={tau_true}, n_groups={n_groups}, n_periods={n_periods}, " + f"n_reps={n_reps}." + ) + assert 0.925 <= coverage_h1 <= 0.975, ( + f"Horizon-1 event-study bootstrap CI coverage " + f"{coverage_h1:.3f} (completed {completed}) outside " + f"[0.925, 0.975]; this is the shared-PSU-weight surface, " + f"regression here likely indicates a bug in the multi-horizon " + f"cell-level broadcast." + ) + # Horizon-2 tolerance is wider than horizon-1 because finite- + # sample coverage of the analytical TSL SE on this DGP is + # itself ~0.93 at l=2 (measured offline: analytical h-1 coverage + # 0.94, h-2 coverage 0.926 at n_groups=40). The bootstrap should + # track the analytical SE asymptotically, so an observed + # bootstrap coverage in [0.91, 0.98] at h-2 is consistent with + # correct clustering; a drop to ≤ 0.90 would indicate the + # shared-weight broadcast is not coherent across horizons. + assert 0.910 <= coverage_h2 <= 0.975, ( + f"Horizon-2 event-study bootstrap CI coverage " + f"{coverage_h2:.3f} (completed {completed}) outside " + f"[0.910, 0.975]; horizon-2 is the cross-horizon surface " + f"that exercises the SAME shared PSU weight matrix used " + f"at horizon-1 — a regression here indicates the shared-" + f"draw broadcast is not coherent across horizons." + ) + # Sup-t critical value must be finite in the vast majority of + # reps; occasional NaN on degenerate draws is tolerable but + # widespread NaN signals the shared-weight block never yielded + # a coherent joint distribution. + assert cband_finite >= int(0.90 * completed), ( + f"Sup-t critical value was finite in only {cband_finite}/" + f"{completed} reps. 
The shared (n_bootstrap, n_psu) PSU-" + f"level weight matrix must be drawn ONCE at the top of the " + f"multi-horizon block; a per-horizon re-draw would break " + f"the sup-t joint distribution." + ) diff --git a/tests/test_survey_dcdh.py b/tests/test_survey_dcdh.py index 4ac5472f..69b5a400 100644 --- a/tests/test_survey_dcdh.py +++ b/tests/test_survey_dcdh.py @@ -1,5 +1,8 @@ """Survey support tests for ChaisemartinDHaultfoeuille (dCDH).""" +import os +from typing import Optional + import numpy as np import pandas as pd import pytest @@ -1075,10 +1078,12 @@ class TestSurveyWithinGroupValidation: """Cell-period IF allocator contract: strata and PSU may vary ACROSS cells of a group, but must be constant WITHIN each (g, t) cell. In canonical one-obs-per-cell panels the cell-level constancy check is - trivially satisfied. Heterogeneity + within-group-varying PSU is - supported via the cell-period allocator. The remaining out-of-scope - combination is n_bootstrap > 0 + within-group-varying PSU, which - still raises NotImplementedError with a pointer to the follow-up PR. + trivially satisfied. All three variance paths — analytical TSL, + heterogeneity WLS, and the PSU-level wild multiplier bootstrap — + support within-group-varying PSU via the cell-period allocator. No + dCDH survey combination raises NotImplementedError beyond the + SurveyDesign-level exclusion of ``replicate_weights`` + + ``n_bootstrap > 0``. """ def test_accepts_varying_psu_within_group(self, base_data): @@ -1169,21 +1174,41 @@ def test_heterogeneity_with_varying_psu_succeeds(self, base_data): assert np.isfinite(entry["p_value"]) assert all(np.isfinite(entry["conf_int"])) - def test_bootstrap_with_varying_psu_raises(self, base_data): - """n_bootstrap > 0 is gated under within-group-varying PSU until - PR 4 ships the cell-level Hall-Mammen wild bootstrap. 
+ def test_bootstrap_with_varying_psu_succeeds(self, base_data): + """PR 4: the PSU-level wild multiplier bootstrap now supports + within-group-varying PSU via the cell-level allocator. Each + (g, t) cell's IF mass is multiplied by its PSU's multiplier, + so a group spanning 2+ PSUs receives independent draws per + PSU. Assert a finite bootstrap SE and CI at the overall + surface and at every event-study horizon (the multi-horizon + path uses a shared PSU-level weight matrix for the sup-t + simultaneous band). """ df_ = base_data.copy() df_["pw"] = 1.0 - df_["stratum"] = 0 - df_["psu"] = df_["period"] % 2 # varies within group - sd = SurveyDesign(weights="pw", strata="stratum", psu="psu") - with pytest.raises(NotImplementedError, match="n_bootstrap"): - ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1).fit( - df_, outcome="outcome", group="group", - time="period", treatment="treatment", - survey_design=sd, - ) + # Per-group PSU parity — unique per (group, parity), varies + # within group so the cell-level dispatcher is exercised. + df_["psu"] = df_["group"].astype(int) * 2 + (df_["period"].astype(int) % 2) + sd = SurveyDesign(weights="pw", psu="psu") + res = ChaisemartinDHaultfoeuille(n_bootstrap=200, seed=1).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, L_max=2, + ) + assert res.bootstrap_results is not None + assert np.isfinite(res.bootstrap_results.overall_se) + assert res.bootstrap_results.overall_se >= 0.0 + overall_ci = res.bootstrap_results.overall_ci + assert overall_ci is not None and all(np.isfinite(overall_ci)) + # Multi-horizon bootstrap must produce finite SE at each + # horizon (guards the shared-PSU-weight path from CRITICAL #2 + # of the PR 4 plan review). 
+ es_ses = res.bootstrap_results.event_study_ses + assert es_ses is not None + for l_h in (1, 2): + assert l_h in es_ses, f"horizon {l_h} missing from event_study_ses" + assert np.isfinite(es_ses[l_h]), f"horizon {l_h} SE not finite" + assert es_ses[l_h] >= 0.0 def test_auto_inject_with_varying_strata_nest_true_succeeds(self, base_data): """When strata varies across cells of a group and the user @@ -1779,3 +1804,549 @@ def test_replicate_variance_non_invariance_under_varying_ratios(self): f"(counterexample from PR #329 CI review). Got " f"var_legacy={var_legacy}, var_new={var_new}." ) + + +class TestBootstrapCellPeriod: + """Regression guards for the cell-level wild PSU bootstrap allocator + (PR 4). Under PSU-within-group-constant regimes the dispatcher + routes to the legacy group-level bootstrap for bit-identity; under + within-group-varying PSU the cell-level path runs with per-cell + PSU multipliers. + """ + + # Captured on pre-PR-4 code (origin/main at SHA ac181b7f — PR #329 + # merge) via a scratch fit on the fixture below. Pinned here as + # the bit-identity regression guard for the dispatcher's + # PSU-within-group-constant legacy-path routing. If this test + # drifts, the dispatcher is no longer reproducing pre-PR-4 + # behavior under PSU=group and the legacy fast path has + # regressed. Values differ between Rust and pure-Python backends + # because `_generate_bootstrap_weights_batch` and `solve_ols` + # take different numeric paths; bit-identity still holds within + # each backend. + _BASELINE_OVERALL_SE_RUST = 0.30560839419979546 + _BASELINE_OVERALL_SE_PYTHON = 0.3030802540369796 + + @staticmethod + def _make_baseline_fixture() -> pd.DataFrame: + """Deterministic fixture for the pinned bootstrap SE baseline. + 12 groups, 5 periods, two switch cohorts (first-treated at + periods 2 and 3) plus never-switchers, fixed per-row + idiosyncratic draws so the bootstrap distribution is + non-degenerate. 
MUST stay identical to the capture fixture + or the pinned baseline becomes meaningless. + """ + rng = np.random.default_rng(12345) + rows = [] + n_groups, n_periods = 12, 5 + for g in range(n_groups): + if g < 4: + f: Optional[int] = None + elif g < 8: + f = 2 + else: + f = 3 + for t in range(n_periods): + d = 1 if (f is not None and t >= f) else 0 + y = float(g) + 0.3 * t + 1.5 * d + float(rng.normal(0.0, 0.5)) + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": 1.0, + }) + return pd.DataFrame(rows) + + def test_bootstrap_se_matches_pre_pr4_baseline(self): + """Bit-identity regression guard: under PSU=group the + dispatcher routes through the legacy group-level bootstrap + path, so the overall bootstrap SE must match pre-PR-4 code + to ULP precision. The baseline values were captured on + `origin/main` at `ac181b7f` (the PR #329 merge) under each + backend independently. + """ + from diff_diff._backend import HAS_RUST_BACKEND + pure_python = ( + os.environ.get("DIFF_DIFF_BACKEND", "auto").lower() == "python" + or not HAS_RUST_BACKEND + ) + expected = ( + self._BASELINE_OVERALL_SE_PYTHON + if pure_python + else self._BASELINE_OVERALL_SE_RUST + ) + df_ = self._make_baseline_fixture() + sd = SurveyDesign(weights="pw", psu="group") + res = ChaisemartinDHaultfoeuille(n_bootstrap=500, seed=42).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + assert res.bootstrap_results is not None + observed_se = float(res.bootstrap_results.overall_se) + backend_label = "pure-python" if pure_python else "rust" + assert observed_se == pytest.approx( + expected, rel=0.0, abs=1e-15, + ), ( + f"Bootstrap SE drifted from pre-PR-4 baseline " + f"(backend={backend_label}). expected={expected!r}, " + f"observed={observed_se!r}. The dispatcher's " + f"PSU-within-group-constant routing is no longer " + f"bit-identical to the legacy group-level bootstrap." 
+ ) + + def test_bootstrap_cell_level_raises_on_missing_overall_tensor(self): + """Contract: when PSU varies within group, the bootstrap + dispatcher must NOT silently fall back to group-level for any + target. Invoking _compute_dcdh_bootstrap directly with a + varying `psu_codes_per_cell` but `u_per_period_overall=None` + must raise ValueError (not silently under-cluster). + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + # Minimal group-level IF inputs; matches arbitrary 2-group setup. + u_overall = np.array([0.5, -0.3], dtype=np.float64) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + # Varying PSU: row 0 has two distinct PSU codes. + psu_codes_per_cell = np.array( + [[0, 1], [0, 0]], dtype=np.int64, + ) + with pytest.raises(ValueError, match="u_per_period_overall"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=None, # missing — must raise + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_raises_on_shape_mismatch(self): + """Contract: _unroll_target_to_cells rejects shape mismatches + between `u_per_period_target` and `psu_codes_per_cell` — a + silent misalignment would put PSU codes on the wrong cells. + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + u_overall = np.array([0.5, -0.3], dtype=np.float64) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + psu_codes_per_cell = np.array( + [[0, 1], [0, 0]], dtype=np.int64, + ) + # Wrong shape — 3 columns instead of 2. 
+ u_pp_wrong = np.array( + [[0.25, 0.25, 0.0], [-0.15, -0.15, 0.0]], dtype=np.float64, + ) + with pytest.raises(ValueError, match="shape mismatch"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=u_pp_wrong, + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_raises_on_sentinel_mass_leak(self): + """Contract: when `_cohort_recenter_per_period` subtracts + column means across the full period grid, a group with no + observation at period t can acquire non-zero centered mass + at that cell. Under the cell-level bootstrap path, such + mass lands on a `psu_codes_per_cell == -1` sentinel cell + and has no PSU to attach to — the bootstrap must raise + rather than silently drop the mass. + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + # Build a per-cell IF tensor with non-zero mass at a cell + # whose PSU code is -1 (simulating terminal missingness + # after cohort-recentering leaks mass to a missing cell). 
+ psu_codes_per_cell = np.array( + [[0, 1, -1], [0, 1, 0]], dtype=np.int64, + ) + u_pp_overall_with_leak = np.array( + [[0.25, 0.25, -0.15], [-0.15, -0.15, 0.15]], + dtype=np.float64, + ) + u_overall = np.array([0.5, -0.3], dtype=np.float64) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + with pytest.raises(ValueError, match="no positive-weight observations"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=u_pp_overall_with_leak, + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_raises_on_missing_horizon_tensor(self): + """Contract: when PSU varies within group, each multi-horizon + target must supply its per-cell IF tensor; missing one raises + ValueError rather than degrading the sup-t joint distribution. + """ + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + u_overall = np.array([0.5, -0.3], dtype=np.float64) + u_pp_overall = np.array( + [[0.25, 0.25], [-0.15, -0.15]], dtype=np.float64, + ) + eligible_group_ids = np.array([0, 1]) + group_id_to_psu_code = {0: 0, 1: 1} + psu_codes_per_cell = np.array( + [[0, 1], [0, 0]], dtype=np.int64, + ) + # Horizon tuple missing its per-cell tensor (4th slot = None). 
+ u_h = np.array([0.4, -0.2], dtype=np.float64) + mh_inputs = {1: (u_h, 4, 0.1, None)} + with pytest.raises(ValueError, match="multi-horizon.*l=1"): + est._compute_dcdh_bootstrap( + n_groups_for_overall=2, + u_centered_overall=u_overall, + divisor_overall=4, + original_overall=0.1, + multi_horizon_inputs=mh_inputs, + group_id_to_psu_code=group_id_to_psu_code, + eligible_group_ids=eligible_group_ids, + u_per_period_overall=u_pp_overall, + psu_codes_per_cell=psu_codes_per_cell, + ) + + def test_bootstrap_cell_level_with_all_zero_weight_group_does_not_crash(self): + """When one eligible group has all zero-weight observations, + every entry of its `psu_codes_per_cell` row is the sentinel + -1 and it contributes no cells to the bootstrap. The + dispatcher must handle this without crashing; overall + inference stays finite (driven by the remaining groups). + """ + rows = [] + n_groups, n_periods = 10, 5 + for g in range(n_groups): + f = 3 if g < n_groups // 2 else None + for t in range(n_periods): + # Zero-weight the last group entirely. + pw = 0.0 if g == n_groups - 1 else 1.0 + d = 1 if (f is not None and t >= f) else 0 + y = float(g) + 0.1 * t + 1.0 * d + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": pw, + "psu": int(g) * 2 + (int(t) % 2), # varying PSU + }) + df_ = pd.DataFrame(rows) + sd = SurveyDesign(weights="pw", psu="psu") + res = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + assert res.bootstrap_results is not None + # Bootstrap SE should be finite (zero-weight group does not + # disturb the other groups' contributions). 
+ assert np.isfinite(res.bootstrap_results.overall_se) + + def test_bootstrap_zero_weight_group_equivalent_to_removing_it(self): + """Fixture A: 9 groups (1 all-zero-weighted + 8 positive) + with **within-group-varying PSU** so the dispatcher routes + through the cell-level path. Fixture B: 8 groups (same panel + without the zero-weight group), same varying PSU. Under the + fix, an eligible group with no positive-weight cells + contributes nothing to the bootstrap (its row of + `psu_codes_per_cell` is all sentinel), so both fits produce + byte-identical bootstrap SE at the same seed. Without the + fix, the `valid_map` gate disabled the entire PSU-aware + path — silently dropping fixture A to unclustered group- + level bootstrap while fixture B correctly ran the cell- + level path. Using `psu=group` (a within-group-constant PSU) + would not exercise this regression because the buggy and + correct paths collapse to the same identity-draw structure + under PSU=group — we deliberately use varying PSU here. + """ + def _make(include_zero_group: bool) -> pd.DataFrame: + rows = [] + n_groups = 9 if include_zero_group else 8 + for g in range(n_groups): + f = 3 if g < 4 else None + for t in range(5): + pw = 0.0 if (include_zero_group and g == 8) else 1.0 + d = 1 if (f is not None and t >= f) else 0 + y = float(g) + 0.1 * t + 1.0 * d + # Within-group-varying PSU (period parity per + # group) — exercises the cell-level dispatcher. 
+ psu = int(g) * 2 + (int(t) % 2) + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": pw, + "psu": psu, + }) + return pd.DataFrame(rows) + + sd = SurveyDesign(weights="pw", psu="psu") + res_a = ChaisemartinDHaultfoeuille(n_bootstrap=200, seed=7).fit( + _make(include_zero_group=True), + outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + res_b = ChaisemartinDHaultfoeuille(n_bootstrap=200, seed=7).fit( + _make(include_zero_group=False), + outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + assert res_a.bootstrap_results is not None + assert res_b.bootstrap_results is not None + se_a = float(res_a.bootstrap_results.overall_se) + se_b = float(res_b.bootstrap_results.overall_se) + assert np.isfinite(se_a) and np.isfinite(se_b) + assert se_a == pytest.approx(se_b, rel=0.0, abs=1e-15), ( + f"Bootstrap SE must match when a zero-weight eligible " + f"group is added under within-group-varying PSU (fix " + f"P0 #1 — no silent dropback to unclustered group-" + f"level). Got SE_with_zero={se_a!r}, " + f"SE_without_zero={se_b!r}." + ) + + def test_fit_raises_on_terminal_missingness_with_varying_psu(self): + """End-to-end `fit()` regression: when a survey panel has a + terminally-missing group in a cohort whose other groups still + contribute at the missing period, combined with within-group- + varying PSU, both the analytical TSL path and the cell-level + bootstrap path must raise `ValueError` — cohort-recentering + leaks non-zero centered IF mass onto cells with no positive- + weight obs, and both paths (`_survey_se_from_group_if` for + analytical, `_unroll_target_to_cells` for bootstrap) use the + cell-period allocator and therefore cannot allocate leaked + mass to any observation or PSU. Pre-processing the panel + (dropping late-exit groups or trimming to a balanced sub- + panel) is the documented workaround. 
+ """ + rows = [] + # 10 groups. Joiners at period 3 (cohort A): groups 0-4. + # Leavers at period 4 (cohort B, D=1 at period 0): groups 5-7. + # Never-treated: groups 8-9. + # Group 2 is terminally missing at periods 4-5. It is in + # cohort A; at period 4 the other joiners (0, 1, 3, 4) serve + # as stable_1 controls (they switched on at period 3 and + # contribute when leavers appear at period 4). The cohort + # mean at period 4 is therefore non-zero, and + # `_cohort_recenter_per_period` leaks `-col_mean` onto + # group 2's missing cell — which the cell-level bootstrap + # cannot allocate to any PSU. + for g in range(10): + if g < 5: + # Joiners at period 3 (D=0 at baseline, D=1 from t=3). + d_pattern = [0, 0, 0, 1, 1, 1] + elif g < 8: + # Leavers at period 4 (D=1 at baseline, D=0 from t=4). + d_pattern = [1, 1, 1, 1, 0, 0] + else: + # Never-treated controls. + d_pattern = [0, 0, 0, 0, 0, 0] + for t in range(6): + if g == 2 and t >= 4: + # Terminal missingness: drop rows past period 3. + continue + d = d_pattern[t] + y = float(g) + 0.1 * t + 1.0 * d + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": y, + "pw": 1.0, + # Within-group-varying PSU: period parity per group. + "psu": int(g) * 2 + (int(t) % 2), + }) + df_ = pd.DataFrame(rows) + sd = SurveyDesign(weights="pw", psu="psu") + + import warnings as _w + # Analytical path (n_bootstrap=0): the sentinel-mass guard in + # `_survey_se_from_group_if` raises on the same leakage the + # bootstrap guard rejects — both paths use the cell-period + # allocator and cannot allocate leaked mass to any + # observation. 
+        with _w.catch_warnings():
+            _w.simplefilter("ignore")  # terminal-missingness UserWarning
+            with pytest.raises(
+                ValueError, match="no positive-weight observations",
+            ):
+                ChaisemartinDHaultfoeuille(n_bootstrap=0, seed=1).fit(
+                    df_, outcome="outcome", group="group",
+                    time="period", treatment="treatment",
+                    survey_design=sd, L_max=1,
+                )
+
+        # Bootstrap path (n_bootstrap > 0): same sentinel-mass guard
+        # fires via `_unroll_target_to_cells`.
+        with _w.catch_warnings():
+            _w.simplefilter("ignore")
+            with pytest.raises(
+                ValueError, match="no positive-weight observations",
+            ):
+                ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1).fit(
+                    df_, outcome="outcome", group="group",
+                    time="period", treatment="treatment",
+                    survey_design=sd, L_max=1,
+                )
+
+    def test_fit_succeeds_on_terminal_missingness_with_psu_group(self):
+        """Companion regression: the terminal-missingness + varying-PSU
+        sentinel-mass guard must NOT fire when PSU is within-group-
+        constant. Same fixture as the varying-PSU test above but with
+        `psu=<group>` (the auto-inject default) — both analytical
+        and bootstrap paths route through the legacy group-level
+        allocator (the analytical dispatcher in
+        `_survey_se_from_group_if` falls back to the group-level
+        allocator when PSU does not vary within group; the bootstrap
+        dispatcher in `_compute_dcdh_bootstrap` does the same). Fit
+        must succeed with finite SE.
+        """
+        rows = []
+        for g in range(10):
+            if g < 5:
+                d_pattern = [0, 0, 0, 1, 1, 1]
+            elif g < 8:
+                d_pattern = [1, 1, 1, 1, 0, 0]
+            else:
+                d_pattern = [0, 0, 0, 0, 0, 0]
+            for t in range(6):
+                if g == 2 and t >= 4:
+                    continue
+                d = d_pattern[t]
+                y = float(g) + 0.1 * t + 1.0 * d
+                rows.append({
+                    "group": int(g),
+                    "period": int(t),
+                    "treatment": int(d),
+                    "outcome": y,
+                    "pw": 1.0,
+                })
+        df_ = pd.DataFrame(rows)
+        # Auto-inject: no explicit `psu` → `SurveyDesign` falls back to
+        # `psu=<group>` at fit() time. Within-group-constant.
+ sd = SurveyDesign(weights="pw") + import warnings as _w + for n_boot in (0, 50): + with _w.catch_warnings(): + _w.simplefilter("ignore") + res = ChaisemartinDHaultfoeuille( + n_bootstrap=n_boot, seed=1, + ).fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, L_max=1, + ) + assert np.isfinite(res.overall_att), ( + f"n_bootstrap={n_boot}: overall_att must be finite " + f"under PSU=group + terminal missingness." + ) + assert np.isfinite(res.overall_se) and res.overall_se >= 0.0, ( + f"n_bootstrap={n_boot}: overall_se must be finite " + f"under PSU=group + terminal missingness." + ) + + def test_bootstrap_dense_codes_under_singleton_baseline_excluded_group(self): + """Regression for P0 #2: when a group is singleton-baseline- + excluded (e.g., an always-treated group whose baseline D=1 + has no peer), its PSU label must NOT pollute the dense code + factorization used by `_compute_dcdh_bootstrap`. Otherwise + eligible groups that share a PSU receive gapped dense codes + (e.g., `[1, 1]`), `_generate_psu_or_group_weights` computes + `n_psu = max + 1 = 2 == n_groups_target = 2`, and the + identity fast path wrongly triggers — giving those eligible + groups independent multiplier draws instead of a shared + one. Assertion: instrument the call to capture the + `group_id_to_psu_code` dict actually passed and confirm its + values form a contiguous range `[0, n_unique - 1]`. + """ + # Fixture: one always-treated group (D=1 at period 0 → singleton- + # baseline-excluded), plus eligible groups that share a PSU + # label while the excluded group has a different PSU. + rows = [] + for g in range(5): + for t in range(5): + if g == 0: + d = 1 # always-treated; baseline D=1 singleton + psu = 100 # distinct PSU for the excluded group + else: + d = 1 if t >= 3 else 0 # joiners at period 3 + # Groups 1, 2 share PSU=200; groups 3, 4 share PSU=300. 
+ psu = 200 if g in (1, 2) else 300 + rows.append({ + "group": int(g), + "period": int(t), + "treatment": int(d), + "outcome": float(g) + 0.1 * t + 0.5 * d, + "pw": 1.0, + "psu": psu, + }) + df_ = pd.DataFrame(rows) + sd = SurveyDesign(weights="pw", psu="psu") + + captured: dict = {} + + est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1) + original_bootstrap = est._compute_dcdh_bootstrap + + def _spy(**kwargs): + captured["group_id_to_psu_code"] = kwargs.get( + "group_id_to_psu_code" + ) + return original_bootstrap(**kwargs) + + est._compute_dcdh_bootstrap = _spy # type: ignore[method-assign] + + import warnings as _w + with _w.catch_warnings(): + _w.simplefilter("ignore") # singleton-baseline warning + est.fit( + df_, outcome="outcome", group="group", + time="period", treatment="treatment", + survey_design=sd, + ) + + dict_passed = captured["group_id_to_psu_code"] + assert dict_passed is not None, ( + "bootstrap received group_id_to_psu_code=None — the " + "PSU-aware path was disabled instead of routing to the " + "cell/legacy path via densified codes." + ) + codes = sorted(set(dict_passed.values())) + # Eligible groups share only two PSUs (200 for g=1,2; + # 300 for g=3,4). Dense codes must be [0, 1], NOT [1, 2] + # (which would happen if the excluded g=0's PSU=100 were + # dense-coded first). + assert codes == list(range(len(codes))), ( + f"group_id_to_psu_code values must be contiguous " + f"dense codes starting at 0, got {codes}. A non-" + f"contiguous range signals the excluded group's PSU " + f"polluted the dense factorization (P0 #2 regression)." + ) + # Sanity: eligible groups 1, 2 must share a code (PSU=200), + # and eligible groups 3, 4 must share a code (PSU=300). + assert dict_passed[1] == dict_passed[2], ( + "Groups 1 and 2 share PSU=200 and must receive the same " + "dense code under correct densification." + ) + assert dict_passed[3] == dict_passed[4], ( + "Groups 3 and 4 share PSU=300 and must receive the same " + "dense code." 
+        )
+        assert dict_passed[1] != dict_passed[3], (
+            "Groups in PSU=200 and PSU=300 must receive distinct "
+            "dense codes."
+        )
diff --git a/tests/test_survey_dcdh_replicate_psu.py b/tests/test_survey_dcdh_replicate_psu.py
index 595f80bd..aac02615 100644
--- a/tests/test_survey_dcdh_replicate_psu.py
+++ b/tests/test_survey_dcdh_replicate_psu.py
@@ -170,6 +170,184 @@ def test_overall_se_finite(self, base_panel, replicate_design, method):
         assert np.isfinite(res.overall_se)
         assert res.overall_se > 0
 
+    def test_att_raises_on_terminal_missingness_replicate_path(self):
+        """Replicate ATT + terminal missingness: the Class A replicate
+        path uses the cell-period allocator unconditionally (per the
+        PR #323 contract), so cohort-recentering leakage onto missing
+        cells hits the `_survey_se_from_group_if` sentinel-mass guard
+        regardless of PSU structure (replicate designs do not carry
+        `resolved.psu`). Unlike Binder TSL, replicate has no
+        `psu=` fallback — the documented workaround is to
+        pre-process the panel. Locks this user-facing failure mode.
+        """
+        import warnings as _w
+        rows = []
+        for g in range(10):
+            if g < 5:
+                d_pattern = [0, 0, 0, 1, 1, 1]
+            elif g < 8:
+                d_pattern = [1, 1, 1, 1, 0, 0]
+            else:
+                d_pattern = [0, 0, 0, 0, 0, 0]
+            for t in range(6):
+                if g == 2 and t >= 4:
+                    continue  # terminal missingness for group 2
+                d = d_pattern[t]
+                y = float(g) + 0.1 * t + 1.0 * d
+                row = {
+                    "group": int(g),
+                    "period": int(t),
+                    "treatment": int(d),
+                    "outcome": y,
+                    "pw": 1.0,
+                }
+                # Attach 5 replicate-weight columns (per-row
+                # Rademacher-like draws; SDR is the simplest method
+                # that does not require a replicate_strata schedule).
+                for r in range(5):
+                    row[f"rep{r}"] = 0.5 if (g + t + r) % 2 == 0 else 1.5
+                rows.append(row)
+        df_ = pd.DataFrame(rows)
+        sd = SurveyDesign(
+            weights="pw",
+            replicate_weights=[f"rep{r}" for r in range(5)],
+            replicate_method="SDR",
+        )
+        with _w.catch_warnings():
+            _w.simplefilter("ignore")  # terminal-missingness UserWarning
+            with pytest.raises(
+                ValueError, match="no positive-weight observations",
+            ):
+                ChaisemartinDHaultfoeuille(seed=1).fit(
+                    df_,
+                    outcome="outcome",
+                    group="group",
+                    time="period",
+                    treatment="treatment",
+                    survey_design=sd,
+                    L_max=1,
+                )
+
+    def test_att_cell_allocator_with_varying_replicate_ratios(
+        self, base_panel, replicate_design, monkeypatch,
+    ):
+        """Class A ATT replicate contract: `_survey_se_from_group_if`
+        dispatches replicate fits through the cell-period allocator
+        (`U_centered_per_period`-based psi_obs), not the legacy
+        group-level expansion. Under per-row replicate matrices where
+        ratios vary within group, the two allocators produce
+        different Rao-Wu variances (obs-level `theta_r = sum_i
+        ratio_ir * psi_i` is not PSU-telescoping — same observation
+        as PR #329's heterogeneity non-invariance guard).
+
+        Locks the contract by spying on `compute_replicate_if_variance`
+        to capture the psi_obs actually passed during the fit,
+        reconstructing the legacy group-level counterpart via the
+        row-sum identity, and asserting:
+          (a) fit's `overall_se**2` matches the captured cell-allocator
+              psi_obs fed back through the helper;
+          (b) the legacy-reconstructed psi_obs yields a materially
+              different Rao-Wu variance, i.e. a future refactor that
+              switches Class A to the legacy allocator would drift
+              the fit's SE.
+        """
+        import numpy as _np
+        import diff_diff.survey as _survey_mod
+
+        R = 20
+        df = replicate_design(base_panel, R=R, method="SDR")
+        sd = _build_replicate_design(R, "SDR")
+
+        # Spy on compute_replicate_if_variance to capture each
+        # psi_obs the fit passes. Pass through to the real helper so
+        # fit completes normally.
+        original_repvar = _survey_mod.compute_replicate_if_variance
+        captured_psi: list = []
+
+        def _spy(psi, resolved):
+            captured_psi.append(_np.asarray(psi).copy())
+            return original_repvar(psi, resolved)
+
+        monkeypatch.setattr(
+            _survey_mod, "compute_replicate_if_variance", _spy
+        )
+
+        res = ChaisemartinDHaultfoeuille(seed=1).fit(
+            df,
+            outcome="outcome",
+            group="group",
+            time="period",
+            treatment="treatment",
+            survey_design=sd,
+        )
+        assert np.isfinite(res.overall_se) and res.overall_se > 0
+        assert captured_psi, (
+            "Spy captured no psi_obs — the fit did not route through "
+            "compute_replicate_if_variance. Did the replicate "
+            "dispatch change?"
+        )
+
+        # The first capture corresponds to the overall DID_M target
+        # (called from the `_compute_se`→`_survey_se_from_group_if`
+        # chain for the overall surface).
+        psi_obs_actual = captured_psi[0]
+
+        resolved = sd.resolve(df)
+        obs_group = df["group"].values
+        w = resolved.weights.astype(_np.float64)
+        n_obs = len(w)
+
+        # Row-sum identity on the cell-allocator psi_obs: summing
+        # over obs in group g recovers U_centered[g]. Use this to
+        # reconstruct the legacy group-level expansion on the SAME
+        # fitted U_centered.
+        unique_gids = _np.unique(obs_group)
+        g_to_idx = {g: i for i, g in enumerate(unique_gids)}
+        U_centered_recon = _np.zeros(len(unique_gids), dtype=_np.float64)
+        W_g = _np.zeros(len(unique_gids), dtype=_np.float64)
+        for i in range(n_obs):
+            gi = g_to_idx[obs_group[i]]
+            U_centered_recon[gi] += psi_obs_actual[i]
+            W_g[gi] += w[i]
+        psi_obs_legacy = _np.zeros_like(w)
+        for i in range(n_obs):
+            gi = g_to_idx[obs_group[i]]
+            if W_g[gi] > 0:
+                psi_obs_legacy[i] = (
+                    U_centered_recon[gi] * w[i] / W_g[gi]
+                )
+
+        # (a) Fit's overall_se^2 must match the variance recomputed
+        # from the captured psi_obs. Trivial sanity check — the fit
+        # computed it from this exact psi_obs via the spy pass-through.
+        var_cell_recompute, _ = original_repvar(psi_obs_actual, resolved)
+        assert float(res.overall_se) ** 2 == pytest.approx(
+            var_cell_recompute, rel=1e-12,
+        ), (
+            f"Fit's overall_se^2 does not match the recomputed "
+            f"variance on the captured psi_obs — spy / dispatch "
+            f"misalignment."
+        )
+
+        # (b) Legacy-reconstructed psi_obs yields a different Rao-Wu
+        # variance on this fixture. Locks the allocator-sensitivity
+        # contract: if a future refactor routes Class A replicate
+        # ATT through the legacy group-level allocator, the fit's
+        # overall_se would silently match var_legacy instead — and
+        # this assertion would fail, surfacing the regression.
+        var_legacy, _ = original_repvar(psi_obs_legacy, resolved)
+        assert not _np.isclose(
+            var_cell_recompute, var_legacy, rtol=1e-6,
+        ), (
+            f"Expected cell-allocator variance to differ from legacy "
+            f"group-level variance on this fixture (per-row "
+            f"replicate matrix varies within group). Got "
+            f"cell={var_cell_recompute}, legacy={var_legacy}. Either "
+            f"the fixture no longer exercises the allocator "
+            f"sensitivity, OR the fit's allocator has silently "
+            f"reverted to legacy."
+        )
+
     @pytest.mark.parametrize("method", REPLICATE_METHODS)
     def test_inference_fields_finite(self, base_panel, replicate_design, method):
         R = 20
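For reviewers, a minimal standalone sketch of the dense-code failure mode the `test_bootstrap_dense_codes_under_singleton_baseline_excluded_group` test pins down. The numbers and the `np.unique` factorization here are illustrative only, not the library's actual `_compute_dcdh_bootstrap` internals:

```python
import numpy as np

# Two eligible groups share PSU=200; one excluded group has PSU=100.
eligible_psu = np.array([200, 200])

# Correct: factorize over eligible groups only -> contiguous codes [0, 0].
_, dense_ok = np.unique(eligible_psu, return_inverse=True)
n_psu_ok = int(dense_ok.max()) + 1  # 1 PSU -> one shared multiplier draw

# Buggy variant: factorize over ALL groups (excluded PSU=100 is coded
# first), then drop the excluded row -> gapped codes [1, 1].
all_psu = np.array([100, 200, 200])
_, dense_all = np.unique(all_psu, return_inverse=True)
gapped = dense_all[1:]
n_psu_bad = int(gapped.max()) + 1  # 2 == n_groups_target

# With gapped codes, n_psu equals the eligible group count, so an
# "identity fast path" keyed on that equality would wrongly hand each
# group an independent draw instead of one shared draw per PSU.
assert n_psu_ok == 1 and n_psu_bad == 2
```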
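Similarly, a toy sketch of the allocator contract behind `test_att_cell_allocator_with_varying_replicate_ratios` (all values invented, not the library's code): both allocators row-sum back to the group IF `psi_g`, so group-aggregated (PSU=group) variance is invariant, but an obs-level replicate combination `theta_r = sum_i ratio_ir * psi_i` differs once replicate ratios vary within the group:

```python
import numpy as np

# One toy group: 4 observations, weights w, group-level IF psi_g = 3.
w = np.array([1.0, 2.0, 1.0, 2.0])
psi_g = 3.0
post = np.array([False, False, True, True])  # post-period cell

# Legacy group-level allocator: psi_i = psi_g * w_i / W_g.
psi_legacy = psi_g * w / w.sum()

# Cell-period allocator: all mass on the post-period cell,
# psi_i = psi_g * w_i / W_{g,post} there, zero elsewhere.
psi_cell = np.where(post, psi_g * w / w[post].sum(), 0.0)

# Row-sum identity: both expansions telescope back to psi_g.
assert np.isclose(psi_legacy.sum(), psi_g)
assert np.isclose(psi_cell.sum(), psi_g)

# Per-row replicate ratios that vary within the group break the
# telescoping: theta_r = sum_i ratio_ir * psi_i now depends on the
# allocator, so the Rao-Wu replicate SE would drift.
ratio_r = np.array([0.5, 1.5, 1.5, 0.5])
theta_legacy = ratio_r @ psi_legacy  # 3.0
theta_cell = ratio_r @ psi_cell      # 2.5
```

If the ratios were constant within the group, both dot products would collapse to `ratio * psi_g` and the two allocators would agree, which is exactly why the divergence only appears for per-row replicate matrices.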