igerber · igerber · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
     1. Removed the `if self._precomputed is not None:` branch that silently substituted `self._precomputed["Y"]` / `["D"]` / `["time_dist_matrix"]` (original-panel cache populated during main fit) for the function-argument `Y, D`. Under bootstrap, `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled `boot_data` and passes them in; the helper was discarding those and recomputing unit distances from the original panel, so Python's local bootstrap resampled units but reused stale unit-distance weights. Rust's bootstrap was already correct (always consumed `y_boot, d_boot`).
     2. Removed the `valid_control_at_t = D[t, :] == 0` target-period donor gate that zeroed `ω_j` for any unit `j` treated at the target period (other than the target unit itself). Per REGISTRY Eq. 2/3 and Rust's `compute_weight_matrix`, `ω_j = exp(-λ_unit × dist(j, i))` for all `j ≠ i`; treated-cell exclusion happens via the `(1 − W_{js})` factor applied inside `_estimate_model`. Same-cohort donors now contribute via their pre-treatment rows. Empirically the main-fit ATT is unchanged on tested fixtures because same-cohort pre-treatment observations are exactly absorbed by their own unit fixed effect `alpha_j` without propagating into `mu`, `beta`, or other units' parameters — so this change is structural alignment rather than a numerical shift in output. Users on same-cohort panels with very few controls may still see tiny differences in edge cases; the new `test_local_method_same_cohort_donor_parity` regression guards the aligned behavior.
   Together with the normalization fix above, TROP local-method backend parity on the main-fit ATT is regime-dependent: `atol=rtol=1e-14` for `lambda_nn=inf` (no nuclear-norm regularization, uniform weight scaling leaves the WLS argmin invariant) and `atol=1e-10` for finite `lambda_nn` (FISTA inner loop + BLAS reduction ordering introduce sub-1e-10 roundoff across Rust `faer` vs numpy paths). Bootstrap SE parity is asserted at `atol=1e-5` to accommodate ~1e-7 roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq` that accumulates across per-replicate fits; sub-1e-14 bootstrap parity is tracked as a follow-up in `TODO.md` under "unify Rust local-method solver path". Closes silent-failures audit finding #23 (local-method half; the RNG half closed in PR #354 and the grid-search half in PR #348).
+- **Multiplier-bootstrap weight generation (`generate_bootstrap_weights_batch`) no longer diverges across backends under a fixed `seed`.** Previously the Rust helper in `rust/src/bootstrap.rs` seeded `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row while the Python fallback consumed `numpy.random.default_rng` (PCG64), so the same `seed` produced different Rademacher/Mammen/Webb multiplier weights depending on whether the Rust backend was compiled in — and therefore different bootstrap SE / CI / p-values in the six downstream estimators routed through `diff_diff.bootstrap_utils.generate_bootstrap_weights_batch`: `CallawaySantAnna` (non-survey multiplier), `ContinuousDiD`, `ImputationDiD`, `TwoStageDiD`, `ChaisemartinDHaultfoeuille` (multi-horizon PSU + group-level + PSU-broadcast multiplier bootstrap), and `EfficientDiD` (cluster + unit multiplier bootstrap). Fixed by making the Python numpy path canonical and removing the Rust batch-weight helper entirely (including the `rand` and `rand_xoshiro` Cargo dependencies and the `rust/src/bootstrap.rs` module) — the Rust function only generated weights, so routing weights through Python-side `default_rng` left nothing for Rust to do. Sampling distributions (Rademacher, Mammen, Webb) and numerical identities are unchanged. **User-visible effect**: users who were previously running with the Rust backend compiled in will see different-but-equally-valid multiplier-bootstrap SE / CI / p-values under the same `seed` after this change (PCG64 replaces Xoshiro); pure-Python-backend users are unaffected. Bootstrap runtime may regress single-digit percent on very wide bootstraps (`n_units > 10000`) where Rust's rayon parallelism was previously providing a modest weight-generation speedup; sub-second at typical sizes (`n_bootstrap=999, n_units ≤ 1000`). Regression guard `tests/test_rust_backend.py::TestRustBackend::test_rust_bootstrap_weights_batch_is_removed` asserts the binding is not reintroduced; distributional coverage moves to `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` (11 tests including seed-pinned byte-level baselines for each of the three distributions). Closes the silent-failures audit TODO row for the Rust multiplier-weight RNG.
 
 ### Changed
 - **`did_had_pretest_workflow(aggregate="event_study")` verdict no longer emits the "paper step 2 deferred to Phase 3 follow-up" caveat** — the joint pre-trends Stute test closes that gap. The two-period `aggregate="overall"` path retains the existing caveat since the joint variant does not apply to single-pre-period panels. Downstream code that greps verdict strings for the Phase 3 caveat will see it suppressed on the event-study path.

diff --git a/TODO.md b/TODO.md
@@ -84,7 +84,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
 | `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. | `honest_did.py` | #334 | Low |
 | Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low |
-| Rust multiplier-bootstrap weight RNG (`generate_bootstrap_weights_batch` in `rust/src/bootstrap.rs:9-10, 57-75`) uses `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row for Rademacher/Mammen/Webb generation. If any Python caller (SDID / efficient-DiD multiplier bootstrap) has a numpy-canonical equivalent, the two backends likely diverge under the same seed. Audit Python callers (`diff_diff/sdid.py`, `diff_diff/efficient_did_bootstrap.py`, `diff_diff/bootstrap_utils.py::generate_bootstrap_weights_batch_numpy`) for parity-test gaps. Same fix shape as TROP RNG parity (PR #354): pre-generate weights in Python via numpy and pass them to Rust through PyO3. | `rust/src/bootstrap.rs`, `diff_diff/bootstrap_utils.py` | follow-up | Medium |
 | `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
 | `bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). | `diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Medium |
 | `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium |

diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
@@ -17,7 +17,6 @@
 # Import backend detection from dedicated module (avoids circular imports)
 from diff_diff._backend import (
     HAS_RUST_BACKEND,
-    _rust_bootstrap_weights,
     _rust_compute_robust_vcov,
     _rust_project_simplex,
     _rust_solve_ols,

diff --git a/diff_diff/_backend.py b/diff_diff/_backend.py
@@ -18,7 +18,6 @@
 # Try to import Rust backend for accelerated operations
 try:
     from diff_diff._rust_backend import (
-        generate_bootstrap_weights_batch as _rust_bootstrap_weights,
         project_simplex as _rust_project_simplex,
         solve_ols as _rust_solve_ols,
         compute_robust_vcov as _rust_compute_robust_vcov,
@@ -44,7 +43,6 @@
     _rust_available = True
 except ImportError:
     _rust_available = False
-    _rust_bootstrap_weights = None
     _rust_project_simplex = None
     _rust_solve_ols = None
     _rust_compute_robust_vcov = None
@@ -69,7 +67,6 @@
 if _backend_env == "python":
     # Force pure Python mode - disable Rust even if available
     HAS_RUST_BACKEND = False
-    _rust_bootstrap_weights = None
     _rust_project_simplex = None
     _rust_solve_ols = None
     _rust_compute_robust_vcov = None
@@ -120,7 +117,6 @@ def rust_backend_info():
 __all__ = [
     "HAS_RUST_BACKEND",
     "rust_backend_info",
-    "_rust_bootstrap_weights",
     "_rust_project_simplex",
     "_rust_solve_ols",
     "_rust_compute_robust_vcov",

diff --git a/diff_diff/bootstrap_utils.py b/diff_diff/bootstrap_utils.py
@@ -10,8 +10,6 @@
 
 import numpy as np
 
-from diff_diff._backend import HAS_RUST_BACKEND, _rust_bootstrap_weights
-
 __all__ = [
     "generate_bootstrap_weights",
     "generate_bootstrap_weights_batch",
@@ -168,38 +166,11 @@ def generate_bootstrap_weights_batch(
     """
     Generate all bootstrap weights at once (vectorized).
 
-    Uses Rust backend if available for parallel generation.
-
-    Parameters
-    ----------
-    n_bootstrap : int
-        Number of bootstrap iterations.
-    n_units : int
-        Number of units (clusters) to generate weights for.
-    weight_type : str
-        Type of weights: "rademacher", "mammen", or "webb".
-    rng : np.random.Generator
-        Random number generator.
-
-    Returns
-    -------
-    np.ndarray
-        Array of bootstrap weights with shape (n_bootstrap, n_units).
-    """
-    if HAS_RUST_BACKEND and _rust_bootstrap_weights is not None:
-        seed = rng.integers(0, 2**63 - 1)
-        return _rust_bootstrap_weights(n_bootstrap, n_units, weight_type, seed)
-    return generate_bootstrap_weights_batch_numpy(n_bootstrap, n_units, weight_type, rng)
-
-
-def generate_bootstrap_weights_batch_numpy(
-    n_bootstrap: int,
-    n_units: int,
-    weight_type: str,
-    rng: np.random.Generator,
-) -> np.ndarray:
-    """
-    NumPy fallback implementation of :func:`generate_bootstrap_weights_batch`.
+    Output is a deterministic function of the supplied ``rng`` state,
+    so a single ``seed`` produces identical multiplier-bootstrap weights
+    regardless of whether the Rust backend is compiled in. (Library
+    call sites seed via ``np.random.default_rng`` — PCG64 — but any
+    ``np.random.Generator`` is accepted.)
 
     Parameters
     ----------
@@ -243,6 +214,11 @@ def generate_bootstrap_weights_batch_numpy(
         )
 
 
+# Alias kept for backward compatibility with callers that imported the
+# explicit numpy variant; both names resolve to the same implementation.
+generate_bootstrap_weights_batch_numpy = generate_bootstrap_weights_batch
+
+
 def compute_percentile_ci(
     boot_dist: np.ndarray,
     alpha: float,

diff --git a/rust/Cargo.toml b/rust/Cargo.toml
@@ -25,8 +25,6 @@ openblas = ["ndarray/blas"]
 pyo3 = "0.28"
 numpy = "0.28"
 ndarray = { version = "0.17", features = ["rayon"] }
-rand = "0.8"
-rand_xoshiro = "0.6"
 rayon = "1.8"
 
 # Pure Rust linear algebra for SVD/matrix inversion (no external deps).