From 4f3b163d373ab6ebcb4e58ad8840a2f058c87c0d Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 24 Apr 2026 16:19:42 -0400 Subject: [PATCH 1/3] Drop Rust multiplier-bootstrap weight helper; numpy path is canonical MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the silent-failures audit follow-up for the Rust multiplier-weight RNG. `rust/src/bootstrap.rs::generate_bootstrap_weights_batch` seeded `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row while `diff_diff.bootstrap_utils.generate_bootstrap_weights_batch_numpy` ran under `numpy.random.default_rng` (PCG64), so the same `seed` produced different Rademacher/Mammen/Webb weights depending on whether the Rust backend was compiled in — and therefore different multiplier-bootstrap SE / CI / p-values in CallawaySantAnna, ContinuousDiD, ImputationDiD, TwoStageDiD, ChaisemartinDHaultfoeuille, and EfficientDiD. Unlike the TROP bootstrap fix (PR #354), this Rust helper did nothing after generating weights, so threading pre-generated weights through PyO3 would have left the function as a pointless pass-through. Deleted the `rust/src/bootstrap.rs` module entirely together with the orphaned `rand` + `rand_xoshiro` Cargo deps; the Python shim now calls the numpy implementation directly (with `_numpy` kept as an alias for backward compatibility). The three distributional property tests move to `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` including byte-level seed-pinned baselines for each distribution. `test_rust_bootstrap_weights_batch_is_removed` mirrors the existing `test_compute_synthetic_weights_is_removed` pattern as a regression guard against accidental re-export. Users running with the Rust backend compiled in will see different-but-equally-valid bootstrap SE / CI / p-values under the same `seed` (PCG64 replaces Xoshiro); pure-Python users are unaffected. 
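The divergence described above — the same integer seed fed to two different generator algorithms yields unrelated streams — can be illustrated with a minimal numpy sketch. Since numpy ships no Xoshiro bit generator, Philox stands in for the removed Rust RNG here, and the Rademacher draw below is an illustrative construction, not the library's actual sampling code:

```python
import numpy as np

seed = 42
n_bootstrap, n_units = 4, 6

# Canonical path: PCG64, as supplied by numpy.random.default_rng.
rng_pcg = np.random.default_rng(seed)
w_pcg = rng_pcg.integers(0, 2, size=(n_bootstrap, n_units)) * 2.0 - 1.0  # Rademacher ±1

# Stand-in for "different algorithm, same seed" (Philox instead of Xoshiro).
rng_other = np.random.Generator(np.random.Philox(seed))
w_other = rng_other.integers(0, 2, size=(n_bootstrap, n_units)) * 2.0 - 1.0

# Same seed, different bit generators: the weight matrices disagree,
# which is the cross-backend divergence this patch removes.
print(np.array_equal(w_pcg, w_other))
```

With only one code path left, the weight matrix is a deterministic function of the seeded `Generator` alone.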
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + TODO.md | 1 - diff_diff/__init__.py | 1 - diff_diff/_backend.py | 4 - diff_diff/bootstrap_utils.py | 43 ++---- rust/Cargo.toml | 2 - rust/src/bootstrap.rs | 282 ---------------------------------- rust/src/lib.rs | 7 - tests/test_bootstrap_utils.py | 123 +++++++++++++++ tests/test_rust_backend.py | 134 ++++------------ 10 files changed, 164 insertions(+), 434 deletions(-) delete mode 100644 rust/src/bootstrap.rs diff --git a/CHANGELOG.md b/CHANGELOG.md index 0f2357b8..3e6ec889 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 1. Removed the `if self._precomputed is not None:` branch that silently substituted `self._precomputed["Y"]` / `["D"]` / `["time_dist_matrix"]` (original-panel cache populated during main fit) for the function-argument `Y, D`. Under bootstrap, `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled `boot_data` and passes them in; the helper was discarding those and recomputing unit distances from the original panel, so Python's local bootstrap resampled units but reused stale unit-distance weights. Rust's bootstrap was already correct (always consumed `y_boot, d_boot`). 2. Removed the `valid_control_at_t = D[t, :] == 0` target-period donor gate that zeroed `ω_j` for any unit `j` treated at the target period (other than the target unit itself). Per REGISTRY Eq. 2/3 and Rust's `compute_weight_matrix`, `ω_j = exp(-λ_unit × dist(j, i))` for all `j ≠ i`; treated-cell exclusion happens via the `(1 − W_{js})` factor applied inside `_estimate_model`. Same-cohort donors now contribute via their pre-treatment rows. 
Empirically the main-fit ATT is unchanged on tested fixtures because same-cohort pre-treatment observations are exactly absorbed by their own unit fixed effect `alpha_j` without propagating into `mu`, `beta`, or other units' parameters — so this change is structural alignment rather than a numerical shift in output. Users on same-cohort panels with very few controls may still see tiny differences in edge cases; the new `test_local_method_same_cohort_donor_parity` regression guards the aligned behavior. Together with the normalization fix above, TROP local-method backend parity on the main-fit ATT is regime-dependent: `atol=rtol=1e-14` for `lambda_nn=inf` (no nuclear-norm regularization, uniform weight scaling leaves the WLS argmin invariant) and `atol=1e-10` for finite `lambda_nn` (FISTA inner loop + BLAS reduction ordering introduce sub-1e-10 roundoff across Rust `faer` vs numpy paths). Bootstrap SE parity is asserted at `atol=1e-5` to accommodate ~1e-7 roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq` that accumulates across per-replicate fits; sub-1e-14 bootstrap parity is tracked as a follow-up in `TODO.md` under "unify Rust local-method solver path". Closes silent-failures audit finding #23 (local-method half; the RNG half closed in PR #354 and the grid-search half in PR #348). 
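The `lambda_nn=inf` tolerance claim above rests on a standard fact: the WLS argmin is invariant to multiplying every weight by the same positive constant, so uniform weight rescaling cannot move the estimate beyond machine roundoff. A minimal numpy sketch with synthetic data (not the library's solver) checks this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)

def wls(X, y, w):
    # Weighted least squares via lstsq on sqrt-weighted data.
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

b1 = wls(X, y, w)
b2 = wls(X, y, 7.3 * w)  # uniform rescaling of all weights

# Argmin is unchanged: scaling the objective by a constant moves no minimizer.
print(np.allclose(b1, b2, atol=1e-10))  # True
```

Once a finite nuclear-norm penalty enters the objective, the regularizer does not scale with the weights, so this invariance no longer holds exactly — consistent with the looser `atol=1e-10` regime above.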
+- **Multiplier-bootstrap weight generation (`generate_bootstrap_weights_batch`) no longer diverges across backends under a fixed `seed`.** Previously the Rust helper in `rust/src/bootstrap.rs` seeded `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row while the Python fallback consumed `numpy.random.default_rng` (PCG64), so the same `seed` produced different Rademacher/Mammen/Webb multiplier weights depending on whether the Rust backend was compiled in — and therefore different bootstrap SE / CI / p-values in the six downstream estimators routed through `diff_diff.bootstrap_utils.generate_bootstrap_weights_batch`: `CallawaySantAnna` (non-survey multiplier), `ContinuousDiD`, `ImputationDiD`, `TwoStageDiD`, `ChaisemartinDHaultfoeuille` (multi-horizon PSU + group-level + PSU-broadcast multiplier bootstrap), and `EfficientDiD` (cluster + unit multiplier bootstrap). Fixed by making the Python numpy path canonical and removing the Rust batch-weight helper entirely (including the `rand` and `rand_xoshiro` Cargo dependencies and the `rust/src/bootstrap.rs` module) — the Rust function only generated weights, so routing weights through Python-side `default_rng` left nothing for Rust to do. Sampling distributions (Rademacher, Mammen, Webb) and numerical identities are unchanged. **User-visible effect**: users who were previously running with the Rust backend compiled in will see different-but-equally-valid multiplier-bootstrap SE / CI / p-values under the same `seed` after this change (PCG64 replaces Xoshiro); pure-Python-backend users are unaffected. Bootstrap runtime may regress single-digit percent on very wide bootstraps (`n_units > 10000`) where Rust's rayon parallelism was previously providing a modest weight-generation speedup; sub-second at typical sizes (`n_bootstrap=999, n_units ≤ 1000`). 
Regression guard `tests/test_rust_backend.py::TestRustBackend::test_rust_bootstrap_weights_batch_is_removed` asserts the binding is not reintroduced; distributional coverage moves to `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` (11 tests including seed-pinned byte-level baselines for each of the three distributions). Closes the silent-failures audit TODO row for the Rust multiplier-weight RNG. ### Changed - **`did_had_pretest_workflow(aggregate="event_study")` verdict no longer emits the "paper step 2 deferred to Phase 3 follow-up" caveat** — the joint pre-trends Stute test closes that gap. The two-period `aggregate="overall"` path retains the existing caveat since the joint variant does not apply to single-pre-period panels. Downstream code that greps verdict strings for the Phase 3 caveat will see it suppressed on the event-study path. diff --git a/TODO.md b/TODO.md index 5c34735b..5534f87b 100644 --- a/TODO.md +++ b/TODO.md @@ -84,7 +84,6 @@ Deferred items from PR reviews that were not addressed before merge. | Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium | | `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. 
| `honest_did.py` | #334 | Low | | Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low | -| Rust multiplier-bootstrap weight RNG (`generate_bootstrap_weights_batch` in `rust/src/bootstrap.rs:9-10, 57-75`) uses `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row for Rademacher/Mammen/Webb generation. If any Python caller (SDID / efficient-DiD multiplier bootstrap) has a numpy-canonical equivalent, the two backends likely diverge under the same seed. Audit Python callers (`diff_diff/sdid.py`, `diff_diff/efficient_did_bootstrap.py`, `diff_diff/bootstrap_utils.py::generate_bootstrap_weights_batch_numpy`) for parity-test gaps. Same fix shape as TROP RNG parity (PR #354): pre-generate weights in Python via numpy and pass them to Rust through PyO3. | `rust/src/bootstrap.rs`, `diff_diff/bootstrap_utils.py` | follow-up | Medium | | `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). 
| `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low | | `bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). | `diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Medium | | `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium | diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index 4a9b93c4..4f770273 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -17,7 +17,6 @@ # Import backend detection from dedicated module (avoids circular imports) from diff_diff._backend import ( HAS_RUST_BACKEND, - _rust_bootstrap_weights, _rust_compute_robust_vcov, _rust_project_simplex, _rust_solve_ols, diff --git a/diff_diff/_backend.py b/diff_diff/_backend.py index f763f87c..2c9b6b04 100644 --- a/diff_diff/_backend.py +++ b/diff_diff/_backend.py @@ -18,7 +18,6 @@ # Try to import Rust backend for accelerated operations try: from diff_diff._rust_backend import ( - generate_bootstrap_weights_batch as _rust_bootstrap_weights, project_simplex as _rust_project_simplex, solve_ols as _rust_solve_ols, compute_robust_vcov as _rust_compute_robust_vcov, @@ -44,7 +43,6 @@ _rust_available = True except ImportError: _rust_available = False - 
_rust_bootstrap_weights = None _rust_project_simplex = None _rust_solve_ols = None _rust_compute_robust_vcov = None @@ -69,7 +67,6 @@ if _backend_env == "python": # Force pure Python mode - disable Rust even if available HAS_RUST_BACKEND = False - _rust_bootstrap_weights = None _rust_project_simplex = None _rust_solve_ols = None _rust_compute_robust_vcov = None @@ -120,7 +117,6 @@ def rust_backend_info(): __all__ = [ "HAS_RUST_BACKEND", "rust_backend_info", - "_rust_bootstrap_weights", "_rust_project_simplex", "_rust_solve_ols", "_rust_compute_robust_vcov", diff --git a/diff_diff/bootstrap_utils.py b/diff_diff/bootstrap_utils.py index 44d94546..8023e35b 100644 --- a/diff_diff/bootstrap_utils.py +++ b/diff_diff/bootstrap_utils.py @@ -10,8 +10,6 @@ import numpy as np -from diff_diff._backend import HAS_RUST_BACKEND, _rust_bootstrap_weights - __all__ = [ "generate_bootstrap_weights", "generate_bootstrap_weights_batch", @@ -168,38 +166,10 @@ def generate_bootstrap_weights_batch( """ Generate all bootstrap weights at once (vectorized). - Uses Rust backend if available for parallel generation. - - Parameters - ---------- - n_bootstrap : int - Number of bootstrap iterations. - n_units : int - Number of units (clusters) to generate weights for. - weight_type : str - Type of weights: "rademacher", "mammen", or "webb". - rng : np.random.Generator - Random number generator. - - Returns - ------- - np.ndarray - Array of bootstrap weights with shape (n_bootstrap, n_units). - """ - if HAS_RUST_BACKEND and _rust_bootstrap_weights is not None: - seed = rng.integers(0, 2**63 - 1) - return _rust_bootstrap_weights(n_bootstrap, n_units, weight_type, seed) - return generate_bootstrap_weights_batch_numpy(n_bootstrap, n_units, weight_type, rng) - - -def generate_bootstrap_weights_batch_numpy( - n_bootstrap: int, - n_units: int, - weight_type: str, - rng: np.random.Generator, -) -> np.ndarray: - """ - NumPy fallback implementation of :func:`generate_bootstrap_weights_batch`. 
+ Runs under the numpy PCG64 RNG exclusively, so the output is a + deterministic function of ``rng`` state regardless of whether the + Rust backend is available. A single ``seed`` therefore produces + identical multiplier-bootstrap weights in both installs. Parameters ---------- @@ -243,6 +213,11 @@ def generate_bootstrap_weights_batch_numpy( ) +# Alias kept for backward compatibility with callers that imported the +# explicit numpy variant; both names resolve to the same implementation. +generate_bootstrap_weights_batch_numpy = generate_bootstrap_weights_batch + + def compute_percentile_ci( boot_dist: np.ndarray, alpha: float, diff --git a/rust/Cargo.toml b/rust/Cargo.toml index 1aad72a2..aa102fa6 100644 --- a/rust/Cargo.toml +++ b/rust/Cargo.toml @@ -25,8 +25,6 @@ openblas = ["ndarray/blas"] pyo3 = "0.28" numpy = "0.28" ndarray = { version = "0.17", features = ["rayon"] } -rand = "0.8" -rand_xoshiro = "0.6" rayon = "1.8" # Pure Rust linear algebra for SVD/matrix inversion (no external deps). diff --git a/rust/src/bootstrap.rs b/rust/src/bootstrap.rs deleted file mode 100644 index 71d5093f..00000000 --- a/rust/src/bootstrap.rs +++ /dev/null @@ -1,282 +0,0 @@ -//! Bootstrap weight generation for multiplier bootstrap inference. -//! -//! This module provides efficient generation of bootstrap weights -//! using various distributions (Rademacher, Mammen, Webb). - -use ndarray::{Array2, Axis}; -use numpy::{PyArray2, ToPyArray}; -use pyo3::prelude::*; -use rand::prelude::*; -use rand_xoshiro::Xoshiro256PlusPlus; -use rayon::prelude::*; - -/// Minimum number of bootstrap iterations per parallel task. -/// This reduces scheduling overhead for large n_bootstrap values. -const MIN_CHUNK_SIZE: usize = 64; - -/// Generate a batch of bootstrap weights. -/// -/// Generates (n_bootstrap, n_units) matrix of bootstrap weights -/// for multiplier bootstrap inference. 
-/// -/// # Arguments -/// * `n_bootstrap` - Number of bootstrap iterations -/// * `n_units` - Number of units (clusters) -/// * `weight_type` - Type of weights: "rademacher", "mammen", or "webb" -/// * `seed` - Random seed for reproducibility -/// -/// # Returns -/// (n_bootstrap, n_units) array of bootstrap weights -#[pyfunction] -#[pyo3(signature = (n_bootstrap, n_units, weight_type, seed))] -pub fn generate_bootstrap_weights_batch<'py>( - py: Python<'py>, - n_bootstrap: usize, - n_units: usize, - weight_type: &str, - seed: u64, -) -> PyResult<Bound<'py, PyArray2<f64>>> { - let weights = match weight_type.to_lowercase().as_str() { - "rademacher" => generate_rademacher_batch(n_bootstrap, n_units, seed), - "mammen" => generate_mammen_batch(n_bootstrap, n_units, seed), - "webb" => generate_webb_batch(n_bootstrap, n_units, seed), - _ => { - return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(format!( - "Unknown weight type: {}. Expected 'rademacher', 'mammen', or 'webb'", - weight_type - ))) - } - }; - - Ok(weights.to_pyarray(py)) -} - -/// Generate Rademacher weights: ±1 with equal probability. -/// -/// E[w] = 0, Var[w] = 1 -fn generate_rademacher_batch(n_bootstrap: usize, n_units: usize, seed: u64) -> Array2<f64> { - // Pre-allocate output array - eliminates double allocation from Vec<Vec<f64>> - let mut weights = Array2::<f64>::zeros((n_bootstrap, n_units)); - - // Fill rows in parallel using rayon with chunk size tuning - weights - .axis_iter_mut(Axis(0)) - .into_par_iter() - .with_min_len(MIN_CHUNK_SIZE) - .enumerate() - .for_each(|(i, mut row)| { - let mut rng = Xoshiro256PlusPlus::seed_from_u64(seed.wrapping_add(i as u64)); - for elem in row.iter_mut() { - *elem = if rng.gen::<bool>() { 1.0 } else { -1.0 }; - } - }); - - weights -} - -/// Generate Mammen weights with two-point distribution.
-/// -/// w = -(√5 - 1)/2 with probability (√5 + 1)/(2√5) -/// w = (√5 + 1)/2 with probability (√5 - 1)/(2√5) -/// -/// E[w] = 0, E[w²] = 1, E[w³] = 1 -fn generate_mammen_batch(n_bootstrap: usize, n_units: usize, seed: u64) -> Array2<f64> { - let sqrt5 = 5.0_f64.sqrt(); - - // Two-point distribution values - let val_neg = -(sqrt5 - 1.0) / 2.0; // ≈ -0.618 - let val_pos = (sqrt5 + 1.0) / 2.0; // ≈ 1.618 - - // Probability of negative value - let prob_neg = (sqrt5 + 1.0) / (2.0 * sqrt5); // ≈ 0.724 - - // Pre-allocate output array - eliminates double allocation - let mut weights = Array2::<f64>::zeros((n_bootstrap, n_units)); - - // Fill rows in parallel with chunk size tuning - weights - .axis_iter_mut(Axis(0)) - .into_par_iter() - .with_min_len(MIN_CHUNK_SIZE) - .enumerate() - .for_each(|(i, mut row)| { - let mut rng = Xoshiro256PlusPlus::seed_from_u64(seed.wrapping_add(i as u64)); - for elem in row.iter_mut() { - *elem = if rng.gen::<f64>() < prob_neg { - val_neg - } else { - val_pos - }; - } - }); - - weights -} - -/// Generate Webb 6-point distribution weights.
-/// -/// Six-point distribution with equal probabilities (1/6 each) matching R's `did` package: -/// E[w] = 0, Var[w] = 1 -/// -/// Values: ±√(3/2), ±√(2/2)=±1, ±√(1/2) -fn generate_webb_batch(n_bootstrap: usize, n_units: usize, seed: u64) -> Array2<f64> { - // Webb 6-point values - let val1 = (3.0_f64 / 2.0).sqrt(); // √(3/2) ≈ 1.2247 - let val2 = 1.0_f64; // √(2/2) = 1.0 - let val3 = (1.0_f64 / 2.0).sqrt(); // √(1/2) ≈ 0.7071 - - // Values in order: -val1, -val2, -val3, val3, val2, val1 - let weights_table = [-val1, -val2, -val3, val3, val2, val1]; - - // Pre-allocate output array - eliminates double allocation - let mut weights = Array2::<f64>::zeros((n_bootstrap, n_units)); - - // Fill rows in parallel with chunk size tuning - // Use uniform selection (1/6 probability each) matching R's did package - weights - .axis_iter_mut(Axis(0)) - .into_par_iter() - .with_min_len(MIN_CHUNK_SIZE) - .enumerate() - .for_each(|(i, mut row)| { - let mut rng = Xoshiro256PlusPlus::seed_from_u64(seed.wrapping_add(i as u64)); - for elem in row.iter_mut() { - // Uniform selection: generate integer 0-5, index into weights_table - let bucket = rng.gen_range(0..6); - *elem = weights_table[bucket]; - } - }); - - weights -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_rademacher_shape() { - let weights = generate_rademacher_batch(100, 50, 42); - assert_eq!(weights.shape(), &[100, 50]); - } - - #[test] - fn test_rademacher_values() { - let weights = generate_rademacher_batch(10, 100, 42); - - for w in weights.iter() { - assert!(*w == 1.0 || *w == -1.0, "Rademacher weight should be ±1"); - } - } - - #[test] - fn test_rademacher_mean_approx_zero() { - let weights = generate_rademacher_batch(1000, 1, 42); - let mean: f64 = weights.iter().sum::<f64>() / weights.len() as f64; - - // With 1000 samples, mean should be close to 0 - assert!( - mean.abs() < 0.1, - "Rademacher mean should be close to 0, got {}", - mean - ); - } - - #[test] - fn test_mammen_shape() { - let weights =
generate_mammen_batch(100, 50, 42); - assert_eq!(weights.shape(), &[100, 50]); - } - - #[test] - fn test_mammen_mean_approx_zero() { - let weights = generate_mammen_batch(1000, 1, 42); - let mean: f64 = weights.iter().sum::<f64>() / weights.len() as f64; - - assert!( - mean.abs() < 0.1, - "Mammen mean should be close to 0, got {}", - mean - ); - } - - #[test] - fn test_webb_shape() { - let weights = generate_webb_batch(100, 50, 42); - assert_eq!(weights.shape(), &[100, 50]); - } - - #[test] - fn test_reproducibility() { - let weights1 = generate_rademacher_batch(100, 50, 42); - let weights2 = generate_rademacher_batch(100, 50, 42); - - // Same seed should produce same results - assert_eq!(weights1, weights2); - } - - #[test] - fn test_different_seeds() { - let weights1 = generate_rademacher_batch(100, 50, 42); - let weights2 = generate_rademacher_batch(100, 50, 43); - - // Different seeds should produce different results - assert_ne!(weights1, weights2); - } - - #[test] - fn test_webb_mean_approx_zero() { - let weights = generate_webb_batch(10000, 1, 42); - let mean: f64 = weights.iter().sum::<f64>() / weights.len() as f64; - - // With 10000 samples, mean should be close to 0 - assert!( - mean.abs() < 0.1, - "Webb mean should be close to 0, got {}", - mean - ); - } - - #[test] - fn test_webb_variance_approx_correct() { - // Webb's 6-point distribution with values ±√(3/2), ±1, ±√(1/2) - // and equal probabilities (1/6 each) should have variance = 1.0 - // This matches R's did package behavior.
- // Theoretical: Var = (1/6) * (3/2 + 1 + 1/2 + 1/2 + 1 + 3/2) = (1/6) * 6 = 1.0 - let weights = generate_webb_batch(10000, 100, 42); - let n = weights.len() as f64; - let mean: f64 = weights.iter().sum::<f64>() / n; - let variance: f64 = weights.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n; - - // Theoretical variance = 1.0 with equal probabilities - // Allow some statistical variance in the estimate - assert!( - (variance - 1.0).abs() < 0.05, - "Webb variance should be ~1.0 (matching R's did package), got {}", - variance - ); - } - - #[test] - fn test_webb_values_correct() { - // Verify that Webb weights only take the expected 6 values - let weights = generate_webb_batch(100, 1000, 42); - - let val1 = (3.0_f64 / 2.0).sqrt(); // ≈ 1.2247 - let val2 = 1.0_f64; - let val3 = (1.0_f64 / 2.0).sqrt(); // ≈ 0.7071 - - let expected_values = [-val1, -val2, -val3, val3, val2, val1]; - - for w in weights.iter() { - let matches_expected = expected_values - .iter() - .any(|&expected| (*w - expected).abs() < 1e-10); - assert!( - matches_expected, - "Webb weight {} is not one of the expected values", - w - ); - } - } -} diff --git a/rust/src/lib.rs b/rust/src/lib.rs index cc8fd13a..c745548a 100644 --- a/rust/src/lib.rs +++ b/rust/src/lib.rs @@ -14,7 +14,6 @@ extern crate blas_src; use pyo3::prelude::*; use std::collections::HashMap; -mod bootstrap; mod linalg; mod trop; mod weights; @@ -22,12 +21,6 @@ mod weights; /// A Python module implemented in Rust for diff-diff acceleration.
#[pymodule] fn _rust_backend(m: &Bound<'_, PyModule>) -> PyResult<()> { - // Bootstrap weight generation - m.add_function(wrap_pyfunction!( - bootstrap::generate_bootstrap_weights_batch, - m - )?)?; - // Simplex projection m.add_function(wrap_pyfunction!(weights::project_simplex, m)?)?; diff --git a/tests/test_bootstrap_utils.py b/tests/test_bootstrap_utils.py index d4cb7ae1..a4f7e98c 100644 --- a/tests/test_bootstrap_utils.py +++ b/tests/test_bootstrap_utils.py @@ -8,6 +8,8 @@ from diff_diff.bootstrap_utils import ( compute_effect_bootstrap_stats, compute_effect_bootstrap_stats_batch, + generate_bootstrap_weights_batch, + generate_bootstrap_weights_batch_numpy, stratified_bootstrap_indices, warn_bootstrap_failure_rate, ) @@ -267,3 +269,124 @@ def test_empty_treated_pool(self): assert ctrl.shape == (4, 3) assert trt.shape == (4, 0) assert ctrl.min() >= 0 and ctrl.max() < 3 + + +class TestBootstrapWeightsBatchDistributions: + """Distributional coverage for the numpy multiplier-weight helper. + + Ported from the deleted Rust-backend tests when the Rust + ``generate_bootstrap_weights_batch`` binding was removed (silent-failures + audit follow-up). The numpy helper is now the sole code path, so these + tests cover shape, support, and statistical moments for each of the + Rademacher, Mammen, and Webb multiplier distributions. 
+ """ + + def test_alias_is_same_function(self): + """``generate_bootstrap_weights_batch_numpy`` is a back-compat alias.""" + assert generate_bootstrap_weights_batch_numpy is generate_bootstrap_weights_batch + + def test_invalid_weight_type_raises(self): + rng = np.random.default_rng(0) + with pytest.raises(ValueError, match="weight_type must be"): + generate_bootstrap_weights_batch(5, 3, "unknown", rng) + + def test_rademacher_shape_and_support(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(100, 50, "rademacher", rng) + assert weights.shape == (100, 50) + assert set(np.unique(weights)) == {-1.0, 1.0} + + def test_rademacher_mean_and_variance(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(10000, 100, "rademacher", rng) + assert abs(weights.mean()) < 0.02, f"mean={weights.mean()}" + assert abs(weights.var() - 1.0) < 0.02, f"var={weights.var()}" + + def test_mammen_moments(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(10000, 100, "mammen", rng) + # Mammen: E[w]=0, E[w^2]=1, E[w^3]=1 + assert abs(weights.mean()) < 0.02, f"E[w]={weights.mean()}" + assert abs((weights**2).mean() - 1.0) < 0.02, f"E[w^2]={(weights ** 2).mean()}" + assert abs((weights**3).mean() - 1.0) < 0.1, f"E[w^3]={(weights ** 3).mean()}" + # Support is exactly two values + unique = np.unique(weights) + assert len(unique) == 2, f"Mammen has two-point support, got {len(unique)}" + + def test_webb_support_and_moments(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(10000, 100, "webb", rng) + # Webb: six-point support, E[w]=0, Var[w]=1 + unique = np.unique(weights) + assert len(unique) == 6, f"Webb has six-point support, got {len(unique)}" + assert abs(weights.mean()) < 0.05, f"E[w]={weights.mean()}" + assert abs(weights.var() - 1.0) < 0.05, f"Var[w]={weights.var()}" + + def test_reproducibility_same_seed(self): + rng1 = np.random.default_rng(42) + rng2 
= np.random.default_rng(42) + w1 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng1) + w2 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng2) + np.testing.assert_array_equal(w1, w2) + + def test_different_seeds_differ(self): + rng1 = np.random.default_rng(42) + rng2 = np.random.default_rng(43) + w1 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng1) + w2 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng2) + assert not np.array_equal(w1, w2) + + def test_value_pin_default_rng_42_rademacher(self): + """Byte-level pin of ``default_rng(42)`` Rademacher output. + + Catches silent numpy-RNG stream drift across numpy versions. Mirror + of the ``TestStratifiedBootstrapIndices::test_value_pin_default_rng_42`` + pattern. + """ + rng = np.random.default_rng(42) + w = generate_bootstrap_weights_batch(5, 3, "rademacher", rng) + expected = np.array( + [ + [-1.0, 1.0, 1.0], + [-1.0, -1.0, 1.0], + [-1.0, 1.0, -1.0], + [-1.0, 1.0, 1.0], + [1.0, 1.0, 1.0], + ] + ) + np.testing.assert_array_equal(w, expected) + + def test_value_pin_default_rng_42_mammen(self): + """Byte-level pin of ``default_rng(42)`` Mammen output.""" + rng = np.random.default_rng(42) + w = generate_bootstrap_weights_batch(5, 3, "mammen", rng) + sqrt5 = np.sqrt(5.0) + neg = -(sqrt5 - 1.0) / 2.0 + pos = (sqrt5 + 1.0) / 2.0 + expected = np.array( + [ + [pos, neg, pos], + [neg, neg, pos], + [pos, pos, neg], + [neg, neg, pos], + [neg, pos, neg], + ] + ) + np.testing.assert_allclose(w, expected, atol=1e-14, rtol=1e-14) + + def test_value_pin_default_rng_42_webb(self): + """Byte-level pin of ``default_rng(42)`` Webb output.""" + rng = np.random.default_rng(42) + w = generate_bootstrap_weights_batch(5, 3, "webb", rng) + v3 = np.sqrt(3.0 / 2.0) + v1 = np.sqrt(1.0 / 2.0) + expected = np.array( + [ + [-v3, 1.0, v1], + [-v1, -v1, v3], + [-v3, 1.0, -1.0], + [-v3, v1, v3], + [1.0, 1.0, 1.0], + ] + ) + np.testing.assert_allclose(w, expected, atol=1e-14, rtol=1e-14) diff --git 
a/tests/test_rust_backend.py b/tests/test_rust_backend.py index 48d1f0e5..75271de7 100644 --- a/tests/test_rust_backend.py +++ b/tests/test_rust_backend.py @@ -43,62 +43,35 @@ def test_rust_backend_info(self): # Bootstrap Weight Tests # ========================================================================= - def test_bootstrap_weights_shape(self): - """Test bootstrap weights have correct shape.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - n_bootstrap, n_units = 100, 50 - weights = generate_bootstrap_weights_batch(n_bootstrap, n_units, "rademacher", 42) - assert weights.shape == (n_bootstrap, n_units) - - def test_rademacher_weights_values(self): - """Test Rademacher weights are +-1.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - unique_vals = np.unique(weights) - assert len(unique_vals) == 2 - assert set(unique_vals) == {-1.0, 1.0} - - def test_rademacher_weights_mean_zero(self): - """Test Rademacher weights have approximately zero mean.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(10000, 1, "rademacher", 42) - mean = weights.mean() - assert abs(mean) < 0.05, f"Rademacher mean should be ~0, got {mean}" - - def test_mammen_weights_mean_zero(self): - """Test Mammen weights have approximately zero mean.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(10000, 1, "mammen", 42) - mean = weights.mean() - assert abs(mean) < 0.05, f"Mammen mean should be ~0, got {mean}" - - def test_webb_weights_mean_zero(self): - """Test Webb weights have approximately zero mean.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(10000, 1, "webb", 42) - mean = weights.mean() - assert abs(mean) < 0.1, f"Webb mean should be ~0, got {mean}" - 
- def test_bootstrap_reproducibility(self): - """Test bootstrap weights are reproducible with same seed.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights1 = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - weights2 = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - np.testing.assert_array_equal(weights1, weights2) + # The direct-Rust multiplier-bootstrap weight tests were removed when the + # Rust `generate_bootstrap_weights_batch` PyO3 binding was deleted as part + # of the silent-failures audit follow-up. Python's numpy implementation is + # now the canonical (and only) path, so `seed` is backend-invariant across + # all six downstream multiplier-bootstrap estimators. Distributional + # coverage lives in + # `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions`. + + def test_rust_bootstrap_weights_batch_is_removed(self): + """Regression guard against accidental re-export of the deleted + `generate_bootstrap_weights_batch` PyO3 binding. The Rust RNG path + diverged from Python's numpy PCG64 under fixed seeds, producing + different multiplier-bootstrap SE / CI / p-values depending on + whether the Rust backend was compiled in. Removing the binding and + routing everything through the numpy helper closes that silent + failure. 
If this test fails, someone reintroduced the binding — + audit the reason before adding it back.""" + import diff_diff._rust_backend as rb - def test_bootstrap_different_seeds(self): - """Test different seeds produce different weights.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch + with pytest.raises(ImportError): + from diff_diff._rust_backend import ( # noqa: F401 + generate_bootstrap_weights_batch, + ) - weights1 = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - weights2 = generate_bootstrap_weights_batch(100, 50, "rademacher", 43) - assert not np.array_equal(weights1, weights2) + assert not hasattr(rb, "generate_bootstrap_weights_batch"), ( + "generate_bootstrap_weights_batch was removed from the Rust " + "backend to restore cross-backend RNG parity for multiplier " + "bootstrap; its presence here indicates accidental re-export." + ) # ========================================================================= # Synthetic Weight Tests @@ -664,55 +637,10 @@ def test_robust_vcov_clustered_match(self): # Bootstrap Weights Equivalence (Statistical Properties) # ========================================================================= - def test_bootstrap_weights_rademacher_properties(self): - """Test Rust Rademacher weights have correct statistical properties.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch as rust_fn - - # Generate large sample for statistical tests - n_bootstrap, n_units = 10000, 100 - weights = rust_fn(n_bootstrap, n_units, "rademacher", 42) - - # Rademacher: values are +-1, mean ~0, variance ~1 - unique_vals = np.unique(weights) - assert set(unique_vals) == {-1.0, 1.0}, "Rademacher weights should be +-1" - - mean = weights.mean() - assert abs(mean) < 0.02, f"Rademacher mean should be ~0, got {mean}" - - var = weights.var() - assert abs(var - 1.0) < 0.02, f"Rademacher variance should be ~1, got {var}" - - def test_bootstrap_weights_mammen_properties(self): - """Test Rust Mammen 
weights have correct statistical properties.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch as rust_fn - - n_bootstrap, n_units = 10000, 100 - weights = rust_fn(n_bootstrap, n_units, "mammen", 42) - - # Mammen: E[w] = 0, E[w^2] = 1, E[w^3] = 1 - mean = weights.mean() - assert abs(mean) < 0.02, f"Mammen mean should be ~0, got {mean}" - - second_moment = (weights ** 2).mean() - assert abs(second_moment - 1.0) < 0.02, f"Mammen E[w^2] should be ~1, got {second_moment}" - - third_moment = (weights ** 3).mean() - assert abs(third_moment - 1.0) < 0.1, f"Mammen E[w^3] should be ~1, got {third_moment}" - - def test_bootstrap_weights_webb_properties(self): - """Test Rust Webb weights have correct statistical properties.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch as rust_fn - - n_bootstrap, n_units = 10000, 100 - weights = rust_fn(n_bootstrap, n_units, "webb", 42) - - # Webb: 6-point distribution with E[w] = 0 - mean = weights.mean() - assert abs(mean) < 0.1, f"Webb mean should be ~0, got {mean}" - - # Should have 6 unique values - unique_vals = np.unique(weights.flatten()) - assert len(unique_vals) == 6, f"Webb should have 6 unique values, got {len(unique_vals)}" + # Moved to + # `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` + # after the Rust `generate_bootstrap_weights_batch` binding was removed + # and the Python numpy helper became the canonical path. # ========================================================================= # Synthetic Weights Equivalence From d58fa6497558d2ea3540e21d233bdcaee4e68f3c Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 24 Apr 2026 16:41:17 -0400 Subject: [PATCH 2/3] Address PR #362 CI review P3: docstring precision on rng contract The helper accepts any ``np.random.Generator``, not specifically PCG64; library callers happen to seed via ``np.random.default_rng``. 
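The rng contract this commit documents — output is a deterministic function of the supplied `np.random.Generator` state, whatever bit-generator engine backs it — can be sketched as follows. This is an illustrative sketch only: the function name, signature, and body here are hypothetical stand-ins, not the actual `diff_diff.bootstrap_utils` implementation; the Rademacher, Mammen (two-point), and Webb (six-point) distributions are the standard ones the removed tests checked moments for.

```python
import numpy as np


def multiplier_weights(rng: np.random.Generator, n_bootstrap: int,
                       n_units: int, weight_type: str = "rademacher") -> np.ndarray:
    """Sketch: (n_bootstrap, n_units) multiplier weights drawn purely from
    ``rng`` state, so identical Generator state gives identical output."""
    shape = (n_bootstrap, n_units)
    if weight_type == "rademacher":
        # +/-1 with equal probability: E[w] = 0, Var[w] = 1.
        return rng.choice(np.array([-1.0, 1.0]), size=shape)
    if weight_type == "mammen":
        # Mammen (1993) two-point distribution: E[w]=0, E[w^2]=1, E[w^3]=1.
        sqrt5 = np.sqrt(5.0)
        p_low = (sqrt5 + 1.0) / (2.0 * sqrt5)      # P(w = -(sqrt5 - 1)/2)
        low, high = -(sqrt5 - 1.0) / 2.0, (sqrt5 + 1.0) / 2.0
        return np.where(rng.random(shape) < p_low, low, high)
    if weight_type == "webb":
        # Webb (2014) six-point distribution: +/-sqrt(k/2) for k in {1,2,3},
        # each with probability 1/6; E[w] = 0, E[w^2] = 1.
        vals = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                         np.sqrt(0.5), 1.0, np.sqrt(1.5)])
        return rng.choice(vals, size=shape)
    raise ValueError(f"unknown weight_type: {weight_type!r}")


# Same seed -> same Generator state -> identical weights, engine-agnostic:
w1 = multiplier_weights(np.random.default_rng(42), 100, 50, "mammen")
w2 = multiplier_weights(np.random.default_rng(42), 100, 50, "mammen")
assert np.array_equal(w1, w2)
```

Because the function never seeds its own engine, swapping `default_rng` (PCG64) for any other `np.random.Generator` changes the draws but not the determinism guarantee — which is exactly the invariance the docstring edit above is careful to attribute to the generator state rather than to PCG64 specifically.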
Clarifies that backend invariance is with respect to the supplied generator state, not a specific engine. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/bootstrap_utils.py | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/diff_diff/bootstrap_utils.py b/diff_diff/bootstrap_utils.py index 8023e35b..4ffa589e 100644 --- a/diff_diff/bootstrap_utils.py +++ b/diff_diff/bootstrap_utils.py @@ -166,10 +166,11 @@ def generate_bootstrap_weights_batch( """ Generate all bootstrap weights at once (vectorized). - Runs under the numpy PCG64 RNG exclusively, so the output is a - deterministic function of ``rng`` state regardless of whether the - Rust backend is available. A single ``seed`` therefore produces - identical multiplier-bootstrap weights in both installs. + Output is a deterministic function of the supplied ``rng`` state, + so a single ``seed`` produces identical multiplier-bootstrap weights + regardless of whether the Rust backend is compiled in. (Library + call sites seed via ``np.random.default_rng`` — PCG64 — but any + ``np.random.Generator`` is accepted.) Parameters ---------- From 9af7199787dcd2ba578bbf75015dd992e46b52ef Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 24 Apr 2026 17:40:28 -0400 Subject: [PATCH 3/3] Fix CI: collapse dCDH bootstrap-baseline to single backend-invariant value MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The PR's RNG unification is working as intended — Rust now produces the same bootstrap SE as pure-Python under the same seed — but the pinned `TestBootstrapCellPeriod::test_bootstrap_se_matches_pre_pr4_baseline` guard held two separate per-backend baselines from the pre-RNG-unified era. The Rust backend now produces exactly the value that was formerly pinned as the Python-only baseline (0.3030802540369796), so the split is obsolete. 
Collapse the two constants into one, drop the HAS_RUST_BACKEND branch + os env-var check, and document the convergence in the comment. The regression guard's semantic is unchanged: under PSU=group the dispatcher still routes through the legacy group-level bootstrap path, and the assertion is still ULP-precision bit-identity against the captured baseline. Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/test_survey_dcdh.py | 47 ++++++++++++++------------------------- 1 file changed, 17 insertions(+), 30 deletions(-) diff --git a/tests/test_survey_dcdh.py b/tests/test_survey_dcdh.py index 69b5a400..11769b7c 100644 --- a/tests/test_survey_dcdh.py +++ b/tests/test_survey_dcdh.py @@ -1,6 +1,5 @@ """Survey support tests for ChaisemartinDHaultfoeuille (dCDH).""" -import os from typing import Optional import numpy as np @@ -1815,17 +1814,16 @@ class TestBootstrapCellPeriod: """ # Captured on pre-PR-4 code (origin/main at SHA ac181b7f — PR #329 - # merge) via a scratch fit on the fixture below. Pinned here as - # the bit-identity regression guard for the dispatcher's - # PSU-within-group-constant legacy-path routing. If this test - # drifts, the dispatcher is no longer reproducing pre-PR-4 - # behavior under PSU=group and the legacy fast path has - # regressed. Values differ between Rust and pure-Python backends - # because `_generate_bootstrap_weights_batch` and `solve_ols` - # take different numeric paths; bit-identity still holds within - # each backend. - _BASELINE_OVERALL_SE_RUST = 0.30560839419979546 - _BASELINE_OVERALL_SE_PYTHON = 0.3030802540369796 + # merge) via a scratch fit on the fixture below, on the pure-Python + # backend. Pinned here as the bit-identity regression guard for the + # dispatcher's PSU-within-group-constant legacy-path routing: if + # this test drifts, the dispatcher is no longer reproducing pre-PR-4 + # behavior under PSU=group and the legacy fast path has regressed. 
+ # Both backends now converge on this value — PR #362 unified the + # multiplier-weight RNG on the numpy PCG64 path (previously Rust + # seeded Xoshiro per-row, producing a different Rust baseline of + # 0.30560839419979546 under the same fixture). + _BASELINE_OVERALL_SE = 0.3030802540369796 @staticmethod def _make_baseline_fixture() -> pd.DataFrame: @@ -1861,21 +1859,11 @@ def _make_baseline_fixture() -> pd.DataFrame: def test_bootstrap_se_matches_pre_pr4_baseline(self): """Bit-identity regression guard: under PSU=group the dispatcher routes through the legacy group-level bootstrap - path, so the overall bootstrap SE must match pre-PR-4 code - to ULP precision. The baseline values were captured on - `origin/main` at `ac181b7f` (the PR #329 merge) under each - backend independently. + path, so the overall bootstrap SE must match the pre-PR-4 + pure-Python baseline to ULP precision under either backend + (PR #362 unified the multiplier-weight RNG on numpy PCG64, so + Rust and Python now produce the same value). """ - from diff_diff._backend import HAS_RUST_BACKEND - pure_python = ( - os.environ.get("DIFF_DIFF_BACKEND", "auto").lower() == "python" - or not HAS_RUST_BACKEND - ) - expected = ( - self._BASELINE_OVERALL_SE_PYTHON - if pure_python - else self._BASELINE_OVERALL_SE_RUST - ) df_ = self._make_baseline_fixture() sd = SurveyDesign(weights="pw", psu="group") res = ChaisemartinDHaultfoeuille(n_bootstrap=500, seed=42).fit( @@ -1885,12 +1873,11 @@ def test_bootstrap_se_matches_pre_pr4_baseline(self): ) assert res.bootstrap_results is not None observed_se = float(res.bootstrap_results.overall_se) - backend_label = "pure-python" if pure_python else "rust" assert observed_se == pytest.approx( - expected, rel=0.0, abs=1e-15, + self._BASELINE_OVERALL_SE, rel=0.0, abs=1e-15, ), ( - f"Bootstrap SE drifted from pre-PR-4 baseline " - f"(backend={backend_label}). expected={expected!r}, " + f"Bootstrap SE drifted from pre-PR-4 baseline. 
" + f"expected={self._BASELINE_OVERALL_SE!r}, " f"observed={observed_se!r}. The dispatcher's " f"PSU-within-group-constant routing is no longer " f"bit-identical to the legacy group-level bootstrap."