From 4f3b163d373ab6ebcb4e58ad8840a2f058c87c0d Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 24 Apr 2026 16:19:42 -0400 Subject: [PATCH 1/3] Drop Rust multiplier-bootstrap weight helper; numpy path is canonical MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the silent-failures audit follow-up for the Rust multiplier-weight RNG. `rust/src/bootstrap.rs::generate_bootstrap_weights_batch` seeded `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row while `diff_diff.bootstrap_utils.generate_bootstrap_weights_batch_numpy` ran under `numpy.random.default_rng` (PCG64), so the same `seed` produced different Rademacher/Mammen/Webb weights depending on whether the Rust backend was compiled in — and therefore different multiplier-bootstrap SE / CI / p-values in CallawaySantAnna, ContinuousDiD, ImputationDiD, TwoStageDiD, ChaisemartinDHaultfoeuille, and EfficientDiD. Unlike the TROP bootstrap fix (PR #354), this Rust helper did nothing after generating weights, so threading pre-generated weights through PyO3 would have left the function as a pointless pass-through. Deleted the `rust/src/bootstrap.rs` module entirely together with the orphaned `rand` + `rand_xoshiro` Cargo deps; the Python shim now calls the numpy implementation directly (with `_numpy` kept as an alias for backward compatibility). The three distributional property tests move to `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` including byte-level seed-pinned baselines for each distribution. `test_rust_bootstrap_weights_batch_is_removed` mirrors the existing `test_compute_synthetic_weights_is_removed` pattern as a regression guard against accidental re-export. Users running with the Rust backend compiled in will see different-but-equally-valid bootstrap SE / CI / p-values under the same `seed` (PCG64 replaces Xoshiro); pure-Python users are unaffected. 
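The divergence described above — the same integer seed fed to two different generator algorithms yields unrelated streams — can be illustrated with a minimal numpy sketch. Since numpy ships no Xoshiro bit generator, Philox stands in for the removed Rust RNG here, and the Rademacher draw below is an illustrative construction, not the library's actual sampling code:

```python
import numpy as np

seed = 42
n_bootstrap, n_units = 4, 6

# Canonical path: PCG64, as supplied by numpy.random.default_rng.
rng_pcg = np.random.default_rng(seed)
w_pcg = rng_pcg.integers(0, 2, size=(n_bootstrap, n_units)) * 2.0 - 1.0  # Rademacher ±1

# Stand-in for "different algorithm, same seed" (Philox instead of Xoshiro).
rng_other = np.random.Generator(np.random.Philox(seed))
w_other = rng_other.integers(0, 2, size=(n_bootstrap, n_units)) * 2.0 - 1.0

# Same seed, different bit generators: the weight matrices disagree,
# which is the cross-backend divergence this patch removes.
print(np.array_equal(w_pcg, w_other))
```

With only one code path left, the weight matrix is a deterministic function of the seeded `Generator` alone.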
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + TODO.md | 1 - diff_diff/__init__.py | 1 - diff_diff/_backend.py | 4 - diff_diff/bootstrap_utils.py | 43 ++---- rust/Cargo.toml | 2 - rust/src/bootstrap.rs | 282 ---------------------------------- rust/src/lib.rs | 7 - tests/test_bootstrap_utils.py | 123 +++++++++++++++ tests/test_rust_backend.py | 134 ++++------------ 10 files changed, 164 insertions(+), 434 deletions(-) delete mode 100644 rust/src/bootstrap.rs diff --git a/CHANGELOG.md b/CHANGELOG.md index 0f2357b8..3e6ec889 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 1. Removed the `if self._precomputed is not None:` branch that silently substituted `self._precomputed["Y"]` / `["D"]` / `["time_dist_matrix"]` (original-panel cache populated during main fit) for the function-argument `Y, D`. Under bootstrap, `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled `boot_data` and passes them in; the helper was discarding those and recomputing unit distances from the original panel, so Python's local bootstrap resampled units but reused stale unit-distance weights. Rust's bootstrap was already correct (always consumed `y_boot, d_boot`). 2. Removed the `valid_control_at_t = D[t, :] == 0` target-period donor gate that zeroed `ω_j` for any unit `j` treated at the target period (other than the target unit itself). Per REGISTRY Eq. 2/3 and Rust's `compute_weight_matrix`, `ω_j = exp(-λ_unit × dist(j, i))` for all `j ≠ i`; treated-cell exclusion happens via the `(1 − W_{js})` factor applied inside `_estimate_model`. Same-cohort donors now contribute via their pre-treatment rows. 
Empirically the main-fit ATT is unchanged on tested fixtures because same-cohort pre-treatment observations are exactly absorbed by their own unit fixed effect `alpha_j` without propagating into `mu`, `beta`, or other units' parameters — so this change is structural alignment rather than a numerical shift in output. Users on same-cohort panels with very few controls may still see tiny differences in edge cases; the new `test_local_method_same_cohort_donor_parity` regression guards the aligned behavior. Together with the normalization fix above, TROP local-method backend parity on the main-fit ATT is regime-dependent: `atol=rtol=1e-14` for `lambda_nn=inf` (no nuclear-norm regularization, uniform weight scaling leaves the WLS argmin invariant) and `atol=1e-10` for finite `lambda_nn` (FISTA inner loop + BLAS reduction ordering introduce sub-1e-10 roundoff across Rust `faer` vs numpy paths). Bootstrap SE parity is asserted at `atol=1e-5` to accommodate ~1e-7 roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq` that accumulates across per-replicate fits; sub-1e-14 bootstrap parity is tracked as a follow-up in `TODO.md` under "unify Rust local-method solver path". Closes silent-failures audit finding #23 (local-method half; the RNG half closed in PR #354 and the grid-search half in PR #348). 
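The `lambda_nn=inf` tolerance claim above rests on a standard fact: the WLS argmin is invariant to multiplying every weight by the same positive constant, so uniform weight rescaling cannot move the estimate beyond machine roundoff. A minimal numpy sketch with synthetic data (not the library's solver) checks this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)

def wls(X, y, w):
    # Weighted least squares via lstsq on sqrt-weighted data.
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

b1 = wls(X, y, w)
b2 = wls(X, y, 7.3 * w)  # uniform rescaling of all weights

# Argmin is unchanged: scaling the objective by a constant moves no minimizer.
print(np.allclose(b1, b2, atol=1e-10))  # True
```

Once a finite nuclear-norm penalty enters the objective, the regularizer does not scale with the weights, so this invariance no longer holds exactly — consistent with the looser `atol=1e-10` regime above.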
+- **Multiplier-bootstrap weight generation (`generate_bootstrap_weights_batch`) no longer diverges across backends under a fixed `seed`.** Previously the Rust helper in `rust/src/bootstrap.rs` seeded `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row while the Python fallback consumed `numpy.random.default_rng` (PCG64), so the same `seed` produced different Rademacher/Mammen/Webb multiplier weights depending on whether the Rust backend was compiled in — and therefore different bootstrap SE / CI / p-values in the six downstream estimators routed through `diff_diff.bootstrap_utils.generate_bootstrap_weights_batch`: `CallawaySantAnna` (non-survey multiplier), `ContinuousDiD`, `ImputationDiD`, `TwoStageDiD`, `ChaisemartinDHaultfoeuille` (multi-horizon PSU + group-level + PSU-broadcast multiplier bootstrap), and `EfficientDiD` (cluster + unit multiplier bootstrap). Fixed by making the Python numpy path canonical and removing the Rust batch-weight helper entirely (including the `rand` and `rand_xoshiro` Cargo dependencies and the `rust/src/bootstrap.rs` module) — the Rust function only generated weights, so routing weights through Python-side `default_rng` left nothing for Rust to do. Sampling distributions (Rademacher, Mammen, Webb) and numerical identities are unchanged. **User-visible effect**: users who were previously running with the Rust backend compiled in will see different-but-equally-valid multiplier-bootstrap SE / CI / p-values under the same `seed` after this change (PCG64 replaces Xoshiro); pure-Python-backend users are unaffected. Bootstrap runtime may regress single-digit percent on very wide bootstraps (`n_units > 10000`) where Rust's rayon parallelism was previously providing a modest weight-generation speedup; sub-second at typical sizes (`n_bootstrap=999, n_units ≤ 1000`). 
Regression guard `tests/test_rust_backend.py::TestRustBackend::test_rust_bootstrap_weights_batch_is_removed` asserts the binding is not reintroduced; distributional coverage moves to `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` (11 tests including seed-pinned byte-level baselines for each of the three distributions). Closes the silent-failures audit TODO row for the Rust multiplier-weight RNG. ### Changed - **`did_had_pretest_workflow(aggregate="event_study")` verdict no longer emits the "paper step 2 deferred to Phase 3 follow-up" caveat** — the joint pre-trends Stute test closes that gap. The two-period `aggregate="overall"` path retains the existing caveat since the joint variant does not apply to single-pre-period panels. Downstream code that greps verdict strings for the Phase 3 caveat will see it suppressed on the event-study path. diff --git a/TODO.md b/TODO.md index 5c34735b..5534f87b 100644 --- a/TODO.md +++ b/TODO.md @@ -84,7 +84,6 @@ Deferred items from PR reviews that were not addressed before merge. | Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium | | `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. 
| `honest_did.py` | #334 | Low | | Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low | -| Rust multiplier-bootstrap weight RNG (`generate_bootstrap_weights_batch` in `rust/src/bootstrap.rs:9-10, 57-75`) uses `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row for Rademacher/Mammen/Webb generation. If any Python caller (SDID / efficient-DiD multiplier bootstrap) has a numpy-canonical equivalent, the two backends likely diverge under the same seed. Audit Python callers (`diff_diff/sdid.py`, `diff_diff/efficient_did_bootstrap.py`, `diff_diff/bootstrap_utils.py::generate_bootstrap_weights_batch_numpy`) for parity-test gaps. Same fix shape as TROP RNG parity (PR #354): pre-generate weights in Python via numpy and pass them to Rust through PyO3. | `rust/src/bootstrap.rs`, `diff_diff/bootstrap_utils.py` | follow-up | Medium | | `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). 
| `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low | | `bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). | `diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Medium | | `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium | diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index 4a9b93c4..4f770273 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -17,7 +17,6 @@ # Import backend detection from dedicated module (avoids circular imports) from diff_diff._backend import ( HAS_RUST_BACKEND, - _rust_bootstrap_weights, _rust_compute_robust_vcov, _rust_project_simplex, _rust_solve_ols, diff --git a/diff_diff/_backend.py b/diff_diff/_backend.py index f763f87c..2c9b6b04 100644 --- a/diff_diff/_backend.py +++ b/diff_diff/_backend.py @@ -18,7 +18,6 @@ # Try to import Rust backend for accelerated operations try: from diff_diff._rust_backend import ( - generate_bootstrap_weights_batch as _rust_bootstrap_weights, project_simplex as _rust_project_simplex, solve_ols as _rust_solve_ols, compute_robust_vcov as _rust_compute_robust_vcov, @@ -44,7 +43,6 @@ _rust_available = True except ImportError: _rust_available = False - 
_rust_bootstrap_weights = None _rust_project_simplex = None _rust_solve_ols = None _rust_compute_robust_vcov = None @@ -69,7 +67,6 @@ if _backend_env == "python": # Force pure Python mode - disable Rust even if available HAS_RUST_BACKEND = False - _rust_bootstrap_weights = None _rust_project_simplex = None _rust_solve_ols = None _rust_compute_robust_vcov = None @@ -120,7 +117,6 @@ def rust_backend_info(): __all__ = [ "HAS_RUST_BACKEND", "rust_backend_info", - "_rust_bootstrap_weights", "_rust_project_simplex", "_rust_solve_ols", "_rust_compute_robust_vcov", diff --git a/diff_diff/bootstrap_utils.py b/diff_diff/bootstrap_utils.py index 44d94546..8023e35b 100644 --- a/diff_diff/bootstrap_utils.py +++ b/diff_diff/bootstrap_utils.py @@ -10,8 +10,6 @@ import numpy as np -from diff_diff._backend import HAS_RUST_BACKEND, _rust_bootstrap_weights - __all__ = [ "generate_bootstrap_weights", "generate_bootstrap_weights_batch", @@ -168,38 +166,10 @@ def generate_bootstrap_weights_batch( """ Generate all bootstrap weights at once (vectorized). - Uses Rust backend if available for parallel generation. - - Parameters - ---------- - n_bootstrap : int - Number of bootstrap iterations. - n_units : int - Number of units (clusters) to generate weights for. - weight_type : str - Type of weights: "rademacher", "mammen", or "webb". - rng : np.random.Generator - Random number generator. - - Returns - ------- - np.ndarray - Array of bootstrap weights with shape (n_bootstrap, n_units). - """ - if HAS_RUST_BACKEND and _rust_bootstrap_weights is not None: - seed = rng.integers(0, 2**63 - 1) - return _rust_bootstrap_weights(n_bootstrap, n_units, weight_type, seed) - return generate_bootstrap_weights_batch_numpy(n_bootstrap, n_units, weight_type, rng) - - -def generate_bootstrap_weights_batch_numpy( - n_bootstrap: int, - n_units: int, - weight_type: str, - rng: np.random.Generator, -) -> np.ndarray: - """ - NumPy fallback implementation of :func:`generate_bootstrap_weights_batch`. 
+ Runs under the numpy PCG64 RNG exclusively, so the output is a + deterministic function of ``rng`` state regardless of whether the + Rust backend is available. A single ``seed`` therefore produces + identical multiplier-bootstrap weights in both installs. Parameters ---------- @@ -243,6 +213,11 @@ def generate_bootstrap_weights_batch_numpy( ) +# Alias kept for backward compatibility with callers that imported the +# explicit numpy variant; both names resolve to the same implementation. +generate_bootstrap_weights_batch_numpy = generate_bootstrap_weights_batch + + def compute_percentile_ci( boot_dist: np.ndarray, alpha: float, diff --git a/rust/Cargo.toml b/rust/Cargo.toml index 1aad72a2..aa102fa6 100644 --- a/rust/Cargo.toml +++ b/rust/Cargo.toml @@ -25,8 +25,6 @@ openblas = ["ndarray/blas"] pyo3 = "0.28" numpy = "0.28" ndarray = { version = "0.17", features = ["rayon"] } -rand = "0.8" -rand_xoshiro = "0.6" rayon = "1.8" # Pure Rust linear algebra for SVD/matrix inversion (no external deps). diff --git a/rust/src/bootstrap.rs b/rust/src/bootstrap.rs deleted file mode 100644 index 71d5093f..00000000 --- a/rust/src/bootstrap.rs +++ /dev/null @@ -1,282 +0,0 @@ -//! Bootstrap weight generation for multiplier bootstrap inference. -//! -//! This module provides efficient generation of bootstrap weights -//! using various distributions (Rademacher, Mammen, Webb). - -use ndarray::{Array2, Axis}; -use numpy::{PyArray2, ToPyArray}; -use pyo3::prelude::*; -use rand::prelude::*; -use rand_xoshiro::Xoshiro256PlusPlus; -use rayon::prelude::*; - -/// Minimum number of bootstrap iterations per parallel task. -/// This reduces scheduling overhead for large n_bootstrap values. -const MIN_CHUNK_SIZE: usize = 64; - -/// Generate a batch of bootstrap weights. -/// -/// Generates (n_bootstrap, n_units) matrix of bootstrap weights -/// for multiplier bootstrap inference. 
-/// -/// # Arguments -/// * `n_bootstrap` - Number of bootstrap iterations -/// * `n_units` - Number of units (clusters) -/// * `weight_type` - Type of weights: "rademacher", "mammen", or "webb" -/// * `seed` - Random seed for reproducibility -/// -/// # Returns -/// (n_bootstrap, n_units) array of bootstrap weights -#[pyfunction] -#[pyo3(signature = (n_bootstrap, n_units, weight_type, seed))] -pub fn generate_bootstrap_weights_batch<'py>( - py: Python<'py>, - n_bootstrap: usize, - n_units: usize, - weight_type: &str, - seed: u64, -) -> PyResult<Bound<'py, PyArray2<f64>>> { - let weights = match weight_type.to_lowercase().as_str() { - "rademacher" => generate_rademacher_batch(n_bootstrap, n_units, seed), - "mammen" => generate_mammen_batch(n_bootstrap, n_units, seed), - "webb" => generate_webb_batch(n_bootstrap, n_units, seed), - _ => { - return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(format!( - "Unknown weight type: {}. Expected 'rademacher', 'mammen', or 'webb'", - weight_type - ))) - } - }; - - Ok(weights.to_pyarray(py)) -} - -/// Generate Rademacher weights: ±1 with equal probability. -/// -/// E[w] = 0, Var[w] = 1 -fn generate_rademacher_batch(n_bootstrap: usize, n_units: usize, seed: u64) -> Array2<f64> { - // Pre-allocate output array - eliminates double allocation from Vec<Vec<f64>> - let mut weights = Array2::<f64>::zeros((n_bootstrap, n_units)); - - // Fill rows in parallel using rayon with chunk size tuning - weights - .axis_iter_mut(Axis(0)) - .into_par_iter() - .with_min_len(MIN_CHUNK_SIZE) - .enumerate() - .for_each(|(i, mut row)| { - let mut rng = Xoshiro256PlusPlus::seed_from_u64(seed.wrapping_add(i as u64)); - for elem in row.iter_mut() { - *elem = if rng.gen::<bool>() { 1.0 } else { -1.0 }; - } - }); - - weights -} - -/// Generate Mammen weights with two-point distribution.
-/// -/// w = -(√5 - 1)/2 with probability (√5 + 1)/(2√5) -/// w = (√5 + 1)/2 with probability (√5 - 1)/(2√5) -/// -/// E[w] = 0, E[w²] = 1, E[w³] = 1 -fn generate_mammen_batch(n_bootstrap: usize, n_units: usize, seed: u64) -> Array2<f64> { - let sqrt5 = 5.0_f64.sqrt(); - - // Two-point distribution values - let val_neg = -(sqrt5 - 1.0) / 2.0; // ≈ -0.618 - let val_pos = (sqrt5 + 1.0) / 2.0; // ≈ 1.618 - - // Probability of negative value - let prob_neg = (sqrt5 + 1.0) / (2.0 * sqrt5); // ≈ 0.724 - - // Pre-allocate output array - eliminates double allocation - let mut weights = Array2::<f64>::zeros((n_bootstrap, n_units)); - - // Fill rows in parallel with chunk size tuning - weights - .axis_iter_mut(Axis(0)) - .into_par_iter() - .with_min_len(MIN_CHUNK_SIZE) - .enumerate() - .for_each(|(i, mut row)| { - let mut rng = Xoshiro256PlusPlus::seed_from_u64(seed.wrapping_add(i as u64)); - for elem in row.iter_mut() { - *elem = if rng.gen::<f64>() < prob_neg { - val_neg - } else { - val_pos - }; - } - }); - - weights -} - -/// Generate Webb 6-point distribution weights.
-/// -/// Six-point distribution with equal probabilities (1/6 each) matching R's `did` package: -/// E[w] = 0, Var[w] = 1 -/// -/// Values: ±√(3/2), ±√(2/2)=±1, ±√(1/2) -fn generate_webb_batch(n_bootstrap: usize, n_units: usize, seed: u64) -> Array2<f64> { - // Webb 6-point values - let val1 = (3.0_f64 / 2.0).sqrt(); // √(3/2) ≈ 1.2247 - let val2 = 1.0_f64; // √(2/2) = 1.0 - let val3 = (1.0_f64 / 2.0).sqrt(); // √(1/2) ≈ 0.7071 - - // Values in order: -val1, -val2, -val3, val3, val2, val1 - let weights_table = [-val1, -val2, -val3, val3, val2, val1]; - - // Pre-allocate output array - eliminates double allocation - let mut weights = Array2::<f64>::zeros((n_bootstrap, n_units)); - - // Fill rows in parallel with chunk size tuning - // Use uniform selection (1/6 probability each) matching R's did package - weights - .axis_iter_mut(Axis(0)) - .into_par_iter() - .with_min_len(MIN_CHUNK_SIZE) - .enumerate() - .for_each(|(i, mut row)| { - let mut rng = Xoshiro256PlusPlus::seed_from_u64(seed.wrapping_add(i as u64)); - for elem in row.iter_mut() { - // Uniform selection: generate integer 0-5, index into weights_table - let bucket = rng.gen_range(0..6); - *elem = weights_table[bucket]; - } - }); - - weights -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_rademacher_shape() { - let weights = generate_rademacher_batch(100, 50, 42); - assert_eq!(weights.shape(), &[100, 50]); - } - - #[test] - fn test_rademacher_values() { - let weights = generate_rademacher_batch(10, 100, 42); - - for w in weights.iter() { - assert!(*w == 1.0 || *w == -1.0, "Rademacher weight should be ±1"); - } - } - - #[test] - fn test_rademacher_mean_approx_zero() { - let weights = generate_rademacher_batch(1000, 1, 42); - let mean: f64 = weights.iter().sum::<f64>() / weights.len() as f64; - - // With 1000 samples, mean should be close to 0 - assert!( - mean.abs() < 0.1, - "Rademacher mean should be close to 0, got {}", - mean - ); - } - - #[test] - fn test_mammen_shape() { - let weights =
generate_mammen_batch(100, 50, 42); - assert_eq!(weights.shape(), &[100, 50]); - } - - #[test] - fn test_mammen_mean_approx_zero() { - let weights = generate_mammen_batch(1000, 1, 42); - let mean: f64 = weights.iter().sum::<f64>() / weights.len() as f64; - - assert!( - mean.abs() < 0.1, - "Mammen mean should be close to 0, got {}", - mean - ); - } - - #[test] - fn test_webb_shape() { - let weights = generate_webb_batch(100, 50, 42); - assert_eq!(weights.shape(), &[100, 50]); - } - - #[test] - fn test_reproducibility() { - let weights1 = generate_rademacher_batch(100, 50, 42); - let weights2 = generate_rademacher_batch(100, 50, 42); - - // Same seed should produce same results - assert_eq!(weights1, weights2); - } - - #[test] - fn test_different_seeds() { - let weights1 = generate_rademacher_batch(100, 50, 42); - let weights2 = generate_rademacher_batch(100, 50, 43); - - // Different seeds should produce different results - assert_ne!(weights1, weights2); - } - - #[test] - fn test_webb_mean_approx_zero() { - let weights = generate_webb_batch(10000, 1, 42); - let mean: f64 = weights.iter().sum::<f64>() / weights.len() as f64; - - // With 10000 samples, mean should be close to 0 - assert!( - mean.abs() < 0.1, - "Webb mean should be close to 0, got {}", - mean - ); - } - - #[test] - fn test_webb_variance_approx_correct() { - // Webb's 6-point distribution with values ±√(3/2), ±1, ±√(1/2) - // and equal probabilities (1/6 each) should have variance = 1.0 - // This matches R's did package behavior.
- // Theoretical: Var = (1/6) * (3/2 + 1 + 1/2 + 1/2 + 1 + 3/2) = (1/6) * 6 = 1.0 - let weights = generate_webb_batch(10000, 100, 42); - let n = weights.len() as f64; - let mean: f64 = weights.iter().sum::<f64>() / n; - let variance: f64 = weights.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n; - - // Theoretical variance = 1.0 with equal probabilities - // Allow some statistical variance in the estimate - assert!( - (variance - 1.0).abs() < 0.05, - "Webb variance should be ~1.0 (matching R's did package), got {}", - variance - ); - } - - #[test] - fn test_webb_values_correct() { - // Verify that Webb weights only take the expected 6 values - let weights = generate_webb_batch(100, 1000, 42); - - let val1 = (3.0_f64 / 2.0).sqrt(); // ≈ 1.2247 - let val2 = 1.0_f64; - let val3 = (1.0_f64 / 2.0).sqrt(); // ≈ 0.7071 - - let expected_values = [-val1, -val2, -val3, val3, val2, val1]; - - for w in weights.iter() { - let matches_expected = expected_values - .iter() - .any(|&expected| (*w - expected).abs() < 1e-10); - assert!( - matches_expected, - "Webb weight {} is not one of the expected values", - w - ); - } - } -} diff --git a/rust/src/lib.rs b/rust/src/lib.rs index cc8fd13a..c745548a 100644 --- a/rust/src/lib.rs +++ b/rust/src/lib.rs @@ -14,7 +14,6 @@ extern crate blas_src; use pyo3::prelude::*; use std::collections::HashMap; -mod bootstrap; mod linalg; mod trop; mod weights; @@ -22,12 +21,6 @@ mod weights; /// A Python module implemented in Rust for diff-diff acceleration.
#[pymodule] fn _rust_backend(m: &Bound<'_, PyModule>) -> PyResult<()> { - // Bootstrap weight generation - m.add_function(wrap_pyfunction!( - bootstrap::generate_bootstrap_weights_batch, - m - )?)?; - // Simplex projection m.add_function(wrap_pyfunction!(weights::project_simplex, m)?)?; diff --git a/tests/test_bootstrap_utils.py b/tests/test_bootstrap_utils.py index d4cb7ae1..a4f7e98c 100644 --- a/tests/test_bootstrap_utils.py +++ b/tests/test_bootstrap_utils.py @@ -8,6 +8,8 @@ from diff_diff.bootstrap_utils import ( compute_effect_bootstrap_stats, compute_effect_bootstrap_stats_batch, + generate_bootstrap_weights_batch, + generate_bootstrap_weights_batch_numpy, stratified_bootstrap_indices, warn_bootstrap_failure_rate, ) @@ -267,3 +269,124 @@ def test_empty_treated_pool(self): assert ctrl.shape == (4, 3) assert trt.shape == (4, 0) assert ctrl.min() >= 0 and ctrl.max() < 3 + + +class TestBootstrapWeightsBatchDistributions: + """Distributional coverage for the numpy multiplier-weight helper. + + Ported from the deleted Rust-backend tests when the Rust + ``generate_bootstrap_weights_batch`` binding was removed (silent-failures + audit follow-up). The numpy helper is now the sole code path, so these + tests cover shape, support, and statistical moments for each of the + Rademacher, Mammen, and Webb multiplier distributions. 
+ """ + + def test_alias_is_same_function(self): + """``generate_bootstrap_weights_batch_numpy`` is a back-compat alias.""" + assert generate_bootstrap_weights_batch_numpy is generate_bootstrap_weights_batch + + def test_invalid_weight_type_raises(self): + rng = np.random.default_rng(0) + with pytest.raises(ValueError, match="weight_type must be"): + generate_bootstrap_weights_batch(5, 3, "unknown", rng) + + def test_rademacher_shape_and_support(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(100, 50, "rademacher", rng) + assert weights.shape == (100, 50) + assert set(np.unique(weights)) == {-1.0, 1.0} + + def test_rademacher_mean_and_variance(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(10000, 100, "rademacher", rng) + assert abs(weights.mean()) < 0.02, f"mean={weights.mean()}" + assert abs(weights.var() - 1.0) < 0.02, f"var={weights.var()}" + + def test_mammen_moments(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(10000, 100, "mammen", rng) + # Mammen: E[w]=0, E[w^2]=1, E[w^3]=1 + assert abs(weights.mean()) < 0.02, f"E[w]={weights.mean()}" + assert abs((weights**2).mean() - 1.0) < 0.02, f"E[w^2]={(weights ** 2).mean()}" + assert abs((weights**3).mean() - 1.0) < 0.1, f"E[w^3]={(weights ** 3).mean()}" + # Support is exactly two values + unique = np.unique(weights) + assert len(unique) == 2, f"Mammen has two-point support, got {len(unique)}" + + def test_webb_support_and_moments(self): + rng = np.random.default_rng(42) + weights = generate_bootstrap_weights_batch(10000, 100, "webb", rng) + # Webb: six-point support, E[w]=0, Var[w]=1 + unique = np.unique(weights) + assert len(unique) == 6, f"Webb has six-point support, got {len(unique)}" + assert abs(weights.mean()) < 0.05, f"E[w]={weights.mean()}" + assert abs(weights.var() - 1.0) < 0.05, f"Var[w]={weights.var()}" + + def test_reproducibility_same_seed(self): + rng1 = np.random.default_rng(42) + rng2 
= np.random.default_rng(42) + w1 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng1) + w2 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng2) + np.testing.assert_array_equal(w1, w2) + + def test_different_seeds_differ(self): + rng1 = np.random.default_rng(42) + rng2 = np.random.default_rng(43) + w1 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng1) + w2 = generate_bootstrap_weights_batch(100, 50, "rademacher", rng2) + assert not np.array_equal(w1, w2) + + def test_value_pin_default_rng_42_rademacher(self): + """Byte-level pin of ``default_rng(42)`` Rademacher output. + + Catches silent numpy-RNG stream drift across numpy versions. Mirror + of the ``TestStratifiedBootstrapIndices::test_value_pin_default_rng_42`` + pattern. + """ + rng = np.random.default_rng(42) + w = generate_bootstrap_weights_batch(5, 3, "rademacher", rng) + expected = np.array( + [ + [-1.0, 1.0, 1.0], + [-1.0, -1.0, 1.0], + [-1.0, 1.0, -1.0], + [-1.0, 1.0, 1.0], + [1.0, 1.0, 1.0], + ] + ) + np.testing.assert_array_equal(w, expected) + + def test_value_pin_default_rng_42_mammen(self): + """Byte-level pin of ``default_rng(42)`` Mammen output.""" + rng = np.random.default_rng(42) + w = generate_bootstrap_weights_batch(5, 3, "mammen", rng) + sqrt5 = np.sqrt(5.0) + neg = -(sqrt5 - 1.0) / 2.0 + pos = (sqrt5 + 1.0) / 2.0 + expected = np.array( + [ + [pos, neg, pos], + [neg, neg, pos], + [pos, pos, neg], + [neg, neg, pos], + [neg, pos, neg], + ] + ) + np.testing.assert_allclose(w, expected, atol=1e-14, rtol=1e-14) + + def test_value_pin_default_rng_42_webb(self): + """Byte-level pin of ``default_rng(42)`` Webb output.""" + rng = np.random.default_rng(42) + w = generate_bootstrap_weights_batch(5, 3, "webb", rng) + v3 = np.sqrt(3.0 / 2.0) + v1 = np.sqrt(1.0 / 2.0) + expected = np.array( + [ + [-v3, 1.0, v1], + [-v1, -v1, v3], + [-v3, 1.0, -1.0], + [-v3, v1, v3], + [1.0, 1.0, 1.0], + ] + ) + np.testing.assert_allclose(w, expected, atol=1e-14, rtol=1e-14) diff --git 
a/tests/test_rust_backend.py b/tests/test_rust_backend.py index 48d1f0e5..75271de7 100644 --- a/tests/test_rust_backend.py +++ b/tests/test_rust_backend.py @@ -43,62 +43,35 @@ def test_rust_backend_info(self): # Bootstrap Weight Tests # ========================================================================= - def test_bootstrap_weights_shape(self): - """Test bootstrap weights have correct shape.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - n_bootstrap, n_units = 100, 50 - weights = generate_bootstrap_weights_batch(n_bootstrap, n_units, "rademacher", 42) - assert weights.shape == (n_bootstrap, n_units) - - def test_rademacher_weights_values(self): - """Test Rademacher weights are +-1.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - unique_vals = np.unique(weights) - assert len(unique_vals) == 2 - assert set(unique_vals) == {-1.0, 1.0} - - def test_rademacher_weights_mean_zero(self): - """Test Rademacher weights have approximately zero mean.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(10000, 1, "rademacher", 42) - mean = weights.mean() - assert abs(mean) < 0.05, f"Rademacher mean should be ~0, got {mean}" - - def test_mammen_weights_mean_zero(self): - """Test Mammen weights have approximately zero mean.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(10000, 1, "mammen", 42) - mean = weights.mean() - assert abs(mean) < 0.05, f"Mammen mean should be ~0, got {mean}" - - def test_webb_weights_mean_zero(self): - """Test Webb weights have approximately zero mean.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights = generate_bootstrap_weights_batch(10000, 1, "webb", 42) - mean = weights.mean() - assert abs(mean) < 0.1, f"Webb mean should be ~0, got {mean}" - 
- def test_bootstrap_reproducibility(self): - """Test bootstrap weights are reproducible with same seed.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch - - weights1 = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - weights2 = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - np.testing.assert_array_equal(weights1, weights2) + # The direct-Rust multiplier-bootstrap weight tests were removed when the + # Rust `generate_bootstrap_weights_batch` PyO3 binding was deleted as part + # of the silent-failures audit follow-up. Python's numpy implementation is + # now the canonical (and only) path, so `seed` is backend-invariant across + # all six downstream multiplier-bootstrap estimators. Distributional + # coverage lives in + # `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions`. + + def test_rust_bootstrap_weights_batch_is_removed(self): + """Regression guard against accidental re-export of the deleted + `generate_bootstrap_weights_batch` PyO3 binding. The Rust RNG path + diverged from Python's numpy PCG64 under fixed seeds, producing + different multiplier-bootstrap SE / CI / p-values depending on + whether the Rust backend was compiled in. Removing the binding and + routing everything through the numpy helper closes that silent + failure. 
If this test fails, someone reintroduced the binding — + audit the reason before adding it back.""" + import diff_diff._rust_backend as rb - def test_bootstrap_different_seeds(self): - """Test different seeds produce different weights.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch + with pytest.raises(ImportError): + from diff_diff._rust_backend import ( # noqa: F401 + generate_bootstrap_weights_batch, + ) - weights1 = generate_bootstrap_weights_batch(100, 50, "rademacher", 42) - weights2 = generate_bootstrap_weights_batch(100, 50, "rademacher", 43) - assert not np.array_equal(weights1, weights2) + assert not hasattr(rb, "generate_bootstrap_weights_batch"), ( + "generate_bootstrap_weights_batch was removed from the Rust " + "backend to restore cross-backend RNG parity for multiplier " + "bootstrap; its presence here indicates accidental re-export." + ) # ========================================================================= # Synthetic Weight Tests @@ -664,55 +637,10 @@ def test_robust_vcov_clustered_match(self): # Bootstrap Weights Equivalence (Statistical Properties) # ========================================================================= - def test_bootstrap_weights_rademacher_properties(self): - """Test Rust Rademacher weights have correct statistical properties.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch as rust_fn - - # Generate large sample for statistical tests - n_bootstrap, n_units = 10000, 100 - weights = rust_fn(n_bootstrap, n_units, "rademacher", 42) - - # Rademacher: values are +-1, mean ~0, variance ~1 - unique_vals = np.unique(weights) - assert set(unique_vals) == {-1.0, 1.0}, "Rademacher weights should be +-1" - - mean = weights.mean() - assert abs(mean) < 0.02, f"Rademacher mean should be ~0, got {mean}" - - var = weights.var() - assert abs(var - 1.0) < 0.02, f"Rademacher variance should be ~1, got {var}" - - def test_bootstrap_weights_mammen_properties(self): - """Test Rust Mammen 
weights have correct statistical properties.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch as rust_fn - - n_bootstrap, n_units = 10000, 100 - weights = rust_fn(n_bootstrap, n_units, "mammen", 42) - - # Mammen: E[w] = 0, E[w^2] = 1, E[w^3] = 1 - mean = weights.mean() - assert abs(mean) < 0.02, f"Mammen mean should be ~0, got {mean}" - - second_moment = (weights ** 2).mean() - assert abs(second_moment - 1.0) < 0.02, f"Mammen E[w^2] should be ~1, got {second_moment}" - - third_moment = (weights ** 3).mean() - assert abs(third_moment - 1.0) < 0.1, f"Mammen E[w^3] should be ~1, got {third_moment}" - - def test_bootstrap_weights_webb_properties(self): - """Test Rust Webb weights have correct statistical properties.""" - from diff_diff._rust_backend import generate_bootstrap_weights_batch as rust_fn - - n_bootstrap, n_units = 10000, 100 - weights = rust_fn(n_bootstrap, n_units, "webb", 42) - - # Webb: 6-point distribution with E[w] = 0 - mean = weights.mean() - assert abs(mean) < 0.1, f"Webb mean should be ~0, got {mean}" - - # Should have 6 unique values - unique_vals = np.unique(weights.flatten()) - assert len(unique_vals) == 6, f"Webb should have 6 unique values, got {len(unique_vals)}" + # Moved to + # `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` + # after the Rust `generate_bootstrap_weights_batch` binding was removed + # and the Python numpy helper became the canonical path. # ========================================================================= # Synthetic Weights Equivalence From d58fa6497558d2ea3540e21d233bdcaee4e68f3c Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 24 Apr 2026 16:41:17 -0400 Subject: [PATCH 2/3] Address PR #362 CI review P3: docstring precision on rng contract The helper accepts any ``np.random.Generator``, not specifically PCG64; library callers happen to seed via ``np.random.default_rng``. 
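The rng contract this commit documents — output is a deterministic function of the supplied `np.random.Generator` state, whatever bit-generator engine backs it — can be sketched as follows. This is an illustrative sketch only: the function name, signature, and body here are hypothetical stand-ins, not the actual `diff_diff.bootstrap_utils` implementation; the Rademacher, Mammen (two-point), and Webb (six-point) distributions are the standard ones the removed tests checked moments for.

```python
import numpy as np


def multiplier_weights(rng: np.random.Generator, n_bootstrap: int,
                       n_units: int, weight_type: str = "rademacher") -> np.ndarray:
    """Sketch: (n_bootstrap, n_units) multiplier weights drawn purely from
    ``rng`` state, so identical Generator state gives identical output."""
    shape = (n_bootstrap, n_units)
    if weight_type == "rademacher":
        # +/-1 with equal probability: E[w] = 0, Var[w] = 1.
        return rng.choice(np.array([-1.0, 1.0]), size=shape)
    if weight_type == "mammen":
        # Mammen (1993) two-point distribution: E[w]=0, E[w^2]=1, E[w^3]=1.
        sqrt5 = np.sqrt(5.0)
        p_low = (sqrt5 + 1.0) / (2.0 * sqrt5)      # P(w = -(sqrt5 - 1)/2)
        low, high = -(sqrt5 - 1.0) / 2.0, (sqrt5 + 1.0) / 2.0
        return np.where(rng.random(shape) < p_low, low, high)
    if weight_type == "webb":
        # Webb (2014) six-point distribution: +/-sqrt(k/2) for k in {1,2,3},
        # each with probability 1/6; E[w] = 0, E[w^2] = 1.
        vals = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                         np.sqrt(0.5), 1.0, np.sqrt(1.5)])
        return rng.choice(vals, size=shape)
    raise ValueError(f"unknown weight_type: {weight_type!r}")


# Same seed -> same Generator state -> identical weights, engine-agnostic:
w1 = multiplier_weights(np.random.default_rng(42), 100, 50, "mammen")
w2 = multiplier_weights(np.random.default_rng(42), 100, 50, "mammen")
assert np.array_equal(w1, w2)
```

Because the function never seeds its own engine, swapping `default_rng` (PCG64) for any other `np.random.Generator` changes the draws but not the determinism guarantee — which is exactly the invariance the docstring edit above is careful to attribute to the generator state rather than to PCG64 specifically.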
Clarifies that backend invariance is with respect to the supplied generator state, not a specific engine. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/bootstrap_utils.py | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/diff_diff/bootstrap_utils.py b/diff_diff/bootstrap_utils.py index 8023e35b..4ffa589e 100644 --- a/diff_diff/bootstrap_utils.py +++ b/diff_diff/bootstrap_utils.py @@ -166,10 +166,11 @@ def generate_bootstrap_weights_batch( """ Generate all bootstrap weights at once (vectorized). - Runs under the numpy PCG64 RNG exclusively, so the output is a - deterministic function of ``rng`` state regardless of whether the - Rust backend is available. A single ``seed`` therefore produces - identical multiplier-bootstrap weights in both installs. + Output is a deterministic function of the supplied ``rng`` state, + so a single ``seed`` produces identical multiplier-bootstrap weights + regardless of whether the Rust backend is compiled in. (Library + call sites seed via ``np.random.default_rng`` — PCG64 — but any + ``np.random.Generator`` is accepted.) Parameters ---------- From 9af7199787dcd2ba578bbf75015dd992e46b52ef Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 24 Apr 2026 17:40:28 -0400 Subject: [PATCH 3/3] Fix CI: collapse dCDH bootstrap-baseline to single backend-invariant value MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The PR's RNG unification is working as intended — Rust now produces the same bootstrap SE as pure-Python under the same seed — but the pinned `TestBootstrapCellPeriod::test_bootstrap_se_matches_pre_pr4_baseline` guard held two separate per-backend baselines from the pre-RNG-unified era. The Rust backend now produces exactly the value that was formerly pinned as the Python-only baseline (0.3030802540369796), so the split is obsolete. 
Collapse the two constants into one, drop the HAS_RUST_BACKEND branch + os env-var check, and document the convergence in the comment. The regression guard's semantic is unchanged: under PSU=group the dispatcher still routes through the legacy group-level bootstrap path, and the assertion is still ULP-precision bit-identity against the captured baseline. Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/test_survey_dcdh.py | 47 ++++++++++++++------------------------- 1 file changed, 17 insertions(+), 30 deletions(-) diff --git a/tests/test_survey_dcdh.py b/tests/test_survey_dcdh.py index 69b5a400..11769b7c 100644 --- a/tests/test_survey_dcdh.py +++ b/tests/test_survey_dcdh.py @@ -1,6 +1,5 @@ """Survey support tests for ChaisemartinDHaultfoeuille (dCDH).""" -import os from typing import Optional import numpy as np @@ -1815,17 +1814,16 @@ class TestBootstrapCellPeriod: """ # Captured on pre-PR-4 code (origin/main at SHA ac181b7f — PR #329 - # merge) via a scratch fit on the fixture below. Pinned here as - # the bit-identity regression guard for the dispatcher's - # PSU-within-group-constant legacy-path routing. If this test - # drifts, the dispatcher is no longer reproducing pre-PR-4 - # behavior under PSU=group and the legacy fast path has - # regressed. Values differ between Rust and pure-Python backends - # because `_generate_bootstrap_weights_batch` and `solve_ols` - # take different numeric paths; bit-identity still holds within - # each backend. - _BASELINE_OVERALL_SE_RUST = 0.30560839419979546 - _BASELINE_OVERALL_SE_PYTHON = 0.3030802540369796 + # merge) via a scratch fit on the fixture below, on the pure-Python + # backend. Pinned here as the bit-identity regression guard for the + # dispatcher's PSU-within-group-constant legacy-path routing: if + # this test drifts, the dispatcher is no longer reproducing pre-PR-4 + # behavior under PSU=group and the legacy fast path has regressed. 
+ # Both backends now converge on this value — PR #362 unified the + # multiplier-weight RNG on the numpy PCG64 path (previously Rust + # seeded Xoshiro per-row, producing a different Rust baseline of + # 0.30560839419979546 under the same fixture). + _BASELINE_OVERALL_SE = 0.3030802540369796 @staticmethod def _make_baseline_fixture() -> pd.DataFrame: @@ -1861,21 +1859,11 @@ def _make_baseline_fixture() -> pd.DataFrame: def test_bootstrap_se_matches_pre_pr4_baseline(self): """Bit-identity regression guard: under PSU=group the dispatcher routes through the legacy group-level bootstrap - path, so the overall bootstrap SE must match pre-PR-4 code - to ULP precision. The baseline values were captured on - `origin/main` at `ac181b7f` (the PR #329 merge) under each - backend independently. + path, so the overall bootstrap SE must match the pre-PR-4 + pure-Python baseline to ULP precision under either backend + (PR #362 unified the multiplier-weight RNG on numpy PCG64, so + Rust and Python now produce the same value). """ - from diff_diff._backend import HAS_RUST_BACKEND - pure_python = ( - os.environ.get("DIFF_DIFF_BACKEND", "auto").lower() == "python" - or not HAS_RUST_BACKEND - ) - expected = ( - self._BASELINE_OVERALL_SE_PYTHON - if pure_python - else self._BASELINE_OVERALL_SE_RUST - ) df_ = self._make_baseline_fixture() sd = SurveyDesign(weights="pw", psu="group") res = ChaisemartinDHaultfoeuille(n_bootstrap=500, seed=42).fit( @@ -1885,12 +1873,11 @@ def test_bootstrap_se_matches_pre_pr4_baseline(self): ) assert res.bootstrap_results is not None observed_se = float(res.bootstrap_results.overall_se) - backend_label = "pure-python" if pure_python else "rust" assert observed_se == pytest.approx( - expected, rel=0.0, abs=1e-15, + self._BASELINE_OVERALL_SE, rel=0.0, abs=1e-15, ), ( - f"Bootstrap SE drifted from pre-PR-4 baseline " - f"(backend={backend_label}). expected={expected!r}, " + f"Bootstrap SE drifted from pre-PR-4 baseline. 
" + f"expected={self._BASELINE_OVERALL_SE!r}, " f"observed={observed_se!r}. The dispatcher's " f"PSU-within-group-constant routing is no longer " f"bit-identical to the legacy group-level bootstrap."