Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1. Removed the `if self._precomputed is not None:` branch that silently substituted `self._precomputed["Y"]` / `["D"]` / `["time_dist_matrix"]` (original-panel cache populated during main fit) for the function-argument `Y, D`. Under bootstrap, `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled `boot_data` and passes them in; the helper was discarding those and recomputing unit distances from the original panel, so Python's local bootstrap resampled units but reused stale unit-distance weights. Rust's bootstrap was already correct (always consumed `y_boot, d_boot`).
2. Removed the `valid_control_at_t = D[t, :] == 0` target-period donor gate that zeroed `ω_j` for any unit `j` treated at the target period (other than the target unit itself). Per REGISTRY Eq. 2/3 and Rust's `compute_weight_matrix`, `ω_j = exp(-λ_unit × dist(j, i))` for all `j ≠ i`; treated-cell exclusion happens via the `(1 − W_{js})` factor applied inside `_estimate_model`. Same-cohort donors now contribute via their pre-treatment rows. Empirically the main-fit ATT is unchanged on tested fixtures because same-cohort pre-treatment observations are exactly absorbed by their own unit fixed effect `alpha_j` without propagating into `mu`, `beta`, or other units' parameters — so this change is structural alignment rather than a numerical shift in output. Users on same-cohort panels with very few controls may still see tiny differences in edge cases; the new `test_local_method_same_cohort_donor_parity` regression guards the aligned behavior.
Together with the normalization fix above, TROP local-method backend parity on the main-fit ATT is regime-dependent: `atol=rtol=1e-14` for `lambda_nn=inf` (no nuclear-norm regularization, uniform weight scaling leaves the WLS argmin invariant) and `atol=1e-10` for finite `lambda_nn` (FISTA inner loop + BLAS reduction ordering introduce sub-1e-10 roundoff across Rust `faer` vs numpy paths). Bootstrap SE parity is asserted at `atol=1e-5` to accommodate ~1e-7 roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq` that accumulates across per-replicate fits; sub-1e-14 bootstrap parity is tracked as a follow-up in `TODO.md` under "unify Rust local-method solver path". Closes silent-failures audit finding #23 (local-method half; the RNG half closed in PR #354 and the grid-search half in PR #348).
- **Multiplier-bootstrap weight generation (`generate_bootstrap_weights_batch`) no longer diverges across backends under a fixed `seed`.** Previously the Rust helper in `rust/src/bootstrap.rs` seeded `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row while the Python fallback consumed `numpy.random.default_rng` (PCG64), so the same `seed` produced different Rademacher/Mammen/Webb multiplier weights depending on whether the Rust backend was compiled in — and therefore different bootstrap SE / CI / p-values in the six downstream estimators routed through `diff_diff.bootstrap_utils.generate_bootstrap_weights_batch`: `CallawaySantAnna` (non-survey multiplier), `ContinuousDiD`, `ImputationDiD`, `TwoStageDiD`, `ChaisemartinDHaultfoeuille` (multi-horizon PSU + group-level + PSU-broadcast multiplier bootstrap), and `EfficientDiD` (cluster + unit multiplier bootstrap). Fixed by making the Python numpy path canonical and removing the Rust batch-weight helper entirely (including the `rand` and `rand_xoshiro` Cargo dependencies and the `rust/src/bootstrap.rs` module) — the Rust function only generated weights, so routing weights through Python-side `default_rng` left nothing for Rust to do. Sampling distributions (Rademacher, Mammen, Webb) and numerical identities are unchanged. **User-visible effect**: users who were previously running with the Rust backend compiled in will see different-but-equally-valid multiplier-bootstrap SE / CI / p-values under the same `seed` after this change (PCG64 replaces Xoshiro); pure-Python-backend users are unaffected. Bootstrap runtime may regress single-digit percent on very wide bootstraps (`n_units > 10000`) where Rust's rayon parallelism was previously providing a modest weight-generation speedup; sub-second at typical sizes (`n_bootstrap=999, n_units ≤ 1000`). Regression guard `tests/test_rust_backend.py::TestRustBackend::test_rust_bootstrap_weights_batch_is_removed` asserts the binding is not reintroduced; distributional coverage moves to `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions` (11 tests including seed-pinned byte-level baselines for each of the three distributions). Closes the silent-failures audit TODO row for the Rust multiplier-weight RNG.

### Changed
- **`did_had_pretest_workflow(aggregate="event_study")` verdict no longer emits the "paper step 2 deferred to Phase 3 follow-up" caveat** — the joint pre-trends Stute test closes that gap. The two-period `aggregate="overall"` path retains the existing caveat since the joint variant does not apply to single-pre-period panels. Downstream code that greps verdict strings for the Phase 3 caveat will see it suppressed on the event-study path.
Expand Down
1 change: 0 additions & 1 deletion TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,6 @@ Deferred items from PR reviews that were not addressed before merge.
| Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
| `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. | `honest_did.py` | #334 | Low |
| Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low |
| Rust multiplier-bootstrap weight RNG (`generate_bootstrap_weights_batch` in `rust/src/bootstrap.rs:9-10, 57-75`) uses `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row for Rademacher/Mammen/Webb generation. If any Python caller (SDID / efficient-DiD multiplier bootstrap) has a numpy-canonical equivalent, the two backends likely diverge under the same seed. Audit Python callers (`diff_diff/sdid.py`, `diff_diff/efficient_did_bootstrap.py`, `diff_diff/bootstrap_utils.py::generate_bootstrap_weights_batch_numpy`) for parity-test gaps. Same fix shape as TROP RNG parity (PR #354): pre-generate weights in Python via numpy and pass them to Rust through PyO3. | `rust/src/bootstrap.rs`, `diff_diff/bootstrap_utils.py` | follow-up | Medium |
| `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
| `bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). | `diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Medium |
| `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium |
Expand Down
1 change: 0 additions & 1 deletion diff_diff/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@
# Import backend detection from dedicated module (avoids circular imports)
from diff_diff._backend import (
HAS_RUST_BACKEND,
_rust_bootstrap_weights,
_rust_compute_robust_vcov,
_rust_project_simplex,
_rust_solve_ols,
Expand Down
4 changes: 0 additions & 4 deletions diff_diff/_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@
# Try to import Rust backend for accelerated operations
try:
from diff_diff._rust_backend import (
generate_bootstrap_weights_batch as _rust_bootstrap_weights,
project_simplex as _rust_project_simplex,
solve_ols as _rust_solve_ols,
compute_robust_vcov as _rust_compute_robust_vcov,
Expand All @@ -44,7 +43,6 @@
_rust_available = True
except ImportError:
_rust_available = False
_rust_bootstrap_weights = None
_rust_project_simplex = None
_rust_solve_ols = None
_rust_compute_robust_vcov = None
Expand All @@ -69,7 +67,6 @@
if _backend_env == "python":
# Force pure Python mode - disable Rust even if available
HAS_RUST_BACKEND = False
_rust_bootstrap_weights = None
_rust_project_simplex = None
_rust_solve_ols = None
_rust_compute_robust_vcov = None
Expand Down Expand Up @@ -120,7 +117,6 @@ def rust_backend_info():
__all__ = [
"HAS_RUST_BACKEND",
"rust_backend_info",
"_rust_bootstrap_weights",
"_rust_project_simplex",
"_rust_solve_ols",
"_rust_compute_robust_vcov",
Expand Down
44 changes: 10 additions & 34 deletions diff_diff/bootstrap_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,6 @@

import numpy as np

from diff_diff._backend import HAS_RUST_BACKEND, _rust_bootstrap_weights

__all__ = [
"generate_bootstrap_weights",
"generate_bootstrap_weights_batch",
Expand Down Expand Up @@ -168,38 +166,11 @@ def generate_bootstrap_weights_batch(
"""
Generate all bootstrap weights at once (vectorized).

Uses Rust backend if available for parallel generation.

Parameters
----------
n_bootstrap : int
Number of bootstrap iterations.
n_units : int
Number of units (clusters) to generate weights for.
weight_type : str
Type of weights: "rademacher", "mammen", or "webb".
rng : np.random.Generator
Random number generator.

Returns
-------
np.ndarray
Array of bootstrap weights with shape (n_bootstrap, n_units).
"""
if HAS_RUST_BACKEND and _rust_bootstrap_weights is not None:
seed = rng.integers(0, 2**63 - 1)
return _rust_bootstrap_weights(n_bootstrap, n_units, weight_type, seed)
return generate_bootstrap_weights_batch_numpy(n_bootstrap, n_units, weight_type, rng)


def generate_bootstrap_weights_batch_numpy(
n_bootstrap: int,
n_units: int,
weight_type: str,
rng: np.random.Generator,
) -> np.ndarray:
"""
NumPy fallback implementation of :func:`generate_bootstrap_weights_batch`.
Output is a deterministic function of the supplied ``rng`` state,
so a single ``seed`` produces identical multiplier-bootstrap weights
regardless of whether the Rust backend is compiled in. (Library
call sites seed via ``np.random.default_rng`` — PCG64 — but any
``np.random.Generator`` is accepted.)

Parameters
----------
Expand Down Expand Up @@ -243,6 +214,11 @@ def generate_bootstrap_weights_batch_numpy(
)


# Alias kept for backward compatibility with callers that imported the
# explicit numpy variant; both names resolve to the same implementation.
generate_bootstrap_weights_batch_numpy = generate_bootstrap_weights_batch


def compute_percentile_ci(
boot_dist: np.ndarray,
alpha: float,
Expand Down
2 changes: 0 additions & 2 deletions rust/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,6 @@ openblas = ["ndarray/blas"]
pyo3 = "0.28"
numpy = "0.28"
ndarray = { version = "0.17", features = ["rayon"] }
rand = "0.8"
rand_xoshiro = "0.6"
rayon = "1.8"

# Pure Rust linear algebra for SVD/matrix inversion (no external deps).
Expand Down
Loading
Loading