Unify Rust TROP inner solver to SVD (close finding #23 grid-search divergence)#348
Merged
Unify Rust TROP inner solver to SVD (close finding #23 grid-search divergence)#348
Conversation
…vergence Closes the grid-search half of silent-failures finding #23 (TODO row 87). The `xfail(strict=True)` regression `test_grid_search_rank_deficient_Y` baselined a ~6% ATT divergence between Rust and Python on two near-parallel control units. Root cause: Rust's `solve_joint_no_lowrank` used iterative block coordinate descent (50 iter, tol=1e-8) while Python used SVD-based minimum-norm least squares. On rank-deficient Y the two solvers converge to different stationary points of the same objective. Python is canonical (SVD / minimum-norm least squares per Golub & Van Loan). Rust's iterative solver was a speed optimization, not a methodology choice. Port the Rust inner TWFE step to SVD-based WLS that mirrors Python's `np.linalg.lstsq(rcond=None)` step-for-step, with numpy-compatible `rcond = eps * max(n, k)`. Changes - rust/src/linalg.rs: promote ndarray_to_faer to pub(crate) so trop.rs can reuse it. - rust/src/trop.rs: new module-private solve_wls_svd helper — thin-SVD + rcond truncation, matches numpy's minimum-norm semantics. Rewrite solve_joint_no_lowrank body to flatten y/weights row-major, build the [intercept | unit_dummies[1..] | time_dummies[1..]] design matrix, apply sqrt-weights, and solve via solve_wls_svd. Function signature unchanged — all 4 call sites (LOOCV, FISTA TWFE step x2, bootstrap) benefit transitively. - tests/test_rust_backend.py: remove @pytest.mark.xfail from test_grid_search_rank_deficient_Y; the gap is closed. Bootstrap-seed test retains its xfail (row 87 RNG mismatch, out of scope). - docs/methodology/REGISTRY.md: update the TROP Global Estimation bullet at the existing `np.linalg.lstsq` line to note Rust and Python now both use SVD-based minimum-norm WLS with numpy-compatible rcond. - TODO.md: delete row 87 (grid-search divergence entry). Verification - maturin develop --release --features accelerate: clean build, no warnings. - pytest tests/test_rust_backend.py::TestTROPRustEdgeCaseParity: grid-search test now passes; bootstrap-seed test correctly xfails. - pytest tests/test_rust_backend.py -k TROP -m '': 23 passed, 1 xfailed, no regressions. - pytest tests/test_trop.py: 83 passed, 37 deselected (slow). - TestTROPGlobalRustVsNumpy (incl. lambda_nn=0 low-rank FISTA path): 8 passed — FISTA TWFE step unchanged in behavior on well-conditioned data. - grep for other 'for _ in 0..50' coordinate-descent patterns in rust/src/*.rs: none found. Non-goals - No changes to row 87 (bootstrap RNG mismatch — Rust rand crate vs numpy default_rng ~28% SE gap on seed=42). Separate PR. - No changes to linalg.rs::solve_ols (rcond=1e-7 is load-bearing for MultiPeriodDiD / DiD / TWFE). - No public API changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…t docstring After this PR deletes the old row 87 from TODO.md, row 87 now points to a different item. Replace the row-number breadcrumb with "Silent-failures audit Finding #23 (grid-search half)" which is stable across future TODO.md reshuffles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
igerber
added a commit
that referenced
this pull request
Apr 24, 2026
Rust and Python TROP backends produced different bootstrap standard errors for the same `seed` value. On a tiny correlated panel under `seed=42` the gap was ~28% of SE: Rust seeded `rand_xoshiro:: Xoshiro256PlusPlus` per replicate while Python's fallback consumed `numpy.random.default_rng` (PCG64), so identical seeds mapped to different bytestreams. Canonicalize on numpy. New `stratified_bootstrap_indices` helper in `diff_diff/bootstrap_utils.py` pre-generates per-replicate (control, treated) positional index arrays from a numpy `Generator` and hands them to both backends through the PyO3 surface — both Rust bootstrap functions (`bootstrap_trop_variance_global`, `bootstrap_trop_variance`) now accept `control_indices` and `treated_indices` as `i64` arrays in place of `seed: u64`. Parallelism is preserved. Sampling law (stratified: controls then treated, with replacement) is unchanged. Global-method SE is now backend-invariant under the same seed to machine precision: the prior `xfail(strict=True)` in `test_bootstrap_seed_reproducibility` is flipped to a passing `assert_allclose(atol=rtol=1e-14)` and parametrized over `[0, 42, 12345]`. A companion `test_bootstrap_seed_reproducibility_local` is added for the local-method bootstrap. It is currently `xfail(strict=True)` because aligning the RNG exposed two separate local-method backend divergences beyond this PR's scope: Rust's `compute_weight_matrix` normalizes time and unit weights to sum to 1, while Python's `_compute_observation_weights` does not; and the Python fallback's `_compute_observation_weights(_precomputed branch)` reads the original-panel cached `Y`/`D` instead of the bootstrap-sample arguments. Both are tracked as follow-up rows in `TODO.md` with file:line pointers and will land in a separate methodology PR. Closes the bootstrap half of silent-failures audit finding #23 (the grid-search half closed in PR #348). Reference: Athey, Imbens, Qu & Viviano (2025), "Triply Robust Panel Estimators", Algorithm 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 24, 2026
…+ Python cache-fallthrough Closes the local-method half of silent-failures audit finding #23 (RNG half closed in PR #354; grid-search half in PR #348). Two methodology fixes, both isolated to the local-method path — global is unaffected. 1. Rust weight-matrix normalization removed ------------------------------------------ `rust/src/trop.rs::compute_weight_matrix` no longer divides `time_weights` and `unit_weights` by their respective sums before the outer product. The paper's Equation 2/3 (Athey, Imbens, Qu, Viviano 2025) and REGISTRY.md Requirements checklist (`[x] Unit weights: exp(-λ_unit × distance) (unnormalized, matching Eq. 2)`) both specify raw-exponential weights; Python's `_compute_observation_weights` was already REGISTRY-compliant. Rust's normalization inflated the effective nuclear-norm penalty relative to the data-fit term, changing the regularization trade-off. User-visible effect: Rust local-method ATT values may shift for fits with `lambda_nn < infinity`. For `lambda_nn = infinity` (factor model disabled) outputs are unchanged — uniform weight scaling leaves the minimum-norm WLS argmin invariant. Rust LOOCV-selected lambdas may also shift on that boundary; both backends now converge on the same selection. Affects both local-method Rust call sites (LOOCV at trop.rs:459, bootstrap at trop.rs:1096). 2. Python `_compute_observation_weights` cache-fallthrough removed --------------------------------------------------------------- Removed the `if self._precomputed is not None:` branch that silently substituted `self._precomputed["Y"]` / `["D"]` / `["time_dist_matrix"]` (original-panel cache populated during main fit) for the function-argument `Y, D`. Under bootstrap, `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled `boot_data` and passes them in; the helper was discarding those and recomputing unit distances from the original panel, so Python's local bootstrap resampled units but reused stale unit-distance weights. Rust's bootstrap was already correct (always consumed `y_boot, d_boot`). Test changes ------------ - `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity:: test_bootstrap_seed_reproducibility_local`: flipped from `xfail(strict=True)` to passing `assert_allclose` at `atol=1e-5` across seeds `[0, 42, 12345]`. Residual ~1e-7 gap is Rust `estimate_model` vs numpy `lstsq` roundoff that accumulates differently across per-replicate bootstrap fits; follow-up TODO row tracks unifying Rust to the `solve_wls_svd` path (same SVD helper the global-method uses since PR #348) for sub-1e-14 parity. - New `test_local_method_main_fit_parity`: parametrized over `(lambda_nn=inf, atol=1e-14)` and `(lambda_nn=0.1, atol=1e-10)`; asserts `atol=1e-14` bit-identity for the main-fit ATT at `lambda_nn=inf` (the regression guard for the normalization fix) and `atol=1e-10` for the finite-`lambda_nn` FISTA path. Verification ------------ Targeted regression sweep — all green: - 9 `TestTROPRustEdgeCaseParity` tests (grid-search + global bootstrap × 3 seeds + local bootstrap × 3 seeds + local main-fit × 2 regimes) - Full `test_rust_backend.py` suite: 92 passed - Full `test_trop.py` suite under Rust backend: 120 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 24, 2026
P3 — the PR #354 [Unreleased] Fixed entry (line 21) said local-method bit-identity SE remained blocked by the Rust-normalization and Python cache-fallthrough divergences and was "tracked as a follow-up in TODO.md." With the two TROP-local Fixed entries that this PR adds (lines 22-27) closing exactly those divergences, the PR #354 tail sentence is now internally inconsistent with the surrounding entries. Rewritten to say the RNG half of finding #23 is closed here (bootstrap contract), grid-search half was closed in PR #348, and the local- method methodology half is closed by the two Fixed entries that follow in the same release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
rust/src/trop.rs::solve_joint_no_lowrank) from iterative block coordinate descent to SVD-based minimum-norm weighted least squares, mirroring Python'snp.linalg.lstsq(rcond=None)step-for-step with numpy-compatiblercond = eps * max(n, k).test_grid_search_rank_deficient_Yis no longer xfailed.randvs numpydefault_rng) is a separate root cause — tracked inTODO.mdrow 87 for a future PR.Methodology references (required if estimator / math changes)
REGISTRY.mdTROP Global Estimation bullet updated in place (near line 2061) to spell out that both backends now use SVD-based minimum-norm WLS with numpy-compatible rcond.Validation
tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Yxfail decorator removed; test now asserts Rust/Python parity atatol=1e-6on rank-deficient Y.maturin develop --release --features accelerate: clean build, no warnings.pytest tests/test_rust_backend.py::TestTROPRustEdgeCaseParity: 1 passed (grid-search), 1 xfailed (bootstrap-seed, out of scope).pytest tests/test_rust_backend.py -k TROP -m '': 23 passed, 1 xfailed, no regressions.pytest tests/test_trop.py: 83 passed, 37 deselected (slow-marked).TestTROPGlobalRustVsNumpy(includeslambda_nn=0low-rank FISTA path): 8 passed. FISTA TWFE step behavior preserved on well-conditioned data.for _ in 0..50inrust/src/*.rs: no other iterative coordinate-descent patterns that would need similar treatment.Security / privacy
Diff: +148 / −96 across
rust/src/trop.rs,rust/src/linalg.rs,tests/test_rust_backend.py,TODO.md,docs/methodology/REGISTRY.md.