diff --git a/TODO.md b/TODO.md
index 5893cffa..94612417 100644
--- a/TODO.md
+++ b/TODO.md
@@ -83,7 +83,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
 | Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
 | `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys)` / `except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. | `honest_did.py` | #334 | Low |
-| TROP Rust vs Python grid-search divergence on rank-deficient Y: on two near-parallel control units, LOOCV grid-search ATT diverges ~6% between Rust (`trop_global.py:688`) and Python fallback (`trop_global.py:753`). Either grid-winner ties are broken differently or the per-λ solver reaches different stationary points under rank deficiency. Audit finding #23 flagged this surface. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Y` baselines the gap. | `trop_global.py`, `rust/` | follow-up | Medium |
 | TROP Rust vs Python bootstrap SE divergence under fixed seed: `seed=42` on a tiny panel produces ~28% bootstrap-SE gap. Root cause: Rust bootstrap uses its own RNG (`rand` crate) while Python uses `numpy.random.default_rng`; the same seed value maps to different bytestreams across backends. Audit axis-H (RNG/seed) adjacent. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility` baselines the gap. Unifying RNG (threading a numpy-generated seed-sequence into Rust, or porting Python to ChaCha) would close it. | `trop_global.py`, `rust/` | follow-up | Medium |
 | `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
 | `bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). | `diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Medium |
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index dcc1e209..4db62fb4 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2059,6 +2059,10 @@ Treatment effects are **heterogeneous** per-observation values. ATT is their mean.
 
 1. **Without low-rank (λ_nn = ∞)**: Standard weighted least squares
    - Build design matrix with unit/time dummies (no treatment indicator)
    - Solve via np.linalg.lstsq for (μ, α, β) using (1-W)-masked weights
+   - Both the Python fallback and the Rust acceleration path use SVD-based
+     minimum-norm least squares with numpy-compatible rcond = eps × max(n, k),
+     so they return the canonical minimum-norm solution on rank-deficient Y
+     (e.g., two near-parallel control units)
 2. **With low-rank (finite λ_nn)**: Alternating minimization
    - Alternate between:
diff --git a/rust/src/linalg.rs b/rust/src/linalg.rs
index 8d4c5854..002b2b2f 100644
--- a/rust/src/linalg.rs
+++ b/rust/src/linalg.rs
@@ -277,7 +277,7 @@ fn compute_robust_vcov_internal(
 }
 
 /// Convert ndarray Array2 to faer Mat
-fn ndarray_to_faer(arr: &Array2<f64>) -> faer::Mat<f64> {
+pub(crate) fn ndarray_to_faer(arr: &Array2<f64>) -> faer::Mat<f64> {
     let nrows = arr.nrows();
     let ncols = arr.ncols();
     faer::Mat::from_fn(nrows, ncols, |i, j| arr[[i, j]])
diff --git a/rust/src/trop.rs b/rust/src/trop.rs
index 4257d4a8..b1f52e5c 100644
--- a/rust/src/trop.rs
+++ b/rust/src/trop.rs
@@ -14,6 +14,8 @@ use numpy::{PyArray1, PyArray2, PyReadonlyArray1, PyReadonlyArray2, ToPyArray};
 use pyo3::prelude::*;
 use rayon::prelude::*;
 
+use crate::linalg::ndarray_to_faer;
+
 /// Minimum chunk size for parallel distance computation.
 /// Reduces scheduling overhead for small matrices.
 const MIN_CHUNK_SIZE: usize = 16;
@@ -1233,111 +1235,164 @@ fn solve_joint_no_lowrank(
     y: &ArrayView2<f64>,
     delta: &ArrayView2<f64>,
 ) -> Option<(f64, Array1<f64>, Array1<f64>)> {
+    // SVD-based minimum-norm weighted least-squares fit — mirrors Python's
+    // `_solve_global_no_lowrank` at `diff_diff/trop_global.py:340-412`
+    // step-for-step so Rust and Python paths produce the same canonical
+    // solution on rank-deficient Y (silent-failures finding #23).
+    //
+    // Model: Y_it = μ + α_i + β_t + ε_it, with α_0 = β_0 = 0 for
+    // identification. Weights: δ_it. Flatten row-major with
+    // idx = t * n_units + i (matches Python's Y.flatten() C-order).
     let n_periods = y.nrows();
     let n_units = y.ncols();
+    let n_obs = n_periods * n_units;
+    let n_params = 1 + (n_units - 1) + (n_periods - 1);
 
-    // We solve using normal equations with the design matrix structure
-    // Rather than build full X matrix, use block structure for efficiency
-    //
-    // The model: Y_it = μ + α_i + β_t + ε_it
-    // With identification: α_0 = β_0 = 0
-
-    // Compute weighted sums needed for normal equations
+    // Flatten y + weights with NaN masking — matches trop_global.py:354-360.
+    let mut y_flat = Array1::<f64>::zeros(n_obs);
+    let mut w_flat = Array1::<f64>::zeros(n_obs);
     let mut sum_w = 0.0;
-    let mut sum_wy = 0.0;
-
-    // Per-unit and per-period weighted sums
-    let mut sum_w_by_unit = Array1::<f64>::zeros(n_units);
-    let mut sum_wy_by_unit = Array1::<f64>::zeros(n_units);
-    let mut sum_w_by_period = Array1::<f64>::zeros(n_periods);
-    let mut sum_wy_by_period = Array1::<f64>::zeros(n_periods);
-
     for t in 0..n_periods {
         for i in 0..n_units {
-            // NaN outcomes get zero weight (not imputed to 0.0 with active weight)
-            let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
-            let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };
-
+            let idx = t * n_units + i;
+            let y_ti = y[[t, i]];
+            let w_ti = delta[[t, i]];
+            let valid = y_ti.is_finite() && w_ti.is_finite();
+            let w = if valid { w_ti.max(0.0) } else { 0.0 };
+            y_flat[idx] = if valid { y_ti } else { 0.0 };
+            w_flat[idx] = w;
             sum_w += w;
-            sum_wy += w * y_ti;
-
-            sum_w_by_unit[i] += w;
-            sum_wy_by_unit[i] += w * y_ti;
-            sum_w_by_period[t] += w;
-            sum_wy_by_period[t] += w * y_ti;
         }
     }
 
+    // All-zero weights short-circuit — matches trop_global.py:366.
     if sum_w < 1e-10 {
         return None;
     }
 
-    // Use iterative approach: alternate between (alpha, beta) and mu
-    // until convergence (simpler than full normal equations)
-    let mut mu = sum_wy / sum_w;
+    // Build design matrix X = [intercept | unit_dummies[1..] | time_dummies[1..]]
+    // — matches trop_global.py:374-385. Explicit nested loops so the
+    // index correspondence with Python is unambiguous.
+    let mut x = Array2::<f64>::zeros((n_obs, n_params));
+    for t in 0..n_periods {
+        for i in 0..n_units {
+            let idx = t * n_units + i;
+            x[[idx, 0]] = 1.0; // intercept
+            if i >= 1 {
+                x[[idx, i]] = 1.0; // unit dummy (unit 0 dropped)
+            }
+            if t >= 1 {
+                x[[idx, (n_units - 1) + t]] = 1.0; // time dummy (period 0 dropped)
+            }
+        }
+    }
+
+    // Apply sqrt-weights: X_w = X * sqrt(w)[:, None], y_w = y * sqrt(w).
+    // Matches trop_global.py:387-389.
+    let sqrt_w: Array1<f64> = w_flat.mapv(|w| w.sqrt());
+    for r in 0..n_obs {
+        let s = sqrt_w[r];
+        for c in 0..n_params {
+            x[[r, c]] *= s;
+        }
+        y_flat[r] *= s;
+    }
+
+    // Solve via SVD with numpy-compatible rcond truncation.
+    let coeffs = solve_wls_svd(&x.view(), &y_flat.view())?;
+
+    // Unpack: mu = coeffs[0], alpha[1..] = coeffs[1..n_units],
+    // beta[1..] = coeffs[n_units..]. Matches trop_global.py:406-410.
+    let mu = coeffs[0];
     let mut alpha = Array1::<f64>::zeros(n_units);
+    for i in 1..n_units {
+        alpha[i] = coeffs[i];
+    }
     let mut beta = Array1::<f64>::zeros(n_periods);
+    for t in 1..n_periods {
+        beta[t] = coeffs[(n_units - 1) + t];
+    }
 
-    for _ in 0..50 {
-        let mu_old = mu;
-        let alpha_old = alpha.clone();
-        let beta_old = beta.clone();
+    Some((mu, alpha, beta))
+}
-
-        // Update alpha (fixing beta, mu)
-        for i in 1..n_units { // α_0 = 0 for identification
-            if sum_w_by_unit[i] > 1e-10 {
-                let mut num = 0.0;
-                for t in 0..n_periods {
-                    // NaN outcomes get zero weight
-                    let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
-                    let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };
-                    num += w * (y_ti - mu - beta[t]);
-                }
-                alpha[i] = num / sum_w_by_unit[i];
-            }
-        }
+/// Minimum-norm least-squares solution via faer thin SVD with rcond truncation.
+///
+/// Mirrors `np.linalg.lstsq(A, b, rcond=None)` from numpy: singular values
+/// below `rcond * max(S)` with `rcond = eps * max(n_rows, n_cols)` are
+/// treated as zero. On rank-deficient A this returns the unique
+/// minimum-norm least-squares solution.
+///
+/// This helper intentionally does NOT reuse `rust/src/linalg.rs::solve_ols`
+/// because `solve_ols` hard-codes `rcond = 1e-7` (R's `lm()` default), which
+/// would truncate singular values that numpy's default keeps. TROP's
+/// canonical Python path is numpy-compatible; Rust must match.
+///
+/// Returns `None` only when the SVD itself fails (rare on finite inputs);
+/// the caller (LOOCV / FISTA / bootstrap) interprets `None` as an
+/// unsuccessful fit.
+fn solve_wls_svd(a: &ArrayView2<f64>, b: &ArrayView1<f64>) -> Option<Array1<f64>> {
+    let n_rows = a.nrows();
+    let n_cols = a.ncols();
+    let a_owned = a.to_owned();
+    let b_owned = b.to_owned();
 
-        // Update beta (fixing alpha, mu)
-        for t in 1..n_periods { // β_0 = 0 for identification
-            if sum_w_by_period[t] > 1e-10 {
-                let mut num = 0.0;
-                for i in 0..n_units {
-                    // NaN outcomes get zero weight
-                    let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
-                    let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };
-                    num += w * (y_ti - mu - alpha[i]);
-                }
-                beta[t] = num / sum_w_by_period[t];
-            }
+    // Convert ndarray -> faer for SVD.
+    let a_faer = ndarray_to_faer(&a_owned);
+
+    let svd = a_faer.thin_svd().ok()?;
+
+    let u_faer = svd.U();
+    let s_diag = svd.S();
+    let s_col = s_diag.column_vector();
+    let v_faer = svd.V();
+
+    // Extract U (n_rows x min(n,k)) back to ndarray.
+    let u_rows = u_faer.nrows();
+    let u_cols = u_faer.ncols();
+    let mut u = Array2::<f64>::zeros((u_rows, u_cols));
+    for i in 0..u_rows {
+        for j in 0..u_cols {
+            u[[i, j]] = u_faer[(i, j)];
         }
+    }
 
-        // Update mu (fixing alpha, beta)
-        let mut num_mu = 0.0;
-        for t in 0..n_periods {
-            for i in 0..n_units {
-                // NaN outcomes get zero weight
-                let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
-                let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };
-                num_mu += w * (y_ti - alpha[i] - beta[t]);
-            }
+    // Extract singular values.
+    let s_len = s_col.nrows();
+    let mut s = Array1::<f64>::zeros(s_len);
+    for i in 0..s_len {
+        s[i] = s_col[i];
+    }
+
+    // Extract V (k x min(n,k)) back to ndarray. faer's V is not V^T.
+    let v_rows = v_faer.nrows();
+    let v_cols = v_faer.ncols();
+    let mut v = Array2::<f64>::zeros((v_rows, v_cols));
+    for i in 0..v_rows {
+        for j in 0..v_cols {
+            v[[i, j]] = v_faer[(i, j)];
         }
-        mu = num_mu / sum_w;
-
-        // Check convergence across ALL parameters (not just mu)
-        let mu_diff = (mu - mu_old).abs();
-        let alpha_diff = alpha.iter().zip(alpha_old.iter())
-            .map(|(a, b)| (a - b).abs())
-            .fold(0.0_f64, f64::max);
-        let beta_diff = beta.iter().zip(beta_old.iter())
-            .map(|(a, b)| (a - b).abs())
-            .fold(0.0_f64, f64::max);
-        let max_diff = mu_diff.max(alpha_diff).max(beta_diff);
-        if max_diff < 1e-8 {
-            break;
+    }
+
+    // numpy rcond = eps * max(n_rows, n_cols); truncate s[i] <= rcond * max(s).
+    let rcond = f64::EPSILON * (n_rows.max(n_cols) as f64);
+    let s_max = s.iter().cloned().fold(0.0_f64, f64::max);
+    let threshold = s_max * rcond;
+
+    // Compute β = V * S^{-1}_truncated * U^T * y.
+    let uty = u.t().dot(&b_owned); // (min(n,k),)
+    let mut s_inv_uty = Array1::<f64>::zeros(s_len);
+    for i in 0..s_len {
+        if s[i] > threshold {
+            s_inv_uty[i] = uty[i] / s[i];
         }
+        // else: leave 0 — this is the pseudo-inverse / minimum-norm step
+        // that also covers Python's `except LinAlgError: pinv(...)` fallback
+        // tier, since faer thin_svd is numerically robust on finite inputs.
     }
+    let coeffs = v.dot(&s_inv_uty);
 
-    Some((mu, alpha, beta))
+    Some(coeffs)
 }
 
 /// Solve global TWFE + low-rank via alternating minimization (no tau).
diff --git a/tests/test_rust_backend.py b/tests/test_rust_backend.py
index 360c152d..228565d6 100644
--- a/tests/test_rust_backend.py
+++ b/tests/test_rust_backend.py
@@ -2204,23 +2204,18 @@ def _make_correlated_panel(n_units=6, n_periods=5, n_treated=2):
         data.append({"unit": i, "time": t, "outcome": y, "treated": treated})
     return pd.DataFrame(data)
 
-    @pytest.mark.xfail(
-        strict=True,
-        reason="TROP Rust grid-search at trop_global.py:688 and Python fallback "
-        "at trop_global.py:753 diverge on rank-deficient Y: empirical ATT "
-        "gap ~6% on two near-parallel control units. Either (a) grid-winner "
-        "ties break differently between backends, or (b) the per-λ solver "
-        "itself reaches different stationary points under rank deficiency. "
-        "Finding #23 flagged this exact surface as the Phase-2 gap. Rust "
-        "vs Python unification is a P1 follow-up (TODO.md). This xfail "
-        "baselines the divergence so we notice when/if the backends align.",
-    )
     def test_grid_search_rank_deficient_Y(self):
         """Grid-search ATT parity on rank-deficient Y.
 
-        Known to fail: two near-parallel control units produce ~6% ATT
-        divergence between Rust and Python LOOCV grid-search. See xfail
-        reason for follow-up plan.
+        Silent-failures audit Finding #23 (grid-search half) regression
+        guard. Previously a ~6% ATT divergence on two near-parallel
+        control units because the Rust inner solver used iterative block
+        coordinate descent while the Python fallback used SVD-based
+        minimum-norm least squares. Fixed by porting the Rust inner
+        solver to an SVD-based WLS path (numpy-compatible
+        rcond = eps*max(n,k)) that mirrors Python's
+        `np.linalg.lstsq(rcond=None)` step-for-step. This test asserts
+        the backends now agree at atol=1e-6 on rank-deficient Y.
         """
         import sys
         from unittest.mock import patch
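The truncated-SVD construction the patch ports to Rust can be sketched and cross-checked from Python. This is an illustrative reference only — the helper name `minimum_norm_wls` is hypothetical, not from the codebase — showing that sqrt-weighting plus an SVD pseudo-inverse with `rcond = eps * max(n, k)` reproduces `np.linalg.lstsq(..., rcond=None)` on a rank-deficient design:

```python
import numpy as np

def minimum_norm_wls(x, y, w):
    """Weighted least squares via truncated SVD with numpy's default
    rcond = eps * max(n, k). Illustrative helper, not library code;
    returns the minimum-norm solution on rank-deficient designs."""
    sw = np.sqrt(w)
    xw = x * sw[:, None]          # X_w = X * sqrt(w)[:, None]
    yw = y * sw                   # y_w = y * sqrt(w)
    u, s, vt = np.linalg.svd(xw, full_matrices=False)
    threshold = s.max() * np.finfo(float).eps * max(xw.shape)
    keep = s > threshold          # drop near-zero singular values
    # beta = V @ S^{-1}_truncated @ U^T @ y_w  (pseudo-inverse step)
    return vt.T[:, keep] @ ((u.T @ yw)[keep] / s[keep])

# Rank-deficient toy design: a duplicated regressor column.
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(8), rng.normal(size=8)])
x = np.column_stack([x, x[:, 1]])  # duplicate column -> rank 2 of 3
y = rng.normal(size=8)
w = np.full(8, 0.5)

beta = minimum_norm_wls(x, y, w)
ref, *_ = np.linalg.lstsq(x * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)
assert np.allclose(beta, ref, atol=1e-8)
```

The assert is the same parity property the `solve_wls_svd` diff claims: with the numpy-compatible threshold, the truncated pseudo-inverse and `lstsq` agree on the unique minimum-norm solution even when a singular value is (numerically) zero.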