Merged
1 change: 0 additions & 1 deletion TODO.md
Original file line number Diff line number Diff line change
@@ -83,7 +83,6 @@ Deferred items from PR reviews that were not addressed before merge.
| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. The weighted hat matrix and residual rebalancing need to be threaded through per clubSandwich's WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
| Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). The current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run replaces it. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
| `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. | `honest_did.py` | #334 | Low |
| TROP Rust vs Python grid-search divergence on rank-deficient Y: on two near-parallel control units, LOOCV grid-search ATT diverges ~6% between Rust (`trop_global.py:688`) and Python fallback (`trop_global.py:753`). Either grid-winner ties are broken differently or the per-λ solver reaches different stationary points under rank deficiency. Audit finding #23 flagged this surface. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Y` baselines the gap. | `trop_global.py`, `rust/` | follow-up | Medium |
| TROP Rust vs Python bootstrap SE divergence under fixed seed: `seed=42` on a tiny panel produces ~28% bootstrap-SE gap. Root cause: Rust bootstrap uses its own RNG (`rand` crate) while Python uses `numpy.random.default_rng`; same seed value maps to different bytestreams across backends. Audit axis-H (RNG/seed) adjacent. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility` baselines the gap. Unifying RNG (threading a numpy-generated seed-sequence into Rust, or porting Python to ChaCha) would close it. | `trop_global.py`, `rust/` | follow-up | Medium |
| `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
| `bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). | `diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Medium |
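The RNG-unification row above proposes threading a numpy-generated seed sequence into Rust so both backends derive from one root seed. A minimal sketch of the numpy side, assuming a hypothetical `derive_backend_seed` helper (the Rust consumer and any ChaCha port are not shown and remain the follow-up work):

```python
import numpy as np

def derive_backend_seed(seed):
    """Derive one deterministic 64-bit child seed from a numpy SeedSequence.

    Hypothetical helper: Python would feed the root SeedSequence to
    default_rng, while the Rust side would be handed this child integer.
    This gives both backends a common deterministic root, but exact
    bootstrap-stream parity still requires the same generator algorithm
    on both sides (e.g. porting the Python bootstrap to ChaCha, as the
    row above notes).
    """
    ss = np.random.SeedSequence(seed)
    child, = ss.spawn(1)
    # generate_state produces deterministic words from the child entropy.
    return int(child.generate_state(1, dtype=np.uint64)[0])

rng = np.random.default_rng(np.random.SeedSequence(42))  # Python backend RNG
backend_seed = derive_backend_seed(42)                   # would go to Rust
```

Spawned children are independent of the parent stream, so handing the child to Rust does not perturb Python-side draws for the same user seed.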
4 changes: 4 additions & 0 deletions docs/methodology/REGISTRY.md
@@ -2059,6 +2059,10 @@ Treatment effects are **heterogeneous** per-observation values. ATT is their mean
1. **Without low-rank (λ_nn = ∞)**: Standard weighted least squares
- Build design matrix with unit/time dummies (no treatment indicator)
- Solve via np.linalg.lstsq for (μ, α, β) using (1-W)-masked weights
- Both the Python fallback and the Rust acceleration path use SVD-based
minimum-norm least squares with numpy-compatible rcond = eps × max(n, k),
so they return the canonical minimum-norm solution on rank-deficient Y
(e.g., two near-parallel control units)
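The minimum-norm behavior described above can be seen in a minimal numpy sketch on a rank-deficient design (illustrative, not the library's own code): with `rcond=None`, `np.linalg.lstsq` applies the eps × max(n, k) truncation and returns the pseudo-inverse (minimum-norm) solution, splitting weight evenly across perfectly collinear columns.

```python
import numpy as np

# Two identical columns -> rank-deficient design (perfectly parallel units).
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

# rcond=None selects numpy's default cutoff: eps * max(n_rows, n_cols).
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

# Any beta with beta[0] + beta[1] == 1 fits exactly; the canonical
# minimum-norm answer is [0.5, 0.5], matching the pseudo-inverse.
assert rank == 1
assert np.allclose(beta, np.linalg.pinv(X) @ y)
```

Because both backends use this same truncation rule, they agree on which of the infinitely many least-squares solutions is returned.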

2. **With low-rank (finite λ_nn)**: Alternating minimization
- Alternate between:
2 changes: 1 addition & 1 deletion rust/src/linalg.rs
@@ -277,7 +277,7 @@ fn compute_robust_vcov_internal(
}

/// Convert ndarray Array2 to faer Mat
fn ndarray_to_faer(arr: &Array2<f64>) -> faer::Mat<f64> {
pub(crate) fn ndarray_to_faer(arr: &Array2<f64>) -> faer::Mat<f64> {
let nrows = arr.nrows();
let ncols = arr.ncols();
faer::Mat::from_fn(nrows, ncols, |i, j| arr[[i, j]])
215 changes: 135 additions & 80 deletions rust/src/trop.rs
@@ -14,6 +14,8 @@ use numpy::{PyArray1, PyArray2, PyReadonlyArray1, PyReadonlyArray2, ToPyArray};
use pyo3::prelude::*;
use rayon::prelude::*;

use crate::linalg::ndarray_to_faer;

/// Minimum chunk size for parallel distance computation.
/// Reduces scheduling overhead for small matrices.
const MIN_CHUNK_SIZE: usize = 16;
@@ -1233,6 +1235,8 @@ fn solve_joint_no_lowrank(
y: &ArrayView2<f64>,
delta: &ArrayView2<f64>,
) -> Option<(f64, Array1<f64>, Array1<f64>)> {
// SVD-based minimum-norm weighted least-squares fit — mirrors Python's
// `_solve_global_no_lowrank` at `diff_diff/trop_global.py:340-412`
// step-for-step so Rust and Python paths produce the same canonical
// solution on rank-deficient Y (silent-failures finding #23).
//
// Model: Y_it = μ + α_i + β_t + ε_it, with α_0 = β_0 = 0 for
// identification. Weights: δ_it. Flatten row-major with
// idx = t * n_units + i (matches Python's Y.flatten() C-order).
let n_periods = y.nrows();
let n_units = y.ncols();
let n_obs = n_periods * n_units;
let n_params = 1 + (n_units - 1) + (n_periods - 1);

// We solve using normal equations with the design matrix structure
// Rather than build full X matrix, use block structure for efficiency
//
// The model: Y_it = μ + α_i + β_t + ε_it
// With identification: α_0 = β_0 = 0

// Compute weighted sums needed for normal equations
// Flatten y + weights with NaN masking — matches trop_global.py:354-360.
let mut y_flat = Array1::<f64>::zeros(n_obs);
let mut w_flat = Array1::<f64>::zeros(n_obs);
let mut sum_w = 0.0;
let mut sum_wy = 0.0;

// Per-unit and per-period weighted sums
let mut sum_w_by_unit = Array1::<f64>::zeros(n_units);
let mut sum_wy_by_unit = Array1::<f64>::zeros(n_units);
let mut sum_w_by_period = Array1::<f64>::zeros(n_periods);
let mut sum_wy_by_period = Array1::<f64>::zeros(n_periods);

for t in 0..n_periods {
for i in 0..n_units {
// NaN outcomes get zero weight (not imputed to 0.0 with active weight)
let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };

let idx = t * n_units + i;
let y_ti = y[[t, i]];
let w_ti = delta[[t, i]];
let valid = y_ti.is_finite() && w_ti.is_finite();
let w = if valid { w_ti.max(0.0) } else { 0.0 };
y_flat[idx] = if valid { y_ti } else { 0.0 };
w_flat[idx] = w;
sum_w += w;
sum_wy += w * y_ti;

sum_w_by_unit[i] += w;
sum_wy_by_unit[i] += w * y_ti;
sum_w_by_period[t] += w;
sum_wy_by_period[t] += w * y_ti;
}
}

// All-zero weights short-circuit — matches trop_global.py:366.
if sum_w < 1e-10 {
return None;
}

// Use iterative approach: alternate between (alpha, beta) and mu
// until convergence (simpler than full normal equations)
let mut mu = sum_wy / sum_w;
// Build design matrix X = [intercept | unit_dummies[1..] | time_dummies[1..]]
// — matches trop_global.py:374-385. Explicit nested loops so the
// index correspondence with Python is unambiguous.
let mut x = Array2::<f64>::zeros((n_obs, n_params));
for t in 0..n_periods {
for i in 0..n_units {
let idx = t * n_units + i;
x[[idx, 0]] = 1.0; // intercept
if i >= 1 {
x[[idx, i]] = 1.0; // unit dummy (unit 0 dropped)
}
if t >= 1 {
x[[idx, (n_units - 1) + t]] = 1.0; // time dummy (period 0 dropped)
}
}
}

// Apply sqrt-weights: X_w = X * sqrt(w)[:, None], y_w = y * sqrt(w).
// Matches trop_global.py:387-389.
let sqrt_w: Array1<f64> = w_flat.mapv(|w| w.sqrt());
for r in 0..n_obs {
let s = sqrt_w[r];
for c in 0..n_params {
x[[r, c]] *= s;
}
y_flat[r] *= s;
}

// Solve via SVD with numpy-compatible rcond truncation.
let coeffs = solve_wls_svd(&x.view(), &y_flat.view())?;

// Unpack: mu = coeffs[0], alpha[1..] = coeffs[1..n_units],
// beta[1..] = coeffs[n_units..]. Matches trop_global.py:406-410.
let mu = coeffs[0];
let mut alpha = Array1::<f64>::zeros(n_units);
for i in 1..n_units {
alpha[i] = coeffs[i];
}
let mut beta = Array1::<f64>::zeros(n_periods);
for t in 1..n_periods {
beta[t] = coeffs[(n_units - 1) + t];
}

for _ in 0..50 {
let mu_old = mu;
let alpha_old = alpha.clone();
let beta_old = beta.clone();
Some((mu, alpha, beta))
}

// Update alpha (fixing beta, mu)
for i in 1..n_units { // α_0 = 0 for identification
if sum_w_by_unit[i] > 1e-10 {
let mut num = 0.0;
for t in 0..n_periods {
// NaN outcomes get zero weight
let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };
num += w * (y_ti - mu - beta[t]);
}
alpha[i] = num / sum_w_by_unit[i];
}
}
/// Minimum-norm least-squares solution via faer thin SVD with rcond truncation.
///
/// Mirrors `np.linalg.lstsq(A, b, rcond=None)` from numpy: singular values
/// below `rcond * max(S)` with `rcond = eps * max(n_rows, n_cols)` are
/// treated as zero. On rank-deficient A this returns the unique
/// minimum-norm least-squares solution.
///
/// This helper intentionally does NOT reuse `rust/src/linalg.rs::solve_ols`
/// because `solve_ols` hard-codes `rcond = 1e-7` (R's `lm()` default) which
/// would truncate singular values that numpy's default keeps. TROP's
/// canonical Python path is numpy-compatible; Rust must match.
///
/// Returns `None` only when the SVD itself fails (rare on finite inputs);
/// the caller (LOOCV / FISTA / bootstrap) interprets `None` as an
/// unsuccessful fit.
fn solve_wls_svd(a: &ArrayView2<f64>, b: &ArrayView1<f64>) -> Option<Array1<f64>> {
let n_rows = a.nrows();
let n_cols = a.ncols();
let a_owned = a.to_owned();
let b_owned = b.to_owned();

// Update beta (fixing alpha, mu)
for t in 1..n_periods { // β_0 = 0 for identification
if sum_w_by_period[t] > 1e-10 {
let mut num = 0.0;
for i in 0..n_units {
// NaN outcomes get zero weight
let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };
num += w * (y_ti - mu - alpha[i]);
}
beta[t] = num / sum_w_by_period[t];
}
// Convert ndarray -> faer for SVD.
let a_faer = ndarray_to_faer(&a_owned);

let svd = a_faer.thin_svd().ok()?;

let u_faer = svd.U();
let s_diag = svd.S();
let s_col = s_diag.column_vector();
let v_faer = svd.V();

// Extract U (n_rows x min(n,k)) back to ndarray.
let u_rows = u_faer.nrows();
let u_cols = u_faer.ncols();
let mut u = Array2::<f64>::zeros((u_rows, u_cols));
for i in 0..u_rows {
for j in 0..u_cols {
u[[i, j]] = u_faer[(i, j)];
}
}

// Update mu (fixing alpha, beta)
let mut num_mu = 0.0;
for t in 0..n_periods {
for i in 0..n_units {
// NaN outcomes get zero weight
let w = if y[[t, i]].is_finite() { delta[[t, i]] } else { 0.0 };
let y_ti = if y[[t, i]].is_finite() { y[[t, i]] } else { 0.0 };
num_mu += w * (y_ti - alpha[i] - beta[t]);
}
// Extract singular values.
let s_len = s_col.nrows();
let mut s = Array1::<f64>::zeros(s_len);
for i in 0..s_len {
s[i] = s_col[i];
}

// Extract V (k x min(n,k)) back to ndarray. faer's V is not V^T.
let v_rows = v_faer.nrows();
let v_cols = v_faer.ncols();
let mut v = Array2::<f64>::zeros((v_rows, v_cols));
for i in 0..v_rows {
for j in 0..v_cols {
v[[i, j]] = v_faer[(i, j)];
}
mu = num_mu / sum_w;

// Check convergence across ALL parameters (not just mu)
let mu_diff = (mu - mu_old).abs();
let alpha_diff = alpha.iter().zip(alpha_old.iter())
.map(|(a, b)| (a - b).abs())
.fold(0.0_f64, f64::max);
let beta_diff = beta.iter().zip(beta_old.iter())
.map(|(a, b)| (a - b).abs())
.fold(0.0_f64, f64::max);
let max_diff = mu_diff.max(alpha_diff).max(beta_diff);
if max_diff < 1e-8 {
break;
}

// numpy rcond = eps * max(n_rows, n_cols); truncate s[i] <= rcond * max(s).
let rcond = f64::EPSILON * (n_rows.max(n_cols) as f64);
let s_max = s.iter().cloned().fold(0.0_f64, f64::max);
let threshold = s_max * rcond;

// Compute β = V * S^{-1}_truncated * U^T * y.
let uty = u.t().dot(&b_owned); // (min(n,k),)
let mut s_inv_uty = Array1::<f64>::zeros(s_len);
for i in 0..s_len {
if s[i] > threshold {
s_inv_uty[i] = uty[i] / s[i];
}
// else: leave 0 — this is the pseudo-inverse / minimum-norm step
// that also covers Python's `except LinAlgError: pinv(...)` fallback
// tier, since faer thin_svd is numerically robust on finite inputs.
}
let coeffs = v.dot(&s_inv_uty);

Some((mu, alpha, beta))
Some(coeffs)
}
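The `solve_wls_svd` pipeline above (thin SVD, numpy-style rcond cutoff, then V · S⁻¹ · Uᵀ · b) can be cross-checked against `np.linalg.lstsq` in a few lines of Python. This is an illustrative re-derivation of the same math, not the project's code:

```python
import numpy as np

def min_norm_lstsq(A, b):
    """Minimum-norm least squares via explicit thin SVD with numpy-style
    rcond truncation, mirroring the V * S^{-1}_trunc * U^T * b pipeline
    described above (illustrative sketch)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rcond = np.finfo(A.dtype).eps * max(A.shape)
    threshold = s.max() * rcond if s.size else 0.0
    # Directions with singular value at or below the cutoff contribute
    # nothing: this is the pseudo-inverse / minimum-norm step.
    s_inv = np.where(s > threshold, 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ b))

# Rank-deficient design: duplicated columns.
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [2.0, 2.0]])
b = np.array([1.0, 1.0, 2.0])
coef = min_norm_lstsq(A, b)
assert np.allclose(coef, np.linalg.lstsq(A, b, rcond=None)[0])
```

Agreement with `lstsq(rcond=None)` on rank-deficient inputs is exactly the parity property the Rust helper is documented to preserve.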

/// Solve global TWFE + low-rank via alternating minimization (no tau).
23 changes: 9 additions & 14 deletions tests/test_rust_backend.py
@@ -2204,23 +2204,18 @@ def _make_correlated_panel(n_units=6, n_periods=5, n_treated=2):
data.append({"unit": i, "time": t, "outcome": y, "treated": treated})
return pd.DataFrame(data)

@pytest.mark.xfail(
strict=True,
reason="TROP Rust grid-search at trop_global.py:688 and Python fallback "
"at trop_global.py:753 diverge on rank-deficient Y: empirical ATT "
"gap ~6% on two near-parallel control units. Either (a) grid-winner "
"ties break differently between backends, or (b) the per-λ solver "
"itself reaches different stationary points under rank deficiency. "
"Finding #23 flagged this exact surface as the Phase-2 gap. Rust "
"vs Python unification is a P1 follow-up (TODO.md). This xfail "
"baselines the divergence so we notice when/if the backends align.",
)
def test_grid_search_rank_deficient_Y(self):
"""Grid-search ATT parity on rank-deficient Y.

Known to fail: two near-parallel control units produce ~6% ATT
divergence between Rust and Python LOOCV grid-search. See xfail
reason for follow-up plan.
Silent-failures audit Finding #23 (grid-search half) regression
guard. Previously a ~6% ATT divergence on two near-parallel
control units because the Rust inner solver used iterative block
coordinate descent while the Python fallback used SVD-based
minimum-norm least squares. Fixed by porting the Rust inner
solver to an SVD-based WLS path (numpy-compatible
rcond = eps*max(n,k)) that mirrors Python's
`np.linalg.lstsq(rcond=None)` step-for-step. This test asserts
the backends now agree at atol=1e-6 on rank-deficient Y.
"""
import sys
from unittest.mock import patch