1 change: 0 additions & 1 deletion TODO.md
@@ -83,7 +83,6 @@ Deferred items from PR reviews that were not addressed before merge.
| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. The weighted hat matrix and residual rebalancing need to be threaded through, following clubSandwich's WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
| Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
| `honest_did.py:1907`: `np.linalg.solve(A_sys, b_sys)` wrapped in `except LinAlgError: continue` is a silent basis rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users can see when the vertex search was heavily constrained (a minimal sketch of the diagnostic follows this diff). Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. | `honest_did.py` | #334 | Low |
| Rust `compute_synthetic_weights` + `compute_synthetic_weights_internal` (now dead code) can be removed from `rust/src/weights.rs:43-117` in a future Rust-cleanup PR. Python-side wrapper was deleted (post-audit cleanup for finding #22) and its sole caller now inlines Frank-Wolfe via `_sc_weight_fw`. The Rust symbol remains callable via `from diff_diff._rust_backend import compute_synthetic_weights` but no Python code calls it. Removal requires `maturin develop` rebuild. No functional impact of leaving it. | `rust/src/weights.rs` | follow-up | Low |
| TROP Rust vs Python grid-search divergence on rank-deficient Y: on two near-parallel control units, LOOCV grid-search ATT diverges ~6% between Rust (`trop_global.py:688`) and Python fallback (`trop_global.py:753`). Either grid-winner ties are broken differently or the per-λ solver reaches different stationary points under rank deficiency. Audit finding #23 flagged this surface. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Y` baselines the gap. | `trop_global.py`, `rust/` | follow-up | Medium |
| TROP Rust vs Python bootstrap SE divergence under fixed seed: `seed=42` on a tiny panel produces a ~28% bootstrap-SE gap. Root cause: the Rust bootstrap uses its own RNG (`rand` crate) while Python uses `numpy.random.default_rng`; the same seed value maps to different bytestreams across backends. Adjacent to audit axis-H (RNG/seed). `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility` baselines the gap. Unifying the RNG (threading a numpy-generated seed-sequence into Rust, or porting Python to ChaCha) would close it; see the seed-threading sketch after this diff. | `trop_global.py`, `rust/` | follow-up | Medium |
| `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted; see the kernel sketch after this diff). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
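
On the `honest_did.py` vertex-enumeration row: a minimal sketch of the rejected-basis diagnostic, with a hypothetical `bases` iterable of `(A_sys, b_sys)` systems standing in for the real enumeration loop:

```python
import numpy as np

def enumerate_vertices(bases):
    """Vertex enumeration with a rejected-basis counter.

    Hypothetical sketch: `bases` yields candidate (A_sys, b_sys)
    linear systems. Singular bases are skipped by design (try the
    next basis), but counting them lets callers see how constrained
    the vertex search was when enumeration exhausts.
    """
    vertices, n_rejected = [], 0
    for A_sys, b_sys in bases:
        try:
            vertices.append(np.linalg.solve(A_sys, b_sys))
        except np.linalg.LinAlgError:
            n_rejected += 1  # intentional skip, not a failure
    if not vertices:
        raise RuntimeError(
            f"vertex enumeration exhausted; {n_rejected} bases rejected"
        )
    return vertices, n_rejected
```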
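
On the bootstrap RNG-divergence row: a sketch of the seed-threading idea, assuming (hypothetically) that the Rust bootstrap can accept a u64 key for a counter-based RNG such as ChaCha:

```python
import numpy as np

def derive_backend_seed(seed):
    """Derive a u64 key from numpy's SeedSequence.

    Passing this key to the Rust side (assumed to seed its RNG from
    a u64) ties both backends to the entropy stream behind
    numpy.random.default_rng(seed), instead of letting the `rand`
    crate and numpy expand the same integer into different streams.
    """
    ss = np.random.SeedSequence(seed)
    return int(ss.generate_state(1, dtype=np.uint64)[0])

# Example: derive_backend_seed(42) yields one shared key per run.
```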
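
On the kernel-parity row: the three kernels are the standard second-order weights below (textbook definitions; whether `kernel_W` matches them exactly is an assumption here), which is why epa-only parity is expected to carry over:

```python
import numpy as np

# Standard kernels supported on [-1, 1]; under a shared weighting
# routine, the parity checks would differ only in this function.
def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def triangular(u):
    return np.where(np.abs(u) <= 1, 1.0 - np.abs(u), 0.0)

def uniform(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)
```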
3 changes: 1 addition & 2 deletions rust/src/lib.rs
@@ -28,8 +28,7 @@ fn _rust_backend(m: &Bound<'_, PyModule>) -> PyResult<()> {
m
)?)?;

// Synthetic control weights (legacy projected gradient descent)
m.add_function(wrap_pyfunction!(weights::compute_synthetic_weights, m)?)?;
// Simplex projection
m.add_function(wrap_pyfunction!(weights::project_simplex, m)?)?;

// SDID weights (Frank-Wolfe matching R's synthdid)
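
The surviving registrations above leave Frank-Wolfe (matching R's synthdid) as the only synthetic-control weight solver. For orientation, here is a textbook Frank-Wolfe iteration for simplex-constrained least squares; this is not the library's `_sc_weight_fw`, whose step rule, regularization, and stopping criteria are not shown here:

```python
import numpy as np

def frank_wolfe_simplex_ls(Y_control, y_treated, n_iter=1000):
    """Textbook Frank-Wolfe for min_w ||y_treated - Y_control @ w||^2
    over the probability simplex; illustrative sketch only."""
    n = Y_control.shape[1]
    w = np.full(n, 1.0 / n)
    for t in range(n_iter):
        grad = Y_control.T @ (Y_control @ w - y_treated)
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0    # vertex minimizing the linearization
        gamma = 2.0 / (t + 2.0)     # classic diminishing step size
        w = (1.0 - gamma) * w + gamma * s
    return w
```

Each iterate stays a convex combination of simplex vertices, so no explicit projection step is needed, which is the structural difference from the deleted projected-gradient path below.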
123 changes: 0 additions & 123 deletions rust/src/weights.rs
@@ -1,7 +1,6 @@
//! Synthetic control weight computation.
//!
//! This module provides optimized implementations of:
//! - Legacy synthetic control weight optimization (projected gradient descent)
//! - Frank-Wolfe synthetic control weights (matching R's synthdid)
//! - Simplex projection
//! - SDID unit and time weight computation
@@ -11,111 +10,6 @@ use ndarray::linalg::general_mat_vec_mul;
use numpy::{PyArray1, PyReadonlyArray1, PyReadonlyArray2, ToPyArray};
use pyo3::prelude::*;

/// Maximum number of optimization iterations.
const MAX_ITER: usize = 1000;

/// Default convergence tolerance (matches Python's _OPTIMIZATION_TOL).
const DEFAULT_TOL: f64 = 1e-8;

/// Default step size for gradient descent.
const DEFAULT_STEP_SIZE: f64 = 0.1;

// =========================================================================
// Legacy synthetic control weights (projected gradient descent)
// =========================================================================

/// Compute synthetic control weights via projected gradient descent.
///
/// Solves: min_w ||Y_treated - Y_control @ w||² + lambda * ||w||²
/// subject to: w >= 0, sum(w) = 1
///
/// # Arguments
/// * `y_control` - Control unit outcomes matrix (n_pre, n_control)
/// * `y_treated` - Treated unit outcomes (n_pre,)
/// * `lambda_reg` - L2 regularization parameter
/// * `max_iter` - Maximum number of iterations (default: 1000)
/// * `tol` - Convergence tolerance (default: 1e-8)
///
/// # Returns
/// Optimal weights (n_control,) that sum to 1
#[pyfunction]
#[pyo3(signature = (y_control, y_treated, lambda_reg=0.0, max_iter=None, tol=None))]
pub fn compute_synthetic_weights<'py>(
py: Python<'py>,
y_control: PyReadonlyArray2<'py, f64>,
y_treated: PyReadonlyArray1<'py, f64>,
lambda_reg: f64,
max_iter: Option<usize>,
tol: Option<f64>,
) -> PyResult<Bound<'py, PyArray1<f64>>> {
let y_control_arr = y_control.as_array();
let y_treated_arr = y_treated.as_array();

let weights =
compute_synthetic_weights_internal(&y_control_arr, &y_treated_arr, lambda_reg, max_iter, tol)?;

Ok(weights.to_pyarray(py))
}

/// Internal implementation of synthetic weight computation.
fn compute_synthetic_weights_internal(
y_control: &ArrayView2<f64>,
y_treated: &ArrayView1<f64>,
lambda_reg: f64,
max_iter: Option<usize>,
tol: Option<f64>,
) -> PyResult<Array1<f64>> {
let n_control = y_control.ncols();
let max_iter = max_iter.unwrap_or(MAX_ITER);
let tol = tol.unwrap_or(DEFAULT_TOL);

// Precompute Hessian: H = Y_control' @ Y_control + lambda * I
let h = {
let ytc = y_control.t().dot(y_control);
let mut h = ytc;
// Add regularization to diagonal
for i in 0..n_control {
h[[i, i]] += lambda_reg;
}
h
};

// Precompute linear term: f = Y_control' @ Y_treated
let f = y_control.t().dot(y_treated);

// Initialize with uniform weights
let mut weights = Array1::from_elem(n_control, 1.0 / n_control as f64);

// Projected gradient descent
let step_size = DEFAULT_STEP_SIZE;
let mut prev_weights = weights.clone();

for _ in 0..max_iter {
// Gradient: grad = H @ weights - f
let grad = h.dot(&weights) - &f;

// Gradient step
weights = &weights - step_size * &grad;

// Project onto simplex
weights = project_simplex_internal(&weights.view());

// Check convergence
let diff: f64 = weights
.iter()
.zip(prev_weights.iter())
.map(|(a, b)| (a - b).powi(2))
.sum();
if diff.sqrt() < tol {
break;
}

prev_weights.assign(&weights);
}

Ok(weights)
}

// =========================================================================
// Simplex projection
// =========================================================================
@@ -806,23 +700,6 @@ mod tests {
assert!(result.iter().all(|&x| x >= -1e-10));
}

#[test]
fn test_compute_weights_sum_to_one() {
let y_control = array![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]];
let y_treated = array![2.0, 5.0, 8.0];

let weights =
compute_synthetic_weights_internal(&y_control.view(), &y_treated.view(), 0.0, None, None)
.unwrap();

let sum: f64 = weights.sum();
assert!((sum - 1.0).abs() < 1e-6, "Weights should sum to 1, got {}", sum);
assert!(
weights.iter().all(|&w| w >= -1e-10),
"Weights should be non-negative"
);
}

#[test]
fn test_sum_normalize() {
let v = array![2.0, 3.0, 5.0];
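
The deleted solver projected onto the simplex every iteration via `project_simplex_internal`, and the standalone `project_simplex` binding survives this cleanup. For reference, the standard sort-and-threshold Euclidean projection (Duchi et al. 2008) is sketched below; it is assumed, not verified, that the Rust internals implement the same algorithm:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1},
    via the sort-and-threshold algorithm of Duchi et al. (2008)."""
    u = np.sort(v)[::-1]                 # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)   # uniform shift
    return np.maximum(v + theta, 0.0)
```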
22 changes: 20 additions & 2 deletions tests/test_rust_backend.py
@@ -106,8 +106,26 @@ def test_bootstrap_different_seeds(self):

# Tests for the `compute_synthetic_weights` direct Rust binding were
# removed in the silent-failures audit post-cleanup (finding #22). The
# helper was deleted from the Python layer, and the Rust symbol was
# subsequently removed from `rust/src/weights.rs` and unregistered in
# `rust/src/lib.rs`.

def test_compute_synthetic_weights_is_removed(self):
    """Regression guard against accidental re-export of the deleted
    `compute_synthetic_weights` PyO3 binding (silent-failures finding
    #22). If this test fails, someone reintroduced the binding; audit
    the reason before adding it back."""
    import diff_diff._rust_backend as rb

    with pytest.raises(ImportError):
        from diff_diff._rust_backend import (  # noqa: F401
            compute_synthetic_weights,
        )

    assert not hasattr(rb, "compute_synthetic_weights"), (
        "compute_synthetic_weights was removed from the Rust backend "
        "in the post-audit cleanup for finding #22; its presence here "
        "indicates accidental re-export."
    )

# =========================================================================
# Simplex Projection Tests