5 changes: 5 additions & 0 deletions TODO.md
@@ -86,6 +86,11 @@ Deferred items from PR reviews that were not addressed before merge.
| `compute_synthetic_weights` backend algorithm mismatch: Rust path uses Frank-Wolfe (`_rust_synthetic_weights` in `utils.py:1184`); Python fallback uses projected gradient descent (`_compute_synthetic_weights_numpy` in `utils.py:1228`). Both solve the same constrained QP but converge to different simplex vertices on near-degenerate / extreme-scale inputs (e.g. `Y~1e9`, or near-singular `Y'Y`). Unified backend (one algorithm) would close the parity gap surfaced by audit finding #22. Two `@pytest.mark.xfail(strict=True)` tests in `tests/test_rust_backend.py::TestSyntheticWeightsBackendParity` baseline the divergence so we notice when/if the algorithms align. | `utils.py`, `rust/` | follow-up | Medium |
| TROP Rust vs Python grid-search divergence on rank-deficient Y: on two near-parallel control units, LOOCV grid-search ATT diverges ~6% between Rust (`trop_global.py:688`) and Python fallback (`trop_global.py:753`). Either grid-winner ties are broken differently or the per-λ solver reaches different stationary points under rank deficiency. Audit finding #23 flagged this surface. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Y` baselines the gap. | `trop_global.py`, `rust/` | follow-up | Medium |
| TROP Rust vs Python bootstrap SE divergence under fixed seed: `seed=42` on a tiny panel produces ~28% bootstrap-SE gap. Root cause: Rust bootstrap uses its own RNG (`rand` crate) while Python uses `numpy.random.default_rng`; same seed value maps to different bytestreams across backends. Audit axis-H (RNG/seed) adjacent. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility` baselines the gap. Unifying RNG (threading a numpy-generated seed-sequence into Rust, or porting Python to ChaCha) would close it. | `trop_global.py`, `rust/` | follow-up | Medium |
| `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
| `bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). | `diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Medium |
| `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium |
| `bias_corrected_local_linear`: support multi-eval grid (`neval > 1`) with cross-covariance (`covgrid=TRUE` branch of `lprobust.R:253-378`). Not needed for HAD but useful for multi-dose diagnostics. | `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Low |
| Clustered-DGP parity: Phase 1c's DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster bug in `lpbwselect.mse.dpi`'s pilot fits. Once nprobust ships a fix (or we derive one independently), add a clustered-auto-bandwidth parity test. | `benchmarks/R/generate_nprobust_lprobust_golden.R` | Phase 1c | Low |
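The strict-xfail "baseline the divergence" pattern the three backend-parity rows above rely on can be sketched as follows. This is a minimal illustration, not repo code: `backend_att`, `backends_agree`, and the ~6% gap are stand-ins for the real Rust/Python dispatch and the observed grid-search divergence.

```python
import math

# Stand-in for the real Rust/Python dispatch; the ~6% gap mimics the
# grid-search divergence described above (values are illustrative).
def backend_att(backend: str) -> float:
    return {"rust": 1.000, "python": 1.062}[backend]

def backends_agree(rel_tol: float = 1e-8) -> bool:
    return math.isclose(backend_att("rust"), backend_att("python"),
                        rel_tol=rel_tol)

# In the repo, an assertion like backends_agree() sits under
# @pytest.mark.xfail(strict=True): while the divergence persists the test
# xfails quietly, and the moment the backends align it XPASSes and fails
# the suite, flagging that the baseline (and the TODO row) can be retired.
```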

#### Performance

166 changes: 166 additions & 0 deletions benchmarks/R/generate_nprobust_lprobust_golden.R
@@ -0,0 +1,166 @@
# Generate nprobust lprobust golden values for the Phase 1c parity suite.
#
# This script calls nprobust::lprobust() at a single eval point with
# bwselect="mse-dpi" on five deterministic DGPs and records:
# - tau.cl, tau.bc (point estimates; lprobust's Estimate matrix labels the
#   conventional estimate tau.us, which is exported here as tau_cl)
# - se.cl, se.rb (standard errors; se.us is likewise exported as se_cl)
# - h, b (bandwidths chosen by the mse-dpi selector)
# - N (observations in the selected kernel window)
# - z = qnorm(1 - alpha/2)
# - ci_low, ci_high = tau.bc +/- z * se.rb
#
# DGPs 1-3 reuse the same seed + shape as benchmarks/R/generate_nprobust_golden.R
# so the selected (h, b) are identical; Phase 1c parity is therefore isolated
# to the point-estimate + variance computation. DGP 4 adds cluster IDs for
# cluster-robust SE parity; DGP 5 shifts the support to test a non-zero
# boundary (Design 1 continuous-near-d_lower).
#
# Usage:
# Rscript benchmarks/R/generate_nprobust_lprobust_golden.R
#
# Requirements:
# nprobust (CRAN), jsonlite
#
# Output:
# benchmarks/data/nprobust_lprobust_golden.json
#
# Phase 1c of the HeterogeneousAdoptionDiD implementation (de Chaisemartin,
# Ciccia, D'Haultfoeuille & Knau 2026, arXiv:2405.04465v6). Python tests at
# tests/test_bias_corrected_lprobust.py and tests/test_nprobust_port.py load
# this JSON and check agreement to tiered tolerances (1e-14 on tau_cl/se_cl,
# 1e-12 on tau_bc/se_rb, 1e-13 on CI bounds; see Phase 1c plan).

library(nprobust)
library(jsonlite)

stopifnot(packageVersion("nprobust") == "0.5.0")

extract_lprobust_single_eval <- function(d, y, eval_point = 0.0,
                                         kernel = "epa", vce = "nn",
                                         cluster = NULL, alpha = 0.05,
                                         h = NULL, b = NULL) {
  # If h (and optionally b) are passed, bypass the mse-dpi selector and
  # call lprobust() with those bandwidths directly. This is used for
  # clustered DGPs where nprobust's internal lpbwselect.mse.dpi hits a
  # singleton-cluster shape bug in lprobust.vce during the order-q+1/q+2
  # pilot fits. For unclustered DGPs, h=NULL triggers bwselect="mse-dpi".
  if (is.null(h)) {
    fit <- lprobust(y = y, x = d, eval = eval_point,
                    p = 1L, deriv = 0L, kernel = kernel,
                    bwselect = "mse-dpi", vce = vce, cluster = cluster,
                    bwcheck = 21L, bwregul = 1, nnmatch = 3L)
  } else {
    # When b is unspecified, nprobust defaults to b = h / rho with rho=1.
    fit <- lprobust(y = y, x = d, eval = eval_point,
                    p = 1L, deriv = 0L, kernel = kernel,
                    h = h, b = if (is.null(b)) h else b,
                    vce = vce, cluster = cluster,
                    bwcheck = 21L, nnmatch = 3L)
  }
  est <- fit$Estimate[1, ]
  z <- qnorm(1 - alpha / 2)
  ci_low <- as.numeric(est["tau.bc"] - z * est["se.rb"])
  ci_high <- as.numeric(est["tau.bc"] + z * est["se.rb"])

  list(
    eval_point = as.numeric(eval_point),
    h = as.numeric(est["h"]),
    b = as.numeric(est["b"]),
    n_used = as.integer(est["N"]),
    tau_cl = as.numeric(est["tau.us"]),
    tau_bc = as.numeric(est["tau.bc"]),
    se_cl = as.numeric(est["se.us"]),
    se_rb = as.numeric(est["se.rb"]),
    ci_low = ci_low,
    ci_high = ci_high,
    alpha = as.numeric(alpha),
    z = as.numeric(z)
  )
}

set.seed(20260419)

# DGP 1: d ~ Uniform(0, 1), y = d + d^2 + N(0, 0.5)
G <- 2000L
d1 <- runif(G, 0, 1)
y1 <- d1 + d1^2 + rnorm(G, 0, 0.5)

# DGP 2: d ~ Beta(2, 2), y = d + d^2 + N(0, 0.5) (f(0) vanishes at boundary)
d2 <- rbeta(G, 2, 2)
y2 <- d2 + d2^2 + rnorm(G, 0, 0.5)

# DGP 3: Half-normal d, y = 0.5 * d^2 + N(0, 1)
d3 <- abs(rnorm(G, 0, 1))
y3 <- 0.5 * d3^2 + rnorm(G, 0, 1)

# DGP 4: Uniform(0, 1) with 50 clusters of 40 obs (cluster-robust SE parity).
# Fewer, larger clusters avoid an nprobust-internal singleton-cluster shape
# bug in lprobust.vce that fires if a kernel window retains only one obs per
# cluster. 50 clusters x 40 obs => the mse-dpi pilot windows near the
# boundary keep enough obs per cluster to stay well-conditioned.
set.seed(20260420)
G4 <- 2000L
d4 <- runif(G4, 0, 1)
cluster4 <- rep(1:50, each = 40)[1:G4]
# Introduce within-cluster correlation in y via a cluster effect.
cluster_effect <- rnorm(50, 0, 0.3)[cluster4]
y4 <- d4 + d4^2 + cluster_effect + rnorm(G4, 0, 0.3)

# DGP 5: Uniform(0.2, 1.0) — Design 1 continuous-near-d_lower at
# boundary = d.min() > 0. Different seed to avoid aliasing DGP 1.
set.seed(20260421)
G5 <- 2000L
d5 <- runif(G5, 0.2, 1.0)
y5 <- (d5 - 0.2) + (d5 - 0.2)^2 + rnorm(G5, 0, 0.5)
eval5 <- min(d5) # Design 1 continuous: evaluate at the realized minimum.

golden <- list(
  metadata = list(
    nprobust_version = as.character(packageVersion("nprobust")),
    nprobust_sha = "36e4e532d2f7d23d4dc6e162575cca79e0927cda",
    seeds = list(dgp1 = 20260419L, dgp2 = 20260419L, dgp3 = 20260419L,
                 dgp4 = 20260420L, dgp5 = 20260421L),
    generator = "benchmarks/R/generate_nprobust_lprobust_golden.R",
    algorithm = paste("nprobust::lprobust(..., bwselect='mse-dpi') at a single",
                      "eval point, p=1, deriv=0, kernel='epa', vce='nn'",
                      "unless noted. The Python wrapper computes its own",
                      "z_{1-alpha/2} via scipy.stats.norm.ppf inside",
                      "safe_inference(); R's z is exported here for audit",
                      "so a reviewer can verify the two critical values",
                      "agree to machine precision.")
  ),
  dgp1 = c(list(n = G, d = d1, y = y1, kernel = "epa", vce = "nn",
                description = "Uniform(0,1), polynomial m(d) = d + d^2"),
           extract_lprobust_single_eval(d1, y1, kernel = "epa", vce = "nn")),
  dgp2 = c(list(n = G, d = d2, y = y2, kernel = "epa", vce = "nn",
                description = "Beta(2,2) - boundary density vanishes at 0"),
           extract_lprobust_single_eval(d2, y2, kernel = "epa", vce = "nn")),
  dgp3 = c(list(n = G, d = d3, y = y3, kernel = "epa", vce = "nn",
                description = "Half-normal d, quadratic m(d) with unit noise"),
           extract_lprobust_single_eval(d3, y3, kernel = "epa", vce = "nn")),
  dgp4 = c(list(n = G4, d = d4, y = y4, cluster = cluster4,
                kernel = "epa", vce = "nn",
                h_manual = 0.3, b_manual = 0.3,
                description = paste("Clustered (50 groups of 40) Uniform(0,1);",
                                    "manual h=b=0.3 to sidestep nprobust's",
                                    "singleton-cluster bug in the mse-dpi",
                                    "pilot fits.")),
           extract_lprobust_single_eval(d4, y4, kernel = "epa", vce = "nn",
                                        cluster = cluster4,
                                        h = 0.3, b = 0.3)),
  dgp5 = c(list(n = G5, d = d5, y = y5, eval_point_override = eval5,
                kernel = "epa", vce = "nn",
                description = "Uniform(0.2, 1.0), Design 1 boundary = d.min()"),
           extract_lprobust_single_eval(d5, y5, eval_point = eval5,
                                        kernel = "epa", vce = "nn"))
)

out_path <- "benchmarks/data/nprobust_lprobust_golden.json"
dir.create("benchmarks/data", recursive = TRUE, showWarnings = FALSE)
write_json(golden, out_path, auto_unbox = TRUE, pretty = TRUE, digits = 14)
cat("Golden values written to", out_path, "\n")
for (name in c("dgp1", "dgp2", "dgp3", "dgp4", "dgp5")) {
  cat(sprintf("%s: tau.bc = %.6f, se.rb = %.6f, h = %.6f, b = %.6f\n",
              name, golden[[name]]$tau_bc, golden[[name]]$se_rb,
              golden[[name]]$h, golden[[name]]$b))
}
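On the Python side, the tiered-tolerance comparison described in the script's header comment can be sketched like this. It is a hedged illustration: `check_parity` and the dict shapes are assumptions, the real assertions live in tests/test_bias_corrected_lprobust.py, and whether each tier is applied as a relative or absolute tolerance follows the Phase 1c plan.

```python
import math

# Tolerance tiers stated in the generator's header: conventional quantities
# are held to 1e-14, bias-corrected ones to 1e-12, CI bounds to 1e-13.
TOLERANCES = {
    "tau_cl": 1e-14, "se_cl": 1e-14,
    "tau_bc": 1e-12, "se_rb": 1e-12,
    "ci_low": 1e-13, "ci_high": 1e-13,
}

def check_parity(python_fit: dict, golden_entry: dict) -> None:
    # In the real suite, golden_entry would be one DGP record loaded from
    # benchmarks/data/nprobust_lprobust_golden.json; both dicts here are
    # illustrative stand-ins keyed by the fields the R script exports.
    for key, tol in TOLERANCES.items():
        assert math.isclose(python_fit[key], golden_entry[key],
                            rel_tol=tol, abs_tol=tol), key
```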
119 changes: 119 additions & 0 deletions benchmarks/data/nprobust_lprobust_golden.json

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions diff_diff/__init__.py
@@ -46,7 +46,9 @@
from diff_diff.local_linear import (
    KERNELS,
    BandwidthResult,
    BiasCorrectedFit,
    LocalLinearFit,
    bias_corrected_local_linear,
    epanechnikov_kernel,
    kernel_moments,
    local_linear_fit,
@@ -427,6 +429,9 @@
    # MSE-optimal bandwidth selector (Phase 1b for HeterogeneousAdoptionDiD)
    "BandwidthResult",
    "mse_optimal_bandwidth",
    # Bias-corrected local-linear (Phase 1c for HeterogeneousAdoptionDiD)
    "BiasCorrectedFit",
    "bias_corrected_local_linear",
    # Datasets
    "load_card_krueger",
    "load_castle_doctrine",
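The golden metadata notes that the Python wrapper derives its own z_{1-alpha/2} inside safe_inference() (via scipy.stats.norm.ppf) while R exports its qnorm value for audit. The check that the two critical values, and the resulting robust CI, agree can be sketched as follows; the stdlib `NormalDist` stands in for scipy here, and `robust_ci` is an illustrative name, not the repo's API.

```python
from statistics import NormalDist

def robust_ci(tau_bc: float, se_rb: float, alpha: float = 0.05):
    # z_{1 - alpha/2}; the repo uses scipy.stats.norm.ppf, which agrees
    # with NormalDist().inv_cdf to near machine precision.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Robust bias-corrected CI, mirroring the R-side export
    # ci_low/ci_high = tau.bc +/- qnorm(1 - alpha/2) * se.rb.
    return tau_bc - z * se_rb, tau_bc + z * se_rb
```

Comparing these bounds against the ci_low/ci_high the R script writes out is the reviewer audit the metadata describes.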