1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. 
`_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
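The `SurveyMetadata` fields named above (`effective_n`, `design_effect`, `sum_weights`, `weight_range`) can be illustrated with a small self-contained sketch. This is not the repo's implementation: the helper name and the use of Kish's effective sample size (`n_eff = (Σw)² / Σw²`) with `deff = n / n_eff` are assumptions chosen for illustration.

```python
def kish_effective_n(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    sw = sum(weights)
    sw2 = sum(w * w for w in weights)
    return sw * sw / sw2

def summarize_weights(weights):
    # Sketch of the metadata fields the changelog entry names; the field
    # names mirror SurveyMetadata, but the exact formulas here (Kish
    # n_eff, deff = n / n_eff) are illustrative assumptions.
    n = len(weights)
    n_eff = kish_effective_n(weights)
    return {
        "effective_n": n_eff,
        "design_effect": n / n_eff,
        "sum_weights": sum(weights),
        "weight_range": (min(weights), max(weights)),
    }

meta = summarize_weights([1.0, 1.0, 2.0, 4.0])
```

With uniform weights `effective_n` equals `n` and `design_effect` is exactly 1.0, which is the same invariant the uniform-weights bit-parity test above relies on.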
- **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
- **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
- **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0)** — `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change per the REPORTING.md stability policy. BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
7 changes: 5 additions & 2 deletions TODO.md
@@ -92,8 +92,11 @@ Deferred items from PR reviews that were not addressed before merge.
| Clustered-DGP parity: Phase 1c's DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster bug in `lpbwselect.mse.dpi`'s pilot fits. Once nprobust ships a fix (or we derive one independently), add a clustered-auto-bandwidth parity test. | `benchmarks/R/generate_nprobust_lprobust_golden.R` | Phase 1c | Low |
| `HeterogeneousAdoptionDiD` joint cross-horizon covariance on event study: per-horizon SEs use INDEPENDENT sandwiches in Phase 2b (paper-faithful pointwise CIs per Pierce-Schott Figure 2). A follow-up could derive an IF-based stacking of per-horizon scores for joint cross-horizon inference (needed for joint hypothesis tests across event-time horizons). Block-bootstrap is a reasonable alternative. | `diff_diff/had.py::_fit_event_study` | Phase 2b | Low |
| `HeterogeneousAdoptionDiD` event-study staggered-timing beyond last cohort: Phase 2b auto-filters staggered panels to the last cohort per paper Appendix B.2. Earlier-cohort treatment effects are not identified by HAD; redirecting to `ChaisemartinDHaultfoeuille` / `did_multiplegt_dyn` is the paper's prescription. A full staggered HAD would require a different identification path (out of paper scope). | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
| `HeterogeneousAdoptionDiD`: survey-design integration (`survey=SurveyDesign(...)`). Currently raises `NotImplementedError`. Requires Taylor-linearization of the β-scale rescaling and replicate-weight-compatible 2SLS variance on the mass-point path. | `diff_diff/had.py` | Phase 2a | Medium |
| `HeterogeneousAdoptionDiD`: `weights=` support. Deferred jointly with survey integration. nprobust's `lprobust` has no weight argument so the nonparametric continuous path needs a derivation; the 2SLS mass-point path needs weighted-sandwich parity. | `diff_diff/had.py` | Phase 2a | Medium |
| `HeterogeneousAdoptionDiD` Phase 4.5 B: `survey=` / `weights=` on `design="mass_point"` (weighted 2SLS + weighted-sandwich variance; the Wooldridge 2010 Ch. 12 weighted-IV sandwich has a Stata `ivregress ... [pweight=...]` + R `AER::ivreg(weights=...)` parity anchor). Also ships `aggregate="event_study"` + survey/weights via per-horizon IPW + shared PSU multiplier bootstrap across horizons. This PR (Phase 4.5 A) raises `NotImplementedError` on both paths. | `diff_diff/had.py::_fit_mass_point_2sls`, `diff_diff/had.py::_fit_event_study` | Phase 4.5 B | Medium |
| `HeterogeneousAdoptionDiD` Phase 4.5 C0: QUG-under-survey decision gate. `qug_test` uses a ratio of extreme order statistics `D_{(1)} / (D_{(2)} - D_{(1)})` — extreme-value theory under inverse-probability weighting is a research area, not a standard toolkit. Lit-review Guillou-Hall (2001), Chen-Chen (2004); likely outcome is `NotImplementedError` on `qug_test(..., weights=...)` with a clear pointer to the Stute/Yatchew/joint pretests as the survey-supported alternatives. | `diff_diff/had_pretests.py::qug_test` | Phase 4.5 C0 | Low |
| `HeterogeneousAdoptionDiD` Phase 4.5 C: pretests under survey (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, `did_had_pretest_workflow`). Rao-Wu rescaled bootstrap for the Stute-family (weighted η generation + PSU clustering in the bootstrap draw); weighted OLS residuals + weighted variance estimator for Yatchew. | `diff_diff/had_pretests.py` | Phase 4.5 C | Medium |
| `HeterogeneousAdoptionDiD` Phase 4.5: weight-aware auto-bandwidth MSE-DPI selector. Phase 4.5 A ships weighted `lprobust` with an unweighted DPI selector; users who want a weight-aware bandwidth must pass `h`/`b` explicitly. Extending `lpbwselect_mse_dpi` to propagate weights through density, second-derivative, and variance stages is ~300 LoC of methodology and was out of scope. | `diff_diff/_nprobust_port.py::lpbwselect_mse_dpi` | Phase 4.5 | Low |
| `HeterogeneousAdoptionDiD` Phase 4.5 C: replicate-weight SurveyDesigns (BRR / Fay / JK1 / JKn / SDR) on the continuous-dose paths. Phase 4.5 A raises `NotImplementedError` on replicate designs in `_aggregate_unit_resolved_survey`. Rao-Wu-style replicate bootstrap for HAD paths requires deriving the per-replicate weight-ratio rescaling for the local-linear intercept IF. | `diff_diff/had.py::_aggregate_unit_resolved_survey` | Phase 4.5 C | Low |
| `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium |
| `HeterogeneousAdoptionDiD` continuous paths: thread `cluster=` through `bias_corrected_local_linear` (Phase 1c's wrapper already supports cluster; Phase 2a ignores it with a `UserWarning` on the continuous path to keep scope tight). | `diff_diff/had.py`, `diff_diff/local_linear.py` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` Eq 18 linear-trend detrending (Pierce-Schott style): the joint-Stute infrastructure shipped in the Phase 3 follow-up supports pre-trends (mean-indep) and post-homogeneity (linearity) nulls. The Pierce-Schott application (paper Section 5.2) uses a LINEAR-TREND detrending of pre-period outcomes before the joint CvM — `Y_{g,t} - Y_{g,t_anchor} - (t - t_anchor)*(Y_{g,t_anchor} - Y_{g,t_anchor-1})` — reaching p=0.51 on US-China tariff data. Extends `joint_pretrends_test` with a detrending mode or a separate Eq 18-specific helper. Deferred to Phase 4 replication harness (where the published p=0.51 serves as the parity anchor). | `diff_diff/had_pretests.py::joint_pretrends_test` | Phase 4 | Medium |
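The Eq 18 detrending transform quoted in the last row above is simple enough to sketch directly. This is a minimal illustration of that formula for one group, assuming integer period indexing; the helper name is invented and the repo's eventual API may differ.

```python
def eq18_detrend(y_by_period, t, t_anchor):
    """Linear-trend detrending per the Eq 18 row above, for one group g:
    Y[t] - Y[t_anchor] - (t - t_anchor) * (Y[t_anchor] - Y[t_anchor - 1]).

    y_by_period: dict mapping integer period -> outcome for group g.
    The slope is the last pre-anchor increment, so a group on a perfectly
    linear pre-trend detrends to exactly zero at every t.
    """
    slope = y_by_period[t_anchor] - y_by_period[t_anchor - 1]
    return y_by_period[t] - y_by_period[t_anchor] - (t - t_anchor) * slope

# Perfectly linear outcomes detrend to zero:
y = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
# eq18_detrend(y, t=3, t_anchor=1) -> 4.0 - 2.0 - 2*(2.0 - 1.0) == 0.0
```

Feeding these detrended outcomes into the joint CvM in place of the raw `Y_t − Y_base` residuals is the extension the row defers to the Phase 4 Pierce-Schott replication.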
154 changes: 154 additions & 0 deletions benchmarks/R/generate_np_npreg_weighted_golden.R
@@ -0,0 +1,154 @@
# Generate cross-language weighted-OLS parity fixture for HAD Phase 4.5.
#
# Purpose: no public weighted-CCF reference exists for bias-corrected
# local-linear (nprobust::lprobust has no weight argument; np::npreg uses
# its own internal local-linear algorithm that does not reduce to a
# straightforward weighted OLS at the intercept). To validate the weighted
# kernel-composition machinery in diff_diff._nprobust_port cross-language,
# we record the intercept from an R implementation of the SAME formula
# (weighted-OLS with one-sided Epanechnikov kernel) that the Python port
# implements. Bit-parity at atol=1e-14 locks in numerical consistency
# across BLAS reductions.
#
# This is NOT third-party validation of the weighted-CCF methodology. It is
# a regression lock against R↔Python drift on the weighted-OLS formula
# itself. Methodology confidence under informative weights comes from:
# 1. Analytic derivation in docs/methodology/REGISTRY.md
# 2. Uniform-weights bit-parity: weights=np.ones ≡ unweighted at 1e-14
# 3. Monte Carlo oracle consistency (tests/test_had_mc.py)
#
# Usage:
# Rscript benchmarks/R/generate_np_npreg_weighted_golden.R
#
# Output:
# benchmarks/data/np_npreg_weighted_golden.json
#
# Phase 4.5 of HeterogeneousAdoptionDiD (de Chaisemartin et al. 2026).
# Python test loader: tests/test_np_npreg_weighted_parity.py.

library(jsonlite)

# -------------------------------------------------------------------------
# Weighted local-linear at a boundary: manual weighted OLS with Epa kernel.
# Matches diff_diff/local_linear.py::local_linear_fit exactly.
# -------------------------------------------------------------------------

weighted_local_linear <- function(d, y, weights, eval_point = 0.0, h = 0.3) {
# One-sided epanechnikov on [0, 1]: k(u) = (3/4)(1 - u^2), zero elsewhere.
u <- (d - eval_point) / h
kw <- ifelse(u >= 0 & u <= 1, 0.75 * (1 - u^2), 0)
# Combined weights: user weights * kernel weights.
combined <- kw * weights
# Active window (non-zero combined weight).
active <- combined > 0
if (sum(active) < 2) {
stop("Active window has fewer than 2 observations.")
}
# Weighted OLS of y ~ 1 + (d - eval_point), intercept is mu_hat at
# eval_point.
fit <- lm(y[active] ~ I(d[active] - eval_point), weights = combined[active])
mu_hat <- as.numeric(coef(fit)[1])
slope_hat <- as.numeric(coef(fit)[2])
list(
mu_hat = mu_hat,
slope = slope_hat,
n_active = as.integer(sum(active)),
h = h,
eval_point = eval_point
)
}

# -------------------------------------------------------------------------
# DGPs: deterministic seeds for reproducibility.
# -------------------------------------------------------------------------

set.seed(20260424)

dgp1 <- local({
G <- 500
d <- runif(G, 0, 1)
y <- 2 * d + 0.3 * d^2 + rnorm(G, sd = 0.25)
w <- rep(1.0, G)
list(d = d, y = y, w = w, eval_point = 0.0, h = 0.30,
description = "Uniform weights, G=500, boundary=0")
})

dgp2 <- local({
G <- 400
d <- runif(G, 0, 1)
y <- 2 * d + 0.3 * d^2 + rnorm(G, sd = 0.25)
w <- exp(-d * 2.0)
list(d = d, y = y, w = w, eval_point = 0.0, h = 0.25,
description = "Informative weights (exp decay from boundary), G=400")
})

dgp3 <- local({
G <- 200
d <- runif(G, 0, 1)
y <- 3 * d - d^2 + 0.5 * d^3 + rnorm(G, sd = 0.30)
w <- pmax(0.1, runif(G, 0.5, 1.5))
list(d = d, y = y, w = w, eval_point = 0.0, h = 0.35,
description = "Small G=200, nonlinear m(d), bounded heterogeneous weights")
})

dgp4 <- local({
G <- 400
d_lower <- 0.1
d <- runif(G, d_lower, 1)
y <- 2 * (d - d_lower) + 0.3 * (d - d_lower)^2 + rnorm(G, sd = 0.25)
w <- rep(1.0, G)
list(d = d - d_lower, y = y, w = w, eval_point = 0.0, h = 0.30,
description = "G=400, d_lower=0.1 shifted boundary=0 (Design 1 near-d_lower)",
d_lower = d_lower)
})

# -------------------------------------------------------------------------
# Run each DGP through the weighted local-linear reference.
# -------------------------------------------------------------------------

run_one <- function(name, dgp) {
cat(sprintf("Running %s: %s\n", name, dgp$description))
res <- weighted_local_linear(
d = dgp$d, y = dgp$y, weights = dgp$w,
eval_point = dgp$eval_point, h = dgp$h
)
list(
description = dgp$description,
n = length(dgp$d),
d = as.numeric(dgp$d),
y = as.numeric(dgp$y),
weights = as.numeric(dgp$w),
eval_point = as.numeric(res$eval_point),
h = as.numeric(res$h),
kernel = "epanechnikov",
n_active = res$n_active,
mu_hat = as.numeric(res$mu_hat),
slope = as.numeric(res$slope)
)
}

out <- list(
metadata = list(
r_version = paste(R.Version()$major, R.Version()$minor, sep = "."),
seed = 20260424L,
generator = "generate_np_npreg_weighted_golden.R",
algorithm = "manual weighted OLS with one-sided Epanechnikov kernel",
purpose = "HAD Phase 4.5 cross-language weighted-LL parity",
note = paste(
"Regression lock on the weighted kernel + weighted OLS formula",
"implemented in diff_diff.local_linear.local_linear_fit. Not a",
"third-party validation of weighted-CCF methodology; see REGISTRY",
"'Weighted extension (Phase 4.5)' for the parity-gap acknowledgement."
)
),
dgp1 = run_one("dgp1", dgp1),
dgp2 = run_one("dgp2", dgp2),
dgp3 = run_one("dgp3", dgp3),
dgp4 = run_one("dgp4", dgp4)
)

out_path <- "benchmarks/data/np_npreg_weighted_golden.json"
dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE)

write_json(out, out_path, auto_unbox = TRUE, pretty = TRUE, digits = 14)
cat(sprintf("Wrote %s\n", out_path))