diff --git a/CHANGELOG.md b/CHANGELOG.md index def97c4f..e3c1a0c1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Changed +- Refresh `ROADMAP.md` to drop top-level phase numbering and reflect shipped state through v3.1.1. Absorbs dCDH into the Current State estimator list; adds Recently Shipped summary; reorganizes open work as Shipping Next / Under Consideration / AI-Agent Track / Long-term. Updates `docs/business-strategy.md`, `docs/survey-roadmap.md`, `docs/practitioner_decision_tree.rst`, `docs/choosing_estimator.rst`, `docs/api/chaisemartin_dhaultfoeuille.rst`, `README.md`, and `diff_diff/guides/llms-full.txt` to remove stale phase-deferral language now that the deferred items have shipped. + ## [3.1.1] - 2026-04-16 ### Added diff --git a/README.md b/README.md index 0095bf0e..f6b32a66 100644 --- a/README.md +++ b/README.md @@ -1113,7 +1113,7 @@ results = stacked_did( ### Efficient DiD (Chen, Sant'Anna & Xie 2025) -Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*, producing tighter confidence intervals than standard estimators like Callaway-Sant'Anna when the stronger PT-All assumption holds. +Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs along the **no-covariate path**, producing tighter confidence intervals than standard estimators when the stronger PT-All assumption holds. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*. 
A doubly-robust covariate path is also available: it is consistent if either the outcome regression or the sieve propensity ratio is correctly specified, but the linear OLS outcome regression does not generically attain the efficiency bound unless the conditional mean is linear in the covariates. ```python from diff_diff import EfficientDiD, generate_staggered_data @@ -1148,8 +1148,13 @@ EfficientDiD( ) ``` -> **Note:** Phase 1 supports the no-covariates path only. Use CallawaySantAnna with -> `estimation_method='dr'` if you need covariate adjustment. +> **Note:** EfficientDiD supports covariate adjustment via a doubly-robust path +> (sieve-based propensity score ratios and a linear OLS outcome regression). +> The DR property gives consistency if either the OR or the PS is correctly +> specified, but the OLS working model for the outcome regression does not +> generically attain the semiparametric efficiency bound. The unqualified +> efficiency-bound claim applies to the no-covariate path only. See the +> `covariates` parameter on `fit()` and `docs/methodology/REGISTRY.md`. 
**When to use Efficient DiD vs Callaway-Sant'Anna:** @@ -1157,15 +1162,15 @@ EfficientDiD( |--------|--------------|-------------------| | Approach | Optimal EIF-based weighting | Separate 2x2 DiD aggregation | | PT assumption | PT-All (stronger) or PT-Post | Conditional PT | -| Efficiency | Achieves semiparametric bound | Not efficient | -| Covariates | Not yet (Phase 2) | Supported (OR, IPW, DR) | +| Efficiency | Achieves semiparametric bound on the no-covariate path; DR covariate path is consistent but does not generically attain the bound under a linear OLS outcome regression | Not efficient | +| Covariates | Supported (doubly robust, sieve-based PS + linear OLS OR) | Supported (OR, IPW, DR) | | When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT | ### de Chaisemartin-D'Haultfœuille (dCDH) for Reversible Treatments `ChaisemartinDHaultfoeuille` (alias `DCDH`) is the only library estimator that handles **non-absorbing (reversible) treatments** — treatment can switch on AND off over time. This is the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles. -Ships `DID_M` (= `DID_1` at horizon `l = 1`) plus the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter. Phase 3 will add covariate adjustment. +Ships `DID_M` (= `DID_1` at horizon `l = 1`), the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter, residualization-style covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), state-set-specific trends (`trends_nonparam`), heterogeneity testing, non-binary treatment, HonestDiD sensitivity integration on placebos, and survey support via Taylor-series linearization. 
```python from diff_diff import ChaisemartinDHaultfoeuille @@ -1221,7 +1226,7 @@ ChaisemartinDHaultfoeuille( | `n_groups_dropped_crossers`, `n_groups_dropped_singleton_baseline` | Filter counts (multi-switch groups dropped before estimation; singleton-baseline groups excluded from variance) | | `n_groups_dropped_never_switching` | Backwards-compatibility metadata. Never-switching groups participate in the variance via stable-control roles; this field is no longer a filter count. | -**Multi-horizon event study** (Phase 2 - pass `L_max` to `fit()`): +**Multi-horizon event study** (pass `L_max` to `fit()`): ```python results = est.fit(data, outcome="outcome", group="group", @@ -1260,13 +1265,13 @@ print(f"Fraction of negative weights: {diagnostic.fraction_negative:.3f}") print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}") ``` -> **Note:** Placebo SE is `NaN` for both the single-lag `DID_M^pl` and the dynamic placebos `DID^{pl}_l`. The point estimates are meaningful for visual pre-trends inspection; formal placebo inference (influence-function derivation) is deferred to a follow-up. See `REGISTRY.md` for the full contract. +> **Note:** Placebo SE is `NaN` for the single-period `DID_M^pl` (`L_max=None`) because the per-period aggregation path has no influence-function derivation; the point estimate is meaningful for visual pre-trends inspection. Multi-horizon dynamic placebos `DID^{pl}_l` (`L_max >= 1`) have valid analytical SE via the same cohort-recentered plug-in variance as the positive horizons, with bootstrap SE available when `n_bootstrap > 0`. See `docs/methodology/REGISTRY.md` for the full contract. > **Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula to be consistent with the point estimate. Each drop emits an explicit warning. 
-> **Note:** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels — see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs. +> **Note:** The estimator requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels - see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs. -> **Note:** Survey design (`survey_design`), covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), and HonestDiD integration (`honest_did`) are not yet supported. They raise `NotImplementedError` with phase pointers - see [`ROADMAP.md`](ROADMAP.md) for the Phase 3 rollout. +> **Note:** Survey design is supported via Taylor-series linearization on `pweight` with strata / PSU / FPC. Replicate-weight variance and PSU-level bootstrap for dCDH are a planned extension. The `aggregate` parameter still raises `NotImplementedError`. 
### Triple Difference (DDD) diff --git a/ROADMAP.md b/ROADMAP.md index 814c730f..c7abe633 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -1,285 +1,215 @@ # diff-diff Roadmap -This document outlines the feature roadmap for diff-diff, prioritized by practitioner value and academic credibility. +This document outlines the feature roadmap for diff-diff, organized as current state, queued work, candidates under consideration, and longer-term directions. For past changes and release history, see [CHANGELOG.md](CHANGELOG.md). --- -## Current Status (v3.0) +## Current State -diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** that no R or Python package matches. +diff-diff is a production Python library for difference-in-differences causal inference with sklearn-like estimators and statsmodels-style output. It has feature parity with the standard R DiD ecosystem for core analysis, plus survey-design support that is not currently available in any other Python or R package. ### Estimators - **Core**: Basic DiD, TWFE, MultiPeriod event study - **Heterogeneity-robust**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess Imputation (2024), Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024) - **Specialized**: Synthetic DiD (Arkhangelsky et al. 
2021), Triple Difference, Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024), TROP -- **Efficient**: EfficientDiD (Chen, Sant'Anna & Xie 2025) — semiparametrically efficient with doubly robust covariates -- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — saturated OLS (direct cohort x time coefficients), logit, and Poisson QMLE (ASF-based ATT with delta-method SEs) +- **Efficient**: EfficientDiD (Chen, Sant'Anna & Xie 2025) - attains the semiparametric efficiency bound on the no-covariate path; offers an optional doubly-robust covariate path (sieve-based propensity ratios plus linear OLS outcome regression) that is DR-consistent but does not generically attain the bound +- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) - saturated OLS, logit, Poisson QMLE with ASF-based ATT +- **Reversible treatment**: ChaisemartinDHaultfoeuille (de Chaisemartin & D'Haultfœuille AER 2020 + NBER WP 29873) - the only estimator in the library for non-absorbing (on/off) treatments, with full dynamic event study, covariates, group-specific trends, non-binary treatment, HonestDiD integration, and survey support -### Inference & Diagnostics +### Inference and diagnostics -- Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance -- Parallel trends tests, placebo tests, Goodman-Bacon decomposition +- Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance, jackknife (SyntheticDiD) +- Parallel trends tests, placebo tests, Goodman-Bacon decomposition, TWFE decomposition diagnostic - Honest DiD sensitivity analysis (Rambachan & Roth 2023), pre-trends power analysis (Roth 2022) -- Power analysis and simulation-based MDE tools +- Power analysis, MDE, and sample-size tools (analytical + simulation), including survey-aware variants - EPV diagnostics for propensity score estimation -### Survey Support +### Survey support -`SurveyDesign` 
with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. All 16 estimators accept `survey_design` (15 inference-level + BaconDecomposition diagnostic); design-based variance estimation varies by estimator: +`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. All 16 estimators accept `survey_design` (15 inference-level + BaconDecomposition diagnostic): - **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC -- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 16 estimators (not SyntheticDiD, TROP, BaconDecomposition, WooldridgeDiD) -- **Survey-aware bootstrap**: multiplier at PSU (IF-based) and Rao-Wu rescaled (resampling-based) -- **DEFF diagnostics**, **subpopulation analysis**, **weight trimming**, **CV on estimates** +- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR - 12 of 16 estimators +- **Survey-aware bootstrap**: PSU-level multiplier (IF-based) and Rao-Wu rescaled (resampling-based) +- **Diagnostics**: per-coefficient DEFF, subpopulation analysis, weight trimming, CV on estimates - **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS -- **R cross-validation**: 15 tests against R's `survey` package using NHANES, RECS, and API datasets +- **Microdata-to-panel bridge**: `aggregate_survey()` helper with design-based precision +- **Survey-aware power analysis** via `SurveyPowerConfig` +- **R cross-validation**: tests against R's `survey` package using NHANES, RECS, and API datasets -See [Survey Design Support](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix, and [survey-roadmap.md](docs/survey-roadmap.md) for implementation details. +See [Survey Design Support](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix and [survey-roadmap.md](docs/survey-roadmap.md) for the technical reference. 
### Infrastructure - Optional Rust backend for accelerated computation -- Label-gated CI (tests run only when `ready-for-ci` label is added) +- Label-gated CI (tests run only when `ready-for-ci` label is added); standalone CI Gate workflow - Documentation dependency map (`docs/doc-deps.yaml`) with `/docs-impact` skill - AI practitioner guardrails based on Baker et al. (2025) 8-step workflow +- Runtime-accessible LLM guides via `get_llm_guide(...)`, bundled in the wheel +- JOSS paper materials (`paper.md`, `paper.bib`) --- -## Survey Academic Credibility (Phase 10) +## Recently Shipped -Phase 10 established the theoretical and empirical foundation for survey support -credibility. See [survey-roadmap.md](docs/survey-roadmap.md) for detailed specs. +Major landings since the prior roadmap revision. See [CHANGELOG.md](CHANGELOG.md) for the full history. -| Item | Priority | Status | -|------|----------|--------| -| **10a.** Theory document (`survey-theory.md`) | HIGH | ✅ Shipped (v2.9.1) | -| **10b.** Research-grade survey DGP (enhance `generate_survey_did_data`) | HIGH | ✅ Shipped (v2.9.1) | -| **10c.** Expand R validation (ImputationDiD, StackedDiD, SunAbraham, TripleDifference) | HIGH | ✅ Shipped (v2.9.1) | -| **10d.** Tutorial: flat-weight vs design-based comparison | HIGH | ✅ Shipped (v2.9.1) | -| **10e.** Position paper / arXiv preprint | MEDIUM | Not started — depends on 10b | -| **10f.** WooldridgeDiD survey support (OLS + logit + Poisson) | MEDIUM | ✅ Shipped (v2.9.0) | -| **10g.** Practitioner guidance: when does survey design matter? 
| LOW | Subsumed by B1d | +- **ChaisemartinDHaultfoeuille (dCDH)** - full feature set: `DID_M` contemporaneous-switch, multi-horizon `DID_l` event study, analytical SE, multiplier bootstrap, TWFE decomposition diagnostic, dynamic placebos, normalized estimator, cost-benefit aggregate, sup-t bands, covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, heterogeneity testing, non-binary treatment, HonestDiD integration, and survey support (TSL + pweight). +- **SyntheticDiD jackknife variance** (`variance_method='jackknife'`) with survey-weighted jackknife. +- **SyntheticDiD validation diagnostics**. +- **Survey support completion** - all 16 estimators accept `survey_design`; `aggregate_survey()` microdata-to-panel bridge with `second_stage_weights` parameter; `conditional_pt` DGP parameter for conditional-PT scenarios. +- **Survey-aware power analysis** via `SurveyPowerConfig`. +- **Practitioner onboarding** - Brand Awareness Survey DiD tutorial, Geo-Experiment Analysis tutorial, practitioner decision tree, practitioner getting-started guide, README "For Data Scientists" section. +- **Survey academic grounding** - `survey-theory.md` methodology document, research-grade survey DGP, expanded R-validation across additional estimators, flat-weight-vs-design-based comparison tutorial. +- **WooldridgeDiD / ETWFE estimator** (Wooldridge 2023, 2025). +- **Staggered Triple Difference** (Ortiz-Villavicencio & Sant'Anna 2025). +- **LLM guide bundling** - `get_llm_guide()` exposes `llms.txt`, `llms-full.txt`, and `llms-practitioner.txt` at runtime. +- **JOSS paper materials** and CONTRIBUTORS.md. +- **Python 3.14 support**; standalone CI Gate workflow. --- -## Data Science Practitioners (Phases B1–B4) +## Shipping Next -Parallel track targeting data science practitioners — marketing, product, operations — who need DiD for real-world problems but are underserved by the current academic framing. 
See [business-strategy.md](docs/business-strategy.md) for competitive analysis, personas, and full rationale. +Queued work, ordered by expected leverage. Each item is its own PR. Ordering is priority-sequenced, not time-committed. -### Phase B1: Foundation (Docs & Positioning) +### Practitioner-ready output -*Goal: Make diff-diff discoverable and approachable for data science practitioners. Zero code changes.* +- **`BusinessReport` class.** Plain-English summaries of any estimator's results with markdown export. Optional rich formatting via a `[reporting]` extra; core remains numpy/pandas/scipy only. Turns raw coefficients into stakeholder-ready artifacts. +- **`DiagnosticReport` with context-aware `practitioner_next_steps()`.** Unified diagnostic runner that bundles parallel-trends, placebo, HonestDiD, Bacon decomposition, DEFF, EPV, and power diagnostics into one plain-English report. `practitioner_next_steps()` substitutes actual column names from fitted results instead of generic placeholders. -| Item | Priority | Status | -|------|----------|--------| -| **B1a.** Brand Awareness Survey DiD tutorial — lead use case showcasing unique survey support | HIGH | Done (Tutorial 17) | -| **B1b.** README "For Data Scientists" section alongside "For Academics" and "For AI Agents" | HIGH | Done | -| **B1c.** Practitioner decision tree — "which method should I use?" framed for business contexts | HIGH | Done | -| **B1d.** "Getting Started" guide for practitioners with business ↔ academic terminology bridge | MEDIUM | Done | +### Practitioner tutorials -### Phase B2: Practitioner Content +- **dCDH comprehensive tutorial.** One notebook covering reversible treatment, dynamic event study, covariates, trends, HonestDiD on placebos, and survey. Favara-Imbs (2015) banking-deregulation replication as the headline application. 
+- **BRFSS repeated-cross-section tutorial.** State-policy DiD replication using `CallawaySantAnna(panel=False)` with design-based SEs and HonestDiD sensitivity. Targets the highest-demand survey-DiD audience segment. +- **Marketing Campaign Lift tutorial** (CallawaySantAnna, staggered geo rollout). +- **Pricing / Promotion Impact tutorial** (ContinuousDiD dose-response). -*Goal: End-to-end tutorials for each persona. Ship incrementally, each as its own PR.* +### Survey breadth and validation -| Item | Priority | Status | -|------|----------|--------| -| **B2a.** Marketing Campaign Lift tutorial (CallawaySantAnna, staggered geo rollout) | HIGH | Not started | -| **B2b.** Geo-Experiment tutorial (SyntheticDiD) | HIGH | Done (Tutorial 18) | -| **B2c.** diff-diff vs GeoLift vs CausalImpact comparison page | MEDIUM | Not started | -| **B2d.** Product Launch Regional Rollout tutorial (staggered estimators) | MEDIUM | Not started | -| **B2e.** Pricing/Promotion Impact tutorial (ContinuousDiD, dose-response) | MEDIUM | Not started | -| **B2f.** Loyalty Program Evaluation tutorial (TripleDifference) | LOW | Not started | - -### Phase B3: Convenience Layer - -*Goal: Reduce time-to-insight and enable stakeholder communication. Core stays numpy/pandas/scipy only.* - -| Item | Priority | Status | -|------|----------|--------| -| **B3a.** `BusinessReport` class — plain-English summaries, markdown export; rich export via optional `[reporting]` extra | HIGH | Not started | -| **B3b.** `DiagnosticReport` — unified diagnostic runner with plain-English interpretation. Includes making `practitioner_next_steps()` context-aware (substitute actual column names from fitted results into code snippets instead of generic placeholders). 
| HIGH | Not started | -| **B3c.** Practitioner data generator wrappers (thin wrappers around existing generators with business-friendly names) | MEDIUM | Not started | -| **B3d.** `aggregate_survey()` helper (microdata-to-panel bridge for BRFSS/ACS/CPS) | MEDIUM | Shipped (v3.0.1) | - -### Phase B4: Platform (Longer-term) - -*Goal: Integrate into data science practitioner workflows.* - -| Item | Priority | Status | -|------|----------|--------| -| **B4a.** Integration guides (Databricks, Jupyter dashboards, survey platforms) | MEDIUM | Not started | -| **B4b.** Export templates (PowerPoint via optional extra, Confluence/Notion markdown, HTML widget) | MEDIUM | Not started | -| **B4c.** AI agent integration — position B3a/B3b as tools for AI agents assisting practitioners | LOW | Not started | +- **Two-phase sampling + multi-stage cluster R-validation tests.** Extend existing survey cross-validation to NHANES two-phase design and MICS/DHS/NCVS multi-stage cluster. Closes a practitioner-design gap and firms up the design-based variance claim. --- -## de Chaisemartin-D'Haultfœuille (dCDH) Estimator +## Under Consideration -The dCDH estimator is the only modern DiD estimator in the library that handles **non-absorbing (reversible) treatments**. All other staggered estimators (CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, EfficientDiD, WooldridgeDiD) assume treatment is an absorbing state — once treated, always treated. dCDH is the natural fit for marketing campaigns, seasonal promotions, policy on/off cycles, and any setting where treatment turns on and off over time. +Research-informed candidates. Each has a rationale, a tractability note, and a commit criterion. Papers are academic references, so citation is fine. -**Implementation strategy.** A single `ChaisemartinDHaultfoeuille` (alias `DCDH`) class evolves across phases via additional `fit()` parameters and additional fields on the results object. 
Not an estimator family — features land as enhancements to the single class, matching the library's pattern for `CallawaySantAnna`, `ImputationDiD`, `EfficientDiD`, etc. +### Methodology extensions -**Methodology source of truth:** [docs/methodology/REGISTRY.md `## ChaisemartinDHaultfoeuille`](docs/methodology/REGISTRY.md) — assumption checks, estimator equations, edge cases, and all documented deviations from the R `DIDmultiplegtDYN` reference implementation. Consult REGISTRY.md before any methodology change. +- **DiD with no untreated group** (de Chaisemartin, Ciccia, D'Haultfœuille & Knau, arXiv:2405.04465, 2024, plus continuous-treatment-with-no-stayers companion, AEA P&P 2024). New estimator for designs where treatment is universal with heterogeneous dose (the inverse of the few-treated-many-donors case). Uses quasi-untreated units as controls. No existing diff-diff estimator handles this. Tractability: medium; closed-form identification. **Commit when**: methodology plan drafted and validated against the paper's Pierce (2016) solar-panel replication. +- **Nonparametric / flexible outcome regression for `EfficientDiD` DR covariate path** (Chen, Sant'Anna & Xie, arXiv:2506.17729, 2025, Section 4). The shipped staggered `EfficientDiD` uses a linear OLS outcome regression in its doubly-robust covariate path; that preserves DR consistency but does not generically attain the semiparametric efficiency bound unless the conditional mean is linear in the covariates. Replacing the OLS outcome regression with sieve / kernel / ML nuisance estimation (as the paper's Section 4 allows) would close the efficiency gap on the covariate path. Tractability: medium; the hook points are in `diff_diff/efficient_did_covariates.py`. **Commit when**: a paper-review synthesis is written, with an implementation plan for the nonparametric OR that preserves the existing DR consistency guarantees and survey-weighted variance surface. 
+- **Distributional DiD for staggered timing** (Ciaccio, arXiv:2408.01208, 2024). New estimator extending Callaway-Li QTT to staggered adoption. `CallawaySantAnna` currently gives mean ATT only; this unlocks quantile effects. Tractability: medium. **Commit when**: a health-econ or public-health user reports need for quantile effects in a repeated-cross-section design. +- **Local Projections DiD** (Dube, Girardi, Jordà & Taylor, JAE 2025). New estimator with flexible impulse-response and robustness to dynamic misspecification; natural for anticipation-prone settings. Tractability: well-scoped. **Commit when**: a methodology review confirms the dynamic variant's variance derivation fits our SE helpers. +- **Few-treated-units inference option** (Alvarez, Ferman & Wüthrich, arXiv:2504.19841, 2025). `inference=` option covering t(G-1) corrections, randomization inference, and Ferman-Pinto-style permutation tests. Current SE paths assume large-G asymptotics. Tractability: medium. **Commit when**: a user reports sparse-treatment pain. +- **Riesz-representation sensitivity** (Bach et al., arXiv:2510.09064, 2025). Confounder-based sensitivity bound complementing HonestDiD's trend-based bound. Tractability: medium. **Commit when**: HonestDiD users ask for confounder bounds. +- **Compositional-change inference** (Sant'Anna & Xu, arXiv:2304.13925 v3, 2025). Corrects inference for rolling-panel repeated-cross-section designs (ACS, CPS) where sample composition changes across periods. Tractability: medium. **Commit when**: BRFSS tutorial or an applied user surfaces the issue. +- **Triple-difference identification-with-covariates audit** (Ortiz-Villavicencio & Sant'Anna, arXiv:2505.09942, 2025). The paper shows common DDD implementations are invalid under covariate-conditional identification. Audit existing `TripleDifference` / `StaggeredTripleDifference` against the paper. Tractability: small. **Promote to Shipping Next** if the audit finds a real issue. 
-**Primary papers** (consulted by the implementer; not committed in-repo as they are upstream sources): -- de Chaisemartin, C. & D'Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. *American Economic Review*, 110(9), 2964-2996. — `DID_M` contemporaneous-switch estimator, TWFE decomposition diagnostics. -- de Chaisemartin, C. & D'Haultfœuille, X. (2022, revised 2024). Difference-in-Differences Estimators of Intertemporal Treatment Effects. NBER Working Paper 29873. — Full dynamic event study `DID_l`, cohort-recentered analytical variance (Web Appendix Section 3.7.3), residualization-style covariates `DID^X`, group-specific linear trends `DID^{fd}`. +### Post-estimation and export capabilities -The dynamic companion paper subsumes the AER 2020 paper: `DID_1 = DID_M`. The single class implements the dynamic estimator's machinery (`DID_{g,l}` building block, cohort-recentered analytical variance from Web Appendix Section 3.7.3 of the dynamic paper) at horizon `l = 1` for Phase 1, with later phases looping over multiple horizons and adding covariate / extension support. +Framed as what diff-diff offers, not which external tool plugs in: -### Phase 1: Foundation (contemporaneous switch / DID_M) +- **Standard post-estimation interface.** Expose `.predict()` and `.vcov()` in shapes that common post-estimation slope / contrast / hypothesis-test interfaces consume. Tractability: small Protocol addition plus compatibility shim. **Commit when**: a concrete contract with one of the existing results objects is defined. +- **Publication-table export.** `result.to_table()` producing publication-quality HTML / PNG / LaTeX tables via an optional extra. Tractability: low. **Commit when**: `BusinessReport` ships so the formatter can piggyback on its summary pipeline. +- **Survey design object interop.** `SurveyDesign.from_design_object(...)` / `.to_design_object(...)` for accepting and emitting standard Python survey-design objects. 
Tractability: depends on upstream API stability. **Commit when**: a stable public design surface exists upstream. +- **Pluggable regression engine for TWFE / event-study paths.** Opt-in `engine=` parameter allowing alternative backends. Tractability: contained change plus coefficient-parity CI. **Commit when**: profiling shows material wins on real practitioner panels. -*Goal: Ship a `ChaisemartinDHaultfoeuille` estimator class for the contemporaneous-switch case (`l = 1`), which is `DID_M` of the AER 2020 paper. Forward-compatible API: parameters and result fields for Phase 2/3 are reserved from day one and raise `NotImplementedError` with phase pointers until they're implemented.* +### Parked (explicit non-goals) -| Item | Priority | Status | -|------|----------|--------| -| **1a.** `ChaisemartinDHaultfoeuille` class with `fit()` returning per-group `DID_{g,1}` and aggregate `DID_1` / `DID_M` | HIGH | Shipped | -| **1b.** Joiners-only (`DID_+`) and leavers-only (`DID_-`) views on the results object | HIGH | Shipped | -| **1c.** Single-lag placebo `DID_M^pl` (AER 2020 placebo specification = `DID^{pl}_1` of dynamic paper) | HIGH | Shipped (point estimate; analytical SE deferred to Phase 2) | -| **1d.** Analytical SE via cohort-recentered plug-in formula (Web Appendix Section 3.7.3 of dynamic paper, applied at `l = 1`) | HIGH | Shipped | -| **1e.** Multiplier bootstrap clustered at the group level (library extension; matches CS / ImputationDiD / TwoStageDiD convention) | HIGH | Shipped | -| **1f.** TWFE decomposition diagnostic: per-`(g, t)` weights, fraction negative, `sigma_fe` (Theorem 1 of AER 2020 + `twowayfeweights` parity) | MEDIUM | Shipped | -| **1g.** Parity tests vs R `DIDmultiplegtDYN` at `l = 1` | HIGH | Shipped | -| **1h.** REGISTRY.md entry, doc-deps.yaml mapping, README.md section, RST docs, CHANGELOG.md entry | HIGH | Shipped | -| **1i.** Survey compatibility matrix in `docs/choosing_estimator.rst`: explicitly document **NO survey support** for 
dCDH (separate effort after all phases ship) | HIGH | Shipped | +- New estimators beyond the list above without a user-driven demand signal. +- Calibration / raking / post-stratification as first-party features (remain upstream; document the handoff). +- Product Launch Regional Rollout and Loyalty Program tutorials (defer until a practitioner request). +- Methodology-vs-alternative comparison pages (replaced by BusinessReport and the tutorials that showcase diff-diff's output directly). -### Phase 2: Dynamic event study (multiple horizons) - -*Goal: Add multi-horizon event study to the same class via the `L_max` parameter. Loops the Phase 1 machinery over horizons `l = 1, ..., L`. No API breakage from Phase 1. No new tutorial - the comprehensive tutorial waits for Phase 3.* - -| Item | Priority | Status | -|------|----------|--------| -| **2a.** Multi-horizon `DID_l` via per-group `DID_{g,l}` building block, with `L_max` parameter | HIGH | Shipped | -| **2b.** Multi-horizon analytical SE (cohort-recentered plug-in per horizon) | HIGH | Shipped | -| **2c.** Dynamic placebos `DID^{pl}_l` for pre-trends testing (Web Appendix Section 1.1 of dynamic paper) | HIGH | Shipped (point estimates; SE deferred) | -| **2d.** Normalized estimator `DID^n_l` (Section 3.2 of dynamic paper) | MEDIUM | Shipped | -| **2e.** Cost-benefit aggregate `delta` (Section 3.3 of dynamic paper, Lemma 4) | MEDIUM | Shipped | -| **2f.** Simultaneous (sup-t) confidence bands for event study plots | MEDIUM | Shipped | -| **2g.** `plot_event_study()` integration; `< 50%`-of-switchers warning for far horizons | MEDIUM | Shipped | -| **2h.** Parity tests vs `did_multiplegt_dyn` for multi-horizon designs | HIGH | Shipped (point estimates; SE/placebo parity deferred) | - -### Phase 3: Covariates, extensions, and tutorial - -*Goal: Add residualization-style covariate adjustment, group-specific linear trends, non-binary treatment support, HonestDiD integration, and a single comprehensive tutorial covering 
all three phases. This is the phase where dCDH ships as a complete public feature.*
+---

-| Item | Priority | Status |
-|------|----------|--------|
-| **3a.** Residualization-style covariate adjustment `DID^X` (Web Appendix Section 1.2 of dynamic paper). **Note:** NOT doubly-robust, NOT IPW, NOT Callaway-Sant'Anna-style. | HIGH | Shipped (PR B) |
-| **3b.** Group-specific linear trends `DID^{fd}` (Web Appendix Section 1.3, Lemma 6) — second-difference estimator with cumulation for level effects | MEDIUM | Shipped (PR B) |
-| **3c.** State-set-specific trends (`trends_nonparam` option, Web Appendix Section 1.4) | MEDIUM | Shipped (PR B) |
-| **3d.** Heterogeneity testing `beta^{het}_l` (Web Appendix Section 1.5) | LOW | Shipped (PR B) |
-| **3e.** Design-2 switch-in / switch-out separation (Web Appendix Section 1.6) | LOW | Shipped (PR B; convenience wrapper) |
-| **3f.** Non-binary treatment support (the formula already handles it; this row is documentation + tests) | MEDIUM | Shipped (PR #300; also ships placebo SE, L_max=1 per-group path, parity SE assertions) |
-| **3g.** HonestDiD (Rambachan-Roth) integration on `DID^{pl}_l` placebos | MEDIUM | Shipped (PR C) |
-| **3h.** **Single comprehensive tutorial notebook** covering all three phases — Favara-Imbs (2015) banking deregulation replication as the headline application, with comparison plots vs LP / TWFE | HIGH | Not started |
-| **3i.** Parity tests vs `did_multiplegt_dyn` for covariate and extension specifications | HIGH | Shipped (PR B; controls, trends_lin, combined) |
+## AI-Agent Track

-### Out of scope for the dCDH single-class evolution
+Long-running program, framed as "building toward" rather than with discrete ship dates.

-These are referenced by the dCDH papers but live in *separate* efforts or *separate* companion papers we don't yet have:
+**Vision.** A practitioner hands an AI agent a business scenario.
The agent, with diff-diff as its toolkit, interprets the scenario, selects the correct estimator and identification strategy, executes the analysis with correct diagnostics and sensitivity, and returns a business-ready report. Practitioners never see raw coefficients unless they want to.

-- **Survey design integration** — shipped. Supports pweight with strata/PSU/FPC via Taylor Series Linearization. Replicate weights and PSU-level bootstrap deferred.
-- **Fuzzy DiD** (within-cell-varying treatment, Web Appendix Section 1.7 of dynamic paper) → de Chaisemartin & D'Haultfœuille (2018), separate paper not yet reviewed
-- **Principled anticipation handling and trimming rules** (footnote 14 of dynamic paper) → de Chaisemartin (2021), separate paper not yet reviewed
-- **2SLS DiD** (referenced in AER appendix Section 3.4) → separate paper
+**Building blocks already in place.**

-These remain in **Future Estimators** below if/when we choose to extend.
+- Baker et al. (2025) 8-step workflow enforcement in `diff_diff/practitioner.py`.
+- `practitioner_next_steps()` context-aware guidance.
+- Runtime LLM guides via `get_llm_guide(...)` (`llms.txt`, `llms-full.txt`, `llms-practitioner.txt`), bundled in the wheel.
+- Silent-operation warnings so agents and humans see the same signals at the same time.

-### Architectural notes (for plan and PR reviewers)
+**Next blocks toward the vision.**

-- **Single `ChaisemartinDHaultfoeuille` class** (alias `DCDH`). Not a family. New features land as `fit()` parameters or fields on the results dataclass. No `DCDHDynamic`, `DCDHCovariate`, etc. Matches the library's idiomatic pattern: `CallawaySantAnna`, `ImputationDiD`, and `EfficientDiD` are all single classes that evolved across many phases.
-- **Forward-compatible API from Phase 1.** `fit(aggregate=None, controls=None, trends_linear=None, L_max=None, ...)` accepts the Phase 2/3 parameters from day one and raised `NotImplementedError` with a clear pointer to the relevant phase until they were implemented. As of the dCDH work, Phase 2, Phase 3, and `survey_design` are all live; only `aggregate` remains gated with `NotImplementedError`. No signature changes between phases.
-- **Conservative CI** under Assumption 8 (independent groups), exact only under iid sampling. Documented in REGISTRY.md as a `**Note:**` deviation from "default nominal coverage." Theorem 1 of the dynamic paper.
-- **Cohort recentering for variance is essential.** Cohorts are defined by the triple `(D_{g,1}, F_g, S_g)`. The plug-in variance subtracts cohort-conditional means, **NOT a single grand mean**. Test fixtures must catch this — a wrong implementation silently produces a smaller, incorrect variance.
-- **No Rust acceleration is planned for any phase.** The estimator's hot path is groupby + BLAS-accelerated matrix-vector products, where NumPy already operates near-optimally. If profiling on large panels (`G > 100K`) reveals a bottleneck post-ship, the existing `_rust_bootstrap_weights` helper can be reused for the bootstrap loop without writing new Rust code.
-- **Survey design integration shipped.** Supports pweight with strata/PSU/FPC via TSL. Replicate weights and PSU-level bootstrap deferred to a follow-up.
+- **BusinessReport / DiagnosticReport** (in Shipping Next) - the output form the vision assumes.
+- **Context-aware `practitioner_next_steps()`** that substitutes actual column names - turns guidance into executable recommendations.
+- **AI-legible diagnostic surfaces** - once BusinessReport ships, a structured JSON counterpart that agents can parse without screen-scraping human text.
+- **Scenario-to-estimator selection guidance** - agent-facing extension of `docs/practitioner_decision_tree.rst` that returns a specific estimator choice plus rationale for a given scenario description.
+- **End-to-end scenario walkthrough templates** - reusable orchestration recipes an agent can adapt from data ingest through business-ready output.
 ---
-## Future Estimators
+## Long-term Research Directions

-### Local Projections DiD
-
-Implements local projections for dynamic treatment effects. Doesn't require specifying full dynamic structure.
-
-- Flexible impulse response estimation
-- Robust to misspecification of dynamics
-- Natural handling of anticipation effects
-
-**Reference**: Dube, Girardi, Jorda, and Taylor (2023).
+Frontier methods that may graduate to Under Consideration given time and research signals.
 ### Causal Duration Analysis with DiD
-Extends DiD to duration/survival outcomes where standard methods fail (hazard rates, time-to-event).
-
-- Duration analogue of parallel trends on hazard rates
-- Avoids distributional assumptions and hazard function specification
-
-**Reference**: [Deaner & Ku (2025)](https://www.aeaweb.org/conference/2025/program/paper/k77Kh8iS). *AEA Conference Paper*.
+Extends DiD to duration / survival outcomes where standard methods fail (hazard rates, time-to-event). Duration analogue of parallel trends; avoids distributional and hazard-function assumptions.

----
-
-## Long-Term Research Directions
-
-Frontier methods requiring more research investment.
+**Reference**: Deaner & Ku (2025), *AEA Conference Paper*.
 ### DiD with Interference / Spillovers
-Standard DiD assumes SUTVA; spatial/network spillovers violate this. Two-stage imputation approach estimates treatment AND spillover effects under staggered timing.
+Standard DiD assumes SUTVA; spatial and network spillovers violate this. Two-stage imputation approaches estimate treatment and spillover effects jointly under staggered timing.
-**Reference**: [Butts (2024)](https://arxiv.org/abs/2105.03737). *Working Paper*.
+**Reference**: Butts (2024), working paper.

-### Quantile/Distributional DiD
+### Quantile / Distributional DiD

-Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT.
+Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT. Changes-in-Changes (CiC) identification strategy.

-- Changes-in-Changes (CiC) identification strategy
-- QTT(tau) at user-specified quantiles
-- Full counterfactual distribution function
-
-**Reference**: [Athey & Imbens (2006)](https://onlinelibrary.wiley.com/doi/10.1111/j.1468-0262.2006.00668.x). *Econometrica*.
+**Reference**: Athey & Imbens (2006), *Econometrica*. (Ciaccio 2024 extension listed under Under Consideration.)
 ### CATT Meta-Learner for Heterogeneous Effects
-ML-powered conditional ATT — discover who benefits most from treatment using doubly robust meta-learner.
+ML-powered conditional ATT, using a doubly robust meta-learner to discover which units benefit most from treatment.

-**Reference**: [Lan, Chang, Dillon & Syrgkanis (2025)](https://arxiv.org/abs/2502.04699). *Working Paper*.
+**Reference**: Lan, Chang, Dillon & Syrgkanis (2025), working paper.
 ### Causal Forests for DiD
-Machine learning methods for discovering heterogeneous treatment effects in DiD settings.
+Machine-learning methods for discovering heterogeneous treatment effects in DiD settings. Recent applied-econometrics work (Gavrilova et al. 2025, *Journal of Applied Econometrics*) demonstrates the approach on panel data.

-**References**:
-- [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*.
-- Athey & Wager (2019). *Annals of Statistics*.
+**References**: Athey & Wager (2019), *Annals of Statistics*; Kattenberg, Scheer & Thiel (2023), *CPB Discussion Paper*.
 ### Matrix Completion Methods
-Unified framework encompassing synthetic control and regression approaches.
+Unified framework encompassing synthetic control and regression approaches via low-rank matrix recovery.

-**Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*.
+**Reference**: Athey et al. (2021), *Journal of the American Statistical Association*.

-### Double/Debiased ML for DiD
+### Double / Debiased ML for DiD

-For high-dimensional settings with many potential confounders.
+Machine learning nuisance estimation in high-dimensional DiD settings.

-**Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*.
+**Reference**: Chernozhukov et al. (2018), *The Econometrics Journal*.
 ### Alternative Inference Methods
-- **Randomization inference**: Exact p-values for small samples
-- **Bayesian DiD**: Priors on parallel trends violations
-- **Conformal inference**: Prediction intervals with finite-sample guarantees
+- **Randomization inference**: exact p-values for small samples.
+- **Bayesian DiD**: priors on parallel-trends violations.
+- **Conformal inference**: prediction intervals with finite-sample guarantees.
 ---
 ## Contributing
-Interested in contributing? The Phase 10 items and future estimators are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
+Interested in contributing? Under Consideration items with clear commit criteria are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
+
+Key references:

-Key references for implementation:
 - [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*.
 - [Baker et al. (2025)](https://arxiv.org/pdf/2503.13323). "Difference-in-Differences Designs: A Practitioner's Guide."
+- [Abadie, Angrist, Frandsen & Pischke (2025)](https://www.nber.org/papers/w34550). "Harvesting Differences-in-Differences and Event-Study Evidence." NBER WP 34550.
diff --git a/TODO.md b/TODO.md
index 436f7118..f5917ebc 100644
--- a/TODO.md
+++ b/TODO.md
@@ -16,8 +16,8 @@ Current limitations that may affect users:
 | `predict()` raises NotImplementedError | `estimators.py:567-588` | Low | Rarely needed |
 For survey-specific limitations (NotImplementedError paths), see the
-[consolidated deferred list](docs/survey-roadmap.md#deferred-work-consolidated)
-in survey-roadmap.md.
+[Current Limitations](docs/survey-roadmap.md#current-limitations) section
+of survey-roadmap.md.
 ## Code Quality
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index 9e8b9818..07f6c6f3 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -291,38 +291,49 @@ def _validate_and_aggregate_to_cells(
 class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
     """
-    de Chaisemartin-D'Haultfoeuille (dCDH) estimator — Phase 1.
+    de Chaisemartin-D'Haultfoeuille (dCDH) estimator.

-    Computes the contemporaneous-switch DiD ``DID_M`` from the AER 2020
-    paper, equivalently ``DID_1`` (horizon ``l = 1``) of the dynamic
-    companion paper (NBER WP 29873). The estimator is the only modern
-    DiD in the library that handles **reversible (non-absorbing)
-    treatments** — treatment may switch on AND off over time.
+    The only modern DiD estimator in the library that handles **reversible
+    (non-absorbing) treatments** - treatment may switch on AND off over
+    time. Computes the contemporaneous-switch DiD ``DID_M`` from the
+    AER 2020 paper (equivalently ``DID_1`` at horizon ``l = 1`` of the
+    dynamic companion paper, NBER WP 29873) plus the full multi-horizon
+    event study ``DID_l`` for ``l = 1..L_max`` via the ``L_max`` parameter
+    on :meth:`fit`.
-    Phase 1 deliverables:
+    Supported:

-    - The headline ``DID_M`` point estimate
+    - Headline ``DID_M`` plus multi-horizon ``DID_l`` event study
     - Joiners-only ``DID_+`` and leavers-only ``DID_-`` decompositions
-    - The single-lag placebo ``DID_M^pl`` (computed automatically by
-      default; gate via ``placebo=False``)
-    - Analytical SE via the cohort-recentered plug-in formula from
-      Web Appendix Section 3.7.3 of the dynamic paper
-    - Optional multiplier bootstrap clustered at the group level
-    - Optional TWFE decomposition diagnostic from Theorem 1 of AER 2020
-      (per-cell weights, fraction negative, ``sigma_fe``)
+    - Single-lag placebo ``DID_M^pl`` and dynamic placebos ``DID^{pl}_l``
+      (computed automatically by default; gate via ``placebo=False``)
+    - Analytical SE via the cohort-recentered plug-in formula from Web
+      Appendix Section 3.7.3; multiplier bootstrap clustered at the group
+      level via ``n_bootstrap``
+    - Normalized estimator ``DID^n_l``, cost-benefit aggregate ``delta``,
+      and sup-t simultaneous confidence bands
+    - Residualization-style covariate adjustment (``DID^X``) via
+      ``controls=``, group-specific linear trends (``DID^{fd}``) via
+      ``trends_linear=True``, state-set-specific trends via
+      ``trends_nonparam=``, heterogeneity testing, non-binary treatment,
+      HonestDiD sensitivity integration on placebos via ``honest_did=True``
+    - Survey support via ``survey_design=`` (pweight + strata/PSU/FPC with
+      Taylor-series linearization)
+    - TWFE decomposition diagnostic from Theorem 1 of AER 2020
+
+    Only ``aggregate`` on :meth:`fit` still raises ``NotImplementedError``.
     Parameters
     ----------
     alpha : float, default=0.05
         Significance level for confidence intervals.
     cluster : str, optional, default=None
-        **Phase 1 contract:** ``cluster`` must be ``None`` (the default).
-        dCDH always clusters at the group level via the cohort-recentered
-        influence-function plug-in (analytical SEs) and the multiplier
-        bootstrap (also grouped at the ``group`` column).
Passing any
-        non-``None`` value raises ``NotImplementedError`` with a Phase 1
-        pointer. Custom clustering at a coarser or finer level than the
-        group is reserved for a future phase. See REGISTRY.md
+        Must be ``None`` (the default). dCDH always clusters at the group
+        level via the cohort-recentered influence-function plug-in
+        (analytical SEs) and the multiplier bootstrap (also grouped at the
+        ``group`` column). Passing any non-``None`` value raises
+        ``NotImplementedError``. Custom clustering at a coarser or finer
+        level than the group is a planned extension. See REGISTRY.md
         ``ChaisemartinDHaultfoeuille`` section for the full contract.
     n_bootstrap : int, default=0
         Number of multiplier-bootstrap iterations. ``0`` (default) uses
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index 7e6ef60b..95734a57 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -235,7 +235,7 @@ de Chaisemartin & D'Haultfœuille (2020/2022) estimator for **non-absorbing (rev
 ```python
 ChaisemartinDHaultfoeuille(
     alpha: float = 0.05,
-    cluster: str | None = None, # Phase 1: must be None; non-None raises NotImplementedError
+    cluster: str | None = None, # Must be None; non-None raises NotImplementedError
     n_bootstrap: int = 0, # 0 = analytical SE only
     bootstrap_weights: str = "rademacher", # "rademacher", "mammen", or "webb"
     seed: int | None = None,
@@ -256,21 +256,21 @@ est.fit(
     outcome: str,
     group: str, # Group identifier
     time: str,
-    treatment: str, # Per-observation binary treatment
-    # ---- Phase 2: multi-horizon ----
+    treatment: str, # Per-observation binary treatment or non-binary intensity
+    # ---- multi-horizon ----
     L_max: int | None = None, # Max horizon; None = l=1 only
-    # ---- forward-compat (Phase 3) ----
-    aggregate: str | None = None, # Phase 3: reserved
-    controls: list[str] | None = None, # Phase 3: DID^X covariates
-    trends_linear: bool | None = None, # Phase 3: DID^{fd}
-    trends_nonparam: Any | None = None, #
Phase 3: DID^s
-    honest_did: bool = False, # Phase 3: HonestDiD integration
+    # ---- covariates and extensions ----
+    aggregate: str | None = None, # Reserved; raises NotImplementedError
+    controls: list[str] | None = None, # DID^X residualization-style covariates
+    trends_linear: bool | None = None, # DID^{fd} group-specific linear trends
+    trends_nonparam: Any | None = None, # DID^s state-set-specific trends
+    honest_did: bool = False, # HonestDiD sensitivity on placebos
     # ---- survey support ----
-    survey_design: SurveyDesign | None = None, # pweight + strata/PSU/FPC (TSL)
+    survey_design: SurveyDesign | None = None, # pweight + strata/PSU/FPC (TSL)
 ) -> ChaisemartinDHaultfoeuilleResults
 ```
-`L_max` controls multi-horizon computation. Phase 3 parameters (`controls`, `trends_linear`, `trends_nonparam`, `honest_did`, `heterogeneity`, `design2`) and `survey_design` are implemented; only `aggregate` remains gated with `NotImplementedError`.
+`L_max` controls multi-horizon computation. `controls`, `trends_linear`, `trends_nonparam`, `honest_did`, `heterogeneity`, `design2`, and `survey_design` are all supported; only `aggregate` still raises `NotImplementedError`.
 **Usage:**
@@ -295,7 +295,7 @@ print(f"DID_+ (joiners): {results.joiners_att:.3f}")
 print(f"DID_- (leavers): {results.leavers_att:.3f}")
 print(f"Placebo (DID^pl): {results.placebo_effect:.3f}")
-# Multi-horizon event study (Phase 2)
+# Multi-horizon event study
 results = est.fit(data, outcome="outcome", group="group", time="period", treatment="treatment", L_max=3)
 for h in sorted(results.event_study_effects):
@@ -320,7 +320,7 @@ print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}")
 **Notes:**
 - Validated against R `DIDmultiplegtDYN` v2.3.3 at horizon `l = 1` via `tests/test_chaisemartin_dhaultfoeuille_parity.py`
-- Phase 1 placebo SE is intentionally `NaN` with a warning.
The dynamic companion paper Section 3.7.3 derives the cohort-recentered analytical variance for `DID_l` only — not for the placebo `DID_M^pl`. Phase 2 will add multiplier-bootstrap support for the placebo. Until then, the placebo point estimate is meaningful but its inference fields stay NaN-consistent **even when `n_bootstrap > 0`** (bootstrap currently covers `DID_M`, `DID_+`, and `DID_-` only)
+- Placebo SE contract: single-period placebo `DID_M^pl` (`L_max=None`) has `NaN` SE because the per-period aggregation path has no influence-function derivation; inference fields stay NaN-consistent **even when `n_bootstrap > 0`** for the single-period path (bootstrap covers `DID_M`, `DID_+`, and `DID_-` only). Multi-horizon dynamic placebos `DID^{pl}_l` (`L_max >= 1`) have valid analytical SE via the placebo influence function (same cohort-recentered structure as positive horizons, applied to backward outcome differences), with bootstrap SE override when `n_bootstrap > 0`. This is a library extension beyond the dynamic companion paper, which states Theorem 1 variance for `DID_l` only.
 - The analytical CI is conservative under Assumption 8 (independent groups) of the dynamic companion paper, exact only under iid sampling
 - Survey design supported: pweight with strata/PSU/FPC via Taylor Series Linearization. Replicate weights and PSU-level bootstrap deferred
@@ -637,13 +637,13 @@ plot_event_study(results)
 ### EfficientDiD
-Efficient DiD estimator (Chen, Sant'Anna & Xie 2025). Achieves the semiparametric efficiency bound for ATT(g,t). Phase 1: no-covariates path only.
+Efficient DiD estimator (Chen, Sant'Anna & Xie 2025). Achieves the semiparametric efficiency bound for ATT(g,t) on the **no-covariate path**.
Also supports an optional doubly-robust covariate path (sieve-based propensity score ratios + linear OLS outcome regression): the DR property gives consistency if either the OR or the PS is correctly specified, but the linear OLS outcome regression does not generically attain the efficiency bound unless the conditional mean is linear in the covariates. Pass column names to the `covariates` parameter on `fit()`.
 ```python
 EfficientDiD(
     pt_assumption: str = "all", # "all" (overidentified) or "post" (just-identified)
     alpha: float = 0.05,
-    cluster: str | None = None, # Not yet implemented
+    cluster: str | None = None, # Column name for cluster-robust SEs (Liang-Zeger on EIF values); cluster-level multiplier bootstrap when n_bootstrap > 0
     n_bootstrap: int = 0, # Multiplier bootstrap iterations
     bootstrap_weights: str = "rademacher", # "rademacher", "mammen", or "webb"
     seed: int | None = None,
@@ -662,7 +662,7 @@ edid.fit(
     unit: str,
     time: str,
     first_treat: str,
-    covariates: list[str] = None, # Not yet implemented (Phase 2)
+    covariates: list[str] = None, # Time-invariant unit-level covariates; uses doubly-robust sieve path when non-None
     aggregate: str = None, # None, "simple", "event_study", "group", or "all"
     balance_e: int = None,
 ) -> EfficientDiDResults
diff --git a/docs/api/chaisemartin_dhaultfoeuille.rst b/docs/api/chaisemartin_dhaultfoeuille.rst
index 7206db0c..1312c2b9 100644
--- a/docs/api/chaisemartin_dhaultfoeuille.rst
+++ b/docs/api/chaisemartin_dhaultfoeuille.rst
@@ -6,12 +6,16 @@ The only modern staggered DiD estimator in diff-diff that handles
 off over time.
 This module implements the methodology from de Chaisemartin & D'Haultfœuille
-(2020/2022). Phase 1 ships the contemporaneous-switch estimator ``DID_M``
-(= ``DID_1`` at horizon ``l = 1``).
Phase 2 adds the full multi-horizon
-event study ``DID_l`` for ``l = 1..L_max`` via the ``L_max`` parameter,
-plus normalized estimator ``DID^n_l``, cost-benefit aggregate ``delta``,
-dynamic placebos ``DID^{pl}_l``, and sup-t simultaneous confidence bands.
-Phase 3 will add covariate adjustment.
+(2020/2022). The estimator ships the contemporaneous-switch path ``DID_M``
+(= ``DID_1`` at horizon ``l = 1``); the full multi-horizon event study
+``DID_l`` for ``l = 1..L_max`` via the ``L_max`` parameter, with normalized
+estimator ``DID^n_l``, cost-benefit aggregate ``delta``, dynamic placebos
+``DID^{pl}_l``, and sup-t simultaneous confidence bands; residualization-style
+covariate adjustment (``controls``); group-specific linear trends
+(``trends_linear``); state-set-specific trends (``trends_nonparam``);
+heterogeneity testing; non-binary treatment; HonestDiD sensitivity
+integration on placebos; and survey support via Taylor-series linearization
+(pweight + strata/PSU/FPC).
 The estimator:
@@ -58,7 +62,7 @@ All other staggered estimators in diff-diff (:class:`~diff_diff.CallawaySantAnna
 once treated, stays treated. ``ChaisemartinDHaultfoeuille`` is the only
 library option for non-absorbing treatments.
-**Phase 1 panel requirements (deviation from R DIDmultiplegtDYN):**
+**Panel requirements (deviation from R DIDmultiplegtDYN):**
 - Every group must have an observation at the **first global period**
   (the panel's earliest time value). Groups missing this baseline raise
@@ -67,15 +71,15 @@ library option for non-absorbing treatments.
   their first and last observed period) are dropped with a ``UserWarning``.
 - **Terminal missingness** (groups observed at the baseline but missing
-  one or more later periods — early exit / right-censoring) is supported.
+  one or more later periods - early exit / right-censoring) is supported.
   The group contributes from its observed periods only, masked out of the missing transitions by the per-period ``present`` guard in the variance computation.
-- This is a Phase 1 limitation relative to R ``DIDmultiplegtDYN``, which
-  supports unbalanced panels with documented missing-treatment-before-
-  first-switch handling. **Workaround:** pre-process your panel to
-  back-fill the baseline (or drop late-entry groups before fitting), or
-  use R until a future phase lifts the restriction. See the
+- This is a documented deviation from R ``DIDmultiplegtDYN``, which
+  supports unbalanced panels with missing-treatment-before-first-switch
+  handling. **Workaround:** pre-process your panel to back-fill the
+  baseline (or drop late-entry groups before fitting), or use R until
+  this restriction is lifted. See the
   ``Note (deviation from R DIDmultiplegtDYN)`` block in ``docs/methodology/REGISTRY.md`` for the rationale and the exact defensive guards that make terminal missingness safe.
diff --git a/docs/api/efficient_did.rst b/docs/api/efficient_did.rst
index 0216b08d..bba00a97 100644
--- a/docs/api/efficient_did.rst
+++ b/docs/api/efficient_did.rst
@@ -6,7 +6,7 @@ from Chen, Sant'Anna & Xie (2025).
 This module implements the efficiency-bound-attaining estimator that:
-1. **Achieves the semiparametric efficiency bound** for ATT(g,t) estimation
+1. **Achieves the semiparametric efficiency bound** for ATT(g,t) estimation on the no-covariate path
 2. **Optimally weights** across comparison groups and baselines via the inverse covariance matrix Ω*
 3. **Supports two PT assumptions**: PT-All (overidentified, tighter SEs) and
@@ -16,16 +16,28 @@ This module implements the efficiency-bound-attaining estimator that:
 .. note::
-   Phase 1 supports the **no-covariates** path only. The with-covariates
-   path (Phase 2) will be added in a future version.
+   EfficientDiD supports a doubly-robust covariate path: sieve-based
+   propensity score ratios combined with a linear OLS outcome regression.
+   The DR property ensures consistency if either the OR or the PS ratio is
+   correctly specified, but the linear OLS working model for the outcome
+   regression does not generically attain the semiparametric efficiency
+   bound unless the conditional mean is linear in the covariates. The
+   unqualified efficiency-bound claim applies to the no-covariate path
+   only. Pass column names to the ``covariates`` parameter on ``fit()``.
+   See ``docs/methodology/REGISTRY.md`` for the full contract.
 **When to use EfficientDiD:**
-- Staggered adoption design where you want **maximum efficiency**
+- Staggered adoption design where you want **maximum efficiency** on the no-covariate path
 - You believe parallel trends holds across all pre-treatment periods (PT-All)
 - You want tighter confidence intervals than Callaway-Sant'Anna
 - You need a formal efficiency benchmark for comparing estimators
+For covariate-adjusted designs, the doubly-robust path is consistent under
+either outcome-regression or propensity-ratio correctness but does not
+generically attain the efficiency bound under the shipped linear OLS
+outcome regression.
+
 **Reference:** Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). Efficient Difference-in-Differences and Event Study Estimators.
@@ -133,11 +145,11 @@ Comparison with Other Staggered Estimators
      - Conditional PT
      - Strict exogeneity
    * - Efficiency
-     - Achieves semiparametric bound
+     - Achieves semiparametric bound on the no-covariate path; DR covariate path is consistent but does not generically attain the bound under a linear OLS outcome regression
      - Not efficient
      - Efficient under homogeneity
    * - Covariates
-     - Not yet (Phase 2)
+     - Supported (doubly robust, sieve-based PS + linear OLS OR)
      - Supported (OR, IPW, DR)
      - Supported
    * - Bootstrap
diff --git a/docs/business-strategy.md b/docs/business-strategy.md
index 92a5d787..603e58da 100644
--- a/docs/business-strategy.md
+++ b/docs/business-strategy.md
@@ -323,64 +323,75 @@ Not the academic flowchart -- a business decision tree.
 ---
-## 7. Interaction with Existing Roadmap
+## 7. Interaction with ROADMAP.md

-The project has an existing ROADMAP.md covering Phase 10 (survey academic credibility), future estimators, and research directions. This strategy supplements rather than replaces it:
+`ROADMAP.md` has been refreshed to organize work as Current State / Recently Shipped / Shipping Next / Under Consideration / AI-Agent Track / Long-term. This strategy document is an internal planning artifact that sits alongside it:

-**Directly subsumed items:**
-- **10g. "Practitioner guidance: when does survey design matter?"** -- this becomes part of the business tutorials and Getting Started guide. No longer a standalone item.
-- **`aggregate_survey()` helper** -- shipped in v3.0.1. The microdata-to-panel workflow helper is in place for Persona A (survey data from BRFSS/ACS -> geographic panel). Practitioner-facing tutorials should reference it.
+**Subsumed into current capability:**
+- **Survey guidance for practitioners** -- now covered by the practitioner getting-started guide, practitioner decision tree, and the Brand Awareness Survey DiD tutorial (all shipped).
+- **`aggregate_survey()` helper** -- shipped; microdata-to-panel workflow is in place for Persona A (BRFSS / ACS -> geographic panel). Practitioner tutorials reference it.
+- **de Chaisemartin-D'Haultfoeuille (reversible treatments)** -- shipped end-to-end. Marketing interventions that switch on and off (seasonal campaigns, promotions) are now supported.

-**Reprioritized by business use cases:**
-- **de Chaisemartin-D'Haultfouille (reversible treatments)** -- marketing interventions frequently switch on/off (seasonal campaigns, promotions). This estimator becomes higher priority for business DS than for academics. Should move up in the roadmap.
-- **10e. Position paper / arXiv preprint** -- still valuable for academic credibility but not on the critical path for business DS adoption.
+**In Shipping Next on ROADMAP.md:**
+- Practitioner-ready output (`BusinessReport`, `DiagnosticReport` with context-aware `practitioner_next_steps()`).
+- Practitioner tutorials: dCDH comprehensive walkthrough, BRFSS state-policy, Marketing Campaign Lift, Pricing / Promotion Impact.
+- Survey breadth and validation: two-phase sampling + multi-stage cluster R-cross-validation.

-**Unchanged:**
-- Future estimators (Local Projections DiD, Causal Duration, etc.) and long-term research directions remain academic-oriented and unaffected by this strategy.
+**Deferred per ROADMAP.md:**
+- Product Launch Regional Rollout and Loyalty Program tutorials (deferred until practitioner demand).
+- Methodology-vs-alternative comparison pages (superseded by `BusinessReport` + tutorials that demonstrate capability directly).
 ---
-## 8. Prioritized Roadmap
+## 8. Current State and Sequenced Work

-### Phase 1: Foundation
-*Goal: Make diff-diff discoverable and approachable for business DS*
+This section mirrors the `ROADMAP.md` structure (Current State / Shipping Next / Under Consideration / AI-Agent Track) but adds internal context useful for strategic planning.

-1.
Business "Getting Started" guide (1a)
-2. Terminology bridge as supplement within business docs, not standalone (1b)
-3. README "For Data Scientists" section (1c)
-4. Business decision tree -- "which method should I use?" (4b)
-5. Brand Awareness Survey DiD tutorial -- the lead use case (2b)
+### Shipped (foundation complete)

-**Why start here**: Zero code changes. Maximum positioning impact. The survey tutorial showcases our unique capability (survey design support) in the context that matters most to the user.
+Positioning and practitioner-foundation items that landed in the wave through v3.1.1:

-**Validation gate before Phase 2**: After Phase 1 ships, look for adoption signals -- tutorial page views, GitHub issues from business users, PyPI download trajectory. These signals determine how aggressively to invest in Phases 2-3.
+- Business "Getting Started" guide, terminology bridge, practitioner decision tree, and README "For Data Scientists" section.
+- Brand Awareness Survey DiD tutorial -- the lead survey-design use case.
+- Geo-Experiment tutorial (SyntheticDiD walkthrough for few-market marketing analytics).
+- `aggregate_survey()` helper (microdata-to-panel bridge).
+- Survey-aware power analysis (`SurveyPowerConfig`).
+- Full dCDH estimator (reversible treatment) with survey support.
+- Runtime LLM guides (`get_llm_guide(...)`) bundled in the wheel.

-### Phase 2: Business Content
-*Goal: Provide end-to-end examples for each major persona*
+### Shipping Next

-Tutorials in priority order (ship incrementally, not all at once):
+The queued work, ordered by expected leverage:

-6. Marketing Campaign Lift tutorial (2a) -- **highest priority after survey**
-7. Geo-Experiment tutorial (2f) -- captures GeoLift/CausalImpact search traffic
-8. Comparison page: diff-diff vs GeoLift vs CausalImpact (1d)
-9. Product Launch Rollout tutorial (2c)
-10. Pricing/Promotion Impact tutorial (2d)
-11. Loyalty Program tutorial using DDD (2e)
+1.
`BusinessReport` class -- plain-English summaries from any results object. Core uses only numpy / pandas / scipy; rich export (PowerPoint, HTML) via an optional `[reporting]` extra. +2. `DiagnosticReport` with context-aware `practitioner_next_steps()` -- unified diagnostic runner that substitutes actual column names from fitted results. +3. dCDH comprehensive tutorial (reversible treatment walkthrough; Favara-Imbs headline replication). +4. BRFSS repeated-cross-section tutorial -- state-policy DiD with design-based SEs and HonestDiD sensitivity. +5. Marketing Campaign Lift tutorial (CallawaySantAnna, staggered geo rollout). +6. Pricing / Promotion Impact tutorial (ContinuousDiD dose-response). +7. Two-phase sampling + multi-stage cluster R-cross-validation. -### Phase 3: Convenience Layer -*Goal: Reduce time-to-insight and enable stakeholder communication* +**Why this order**: practitioner-ready output (items 1-2) unblocks every subsequent tutorial -- each tutorial can close with a stakeholder-ready report. Items 3-6 validate the new reporting surface across the personas identified in §3. -12. `BusinessReport` class (3a) -- core uses only numpy/pandas/scipy; rich export via optional `[reporting]` extra -13. `DiagnosticReport` descriptive assessment (3b) -14. Business data generator wrappers (3c) -15. ~~`survey_aggregate()` helper from existing roadmap~~ -- shipped in v3.0.1 as `aggregate_survey()`; directly enables the survey tutorial workflow +### Under Consideration -### Phase 4: Platform (Longer-term) -*Goal: Integrate into business DS workflows* +Candidates surfaced by 2025-26 methodology research and practitioner-ecosystem scanning. Each has a commit criterion in `ROADMAP.md`: -16. Integration guides (4a) -17. Export templates (4c) -18. 
AI agent integration -- position DiagnosticReport and BusinessReport as tools AI agents can invoke on behalf of business DS (leveraging existing `practitioner_next_steps()` infrastructure) +- **New estimators** that cover use cases existing diff-diff estimators do not: DiD with no untreated group (inverse of SDiD's few-treated setup), distributional DiD for staggered timing, Local Projections DiD. +- **Inference options** for sparse-treatment designs (few treated units). +- **Sensitivity extensions** complementing HonestDiD (confounder-based bounds). +- **Identification audits** on triple-difference estimators. +- **Post-estimation and export capabilities** (standard post-estimation interface, publication-table export, survey-design object interop, pluggable regression engine). + +### AI-Agent Track + +Long-running program: position `DiagnosticReport` and `BusinessReport` as the output surface an AI agent produces on behalf of a practitioner, so the agent (using `practitioner_next_steps()`, Baker et al. guardrails, and the runtime LLM guides) can deliver a business-ready report without the practitioner seeing raw coefficients. See `ROADMAP.md` AI-Agent Track for the full vision and named milestones. + +### Deferred + +- Product Launch Regional Rollout and Loyalty Program tutorials -- defer until practitioner demand signals warrant them. +- Business data generator wrappers (`generate_campaign_data`, etc.) -- rethink after `BusinessReport` ships; the wrappers may be moot if the reporting surface makes generic DGP naming sufficient. +- Integration guides (Databricks, Jupyter dashboards) and explicit export templates (PowerPoint, Confluence) -- defer until `BusinessReport` defines the common output shape. --- @@ -390,24 +401,24 @@ Tutorials in priority order (ship incrementally, not all at once): |---|---| | Oversimplifying may undermine credibility with academic users | Keep business layer additive -- don't change existing academic interface. 
Business tools translate, not replace. | | Business tutorials may encourage methodologically unsound analysis | Embed guardrails: DiagnosticReport flags issues, tutorials emphasize assumption checking in business language | -| Scope creep | Phase 1 is documentation-only. Validate adoption signals before investing in code (Phase 3+). | +| Scope creep | Practitioner-foundation work is documentation-only; validate adoption signals before investing in code beyond `BusinessReport` / `DiagnosticReport`. | | Maintaining two audiences | Shared codebase, separate entry points. Like scikit-learn serving both ML engineers and researchers. | --- ## 10. Success Metrics -**Leading indicators (measurable after Phase 1):** +**Leading indicators (measurable now that the practitioner foundation has shipped):** - Tutorial notebook page views / nbviewer hits for business tutorials - GitHub issues or discussions mentioning business use cases (campaigns, surveys, geo-experiments) - Search console impressions for business-oriented queries ("python campaign lift", "python geo experiment", "python survey did") -**Lagging indicators (Phases 2-3):** +**Lagging indicators (measurable after BusinessReport / DiagnosticReport ship):** - PyPI download trajectory (month-over-month growth rate, not absolute) - GitHub stars from non-academic profiles - External blog posts or talks using diff-diff for business analysis -**Phase 1 -> Phase 2 gate**: At least one of: (a) 3+ GitHub issues from business users, (b) measurable search impression growth for business queries, (c) qualitative signal that the business framing is resonating (social media, conference mentions). If none after 8 weeks, revisit the strategy before investing in code changes. 
+**Adoption gate before investing further in code**: At least one of: (a) 3+ GitHub issues from business users, (b) measurable search impression growth for business queries, (c) qualitative signal that the business framing is resonating (social media, conference mentions). If none appears within 8 weeks of the practitioner-foundation content going live, revisit the strategy before investing further in code changes beyond `BusinessReport` / `DiagnosticReport`. --- @@ -415,6 +426,6 @@ Tutorials in priority order (ship incrementally, not all at once): We have the best DiD engine in Python. What we don't have is the business packaging. The methodology is sound, the survey support is unique, the diagnostic suite is unmatched. But a marketing data scientist looking at our docs sees academic econometrics, not their problem. -The fix is mostly about **framing, examples, and a thin convenience layer** -- not rebuilding the core. Phase 1 requires zero code changes. Phases 2-3 add content and lightweight APIs. The competitive window is open because no one else is targeting this intersection: comprehensive DiD + business data science + Python. +The fix is mostly about **framing, examples, and a thin convenience layer** -- not rebuilding the core. The practitioner-foundation content (docs, positioning, lead tutorials) is now shipped. The convenience layer (`BusinessReport`, `DiagnosticReport`) is the next code-level step; platform integrations follow. The competitive window is open because no one else is targeting this intersection: comprehensive DiD + business data science + Python. The survey use case is the sharpest wedge. No other tool in any language combines complex survey design with modern heterogeneity-robust DiD estimators. Lead with that, then broaden. 
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst index 4c70b785..a79e74ee 100644 --- a/docs/choosing_estimator.rst +++ b/docs/choosing_estimator.rst @@ -250,9 +250,12 @@ All other staggered estimators treatment is absorbing - once treated, stays treated. Ships ``DID_M`` (= ``DID_1``) from de Chaisemartin & D'Haultfœuille -(2020) plus the full multi-horizon event study ``DID_l`` for -``l = 1..L_max`` from the dynamic companion paper (NBER WP 29873). -Phase 3 will add covariate adjustment. +(2020), the full multi-horizon event study ``DID_l`` for ``l = 1..L_max`` +from the dynamic companion paper (NBER WP 29873), residualization-style +covariate adjustment (``controls``), group-specific linear trends +(``trends_linear``), state-set-specific trends (``trends_nonparam``), +heterogeneity testing, non-binary treatment, HonestDiD sensitivity +integration on placebos, and survey support via Taylor-series linearization. .. code-block:: python @@ -286,10 +289,13 @@ Phase 3 will add covariate adjustment. .. note:: - Placebo SE (both single-lag ``DID_M^pl`` and dynamic ``DID^{pl}_l``) - is intentionally ``NaN``. Placebo point estimates are meaningful for - visual pre-trends inspection; formal placebo inference is deferred. - See ``REGISTRY.md`` for the full contract. + Single-period placebo ``DID_M^pl`` (``L_max=None``) has ``NaN`` SE - + the per-period aggregation path has no influence-function derivation, + so inference fields stay ``NaN`` even when ``n_bootstrap > 0``. The + point estimate is meaningful for visual pre-trends inspection. + Multi-horizon dynamic placebos ``DID^{pl}_l`` (``L_max >= 1``) have + valid analytical SE and bootstrap SE via the placebo IF. See + ``docs/methodology/REGISTRY.md`` for the full contract. .. 
note:: @@ -356,16 +362,22 @@ Efficient DiD Use :class:`~diff_diff.EfficientDiD` when: -- You have staggered adoption and want **maximum statistical efficiency** +- You have staggered adoption and want **maximum statistical efficiency** on the no-covariate path - You believe parallel trends holds across all pre-treatment periods (PT-All) - You want tighter confidence intervals than Callaway-Sant'Anna - You need a formal efficiency benchmark for comparing estimators .. note:: - Phase 1 supports the **no-covariates** path only. If you need covariate - adjustment, use :class:`~diff_diff.CallawaySantAnna` with ``estimation_method='dr'`` - or :class:`~diff_diff.ImputationDiD`. + EfficientDiD supports covariate adjustment via a doubly-robust path: + sieve-based propensity score ratios combined with a linear OLS outcome + regression. The DR property gives consistency if either the OR or the + PS is correctly specified, but the linear OLS outcome regression does + not generically attain the semiparametric efficiency bound unless the + conditional mean is linear in the covariates. The unqualified efficiency + claim applies to the no-covariate path only. Pass column names to the + ``covariates`` parameter on ``fit()``. See + ``docs/methodology/REGISTRY.md`` for the full contract. .. code-block:: python diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst index 1f77f6ea..8ca06d15 100644 --- a/docs/practitioner_decision_tree.rst +++ b/docs/practitioner_decision_tree.rst @@ -192,18 +192,18 @@ a joiners-only view `DID_+`, and a leavers-only view `DID_-`. By default, the estimator drops markets whose treatment switches more than once before estimation (``drop_larger_lower=True``, matching the R reference). Each drop emits a warning. If your design has many multi-switch markets and - you need them all, raise this with the diff-diff maintainers — Phase 2 of the - estimator will add explicit multi-switch handling via the dynamic event-study - path. 
+ you need them all, raise this with the diff-diff maintainers - explicit + multi-switch handling is a planned extension. .. note:: - Single-lag placebo (`DID_M^pl`) is computed automatically and exposed via + Single-lag placebo (``DID_M^pl``) is computed automatically and exposed via ``results.placebo_effect``. The placebo inference fields (SE, p-value, CI) - are intentionally ``NaN`` in Phase 1 — and stay ``NaN`` even when - ``n_bootstrap > 0``. The dynamic companion paper Section 3.7.3 derives - the cohort-recentered analytical variance for ``DID_l`` only; - placebo-bootstrap support is deferred to Phase 2. + are intentionally ``NaN`` for the single-lag path and stay ``NaN`` even + when ``n_bootstrap > 0``. The dynamic companion paper Section 3.7.3 derives + the cohort-recentered analytical variance for ``DID_l`` only; an + influence-function derivation for the single-lag placebo is a planned + extension. Dynamic placebos (``L_max >= 1``) do have valid analytical SE. .. _section-dose: diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 993b60c3..49b3ab05 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -1,8 +1,10 @@ -# Survey Data Support Roadmap +# Survey Data Support: History and Current State -This document captures the survey data support roadmap for diff-diff. -Phases 1-9 are complete. Phase 10 covers the credibility and announcement -readiness work still ahead. +This document is the technical reference for survey-design support in +diff-diff. It records the build history (Phases 1-10) as shipped and +documents current limitations. Forward-looking roadmap items live in +[ROADMAP.md](../ROADMAP.md); this file is the historical and technical +companion. --- @@ -129,12 +131,14 @@ for the methodology entry. --- -## Phase 10: Remaining Items +## Phase 10: Academic Grounding (History) -The items below establish further credibility with -practitioners and methodologists. 
+The Phase 10 items established the theoretical and empirical foundation +for survey-design variance estimation on modern DiD influence functions. +All items below are shipped; this section documents what was done and +why. -### 10a. Theory Document (HIGH priority) ✅ +### 10a. Theory Document ✅ `docs/methodology/survey-theory.md` lays out the formal argument for design-based variance estimation with modern DiD influence functions: @@ -159,7 +163,7 @@ immediately. Survey Data." *JASA* 83(401). - Shao, J. (1996). "Resampling Methods in Sample Surveys." *Statistics* 27. -### 10b. Survey Simulation DGP (HIGH priority) ✅ +### 10b. Survey Simulation DGP ✅ Enhanced `generate_survey_did_data()` with 8 research-grade parameters: `icc`, `weight_cv`, `informative_sampling`, `heterogeneous_te_by_strata`, @@ -172,45 +176,19 @@ units' x1 mean by +1 SD and adds `conditional_pt * x1_i * (t/T)` to the outcome, creating X-dependent time trends. Unconditional PT fails; conditional PT holds after covariate adjustment. DR/IPW estimators recover truth. -### 10c. Expand R Validation Coverage (HIGH priority) ✅ +### 10c. Expand R Validation Coverage ✅ 8 of 16 estimators now cross-validated against R's `survey::svyglm()`: DifferenceInDifferences, TWFE, CallawaySantAnna, SyntheticDiD, ImputationDiD, StackedDiD, SunAbraham, TripleDifference. -### 10d. Tutorial: Show the Pain (HIGH priority) ✅ +### 10d. Tutorial: Show the Pain ✅ Survey tutorial rewritten with side-by-side flat-weight vs design-based comparison using the research-grade DGP from 10b, showing known ground truth, coverage simulation, and false pre-trend detection rates. -### 10e. Position Paper / arXiv Preprint (MEDIUM priority, long-term) - -A 15-25 page methodology note targeting JSSAM, simultaneously posted to -arXiv. Theory (~5pp), simulation study using DGP from 10b (~8pp), -empirical illustration with NHANES ACA data (~3pp), software section -(~2pp). - -**Simulation study scenarios** (minimum): -1. 
Unconditional PT with complex survey — coverage of TSL vs flat-weight SEs -2. Informative sampling + heterogeneous TE — weighted ATT bias correction -3. Panel vs repeated cross-section — both design types -4. **Conditional PT** — unconditional PT fails (differential pre-trends - correlated with X), conditional PT holds after covariate adjustment. - DR/IPW with covariates recovers truth; no-covariate estimator is biased. - This is the most novel claim — survey-weighted nuisance estimation - (propensity scores, outcome regression) produces valid IFs under complex - sampling. **Resolved:** `conditional_pt` parameter added to - `generate_survey_did_data()` with X-dependent time trends - (`y += conditional_pt * x1_i * (t/T)`) and treated x1 mean shift. - -**Co-authorship:** A co-author from the DiD methodology community would -strengthen credibility — someone who can vouch that the IFs are valid -under survey weighting. The survey statistics side (Binder 1983, Rao & -Wu 1988) is established and doesn't need a survey methodologist to -co-sign. - -### 10f. WooldridgeDiD Survey Support — SHIPPED +### 10f. WooldridgeDiD Survey Support ✅ WooldridgeDiD (ETWFE) now supports `survey_design` for all three methods (OLS, logit, Poisson) with `pweight` only (`fweight`/`aweight` rejected). @@ -219,12 +197,15 @@ Logit/Poisson use survey-weighted IRLS + X_tilde linearization for TSL vcov. Replicate-weight designs raise `NotImplementedError`; bootstrap + survey is rejected. -### 10g. Practitioner Guidance (LOW priority) +### 10g. Practitioner Guidance ✅ -A decision flowchart helping practitioners decide whether they need full -survey design or whether flat weights suffice. Key factors: ICC, number -of PSUs, stratification gain, DEFF magnitude. DEFF diagnostics provide -the empirical answer, but practitioners need guidance on interpretation. 
+Subsumed by the practitioner decision tree +(`docs/practitioner_decision_tree.rst`) and the practitioner +getting-started guide (`docs/practitioner_getting_started.rst`). +The Brand Awareness Survey DiD tutorial +(`docs/tutorials/17_brand_awareness_survey.ipynb`) demonstrates the +full workflow end-to-end; DEFF diagnostics provide the empirical signal +for whether survey design matters on a given dataset. --- diff --git a/tests/test_guides.py b/tests/test_guides.py index e6c321ce..bc0abe83 100644 --- a/tests/test_guides.py +++ b/tests/test_guides.py @@ -43,9 +43,10 @@ def test_wheel_content_matches_package_resource(): def test_utf8_encoding_preserved(): - # llms-full.txt contains the em-dash '\u2014'; verify it roundtrips. + # llms-full.txt contains the non-ASCII ligature '\u0153' (oe, from + # "D'Haultfoeuille"); verify UTF-8 roundtrips through the packaged guide. text = get_llm_guide("full") - assert "\u2014" in text + assert "\u0153" in text @pytest.mark.parametrize("bad", ["bogus", "", "CONCISE", None, 0, True, ["x"]])