docs: HAD ecosystem completion (RTD audit Batch A) #389
Closes the gaps left after PR #372 added HeterogeneousAdoptionDiD to the canonical surfaces. The narrative pages did not yet mention HAD, and the 12-symbol HAD pretest suite shipped in `had_pretests.py` was absent from the API page. Also refreshes the inference-contract block to use the `survey_design=` canonical kwarg consolidated in PR #376.

- `docs/api/had.rst`: new HAD Pretests section covering all 12 public symbols (4 single-period tests + 4 result classes + 3 joint tests + 1 joint result), split into `aggregate="overall"` and `aggregate="event_study"` subsections matching the workflow's dispatch. Refreshes the existing inference-contract block to reference `survey_design=make_pweight_design(weights)` (pweight shortcut) and `survey_design=SurveyDesign(...)` (full TSL); notes `survey=` / `weights=` are deprecated aliases.
- `docs/choosing_estimator.rst`: HAD entries in all 3 tables (Quick Reference, Standard Error Methods, Survey Design Support) plus a new "Universal Rollout / No Untreated Control" subsection in Detailed Guidance. SE Methods row uses `survey_design=` canonical naming.
- `docs/r_comparison.rst`: HAD row in Feature Comparison Table, new "No-Untreated Designs (no R parallel)" subsection, Migration Tips bullet.
- `docs/troubleshooting.rst`: new HAD Issues section with 4 subsections (estimand resolution / mass-point fallback / classical SE under survey_design / panel-only event-study).
- `docs/practitioner_decision_tree.rst`: Start Here option 7, At a Glance row, new "Universal Rollout" section with `_section-no-untreated` anchor.
- `docs/doc-deps.yaml`: extend had_pretests.py entry with llms.txt user-guide dep; add new top-level local_linear.py entry.
Verification: all 12 HAD pretest symbols importable; `make_pweight_design` + `SurveyDesign` importable; sphinx build succeeds with 0 new warnings (71 pre-existing unaffected); HTML render contains expected HAD content (276 hits in had.html, 4-8 in narrative pages); 0 em dashes; `_section-no-untreated` anchor resolves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overall Assessment

Code Quality: No findings in the changed files.
Performance: No findings in the changed files.
Maintainability: No findings in the changed files.
Tech Debt: No findings.
Security: No findings.
Documentation/Tests: Not run here: Sphinx build/tests, due to the read-only sandbox.
P1 (methodology contract): the inference-contract block, choosing-estimator SE Methods row, and troubleshooting "classical SE under survey" subsection incorrectly described `survey_design=make_pweight_design(weights)` as the estimator-side pweight shortcut. Per `survey.py:681-697` and `had.py:2853-2891` that helper is reserved for the array-in HAD pretest helpers (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `qug_test`); on the data-in `HeterogeneousAdoptionDiD.fit` surface the deprecated `weights=np.ndarray` shortcut is the actual pweight route, and it currently yields a different SE family than `survey_design=SurveyDesign(...)`: `weights=` -> `variance_formula="pweight"` / `"pweight_2sls"` (CCT-2014 / 2SLS pweight sandwich); `survey_design=SurveyDesign(...)` -> `"survey_binder_tsl"` / `"survey_binder_tsl_2sls"` (Binder TSL). The unification onto a single SE contract is queued for the next minor release. Fixed across:

- `docs/api/had.rst` inference-contract block: restores the deprecated `weights=` shortcut and `survey_design=SurveyDesign(weights="col", ...)` as the two distinct weighted regimes; spells out the SE-family difference and the next-minor unification; adds a separate paragraph documenting `make_pweight_design()` correctly as the pweight-only convenience for the array-in pretest helpers.
- `docs/api/had.rst` mass-point classical deviation: the cband + event_study `NotImplementedError` fires on the deprecated `weights=` shortcut, not on `survey_design=make_pweight_design(...)`.
- `docs/choosing_estimator.rst` SE Methods row: same restoration; spells out the `variance_formula` values and notes the next-minor unification.
- `docs/troubleshooting.rst` "classical SE under survey": the subsection's `NotImplementedError` description corrected to `survey_design=SurveyDesign(...)` plus the deprecated `weights=` shortcut.
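The two weighted regimes and their SE families, as described above, reduce to a small lookup. This is an illustrative sketch of the documented mapping only (the function name and shape are assumptions, not diff-diff internals):

```python
def variance_formula_sketch(weighting, two_stage=False):
    """Map the weighted-input route to the documented variance_formula.

    weighting: "weights" (deprecated shortcut) or "survey_design"
    two_stage: True for the 2SLS variants.
    """
    mapping = {
        # deprecated weights= shortcut -> CCT-2014 / 2SLS pweight sandwich
        "weights": ("pweight", "pweight_2sls"),
        # survey_design=SurveyDesign(...) -> Binder TSL family
        "survey_design": ("survey_binder_tsl", "survey_binder_tsl_2sls"),
    }
    return mapping[weighting][1 if two_stage else 0]

print(variance_formula_sketch("weights"))            # pweight
print(variance_formula_sketch("survey_design", True))  # survey_binder_tsl_2sls
```

The point of the table is that the two routes are not interchangeable today; the unification is queued for the next minor release.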
P1 (snippet correctness): the new HAD code snippets used `outcome=`, `unit=`, `time=`, `dose=`, `results.coef`, `results.fit_path`; the actual HAD signature uses `outcome_col`, `unit_col`, `time_col`, `dose_col` (`had.py:2786-2804`, `had_pretests.py:3741+`), and the result objects expose `att` and `design` (`had.py:316, 324, 669, 679`). Fixed across:

- `docs/practitioner_decision_tree.rst` Universal Rollout snippet (workflow + fit + result attrs).
- `docs/choosing_estimator.rst` Universal Rollout snippet (workflow + fit + result attrs).
- `docs/r_comparison.rst` Heterogeneous Adoption snippet (fit kwargs).
- `docs/troubleshooting.rst` HAD Issues snippets (fit kwargs + `results.fit_path` -> `results.design`).

P2 (R-comparison overstatement): the dedicated R package `DIDHAD` (de Chaisemartin et al., August 2025) covers the QUG case (Design 1') of the same arXiv paper, so claiming "no R parallel" was too strong. Reframed to acknowledge `DIDHAD` and position diff-diff's broader surface (Design 1 no-QUG, multi-period event-study extension Appendix B.2, survey-design integration via Binder TSL):

- `docs/r_comparison.rst`: section heading "No-Untreated Designs (no R parallel)" -> "Heterogeneous Adoption (HAD)"; body acknowledges `DIDHAD` and frames diff-diff's broader surface; Feature Comparison Table row label simplified; note about R equivalents updated; Migration Tips bullet narrowed and cross-reference target updated.

Sphinx build clean (0 warnings in edited files); also fixes a title-underline-too-short warning on the "Resolved estimand..." subsection that was one character below threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
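For illustration, a minimal mock of the corrected result-attribute surface (a hypothetical stand-in, not the diff-diff result class): per the review, the result object exposes `att` and `design`, while `coef` and `fit_path` do not exist.

```python
from dataclasses import dataclass

@dataclass
class HADResultsMock:
    """Hypothetical stand-in illustrating the corrected attribute names."""
    att: float    # point estimate (e.g. WAS_d_lower)
    design: str   # auto-detected design path

results = HADResultsMock(att=100.0, design="continuous_near_d_lower")

# Corrected access pattern: att / design, not coef / fit_path.
print(f"ATT: {results.att:.1f}  design: {results.design}")
assert not hasattr(results, "coef")
assert not hasattr(results, "fit_path")
```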
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: The prior P1s from the last review look resolved, but two new unmitigated P1 methodology/doc mismatches remain on the changed HAD public surface.

Code Quality: No findings in the changed files.
Performance: No findings in the changed files.
Maintainability: No findings in the changed files.
Security: No findings.
Documentation/Tests: Not run here: local import/Sphinx smoke checks.
…le, DIDHAD claim
P1 (qug_test in array-in pretest helper list): `docs/api/had.rst:67-72`
listed `qug_test` alongside `stute_test` / `yatchew_hr_test` /
`stute_joint_pretest` as accepting `survey_design=make_pweight_design(weights)`.
Per `had_pretests.py:1236-1255` and the methodology REGISTRY (Phase 4.5 C0
decision gate), `qug_test` permanently raises `NotImplementedError` on any
of `survey_design=` / `survey=` / `weights=` - there is no migration target
for survey-aware QUG, and `make_pweight_design()` is explicitly NOT a valid
QUG migration target. The composite workflow `did_had_pretest_workflow`
handles weighted dispatch by skipping QUG with a `UserWarning`. Removed
`qug_test` from the array-in helper list and added an explicit
permanent-rejection note pointing to the workflow's skip behavior.
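The contrast between the helper's permanent rejection and the workflow's skip-with-warning can be sketched as follows (hypothetical function names; an illustrative re-implementation of the documented behavior, not the had_pretests.py code):

```python
import warnings

def qug_test_sketch(d, survey_design=None, survey=None, weights=None):
    # Per the review: qug_test permanently rejects every weighting kwarg;
    # there is no survey-aware QUG and no migration target.
    if any(k is not None for k in (survey_design, survey, weights)):
        raise NotImplementedError(
            "qug_test does not support survey_design= / survey= / weights="
        )
    return min(d)  # placeholder for the real support-infimum test

def workflow_sketch(d, weights=None):
    # The composite workflow instead skips QUG with a UserWarning
    # when weighting is in play.
    if weights is not None:
        warnings.warn("weighted input: skipping qug_test", UserWarning)
        return None
    return qug_test_sketch(d)
```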
P1 (estimand-resolution rule misstatement): `docs/troubleshooting.rst`
"Resolved estimand" subsection said "no exact `dose == 0` => Design 1".
Per `had.py:1932-1987` `_detect_design()` resolves to Design 1' when EITHER
`d.min() == 0` OR `d.min() < 0.01 * median(|d|)` (small-share-of-treated
escape clause). Rewrote the cause to spell out both sub-cases and clarify
that Design 1 only fires when `d.min()` is meaningfully positive relative
to the dose scale. Updated the inspection snippet to compute and print the
`0.01 * median(|d|)` threshold instead of just counting `dose == 0` rows.
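The corrected resolution rule reduces to a few lines. This is an illustrative re-implementation of the rule as documented above, not `_detect_design()` itself:

```python
import numpy as np

def detect_design_sketch(dose):
    """Sketch of the documented rule: Design 1' when d.min() == 0 OR
    d.min() < 0.01 * median(|d|); Design 1 only when d.min() is
    meaningfully positive relative to the dose scale."""
    d = np.asarray(dose, dtype=float)
    threshold = 0.01 * np.median(np.abs(d))
    if d.min() == 0.0 or d.min() < threshold:
        return "design_1_prime"
    return "design_1"

# Inspection in the spirit of the updated snippet: print the threshold
# instead of only counting dose == 0 rows.
d = np.asarray([0.005, 1.0, 2.0])
print(f"d.min()={d.min():.4f}  threshold={0.01 * np.median(np.abs(d)):.4f}")
```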
P2 (DIDHAD event-study overstatement): `docs/r_comparison.rst` Heterogeneous
Adoption section, R-equivalents note, and Migration Tips bullet claimed
diff-diff additionally covers "the multi-period event-study extension
(paper Appendix B.2)" beyond `DIDHAD`. The `DIDHAD` package already
exposes dynamic effects / placebo / event-study output in the QUG case, so
this overstates the gap. Narrowed all three locations to the documented
differences: Design 1 (no QUG, `WAS_{d_lower}`) and survey-design
integration via Binder TSL.
Sphinx build clean (0 warnings in edited files; the unrelated
`tutorials/18_geo_experiments.ipynb:61` "File not found:
practitioner_decision_tree.html#few-test-markets" warning is pre-existing
on origin/main and not introduced here).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: ✅ Looks good
The HAD "Panel-only event-study restriction" subsection in `docs/troubleshooting.rst` overstated when staggered multi-cohort event-study inputs raise. Per `had.py:1230-1366` and `had.py:1470-1499` (also documented in `docs/methodology/REGISTRY.md:2408, 2533`):

- Common-adoption panel (single first-treat period): `first_treat_col` optional; the period is auto-inferred from the dose invariant.
- Staggered panel WITH `first_treat_col`: the estimator auto-filters to the last-treatment cohort + never-treated and emits a UserWarning.
- Staggered panel WITHOUT `first_treat_col`: the estimator raises (the only actual failure mode for this restriction).

Rewrote the cause to spell out the dispatch and made `first_treat_col` the primary remedy; kept manual cohort subsetting as an equivalent that skips the UserWarning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
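The three-way dispatch can be sketched as follows (an illustrative re-implementation of the documented behavior; function and return names are assumptions):

```python
import warnings

def event_study_dispatch_sketch(first_treat_periods, first_treat_col_given):
    """Sketch of the documented staggered event-study dispatch.

    first_treat_periods: set of first-treatment periods across units,
    with None standing in for never-treated.
    """
    cohorts = {p for p in first_treat_periods if p is not None}
    if len(cohorts) <= 1:
        # Common adoption: the period is auto-inferred from the dose.
        return "fit"
    if first_treat_col_given:
        warnings.warn(
            "staggered panel: filtering to last cohort + never-treated",
            UserWarning,
        )
        return "fit_filtered"
    # Staggered panel without first_treat_col: the only raising case.
    raise ValueError("staggered panel requires first_treat_col")
```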
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
…lusion

P1 (default robust value): the troubleshooting "classical SE under survey" solution and the api/had.rst inference-contract block both said the default `robust=True` maps to `vcov_type='hc1'`. Per `had.py:2643-2649` the constructor default is `robust=False`, which the survey + mass-point guard at `had.py:3418-3447` treats as classical and raises, so the suggested workaround `HeterogeneousAdoptionDiD()` reproduces the same NotImplementedError. Replaced both with explicit working overrides (`HeterogeneousAdoptionDiD(vcov_type='hc1')` or `HeterogeneousAdoptionDiD(robust=True)`) and called out the wrong-default trap explicitly.

P3 (CR1 omitted from weighted mass-point inference summary): both `docs/api/had.rst:L49-L54` and `docs/choosing_estimator.rst:L665-L667` described the deprecated `weights=` shortcut mass-point path as "`classical` / `hc1` only". Per `had.py:2276-2284, 2433-2448` and `docs/methodology/REGISTRY.md:2356-2358, 2527`, CR1 (Liang-Zeger) is also supported when `cluster=` is supplied on the weighted mass-point path. Added "CR1 when `cluster=` is supplied" to both summaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
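The wrong-default trap can be illustrated with a tiny sketch of the documented resolution (hypothetical helper names; not the library's internals):

```python
def effective_vcov(robust=False, vcov_type=None):
    # Per the review: the constructor default is robust=False, so the
    # effective vcov with no overrides is classical.
    if vcov_type is not None:
        return vcov_type
    return "hc1" if robust else "classical"

def survey_masspoint_guard(robust=False, vcov_type=None):
    # Sketch of the survey + mass-point guard: an effective classical
    # vcov is rejected, so a bare constructor reproduces the error.
    v = effective_vcov(robust, vcov_type)
    if v == "classical":
        raise NotImplementedError(
            "classical SEs are not supported under survey_design="
        )
    return v
```

Either documented override escapes the trap: `robust=True` or `vcov_type="hc1"`.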
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: ✅ Looks good
…y, not adjudicator)
The new HAD prose overstated `did_had_pretest_workflow` as adjudicating the
HAD design path or auto-dispatching by panel structure. Per
`had_pretests.py:4033-4045` the workflow has two explicit modes via the
`aggregate=` kwarg ("overall" vs "event_study") that the caller picks; the
HAD design (`continuous_at_zero` / `continuous_near_d_lower` / `mass_point`)
is resolved separately inside `HeterogeneousAdoptionDiD.fit` by
`_detect_design()` from the dose support (`had.py:1932-1987`).
- `docs/api/had.rst` HAD Pretests intro: rephrased as a diagnostic battery;
spells out the two `aggregate=` modes selected by the caller and notes
the design path is auto-detected inside the estimator.
- `docs/practitioner_decision_tree.rst` Universal Rollout snippet: comment
no longer claims the workflow "adjudicates which design path"; clarifies
that the estimator picks the design from the dose support.
- `docs/r_comparison.rst` Heterogeneous Adoption section: dropped
"adjudicates the design path" claim; describes the workflow as a
diagnostic battery and points the design-path resolution at the estimator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: ✅ Looks good
…eption The R4 CR1 addition wrote "CR1 when `cluster=` is supplied" without noting the documented mass-point + `aggregate="event_study"` + `cband=True` carve-out: that sub-path rejects an effective `classical` vcov (per `had.py:4147-4181` and REGISTRY:2380-2382), so plain `cluster=` with the default `robust=False` hits the classical-default trap. Both `docs/api/had.rst` and `docs/choosing_estimator.rst` now spell out the carve-out and point to the existing classical-default deviation note for the working override (`vcov_type="hc1"` or `robust=True`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: ✅ Looks good
…ected outright

The R6 wording said `cluster=` + `aggregate="event_study"` + `cband=True` on mass-point could be made to work via `vcov_type="hc1"` / `robust=True`. Per `had.py:4059-4095` and `had.py:3399-3447` (also documented in REGISTRY:2380-2382) that path is rejected outright regardless of `vcov_type`, and `survey_design=` + `cluster=` on weighted mass-point is similarly rejected. The error is about variance-family mixing in the sup-t bootstrap / Binder-TSL composition, not about the classical-default trap.

- `docs/api/had.rst` inline `weights=` shortcut summary: narrowed the CR1 qualifier to "rejected outright regardless of `vcov_type`" + cross-link to a new sibling deviation note.
- `docs/api/had.rst` new "Mass-point cluster-combination deviation" note beside the existing classical-default note: enumerates the two rejected combinations (`survey_design=` + `cluster=` static and event-study; `weights=` + `cluster=` + `cband=True` event-study) with the implementation's own workaround language (`cband=False` / drop `cluster=` / `cluster=` alone / `weights=` + `cluster=`).
- `docs/choosing_estimator.rst` SE Methods row: dropped the misleading "requires explicit hc1/robust=True" implication; says rejected outright for both the weighted shortcut + `cluster=` + event_study cband and the `survey_design=` + `cluster=` mass-point combination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
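The two rejected combinations and the documented workarounds can be sketched as a guard (illustrative only; kwarg handling and messages are assumptions, not had.py code):

```python
def masspoint_cluster_guard(survey_design=False, weights=False,
                            cluster=False, aggregate="overall", cband=False):
    """Sketch of the documented mass-point rejections.

    Rejected outright regardless of vcov_type:
      1. survey_design= + cluster= (static and event-study)
      2. weights= + cluster= + cband=True event-study
    """
    if survey_design and cluster:
        raise NotImplementedError(
            "survey_design= + cluster= is rejected on mass-point"
        )
    if weights and cluster and aggregate == "event_study" and cband:
        raise NotImplementedError(
            "weights= + cluster= + cband=True event-study is rejected"
        )
    return "ok"

# Documented workarounds still pass: cluster= alone, or weights= + cluster=
# without the event-study cband.
print(masspoint_cluster_guard(cluster=True))
print(masspoint_cluster_guard(weights=True, cluster=True))
```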
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: ✅ Looks good
… fallback
The "Mass-point fit fallback" troubleshooting subsection framed
`mass_point` as a fallback that "just changes the SE regime." Per
`had.py:33-39, 2260-2272` and REGISTRY:2400-2402, 2523-2527, mass-point is
a distinct Design 1 estimator path from dCDH 2026 paper Section 3.2.4
(not the Appendix): both the point estimate (Wald-IV sample-average ratio
with binary instrument `Z_g = 1{D_{g,2} > d_lower}`) AND the SE
(structural-residual 2SLS sandwich) differ from the continuous local-linear
/ CCT-2014 paths.
Renamed the subsection to "Mass-point design selected", rewrote the cause
to spell out the 2%-modal-fraction trigger, the Wald-IV ratio formula, and
the structural-residual sandwich; corrected the paper-section attribution
(Section 3.2.4, not Appendix); refined the solution snippet to point at
re-binning to drop modal fraction below 2% if the user prefers
continuous_near_d_lower.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
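The Wald-IV sample-average ratio described above can be sketched numerically. This is an illustrative re-implementation under the stated formula (binary instrument Z_g = 1{D_{g,2} > d_lower}), not the library's mass-point estimator:

```python
import numpy as np

def wald_iv_sketch(d_post, y_change, d_lower):
    """Wald-IV ratio: reduced form over first stage, with the binary
    instrument z = 1{d_post > d_lower} (illustrative sketch)."""
    d_post = np.asarray(d_post, dtype=float)
    y_change = np.asarray(y_change, dtype=float)
    z = d_post > d_lower
    reduced_form = y_change[z].mean() - y_change[~z].mean()
    first_stage = d_post[z].mean() - d_post[~z].mean()
    return reduced_form / first_stage

# With a linear DGP (slope 2 above the mass point at d_lower = 1.0),
# the ratio recovers the slope exactly.
est = wald_iv_sketch([1.0, 1.0, 2.0, 3.0], [0.0, 0.0, 2.0, 4.0], 1.0)
print(est)  # 2.0
```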
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: ✅ Looks good

Methodology: No findings. The changed HAD docs align with the registry and implementation, and the only prior review issue was informational and is now resolved.
Code Quality: No findings. The new API documentation covers the 12 exported HAD pretest symbols without inventing non-public surfaces.
Performance: No findings. This is a docs-only diff and does not change runtime code paths.
Maintainability: No findings.
Tech Debt: No findings.
Security: No findings.
Documentation/Tests: No blocking documentation issues found in the diff. Static review only: Sphinx is not installed in this environment, so the docs build could not be rerun.
…Intensity

Practitioner walkthrough for HeterogeneousAdoptionDiD on the no-untreated-controls case: every market got the campaign at varying intensity and there is no clean comparison group. Fills the structural gap T14 (ContinuousDiD) cannot address.

Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's structure):

- Sections 1-3: framing the no-untreated-controls measurement problem, setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K] regional add-on spend (every DMA participates, no DMA at $0). DGP is internally consistent: outcomes are generated from the dose values HAD then sees, no post-hoc relabeling.
- Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean) collapse; HAD's overall mode requires exactly 2 periods (had.py:952-959). Locked headline: per-$1K marginal effect of 100 weekly visits per DMA above the boundary spend (95% CI [98.6, 101.4]) with design auto-detection landing on `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`. Surfaces the Assumption 5/6 advisory the library fires for Design 1 and explains why it holds in this DGP (linear by construction).
- Section 5: multi-week event-study fit on the 8-week panel, per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and pre-launch placebos at e=-2..-4 sitting on zero.
- Section 6: stakeholder communication template (T18/T19 markdown blockquote pattern), per-DMA dollar-lift interpretation `(actual_dose - d_lower) * WAS_d_lower`, Assumption 6 caveat.
- Section 7: extensions (population-weighted/survey path, composite pretest workflow described accurately as QUG support-infimum test + linearity tests, mass-point design path), related-tutorials cross-links (T01, T02, T14, T17, T18, T19), summary checklist.

Drift detection: companion tests/test_t20_had_brand_campaign_drift.py (13 tests, 0.06s, mirrors T19's test-file-only pattern; T19's notebook itself has zero in-notebook asserts). Pins panel composition including sample median, design auto-detection / target / d_lower, overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose mean, n_units, full event-study horizon presence (e=-4..-2, 0..3), per-week post-launch coverage of TRUE_SLOPE=100, and zero coverage at every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X` pins throughout; HAD's analytical SE path is bit-identical regardless of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87.

Documentation integration:

- docs/tutorials/README.md: new T20 entry following T18/T19's 5-bullet pattern.
- docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py entry; cross-link to docs/practitioner_decision_tree.rst added.
- docs/practitioner_decision_tree.rst: `.. tip::` block at the end of `section-no-untreated` (Universal Rollout, landed on main via PR #389) cross-links to T20 for the full walkthrough.
- CHANGELOG.md: new ### Added bullet under [Unreleased].

Out of scope (queued in project_had_followups.md memory):

- _handle_had in practitioner.py:_HANDLERS map.
- HAD entries in llms-full.txt / choosing_estimator.rst.
- Pretest workflow tutorial, weighted/survey HAD tutorial, mass-point design demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
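A DGP of the kind described above might be sketched like this. The seed (MAIN_SEED=87), TRUE_SLOPE=100, panel dimensions, and dose range come from the commit message; the column names, launch week, and base/fixed-effect constants are assumptions (the notebook's exact DGP is not shown here):

```python
import numpy as np
import pandas as pd

MAIN_SEED = 87
TRUE_SLOPE = 100.0            # weekly visits per $1K of spend above d_lower
N_DMAS, N_WEEKS, LAUNCH = 60, 8, 4   # LAUNCH week is an assumption

rng = np.random.default_rng(MAIN_SEED)
dose = rng.uniform(5.0, 50.0, N_DMAS)   # $K add-on spend; every DMA > $0
d_lower = dose.min()                    # boundary spend

rows = []
for g in range(N_DMAS):
    for t in range(N_WEEKS):
        base = 1_000.0 + 50.0 * g                       # DMA fixed effect
        post = t >= LAUNCH
        lift = TRUE_SLOPE * (dose[g] - d_lower) if post else 0.0
        rows.append({"dma": g, "week": t,
                     "spend": dose[g] if post else 0.0,  # D=0 pre-launch
                     "visits": base + lift})
panel = pd.DataFrame(rows)
```

Outcomes are generated directly from the dose values the estimator would see (no post-hoc relabeling), matching the internal-consistency property the commit message emphasizes.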
The HeterogeneousAdoptionDiD example snippet in choosing_estimator.rst failed the Pure Python Fallback CI job (test_doc_snippets.py ::test_doc_snippet[choosing_estimator:block7]) due to three latent drift bugs from PR #389 (docs-rtd-audit):

1. Missing aggregate='event_study' on both did_had_pretest_workflow and HAD.fit calls: the default aggregate='overall' requires exactly 2 periods, but the doc-snippet test framework's namespace `data` (built via generate_staggered_data) has 10 periods.
2. Used the namespace's generic `data` variable, which has nonzero dose in every period (rng.choice from {0.0, 0.5, 1.0, 2.0}). HAD requires D=0 for all units in at least one pre-period.
3. `print(f"Estimate: {results.att:.3f}")` formatted att as a scalar, but under aggregate='event_study' results.att is a numpy array.

Fix: rewrite the snippet to construct its own HAD-shape panel inline (mirrors how block6 handles ContinuousDiD with its own data generator); thread aggregate='event_study' through both calls; iterate the per-horizon att array for output. Pre-existing on origin/main; surfaced on this PR's CI re-run after the rebase. Other failing snippets (troubleshooting:block18, :block20, r_comparison:block6, :block7) are also pre-existing on main but are out of scope for this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
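Bug 3 is easy to reproduce in isolation: NumPy rejects a float format spec applied to a 1-d array, so per-horizon output must iterate the array (the `att` values below are made up for illustration):

```python
import numpy as np

att = np.array([99.8, 100.1, 100.3])   # per-horizon estimates (illustrative)

# The original snippet's scalar format fails on an array:
try:
    print(f"Estimate: {att:.3f}")
except TypeError:
    print("scalar format spec rejected for a 1-d array")

# The fixed snippet iterates the per-horizon array instead:
for e, a in enumerate(att):
    print(f"e={e}: {a:.3f}")
```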
Fix latent doc-snippet bugs from PR #389 (HAD ecosystem)
PR igerber#389 added HAD code snippets to choosing_estimator.rst, troubleshooting.rst, and r_comparison.rst. None of those edits triggered rust-test.yml (which only runs on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file), so tests/test_doc_snippets.py never executed and the snippets shipped with five latent bugs that now surface on every code PR via the Pure Python Fallback job. Bugs addressed:

- r_comparison:block6: bare HAD.fit(data, ...) with the generate_staggered_data fixture failed because the default aggregate='overall' requires exactly 2 periods and the namespace data has 10. Replaced with an inline HAD-shape panel construction (mirrors the upstream choosing_estimator:block7 fix in 55d7a27) plus aggregate='event_study'.
- troubleshooting:block20: the snippet demonstrates first_treat_col= auto-filtering on a staggered panel. The fixture's first_treat values disagree with the dose path (random per-row dose on never-treated units), tripping HAD's first_treat / dose-path consistency validator. Inlined a 120-unit / 10-period staggered HAD-shape panel (30 never + 30 cohort 5 + 60 cohort 8) so the validator passes and the boundary local-linear estimator has enough distinct dose values to fit.
- troubleshooting:block17 / block18 / r_comparison:block7: legitimately context-dependent snippets that reference est / results from prior text-flow context (inspection / output-format examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the expected NameError is suppressed, matching the pattern already used for block8, the api_bacon blocks, and the existing r_comparison context-dependent set.

choosing_estimator:block7 was the sixth failing snippet but was already fixed upstream in 55d7a27 with the inline-construction pattern; this branch rebases onto that.

Verification: `PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py` reports 111 passed, 4 skipped, 0 failed on this branch (was 6 failed on origin/main before 55d7a27 and 5 failed after).

Follow-up (separate PR queued): carve test_doc_snippets.py out into a dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/** + the test file itself, and exclude it from rust-test.yml's pytest invocations so doc bugs are caught on doc PRs (not silently inherited by code PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…est.yml

PR igerber#389 (Batch A) shipped six broken HAD doc snippets in 2026-04 that no CI run caught because rust-test.yml only triggers on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file, none of which include docs/. PR igerber#396 patched the snippets but did not address the structural gap. This PR addresses it. Two changes:

1. New .github/workflows/docs-tests.yml: a separate workflow that runs `pytest tests/test_doc_snippets.py -v` on a single ubuntu-latest / py3.14 / pure-Python runner. Triggers on docs/, diff_diff/, tests/test_doc_snippets.py, pyproject.toml, and the workflow file itself; same ready-for-ci label gate as rust-test.yml / notebooks.yml. Mirrors notebooks.yml's shape (the existing precedent for `pytest`-validated docs assets) so the two doc-validation workflows look like siblings.
2. .github/workflows/rust-test.yml: add --ignore=tests/test_doc_snippets.py to all three pytest invocations so doc snippets stop riding the code workflow. The Pure Python Fallback edit (line 193) is the only one that changes CI signal: that job runs from the repo root and was the ONLY place where test_doc_snippets.py actually executed. The two Rust-matrix edits (lines 158, 165) are defensive consistency: the matrix copies tests/ to /tmp/tests (rust-test.yml:138, 142) without docs/, so DOCS_DIR resolves to /tmp/docs/ which doesn't exist, and the test collector silently skips every RST file via the guard at tests/test_doc_snippets.py:129. Adding --ignore there prevents the no-op from becoming a real run if anyone later adds `cp -r docs ...` to the copy steps. Each invocation now carries an in-YAML comment documenting whether it is the defensive or the behavior-changing edit.

Verification:

- `python -c "import yaml; yaml.safe_load(open('.github/workflows/docs-tests.yml')); yaml.safe_load(open('.github/workflows/rust-test.yml'))"`: both files well-formed.
- `pytest tests/ --ignore=tests/test_doc_snippets.py --ignore=tests/test_rust_backend.py --collect-only`: 0 occurrences of test_doc_snippets in the collected set (was 115 cases collected when not ignored), confirming pytest accepts repeated --ignore flags, as the existing line-193 pattern with --ignore=tests/test_rust_backend.py already showed.

After this PR opens, the workflow file itself triggers docs-tests.yml on its own change, providing the first end-to-end CI validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Closes the gaps left after PR #372 added HeterogeneousAdoptionDiD to the canonical surfaces. The narrative pages did not yet mention HAD, and the 12-symbol HAD pretest suite shipped in `had_pretests.py` was absent from the API page. Also refreshes the inference-contract block to use the `survey_design=` canonical kwarg consolidated in PR #376.

- `docs/api/had.rst`: new HAD Pretests section covering all 12 public symbols (4 single-period tests + 4 result classes + 3 joint tests + 1 joint result), split into `aggregate="overall"` and `aggregate="event_study"` subsections matching the workflow's dispatch. Refreshes the inference-contract block to reference `survey_design=make_pweight_design(weights)` (pweight shortcut) and `survey_design=SurveyDesign(...)` (full TSL); notes `survey=` / `weights=` are deprecated aliases.
- `docs/choosing_estimator.rst`: HAD entries in all 3 tables (Quick Reference, Standard Error Methods, Survey Design Support) plus a new "Universal Rollout / No Untreated Control" subsection in Detailed Guidance. SE Methods row uses `survey_design=` canonical naming.
- `docs/r_comparison.rst`: HAD row in Feature Comparison Table, new "No-Untreated Designs (no R parallel)" subsection, Migration Tips bullet.
- `docs/troubleshooting.rst`: new HAD Issues section with 4 subsections (estimand resolution / mass-point fallback / classical SE under `survey_design=` / panel-only event-study).
- `docs/practitioner_decision_tree.rst`: Start Here option 7, At a Glance row, new "Universal Rollout" section with `_section-no-untreated` anchor.
- `docs/references.rst`: Binder citation references the canonical `survey_design=` paths.
- `docs/doc-deps.yaml`: extends `had_pretests.py` entry with `llms.txt` user-guide dep; adds new top-level `local_linear.py` entry.

Methodology references (required if estimator / math changes)

Validation

- Sphinx build succeeds (`python -m sphinx -b html -q docs docs/_build/html`); 0 new warnings from this PR (71 pre-existing in `bacon.py` / `staggered_bootstrap.py` autosummary unaffected).
- All 12 HAD pretest symbols importable (`HADPretestReport`, `qug_test`, `stute_test`, `yatchew_hr_test`, `did_had_pretest_workflow`, `QUGTestResults`, `StuteTestResults`, `YatchewTestResults`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, `StuteJointResult`); `make_pweight_design` + `SurveyDesign` importable.
- HTML render contains expected HAD content: `api/had.html` 276 hits, narrative pages 4-8 hits each.
- `_section-no-untreated` anchor resolves (definition + cross-ref).

Security / privacy
python -m sphinx -b html -q docs docs/_build/html); 0 new warnings from this PR (71 pre-existing inbacon.py/staggered_bootstrap.pyautosummary unaffected).HADPretestReport,qug_test,stute_test,yatchew_hr_test,did_had_pretest_workflow,QUGTestResults,StuteTestResults,YatchewTestResults,stute_joint_pretest,joint_pretrends_test,joint_homogeneity_test,StuteJointResult);make_pweight_design+SurveyDesignimportable.api/had.html276 hits, narrative pages 4-8 hits each._section-no-untreatedanchor (definition + cross-ref).Security / privacy