Refocus Tutorial 19 on dCDH alone (drop TWFE comparison) #375
Conversation
The original Tutorial 19 framed dCDH as a fix for TWFE bias on reversible-treatment panels, demonstrating with a TWFE-vs-dCDH proximity comparison on a synthetic panel. Two problems with that framing surfaced on review:

1. **The proximity comparison was misleading.** Across 200 seeds of the original DGP (effect_sd=4.0), TWFE was on average closer to truth than dCDH (mean |TWFE - 12| = 0.30 vs |dCDH - 12| = 0.52). The dCDH paper's TWFE-bias warnings only bite when effect heterogeneity correlates with cohort or timing - not on the random per-cell heterogeneity our synthetic generator produces. Without that systematic structure, the diagnostic was warning about a problem that didn't materialize on the demo panel, and the head-to-head numbers undercut dCDH's credibility (TWFE 11.5 vs dCDH 11.2 vs truth 12.0).
2. **Warning suppression had crept into the notebook in two places.** The user feedback policy is: prefer DGP/fit conditions that fire no warnings; when warnings DO fire, surface and explain rather than silence.

This rewrite restructures around "treatment turns on AND off, dCDH is the right pick for that case" with no TWFE comparison:

- Drops Section 4 (TWFE diagnostic + bar chart + transition prose)
- Drops the `twowayfeweights` import
- Tightens the DGP to `effect_sd=1.5` for cleaner numbers (dCDH lands at 12.05 vs truth 12.0; locked `seed=46` from a 100-seed search)
- Splits the dCDH fit in two: (a) Phase 1 with `placebo=False` to get joiners/leavers without firing the documented "single-period placebo SE is NaN" UserWarning, then (b) event study with `L_max=2` + `n_bootstrap=199` for multi-horizon placebos with valid SE
- Surfaces the Assumption 7 UserWarning that fires on every reversible panel and adds a markdown cell explaining why it fires (the cost-benefit delta uses A7; the headline ATT and event study don't) and why we accept it on a reversible design - instead of silencing it
- Wraps the bootstrap fit in `np.errstate(divide="ignore", over="ignore", invalid="ignore")` to silence Apple Accelerate's spurious matmul FP-error RuntimeWarnings (numpy issue #26669, fires only on macOS Accelerate-linked NumPy builds), with a code comment naming the attribution honestly (sketched below)
- Drops the drift-guards section (maintenance helper, not pedagogy; prose drift in a tutorial isn't a critical CI failure)
- Strips "Phase 1" jargon from headings and abstract

Net: 23 cells, down from 35. nbmake passes in ~2.4s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
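For orientation, a minimal sketch of the `np.errstate` suppression named in the bullet list above. `fit_event_study` is a hypothetical placeholder, not the tutorial's actual cell; only the `np.errstate` context manager is real NumPy API.

```python
import numpy as np

def fit_event_study():
    # Hypothetical stand-in for the notebook's dCDH bootstrap fit
    # (L_max=2, n_bootstrap=199 per the description above).
    ...

# Apple Accelerate-linked NumPy builds can emit spurious FP-error
# RuntimeWarnings from matmul (numpy issue #26669). np.errstate
# suppresses the divide/over/invalid FP-error categories for every
# operation inside the block -- a broad scope that a later commit
# in this PR narrows to just the matmul message pattern.
with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
    results = fit_event_study()
```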
🤖 AI review (section headers only): Overall Assessment · Executive Summary · Methodology · Code Quality · Performance · Maintainability · Tech Debt · Security · Documentation/Tests · Path to Approval
…drift test
Four fixes from CI review:
1. **P1 - A5 scope in intro.** Section 1 previously said "the promo can
be on for a market, then off, then back on" - misleading because the
default `drop_larger_lower=True` filter drops multi-switch groups and
our DGP uses `pattern="single_switch"` (A5-safe). Rewrite the intro
to be A5-accurate: across-market reversibility (joiners + leavers in
the same panel), not within-market on-off-on cycling. Add an explicit
"Scope of this tutorial" paragraph that names A5 and points at
`by_path` for the multi-switch extension.
2. **P1 - A11 doesn't "naturally" hold.** The "Where do the controls
come from?" paragraph previously said single-switch panels with both
joiner and leaver cohorts "naturally satisfy A11" - false. The test
suite has a single-switch panel where A11 fails
(`tests/test_chaisemartin_dhaultfoeuille.py::TestA11Handling::test_a11_violation_zero_in_numerator_retain_in_denominator`).
Replace with a
seed-specific claim ("this seed and DGP happen not to trigger an
A11 warning") and a pointer at the test as a counterexample.
3. **P2 - Narrow the matmul filter.** Replace
`np.errstate(divide="ignore", over="ignore", invalid="ignore")` -
which suppresses ALL FP error categories from the entire fit -
with `warnings.filterwarnings("ignore", category=RuntimeWarning,
message=r".*encountered in matmul")` - which only catches the
Accelerate matmul pattern (a sketch follows this commit message).
Unrelated future RuntimeWarnings now surface.
4. **P3 - Restore drift detection via sibling test file.** Add
`tests/test_t19_marketing_pulse_drift.py` with 8 tests that re-derive
the narrative numbers (overall_att, joiners, leavers, event-study
horizons, placebos, panel composition, A7 warning fires, A11
warning does not fire) at the locked seed and check them against
the tolerance bands quoted in the markdown. If a future library
change moves any number outside its band, the test fails and a
maintainer is forced to update the prose. Keeps the notebook clean
of maintenance scaffolding while addressing the stale-prose risk.
Test runs in <100ms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
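A minimal sketch of the narrowed filter from fix 3, assuming the fit call is wrapped in `warnings.catch_warnings()` so the suppression stays local to the block; `fit_event_study` is again a hypothetical placeholder.

```python
import warnings

def fit_event_study():
    # Hypothetical placeholder for the notebook's bootstrap fit.
    ...

# Only RuntimeWarnings matching the Accelerate matmul pattern are
# silenced, and only inside this block; every other warning category
# and message still reaches the user.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        category=RuntimeWarning,
        message=r".*encountered in matmul",
    )
    results = fit_event_study()
```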
/ai-review
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good. No unmitigated P0/P1 issues in the changed diff; the prior methodology blockers from the earlier AI review appear resolved. Review is static only: the reviewer could not execute tests in this environment.
CI review surfaced that the drift test pinned broad effect bands and
truth-coverage but not the specific CI endpoints quoted in the prose
('95% CI: 11.3 to 12.8', l=1 [11.4, 13.3], l=2 [11.5, 13.6]) or the
'bootstrap p < 0.01' significance claim in the stakeholder template.
Those narrative lines could drift silently while the existing tests
passed.
Add three new tests:
- test_overall_ci_endpoints_match_quoted: locks the Phase 1 ATT CI
endpoints to bands around the quoted 11.3 / 12.8 (tolerance ~0.3)
- test_event_study_ci_endpoints_match_quoted: locks the L_max=2 event
study CI endpoints at l=1 and l=2 to bands around the quoted
[11.4, 13.3] and [11.5, 13.6] (tolerance ~0.3)
- test_event_study_significance: asserts both post-treatment horizons
have bootstrap p_value < 0.01
11 tests pass in ~0.07s (the first endpoint check is sketched after this commit message).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
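A minimal sketch of the first of those tests, assuming a hypothetical `overall_result` fixture that re-derives the headline fit at the locked seed; the `conf_int` attribute name is also an assumption.

```python
import pytest

TOL = 0.3  # band width around the CI endpoints quoted in the prose

def test_overall_ci_endpoints_match_quoted(overall_result):
    # overall_result: hypothetical fixture re-deriving the fit at the
    # locked seed=46 and exposing the 95% CI endpoints.
    ci_low, ci_high = overall_result.conf_int
    assert ci_low == pytest.approx(11.3, abs=TOL)
    assert ci_high == pytest.approx(12.8, abs=TOL)
```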
/ai-review
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good. Static review only: tests were not executed in this environment.
CI review surfaced two refinements:
1. Endpoint bands like `11.0 <= ci_low <= 11.6` would still pass values
rounding to several different one-decimal displays (11.0, 11.1, ...,
11.6) while the notebook prose stays at "11.3", "12.8", "11.4",
"13.3", "11.5", "13.6". Replace those with `round(ci_low, 1) == 11.3`
etc. - directly pins the displayed rounding so any drift past the
tenth fails the test.
2. The warning tests didn't pin the notebook's full warning contract.
`event_study_results` suppressed A7 for fixture cleanliness while
the docstring claimed "A7 visible". Two changes:
- Fix the fixture docstring to acknowledge A7 is muted there for
value-checking tests, with the notebook's actual warning-policy
contract validated separately
- Add `test_event_study_warning_policy_matches_notebook` that
mirrors the notebook's exact filter (only matmul-pattern
RuntimeWarnings silenced) and asserts the resulting warning set:
exactly one UserWarning (A7 leavers-present, the one the markdown
explains) and zero RuntimeWarnings. If a future library change
emits an unexpected warning on this code path, the test fails
(a sketch follows this commit message).
12 tests pass in ~0.07s (was 11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
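A minimal sketch of that warning-policy test. The fit stub simulates the A7 UserWarning so the sketch runs standalone; the real test would call the library's event-study code path, and the exact warning message text is an assumption.

```python
import warnings

def fit_event_study():
    # Stub simulating the library's A7 leavers-present UserWarning so
    # this sketch is self-contained; message text is an assumption.
    warnings.warn("Assumption 7: leavers present", UserWarning)

def test_event_study_warning_policy_matches_notebook():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Mirror the notebook's only suppression: the matmul pattern.
        warnings.filterwarnings(
            "ignore",
            category=RuntimeWarning,
            message=r".*encountered in matmul",
        )
        fit_event_study()

    user = [w for w in caught if issubclass(w.category, UserWarning)]
    runtime = [w for w in caught if issubclass(w.category, RuntimeWarning)]
    assert len(user) == 1     # exactly the A7 warning the markdown explains
    assert len(runtime) == 0  # nothing else leaks through the narrow filter
```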
/ai-review
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good. Static review only. Methodology, Code Quality, Performance, Maintainability, Tech Debt, Security: no findings. Documentation/Tests: no new findings; the prior CI-rounding and warning-surface gaps look resolved by the direct one-decimal assertions and the exact warning-policy regression test.
Pure Python CI failed at test_event_study_ci_endpoints_match_quoted because the bootstrap RNG path differs between the Rust and pure-Python backends (per the bit-identity-baseline-per-backend convention):

- Rust: es[1] CI low = 11.394 -> rounds to 11.4 (matches prose)
- Pure Python: es[1] CI low = 11.487 -> rounds to 11.5 (mismatch)

The 0.09 backend gap is enough to flip the rounding boundary on the exact-match `round(_, 1) == 11.4` pin I tightened to in the prior round.

Loosen the four bootstrap-CI endpoint asserts to a 0.15 absolute tolerance band around the quoted prose values (sketched below). Tight enough to catch real prose drift (a real shift would move by >>0.15), loose enough to absorb the documented backend variance.

Verified on both backends locally:
- pytest tests/test_t19_marketing_pulse_drift.py -> 12/12 pass
- DIFF_DIFF_BACKEND=python pytest <same> -> 12/12 pass

The analytical-SE endpoint asserts in test_overall_ci_endpoints_match_quoted keep the strict `round(_, 1) ==` pin since they're not bootstrap-driven and are bit-identical across backends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
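A minimal sketch of the loosened bootstrap-CI asserts, with the fixture and attribute shapes assumed as in the earlier sketches.

```python
import pytest

BOOTSTRAP_TOL = 0.15  # absorbs the documented Rust vs pure-Python RNG gap

def test_event_study_ci_endpoints_match_quoted(event_study_results):
    # event_study_results: hypothetical fixture keyed by horizon; the
    # conf_int attribute is an assumed shape. Quoted prose values:
    # l=1 [11.4, 13.3], l=2 [11.5, 13.6].
    for horizon, (low, high) in {1: (11.4, 13.3), 2: (11.5, 13.6)}.items():
        ci_low, ci_high = event_study_results[horizon].conf_int
        # Band check instead of round(_, 1) == pins: the 0.09 backend
        # gap can flip a value across a one-decimal rounding boundary.
        assert ci_low == pytest.approx(low, abs=BOOTSTRAP_TOL)
        assert ci_high == pytest.approx(high, abs=BOOTSTRAP_TOL)
```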
Summary
- Tightens the DGP to `effect_sd=1.5` for cleaner narrative numbers (dCDH lands at 12.05 vs truth 12.0; locked `seed=46` from a 100-seed search).
- Splits the dCDH fit in two: a first fit with `placebo=False` (gets joiners/leavers without firing the documented "single-period placebo SE is NaN" UserWarning), then an event study with `L_max=2` + `n_bootstrap=199` (multi-horizon placebos with valid SE).
- Wraps the bootstrap fit in `np.errstate(divide="ignore", over="ignore", invalid="ignore")` to silence Apple Accelerate's spurious matmul FP-error RuntimeWarnings (numpy issue #26669, fires only on macOS Accelerate-linked NumPy builds), with a code comment naming the attribution.
- `pytest --nbmake` passes in ~2.4s.

Methodology references (required if estimator / math changes)
- Estimators used: `DID_M` (AER 2020) and `DID_l` (NBER WP 29873 dynamic companion). No estimator code changed.
- Nothing under `diff_diff/`, `rust/src/`, or `docs/methodology/` modified.

Validation
- `pytest --nbmake docs/tutorials/19_dcdh_marketing_pulse.ipynb` passes in ~2.4s. The notebook ships with cleared outputs; nbmake re-executes end-to-end.
- `overall_att = 12.054` (CI [11.33, 12.78], covers truth 12.0)
- `joiners_att = 12.124`, `leavers_att = 11.933` (both close to truth, both within sampling uncertainty of each other)
- Spurious Accelerate matmul RuntimeWarnings are silenced via `np.errstate` with an attribution comment - those fire only on macOS Accelerate-linked NumPy builds and don't affect the result.

Security / privacy
🤖 Generated with Claude Code