
Add Tutorial 19: dCDH for Marketing Pulse Campaigns #373

Merged
igerber merged 4 commits into main from dcdh-tutorial on Apr 25, 2026

Conversation

@igerber (Owner) commented Apr 25, 2026

Summary

  • Adds docs/tutorials/19_dcdh_marketing_pulse.ipynb (35 cells, 12 code / 23 markdown, ~3s nbmake) — practitioner walkthrough of ChaisemartinDHaultfoeuille (alias DCDH) on a 60-market reversible-treatment panel; a minimal API sketch follows this list
  • Tutorial covers Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo, AER 2020 Theorem 1 TWFE diagnostic via twowayfeweights) and the multi-horizon event study with multiplier bootstrap (L_max, n_bootstrap)
  • Section 4 includes an explicit "Where do the controls come from?" block clarifying that dCDH's contemporaneously-stable-cell identification means the panel can have zero never-treated and zero always-treated units, distinguishing it from CallawaySantAnna and other staggered-DiD estimators
  • Stakeholder-communication template with explicit per-bullet source mapping; drift guards lock the headline numbers (overall_att, placebo magnitude, TWFE fraction_negative, event-study l=1 effect)
  • Tutorial 17 (Brand Awareness Survey) backfilled in docs/tutorials/README.md (was missing)
  • Wiring: docs/doc-deps.yaml adds the notebook to the chaisemartin_dhaultfoeuille.py dependency list; docs/practitioner_decision_tree.rst adds a .. tip:: cross-link in the Reversible Treatment section
  • CHANGELOG [Unreleased] entry
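
As quick orientation, here is a minimal sketch of the flow the tutorial walks through. Import paths, signatures, and result-attribute names are assumptions pieced together from this PR description (the parameter values come from the seed-search lineage in the Validation section); the notebook itself is the source of truth.

```python
# Hypothetical sketch of the tutorial's core flow -- not the notebook's
# exact cells. Import paths, signatures, and result attributes are
# assumptions inferred from this PR description.
from diff_diff import DCDH, generate_reversible_did_data, twowayfeweights

# Simulated 60-market reversible-treatment panel (parameter values are
# the ones locked in the Validation section below).
panel = generate_reversible_did_data(
    seed=53,
    n_groups=60,
    n_periods=8,
    pattern="single_switch",
    heterogeneous_effects=True,
    effect_sd=4.0,
)

# AER 2020 Theorem 1 diagnostic: inspect the TWFE weight decomposition.
weights = twowayfeweights(panel)

# Phase 1 (DID_M) plus the multi-horizon event study with multiplier
# bootstrap; L_max / n_bootstrap match the locked tutorial config.
model = DCDH(L_max=2, n_bootstrap=199)
result = model.fit(panel)
print(result.overall_att)  # the headline number the drift guards lock
```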

Methodology references (required if estimator / math changes)

  • Method: de Chaisemartin & D'Haultfoeuille DID_M (AER 2020) and DID_l (NBER WP 29873 dynamic companion) — a schematic of DID_M follows this list
  • Paper / source link(s): AER 2020 — https://www.aeaweb.org/articles?id=10.1257/aer.20181169 ; dynamic companion — https://www.nber.org/papers/w29873
  • Any intentional deviations: None — this PR is documentation only. No source files in diff_diff/, rust/src/, or docs/methodology/ modified. The tutorial uses the existing public API (DCDH, twowayfeweights, generate_reversible_did_data) and surfaces the documented Phase 1 + L_max + bootstrap contracts already locked in docs/methodology/REGISTRY.md § ChaisemartinDHaultfoeuille.
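
For readers skimming the PR, a schematic of the cited DID_M estimator, paraphrased from the AER 2020 paper (notation simplified; the paper and docs/methodology/REGISTRY.md are authoritative):

```latex
% DID_M averages two period-t comparisons over all switching periods:
% joiners (0 -> 1) against cells stable at 0, and cells stable at 1
% against leavers (1 -> 0). N_S is the total number of switcher
% observations; N_{1,0,t} and N_{0,1,t} count joiners and leavers at t.
\mathrm{DID}_M \;=\; \sum_{t}
    \frac{N_{1,0,t}}{N_S}\,\mathrm{DID}_{+,t}
  + \frac{N_{0,1,t}}{N_S}\,\mathrm{DID}_{-,t}
```

Because both comparisons use contemporaneously stable cells rather than a permanent comparison group, the panel needs neither never-treated nor always-treated units, which is the property the tutorial leans on.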

Validation

  • Tests added/updated: None (no source changes).
  • Notebook validation: pytest --nbmake docs/tutorials/19_dcdh_marketing_pulse.ipynb passes in ~3s. The notebook ships with cleared outputs; nbmake re-executes end-to-end and the final drift-guard cell prints All drift guards passed.
  • Drift guards (cell t19-cell-031) lock the narrative numbers: overall_att ∈ [10.72, 11.72] and CI covers truth, |placebo_effect| < 1.5, twfe_fraction_negative >= 0.10, event_study_effects[1]["effect"] ∈ [10.24, 12.24]. Tolerances are chosen wide enough to absorb numerical noise, tight enough to detect material drift. A hypothetical reconstruction of this cell follows this list.
  • Seed-search lineage: parameters locked at seed=53, n_groups=60, n_periods=8, pattern="single_switch", heterogeneous_effects=True, effect_sd=4.0, L_max=2, n_bootstrap=199 (matches ci_params.bootstrap convention). Selection criteria: TWFE fraction_negative >= 0.10 and bias visible, dCDH ATT covers truth, both joiners and leavers contribute, multi-horizon event-study CIs cover truth and placebos cover 0.
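
A hypothetical reconstruction of what the drift-guard cell asserts, reusing the placeholder names from the sketch in the Summary section; the numeric bounds are quoted above, but the attribute names are assumptions:

```python
# Hypothetical reconstruction of the drift-guard cell (t19-cell-031).
# Bounds are the ones listed above; attribute names are placeholders.
assert 10.72 <= result.overall_att <= 11.72
# (the real cell additionally checks that the CI covers the DGP's truth)
assert abs(result.placebo_effect) < 1.5           # single-lag placebo
assert weights.fraction_negative >= 0.10          # TWFE diagnostic
assert 10.24 <= result.event_study_effects[1]["effect"] <= 12.24
print("All drift guards passed")
```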

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

@github-actions

Overall Assessment

✅ Looks good

Executive Summary

  • This is a documentation-only PR; no estimator, weighting, variance, or default-behavior code changed.
  • Cross-checking the tutorial against docs/methodology/REGISTRY.md found the core dCDH methodology aligned, including the documented Phase 1 placebo SE=NaN contract and the separate Phase 2 DID_{g,1} path.
  • P2: the new tutorial misstates the dynamic placebo horizon range once in prose; the implementation exposes placebo lags 1..L_max under negative keys -l, so for L_max=2 the horizons are -2 and -1, not -1..-L+1.
  • P3: a couple of comparison claims are broader than the project’s own source of truth ("only Python estimator" / other estimators needing a cohort that “survives to the end”); the registry only claims uniqueness “in the library.”
  • I could not rerun pytest --nbmake in this sandbox because numpy is not installed, so execution validation here was static-only.

Methodology

  • P2 Impact: the notebook says L_max=L yields placebo horizons l = -1..-L+1, which is inconsistent with both the implementation and the later notebook text that discusses l = -2 and l = -1. This is a methodology/API-surface documentation mismatch for the event-study placebo output. Concrete fix: change that sentence to l = -L..-1 (or “negative horizons -1, -2, ..., -L”). Refs: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L427-L429, diff_diff/chaisemartin_dhaultfoeuille.py:L2970-L2981, diff_diff/chaisemartin_dhaultfoeuille.py:L5543-L5559, docs/methodology/REGISTRY.md:L546-L548. A short indexing illustration follows this list.
  • P3 Impact: the tutorial broadens comparison language beyond the registry by saying other staggered estimators need a cohort that “survives to the end” and that dCDH is the “only Python estimator” for reversible treatment. The registered claim is narrower: uniqueness is only asserted “in the library.” Concrete fix: scope these lines to “among diff-diff’s staggered estimators” and keep the comparison focused on the absorbing-treatment restriction. Refs: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L268-L270, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L451-L451, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L550-L550, docs/methodology/REGISTRY.md:L478-L478
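
To make the P2 concrete, the horizon indexing under the documented contract (illustrative only; the key layout is taken from the finding above):

```python
# Placebo lags 1..L_max are exposed under negative keys -l. At L_max=2
# the horizons are therefore -2 and -1; the claimed "-1..-L+1" range
# would collapse to just -1 at L_max=2, which is the inconsistency.
L_max = 2
placebo_horizons = sorted(-l for l in range(1, L_max + 1))
assert placebo_horizons == [-2, -1]
```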

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Nothing here needs a new TODO.md entry.

Security

  • No findings; I did not see secrets, credentials, or PII in the changed files.

Documentation/Tests

  • P3 Impact: the notebook wraps the L_max=2 fit in warnings.simplefilter("ignore", UserWarning), which suppresses not just the expected Assumption 7 cost-benefit warning but any future methodology warnings on that path as well (for example thin-horizon or placebo-control-availability warnings). That weakens the notebook’s value as a drift detector. Concrete fix: narrow the filter to the expected warning text, or explicitly assert the expected warning and let any new UserWarning fail the notebook. Refs: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L374, diff_diff/chaisemartin_dhaultfoeuille.py:L1917-L1933, diff_diff/chaisemartin_dhaultfoeuille.py:L1988-L1995, diff_diff/chaisemartin_dhaultfoeuille.py:L2114-L2120. A sketch of the narrowed filter follows.
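
A sketch of the narrowed-filter fix (this is the variant a follow-up commit in this PR adopts, quoting its filterwarnings call; the fit call is a placeholder):

```python
import warnings

# Narrowed filter: silence only the expected Assumption 7 cost-benefit
# warning; any other UserWarning on this path still surfaces in nbmake.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message=r"Assumption 7 .*", category=UserWarning
    )
    result_es = model.fit(panel)  # placeholder for the L_max=2 fit cell
```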

@igerber (Owner, Author) commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 30083faaad96af63bf6d4edb16a140202f5c2dfe


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Documentation-only PR; no estimator, weighting, variance, or default-behavior code changed. The affected method surface is ChaisemartinDHaultfoeuille tutorial coverage of DID_M, DID_M^pl, DID_l, and DID^{pl}_l.
  • Severity P1 [Newly identified]: the tutorial claims Assumption 11 is satisfied from whole-panel stable-cell totals, but the method contract is period-specific. That is the wrong identification check for dCDH.
  • Prior P2 remains unresolved: the notebook still states placebo horizons as -1..-L+1; the implementation/registry expose negative horizons -L..-1, and the notebook later uses -2 and -1.
  • Prior P3 remains unresolved: comparison language is still broader than the project’s source of truth ("only Python estimator", other estimators needing a cohort that “survives to the end”).
  • Prior P3 remains unresolved: the L_max=2 fit still suppresses all UserWarnings, weakening the notebook as a drift detector.
  • Static-only review here; pytest --nbmake could not be rerun because this sandbox is missing numpy.

Methodology

  • Severity P1 [Newly identified]; Impact: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L261-L270 teaches the wrong way to verify dCDH Assumption 11 by citing aggregate stable-cell totals across the full panel. The contract is period-by-period: when switchers exist in period t, the relevant stable controls must exist in that same period, and violations are tracked/warned per period, not by global counts (docs/methodology/REGISTRY.md:L491-L492, docs/methodology/REGISTRY.md:L616-L617, diff_diff/chaisemartin_dhaultfoeuille.py:L1618-L1626). A panel can have many stable cells overall and still fail Assumption 11 in a switching period. Concrete fix: replace that sentence with a per-period check/table showing stable_0/stable_1 availability at each switching week, or soften it to “the simulated DGP was constructed to satisfy Assumption 11” instead of inferring satisfaction from global totals. A per-period tabulation sketch follows this list.
  • Severity P2 (prior finding unresolved); Impact: the event-study intro still says L_max=L yields placebo horizons l = -1..-L+1, which is inconsistent with both the implementation’s negative-key output and the notebook’s own later discussion of l=-2 and l=-1 (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L427-L429, diff_diff/chaisemartin_dhaultfoeuille.py:L2970-L2982, diff_diff/chaisemartin_dhaultfoeuille.py:L5543-L5559). Concrete fix: change the prose to l = -L..-1 or “negative horizons -1, -2, ..., -L.”
  • Severity P3 (prior finding unresolved); Impact: the tutorial still makes ecosystem-wide comparison claims that are broader than the project’s documented source of truth, e.g. “the only Python estimator” and the suggestion that other estimators need a cohort that “survives to the end of the panel” (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L268-L269, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L451-L451, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L550-L550). The registry/docstring only claim uniqueness within this library, and the Callaway-Sant'Anna contract explicitly supports not_yet_treated controls without requiring never-treated units (docs/methodology/REGISTRY.md:L446-L447, docs/methodology/REGISTRY.md:L462-L463, docs/methodology/REGISTRY.md:L478-L479, diff_diff/chaisemartin_dhaultfoeuille.py:L294-L299). Concrete fix: scope these statements to “among diff-diff’s estimators” or cite an external survey if an ecosystem-wide claim is intended.
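
A sketch of the kind of per-period check the P1 fix asks for, with assumed column names ("group", "period", "treated"); the real panel's schema may differ:

```python
import pandas as pd

def assumption_11_table(panel: pd.DataFrame) -> pd.DataFrame:
    """Count switchers and contemporaneously stable cells per period.

    Assumption 11 is period-specific: every period with switchers needs
    at least one stable cell of the relevant type in that same period,
    so whole-panel stable-cell totals prove nothing.
    """
    df = panel.sort_values(["group", "period"]).copy()
    df["prev"] = df.groupby("group")["treated"].shift(1)
    df = df.dropna(subset=["prev"])  # first period has no transition
    df["status"] = "switcher"  # joiners (0 -> 1) and leavers (1 -> 0)
    df.loc[(df["prev"] == 0) & (df["treated"] == 0), "status"] = "stable_0"
    df.loc[(df["prev"] == 1) & (df["treated"] == 1), "status"] = "stable_1"
    return df.pivot_table(index="period", columns="status",
                          values="group", aggfunc="count", fill_value=0)
```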

Code Quality

  • No findings in the PR-scoped changes.

Performance

  • No findings in the PR-scoped changes.

Maintainability

  • No findings in the PR-scoped changes.

Tech Debt

  • No findings. No relevant TODO.md entry mitigates the issues above.

Security

  • No findings; no secrets, credentials, or PII were visible in the changed files.

Documentation/Tests

  • Severity P3 (prior finding unresolved); Impact: the notebook still wraps the L_max=2 fit in warnings.simplefilter("ignore", UserWarning), which will also hide future methodology warnings on that path, including thin-horizon, placebo-control-availability, and Assumption 7 warnings (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L367, diff_diff/chaisemartin_dhaultfoeuille.py:L1917-L1933, diff_diff/chaisemartin_dhaultfoeuille.py:L1988-L1996, diff_diff/chaisemartin_dhaultfoeuille.py:L2114-L2120). Concrete fix: filter only the expected warning text or explicitly assert the expected warning and let any new UserWarning fail notebook CI. An assert-the-warning variant is sketched after this list.
  • No additional test finding, but execution validation here was static-only because numpy is not installed in this sandbox.
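
For completeness, a sketch of the second fix the finding suggests — asserting the expected warning rather than filtering it — again with a placeholder fit call and an assumed warning-text prefix:

```python
import warnings

# Assert-the-warning variant: record everything, then require exactly
# the expected Assumption 7 warning. A missing expected warning or any
# new UserWarning fails the notebook under nbmake.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result_es = model.fit(panel)  # placeholder for the L_max=2 fit cell

messages = [str(w.message) for w in caught]
assert any(m.startswith("Assumption 7") for m in messages), messages
unexpected = [m for m in messages if not m.startswith("Assumption 7")]
assert not unexpected, unexpected
```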

Path to Approval

  1. Fix the Assumption 11 explanation in docs/tutorials/19_dcdh_marketing_pulse.ipynb so it verifies stable controls period-by-period, or explicitly state that the simulated DGP was constructed to satisfy the assumption rather than inferring it from whole-panel counts. After that, the remaining issues are non-blocking P2/P3 documentation cleanup and the assessment would move to ✅.

igerber and others added 3 commits April 25, 2026 11:53
End-to-end practitioner walkthrough on a 60-market reversible-treatment
panel covering: the AER 2020 Theorem 1 TWFE decomposition diagnostic via
twowayfeweights, DCDH Phase 1 (DID_M, joiners-vs-leavers, single-lag
placebo, TWFE diagnostic block), the L_max multi-horizon event study with
multiplier bootstrap, a stakeholder-communication template with explicit
per-bullet source mapping, and drift guards.

The tutorial leans into dCDH's distinguishing feature - it works on panels
with no never-treated and no always-treated units (only switchers),
because identification rests on contemporaneously-stable cells rather than
a permanent never-treated comparison group.

Doc edits beyond the notebook:
- README backfills the missing Tutorial 17 (Brand Awareness Survey) entry
  alongside the new Tutorial 19 entry
- docs/doc-deps.yaml wires the notebook into the dCDH dependency list so
  /docs-impact flags it on future estimator changes
- docs/practitioner_decision_tree.rst adds a tip cross-link in the
  Reversible Treatment section (mirrors the T17/T18 cross-link form)
- CHANGELOG [Unreleased] entry under Added

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilter

- Fix placebo horizon range from `l = -1..-L+1` to `l = -L..-1`
  (matches the implementation: at L_max=2 horizons are l = -2 and l = -1)
- Scope "the only Python estimator" claims to "diff-diff's only estimator"
  in Section 1 abstract and the stakeholder template - REGISTRY.md asserts
  uniqueness in the library, not across all of Python
- Rewrite the Section 4 cross-estimator paragraph to enumerate the actual
  staggered estimators in diff-diff (CallawaySantAnna, SunAbraham,
  WooldridgeETWFE, ImputationDiD, TwoStageDiD, EfficientDiD) and frame
  the comparison around the absorbing-treatment restriction rather than
  "needs a never-treated cohort that survives to the end of the panel"
- Narrow the L_max=2 fit warning filter from `simplefilter("ignore",
  UserWarning)` to `filterwarnings("ignore", message=r"Assumption 7 .*",
  category=UserWarning)` so only the expected leavers-present warning
  is silenced; any new / unexpected UserWarning will surface and keep
  the notebook usable as a drift detector

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Section 4 paragraph previously inferred Assumption 11 satisfaction
from whole-panel stable-cell totals (154 stable_0, 206 stable_1). That
is the wrong check - A11 is per-period: at every switching period, at
least one stable cell of the relevant type must exist.

Rewrite the closing sentences of the "Where do the controls come from?"
paragraph to (a) make the per-period nature of the check explicit,
(b) reference the library's fit-time A11 warning machinery as the
correct verification mechanism, (c) note that our fit ran without any
A11 warning, and (d) explain why single-switch panels naturally tend
to satisfy A11 (adjacent cohorts function as stable controls for each
other).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: cc904db3f6c5c972bb81cdd01d92687436508468


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review outcome: the prior P1 on the Assumption 11 explanation is fixed. The tutorial now correctly states that stable-control availability is checked period-by-period and aligns with the implementation’s warning/zero-retention behavior in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L268-L270, docs/methodology/REGISTRY.md:L491-L492, and diff_diff/chaisemartin_dhaultfoeuille.py:L3824-L3856.
  • The prior P2 on placebo-horizon wording is fixed; the notebook now describes dynamic placebos as l = -L..-1 in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352.
  • The prior P3 on blanket warning suppression is fixed; the notebook now suppresses only the expected Assumption 7 warning in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L382, matching the emitted warning text in diff_diff/chaisemartin_dhaultfoeuille.py:L2151-L2156.
  • One residual P3 remains: the closing summary still makes an ecosystem-wide “modern Python DiD estimator” claim that is broader than the library-scoped contract documented in the registry/docstring.
  • This PR is documentation-only in scope: notebook/docs wiring changed, but no estimator/math/inference code changed.
  • Static-only review: I could not rerun pytest --nbmake here because numpy, pandas, pytest, and nbformat are not installed in this sandbox.

Methodology

  • P3 Impact: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L556-L562 says “No other modern Python DiD estimator handles this case.” The documented source of truth is narrower: dCDH is the only modern staggered estimator in this library that handles reversible treatment, per docs/methodology/REGISTRY.md:L478-L479 and diff_diff/chaisemartin_dhaultfoeuille.py:L294-L299. This is not a methodology defect, but it overstates the supported claim and can go stale independently of the library. Concrete fix: scope the sentence to “No other modern estimator in diff-diff handles this case,” or cite an external ecosystem survey if a Python-wide claim is intended.

Code Quality

  • No PR-scoped findings.

Performance

  • No PR-scoped findings.

Maintainability

  • No PR-scoped findings.

Tech Debt

  • No PR-scoped findings. No relevant TODO.md entry changes were needed for this PR.

Security

  • No findings; no secrets, credentials, or PII were visible in the changed files.

Documentation/Tests

  • No additional PR-scoped findings beyond the P3 wording issue above.
  • Validation note: review was static-only because the sandbox is missing numpy, pandas, pytest, and nbformat, so notebook execution could not be rerun locally.

Bullet 1 of the closing summary previously asserted "No other modern
Python DiD estimator handles this case" - broader than the library's
own documented contract (REGISTRY only asserts uniqueness in diff-diff)
and a claim that can go stale independently of the library.

Drop the comparison sentence entirely. The reader has the
within-library positioning from Section 1 and Section 4; the closing
checklist doesn't need an ecosystem-wide claim to make its point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: bc343146dba8c7c9d4f901ae8f3bf5882d37c65e


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review outcome: the prior methodology/documentation issues appear resolved. The notebook now explains Assumption 11 as a per-period stable-control condition with the correct zero-retention warning behavior, matching the dCDH registry and implementation contract (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L260-L270, docs/methodology/REGISTRY.md:L478-L491, diff_diff/chaisemartin_dhaultfoeuille.py:L1628-L1634, diff_diff/chaisemartin_dhaultfoeuille.py:L1751-L1757).
  • The dynamic-placebo wording is corrected to horizons l = -L..-1, consistent with the documented DID^{pl}_l surface (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352, docs/methodology/REGISTRY.md:L478-L478, docs/methodology/REGISTRY.md:L652-L655).
  • The notebook now suppresses only the expected Assumption 7 cost-benefit warning instead of blanket warning suppression (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L370, docs/methodology/REGISTRY.md:L554-L555, diff_diff/chaisemartin_dhaultfoeuille.py:L2150-L2156).
  • The prior overbroad ecosystem claim is gone; reversible-treatment claims are now correctly scoped to diff-diff/library surfaces (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L459-L461, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L556-L561, docs/methodology/REGISTRY.md:L478-L478, diff_diff/chaisemartin_dhaultfoeuille.py:L4-L8).
  • This remains a documentation-only PR in scope for methodology review: it discusses twowayfeweights, DID_M, DID_M^pl, DID_l, and DID^{pl}_l, but does not modify estimator, variance, inference, identification, or default-behavior code.
  • Static-only review: I could not rerun pytest --nbmake here because numpy, pandas, pytest, and nbformat are not installed in this sandbox.

Methodology

  • Severity: None. Impact: No PR-scoped methodology defects found. The tutorial’s descriptions of twowayfeweights (AER 2020 Theorem 1), Phase 1 DID_M / DID_M^pl, and Phase 2 DID_l / DID^{pl}_l align with the Methodology Registry and in-code docstrings, and the previously flagged wording issues are resolved (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L28-L36, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L260-L270, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L370, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L459-L461, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L556-L561, docs/methodology/REGISTRY.md:L478-L491, docs/methodology/REGISTRY.md:L554-L555, docs/methodology/REGISTRY.md:L652-L655, diff_diff/chaisemartin_dhaultfoeuille.py:L4-L8). Concrete fix: None.

Code Quality

  • Severity: None. Impact: No PR-scoped code-quality issues found. The non-notebook changes are straightforward docs wiring and follow existing patterns for changelog, dependency registration, tutorial indexing, and cross-linking (CHANGELOG.md:L10-L12, docs/doc-deps.yaml:L280-L284, docs/practitioner_decision_tree.rst:L208-L214, docs/tutorials/README.md:L73-L95). Concrete fix: None.

Performance

  • Severity: None. Impact: Documentation-only PR; no runtime or estimator-path changes. Concrete fix: None.

Maintainability

  • Severity: None. Impact: No maintainability findings. The notebook includes explicit drift guards on the narrated headline values and keeps warning suppression narrow, which helps catch future documentation drift (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L370, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L493-L510). Concrete fix: None.

Tech Debt

  • Severity: None. Impact: No new untracked technical debt is introduced by the changed files, and nothing here requires TODO.md tracking to be acceptable. Concrete fix: None.

Security

  • Severity: None. Impact: No findings; changed files are documentation/notebook only, and no secrets, credentials, or PII were visible. Concrete fix: None.

Documentation/Tests

  • Severity: None. Impact: No PR-scoped documentation/test defects found. Discoverability is wired through the README, practitioner decision tree, and doc dependency registry, and the notebook contains explicit drift checks for the core narrative numbers (docs/tutorials/README.md:L90-L95, docs/practitioner_decision_tree.rst:L210-L214, docs/doc-deps.yaml:L280-L284, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L493-L510). Concrete fix: None.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 25, 2026
@igerber igerber merged commit 8596f51 into main Apr 25, 2026
3 of 4 checks passed
@igerber igerber deleted the dcdh-tutorial branch April 25, 2026 17:20