
Add Tutorial 19: dCDH for Marketing Pulse Campaigns #373

Merged
igerber merged 4 commits into main from dcdh-tutorial on Apr 25, 2026

Conversation

@igerber (Owner) commented Apr 25, 2026

Summary

  • Adds docs/tutorials/19_dcdh_marketing_pulse.ipynb (35 cells, 12 code / 23 markdown, ~3s nbmake) — practitioner walkthrough of ChaisemartinDHaultfoeuille (alias DCDH) on a 60-market reversible-treatment panel; a minimal API sketch follows this list
  • Tutorial covers Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo, AER 2020 Theorem 1 TWFE diagnostic via twowayfeweights) and the multi-horizon event study with multiplier bootstrap (L_max, n_bootstrap)
  • Section 4 includes an explicit "Where do the controls come from?" block clarifying that dCDH's contemporaneously-stable-cell identification means the panel can have zero never-treated and zero always-treated units, distinguishing it from CallawaySantAnna and other staggered-DiD estimators
  • Stakeholder-communication template with explicit per-bullet source mapping; drift guards lock the headline numbers (overall_att, placebo magnitude, TWFE fraction_negative, event-study l=1 effect)
  • Tutorial 17 (Brand Awareness Survey) backfilled in docs/tutorials/README.md (was missing)
  • Wiring: docs/doc-deps.yaml adds the notebook to the chaisemartin_dhaultfoeuille.py dependency list; docs/practitioner_decision_tree.rst adds a .. tip:: cross-link in the Reversible Treatment section
  • CHANGELOG [Unreleased] entry
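
As quick orientation, here is a minimal sketch of the flow the tutorial walks through. Import paths, signatures, and result-attribute names are assumptions pieced together from this PR description (the parameter values come from the seed-search lineage in the Validation section); the notebook itself is the source of truth.

```python
# Hypothetical sketch of the tutorial's core flow -- not the notebook's
# exact cells. Import paths, signatures, and result attributes are
# assumptions inferred from this PR description.
from diff_diff import DCDH, generate_reversible_did_data, twowayfeweights

# Simulated 60-market reversible-treatment panel (parameter values are
# the ones locked in the Validation section below).
panel = generate_reversible_did_data(
    seed=53,
    n_groups=60,
    n_periods=8,
    pattern="single_switch",
    heterogeneous_effects=True,
    effect_sd=4.0,
)

# AER 2020 Theorem 1 diagnostic: inspect the TWFE weight decomposition.
weights = twowayfeweights(panel)

# Phase 1 (DID_M) plus the multi-horizon event study with multiplier
# bootstrap; L_max / n_bootstrap match the locked tutorial config.
model = DCDH(L_max=2, n_bootstrap=199)
result = model.fit(panel)
print(result.overall_att)  # the headline number the drift guards lock
```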

Methodology references (required if estimator / math changes)

  • Method: de Chaisemartin & D'Haultfoeuille DID_M (AER 2020) and DID_l (NBER WP 29873 dynamic companion) — a schematic of DID_M follows this list
  • Paper / source link(s): AER 2020 — https://www.aeaweb.org/articles?id=10.1257/aer.20181169 ; dynamic companion — https://www.nber.org/papers/w29873
  • Any intentional deviations: None — this PR is documentation only. No source files in diff_diff/, rust/src/, or docs/methodology/ modified. The tutorial uses the existing public API (DCDH, twowayfeweights, generate_reversible_did_data) and surfaces the documented Phase 1 + L_max + bootstrap contracts already locked in docs/methodology/REGISTRY.md § ChaisemartinDHaultfoeuille.
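
For readers skimming the PR, a schematic of the cited DID_M estimator, paraphrased from the AER 2020 paper (notation simplified; the paper and docs/methodology/REGISTRY.md are authoritative):

```latex
% DID_M averages two period-t comparisons over all switching periods:
% joiners (0 -> 1) against cells stable at 0, and cells stable at 1
% against leavers (1 -> 0). N_S is the total number of switcher
% observations; N_{1,0,t} and N_{0,1,t} count joiners and leavers at t.
\mathrm{DID}_M \;=\; \sum_{t}
    \frac{N_{1,0,t}}{N_S}\,\mathrm{DID}_{+,t}
  + \frac{N_{0,1,t}}{N_S}\,\mathrm{DID}_{-,t}
```

Because both comparisons use contemporaneously stable cells rather than a permanent comparison group, the panel needs neither never-treated nor always-treated units, which is the property the tutorial leans on.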

Validation

  • Tests added/updated: None (no source changes).
  • Notebook validation: pytest --nbmake docs/tutorials/19_dcdh_marketing_pulse.ipynb passes in ~3s. The notebook ships with cleared outputs; nbmake re-executes end-to-end and the final drift-guard cell prints All drift guards passed.
  • Drift guards (cell t19-cell-031) lock the narrative numbers: overall_att ∈ [10.72, 11.72] and CI covers truth, |placebo_effect| < 1.5, twfe_fraction_negative >= 0.10, event_study_effects[1]["effect"] ∈ [10.24, 12.24]. Tolerances are chosen wide enough to absorb numerical noise, tight enough to detect material drift. A hypothetical reconstruction of this cell follows this list.
  • Seed-search lineage: parameters locked at seed=53, n_groups=60, n_periods=8, pattern="single_switch", heterogeneous_effects=True, effect_sd=4.0, L_max=2, n_bootstrap=199 (matches ci_params.bootstrap convention). Selection criteria: TWFE fraction_negative >= 0.10 and bias visible, dCDH ATT covers truth, both joiners and leavers contribute, multi-horizon event-study CIs cover truth and placebos cover 0.
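
A hypothetical reconstruction of what the drift-guard cell asserts, reusing the placeholder names from the sketch in the Summary section; the numeric bounds are quoted above, but the attribute names are assumptions:

```python
# Hypothetical reconstruction of the drift-guard cell (t19-cell-031).
# Bounds are the ones listed above; attribute names are placeholders.
assert 10.72 <= result.overall_att <= 11.72
# (the real cell additionally checks that the CI covers the DGP's truth)
assert abs(result.placebo_effect) < 1.5           # single-lag placebo
assert weights.fraction_negative >= 0.10          # TWFE diagnostic
assert 10.24 <= result.event_study_effects[1]["effect"] <= 12.24
print("All drift guards passed")
```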

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

@github-actions

Overall Assessment

✅ Looks good

Executive Summary

  • This is a documentation-only PR; no estimator, weighting, variance, or default-behavior code changed.
  • Cross-checking the tutorial against docs/methodology/REGISTRY.md found the core dCDH methodology aligned, including the documented Phase 1 placebo SE=NaN contract and the separate Phase 2 DID_{g,1} path.
  • P2: the new tutorial misstates the dynamic placebo horizon range once in prose; the implementation exposes placebo lags 1..L_max under negative keys -l, so for L_max=2 the horizons are -2 and -1, not -1..-L+1.
  • P3: a couple of comparison claims are broader than the project’s own source of truth ("only Python estimator" / other estimators needing a cohort that “survives to the end”); the registry only claims uniqueness “in the library.”
  • I could not rerun pytest --nbmake in this sandbox because numpy is not installed, so execution validation here was static-only.

Methodology

  • P2 Impact: the notebook says L_max=L yields placebo horizons l = -1..-L+1, which is inconsistent with both the implementation and the later notebook text that discusses l = -2 and l = -1. This is a methodology/API-surface documentation mismatch for the event-study placebo output. Concrete fix: change that sentence to l = -L..-1 (or “negative horizons -1, -2, ..., -L”). Refs: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L427-L429, diff_diff/chaisemartin_dhaultfoeuille.py:L2970-L2981, diff_diff/chaisemartin_dhaultfoeuille.py:L5543-L5559, docs/methodology/REGISTRY.md:L546-L548. A short indexing illustration follows this list.
  • P3 Impact: the tutorial broadens comparison language beyond the registry by saying other staggered estimators need a cohort that “survives to the end” and that dCDH is the “only Python estimator” for reversible treatment. The registered claim is narrower: uniqueness is only asserted “in the library.” Concrete fix: scope these lines to “among diff-diff’s staggered estimators” and keep the comparison focused on the absorbing-treatment restriction. Refs: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L268-L270, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L451-L451, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L550-L550, docs/methodology/REGISTRY.md:L478-L478
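
To make the P2 concrete, the horizon indexing under the documented contract (illustrative only; the key layout is taken from the finding above):

```python
# Placebo lags 1..L_max are exposed under negative keys -l. At L_max=2
# the horizons are therefore -2 and -1; the claimed "-1..-L+1" range
# would collapse to just -1 at L_max=2, which is the inconsistency.
L_max = 2
placebo_horizons = sorted(-l for l in range(1, L_max + 1))
assert placebo_horizons == [-2, -1]
```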

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Nothing here needs a new TODO.md entry.

Security

  • No findings; I did not see secrets, credentials, or PII in the changed files.

Documentation/Tests

  • P3 Impact: the notebook wraps the L_max=2 fit in warnings.simplefilter("ignore", UserWarning), which suppresses not just the expected Assumption 7 cost-benefit warning but any future methodology warnings on that path as well (for example thin-horizon or placebo-control-availability warnings). That weakens the notebook’s value as a drift detector. Concrete fix: narrow the filter to the expected warning text, or explicitly assert the expected warning and let any new UserWarning fail the notebook. Refs: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L374, diff_diff/chaisemartin_dhaultfoeuille.py:L1917-L1933, diff_diff/chaisemartin_dhaultfoeuille.py:L1988-L1995, diff_diff/chaisemartin_dhaultfoeuille.py:L2114-L2120. A sketch of the narrowed filter follows.
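
A sketch of the narrowed-filter fix (this is the variant a follow-up commit in this PR adopts, quoting its filterwarnings call; the fit call is a placeholder):

```python
import warnings

# Narrowed filter: silence only the expected Assumption 7 cost-benefit
# warning; any other UserWarning on this path still surfaces in nbmake.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message=r"Assumption 7 .*", category=UserWarning
    )
    result_es = model.fit(panel)  # placeholder for the L_max=2 fit cell
```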

@igerber (Owner, Author) commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 30083faaad96af63bf6d4edb16a140202f5c2dfe


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Documentation-only PR; no estimator, weighting, variance, or default-behavior code changed. The affected method surface is ChaisemartinDHaultfoeuille tutorial coverage of DID_M, DID_M^pl, DID_l, and DID^{pl}_l.
  • Severity P1 [Newly identified]: the tutorial claims Assumption 11 is satisfied from whole-panel stable-cell totals, but the method contract is period-specific. That is the wrong identification check for dCDH.
  • Prior P2 remains unresolved: the notebook still states placebo horizons as -1..-L+1; the implementation/registry expose negative horizons -L..-1, and the notebook later uses -2 and -1.
  • Prior P3 remains unresolved: comparison language is still broader than the project’s source of truth ("only Python estimator", other estimators needing a cohort that “survives to the end”).
  • Prior P3 remains unresolved: the L_max=2 fit still suppresses all UserWarnings, weakening the notebook as a drift detector.
  • Static-only review here; pytest --nbmake could not be rerun because this sandbox is missing numpy.

Methodology

  • Severity P1 [Newly identified]; Impact: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L261-L270 teaches the wrong way to verify dCDH Assumption 11 by citing aggregate stable-cell totals across the full panel. The contract is period-by-period: when switchers exist in period t, the relevant stable controls must exist in that same period, and violations are tracked/warned per period, not by global counts (docs/methodology/REGISTRY.md:L491-L492, docs/methodology/REGISTRY.md:L616-L617, diff_diff/chaisemartin_dhaultfoeuille.py:L1618-L1626). A panel can have many stable cells overall and still fail Assumption 11 in a switching period. Concrete fix: replace that sentence with a per-period check/table showing stable_0/stable_1 availability at each switching week, or soften it to “the simulated DGP was constructed to satisfy Assumption 11” instead of inferring satisfaction from global totals. A per-period tabulation sketch follows this list.
  • Severity P2 (prior finding unresolved); Impact: the event-study intro still says L_max=L yields placebo horizons l = -1..-L+1, which is inconsistent with both the implementation’s negative-key output and the notebook’s own later discussion of l=-2 and l=-1 (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L427-L429, diff_diff/chaisemartin_dhaultfoeuille.py:L2970-L2982, diff_diff/chaisemartin_dhaultfoeuille.py:L5543-L5559). Concrete fix: change the prose to l = -L..-1 or “negative horizons -1, -2, ..., -L.”
  • Severity P3 (prior finding unresolved); Impact: the tutorial still makes ecosystem-wide comparison claims that are broader than the project’s documented source of truth, e.g. “the only Python estimator” and the suggestion that other estimators need a cohort that “survives to the end of the panel” (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L268-L269, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L451-L451, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L550-L550). The registry/docstring only claim uniqueness within this library, and the Callaway-Sant'Anna contract explicitly supports not_yet_treated controls without requiring never-treated units (docs/methodology/REGISTRY.md:L446-L447, docs/methodology/REGISTRY.md:L462-L463, docs/methodology/REGISTRY.md:L478-L479, diff_diff/chaisemartin_dhaultfoeuille.py:L294-L299). Concrete fix: scope these statements to “among diff-diff’s estimators” or cite an external survey if an ecosystem-wide claim is intended.
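
A sketch of the kind of per-period check the P1 fix asks for, with assumed column names ("group", "period", "treated"); the real panel's schema may differ:

```python
import pandas as pd

def assumption_11_table(panel: pd.DataFrame) -> pd.DataFrame:
    """Count switchers and contemporaneously stable cells per period.

    Assumption 11 is period-specific: every period with switchers needs
    at least one stable cell of the relevant type in that same period,
    so whole-panel stable-cell totals prove nothing.
    """
    df = panel.sort_values(["group", "period"]).copy()
    df["prev"] = df.groupby("group")["treated"].shift(1)
    df = df.dropna(subset=["prev"])  # first period has no transition
    df["status"] = "switcher"  # joiners (0 -> 1) and leavers (1 -> 0)
    df.loc[(df["prev"] == 0) & (df["treated"] == 0), "status"] = "stable_0"
    df.loc[(df["prev"] == 1) & (df["treated"] == 1), "status"] = "stable_1"
    return df.pivot_table(index="period", columns="status",
                          values="group", aggfunc="count", fill_value=0)
```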

Code Quality

  • No findings in the PR-scoped changes.

Performance

  • No findings in the PR-scoped changes.

Maintainability

  • No findings in the PR-scoped changes.

Tech Debt

  • No findings. No relevant TODO.md entry mitigates the issues above.

Security

  • No findings; no secrets, credentials, or PII were visible in the changed files.

Documentation/Tests

  • Severity P3 (prior finding unresolved); Impact: the notebook still wraps the L_max=2 fit in warnings.simplefilter("ignore", UserWarning), which will also hide future methodology warnings on that path, including thin-horizon, placebo-control-availability, and Assumption 7 warnings (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L367, diff_diff/chaisemartin_dhaultfoeuille.py:L1917-L1933, diff_diff/chaisemartin_dhaultfoeuille.py:L1988-L1996, diff_diff/chaisemartin_dhaultfoeuille.py:L2114-L2120). Concrete fix: filter only the expected warning text or explicitly assert the expected warning and let any new UserWarning fail notebook CI. An assert-the-warning variant is sketched after this list.
  • No additional test finding, but execution validation here was static-only because numpy is not installed in this sandbox.
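
For completeness, a sketch of the second fix the finding suggests — asserting the expected warning rather than filtering it — again with a placeholder fit call and an assumed warning-text prefix:

```python
import warnings

# Assert-the-warning variant: record everything, then require exactly
# the expected Assumption 7 warning. A missing expected warning or any
# new UserWarning fails the notebook under nbmake.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result_es = model.fit(panel)  # placeholder for the L_max=2 fit cell

messages = [str(w.message) for w in caught]
assert any(m.startswith("Assumption 7") for m in messages), messages
unexpected = [m for m in messages if not m.startswith("Assumption 7")]
assert not unexpected, unexpected
```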

Path to Approval

  1. Fix the Assumption 11 explanation in docs/tutorials/19_dcdh_marketing_pulse.ipynb so it verifies stable controls period-by-period, or explicitly state that the simulated DGP was constructed to satisfy the assumption rather than inferring it from whole-panel counts. After that, the remaining issues are non-blocking P2/P3 documentation cleanup and the assessment would move to ✅.

igerber and others added 3 commits April 25, 2026 11:53
End-to-end practitioner walkthrough on a 60-market reversible-treatment
panel covering: the AER 2020 Theorem 1 TWFE decomposition diagnostic via
twowayfeweights, DCDH Phase 1 (DID_M, joiners-vs-leavers, single-lag
placebo, TWFE diagnostic block), the L_max multi-horizon event study with
multiplier bootstrap, a stakeholder-communication template with explicit
per-bullet source mapping, and drift guards.

The tutorial leans into dCDH's distinguishing feature - it works on panels
with no never-treated and no always-treated units (only switchers),
because identification rests on contemporaneously-stable cells rather than
a permanent never-treated comparison group.

Doc edits beyond the notebook:
- README backfills the missing Tutorial 17 (Brand Awareness Survey) entry
  alongside the new Tutorial 19 entry
- docs/doc-deps.yaml wires the notebook into the dCDH dependency list so
  /docs-impact flags it on future estimator changes
- docs/practitioner_decision_tree.rst adds a tip cross-link in the
  Reversible Treatment section (mirrors the T17/T18 cross-link form)
- CHANGELOG [Unreleased] entry under Added

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilter

- Fix placebo horizon range from `l = -1..-L+1` to `l = -L..-1`
  (matches the implementation: at L_max=2 horizons are l = -2 and l = -1)
- Scope "the only Python estimator" claims to "diff-diff's only estimator"
  in Section 1 abstract and the stakeholder template - REGISTRY.md asserts
  uniqueness in the library, not across all of Python
- Rewrite the Section 4 cross-estimator paragraph to enumerate the actual
  staggered estimators in diff-diff (CallawaySantAnna, SunAbraham,
  WooldridgeETWFE, ImputationDiD, TwoStageDiD, EfficientDiD) and frame
  the comparison around the absorbing-treatment restriction rather than
  "needs a never-treated cohort that survives to the end of the panel"
- Narrow the L_max=2 fit warning filter from `simplefilter("ignore",
  UserWarning)` to `filterwarnings("ignore", message=r"Assumption 7 .*",
  category=UserWarning)` so only the expected leavers-present warning
  is silenced; any new / unexpected UserWarning will surface and keep
  the notebook usable as a drift detector

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Section 4 paragraph previously inferred Assumption 11 satisfaction
from whole-panel stable-cell totals (154 stable_0, 206 stable_1). That
is the wrong check - A11 is per-period: at every switching period, at
least one stable cell of the relevant type must exist.

Rewrite the closing sentences of the "Where do the controls come from?"
paragraph to (a) make the per-period nature of the check explicit,
(b) reference the library's fit-time A11 warning machinery as the
correct verification mechanism, (c) note that our fit ran without any
A11 warning, and (d) explain why single-switch panels naturally tend
to satisfy A11 (adjacent cohorts function as stable controls for each
other).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: cc904db3f6c5c972bb81cdd01d92687436508468


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review outcome: the prior P1 on the Assumption 11 explanation is fixed. The tutorial now correctly states that stable-control availability is checked period-by-period and aligns with the implementation’s warning/zero-retention behavior in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L268-L270, docs/methodology/REGISTRY.md:L491-L492, and diff_diff/chaisemartin_dhaultfoeuille.py:L3824-L3856.
  • The prior P2 on placebo-horizon wording is fixed; the notebook now describes dynamic placebos as l = -L..-1 in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352.
  • The prior P3 on blanket warning suppression is fixed; the notebook now suppresses only the expected Assumption 7 warning in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L382, matching the emitted warning text in diff_diff/chaisemartin_dhaultfoeuille.py:L2151-L2156.
  • One residual P3 remains: the closing summary still makes an ecosystem-wide “modern Python DiD estimator” claim that is broader than the library-scoped contract documented in the registry/docstring.
  • This PR is documentation-only in scope: notebook/docs wiring changed, but no estimator/math/inference code changed.
  • Static-only review: I could not rerun pytest --nbmake here because numpy, pandas, pytest, and nbformat are not installed in this sandbox.

Methodology

  • P3 Impact: docs/tutorials/19_dcdh_marketing_pulse.ipynb:L556-L562 says “No other modern Python DiD estimator handles this case.” The documented source of truth is narrower: dCDH is the only modern staggered estimator in this library that handles reversible treatment, per docs/methodology/REGISTRY.md:L478-L479 and diff_diff/chaisemartin_dhaultfoeuille.py:L294-L299. This is not a methodology defect, but it overstates the supported claim and can go stale independently of the library. Concrete fix: scope the sentence to “No other modern estimator in diff-diff handles this case,” or cite an external ecosystem survey if a Python-wide claim is intended.

Code Quality

  • No PR-scoped findings.

Performance

  • No PR-scoped findings.

Maintainability

  • No PR-scoped findings.

Tech Debt

  • No PR-scoped findings. No relevant TODO.md entry changes were needed for this PR.

Security

  • No findings; no secrets, credentials, or PII were visible in the changed files.

Documentation/Tests

  • No additional PR-scoped findings beyond the P3 wording issue above.
  • Validation note: review was static-only because the sandbox is missing numpy, pandas, pytest, and nbformat, so notebook execution could not be rerun locally.

Bullet 1 of the closing summary previously asserted "No other modern
Python DiD estimator handles this case" - broader than the library's
own documented contract (REGISTRY only asserts uniqueness in diff-diff)
and a claim that can go stale independently of the library.

Drop the comparison sentence entirely. The reader has the
within-library positioning from Section 1 and Section 4; the closing
checklist doesn't need an ecosystem-wide claim to make its point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: bc343146dba8c7c9d4f901ae8f3bf5882d37c65e


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review outcome: the prior methodology/documentation issues appear resolved. The notebook now explains Assumption 11 as a per-period stable-control condition with the correct zero-retention warning behavior, matching the dCDH registry and implementation contract (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L260-L270, docs/methodology/REGISTRY.md:L478-L491, diff_diff/chaisemartin_dhaultfoeuille.py:L1628-L1634, diff_diff/chaisemartin_dhaultfoeuille.py:L1751-L1757).
  • The dynamic-placebo wording is corrected to horizons l = -L..-1, consistent with the documented DID^{pl}_l surface (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L352, docs/methodology/REGISTRY.md:L478-L478, docs/methodology/REGISTRY.md:L652-L655).
  • The notebook now suppresses only the expected Assumption 7 cost-benefit warning instead of blanket warning suppression (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L370, docs/methodology/REGISTRY.md:L554-L555, diff_diff/chaisemartin_dhaultfoeuille.py:L2150-L2156).
  • The prior overbroad ecosystem claim is gone; reversible-treatment claims are now correctly scoped to diff-diff/library surfaces (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L459-L461, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L556-L561, docs/methodology/REGISTRY.md:L478-L478, diff_diff/chaisemartin_dhaultfoeuille.py:L4-L8).
  • This remains a documentation-only PR in scope for methodology review: it discusses twowayfeweights, DID_M, DID_M^pl, DID_l, and DID^{pl}_l, but does not modify estimator, variance, inference, identification, or default-behavior code.
  • Static-only review: I could not rerun pytest --nbmake here because numpy, pandas, pytest, and nbformat are not installed in this sandbox.

Methodology

  • Severity: None. Impact: No PR-scoped methodology defects found. The tutorial’s descriptions of twowayfeweights (AER 2020 Theorem 1), Phase 1 DID_M / DID_M^pl, and Phase 2 DID_l / DID^{pl}_l align with the Methodology Registry and in-code docstrings, and the previously flagged wording issues are resolved (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L28-L36, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L260-L270, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L350-L370, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L459-L461, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L556-L561, docs/methodology/REGISTRY.md:L478-L491, docs/methodology/REGISTRY.md:L554-L555, docs/methodology/REGISTRY.md:L652-L655, diff_diff/chaisemartin_dhaultfoeuille.py:L4-L8). Concrete fix: None.

Code Quality

  • Severity: None. Impact: No PR-scoped code-quality issues found. The non-notebook changes are straightforward docs wiring and follow existing patterns for changelog, dependency registration, tutorial indexing, and cross-linking (CHANGELOG.md:L10-L12, docs/doc-deps.yaml:L280-L284, docs/practitioner_decision_tree.rst:L208-L214, docs/tutorials/README.md:L73-L95). Concrete fix: None.

Performance

  • Severity: None. Impact: Documentation-only PR; no runtime or estimator-path changes. Concrete fix: None.

Maintainability

  • Severity: None. Impact: No maintainability findings. The notebook includes explicit drift guards on the narrated headline values and keeps warning suppression narrow, which helps catch future documentation drift (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L362-L370, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L493-L510). Concrete fix: None.

Tech Debt

  • Severity: None. Impact: No new untracked technical debt is introduced by the changed files, and nothing here requires TODO.md tracking to be acceptable. Concrete fix: None.

Security

  • Severity: None. Impact: No findings; changed files are documentation/notebook only, and no secrets, credentials, or PII were visible. Concrete fix: None.

Documentation/Tests

  • Severity: None. Impact: No PR-scoped documentation/test defects found. Discoverability is wired through the README, practitioner decision tree, and doc dependency registry, and the notebook contains explicit drift checks for the core narrative numbers (docs/tutorials/README.md:L90-L95, docs/practitioner_decision_tree.rst:L210-L214, docs/doc-deps.yaml:L280-L284, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L493-L510). Concrete fix: None.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 25, 2026
@igerber igerber merged commit 8596f51 into main Apr 25, 2026
3 of 4 checks passed
@igerber igerber deleted the dcdh-tutorial branch April 25, 2026 17:20