From ca11b82283444de5a18c11cfd361928df856a77f Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 10:48:17 -0400 Subject: [PATCH 1/4] Add Tutorial 19: dCDH for Marketing Pulse Campaigns End-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering: the AER 2020 Theorem 1 TWFE decomposition diagnostic via twowayfeweights, DCDH Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo, TWFE diagnostic block), the L_max multi-horizon event study with multiplier bootstrap, a stakeholder-communication template with explicit per-bullet source mapping, and drift guards. The tutorial leans into dCDH's distinguishing feature - it works on panels with no never-treated and no always-treated units (only switchers), because identification rests on contemporaneously-stable cells rather than a permanent never-treated comparison group. Doc edits beyond the notebook: - README backfills the missing Tutorial 17 (Brand Awareness Survey) entry alongside the new Tutorial 19 entry - docs/doc-deps.yaml wires the notebook into the dCDH dependency list so /docs-impact flags it on future estimator changes - docs/practitioner_decision_tree.rst adds a tip cross-link in the Reversible Treatment section (mirrors the T17/T18 cross-link form) - CHANGELOG [Unreleased] entry under Added Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + docs/doc-deps.yaml | 2 + docs/practitioner_decision_tree.rst | 8 + docs/tutorials/19_dcdh_marketing_pulse.ipynb | 574 +++++++++++++++++++ docs/tutorials/README.md | 17 + 5 files changed, 602 insertions(+) create mode 100644 docs/tutorials/19_dcdh_marketing_pulse.ipynb diff --git a/CHANGELOG.md b/CHANGELOG.md index 16a08a2a..e2f1d5e8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. 
The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract. 
+- **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added. ## [3.3.0] - 2026-04-25 diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index dfb1181b..5d6894e5 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -280,6 +280,8 @@ sources: - path: docs/practitioner_decision_tree.rst section: "Reversible Treatment" type: user_guide + - path: docs/tutorials/19_dcdh_marketing_pulse.ipynb + type: tutorial - path: ROADMAP.md section: "de Chaisemartin-D'Haultfœuille (dCDH) Estimator" type: roadmap diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst index 8ca06d15..6b0f4bfd 100644 --- a/docs/practitioner_decision_tree.rst +++ b/docs/practitioner_decision_tree.rst @@ -205,6 +205,14 @@ a joiners-only view `DID_+`, and a leavers-only view `DID_-`. influence-function derivation for the single-lag placebo is a planned extension. Dynamic placebos (``L_max >= 1``) do have valid analytical SE. +.. tip:: + + For a full walkthrough on a marketing-pulse panel - including the TWFE + decomposition diagnostic, joiners-vs-leavers reading, multi-horizon event + study with multiplier bootstrap, and a stakeholder communication template - + see `Tutorial 19: dCDH for Marketing Pulse Campaigns + `_. + .. 
_section-dose: diff --git a/docs/tutorials/19_dcdh_marketing_pulse.ipynb b/docs/tutorials/19_dcdh_marketing_pulse.ipynb new file mode 100644 index 00000000..eabe4290 --- /dev/null +++ b/docs/tutorials/19_dcdh_marketing_pulse.ipynb @@ -0,0 +1,574 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "t19-cell-001", + "metadata": {}, + "source": [ + "# Tutorial 19: dCDH for Marketing Pulse Campaigns\n", + "\n", + "A practitioner walkthrough for measuring lift from promotional campaigns that turn on AND off across markets at staggered times. The tutorial uses the `ChaisemartinDHaultfoeuille` estimator (alias `DCDH`), the only modern Python DiD estimator built for reversible (non-absorbing) treatment." + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-002", + "metadata": {}, + "source": [ + "## 1. The Marketing Pulse Problem" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-003", + "metadata": {}, + "source": [ + "Your team runs paid-promo pulses across 60 markets. Some markets ran the promo at the start of the quarter and turned it off as the campaign budget rolled to the next geo; others started untreated and switched the promo on at some point during the quarter. Leadership wants the average lift on weekly checkout sessions while the promo was on.\n", + "\n", + "**Why this is hard.** Three things break standard methods:\n", + "\n", + "1. **Treatment is reversible.** This panel has both joiners (markets that switched the promo on) and leavers (markets that switched it off). The canonical staggered-DiD estimators - Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD - all assume *absorbing* treatment: once treated, always treated. They simply don't apply when the promo can come back off.\n", + "\n", + "2. 
**Two-way fixed-effects regression silently uses negative weights.** When you have switchers in both directions in the same panel, OLS with unit and time fixed effects ends up using some treated cells as *controls* for other treated cells, weighting those cells negatively. Under heterogeneous treatment effects, those negative weights can attenuate or even flip the sign of the regression coefficient ([de Chaisemartin & D'Haultfoeuille 2020](https://www.aeaweb.org/articles?id=10.1257/aer.20181169), Theorem 1).\n", + "\n", + "3. **No diagnostic tells you when to worry.** The standard error from the OLS regression doesn't reveal the weighting problem. You need a separate decomposition to know whether to trust the regression coefficient or reach for an alternative.\n", + "\n", + "**Why diff-diff.** The library implements `ChaisemartinDHaultfoeuille` (`DCDH`) following the AER 2020 paper plus its [dynamic companion](https://www.nber.org/papers/w29873). Phase 1 ships the contemporaneous-switch estimator `DID_M` plus a joiners-vs-leavers decomposition; the multi-horizon event study via `L_max` adds dynamic effects with multiplier-bootstrap inference. Critically, the library also exposes the AER 2020 Theorem 1 TWFE decomposition as a standalone diagnostic - so you can quantify how badly TWFE is contaminated *before* you reach for the fix. Implementation details and any documented deviations from R's `did_multiplegt_dyn` reference live in [`docs/methodology/REGISTRY.md`](../methodology/REGISTRY.html)." 
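The absorbing-treatment point in item 1 can be made concrete in a few lines of pandas. This is a minimal sketch on a hypothetical two-market schedule (not the tutorial's simulated panel): recode the promo as "ever on", which is effectively the coding an absorbing-treatment estimator assumes, and count the weeks where that coding contradicts the real schedule.

```python
import pandas as pd

# Hypothetical schedule: market A is a joiner (promo stays on once started),
# market B is a leaver (promo switches off after week 3).
panel = pd.DataFrame({
    "market": ["A"] * 6 + ["B"] * 6,
    "week": list(range(1, 7)) * 2,
    "promo_on": [0, 0, 1, 1, 1, 1,    # A: joiner
                 1, 1, 1, 0, 0, 0],   # B: leaver
})

# Absorbing coding: once treated, always treated.
panel["ever_on"] = panel.groupby("market")["promo_on"].cummax()

# Market-weeks the absorbing coding misclassifies as treated.
miscoded = panel[(panel["ever_on"] == 1) & (panel["promo_on"] == 0)]
print(miscoded)  # B's weeks 4-6: promo is off, absorbing coding says on
```

For market A the two codings agree everywhere; for market B every post-switch-off week is misclassified as treated, which is why the absorbing-treatment estimators listed above are ruled out on a panel with leavers.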
+ ] + }, + { + "cell_type": "code", + "id": "t19-cell-004", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "import warnings\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "from diff_diff import (\n", + " DCDH,\n", + " generate_reversible_did_data,\n", + " twowayfeweights,\n", + ")\n", + "\n", + "plt.style.use(\"seaborn-v0_8-whitegrid\")" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-005", + "metadata": {}, + "source": [ + "## 2. The Data" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-006", + "metadata": {}, + "source": [ + "We'll simulate a panel that mirrors a marketing pulse campaign:\n", + "\n", + "- **60 markets**, each observed for **8 weeks**\n", + "- Some markets started the quarter with the promo on and switched it off (leavers); others started untreated and switched the promo on (joiners). Each market switches exactly once during the panel - the [A5 single-switch contract](../methodology/REGISTRY.html) the analytical SE is derived under.\n", + "- Outcome: weekly checkout sessions per market, baseline ~110\n", + "- True treatment effect: **+12 sessions per market-week** when the promo is on, with cell-level effect heterogeneity (some markets respond more strongly than others).\n", + "\n", + "We use `generate_reversible_did_data` with `pattern=\"single_switch\"` and `heterogeneous_effects=True`. Because the data is synthetic, the true effect is known and we can verify dCDH recovers it." 
+ ] + }, + { + "cell_type": "code", + "id": "t19-cell-007", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "raw = generate_reversible_did_data(\n", + " n_groups=60,\n", + " n_periods=8,\n", + " pattern=\"single_switch\",\n", + " initial_treat_frac=0.4,\n", + " treatment_effect=12.0,\n", + " heterogeneous_effects=True,\n", + " effect_sd=4.0,\n", + " group_fe_sd=8.0,\n", + " time_trend=0.5,\n", + " noise_sd=2.0,\n", + " seed=53, # locked via seed-search; see _scratch/dcdh_tutorial/\n", + ")\n", + "df = raw.rename(\n", + " columns={\n", + " \"group\": \"market_id\",\n", + " \"period\": \"week\",\n", + " \"treatment\": \"promo_on\",\n", + " \"outcome\": \"sessions\",\n", + " }\n", + ")\n", + "df[\"sessions\"] = df[\"sessions\"] + 100.0 # shift to a realistic baseline\n", + "\n", + "print(f\"Panel shape: {df.shape}\")\n", + "print(f\"Markets: {df['market_id'].nunique()}\")\n", + "print(f\"Weeks: {sorted(df['week'].unique())}\")\n", + "print(f\"Sessions range: [{df['sessions'].min():.0f}, {df['sessions'].max():.0f}]\")" + ] + }, + { + "cell_type": "code", + "id": "t19-cell-008", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Switcher-type counts. 
With pattern=\"single_switch\" every group\n", + "# switches exactly once, so we have only joiners (0 \u2192 1) and leavers\n", + "# (1 \u2192 0); no never-treated or always-treated groups by construction.\n", + "df.groupby(\"switcher_type\").size()" + ] + }, + { + "cell_type": "code", + "id": "t19-cell-009", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Mean sessions over time, split by which direction the market switched.\n", + "first_treat = df.groupby(\"market_id\")[\"promo_on\"].first()\n", + "category = df[\"market_id\"].map(\n", + " lambda m: \"starts off, switches on\" if first_treat[m] == 0 else \"starts on, switches off\"\n", + ")\n", + "df_plot = df.assign(category=category)\n", + "\n", + "fig, ax = plt.subplots(figsize=(9, 5))\n", + "for label, color in [(\"starts off, switches on\", \"#1f77b4\"), (\"starts on, switches off\", \"#d62728\")]:\n", + " weekly = df_plot[df_plot[\"category\"] == label].groupby(\"week\")[\"sessions\"].mean()\n", + " ax.plot(weekly.index, weekly.values, label=label, color=color, marker=\"o\", linewidth=2)\n", + "ax.set_xlabel(\"Week\")\n", + "ax.set_ylabel(\"Mean weekly sessions\")\n", + "ax.set_title(\"Marketing pulses on/off across markets \u2014 outcomes by switcher type\")\n", + "ax.legend(loc=\"upper left\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-010", + "metadata": {}, + "source": [ + "## 3. Why Standard Regression Misleads Here" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-011", + "metadata": {}, + "source": [ + "Before reaching for dCDH, fit standard two-way fixed effects (TWFE) regression on this panel and read out the diagnostic. 
The dCDH authors derived a closed-form decomposition of the TWFE coefficient (Theorem 1, AER 2020) that tells you *quantitatively* how badly the regression is contaminated, before you have to trust any alternative estimator.\n", + "\n", + "The library exposes this as a standalone function, `twowayfeweights`, that returns three numbers:\n", + "\n", + "- `beta_fe`: the plain TWFE coefficient on the treatment indicator.\n", + "- `fraction_negative`: the share of treated cells that receive a *negative* weight in the TWFE coefficient. Any positive value is a warning sign - it means OLS is using some treated units as controls for other treated units.\n", + "- `sigma_fe`: the smallest cell-level effect-heterogeneity standard deviation that could flip the sign of the TWFE coefficient. Small `sigma_fe` (relative to plausible heterogeneity in your domain) means the regression is fragile." + ] + }, + { + "cell_type": "code", + "id": "t19-cell-012", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "twfe = twowayfeweights(\n", + " df,\n", + " outcome=\"sessions\",\n", + " group=\"market_id\",\n", + " time=\"week\",\n", + " treatment=\"promo_on\",\n", + ")\n", + "\n", + "print(f\"TWFE coefficient (beta_fe): {twfe.beta_fe:.3f}\")\n", + "print(f\"Fraction of negative weights: {twfe.fraction_negative:.3f} ({twfe.fraction_negative*100:.1f}%)\")\n", + "print(f\"Sign-flip threshold (sigma_fe): {twfe.sigma_fe:.3f}\")\n", + "print(f\"True treatment effect (DGP): 12.000\")" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-013", + "metadata": {}, + "source": [ + "**Plain-English interpretation.** The TWFE regression estimates the lift at about **11.5 sessions per market-week** - close to the true effect of 12.0 in this synthetic panel. But the diagnostic surfaces two warning signs: **15.4% of treated cells receive negative weight** in that estimate, and the sign-flip threshold sigma_fe is about 12.3. 
In domains where you might plausibly believe cell-level treatment effects vary by ~12 sessions in standard deviation, the TWFE coefficient is fragile.\n", + "\n", + "On *this* panel the bias happens to be modest because effect heterogeneity is moderate. In production data with stronger heterogeneity the bias would grow significantly. The point of the diagnostic isn't to tell you that TWFE is *catastrophically* wrong today - it's to tell you that TWFE *could* swing on data you haven't seen yet, and to surface the structural problem before you trust the regression coefficient." + ] + }, + { + "cell_type": "code", + "id": "t19-cell-014", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Top 15 cells with the most-negative TWFE weights, colored red.\n", + "weights = twfe.weights.sort_values(\"weight\").head(15)\n", + "labels = [\n", + " f\"M{int(r.market_id)}, wk{int(r.week)}\"\n", + " for r in weights.itertuples()\n", + "]\n", + "\n", + "fig, ax = plt.subplots(figsize=(9, 5))\n", + "colors = [\"#d62728\" if w < 0 else \"#1f77b4\" for w in weights[\"weight\"]]\n", + "ax.barh(range(len(weights)), weights[\"weight\"].values, color=colors)\n", + "ax.set_yticks(range(len(weights)))\n", + "ax.set_yticklabels(labels, fontsize=8)\n", + "ax.invert_yaxis()\n", + "ax.axvline(0, color=\"black\", linewidth=0.7)\n", + "ax.set_xlabel(\"TWFE weight on this cell\")\n", + "ax.set_title(\n", + " f\"Top 15 cells with most-negative TWFE weights\\n\"\n", + " f\"({twfe.fraction_negative*100:.1f}% of all {len(twfe.weights)} cells receive negative weight)\"\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-015", + "metadata": {}, + "source": [ + "**The transition.** We need an estimator that only compares each switching cell to *contemporaneously stable* control cells - never to other switchers. That's what `DID_M` from the dCDH framework does, by construction." 
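To demystify where `fraction_negative` comes from, the Theorem-1 weight mechanics can be reproduced from scratch via Frisch-Waugh-Lovell residualization. The panel below is a hypothetical three-market toy (one leaver, two joiners), and the code is an illustration of the mechanics only, not the library's `twowayfeweights` implementation:

```python
import pandas as pd

# Hypothetical 3-market, 5-week reversible panel: m1 is a leaver (off in
# week 5), m2 and m3 are joiners (on from weeks 2 and 3 respectively).
D = {
    "m1": [1, 1, 1, 1, 0],
    "m2": [0, 1, 1, 1, 1],
    "m3": [0, 0, 1, 1, 1],
}
panel = pd.DataFrame(
    [{"market": m, "week": t + 1, "D": d}
     for m, seq in D.items() for t, d in enumerate(seq)]
)

# Frisch-Waugh-Lovell: residualize the treatment dummy on unit and time means.
resid = (
    panel["D"]
    - panel.groupby("market")["D"].transform("mean")
    - panel.groupby("week")["D"].transform("mean")
    + panel["D"].mean()
)

# Theorem-1-style weights: restrict to treated cells, normalize to sum to 1.
treated = panel["D"] == 1
w = resid[treated] / resid[treated].sum()
report = panel.loc[treated, ["market", "week"]].assign(weight=w.round(4))

print(report.sort_values("weight").to_string(index=False))
print("weights sum to:", round(w.sum(), 6))
print("cells with negative weight:", int((w < 0).sum()))
```

Four of the eleven treated cells here get negative weight: the mid-panel weeks where the leaver and the early joiner are both on while almost every other cell is too, so OLS effectively uses those treated cells as controls. Under heterogeneous effects, the TWFE coefficient averages cell effects with exactly these signed weights.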
+ ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-016", + "metadata": {}, + "source": [ + "## 4. dCDH Phase 1: DID_M, Joiners, Leavers, Placebo" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-017", + "metadata": {}, + "source": [ + "`DID_M` is the average across periods of two pieces:\n", + "\n", + "- **DID_+** (joiners): markets switching `0 \u2192 1` between consecutive periods, compared to *contemporaneously untreated* control cells.\n", + "- **DID_-** (leavers): markets switching `1 \u2192 0`, compared to *contemporaneously treated* control cells.\n", + "\n", + "Both pieces use only cells whose treatment status was stable across the two periods being compared - so no treated unit is ever used as a control for another treated unit. The library reports DID_+, DID_-, and their average DID_M separately, so you can see if the two halves agree.\n", + "\n", + "**Where do the controls come from?** dCDH's controls are *contemporaneously stable cells*, not a permanently-untreated comparison group. A market that's untreated at week 3 and week 4 contributes a stable-untreated cell at week 4 - even if that same market eventually turns the promo on at week 5 and keeps it on through week 8. Symmetrically, a market that's been running the promo since week 1 and is still running it at week 4 contributes a stable-treated cell at week 4. This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers. Callaway-Sant'Anna and other modern staggered-DiD estimators typically need a permanent never-treated cohort (or at minimum a not-yet-treated cohort that survives to the end of the panel); dCDH does not, because its identification is local in time. The technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is that at every period when a switcher exists, at least one stable cell of the relevant type also exists. 
With 154 stable-untreated and 206 stable-treated cells across the panel, we're well clear of that condition.\n", + "\n", + "The library also computes a **single-lag placebo** `DID_M^pl`: the same DID_M machinery shifted one period back. Under parallel pre-trends the placebo should be near zero. (Note: Phase 1's single-lag placebo SE is `NaN` by design - the per-period aggregation path doesn't have an analytical influence-function derivation. Magnitude-only interpretation here; full inference comes from the multi-horizon placebos in Section 5 below.)" + ] + }, + { + "cell_type": "code", + "id": "t19-cell-018", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "model = DCDH(twfe_diagnostic=True, placebo=True, seed=42)\n", + "results = model.fit(\n", + " df,\n", + " outcome=\"sessions\",\n", + " group=\"market_id\",\n", + " time=\"week\",\n", + " treatment=\"promo_on\",\n", + ")\n", + "print(results.summary())" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-019", + "metadata": {}, + "source": [ + "**Plain-English interpretation.** dCDH estimates the headline lift at **about 11.2 sessions per market-week** (95% CI: ~10.1 to 12.3), covering the true effect of 12.0. The TWFE coefficient was 11.5 - the two estimators happen to land close on this panel because effect heterogeneity is modest, but **dCDH guarantees no negative-weight contamination by construction**, while TWFE only happened to escape it this time.\n", + "\n", + "The TWFE diagnostic block at the bottom of the summary repeats the numbers from Section 3 (15.4% negative weights, sigma_fe \u2248 12.3) as a built-in cross-check - the library computes the diagnostic automatically when `twfe_diagnostic=True` (the default)." 
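To see the "contemporaneously stable controls" logic in code, here is an illustrative from-scratch computation of `DID_+` and `DID_-` on a hypothetical four-unit toy panel with a known +5 on-effect, a common +1 trend, and unit fixed effects. This sketches the comparison only; the library's implementation also handles switcher-count weighting across periods and inference:

```python
import pandas as pd

# Toy panel: two joiners, two leavers; Y = unit FE + trend + 5 * D.
alpha = {1: 10, 2: 20, 3: 30, 4: 40}
D = {1: [0, 1, 1], 2: [0, 0, 1], 3: [1, 1, 0], 4: [1, 0, 0]}
rows = []
for g in range(1, 5):
    for t in range(1, 4):
        d = D[g][t - 1]
        rows.append({"g": g, "t": t, "D": d, "Y": alpha[g] + t + 5 * d})
panel = pd.DataFrame(rows)

wide = panel.pivot(index="g", columns="t")
did_plus, did_minus = [], []
for t in (2, 3):
    dY = wide["Y"][t] - wide["Y"][t - 1]
    prev, cur = wide["D"][t - 1], wide["D"][t]
    joiners = (prev == 0) & (cur == 1)   # 0 -> 1 switchers
    leavers = (prev == 1) & (cur == 0)   # 1 -> 0 switchers
    stable0 = (prev == 0) & (cur == 0)   # contemporaneously stable untreated
    stable1 = (prev == 1) & (cur == 1)   # contemporaneously stable treated
    did_plus.append(dY[joiners].mean() - dY[stable0].mean())
    did_minus.append(dY[stable1].mean() - dY[leavers].mean())

print("DID_+ per period:", did_plus)   # joiners vs stable-untreated
print("DID_- per period:", did_minus)  # stable-treated vs leavers
did_m = (sum(did_plus) + sum(did_minus)) / (len(did_plus) + len(did_minus))
print("DID_M (toy, simple average):", did_m)  # 5.0 = the true effect
```

Note that every control cell here comes from a unit that itself switches at some other time; no permanent never-treated or always-treated unit is ever needed.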
+ ] }, { "cell_type": "code", "id": "t19-cell-020", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Joiners vs leavers breakdown\n", "jl = results.to_dataframe(level=\"joiners_leavers\")\n", "jl.round(3)" ] }, { "cell_type": "markdown", "id": "t19-cell-021", "metadata": {}, "source": [ "**Reading joiners vs leavers.** Both halves should produce a positive lift in a healthy marketing-pulse design - turning the promo on increases sessions, and turning it off decreases them. Here DID_+ \u2248 11.0 and DID_- \u2248 11.9: both substantially positive, both within sampling uncertainty of each other and of the true effect of 12. If they had disagreed by sign or by a large margin (say one was 5 and the other was 20), that would be a heterogeneity signal worth investigating before reporting one number to leadership." ] }, { "cell_type": "code", "id": "t19-cell-022", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Placebo magnitude check (SE is NaN for Phase 1 single-lag)\n", "print(f\"Placebo effect: {results.placebo_effect:.3f}\")\n", "print(f\"|Placebo / DID_M|: {abs(results.placebo_effect / results.overall_att):.2%}\")\n", "print()\n", "print(\"Placebo magnitude is small (~8% of DID_M), supporting parallel\")\n", "print(\"pre-trends. Full placebo inference with bootstrap CIs comes from\")\n", "print(\"the multi-horizon event study below.\")" ] }, { "cell_type": "markdown", "id": "t19-cell-023", "metadata": {}, "source": [ "## 5. Multi-Horizon Event Study with Bootstrap" ] }, { "cell_type": "markdown", "id": "t19-cell-024", "metadata": {}, "source": [ "DID_M collapses the dynamic effect to one number - the average lift across all switching cells. Setting `L_max=L` instead computes `DID_l` for each horizon `l = 1..L` after each switch, plus `DID^pl_l` placebos at horizons `l = -1..-L`. 
This tells you whether the on-impact lift is sustained or fades, and whether the pre-treatment placebos sit on zero.\n", + "\n", + "With `L_max=2` we get two post-switch horizons and two placebo horizons. The multiplier bootstrap (`n_bootstrap=199`, matching the library's `ci_params.bootstrap` convention) gives valid CIs at every horizon, including the placebo horizons." + ] + }, + { + "cell_type": "code", + "id": "t19-cell-025", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "with warnings.catch_warnings():\n", + " warnings.simplefilter(\"ignore\", UserWarning)\n", + " model_es = DCDH(\n", + " twfe_diagnostic=False, placebo=True, n_bootstrap=199, seed=42\n", + " )\n", + " results_es = model_es.fit(\n", + " df,\n", + " outcome=\"sessions\",\n", + " group=\"market_id\",\n", + " time=\"week\",\n", + " treatment=\"promo_on\",\n", + " L_max=2,\n", + " )\n", + "\n", + "es_df = results_es.to_dataframe(level=\"event_study\")\n", + "es_df.round(3)" + ] + }, + { + "cell_type": "code", + "id": "t19-cell-026", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Event-study errorbar plot with bootstrap CIs.\n", + "es_plot = es_df[es_df[\"horizon\"] != 0] # drop reference row\n", + "is_pre = es_plot[\"horizon\"] < 0\n", + "\n", + "fig, ax = plt.subplots(figsize=(9, 5))\n", + "ax.errorbar(\n", + " es_plot.loc[is_pre, \"horizon\"],\n", + " es_plot.loc[is_pre, \"effect\"],\n", + " yerr=[\n", + " es_plot.loc[is_pre, \"effect\"] - es_plot.loc[is_pre, \"conf_int_lower\"],\n", + " es_plot.loc[is_pre, \"conf_int_upper\"] - es_plot.loc[is_pre, \"effect\"],\n", + " ],\n", + " fmt=\"o\", color=\"#888888\", capsize=4, label=\"Pre-treatment placebos\",\n", + ")\n", + "ax.errorbar(\n", + " es_plot.loc[~is_pre, \"horizon\"],\n", + " es_plot.loc[~is_pre, \"effect\"],\n", + " yerr=[\n", + " es_plot.loc[~is_pre, \"effect\"] - es_plot.loc[~is_pre, \"conf_int_lower\"],\n", + " es_plot.loc[~is_pre, \"conf_int_upper\"] - 
es_plot.loc[~is_pre, \"effect\"],\n", + " ],\n", + " fmt=\"o\", color=\"#1f77b4\", capsize=4, label=\"Post-treatment effects\",\n", + ")\n", + "ax.axhline(0, color=\"black\", linewidth=0.7, linestyle=\"--\")\n", + "ax.axvline(0, color=\"black\", linewidth=0.7, linestyle=\"--\")\n", + "ax.axhline(12.0, color=\"green\", linewidth=0.8, linestyle=\":\", label=\"true effect = 12.0\")\n", + "ax.set_xlabel(\"Weeks since promo switched\")\n", + "ax.set_ylabel(\"Effect on weekly sessions\")\n", + "ax.set_title(\"dCDH event study (L_max=2, multiplier bootstrap)\")\n", + "ax.legend(loc=\"lower right\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-027", + "metadata": {}, + "source": [ + "**Reading the event study.**\n", + "\n", + "- **Both placebo horizons** (l = -2 and l = -1) sit on zero with confidence intervals comfortably covering it. Pre-trends look parallel - we have no evidence that something other than the promo was driving session growth in the cells we're using as controls.\n", + "- **On-impact effect** at l = 1 is about **+11.2 sessions** with a 95% bootstrap CI of roughly [9.7, 12.8], covering the true effect of 12.\n", + "- **Sustained effect** at l = 2 is **+11.3 sessions** with CI [10.0, 12.6]. The lift didn't fade in the second week post-switch.\n", + "\n", + "Bootstrap CIs reflect the cohort-recentered influence-function variance with the finite-sample stability the multiplier bootstrap provides. The fact that both horizons agree closely with each other AND with the headline `DID_M` from Section 4 (the per-period and per-group aggregation paths converge) is a built-in consistency check." + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-028", + "metadata": {}, + "source": [ + "## 6. Communicating Results to Leadership" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-029", + "metadata": {}, + "source": [ + "A stakeholder-ready summary of the synthetic walkthrough above. 
Each bullet pulls from a specific section of the analysis:\n", + "\n", + "> **Headline.** The pulse campaign lifted weekly checkout sessions by approximately **11.2 sessions per market per week** while the promo was on (95% CI: 10.1 to 12.3). On a baseline of about 110 weekly sessions per market, that's roughly a **10% lift**. *[Source: `results.overall_att` from Section 4.]*\n", + ">\n", + "> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 43 markets started untreated and switched the promo on at some point during the quarter (joiners), and 17 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020), the only Python estimator built for treatment that can switch on AND off in the same panel. *[Source: switcher_type counts and panel shape from Section 2.]*\n", + ">\n", + "> **Validity evidence.** Three checks supported the result. (a) The TWFE diagnostic flagged 15.4% of cells with negative weight in the standard regression, signaling that we needed an alternative - dCDH avoids that contamination by construction. (b) The single-lag placebo from the per-period aggregation was small (~0.9 sessions, ~8% of the headline). (c) The multi-horizon placebos at l = -2 and l = -1 both sat on zero with bootstrap CIs comfortably covering it - parallel pre-trends look credible. *[Sources: TWFE diagnostic from Section 3, single-lag placebo from Section 4, multi-horizon placebos from Section 5.]*\n", + ">\n", + "> **What \"+11.2 sessions per market per week\" means in business terms.** Across 60 markets and the weeks each one had the promo on, that's the per-market-week lift attributable to the campaign. Translate to your own revenue-per-session to compare against campaign spend, then use the per-market lift estimate to project what scaling the promo to additional markets would deliver. 
*[Source: business framing of the headline.]*\n", + ">\n", + "> **Practical significance caveat.** The 10% lift is statistically significant (bootstrap p < 0.01 at both post-treatment horizons), and the on-impact effect persists at the second horizon - the pulse worked while it was on. Whether 10% justifies the campaign cost is a business judgment, not a statistical one. Note also that joiners (DID_+ \u2248 11.0) and leavers (DID_- \u2248 11.9) gave consistent signals, which reduces the worry that the average is hiding heterogeneity between starting and stopping the promo. *[Sources: dynamic horizons from Section 5, joiners/leavers breakdown from Section 4.]*" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-030", + "metadata": {}, + "source": [ + "Adapt this template for your own campaign by swapping in your numbers from `results.summary()`, your own market and switcher counts, your own validity diagnostics, and your own business translation. The pattern - **headline \u2192 sample size and design \u2192 validity evidence \u2192 business interpretation \u2192 practical significance** - is the part to keep." + ] + }, + { + "cell_type": "code", + "id": "t19-cell-031", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "# Drift guards: tolerance-based asserts that lock the numbers quoted in\n", + "# the Section 4 / Section 5 narrative and the Section 6 stakeholder\n", + "# template. 
nbmake will fail if generate_reversible_did_data() or DCDH\n", + "# output drifts outside these ranges, forcing the markdown to be\n", + "# updated before this notebook can pass CI.\n", + "#\n", + "# Asserts pull from BOTH `results` (Section 4 single-horizon fit) AND\n", + "# `results_es` (Section 5 multi-horizon fit) - both fits are still\n", + "# in scope above this cell.\n", + "\n", + "# Section 4 (L_max=None): per-period DID_M path\n", + "assert 10.72 <= results.overall_att <= 11.72, results.overall_att\n", + "assert results.overall_conf_int[0] <= 12.0 <= results.overall_conf_int[1]\n", + "assert abs(results.placebo_effect) < 1.5, results.placebo_effect\n", + "assert results.twfe_fraction_negative >= 0.10 # documents TWFE bias signal\n", + "\n", + "# Section 5 (L_max=2): per-group DID_g,1 path - DIFFERENT compute path\n", + "# than overall_att, NOT bit-identical. Verified in seed-search to agree\n", + "# on truth-coverage at seed=53.\n", + "_h1 = results_es.event_study_effects[1][\"effect\"]\n", + "assert 10.24 <= _h1 <= 12.24, _h1\n", + "assert (\n", + " results_es.event_study_effects[1][\"conf_int\"][0]\n", + " <= 12.0\n", + " <= results_es.event_study_effects[1][\"conf_int\"][1]\n", + ")\n", + "\n", + "print(\"All drift guards passed.\")" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-032", + "metadata": {}, + "source": [ + "## 7. Extensions and Where to Go Next" + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-033", + "metadata": {}, + "source": [ + "This tutorial covered the dCDH **Phase 1** surface (DID_M, joiners/leavers decomposition, single-lag placebo, TWFE diagnostic) plus the **multi-horizon event study with bootstrap** (`L_max`, `n_bootstrap`). 
The library also supports several extensions that we did not demonstrate here:\n", + "\n", + "- **Per-trajectory disaggregation** (`by_path=k`): when joiners and leavers each follow a few common treatment paths (e.g., on-off-on vs on-on-off), `by_path=k` reports the event study separately for the top-k most common observed paths. Useful for pulse campaigns where the schedule varies across markets.\n", + "- **Group-specific linear trends** (`trends_linear=True`): allows each market to have its own pre-treatment slope, absorbing differential trends.\n", + "- **State-set-specific trends** (`trends_nonparam=...`): allows non-parametric trends shared within state-set strata.\n", + "- **HonestDiD sensitivity analysis** (`honest_did=True`): Rambachan-Roth (2023) bounds on the post-treatment effects under controlled parallel-trends violations, computed on the placebo event-study surface.\n", + "- **Survey-design support** (`survey_design=...`): Taylor-series linearization with sampling weights, strata, PSU, and FPC.\n", + "\n", + "See [`docs/api/chaisemartin_dhaultfoeuille.rst`](../api/chaisemartin_dhaultfoeuille.html) for the full parameter reference and [`docs/methodology/REGISTRY.md`](../methodology/REGISTRY.html) for the methodology contract on each surface." 
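The multiplier bootstrap behind the Section 5 confidence intervals (and the bootstrap-backed extensions above) is conceptually small. Here is a generic standalone sketch with *simulated* per-market influence-function contributions - hypothetical numbers and a plain percentile interval, not output of the estimator; per the CHANGELOG, the library's actual implementation recenters contributions by cohort and dispatches per target:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-market influence-function contributions for one horizon.
# In the library these come from the dCDH influence functions; here they
# are simulated so the sketch is self-contained.
n_markets = 60
phi = rng.normal(loc=11.2, scale=8.0, size=n_markets)
theta_hat = phi.mean()

# Multiplier bootstrap: perturb each market's (centered) contribution with
# an i.i.d. Rademacher weight and re-average; no row resampling is needed.
n_boot = 199
draws = np.empty(n_boot)
for b in range(n_boot):
    w = rng.choice([-1.0, 1.0], size=n_markets)
    draws[b] = theta_hat + (w * (phi - theta_hat)).mean()

lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"point estimate {theta_hat:.2f}, 95% percentile CI [{lo:.2f}, {hi:.2f}]")
```

Because only signs are redrawn, the panel structure (which market contributes to which horizon) is preserved exactly, which is what makes the multiplier scheme attractive for panels with few switchers per period.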
+ ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-034", + "metadata": {}, + "source": [ + "**Related tutorials.**\n", + "\n", + "- [Tutorial 1: Basic DiD](01_basic_did.ipynb) - the 2x2 building block dCDH generalizes.\n", + "- [Tutorial 2: Staggered DiD](02_staggered_did.ipynb) - Goodman-Bacon decomposition is the staggered-adoption analog of the TWFE diagnostic shown here.\n", + "- [Tutorial 5: HonestDiD](05_honest_did.ipynb) - sensitivity to parallel-trends violations on event studies; works on dCDH's placebo surface via `honest_did=True`.\n", + "- [Tutorial 17: Brand Awareness Survey](17_brand_awareness_survey.ipynb) - reach for this if you have survey data with sampling weights / strata / PSU instead of a panel.\n", + "- [Tutorial 18: Geo-Experiment Analysis](18_geo_experiments.ipynb) - reach for this if you have a single-launch pilot in a small number of test markets." + ] + }, + { + "cell_type": "markdown", + "id": "t19-cell-035", + "metadata": {}, + "source": [ + "**Summary: when to reach for dCDH.**\n", + "\n", + "1. Use dCDH when treatment is **reversible** - the panel has switchers in both directions (joiners and leavers) in the same data. No other modern Python DiD estimator handles this case.\n", + "2. Run `twowayfeweights` *before* fitting any estimator on a reversible panel - the diagnostic tells you whether to worry about TWFE contamination, in numbers (`fraction_negative`, `sigma_fe`).\n", + "3. Read joiners (`DID_+`) and leavers (`DID_-`) separately. Disagreement between the two halves is heterogeneity worth investigating before averaging into one number for stakeholders.\n", + "4. Use `L_max` + multiplier bootstrap to expose the dynamic structure of the effect - is the lift on-impact only, sustained, or fading? - and to get valid placebo CIs that the Phase 1 single-lag placebo can't provide.\n", + "5. 
Defer to follow-up tutorials for `by_path`, `trends_linear`/`trends_nonparam`, HonestDiD on dCDH's placebo surface, and the survey-design integration. Each is a single constructor or `fit()` kwarg away." + ] + } + ], + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/tutorials/README.md b/docs/tutorials/README.md index f6afc23a..e801a44c 100644 --- a/docs/tutorials/README.md +++ b/docs/tutorials/README.md @@ -70,6 +70,15 @@ Survey-aware DiD with complex sampling designs (strata, PSU, FPC, weights): - DEFF diagnostics - Repeated cross-sections with survey design +### 17. Brand Awareness Survey (`17_brand_awareness_survey.ipynb`) +Practitioner walkthrough for measuring brand-campaign lift on survey data with complex sampling: +- The brand-tracker problem framed for marketing analytics +- Naive vs survey-aware DiD comparison (overconfidence under naive) +- `SurveyDesign` setup (strata, PSU, FPC, weights) wired into the fit +- Funnel-metric extension across awareness / consideration / purchase intent +- Diagnostics (parallel trends, placebo, automated `practitioner_next_steps()`) +- Stakeholder communication template + ### 18. Geo-Experiment Analysis with SyntheticDiD (`18_geo_experiments.ipynb`) Practitioner walkthrough for marketing analytics teams measuring geo-experiment lift: - The geo-experiment problem framed for marketing analytics @@ -78,6 +87,14 @@ Practitioner walkthrough for marketing analytics teams measuring geo-experiment - Unit weights and time weights interpretation - Stakeholder communication template (Tutorial 17 Section 9 pattern) +### 19. 
dCDH Marketing Pulse Campaigns (`19_dcdh_marketing_pulse.ipynb`) +Practitioner walkthrough for measuring lift from on/off promotional pulses across markets, where treatment can switch in both directions: +- The marketing-pulse problem framed for reversible (non-absorbing) treatment +- TWFE decomposition diagnostic (`twowayfeweights`) showing why standard regression misleads on reversible panels (de Chaisemartin & D'Haultfoeuille 2020 Theorem 1) +- `DCDH` Phase 1: DID_M, joiners-vs-leavers decomposition, single-lag placebo +- Multi-horizon event study with `L_max` + multiplier bootstrap +- Stakeholder communication template + drift guards + ## Running the Notebooks 1. Install diff-diff with dependencies: From 86ef3ce4c12b9d8cf9c7149799ff1421e8734c90 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 11:44:12 -0400 Subject: [PATCH 2/4] Address review: placebo horizon range, scope claims, narrow warning filter - Fix placebo horizon range from `l = -1..-L+1` to `l = -L..-1` (matches the implementation: at L_max=2 horizons are l = -2 and l = -1) - Scope "the only Python estimator" claims to "diff-diff's only estimator" in Section 1 abstract and the stakeholder template - REGISTRY.md asserts uniqueness in the library, not across all of Python - Rewrite the Section 4 cross-estimator paragraph to enumerate the actual staggered estimators in diff-diff (CallawaySantAnna, SunAbraham, WooldridgeETWFE, ImputationDiD, TwoStageDiD, EfficientDiD) and frame the comparison around the absorbing-treatment restriction rather than "needs a never-treated cohort that survives to the end of the panel" - Narrow the L_max=2 fit warning filter from `simplefilter("ignore", UserWarning)` to `filterwarnings("ignore", message=r"Assumption 7 .*", category=UserWarning)` so only the expected leavers-present warning is silenced; any new / unexpected UserWarning will surface and keep the notebook usable as a drift detector Co-Authored-By: Claude Opus 4.7 (1M context) --- 
docs/tutorials/19_dcdh_marketing_pulse.ipynb | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/docs/tutorials/19_dcdh_marketing_pulse.ipynb b/docs/tutorials/19_dcdh_marketing_pulse.ipynb index eabe4290..b997318b 100644 --- a/docs/tutorials/19_dcdh_marketing_pulse.ipynb +++ b/docs/tutorials/19_dcdh_marketing_pulse.ipynb @@ -7,7 +7,7 @@ "source": [ "# Tutorial 19: dCDH for Marketing Pulse Campaigns\n", "\n", - "A practitioner walkthrough for measuring lift from promotional campaigns that turn on AND off across markets at staggered times. The tutorial uses the `ChaisemartinDHaultfoeuille` estimator (alias `DCDH`), the only modern Python DiD estimator built for reversible (non-absorbing) treatment." + "A practitioner walkthrough for measuring lift from promotional campaigns that turn on AND off across markets at staggered times. The tutorial uses the `ChaisemartinDHaultfoeuille` estimator (alias `DCDH`) - diff-diff's only estimator built for reversible (non-absorbing) treatment, whereas every other modern staggered estimator in the library assumes treatment is absorbing."
This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers. Callaway-Sant'Anna and other modern staggered-DiD estimators typically need a permanent never-treated cohort (or at minimum a not-yet-treated cohort that survives to the end of the panel); dCDH does not, because its identification is local in time. The technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is that at every period when a switcher exists, at least one stable cell of the relevant type also exists. With 154 stable-untreated and 206 stable-treated cells across the panel, we're well clear of that condition.\n", + "**Where do the controls come from?** dCDH's controls are *contemporaneously stable cells*, not a permanently-untreated comparison group. A market that's untreated at week 3 and week 4 contributes a stable-untreated cell at week 4 - even if that same market eventually turns the promo on at week 5 and keeps it on through week 8. Symmetrically, a market that's been running the promo since week 1 and is still running it at week 4 contributes a stable-treated cell at week 4. This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers. Among diff-diff's modern staggered-DiD estimators - Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD, TwoStageDiD, EfficientDiD - all assume absorbing treatment, so the question of which controls they use only arises in panels where treatment never switches off. dCDH applies in the broader reversible-treatment setting and uses contemporaneous stability rather than a permanent never-treated cohort. The technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is that at every period when a switcher exists, at least one stable cell of the relevant type also exists. 
With 154 stable-untreated and 206 stable-treated cells across the panel, we're well clear of that condition.\n", "\n", "The library also computes a **single-lag placebo** `DID_M^pl`: the same DID_M machinery shifted one period back. Under parallel pre-trends the placebo should be near zero. (Note: Phase 1's single-lag placebo SE is `NaN` by design - the per-period aggregation path doesn't have an analytical influence-function derivation. Magnitude-only interpretation here; full inference comes from the multi-horizon placebos in Section 5 below.)" ] @@ -347,7 +347,7 @@ "id": "t19-cell-024", "metadata": {}, "source": [ - "DID_M collapses the dynamic effect to one number - the average lift across all switching cells. Setting `L_max=L` instead computes `DID_l` for each horizon `l = 1..L` after each switch, plus `DID^pl_l` placebos at horizons `l = -1..-L+1`. This tells you whether the on-impact lift is sustained or fades, and whether the pre-treatment placebos sit on zero.\n", + "DID_M collapses the dynamic effect to one number - the average lift across all switching cells. Setting `L_max=L` instead computes `DID_l` for each horizon `l = 1..L` after each switch, plus `DID^pl_l` placebos at horizons `l = -L..-1`. This tells you whether the on-impact lift is sustained or fades, and whether the pre-treatment placebos sit on zero.\n", "\n", "With `L_max=2` we get two post-switch horizons and two placebo horizons. The multiplier bootstrap (`n_bootstrap=199`, matching the library's `ci_params.bootstrap` convention) gives valid CIs at every horizon, including the placebo horizons." 
] @@ -359,8 +359,16 @@ "execution_count": null, "outputs": [], "source": [ + "# Narrow filter: silence the EXPECTED Assumption 7 warning (cost-benefit\n", + "# delta is computed on the full sample when leavers are present), and\n", + "# let any new / unexpected UserWarning surface so the notebook stays\n", + "# usable as a drift detector.\n", "with warnings.catch_warnings():\n", - " warnings.simplefilter(\"ignore\", UserWarning)\n", + " warnings.filterwarnings(\n", + " \"ignore\",\n", + " message=r\"Assumption 7 .* is violated: leavers present\",\n", + " category=UserWarning,\n", + " )\n", " model_es = DCDH(\n", " twfe_diagnostic=False, placebo=True, n_bootstrap=199, seed=42\n", " )\n", @@ -448,7 +456,7 @@ "\n", "> **Headline.** The pulse campaign lifted weekly checkout sessions by approximately **11.2 sessions per market per week** while the promo was on (95% CI: 10.1 to 12.3). On a baseline of about 110 weekly sessions per market, that's roughly a **10% lift**. *[Source: `results.overall_att` from Section 4.]*\n", ">\n", - "> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 43 markets started untreated and switched the promo on at some point during the quarter (joiners), and 17 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020), the only Python estimator built for treatment that can switch on AND off in the same panel. *[Source: switcher_type counts and panel shape from Section 2.]*\n", + "> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 43 markets started untreated and switched the promo on at some point during the quarter (joiners), and 17 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020) - diff-diff's only estimator built for treatment that can switch on AND off in the same panel. 
*[Source: switcher_type counts and panel shape from Section 2.]*\n", ">\n", "> **Validity evidence.** Three checks supported the result. (a) The TWFE diagnostic flagged 15.4% of cells with negative weight in the standard regression, signaling that we needed an alternative - dCDH avoids that contamination by construction. (b) The single-lag placebo from the per-period aggregation was small (~0.9 sessions, ~8% of the headline). (c) The multi-horizon placebos at l = -2 and l = -1 both sat on zero with bootstrap CIs comfortably covering it - parallel pre-trends look credible. *[Sources: TWFE diagnostic from Section 3, single-lag placebo from Section 4, multi-horizon placebos from Section 5.]*\n", ">\n", From cc904db3f6c5c972bb81cdd01d92687436508468 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 11:55:54 -0400 Subject: [PATCH 3/4] Address review P1: A11 verification is per-period, not whole-panel The Section 4 paragraph previously inferred Assumption 11 satisfaction from whole-panel stable-cell totals (154 stable_0, 206 stable_1). That is the wrong check - A11 is per-period: at every switching period, at least one stable cell of the relevant type must exist. Rewrite the closing sentences of the "Where do the controls come from?" paragraph to (a) make the per-period nature of the check explicit, (b) reference the library's fit-time A11 warning machinery as the correct verification mechanism, (c) note that our fit ran without any A11 warning, and (d) explain why single-switch panels naturally tend to satisfy A11 (adjacent cohorts function as stable controls for each other). 
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/tutorials/19_dcdh_marketing_pulse.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/19_dcdh_marketing_pulse.ipynb b/docs/tutorials/19_dcdh_marketing_pulse.ipynb index b997318b..26e23179 100644 --- a/docs/tutorials/19_dcdh_marketing_pulse.ipynb +++ b/docs/tutorials/19_dcdh_marketing_pulse.ipynb @@ -265,7 +265,7 @@ "\n", "Both pieces use only cells whose treatment status was stable across the two periods being compared - so no treated unit is ever used as a control for another treated unit. The library reports DID_+, DID_-, and their average DID_M separately, so you can see if the two halves agree.\n", "\n", - "**Where do the controls come from?** dCDH's controls are *contemporaneously stable cells*, not a permanently-untreated comparison group. A market that's untreated at week 3 and week 4 contributes a stable-untreated cell at week 4 - even if that same market eventually turns the promo on at week 5 and keeps it on through week 8. Symmetrically, a market that's been running the promo since week 1 and is still running it at week 4 contributes a stable-treated cell at week 4. This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers. Among diff-diff's modern staggered-DiD estimators - Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD, TwoStageDiD, EfficientDiD - all assume absorbing treatment, so the question of which controls they use only arises in panels where treatment never switches off. dCDH applies in the broader reversible-treatment setting and uses contemporaneous stability rather than a permanent never-treated cohort. The technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is that at every period when a switcher exists, at least one stable cell of the relevant type also exists. 
With 154 stable-untreated and 206 stable-treated cells across the panel, we're well clear of that condition.\n", + "**Where do the controls come from?** dCDH's controls are *contemporaneously stable cells*, not a permanently-untreated comparison group. A market that's untreated at week 3 and week 4 contributes a stable-untreated cell at week 4 - even if that same market eventually turns the promo on at week 5 and keeps it on through week 8. Symmetrically, a market that's been running the promo since week 1 and is still running it at week 4 contributes a stable-treated cell at week 4. This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers. Among diff-diff's modern staggered-DiD estimators - Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD, TwoStageDiD, EfficientDiD - all assume absorbing treatment, so the question of which controls they use only arises in panels where treatment never switches off. dCDH applies in the broader reversible-treatment setting and uses contemporaneous stability rather than a permanent never-treated cohort. The technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is that at every period when a switcher exists, at least one stable cell of the relevant type also exists. The check is **per-period**, not on whole-panel totals: 154 stable-untreated cells aggregated across the panel doesn't prove anything if some specific switching week happened to have none. The library checks A11 at fit time period-by-period and emits a `UserWarning` (zeroing the offending period's contribution by paper convention) if any switching period lacks stable controls. Our fit above ran without such a warning, so A11 holds at every switching week in this DGP. 
Single-switch panels also tend to satisfy A11 by construction because each cohort's pre-switch and post-switch periods naturally function as stable cells for cohorts that switch at adjacent times.\n", "\n", "The library also computes a **single-lag placebo** `DID_M^pl`: the same DID_M machinery shifted one period back. Under parallel pre-trends the placebo should be near zero. (Note: Phase 1's single-lag placebo SE is `NaN` by design - the per-period aggregation path doesn't have an analytical influence-function derivation. Magnitude-only interpretation here; full inference comes from the multi-horizon placebos in Section 5 below.)" ] From bc343146dba8c7c9d4f901ae8f3bf5882d37c65e Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 12:21:29 -0400 Subject: [PATCH 4/4] Address review P3: drop ecosystem-wide uniqueness claim from summary Bullet 1 of the closing summary previously asserted "No other modern Python DiD estimator handles this case" - broader than the library's own documented contract (REGISTRY only asserts uniqueness in diff-diff) and a claim that can go stale independently of the library. Drop the comparison sentence entirely. The reader has the within-library positioning from Section 1 and Section 4; the closing checklist doesn't need an ecosystem-wide claim to make its point. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/tutorials/19_dcdh_marketing_pulse.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/19_dcdh_marketing_pulse.ipynb b/docs/tutorials/19_dcdh_marketing_pulse.ipynb index 26e23179..5a91b4a0 100644 --- a/docs/tutorials/19_dcdh_marketing_pulse.ipynb +++ b/docs/tutorials/19_dcdh_marketing_pulse.ipynb @@ -555,7 +555,7 @@ "source": [ "**Summary: when to reach for dCDH.**\n", "\n", - "1. Use dCDH when treatment is **reversible** - the panel has switchers in both directions (joiners and leavers) in the same data. No other modern Python DiD estimator handles this case.\n", + "1. 
Use dCDH when treatment is **reversible** - the panel has switchers in both directions (joiners and leavers) in the same data.\n", "2. Run `twowayfeweights` *before* fitting any estimator on a reversible panel - the diagnostic tells you whether to worry about TWFE contamination, in numbers (`fraction_negative`, `sigma_fe`).\n", "3. Read joiners (`DID_+`) and leavers (`DID_-`) separately. Disagreement between the two halves is heterogeneity worth investigating before averaging into one number for stakeholders.\n", "4. Use `L_max` + multiplier bootstrap to expose the dynamic structure of the effect - is the lift on-impact only, sustained, or fading? - and to get valid placebo CIs that the Phase 1 single-lag placebo can't provide.\n",
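The `twowayfeweights` diagnostic that bullet 2 of the closing summary leans on can be illustrated with a minimal stand-alone sketch. This is NOT the library's implementation — the function name `twfe_weights` and the toy panel below are illustrative assumptions — but it computes the de Chaisemartin & D'Haultfoeuille (2020) Theorem 1 weights for the balanced-panel case, where the two-way-within residual of treatment has the closed form `eps_gt = D_gt - mean_g - mean_t + grand_mean` and the TWFE coefficient weights each treated cell proportionally to that residual, which can be negative:

```python
# Hedged sketch (not the library's `twowayfeweights`): dCDH (2020)
# Theorem 1 weights on treated cells of a BALANCED panel, using the
# closed-form two-way-within residual of the treatment indicator.
from collections import defaultdict


def twfe_weights(cells):
    """cells: list of (unit, period, treated) with treated in {0, 1}.

    Returns {(unit, period): weight} over treated cells, normalized to
    sum to 1.  Balanced panel assumed: the residual closed form
    eps = D - unit_mean - period_mean + grand_mean relies on it.
    """
    unit_sum = defaultdict(float)
    unit_n = defaultdict(int)
    time_sum = defaultdict(float)
    time_n = defaultdict(int)
    for g, t, d in cells:
        unit_sum[g] += d
        unit_n[g] += 1
        time_sum[t] += d
        time_n[t] += 1
    grand = sum(d for _, _, d in cells) / len(cells)
    # Weights are defined on treated cells only.
    eps = {
        (g, t): d - unit_sum[g] / unit_n[g] - time_sum[t] / time_n[t] + grand
        for g, t, d in cells
        if d == 1
    }
    total = sum(eps.values())
    return {cell: e / total for cell, e in eps.items()}


# Toy staggered panel: unit A treats from period 2, unit B from period 3.
panel = [
    ("A", 1, 0), ("A", 2, 1), ("A", 3, 1),
    ("B", 1, 0), ("B", 2, 0), ("B", 3, 1),
]
w = twfe_weights(panel)
fraction_negative = sum(v < 0 for v in w.values()) / len(w)
# The early adopter's late cell gets NEGATIVE weight:
# w[("A", 3)] == -0.5, so fraction_negative == 1/3.
```

Even on this tiny panel one cell carries negative weight, which is the `fraction_negative` signal the tutorial checks before trusting a TWFE regression; a heterogeneous effect concentrated in that cell would enter the TWFE estimate with the wrong sign.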