diff --git a/docs/tutorials/19_dcdh_marketing_pulse.ipynb b/docs/tutorials/19_dcdh_marketing_pulse.ipynb index 5a91b4a0..f6e37fb0 100644 --- a/docs/tutorials/19_dcdh_marketing_pulse.ipynb +++ b/docs/tutorials/19_dcdh_marketing_pulse.ipynb @@ -15,30 +15,18 @@ "id": "t19-cell-002", "metadata": {}, "source": [ - "## 1. The Marketing Pulse Problem" - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-003", - "metadata": {}, - "source": [ - "Your team runs paid-promo pulses across 60 markets. Some markets ran the promo at the start of the quarter and turned it off as the campaign budget rolled to the next geo; others started untreated and switched the promo on at some point during the quarter. Leadership wants the average lift on weekly checkout sessions while the promo was on.\n", - "\n", - "**Why this is hard.** Three things break standard methods:\n", + "## 1. The Marketing Pulse Problem\n", "\n", - "1. **Treatment is reversible.** This panel has both joiners (markets that switched the promo on) and leavers (markets that switched it off). The canonical staggered-DiD estimators - Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD - all assume *absorbing* treatment: once treated, always treated. They simply don't apply when the promo can come back off.\n", + "Your team runs paid-promo pulses across 60 markets. Some markets ran the promo at the start of the quarter and turned it off as the campaign budget rolled to the next geo (leavers); others started untreated and switched the promo on at some point during the quarter (joiners). Leadership wants the average lift on weekly checkout sessions while the promo was on.\n", "\n", - "2. **Two-way fixed-effects regression silently uses negative weights.** When you have switchers in both directions in the same panel, OLS with unit and time fixed effects ends up using some treated cells as *controls* for other treated cells, weighting those cells negatively. 
Under heterogeneous treatment effects, those negative weights can attenuate or even flip the sign of the regression coefficient ([de Chaisemartin & D'Haultfoeuille 2020](https://www.aeaweb.org/articles?id=10.1257/aer.20181169), Theorem 1).\n", + "**Why dCDH.** This panel has *reversible* (non-absorbing) treatment in the dCDH sense: across the panel, the promo turns on in some markets and off in others - both directions appear in the same dataset. The other modern staggered-DiD estimators in diff-diff (Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD, TwoStageDiD, EfficientDiD) all assume treatment is absorbing: once treated, always treated - so they simply don't apply to a panel that contains leavers. dCDH does, following [de Chaisemartin & D'Haultfoeuille (2020)](https://www.aeaweb.org/articles?id=10.1257/aer.20181169) and the [dynamic companion paper](https://www.nber.org/papers/w29873).\n", "\n", - "3. **No diagnostic tells you when to worry.** The standard error from the OLS regression doesn't reveal the weighting problem. You need a separate decomposition to know whether to trust the regression coefficient or reach for an alternative.\n", - "\n", - "**Why diff-diff.** The library implements `ChaisemartinDHaultfoeuille` (`DCDH`) following the AER 2020 paper plus its [dynamic companion](https://www.nber.org/papers/w29873). Phase 1 ships the contemporaneous-switch estimator `DID_M` plus a joiners-vs-leavers decomposition; the multi-horizon event study via `L_max` adds dynamic effects with multiplier-bootstrap inference. Critically, the library also exposes the AER 2020 Theorem 1 TWFE decomposition as a standalone diagnostic - so you can quantify how badly TWFE is contaminated *before* you reach for the fix. Implementation details and any documented deviations from R's `did_multiplegt_dyn` reference live in [`docs/methodology/REGISTRY.md`](../methodology/REGISTRY.html)."
+ "**Scope of this tutorial.** Each market in our panel switches *at most once* during the quarter (the dCDH paper's Assumption 5, which the default analytical SE path requires). So a market is either a stable-untreated unit, a joiner that turns the promo on exactly once, a leaver that turns it off exactly once, or a stable-treated unit. dCDH does support multi-switch within-market paths (e.g., on-off-on cycles) via `drop_larger_lower=False` plus `by_path=k` for per-path effects, but that's a separate scope - see the extensions section at the end. Implementation details and any documented deviations from R's `did_multiplegt_dyn` reference live in [`docs/methodology/REGISTRY.md`](../methodology/REGISTRY.html)." ] }, { "cell_type": "code", - "id": "t19-cell-004", + "id": "t19-cell-003", "metadata": {}, "execution_count": null, "outputs": [], @@ -49,41 +37,29 @@ "import numpy as np\n", "import pandas as pd\n", "\n", - "from diff_diff import (\n", - " DCDH,\n", - " generate_reversible_did_data,\n", - " twowayfeweights,\n", - ")\n", + "from diff_diff import DCDH, generate_reversible_did_data\n", "\n", "plt.style.use(\"seaborn-v0_8-whitegrid\")" ] }, { "cell_type": "markdown", - "id": "t19-cell-005", - "metadata": {}, - "source": [ - "## 2. The Data" - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-006", + "id": "t19-cell-004", "metadata": {}, "source": [ + "## 2. The Data\n", + "\n", "We'll simulate a panel that mirrors a marketing pulse campaign:\n", "\n", "- **60 markets**, each observed for **8 weeks**\n", "- Some markets started the quarter with the promo on and switched it off (leavers); others started untreated and switched the promo on (joiners). 
Each market switches exactly once during the panel - the [A5 single-switch contract](../methodology/REGISTRY.html) the analytical SE is derived under.\n", "- Outcome: weekly checkout sessions per market, baseline ~110\n", - "- True treatment effect: **+12 sessions per market-week** when the promo is on, with cell-level effect heterogeneity (some markets respond more strongly than others).\n", - "\n", - "We use `generate_reversible_did_data` with `pattern=\"single_switch\"` and `heterogeneous_effects=True`. Because the data is synthetic, the true effect is known and we can verify dCDH recovers it." + "- True treatment effect: **+12 sessions per market-week** when the promo is on, with mild cell-level heterogeneity around that average." ] }, { "cell_type": "code", - "id": "t19-cell-007", + "id": "t19-cell-005", "metadata": {}, "execution_count": null, "outputs": [], @@ -95,11 +71,11 @@ " initial_treat_frac=0.4,\n", " treatment_effect=12.0,\n", " heterogeneous_effects=True,\n", - " effect_sd=4.0,\n", + " effect_sd=1.5,\n", " group_fe_sd=8.0,\n", " time_trend=0.5,\n", " noise_sd=2.0,\n", - " seed=53, # locked via seed-search; see _scratch/dcdh_tutorial/\n", + " seed=46, # locked via _scratch/dcdh_tutorial/ seed-search\n", ")\n", "df = raw.rename(\n", " columns={\n", @@ -119,165 +95,79 @@ }, { "cell_type": "code", - "id": "t19-cell-008", + "id": "t19-cell-006", "metadata": {}, "execution_count": null, "outputs": [], "source": [ - "# Switcher-type counts. With pattern=\"single_switch\" every group\n", - "# switches exactly once, so we have only joiners (0 \u2192 1) and leavers\n", - "# (1 \u2192 0); no never-treated or always-treated groups by construction.\n", + "# Switcher-type counts. 
With pattern=\"single_switch\" every market\n", + "# switches exactly once, so we have only joiners (0 \u2192 1) and\n", + "# leavers (1 \u2192 0); no never-treated or always-treated markets by\n", + "# construction.\n", "df.groupby(\"switcher_type\").size()" ] }, { "cell_type": "code", - "id": "t19-cell-009", + "id": "t19-cell-007", "metadata": {}, "execution_count": null, "outputs": [], "source": [ - "# Mean sessions over time, split by which direction the market switched.\n", + "# Mean sessions over time, split by which direction the market\n", + "# switched. Joiners (blue) ramp up after they turn the promo on;\n", + "# leavers (red) drop off after they turn it off.\n", "first_treat = df.groupby(\"market_id\")[\"promo_on\"].first()\n", "category = df[\"market_id\"].map(\n", - " lambda m: \"starts off, switches on\" if first_treat[m] == 0 else \"starts on, switches off\"\n", + " lambda m: \"starts off, switches on (joiner)\" if first_treat[m] == 0 else \"starts on, switches off (leaver)\"\n", ")\n", "df_plot = df.assign(category=category)\n", "\n", "fig, ax = plt.subplots(figsize=(9, 5))\n", - "for label, color in [(\"starts off, switches on\", \"#1f77b4\"), (\"starts on, switches off\", \"#d62728\")]:\n", + "for label, color in [\n", + " (\"starts off, switches on (joiner)\", \"#1f77b4\"),\n", + " (\"starts on, switches off (leaver)\", \"#d62728\"),\n", + "]:\n", " weekly = df_plot[df_plot[\"category\"] == label].groupby(\"week\")[\"sessions\"].mean()\n", " ax.plot(weekly.index, weekly.values, label=label, color=color, marker=\"o\", linewidth=2)\n", "ax.set_xlabel(\"Week\")\n", "ax.set_ylabel(\"Mean weekly sessions\")\n", - "ax.set_title(\"Marketing pulses on/off across markets \u2014 outcomes by switcher type\")\n", + "ax.set_title(\"Marketing pulses on/off across markets - outcomes by switcher type\")\n", "ax.legend(loc=\"upper left\")\n", "plt.show()" ] }, { "cell_type": "markdown", - "id": "t19-cell-010", - "metadata": {}, - "source": [ - "## 3. 
Why Standard Regression Misleads Here" - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-011", - "metadata": {}, - "source": [ - "Before reaching for dCDH, fit standard two-way fixed effects (TWFE) regression on this panel and read out the diagnostic. The dCDH authors derived a closed-form decomposition of the TWFE coefficient (Theorem 1, AER 2020) that tells you *quantitatively* how badly the regression is contaminated, before you have to trust any alternative estimator.\n", - "\n", - "The library exposes this as a standalone function, `twowayfeweights`, that returns three numbers:\n", - "\n", - "- `beta_fe`: the plain TWFE coefficient on the treatment indicator.\n", - "- `fraction_negative`: the share of treated cells that receive a *negative* weight in the TWFE coefficient. Any positive value is a warning sign - it means OLS is using some treated units as controls for other treated units.\n", - "- `sigma_fe`: the smallest cell-level effect-heterogeneity standard deviation that could flip the sign of the TWFE coefficient. Small `sigma_fe` (relative to plausible heterogeneity in your domain) means the regression is fragile." - ] - }, - { - "cell_type": "code", - "id": "t19-cell-012", + "id": "t19-cell-008", "metadata": {}, - "execution_count": null, - "outputs": [], "source": [ - "twfe = twowayfeweights(\n", - " df,\n", - " outcome=\"sessions\",\n", - " group=\"market_id\",\n", - " time=\"week\",\n", - " treatment=\"promo_on\",\n", - ")\n", + "## 3. 
Fitting dCDH\n", "\n", - "print(f\"TWFE coefficient (beta_fe): {twfe.beta_fe:.3f}\")\n", - "print(f\"Fraction of negative weights: {twfe.fraction_negative:.3f} ({twfe.fraction_negative*100:.1f}%)\")\n", - "print(f\"Sign-flip threshold (sigma_fe): {twfe.sigma_fe:.3f}\")\n", - "print(f\"True treatment effect (DGP): 12.000\")" - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-013", - "metadata": {}, - "source": [ - "**Plain-English interpretation.** The TWFE regression estimates the lift at about **11.5 sessions per market-week** - close to the true effect of 12.0 in this synthetic panel. But the diagnostic surfaces two warning signs: **15.4% of treated cells receive negative weight** in that estimate, and the sign-flip threshold sigma_fe is about 12.3. In domains where you might plausibly believe cell-level treatment effects vary by ~12 sessions in standard deviation, the TWFE coefficient is fragile.\n", + "`DID_M` is the headline dCDH estimator: the average across periods of two pieces:\n", "\n", - "On *this* panel the bias happens to be modest because effect heterogeneity is moderate. In production data with stronger heterogeneity the bias would grow significantly. The point of the diagnostic isn't to tell you that TWFE is *catastrophically* wrong today - it's to tell you that TWFE *could* swing on data you haven't seen yet, and to surface the structural problem before you trust the regression coefficient." 
- ] - }, - { - "cell_type": "code", - "id": "t19-cell-014", - "metadata": {}, - "execution_count": null, - "outputs": [], - "source": [ - "# Top 15 cells with the most-negative TWFE weights, colored red.\n", - "weights = twfe.weights.sort_values(\"weight\").head(15)\n", - "labels = [\n", - " f\"M{int(r.market_id)}, wk{int(r.week)}\"\n", - " for r in weights.itertuples()\n", - "]\n", + "- **DID_+** (joiners): markets switching `0 \u2192 1` between consecutive periods, compared to *contemporaneously untreated* control cells.\n", + "- **DID_-** (leavers): markets switching `1 \u2192 0`, compared to *contemporaneously treated* control cells.\n", "\n", - "fig, ax = plt.subplots(figsize=(9, 5))\n", - "colors = [\"#d62728\" if w < 0 else \"#1f77b4\" for w in weights[\"weight\"]]\n", - "ax.barh(range(len(weights)), weights[\"weight\"].values, color=colors)\n", - "ax.set_yticks(range(len(weights)))\n", - "ax.set_yticklabels(labels, fontsize=8)\n", - "ax.invert_yaxis()\n", - "ax.axvline(0, color=\"black\", linewidth=0.7)\n", - "ax.set_xlabel(\"TWFE weight on this cell\")\n", - "ax.set_title(\n", - " f\"Top 15 cells with most-negative TWFE weights\\n\"\n", - " f\"({twfe.fraction_negative*100:.1f}% of all {len(twfe.weights)} cells receive negative weight)\"\n", - ")\n", - "plt.show()" + "Both pieces use only cells whose treatment status was stable across the two periods being compared - so no treated unit is ever used as a control for another treated unit. The library reports DID_+, DID_-, and their average DID_M separately, so you can see if the two halves agree." ] }, { "cell_type": "markdown", - "id": "t19-cell-015", - "metadata": {}, - "source": [ - "**The transition.** We need an estimator that only compares each switching cell to *contemporaneously stable* control cells - never to other switchers. That's what `DID_M` from the dCDH framework does, by construction." - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-016", - "metadata": {}, - "source": [ - "## 4. 
dCDH Phase 1: DID_M, Joiners, Leavers, Placebo" - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-017", + "id": "t19-cell-009", "metadata": {}, "source": [ - "`DID_M` is the average across periods of two pieces:\n", - "\n", - "- **DID_+** (joiners): markets switching `0 \u2192 1` between consecutive periods, compared to *contemporaneously untreated* control cells.\n", - "- **DID_-** (leavers): markets switching `1 \u2192 0`, compared to *contemporaneously treated* control cells.\n", - "\n", - "Both pieces use only cells whose treatment status was stable across the two periods being compared - so no treated unit is ever used as a control for another treated unit. The library reports DID_+, DID_-, and their average DID_M separately, so you can see if the two halves agree.\n", - "\n", - "**Where do the controls come from?** dCDH's controls are *contemporaneously stable cells*, not a permanently-untreated comparison group. A market that's untreated at week 3 and week 4 contributes a stable-untreated cell at week 4 - even if that same market eventually turns the promo on at week 5 and keeps it on through week 8. Symmetrically, a market that's been running the promo since week 1 and is still running it at week 4 contributes a stable-treated cell at week 4. This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers. Among diff-diff's modern staggered-DiD estimators - Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD, TwoStageDiD, EfficientDiD - all assume absorbing treatment, so the question of which controls they use only arises in panels where treatment never switches off. dCDH applies in the broader reversible-treatment setting and uses contemporaneous stability rather than a permanent never-treated cohort. 
The technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is that at every period when a switcher exists, at least one stable cell of the relevant type also exists. The check is **per-period**, not on whole-panel totals: 154 stable-untreated cells aggregated across the panel doesn't prove anything if some specific switching week happened to have none. The library checks A11 at fit time period-by-period and emits a `UserWarning` (zeroing the offending period's contribution by paper convention) if any switching period lacks stable controls. Our fit above ran without such a warning, so A11 holds at every switching week in this DGP. Single-switch panels also tend to satisfy A11 by construction because each cohort's pre-switch and post-switch periods naturally function as stable cells for cohorts that switch at adjacent times.\n", - "\n", - "The library also computes a **single-lag placebo** `DID_M^pl`: the same DID_M machinery shifted one period back. Under parallel pre-trends the placebo should be near zero. (Note: Phase 1's single-lag placebo SE is `NaN` by design - the per-period aggregation path doesn't have an analytical influence-function derivation. Magnitude-only interpretation here; full inference comes from the multi-horizon placebos in Section 5 below.)" + "**Where do the controls come from?** dCDH's controls are *contemporaneously stable cells*, not a permanently-untreated comparison group. A market that's untreated at week 3 and week 4 contributes a stable-untreated cell at week 4 - even if that same market eventually turns the promo on at week 5 and keeps it on through week 8. Symmetrically, a market that's been running the promo since week 1 and is still running it at week 4 contributes a stable-treated cell at week 4. This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers. 
The technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is **per-period**: at every period when a switcher exists, at least one stable cell of the relevant type also exists. The library checks A11 at fit time period-by-period and emits a `UserWarning` (zeroing the offending period's contribution by paper convention) if any switching period lacks stable controls. A11 is *not* automatic on single-switch panels - the test suite has a single-switch panel where joiners exist at a period with zero stable-untreated controls (`tests/test_chaisemartin_dhaultfoeuille.py::TestA11Handling::test_a11_violation_zero_in_numerator_retain_in_denominator`). On the seed and DGP we use here, the fit happens not to trigger an A11 warning, so we're in the clean regime. On your own data, check the warning output before trusting the headline." ] }, { "cell_type": "code", - "id": "t19-cell-018", + "id": "t19-cell-010", "metadata": {}, "execution_count": null, "outputs": [], "source": [ - "model = DCDH(twfe_diagnostic=True, placebo=True, seed=42)\n", + "model = DCDH(twfe_diagnostic=False, placebo=False, seed=42)\n", "results = model.fit(\n", " df,\n", " outcome=\"sessions\",\n", @@ -290,17 +180,17 @@ }, { "cell_type": "markdown", - "id": "t19-cell-019", + "id": "t19-cell-011", "metadata": {}, "source": [ - "**Plain-English interpretation.** dCDH estimates the headline lift at **about 11.2 sessions per market-week** (95% CI: ~10.1 to 12.3), covering the true effect of 12.0. The TWFE coefficient was 11.5 - the two estimators happen to land close on this panel because effect heterogeneity is modest, but **dCDH guarantees no negative-weight contamination by construction**, while TWFE only happened to escape it this time.\n", + "**Reading the headline.** dCDH estimates the lift at **about 12.1 sessions per market-week** while the promo was on (95% CI: 11.3 to 12.8), recovering the true effect of 12.0 within sampling uncertainty. 
The CI half-width is about 0.7 sessions - roughly 6% of the point estimate. On a baseline of ~110 weekly sessions, the estimated lift is about 11% of baseline, with uncertainty of under one percentage point.\n", "\n", "(We passed `placebo=False` on this fit because Phase 1's single-lag placebo SE is `NaN` by design - the per-period aggregation path doesn't have an analytical influence-function derivation. We get valid placebo CIs from the multi-horizon path in Section 4 below, which has a proper influence function.)" ] }, { "cell_type": "code", - "id": "t19-cell-020", + "id": "t19-cell-012", "metadata": {}, "execution_count": null, "outputs": [], @@ -312,62 +202,59 @@ }, { "cell_type": "markdown", - "id": "t19-cell-021", - "metadata": {}, - "source": [ - "**Reading joiners vs leavers.** Both halves should produce a positive lift in a healthy marketing-pulse design - turning the promo on increases sessions, and turning it off decreases them. Here DID_+ \u2248 11.0 and DID_- \u2248 11.9: both substantially positive, both within sampling uncertainty of each other and of the true effect of 12. If they had disagreed by sign or by a large margin (say one was 5 and the other was 20), that would be a heterogeneity signal worth investigating before reporting one number to leadership." - ] - }, - { - "cell_type": "code", - "id": "t19-cell-022", "metadata": {}, - "execution_count": null, - "outputs": [], "source": [ - "# Placebo magnitude check (SE is NaN for Phase 1 single-lag)\n", - "print(f\"Placebo effect: {results.placebo_effect:.3f}\")\n", - "print(f\"|Placebo / DID_M|: {abs(results.placebo_effect / results.overall_att):.2%}\")\n", - "print()\n", - "print(\"Placebo magnitude is small (~8% of DID_M), supporting parallel\")\n", - "print(\"pre-trends. 
Full placebo inference with bootstrap CIs comes from\")\n", - "print(\"the multi-horizon event study below.\")" + "**Reading joiners vs leavers.** Both halves should produce a positive lift in a healthy marketing-pulse design - turning the promo on increases sessions, and turning it off decreases them. Here DID_+ \u2248 12.1 (38 joiner cells) and DID_- \u2248 11.9 (22 leaver cells): both substantially positive, both within sampling uncertainty of each other and of the true effect of 12. If they had disagreed by sign or by a large margin (say one was 5 and the other was 20), that would be a heterogeneity signal worth investigating before reporting one number to leadership." ] }, { "cell_type": "markdown", - "id": "t19-cell-023", + "id": "t19-cell-014", "metadata": {}, "source": [ - "## 5. Multi-Horizon Event Study with Bootstrap" + "## 4. Multi-Horizon Event Study with Bootstrap\n", + "\n", + "DID_M collapses the dynamic effect to one number - the average lift across all switching cells. Setting `L_max=L` instead computes `DID_l` for each horizon `l = 1..L` after each switch, plus `DID^pl_l` placebos at horizons `l = -L..-1`. This tells you whether the on-impact lift is sustained or fades, and whether the pre-treatment placebos sit on zero.\n", + "\n", + "With `L_max=2` we get two post-switch horizons and two placebo horizons. The multiplier bootstrap (`n_bootstrap=199`, matching the library's `ci_params.bootstrap` convention) gives valid CIs at every horizon, including the placebo horizons." ] }, { "cell_type": "markdown", - "id": "t19-cell-024", + "id": "t19-cell-015", "metadata": {}, "source": [ - "DID_M collapses the dynamic effect to one number - the average lift across all switching cells. Setting `L_max=L` instead computes `DID_l` for each horizon `l = 1..L` after each switch, plus `DID^pl_l` placebos at horizons `l = -L..-1`. 
This tells you whether the on-impact lift is sustained or fades, and whether the pre-treatment placebos sit on zero.\n", + "**About the warning you're about to see.** The fit below will emit a single `UserWarning` saying *Assumption 7 (D_{g,t} >= D_{g,1}) is violated: leavers present*. This is **expected for any reversible panel** and we don't suppress it - it's the library being explicit about a methodology choice on a separate estimand:\n", "\n", - "With `L_max=2` we get two post-switch horizons and two placebo horizons. The multiplier bootstrap (`n_bootstrap=199`, matching the library's `ci_params.bootstrap` convention) gives valid CIs at every horizon, including the placebo horizons." + "- **Assumption 7** is a monotonic-treatment-progression assumption used by the optional **cost-benefit delta** computation (a secondary aggregate the library reports for absorbing-treatment panels). On reversible panels the assumption fails by construction - leavers' treatment goes *down*, not up.\n", + "- The library's response is to compute the cost-benefit delta on the full sample anyway and warn that the interpretation isn't clean. The headline `DID_M`, the joiners/leavers split, and the event-study horizons are **unaffected** by this warning - they use a different aggregation that doesn't rest on A7.\n", + "\n", + "So the warning is informational, points at a result we won't use in this tutorial, and is the price of admission for a reversible design. We surface it; we don't silence it." 
] }, { "cell_type": "code", - "id": "t19-cell-025", + "id": "t19-cell-016", "metadata": {}, "execution_count": null, "outputs": [], "source": [ - "# Narrow filter: silence the EXPECTED Assumption 7 warning (cost-benefit\n", - "# delta is computed on the full sample when leavers are present), and\n", - "# let any new / unexpected UserWarning surface so the notebook stays\n", - "# usable as a drift detector.\n", + "# Narrow filter: silence the spurious numpy RuntimeWarnings about\n", + "# \"<...> encountered in matmul\" that fire only on macOS NumPy\n", + "# builds linked against Apple's Accelerate BLAS framework.\n", + "# Accelerate sets FP error flags during matmul on certain shapes/\n", + "# values; the computation is correct (Linux / OpenBLAS users don't\n", + "# see these warnings at all). See numpy issue #26669. The filter\n", + "# is scoped to the matmul message pattern only - any unrelated\n", + "# RuntimeWarning from the fit will still surface, and the\n", + "# Assumption 7 UserWarning below is NOT suppressed (that's the\n", + "# methodology warning we explained above).\n", "with warnings.catch_warnings():\n", " warnings.filterwarnings(\n", " \"ignore\",\n", - " message=r\"Assumption 7 .* is violated: leavers present\",\n", - " category=UserWarning,\n", + " message=r\".*encountered in matmul\",\n", + " category=RuntimeWarning,\n", " )\n", " model_es = DCDH(\n", " twfe_diagnostic=False, placebo=True, n_bootstrap=199, seed=42\n", @@ -387,7 +274,7 @@ }, { "cell_type": "code", - "id": "t19-cell-026", + "id": "t19-cell-017", "metadata": {}, "execution_count": null, "outputs": [], @@ -427,103 +314,54 @@ }, { "cell_type": "markdown", - "id": "t19-cell-027", + "id": "t19-cell-018", "metadata": {}, "source": [ "**Reading the event study.**\n", "\n", "- **Both placebo horizons** (l = -2 and l = -1) sit on zero with confidence intervals comfortably covering it. 
Pre-trends look parallel - we have no evidence that something other than the promo was driving session growth in the cells we're using as controls.\n", - "- **On-impact effect** at l = 1 is about **+11.2 sessions** with a 95% bootstrap CI of roughly [9.7, 12.8], covering the true effect of 12.\n", - "- **Sustained effect** at l = 2 is **+11.3 sessions** with CI [10.0, 12.6]. The lift didn't fade in the second week post-switch.\n", + "- **On-impact effect** at l = 1 is about **+12.4 sessions** with a 95% bootstrap CI of roughly [11.4, 13.3], covering the true effect of 12.\n", + "- **Sustained effect** at l = 2 is **+12.6 sessions** with CI [11.5, 13.6]. The lift didn't fade in the second week post-switch.\n", "\n", - "Bootstrap CIs reflect the cohort-recentered influence-function variance with the finite-sample stability the multiplier bootstrap provides. The fact that both horizons agree closely with each other AND with the headline `DID_M` from Section 4 (the per-period and per-group aggregation paths converge) is a built-in consistency check." - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-028", - "metadata": {}, - "source": [ - "## 6. Communicating Results to Leadership" + "Bootstrap CIs reflect the cohort-recentered influence-function variance with the finite-sample stability the multiplier bootstrap provides. Both horizons agree closely with each other AND with the headline `DID_M` from Section 3 - a built-in consistency check across the per-period and per-group aggregation paths." ] }, { "cell_type": "markdown", - "id": "t19-cell-029", + "id": "t19-cell-019", "metadata": {}, "source": [ - "A stakeholder-ready summary of the synthetic walkthrough above. Each bullet pulls from a specific section of the analysis:\n", + "## 5. Communicating Results to Leadership\n", "\n", - "> **Headline.** The pulse campaign lifted weekly checkout sessions by approximately **11.2 sessions per market per week** while the promo was on (95% CI: 10.1 to 12.3). 
On a baseline of about 110 weekly sessions per market, that's roughly a **10% lift**. *[Source: `results.overall_att` from Section 4.]*\n", + "A stakeholder-ready summary of the analysis above:\n", + "\n", + "> **Headline.** The pulse campaign lifted weekly checkout sessions by approximately **12 sessions per market per week** while the promo was on (95% CI: 11.3 to 12.8). On a baseline of about 110 weekly sessions per market, that's roughly an **11% lift**. *[Source: `results.overall_att` from Section 3.]*\n", ">\n", - "> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 43 markets started untreated and switched the promo on at some point during the quarter (joiners), and 17 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020) - diff-diff's only estimator built for treatment that can switch on AND off in the same panel. *[Source: switcher_type counts and panel shape from Section 2.]*\n", + "> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 38 markets started untreated and switched the promo on at some point during the quarter (joiners), and 22 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020) - diff-diff's only estimator built for treatment that can switch on AND off in the same panel. *[Source: switcher counts and panel shape from Section 2.]*\n", ">\n", - "> **Validity evidence.** Three checks supported the result. (a) The TWFE diagnostic flagged 15.4% of cells with negative weight in the standard regression, signaling that we needed an alternative - dCDH avoids that contamination by construction. (b) The single-lag placebo from the per-period aggregation was small (~0.9 sessions, ~8% of the headline). (c) The multi-horizon placebos at l = -2 and l = -1 both sat on zero with bootstrap CIs comfortably covering it - parallel pre-trends look credible. 
*[Sources: TWFE diagnostic from Section 3, single-lag placebo from Section 4, multi-horizon placebos from Section 5.]*\n", + "> **Validity evidence.** Two checks supported the result. (a) The joiners-vs-leavers split agreed: joiners produced a +12.1 lift, leavers a +11.9 lift, well within sampling uncertainty of each other and of the headline. (b) The multi-horizon placebos at l = -2 and l = -1 both sat on zero with bootstrap CIs comfortably covering it - parallel pre-trends look credible. *[Sources: joiners/leavers from Section 3, multi-horizon placebos from Section 4.]*\n", ">\n", - "> **What \"+11.2 sessions per market per week\" means in business terms.** Across 60 markets and the weeks each one had the promo on, that's the per-market-week lift attributable to the campaign. Translate to your own revenue-per-session to compare against campaign spend, then use the per-market lift estimate to project what scaling the promo to additional markets would deliver. *[Source: business framing of the headline.]*\n", + "> **What \"+12 sessions per market per week\" means in business terms.** Across 60 markets and the weeks each one had the promo on, that's the per-market-week lift attributable to the campaign. Translate to your own revenue-per-session to compare against campaign spend, then use the per-market lift estimate to project what scaling the promo to additional markets would deliver.\n", ">\n", - "> **Practical significance caveat.** The 10% lift is statistically significant (bootstrap p < 0.01 at both post-treatment horizons), and the on-impact effect persists at the second horizon - the pulse worked while it was on. Whether 10% justifies the campaign cost is a business judgment, not a statistical one. Note also that joiners (DID_+ \u2248 11.0) and leavers (DID_- \u2248 11.9) gave consistent signals, which reduces the worry that the average is hiding heterogeneity between starting and stopping the promo. 
*[Sources: dynamic horizons from Section 5, joiners/leavers breakdown from Section 4.]*" + "> **Practical significance caveat.** The 11% lift is statistically significant (bootstrap p < 0.01 at both post-treatment horizons), and the on-impact effect persists at the second horizon - the pulse worked while it was on. Whether 11% justifies the campaign cost is a business judgment, not a statistical one. *[Sources: dynamic horizons from Section 4.]*" ] }, { "cell_type": "markdown", - "id": "t19-cell-030", + "id": "t19-cell-020", "metadata": {}, "source": [ "Adapt this template for your own campaign by swapping in your numbers from `results.summary()`, your own market and switcher counts, your own validity diagnostics, and your own business translation. The pattern - **headline \u2192 sample size and design \u2192 validity evidence \u2192 business interpretation \u2192 practical significance** - is the part to keep." ] }, - { - "cell_type": "code", - "id": "t19-cell-031", - "metadata": {}, - "execution_count": null, - "outputs": [], - "source": [ - "# Drift guards: tolerance-based asserts that lock the numbers quoted in\n", - "# the Section 4 / Section 5 narrative and the Section 6 stakeholder\n", - "# template. 
nbmake will fail if generate_reversible_did_data() or DCDH\n", - "# output drifts outside these ranges, forcing the markdown to be\n", - "# updated before this notebook can pass CI.\n", - "#\n", - "# Asserts pull from BOTH `results` (Section 4 single-horizon fit) AND\n", - "# `results_es` (Section 5 multi-horizon fit) - both fits are still\n", - "# in scope above this cell.\n", - "\n", - "# Section 4 (L_max=None): per-period DID_M path\n", - "assert 10.72 <= results.overall_att <= 11.72, results.overall_att\n", - "assert results.overall_conf_int[0] <= 12.0 <= results.overall_conf_int[1]\n", - "assert abs(results.placebo_effect) < 1.5, results.placebo_effect\n", - "assert results.twfe_fraction_negative >= 0.10 # documents TWFE bias signal\n", - "\n", - "# Section 5 (L_max=2): per-group DID_g,1 path - DIFFERENT compute path\n", - "# than overall_att, NOT bit-identical. Verified in seed-search to agree\n", - "# on truth-coverage at seed=53.\n", - "_h1 = results_es.event_study_effects[1][\"effect\"]\n", - "assert 10.24 <= _h1 <= 12.24, _h1\n", - "assert (\n", - " results_es.event_study_effects[1][\"conf_int\"][0]\n", - " <= 12.0\n", - " <= results_es.event_study_effects[1][\"conf_int\"][1]\n", - ")\n", - "\n", - "print(\"All drift guards passed.\")" - ] - }, { "cell_type": "markdown", - "id": "t19-cell-032", - "metadata": {}, - "source": [ - "## 7. Extensions and Where to Go Next" - ] - }, - { - "cell_type": "markdown", - "id": "t19-cell-033", + "id": "t19-cell-021", "metadata": {}, "source": [ - "This tutorial covered the dCDH **Phase 1** surface (DID_M, joiners/leavers decomposition, single-lag placebo, TWFE diagnostic) plus the **multi-horizon event study with bootstrap** (`L_max`, `n_bootstrap`). The library also supports several extensions that we did not demonstrate here:\n", + "## 6. 
Extensions and Where to Go Next\n", + "\n", + "This tutorial covered the core dCDH workflow on a reversible panel: `DID_M` with the joiners/leavers split, plus the `L_max` multi-horizon event study with multiplier bootstrap. The library also supports several extensions we did not demonstrate here:\n", "\n", "- **Per-trajectory disaggregation** (`by_path=k`): when joiners and leavers each follow a few common treatment paths (e.g., on-off-on vs on-on-off), `by_path=k` reports the event study separately for the top-k most common observed paths. Useful for pulse campaigns where the schedule varies across markets.\n", "- **Group-specific linear trends** (`trends_linear=True`): allows each market to have its own pre-treatment slope, absorbing differential trends.\n", @@ -536,13 +374,13 @@ }, { "cell_type": "markdown", - "id": "t19-cell-034", + "id": "t19-cell-022", "metadata": {}, "source": [ "**Related tutorials.**\n", "\n", "- [Tutorial 1: Basic DiD](01_basic_did.ipynb) - the 2x2 building block dCDH generalizes.\n", - "- [Tutorial 2: Staggered DiD](02_staggered_did.ipynb) - Goodman-Bacon decomposition is the staggered-adoption analog of the TWFE diagnostic shown here.\n", + "- [Tutorial 2: Staggered DiD](02_staggered_did.ipynb) - Callaway-Sant'Anna for absorbing staggered adoption (when treatment doesn't turn off).\n", "- [Tutorial 5: HonestDiD](05_honest_did.ipynb) - sensitivity to parallel-trends violations on event studies; works on dCDH's placebo surface via `honest_did=True`.\n", "- [Tutorial 17: Brand Awareness Survey](17_brand_awareness_survey.ipynb) - reach for this if you have survey data with sampling weights / strata / PSU instead of a panel.\n", "- [Tutorial 18: Geo-Experiment Analysis](18_geo_experiments.ipynb) - reach for this if you have a single-launch pilot in a small number of test markets." 
@@ -550,16 +388,15 @@ }, { "cell_type": "markdown", - "id": "t19-cell-035", + "id": "t19-cell-023", "metadata": {}, "source": [ "**Summary: when to reach for dCDH.**\n", "\n", "1. Use dCDH when treatment is **reversible** - the panel has switchers in both directions (joiners and leavers) in the same data.\n", - "2. Run `twowayfeweights` *before* fitting any estimator on a reversible panel - the diagnostic tells you whether to worry about TWFE contamination, in numbers (`fraction_negative`, `sigma_fe`).\n", - "3. Read joiners (`DID_+`) and leavers (`DID_-`) separately. Disagreement between the two halves is heterogeneity worth investigating before averaging into one number for stakeholders.\n", - "4. Use `L_max` + multiplier bootstrap to expose the dynamic structure of the effect - is the lift on-impact only, sustained, or fading? - and to get valid placebo CIs that the Phase 1 single-lag placebo can't provide.\n", - "5. Defer to follow-up tutorials for `by_path`, `trends_linear`/`trends_nonparam`, HonestDiD on dCDH's placebo surface, and the survey-design integration. Each is a single constructor or `fit()` kwarg away." + "2. Read joiners (`DID_+`) and leavers (`DID_-`) separately. Disagreement between the two halves is heterogeneity worth investigating before averaging into one number for stakeholders.\n", + "3. Use `L_max` + multiplier bootstrap to expose the dynamic structure of the effect - is the lift on-impact only, sustained, or fading? - and to get valid placebo CIs that the Phase 1 single-lag placebo can't provide.\n", + "4. Defer to follow-up tutorials for `by_path`, `trends_linear`/`trends_nonparam`, HonestDiD on dCDH's placebo surface, and the survey-design integration. Each is a single constructor or `fit()` kwarg away." 
] } ], diff --git a/tests/test_t19_marketing_pulse_drift.py b/tests/test_t19_marketing_pulse_drift.py new file mode 100644 index 00000000..6c31ebd6 --- /dev/null +++ b/tests/test_t19_marketing_pulse_drift.py @@ -0,0 +1,304 @@ +"""Drift detection for Tutorial 19 (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`). + +The tutorial narrative quotes seed-specific numbers (overall_att, joiners, +leavers, event-study horizons, placebos). If library numerics drift +(estimator changes, RNG path changes, BLAS path changes), the prose can +go stale silently while `pytest --nbmake` still passes - it only checks +that the cells execute without error. + +These asserts re-derive the same numbers using the locked DGP and seed +the notebook uses, then check them against the tolerance bands quoted in +the tutorial markdown. If a future change moves any number outside its +band, this test fails and a maintainer is forced to either update the +prose or investigate the methodology shift before merge. + +DGP and seed locked at `_scratch/dcdh_tutorial/40_build_notebook.py`. +Quoted numbers derived from `_scratch/dcdh_tutorial/lock_seed.py`. 
+""" + +from __future__ import annotations + +import warnings + +import numpy as np +import pytest + +from diff_diff import DCDH, generate_reversible_did_data + +# Locked DGP parameters (must stay in sync with the notebook) +MAIN_SEED = 46 +N_GROUPS = 60 +N_PERIODS = 8 +TREATMENT_EFFECT = 12.0 +EFFECT_SD = 1.5 + + +@pytest.fixture(scope="module") +def panel(): + raw = generate_reversible_did_data( + n_groups=N_GROUPS, + n_periods=N_PERIODS, + pattern="single_switch", + initial_treat_frac=0.4, + treatment_effect=TREATMENT_EFFECT, + heterogeneous_effects=True, + effect_sd=EFFECT_SD, + group_fe_sd=8.0, + time_trend=0.5, + noise_sd=2.0, + seed=MAIN_SEED, + ) + df = raw.rename( + columns={ + "group": "market_id", + "period": "week", + "treatment": "promo_on", + "outcome": "sessions", + } + ) + df["sessions"] = df["sessions"] + 100.0 + return df + + +@pytest.fixture(scope="module") +def phase1_results(panel): + """Phase 1 fit: gets joiners/leavers split. placebo=False to skip the + documented NaN-SE warning on the single-lag placebo path.""" + model = DCDH(twfe_diagnostic=False, placebo=False, seed=42) + return model.fit( + panel, + outcome="sessions", + group="market_id", + time="week", + treatment="promo_on", + ) + + +@pytest.fixture(scope="module") +def event_study_results(panel): + """Event-study fit: L_max=2 + multiplier bootstrap. 
The A7 + UserWarning is intentionally muted here so the fixture is quiet + for the value-checking tests below; the notebook's actual + warning-policy contract (A7 visible, only matmul filtered) is + validated separately by `test_event_study_warning_policy_matches_notebook`.""" + with warnings.catch_warnings(): + warnings.filterwarnings( + "ignore", + message=r".*encountered in matmul", + category=RuntimeWarning, + ) + warnings.filterwarnings( + "ignore", + message=r"Assumption 7 .* is violated: leavers present", + category=UserWarning, + ) + model = DCDH( + twfe_diagnostic=False, placebo=True, n_bootstrap=199, seed=42 + ) + return model.fit( + panel, + outcome="sessions", + group="market_id", + time="week", + treatment="promo_on", + L_max=2, + ) + + +def test_panel_composition(panel): + """The narrative quotes 38 joiners and 22 leavers in the stakeholder + template. If the DGP drifts, those counts shift and the template + text goes stale.""" + counts = panel.groupby("switcher_type").size().to_dict() + assert counts.get("joiner") == 38, counts + assert counts.get("leaver") == 22, counts + + +def test_overall_att_close_to_truth(phase1_results): + """Section 3 quotes 'about 12 sessions' headline (true effect = 12).""" + assert 11.7 <= phase1_results.overall_att <= 12.4, phase1_results.overall_att + + +def test_overall_ci_covers_truth(phase1_results): + """Section 3 narrative claims the CI covers the true effect of 12.""" + ci_low, ci_high = phase1_results.overall_conf_int + assert ci_low <= TREATMENT_EFFECT <= ci_high, (ci_low, ci_high) + + +def test_overall_ci_endpoints_match_quoted(phase1_results): + """Section 3 narrative quotes '95% CI: 11.3 to 12.8'. 
Pin the
+    one-decimal display exactly so any drift past the displayed
+    rounding fails this test."""
+    ci_low, ci_high = phase1_results.overall_conf_int
+    assert round(ci_low, 1) == 11.3, ci_low
+    assert round(ci_high, 1) == 12.8, ci_high
+
+
+def test_joiners_leavers_consistent(phase1_results):
+    """Section 3 narrative quotes joiners ~12.1 and leavers ~11.9, both
+    positive and within sampling uncertainty of each other."""
+    assert 11.5 <= phase1_results.joiners_att <= 12.7, phase1_results.joiners_att
+    assert 11.4 <= phase1_results.leavers_att <= 12.5, phase1_results.leavers_att
+    # Similar in magnitude (no big disagreement); positivity is already
+    # pinned by the lower bounds above.
+    assert abs(phase1_results.joiners_att - phase1_results.leavers_att) < 1.5
+
+
+def test_event_study_horizons_cover_truth(event_study_results):
+    """Section 4 narrative quotes l=1 ~12.4, l=2 ~12.6, both with CIs
+    covering the true effect of 12."""
+    es = event_study_results.event_study_effects
+    for l in (1, 2):
+        eff = es[l]["effect"]
+        ci = es[l]["conf_int"]
+        assert 11.5 <= eff <= 13.3, (l, eff)
+        assert ci[0] <= TREATMENT_EFFECT <= ci[1], (l, ci)
+
+
+def test_event_study_ci_endpoints_match_quoted(event_study_results):
+    """Section 4 narrative quotes l=1 CI [11.4, 13.3] and l=2 CI
+    [11.5, 13.6]. 
These are bootstrap-based CIs and the bootstrap RNG + path differs between Rust and pure-Python backends (per the + bit-identity-baseline-per-backend convention), so we use a 0.15 + tolerance band rather than `round(_, 1) ==` exact matching - tight + enough to catch real prose drift, loose enough to absorb the + documented backend variance.""" + es = event_study_results.event_study_effects + # l=1 CI [11.4, 13.3] + assert abs(es[1]["conf_int"][0] - 11.4) < 0.15, es[1]["conf_int"] + assert abs(es[1]["conf_int"][1] - 13.3) < 0.15, es[1]["conf_int"] + # l=2 CI [11.5, 13.6] + assert abs(es[2]["conf_int"][0] - 11.5) < 0.15, es[2]["conf_int"] + assert abs(es[2]["conf_int"][1] - 13.6) < 0.15, es[2]["conf_int"] + + +def test_event_study_significance(event_study_results): + """Section 5 stakeholder template claims 'bootstrap p < 0.01 at both + post-treatment horizons'. Lock that significance threshold.""" + es = event_study_results.event_study_effects + assert es[1]["p_value"] < 0.01, es[1]["p_value"] + assert es[2]["p_value"] < 0.01, es[2]["p_value"] + + +def test_placebo_horizons_cover_zero(event_study_results): + """Section 4 narrative claims pre-treatment placebos sit on zero.""" + pl = event_study_results.placebo_event_study + assert pl is not None + for l in (-1, -2): + eff = pl[l]["effect"] + ci = pl[l]["conf_int"] + assert abs(eff) < 0.7, (l, eff) + assert ci[0] <= 0.0 <= ci[1], (l, ci) + + +def test_assumption7_warning_fires_as_expected(panel): + """The notebook surfaces and explains the A7 warning. 
If the library + stops firing it, the markdown explanation goes stale and we should + notice.""" + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + with np.errstate(divide="ignore", over="ignore", invalid="ignore"): + model = DCDH( + twfe_diagnostic=False, placebo=True, n_bootstrap=49, seed=42 + ) + model.fit( + panel, + outcome="sessions", + group="market_id", + time="week", + treatment="promo_on", + L_max=2, + ) + a7_warnings = [ + w + for w in ws + if w.category is UserWarning + and "Assumption 7" in str(w.message) + and "leavers present" in str(w.message) + ] + assert len(a7_warnings) >= 1, [str(w.message)[:80] for w in ws] + + +def test_event_study_warning_policy_matches_notebook(panel): + """Mirror the notebook's exact warning policy on the visible + event-study fit and assert the resulting warning set matches the + documented contract: exactly one UserWarning (the A7 leavers-present + warning that the notebook's markdown explains), and zero + RuntimeWarnings (matmul-pattern ones filtered; everything else + surfaces). If the library starts emitting an unexpected warning on + this code path, this test fails and the notebook prose may need to + be updated.""" + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + # MIRROR the notebook's narrow filter exactly (no np.errstate, no + # blanket A7 suppression). + warnings.filterwarnings( + "ignore", + message=r".*encountered in matmul", + category=RuntimeWarning, + ) + model = DCDH( + twfe_diagnostic=False, placebo=True, n_bootstrap=199, seed=42 + ) + model.fit( + panel, + outcome="sessions", + group="market_id", + time="week", + treatment="promo_on", + L_max=2, + ) + user_warnings = [w for w in ws if w.category is UserWarning] + runtime_warnings = [w for w in ws if w.category is RuntimeWarning] + # Exactly one UserWarning, and it's the documented A7 warning. 
+ assert len(user_warnings) == 1, [str(w.message)[:120] for w in user_warnings] + msg = str(user_warnings[0].message) + assert "Assumption 7" in msg, msg + assert "leavers present" in msg, msg + # All RuntimeWarnings should be the matmul pattern (filtered) - so + # zero remaining. If a new RuntimeWarning fires from somewhere else, + # this fails. + assert len(runtime_warnings) == 0, [str(w.message)[:120] for w in runtime_warnings] + + +def test_a11_warning_does_not_fire(): + """The notebook claims this seed/DGP is in the A11-clean regime + (no warning fires). If a library change starts triggering A11 on + this panel, the prose claim is wrong.""" + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + with np.errstate(divide="ignore", over="ignore", invalid="ignore"): + raw = generate_reversible_did_data( + n_groups=N_GROUPS, + n_periods=N_PERIODS, + pattern="single_switch", + initial_treat_frac=0.4, + treatment_effect=TREATMENT_EFFECT, + heterogeneous_effects=True, + effect_sd=EFFECT_SD, + group_fe_sd=8.0, + time_trend=0.5, + noise_sd=2.0, + seed=MAIN_SEED, + ) + df = raw.rename( + columns={ + "group": "market_id", + "period": "week", + "treatment": "promo_on", + "outcome": "sessions", + } + ) + df["sessions"] = df["sessions"] + 100.0 + DCDH(twfe_diagnostic=False, placebo=False, seed=42).fit( + df, + outcome="sessions", + group="market_id", + time="week", + treatment="promo_on", + ) + a11_warnings = [ + w + for w in ws + if w.category is UserWarning and "Assumption 11" in str(w.message) + ] + assert len(a11_warnings) == 0, [str(w.message)[:80] for w in a11_warnings]
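
The warning-policy tests above lean on a subtlety of Python's `warnings` filters: `filterwarnings()` prepends its rule to the filter list, so a narrow `"ignore"` added after `simplefilter("always")` takes precedence for matching messages while every other warning still gets recorded. A self-contained sketch of that mechanism, decoupled from the library (the warning messages below are illustrative stand-ins, not the actual text emitted by `DCDH.fit`):

```python
import warnings


def emit():
    # Stand-ins for the two warning kinds the event-study fit raises:
    # a substantive UserWarning the notebook wants visible, and a
    # numerical RuntimeWarning the notebook narrowly filters.
    warnings.warn("Assumption 7 is violated: leavers present", UserWarning)
    warnings.warn("overflow encountered in matmul", RuntimeWarning)


with warnings.catch_warnings(record=True) as ws:
    # Record everything by default...
    warnings.simplefilter("always")
    # ...then mute one pattern. filterwarnings() prepends its rule, so
    # this "ignore" wins over the "always" default, but only for
    # messages matching the regex.
    warnings.filterwarnings(
        "ignore", message=r".*encountered in matmul", category=RuntimeWarning
    )
    emit()

user_warnings = [w for w in ws if w.category is UserWarning]
runtime_warnings = [w for w in ws if w.category is RuntimeWarning]
assert len(user_warnings) == 1      # the A7-style warning surfaces
assert len(runtime_warnings) == 0   # the matmul pattern is muted
```

Filter order is why `test_event_study_warning_policy_matches_notebook` can mirror the notebook's single narrow filter and still assert that any *new* RuntimeWarning from a different code path shows up as a test failure.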