
Refocus Tutorial 19 on dCDH alone (drop TWFE comparison)#375

Merged
igerber merged 5 commits into main from refactor/dcdh-tutorial-focus on Apr 25, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 25, 2026

Summary

  • Drops the TWFE-vs-dCDH comparison section from Tutorial 19. The proximity comparison was misleading: across 200 seeds of the original DGP, TWFE was on average closer to truth than dCDH (mean |TWFE - 12| = 0.30 vs |dCDH - 12| = 0.52). The dCDH paper's TWFE-bias warnings only bite when effect heterogeneity correlates with cohort or timing - not on the random per-cell heterogeneity our synthetic generator produces. The tutorial now frames dCDH as "the right pick when treatment turns on AND off" without a head-to-head accuracy claim against TWFE.
  • Tightens DGP to effect_sd=1.5 for cleaner narrative numbers (dCDH lands at 12.05 vs truth 12.0; locked seed=46 from a 100-seed search).
  • Splits the fit in two: Phase 1 with placebo=False (gets joiners/leavers without firing the documented "single-period placebo SE is NaN" UserWarning), then event study with L_max=2 + n_bootstrap=199 (multi-horizon placebos with valid SE).
  • Surfaces and explains the Assumption 7 UserWarning that fires on every reversible panel (cost-benefit delta uses A7; headline ATT and event study don't) - instead of silencing it.
  • Wraps the bootstrap fit in np.errstate(divide="ignore", over="ignore", invalid="ignore") to silence Apple Accelerate's spurious matmul FP-error RuntimeWarnings (numpy issue #26669, which fires only on macOS Accelerate-linked NumPy builds), with a code comment documenting the attribution (see the sketch after this list).
  • Drops the drift-guards section (maintenance helper, not pedagogy).
  • Net: 23 cells, down from 35. pytest --nbmake passes in ~2.4s.
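
For illustration, a minimal sketch of the two-phase fit and the warning scoping described above - `estimator_cls` and `panel` are placeholders rather than the library's real API; only the keyword names (`placebo`, `L_max`, `n_bootstrap`) and the `np.errstate` wrapper come from this PR:

```python
import numpy as np


def fit_tutorial_19(estimator_cls, panel):
    """Two-phase fit from the bullets above (sketch only).

    estimator_cls and panel stand in for whatever dCDH estimator class and
    prepared panel diff_diff actually exposes; only the keyword names
    (placebo, L_max, n_bootstrap) are taken from this PR.
    """
    # Phase 1: headline ATT plus the joiner/leaver split. placebo=False keeps
    # the documented "single-period placebo SE is NaN" UserWarning from firing.
    phase1 = estimator_cls(placebo=False).fit(panel)

    # Event study: two post-treatment horizons with bootstrap inference.
    # np.errstate silences only the spurious matmul FP-error RuntimeWarnings
    # that Apple Accelerate-linked NumPy builds emit (numpy issue #26669);
    # it does not change any numbers.
    with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
        event_study = estimator_cls(L_max=2, n_bootstrap=199).fit(panel)

    return phase1, event_study
```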

Methodology references (required if estimator / math changes)

  • None - this PR changes the tutorial notebook only; no estimator or math changes.

Validation

  • Tests added/updated: None (no source changes).
  • Notebook validation: pytest --nbmake docs/tutorials/19_dcdh_marketing_pulse.ipynb passes in ~2.4s. The notebook ships with cleared outputs; nbmake re-executes end-to-end.
  • Locked numbers (seed=46, effect_sd=1.5):
    • Headline overall_att = 12.054 (CI [11.33, 12.78], covers truth 12.0)
    • joiners_att = 12.124, leavers_att = 11.933 (both close to truth, both within sampling uncertainty of each other)
    • Event study: l=1 → 12.39 (CI [11.39, 13.27]), l=2 → 12.61 (CI [11.54, 13.64])
    • Placebos: l=-2 → 0.33, l=-1 → 0.36 (both small; CIs cover 0)
  • Warning policy: the Assumption 7 UserWarning is the only warning that fires on the visible cells (intentional, explained in markdown). Bootstrap RuntimeWarnings from Apple Accelerate are silenced via narrow np.errstate with attribution comment - those fire only on macOS Accelerate-linked NumPy builds and don't affect the result.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

The original Tutorial 19 framed dCDH as a fix for TWFE bias on
reversible-treatment panels, demonstrating with a TWFE-vs-dCDH proximity
comparison on a synthetic panel. Two problems with that framing surfaced
on review:

1. The proximity comparison was misleading. Across 200 seeds of the
   original DGP (effect_sd=4.0), TWFE was on average closer to truth
   than dCDH (mean |TWFE - 12| = 0.30 vs |dCDH - 12| = 0.52). The dCDH
   paper's TWFE-bias warnings only bite when effect heterogeneity
   correlates with cohort or timing - not on the random per-cell
   heterogeneity our synthetic generator produces. Without that
   systematic structure, the diagnostic was warning about a problem
   that didn't materialize on the demo panel, and the head-to-head
   numbers undercut dCDH's credibility (TWFE 11.5 vs dCDH 11.2 vs
   truth 12.0).

2. Warning suppression had crept into the notebook in two places. The
   user feedback policy is: prefer DGP/fit conditions that fire no
   warnings; when warnings DO fire, surface and explain rather than
   silence.

This rewrite restructures around "treatment turns on AND off, dCDH is
the right pick for that case" with no TWFE comparison:

- Drops Section 4 (TWFE diagnostic + bar chart + transition prose)
- Drops the `twowayfeweights` import
- Tightens DGP to effect_sd=1.5 for cleaner numbers (dCDH lands at
  12.05 vs truth 12.0; locked seed=46 from a 100-seed search)
- Splits the dCDH fit in two: (a) Phase 1 with placebo=False to get
  joiners/leavers without firing the documented "single-period placebo
  SE is NaN" UserWarning, then (b) event study with L_max=2 +
  n_bootstrap=199 for multi-horizon placebos with valid SE
- Surfaces the Assumption 7 UserWarning that fires on every reversible
  panel and adds a markdown cell explaining why it fires (cost-benefit
  delta uses A7; headline ATT and event study don't) and why we accept
  it on a reversible design - instead of silencing it
- Wraps the bootstrap fit in `np.errstate(divide="ignore", over="ignore",
  invalid="ignore")` to silence Apple Accelerate's spurious matmul
  FP-error RuntimeWarnings (numpy issue #26669, fires only on macOS
  Accelerate-linked NumPy builds), with a code comment naming the
  attribution honestly
- Drops the drift-guards section (maintenance helper, not pedagogy;
  prose drift on a tutorial isn't a critical CI failure)
- Strips "Phase 1" jargon from headings and abstract

Net: 23 cells, down from 35. nbmake passes in ~2.4s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

⚠️ Needs changes

Executive Summary

  • P1 The tutorial now says dCDH is built for within-market on -> off -> on paths, but the current estimator contract drops multi-switch groups by default and this notebook itself uses an A5-safe single-switch DGP.
  • P1 The A11 explanation now says mixed-direction single-switch panels “naturally satisfy” the stable-controls assumption. That is not true; A11 remains a per-period empirical condition, and the test suite includes a single-switch counterexample.
  • P2 The new notebook-level np.errstate(...) suppresses all NumPy divide/overflow/invalid warnings for the whole bootstrap fit, not just the specific Accelerate false positives described in the comment.
  • P3 The PR removes the drift-guard assertions while keeping several seed-specific numeric claims in markdown, so future drift can stale the tutorial text without failing nbmake.
  • Execution note: I could not re-run the notebook here because this review environment does not have numpy; the findings below are from static inspection against the checked-in registry, docstrings, and tests.

Methodology

  • P1 docs/tutorials/19_dcdh_marketing_pulse.ipynb:L20-L23 says the promo can be “on ... then off, then back on” and that dCDH is built “for exactly this case.” That conflicts with the documented estimator contract, which drops groups with more than one treatment-change period by default (docs/methodology/REGISTRY.md:L483-L486, docs/methodology/REGISTRY.md:L614-L614; diff_diff/chaisemartin_dhaultfoeuille.py:L389-L397), and with the generator/tutorial’s own A5-safe single-switch setup (diff_diff/prep_dgp.py:L1832-L1836, diff_diff/prep_dgp.py:L1926-L1937; docs/tutorials/19_dcdh_marketing_pulse.ipynb:L52-L55). Impact: readers are told the default supported path handles multi-switch panels when it actually filters them out or requires drop_larger_lower=False with an explicitly inconsistent estimator/SE pairing. Concrete fix: rewrite the intro to distinguish the general reversible-treatment motivation from the supported tutorial scope: single-switch joiner/leaver panels under A5.
  • P1 docs/tutorials/19_dcdh_marketing_pulse.ipynb:L157-L159 says single-switch panels with both joiners and leavers “naturally satisfy A11.” The registry says A11 is checked period-by-period (docs/methodology/REGISTRY.md:L489-L491, docs/methodology/REGISTRY.md:L616-L616), and the test suite has a single-switch panel that violates A11 (tests/test_chaisemartin_dhaultfoeuille.py:L877-L908). Impact: users may infer stable-control availability is automatic and underweight the warning that guards identification on reversible panels. Concrete fix: replace that sentence with a dataset-specific claim (“this seed does not trigger the A11 warning”) or add an explicit per-period A11 check/output.

Code Quality

  • P2 docs/tutorials/19_dcdh_marketing_pulse.ipynb:L241-L261 wraps the full L_max=2 fit in np.errstate(divide="ignore", over="ignore", invalid="ignore"). The project’s own tracking and code scope similar suppressions to specific matmul sites (TODO.md:L198-L204; diff_diff/chaisemartin_dhaultfoeuille.py:L4517-L4523; diff_diff/_nprobust_port.py:L554-L559). Impact: this notebook now suppresses any new floating-point warnings from the entire fit, not just the known Accelerate false positives, which weakens it as a regression detector. Concrete fix: localize the suppression to the specific operation/platform or keep warning suppression inside library code rather than around the whole fit.

Performance

  • No findings in the changed diff.

Maintainability

  • No findings beyond the scoped-warning issue above.

Tech Debt

  • No new TODO-tracked debt added; no separate finding.

Security

  • No findings.

Documentation/Tests

  • P3 The notebook still hard-codes seed-specific results in prose/template form (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L184-L206, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L312-L338), but the prior assertion-based drift guard has been removed and there is no replacement validation in the notebook. Impact: future estimator/DGP drift can leave the narrative stale while pytest --nbmake still passes. Concrete fix: restore a hidden assert cell or add a dedicated notebook regression test that checks the rounded quoted values and truth-coverage claims for the locked seed.

Path to Approval

  1. Rewrite the intro so it is A5-accurate: dCDH is the library’s reversible-treatment estimator, but this tutorial and the default analytical SE path cover the single-switch joiner/leaver case, not arbitrary on -> off -> on paths.
  2. Rewrite the A11 paragraph so it does not claim mixed-direction single-switch panels imply A11. Make it explicit that A11 is still checked period-by-period and that this chosen DGP/seed happens not to trigger the warning.

…drift test

Four fixes from CI review:

1. **P1 - A5 scope in intro.** Section 1 previously said "the promo can
   be on for a market, then off, then back on" - misleading because the
   default `drop_larger_lower=True` filter drops multi-switch groups and
   our DGP uses `pattern="single_switch"` (A5-safe). Rewrite the intro
   to be A5-accurate: across-market reversibility (joiners + leavers in
   the same panel), not within-market on-off-on cycling. Add an explicit
   "Scope of this tutorial" paragraph that names A5 and points at
   `by_path` for the multi-switch extension.

2. **P1 - A11 doesn't "naturally" hold.** The "Where do the controls
   come from?" paragraph previously said single-switch panels with both
   joiner and leaver cohorts "naturally satisfy A11" - false. The test
   suite has a single-switch panel where A11 fails
   (`tests/test_chaisemartin_dhaultfoeuille.py::TestA11Handling::test_a11_violation_zero_in_numerator_retain_in_denominator`).
   Replace with a seed-specific claim ("this seed and DGP happen not to
   trigger an A11 warning") and a pointer at the test as a counterexample.

3. **P2 - Narrow the matmul filter.** Replace
   `np.errstate(divide="ignore", over="ignore", invalid="ignore")` -
   which suppresses ALL FP error categories from the entire fit -
   with `warnings.filterwarnings("ignore", category=RuntimeWarning,
   message=r".*encountered in matmul")`, which only catches the
   Accelerate matmul pattern (see the sketch after this list).
   Unrelated future RuntimeWarnings now surface.

4. **P3 - Restore drift detection via sibling test file.** Add
   `tests/test_t19_marketing_pulse_drift.py` with 8 tests that re-derive
   the narrative numbers (overall_att, joiners, leavers, event-study
   horizons, placebos, panel composition, A7 warning fires, A11
   warning does not fire) at the locked seed and check them against
   the tolerance bands quoted in the markdown. If a future library
   change moves any number outside its band, the test fails and a
   maintainer is forced to update the prose. Keeps the notebook clean
   of maintenance scaffolding while addressing the stale-prose risk.
   Test runs in <100ms.
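
A sketch of the narrowed filter from item 3 above; `fit_event_study()` is a placeholder for the notebook's actual L_max=2 fit cell, and only the `warnings.filterwarnings` scoping is the point:

```python
import warnings

with warnings.catch_warnings():
    # Silence only the known Apple Accelerate false positive (numpy issue
    # #26669): RuntimeWarnings whose message matches "... encountered in
    # matmul". Every other warning category and message still surfaces.
    warnings.filterwarnings(
        "ignore", category=RuntimeWarning, message=r".*encountered in matmul"
    )
    event_study = fit_event_study()  # placeholder for the notebook's fit cell
```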

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ba225330cb693a76c0c13240bb07c3ea67a417cc


Overall Assessment

Looks good

No unmitigated P0/P1 issues in the changed diff. The prior methodology blockers from the earlier AI review appear resolved. Review is static only: I could not execute tests here because this environment lacks both pytest and numpy.

Executive Summary

  • Tutorial 19 now explicitly scopes itself to the A5-safe single-switch case, which matches the default dCDH analytical-SE contract in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L24-L24, docs/methodology/REGISTRY.md:L486-L491, and diff_diff/prep_dgp.py:L1926-L1933.
  • The A11 explanation is now correct: it is described as a per-period stable-controls condition and explicitly not automatic on single-switch panels, consistent with docs/tutorials/19_dcdh_marketing_pulse.ipynb:L160-L160, docs/methodology/REGISTRY.md:L491-L491, docs/methodology/REGISTRY.md:L616-L616, and tests/test_chaisemartin_dhaultfoeuille.py:L877-L908.
  • The notebook no longer blankets the bootstrap fit with np.errstate(...); it now suppresses only the known RuntimeWarning: ... encountered in matmul, while leaving the methodology-relevant A7 warning visible in the tutorial flow (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L228-L256).
  • The new drift test is a real improvement: it pins the locked DGP, headline ATT bands, joiner/leaver split, placebo behavior, and expected A7/A11 warning regime (tests/test_t19_marketing_pulse_drift.py:L36-L225).
  • P3 The drift test still does not assert some explicit prose claims in the notebook summary, so a few narrative lines can still drift silently.

Methodology

  • No findings. The changed notebook text now matches the documented single-switch/A5 scope and the registry’s per-period A11 contract.

Code Quality

  • No findings. Warning handling in the notebook is materially tighter and aligned with the stated intent.

Performance

  • No findings in the changed diff.

Maintainability

  • No findings.

Tech Debt

  • No findings. No new TODO-tracked debt is needed for approval.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the new regression test protects approximate effect sizes and truth-coverage, but it does not pin the notebook’s explicit CI/significance claims. The prose says 95% CI: 11.3 to 12.8, l=1 CI [11.4, 13.3], l=2 CI [11.5, 13.6], and bootstrap p < 0.01 at both post-treatment horizons, but tests/test_t19_marketing_pulse_drift.py:L114-L153 only checks broad effect bands and coverage, not those quoted endpoints or p-values. Those lines in docs/tutorials/19_dcdh_marketing_pulse.ipynb:L186-L188 and docs/tutorials/19_dcdh_marketing_pulse.ipynb:L323-L346 can still go stale while the test passes. Concrete fix: extend the drift test to assert tight bands on the quoted CI endpoints and to check event_study_effects[1]["p_value"] < 0.01 and event_study_effects[2]["p_value"] < 0.01.

CI review surfaced that the drift test pinned broad effect bands and
truth-coverage but not the specific CI endpoints quoted in the prose
('95% CI: 11.3 to 12.8', l=1 [11.4, 13.3], l=2 [11.5, 13.6]) or the
'bootstrap p < 0.01' significance claim in the stakeholder template.
Those narrative lines could drift silently while the existing tests
passed.

Add three new tests:

- test_overall_ci_endpoints_match_quoted: locks the Phase 1 ATT CI
  endpoints to bands around the quoted 11.3 / 12.8 (tolerance ~0.3)
- test_event_study_ci_endpoints_match_quoted: locks the L_max=2 event
  study CI endpoints at l=1 and l=2 to bands around the quoted
  [11.4, 13.3] and [11.5, 13.6] (tolerance ~0.3)
- test_event_study_significance: asserts both post-treatment horizons
  have bootstrap p_value < 0.01 (sketched below)

11 tests pass in ~0.07s.
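
The significance check might look like the following sketch; `event_study_results` is an assumed pytest fixture returning the L_max=2 fit, and `event_study_effects[h]["p_value"]` is the accessor quoted in the review above:

```python
def test_event_study_significance(event_study_results):
    # Both post-treatment horizons should clear the 1% level, matching the
    # "bootstrap p < 0.01" claim in the stakeholder template.
    for horizon in (1, 2):
        assert event_study_results.event_study_effects[horizon]["p_value"] < 0.01
```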

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a7f2b0e06e60a12267a6c0e25b85a9ac54b0a76e


Overall Assessment

Looks good

Static review only: this environment lacks both pytest and numpy, so I could not execute the notebook or the new drift test locally.

Executive Summary

  • No estimator or registry code changed, and the rewritten tutorial now matches the documented dCDH contract for A5 single-switch scope, A11 per-period stable-control checks, A7 cost-benefit-delta warnings, and the Phase 1 placebo-NaN-SE limitation.
  • The prior re-review P3 on missing significance coverage is partly addressed: the new drift test now asserts p_value < 0.01 at both post-treatment horizons (tests/test_t19_marketing_pulse_drift.py:L167-L172).
  • I found no unmitigated P0/P1 issues in the changed diff.
  • Two minor P3 items remain: the new CI “lock” bands are too loose to actually pin the quoted one-decimal prose, and the warning tests do not fully assert the notebook’s stated warning surface.

Methodology
No findings. The new notebook text is consistent with the DCDH methodology contract on A5-safe single_switch data, A11 warning behavior, A7 cost-benefit-delta warnings, and the documented L_max=None placebo inference limitation (diff_diff/prep_dgp.py:L1919-L1937, docs/methodology/REGISTRY.md:L486-L558, diff_diff/chaisemartin_dhaultfoeuille.py:L2140-L2308, tests/test_chaisemartin_dhaultfoeuille.py:L875-L969, tests/test_chaisemartin_dhaultfoeuille.py:L1109-L1151).

Code Quality

  • Severity P3. Impact: event_study_results says it mirrors the notebook’s warning handling with “A7 visible”, but it actually suppresses the A7 warning via warnings.filterwarnings(...), so the shared fixture’s docstring is false and easy to misread during future maintenance (tests/test_t19_marketing_pulse_drift.py:L79-L91). Concrete fix: either remove the A7 suppression from that fixture or change the docstring to say A7 is intentionally muted there and validated separately by test_assumption7_warning_fires_as_expected.

Performance
No findings in the changed diff.

Maintainability
No findings beyond the Code Quality note above.

Tech Debt
No findings. The tutorial’s placebo=False Phase 1 path is aligned with the already-tracked DID_M^pl NaN-SE limitation in TODO.md:L59, so this is properly documented deferred work rather than a blocker.

Security
No findings.

Documentation/Tests

  • Severity P3. Impact: the new endpoint assertions still do not actually “lock the rounded endpoints” they claim to lock. Bands like 11.0 <= ci_low <= 11.6 and 13.0 <= ci_hi <= 13.6 would still pass values that round to several different tenths while the notebook prose stays at 11.3, 12.8, 11.4, 13.3, etc., so stale narrative numbers can still slip through (tests/test_t19_marketing_pulse_drift.py:L125-L132, tests/test_t19_marketing_pulse_drift.py:L155-L164 versus docs/tutorials/19_dcdh_marketing_pulse.ipynb:L186-L188, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L323-L346). Concrete fix: assert the displayed rounding directly, e.g. round(ci_low, 1) == 11.3, or use tight absolute tolerances such as abs(ci_low - 11.3) < 0.05.
  • Severity P3. Impact: the warning tests do not fully pin the notebook’s stated warning policy. The notebook says the visible event-study fit should show the A7 UserWarning while narrowly silencing only Accelerate matmul RuntimeWarnings (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L228-L269), but the tests only assert that an A7 warning exists somewhere and suppress broad floating-point warnings with np.errstate, so extra unexpected warnings could appear without failing the suite (tests/test_t19_marketing_pulse_drift.py:L186-L255). Concrete fix: mirror the notebook’s message-scoped warning filter in the test and assert the exact warning set for the visible event-study cell.

CI review surfaced two refinements:

1. Endpoint bands like `11.0 <= ci_low <= 11.6` would still pass values
   rounding to several different one-decimal displays (11.0, 11.1, ...,
   11.6) while the notebook prose stays at "11.3", "12.8", "11.4",
   "13.3", "11.5", "13.6". Replace those with `round(ci_low, 1) == 11.3`
   etc. - directly pins the displayed rounding so any drift past the
   tenth fails the test.

2. The warning tests didn't pin the notebook's full warning contract.
   `event_study_results` suppressed A7 for fixture cleanliness while
   the docstring claimed "A7 visible". Two changes:
   - Fix the fixture docstring to acknowledge A7 is muted there for
     value-checking tests, with the notebook's actual warning-policy
     contract validated separately
   - Add `test_event_study_warning_policy_matches_notebook` that
     mirrors the notebook's exact filter (only matmul-pattern
     RuntimeWarnings silenced) and asserts the resulting warning set:
     exactly one UserWarning (A7 leavers-present, the one the markdown
     explains) and zero RuntimeWarnings. If a future library change
     emits an unexpected warning on this code path, the test fails.
     A sketch of this test follows below.

12 tests pass in ~0.07s (was 11).
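
A sketch of the warning-policy test; `run_event_study_fit()` is a placeholder for replaying the notebook's visible event-study cell, and the specific "Assumption 7" message fragment is an assumption - the confirmed contract is one A7 UserWarning and zero RuntimeWarnings:

```python
import warnings


def test_event_study_warning_policy_matches_notebook():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Mirror the notebook: only Accelerate matmul RuntimeWarnings are
        # ignored; everything else gets recorded.
        warnings.filterwarnings(
            "ignore", category=RuntimeWarning, message=r".*encountered in matmul"
        )
        run_event_study_fit()  # placeholder for the notebook's L_max=2 fit

    user_warnings = [w for w in caught if issubclass(w.category, UserWarning)]
    runtime_warnings = [w for w in caught if issubclass(w.category, RuntimeWarning)]

    # Exactly one visible warning: the A7 (leavers-present) UserWarning the
    # markdown explains. "Assumption 7" is an assumed message fragment.
    assert len(user_warnings) == 1
    assert "Assumption 7" in str(user_warnings[0].message)
    assert runtime_warnings == []
```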

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 6a663289ddec907def490f58bfa82d0357017757


Overall Assessment

Looks good

Static review only: this environment lacks pytest and numpy, so I could not execute tests/test_t19_marketing_pulse_drift.py or the notebook locally.

Executive Summary

  • No estimator or registry code changed, and the rewritten tutorial remains consistent with the documented dCDH contract for Phase 1 placebo NaN SE, dynamic placebo inference, A7 warning semantics, and A11 handling (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L160-L188, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L228-L269, docs/methodology/REGISTRY.md:L554-L606, diff_diff/chaisemartin_dhaultfoeuille.py:L1628-L1765, diff_diff/chaisemartin_dhaultfoeuille.py:L2231-L2295, diff_diff/chaisemartin_dhaultfoeuille.py:L2719-L3187).
  • The previous re-review P3s appear addressed: CI endpoints are now pinned by displayed rounding, the warning-policy test now asserts the exact visible event-study warning set, and the A7-muted fixture docstring is now accurate (tests/test_t19_marketing_pulse_drift.py:L77-L83, tests/test_t19_marketing_pulse_drift.py:L128-L166, tests/test_t19_marketing_pulse_drift.py:L216-L255).
  • I found no unmitigated P0/P1 issues in the changed diff.
  • One minor P3 remains: provenance comments point to _scratch/dcdh_tutorial helpers that are not present in this checkout (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L78, tests/test_t19_marketing_pulse_drift.py:L15-L16).

Methodology

No findings. The notebook’s statements about placebo=False, multi-horizon placebos at L_max=2, the informational A7 warning, and bootstrap percentile inference match the documented dCDH contract (docs/methodology/REGISTRY.md:L554-L606).

Code Quality

No findings.

Performance

No findings.

Maintainability

  • Severity P3. Impact: the notebook seed comment and the new drift-test docstring cite _scratch/dcdh_tutorial/40_build_notebook.py and _scratch/dcdh_tutorial/lock_seed.py as the provenance for the locked DGP/seed, but those files are not in the repo, so a maintainer cannot actually follow the breadcrumb from this checkout (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L78, tests/test_t19_marketing_pulse_drift.py:L15-L16). Concrete fix: either commit the referenced helper scripts or rewrite the comments to say the notebook/test themselves are the source of truth for the locked seed and quoted numbers.

Tech Debt

No findings. The tutorial’s Phase 1 placebo=False choice is compatible with the already-tracked DID_M^pl NaN-SE limitation in TODO.md:L59, so this is documented deferred work rather than a blocker.

Security

No findings.

Documentation/Tests

No new findings. The prior CI-rounding and warning-surface gaps look resolved by the direct one-decimal assertions and the exact warning-policy regression test (tests/test_t19_marketing_pulse_drift.py:L128-L166, tests/test_t19_marketing_pulse_drift.py:L216-L255).

@igerber igerber added the ready-for-ci label (Triggers CI test workflows) on Apr 25, 2026
Pure Python CI failed at test_event_study_ci_endpoints_match_quoted
because the bootstrap RNG path differs between Rust and pure-Python
backends (per the bit-identity-baseline-per-backend convention):

  Rust:        es[1] CI low = 11.394 -> rounds to 11.4 (matches prose)
  Pure Python: es[1] CI low = 11.487 -> rounds to 11.5 (mismatch)

The 0.09 backend gap is enough to flip the rounding boundary on the
exact-match `round(_, 1) == 11.4` pin I tightened to in the prior round.

Loosen the four bootstrap-CI endpoint asserts to a 0.15 absolute
tolerance band around the quoted prose values. Tight enough to catch
real prose drift (a real shift would move by >>0.15), loose enough to
absorb the documented backend variance. Verified on both backends
locally:

  pytest tests/test_t19_marketing_pulse_drift.py        -> 12/12 pass
  DIFF_DIFF_BACKEND=python pytest <same>                -> 12/12 pass

The analytical-SE endpoint asserts in test_overall_ci_endpoints_match_quoted
keep the strict `round(_, 1) ==` pin since they're not bootstrap-driven
and are bit-identical across backends.
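
A sketch of the two assertion regimes described above; the fixture names and the `ci_low`/`ci_high` accessors are assumptions, while the quoted values (11.4, 13.3, 11.3, 12.8) and the 0.15 band come from this thread:

```python
def test_ci_endpoint_pins(phase1_results, event_study_results):
    # Bootstrap-driven endpoints: a 0.15 absolute band around the quoted prose
    # values absorbs the documented ~0.09 Rust vs pure-Python RNG gap while
    # still failing on real drift.
    es1 = event_study_results.event_study_effects[1]
    assert abs(es1["ci_low"] - 11.4) <= 0.15
    assert abs(es1["ci_high"] - 13.3) <= 0.15

    # Analytical-SE endpoints are bit-identical across backends, so the strict
    # displayed-rounding pin stays.
    assert round(phase1_results.ci_low, 1) == 11.3
    assert round(phase1_results.ci_high, 1) == 12.8
```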

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber merged commit 75eab37 into main Apr 25, 2026
22 checks passed
@igerber igerber deleted the refactor/dcdh-tutorial-focus branch April 25, 2026 21:45

Labels

ready-for-ci Triggers CI test workflows
