Merged
18 commits
af7390f: Add profile_panel() + llms-autonomous.txt agent-facing pair (igerber, Apr 24, 2026)
0bc776b: Fix profile_panel() binary detection for degenerate panels (igerber, Apr 24, 2026)
0c7ba05: Address PR #356 CI review (3 P1 code + 2 P1 guide) (igerber, Apr 24, 2026)
109c83a: Address PR #356 CI review round 2 (2 P1 guide + 1 P1 code) (igerber, Apr 24, 2026)
5864081: Address PR #356 CI review round 3 (2 P1 guide + 1 P1 + 1 P2 code) (igerber, Apr 24, 2026)
c05c52f: Address PR #356 CI review round 4 (2 P1 guide + 2 P2 code/docs) (igerber, Apr 24, 2026)
4506741: Address PR #356 CI review round 5 (2 P2 docs) (igerber, Apr 24, 2026)
046e35c: Address PR #356 CI review round 6 (3 P2 methodology / docs) (igerber, Apr 24, 2026)
eea7aa5: Address PR #356 CI review round 7 (1 P1 guide + 1 P2 docs) (igerber, Apr 24, 2026)
e6c5b57: Address PR #356 CI review round 8 (2 P1 methodology + 1 P2 tests) (igerber, Apr 24, 2026)
57d42a0: Address PR #356 CI review round 9 (1 P1 + 1 P2 semantic) (igerber, Apr 24, 2026)
a65b5fa: Address PR #356 CI review round 10 (1 P1 + 1 P2 + 1 P3) (igerber, Apr 24, 2026)
9a95d2f: Address PR #356 CI review round 11 (1 P1 guide + 1 P2 test) (igerber, Apr 24, 2026)
610b8aa: Address PR #356 CI review round 12 (1 P2 guide) (igerber, Apr 24, 2026)
2ba1010: Address PR #356 CI review round 13 (1 P1 guide + code) (igerber, Apr 24, 2026)
889b24a: Address PR #356 CI review round 14 (1 P1 guide) (igerber, Apr 24, 2026)
9d5f8f9: Address PR #356 CI review round 15 (1 P1 guide) (igerber, Apr 24, 2026)
ef6b53d: Address PR #356 CI review round 16 (1 P1 guide) (igerber, Apr 24, 2026)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
- **`ChaisemartinDHaultfoeuille.by_path`** — per-path event-study disaggregation, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Passing `by_path=k` (positive int) to the estimator reports separate `DID_{path,l}` + SE + inference for the top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]`, answering the practitioner question "is a single pulse enough, or do you need sustained exposure?" across paths like `(0,1,0,0)` vs `(0,1,1,0)` vs `(0,1,1,1)`. The per-path SE follows the joiners-only / leavers-only IF precedent (switcher-side contribution zeroed for non-path groups; control pool and cohort structure unchanged; plug-in SE with path-specific divisor). Requires `drop_larger_lower=False` (multi-switch groups are the object of interest) and `L_max >= 1`. Binary treatment only in this release; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, `survey_design`, and `n_bootstrap > 0` raise `NotImplementedError` and are deferred to follow-up PRs. Results expose `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract.
- **R-parity for `ChaisemartinDHaultfoeuille.by_path`** against `DIDmultiplegtDYN 2.3.3`. Two new scenarios in `benchmarks/data/dcdh_dynr_golden_values.json` generated from `did_multiplegt_dyn(..., by_path=k)`: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 observed paths, `by_path=3`, via a new deterministic multi-path DGP pattern in the R generator). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path scenario, ≤ 4.2% on the 4-path scenario). Parity tests live at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath`, matching paths by tuple label via set-equality (robust to R's undocumented frequency-tie tiebreak) and cross-checking per-path switcher counts before SE comparison. **Deviation documented:** cross-path cohort sharing — our full-panel cohort-centered plug-in vs R's per-path re-run diverges materially when a `(D_{g,1}, F_g, S_g)` cohort spans multiple observed paths; the two coincide when every cohort is single-path. The parity scenarios are constructed to keep cohorts single-path (scenario 13 by design, scenario 14 via path-assignment-deterministic-on-F_g). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path...)` for the full write-up.
- **`profile_panel()` utility + `llms-autonomous.txt` reference guide (agent-facing)** — new `diff_diff.profile_panel(df, *, unit, time, treatment, outcome)` returns a frozen `PanelProfile` dataclass of structural facts (panel balance, treatment-type classification — `"binary_absorbing"` / `"binary_non_absorbing"` / `"continuous"` / `"categorical"`, cohort structure, outcome characteristics, and a `tuple[Alert, ...]` of factual observations). `.to_dict()` returns a JSON-serializable view. Paired with a new bundled `"autonomous"` variant on `get_llm_guide()` — `get_llm_guide("autonomous")` returns a reference-shaped guide (distinct from the existing workflow-prose `"practitioner"` variant) with §1 audience disclaimer, §2 `PanelProfile` field reference, §3 embedded 17-estimator × 9-design-feature support matrix, §4 per-design-feature reasoning citing Baker et al. (2025) and Roth / Sant'Anna (2023), §5 post-fit validation index, §6 BR/DR schema reference, §7 citations, §8 intentional omissions. Both pieces are bundled inside the wheel (no GitHub / RTD dependency at runtime); `diff_diff/__init__.py` module docstring leads with an agent-entry block listing `profile_panel`, `get_llm_guide("autonomous")`, `get_llm_guide("practitioner")`, and `BusinessReport` so `help(diff_diff)` surfaces them. Descriptive, not opinionated — `profile_panel` alerts never recommend a specific estimator, and the guide enumerates trade-offs rather than dispatching. Exports: `profile_panel`, `PanelProfile`, `Alert` from top-level `diff_diff`.
- **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0)** — `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change per the REPORTING.md stability policy. BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
- SyntheticDiD coverage Monte Carlo calibration table added to `docs/methodology/REGISTRY.md` §SyntheticDiD — rejection rates at α ∈ {0.01, 0.05, 0.10} across `placebo` / `bootstrap` / `jackknife` on 3 representative DGPs (balanced / exchangeable, unbalanced, and Arkhangelsky et al. (2021) AER §6.3 non-exchangeable). Artifact at `benchmarks/data/sdid_coverage.json` (500 seeds × B=200), regenerable via `benchmarks/python/coverage_sdid.py`.
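The path-ranking rule in the `by_path` entry above (top-k paths by frequency, ties broken lexicographically on the path tuple, overflow returning every observed path with a `UserWarning`) can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: `extract_path` and `top_k_paths` are hypothetical helpers, and each group's binary treatment history is assumed to be a list indexed by period, with `f_g` the index of its first switch.

```python
import warnings
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Path = Tuple[int, ...]


def extract_path(d: Sequence[int], f_g: int, l_max: int) -> Path:
    """Treatment values over the window [F_g - 1, F_g - 1 + L_max], inclusive.

    Hypothetical helper: `d` is one group's treatment history indexed by
    period and `f_g` is the index of the group's first switch.
    """
    start = f_g - 1
    return tuple(d[start : start + l_max + 1])


def top_k_paths(paths: List[Path], k: int) -> List[Path]:
    """Rank observed paths by frequency, breaking ties lexicographically."""
    counts: Dict[Path, int] = Counter(paths)
    # Descending count first, then ascending path tuple for a deterministic order.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    if k > len(ranked):
        warnings.warn(
            f"by_path={k} exceeds the {len(ranked)} observed paths; returning all of them.",
            UserWarning,
        )
    return ranked[:k]


# Five groups that all switch at period index 1, observed for L_max = 3 horizons.
histories = [[0, 1, 0, 0], [0, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 0, 0]]
paths = [extract_path(d, f_g=1, l_max=3) for d in histories]
print(top_k_paths(paths, k=2))  # [(0, 1, 0, 0), (0, 1, 1, 1)]
```

Both `(0, 1, 0, 0)` and `(0, 1, 1, 1)` occur twice here, so the lexicographic tie-break alone decides that the single-pulse path ranks first.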

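Reading the SyntheticDiD rejection rates reported in the CHANGELOG entry requires a sense of how much Monte Carlo noise 500 seeds leave. A minimal sketch, assuming each seed yields one p-value under a true null and that a test rejects when p ≤ α (the artifact's exact convention is not stated here). `rejection_rates` and `mc_standard_error` are hypothetical helpers, not functions from `benchmarks/python/coverage_sdid.py`.

```python
import math
from typing import Dict, Sequence

ALPHAS = (0.01, 0.05, 0.10)


def rejection_rates(p_values: Sequence[float], alphas=ALPHAS) -> Dict[float, float]:
    """Share of seeds whose p-value is at or below each nominal level."""
    n = len(p_values)
    return {a: sum(p <= a for p in p_values) / n for a in alphas}


def mc_standard_error(alpha: float, n_seeds: int) -> float:
    """Binomial standard error of an empirical rejection rate for an exactly sized test."""
    return math.sqrt(alpha * (1.0 - alpha) / n_seeds)


for a in ALPHAS:
    se = mc_standard_error(a, n_seeds=500)
    print(f"alpha={a:.2f}: MC SE ~ {se:.4f}, ~95% band {a - 1.96 * se:.3f} to {a + 1.96 * se:.3f}")
```

At α = 0.05 with 500 seeds the standard error is about 0.0097, so empirical rates between roughly 0.031 and 0.069 are consistent with correct size.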
10 changes: 6 additions & 4 deletions ROADMAP.md
@@ -137,15 +137,17 @@ Long-running program, framed as "building toward" rather than with discrete ship

- Baker et al. (2025) 8-step workflow enforcement in `diff_diff/practitioner.py`.
- `practitioner_next_steps()` context-aware guidance.
- Runtime LLM guides via `get_llm_guide(...)` (`llms.txt`, `llms-full.txt`, `llms-practitioner.txt`), bundled in the wheel.
- Runtime LLM guides via `get_llm_guide(...)` (`llms.txt`, `llms-full.txt`, `llms-practitioner.txt`, `llms-autonomous.txt`), bundled in the wheel.
- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender.
- Package docstring leads with a "For AI agents" entry block so `help(diff_diff)` surfaces the agent entry points automatically.
- Silent-operation warnings so agents and humans see the same signals at the same time.
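The binary treatment-type labels that `PanelProfile` reports (`"binary_absorbing"` / `"binary_non_absorbing"`, per the CHANGELOG entry) can be illustrated with a small stdlib-only sketch. This is an assumption-laden illustration, not `profile_panel`'s actual rule: `classify_binary_treatment` is a hypothetical helper, each unit's treatment values are assumed to be in time order, and anything non-binary is returned as `"non_binary"` because how the library separates `"continuous"` from `"categorical"` is not described here.

```python
from typing import Dict, Sequence


def classify_binary_treatment(histories: Dict[str, Sequence[float]]) -> str:
    """Label a panel's treatment as absorbing binary, non-absorbing binary, or non-binary.

    Hypothetical helper: `histories` maps each unit id to its treatment
    values sorted by time.
    """
    values = {v for seq in histories.values() for v in seq}
    if not values <= {0, 1}:
        return "non_binary"
    for seq in histories.values():
        # Absorbing means no unit ever moves from treated (1) back to untreated (0).
        if any(prev == 1 and cur == 0 for prev, cur in zip(seq, seq[1:])):
            return "binary_non_absorbing"
    return "binary_absorbing"


panel = {"unit_a": [0, 0, 1, 1], "unit_b": [0, 0, 0, 0]}
print(classify_binary_treatment(panel))  # binary_absorbing
```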

**Next blocks toward the vision.**

- **BusinessReport / DiagnosticReport** (in Shipping Next) - the output form the vision assumes.
- **Post-hoc mismatch detection in BR/DR output** - surfaces structured warnings like "you fit TWFE on staggered data with 37% forbidden-comparison weights" when the profile and the fitted estimator disagree. Safety net, not a pre-emptive rules engine.
- **Structured `sanity_checks` block in BR/DR** - machine-legible pass / warn / fail signals (pretrends, power, forbidden-comparisons, event-study cleanliness, placebo, sensitivity) so agents can dispatch on a stable schema rather than parsing prose.
- **Context-aware `practitioner_next_steps()`** that substitutes actual column names - turns guidance into executable recommendations.
- **AI-legible diagnostic surfaces** - once BusinessReport ships, a structured JSON counterpart that agents can parse without screen-scraping human text.
- **Scenario-to-estimator selection guidance** - agent-facing extension of `docs/practitioner_decision_tree.rst` that returns a specific estimator choice plus rationale for a given scenario description.
- **Unified `assess_*` verb** across estimator native-diagnostic methods for a single discoverable convention.
- **End-to-end scenario walkthrough templates** - reusable orchestration recipes an agent can adapt from data ingest through business-ready output.

---
25 changes: 18 additions & 7 deletions diff_diff/__init__.py
@@ -4,14 +4,20 @@
This library provides sklearn-like estimators for causal inference
using the difference-in-differences methodology.

For rigorous analysis, follow the 8-step practitioner workflow based
on Baker et al. (2025). After estimation, call
``practitioner_next_steps(results)`` for context-aware guidance on
remaining diagnostic steps.
For AI agents:

AI agents: call ``diff_diff.get_llm_guide()`` for a complete API reference.
Use ``get_llm_guide("practitioner")`` for the 8-step workflow or
``get_llm_guide("full")`` for comprehensive documentation.
1. Describe your data: ``diff_diff.profile_panel(df, unit=..., time=...,
treatment=..., outcome=...)``
2. Consult the reference: ``diff_diff.get_llm_guide("autonomous")``
(estimator-support matrix + reasoning)
3. Follow the workflow: ``diff_diff.get_llm_guide("practitioner")``
(Baker et al. (2025) 8-step recipe)
4. Report results: ``diff_diff.BusinessReport(results)``
(structured agent-legible output)

For a comprehensive API reference call ``diff_diff.get_llm_guide("full")``;
``practitioner_next_steps(results)`` returns context-aware guidance after
any estimator's ``fit()``.
"""

# Import backend detection from dedicated module (avoids circular imports)
@@ -244,6 +250,7 @@
DiagnosticReportResults,
)
from diff_diff._guides_api import get_llm_guide
from diff_diff.profile import Alert, PanelProfile, profile_panel
from diff_diff.datasets import (
clear_cache,
list_datasets,
@@ -487,6 +494,10 @@
"DiagnosticReport",
"DiagnosticReportResults",
"DIAGNOSTIC_REPORT_SCHEMA_VERSION",
# Panel profiling (agent-facing pre-fit describe utility)
"profile_panel",
"PanelProfile",
"Alert",
# LLM guide accessor
"get_llm_guide",
]
10 changes: 7 additions & 3 deletions diff_diff/_guides_api.py
@@ -1,4 +1,5 @@
"""Runtime accessor for bundled LLM guide files."""

from __future__ import annotations

from importlib.resources import files
@@ -7,6 +8,7 @@
"concise": "llms.txt",
"full": "llms-full.txt",
"practitioner": "llms-practitioner.txt",
"autonomous": "llms-autonomous.txt",
}


@@ -21,6 +23,10 @@ def get_llm_guide(variant: str = "concise") -> str:
- ``"concise"`` -- compact API reference (llms.txt)
- ``"full"`` -- complete API documentation (llms-full.txt)
- ``"practitioner"`` -- 8-step practitioner workflow (llms-practitioner.txt)
- ``"autonomous"`` -- reference guide for AI-agent use: estimator-support
matrix, per-design-feature reasoning, post-fit validation index, and
BR/DR schema (llms-autonomous.txt). Pair with
:func:`diff_diff.profile_panel` for pre-fit data description.

Returns
-------
@@ -42,7 +48,5 @@ def get_llm_guide(variant: str = "concise") -> str:
filename = _VARIANT_TO_FILE[variant]
except (KeyError, TypeError):
valid = ", ".join(repr(k) for k in _VARIANT_TO_FILE)
raise ValueError(
f"Unknown guide variant {variant!r}. Valid options: {valid}."
) from None
raise ValueError(f"Unknown guide variant {variant!r}. Valid options: {valid}.") from None
return files("diff_diff.guides").joinpath(filename).read_text(encoding="utf-8")
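The `except (KeyError, TypeError)` clause in `get_llm_guide` covers two distinct failure modes of the dictionary lookup: an unknown string raises `KeyError`, while an unhashable argument such as a list raises `TypeError` before any key comparison happens. A minimal standalone sketch of the same pattern follows; the mapping is copied from the diff, while `resolve_variant` is a hypothetical name and the sketch skips the `importlib.resources` read.

```python
from typing import Any

_VARIANT_TO_FILE = {
    "concise": "llms.txt",
    "full": "llms-full.txt",
    "practitioner": "llms-practitioner.txt",
    "autonomous": "llms-autonomous.txt",
}


def resolve_variant(variant: Any) -> str:
    """Map a guide variant to its bundled filename, or raise ValueError."""
    try:
        # KeyError for an unknown string, TypeError for an unhashable value.
        return _VARIANT_TO_FILE[variant]
    except (KeyError, TypeError):
        valid = ", ".join(repr(k) for k in _VARIANT_TO_FILE)
        raise ValueError(f"Unknown guide variant {variant!r}. Valid options: {valid}.") from None


for bad in ("agents", ["autonomous"]):
    try:
        resolve_variant(bad)
    except ValueError as exc:
        print(exc)
```

Normalizing both errors to `ValueError` (with `from None` suppressing the chained traceback) gives callers a single exception type to handle.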