83 changes: 83 additions & 0 deletions BRIEFING.md
@@ -0,0 +1,83 @@
# SDID Practitioner Validation Tooling - Briefing

## Problem

A data scientist runs `SyntheticDiD`, gets an ATT and a p-value, and then
faces the question: *should I trust this estimate?* The library gives them the
point estimate and inference, but the validation workflow - the steps between
"I got a number" and "I'm confident enough to present this" - is largely
left to the practitioner to assemble from scratch.

The standard validation workflow for synthetic control methods is well
understood in the econometrics literature (Arkhangelsky et al. 2021,
Abadie et al. 2010, Abadie 2021). The pieces include pre-treatment fit
assessment, weight diagnostics, placebo/falsification tests, sensitivity
analysis, and cross-estimator comparison. Our library provides some of the
raw ingredients (pre-treatment RMSE, weight dicts, placebo effects array)
but doesn't connect them into an accessible diagnostic workflow.

The gap is most visible in `practitioner.py`, where `_handle_synthetic`
recommends in-time placebos and leave-one-out analysis but backs them with
nothing more than pseudo-code in comments. A practitioner following that
guidance hits a wall.

## Current state

What we have today:

- `results.pre_treatment_fit` (RMSE) with a warning when it exceeds the
treated pre-period SD
- `results.get_unit_weights_df()` and `results.get_time_weights_df()`
- Three variance methods: placebo (default), bootstrap, and jackknife (just
landed in v3.1.1)
- `results.placebo_effects` - stores per-iteration estimates for all three
variance methods, but for jackknife these are positional LOO estimates
with no unit labels
- `results.summary()` shows top-5 unit weights and count of non-trivial weights
- `practitioner.py` guidance that names the right steps but can't point to
runnable code for most of them

What the practitioner must currently build themselves:

- Mapping jackknife LOO estimates back to unit identities to answer "which
unit, when dropped, changes my estimate the most?"
- In-time placebo tests (re-estimate with a fake treatment date)
- Any weight concentration metric beyond eyeballing the sorted list (see
  the sketch after this list)
- Any sense of whether their RMSE is "bad enough to worry about" beyond
the binary warning
- Regularization sensitivity (does the ATT change if I perturb zeta?)
- Pre-treatment trajectory data for plotting (the Y matrices are internal
to `fit()` and not returned)
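
To make the weight-concentration gap above concrete, here is a minimal sketch
of an effective-N metric computed from the existing `get_unit_weights_df()`
output. The `weight` column name and the usage shown in the trailing comments
are assumptions, not documented API.

```python
import numpy as np
import pandas as pd


def weight_concentration(weights_df: pd.DataFrame,
                         weight_col: str = "weight",
                         top_k: int = 5) -> dict:
    """Summarize how concentrated the unit weights are.

    Effective N is the inverse Herfindahl index 1 / sum(w_i^2) of the
    normalized weights: it equals the number of units when weights are
    uniform and approaches 1 when a single control dominates.
    """
    w = weights_df[weight_col].to_numpy(dtype=float)
    w = w / w.sum()  # normalize defensively
    effective_n = 1.0 / float(np.sum(w ** 2))
    top_k_share = float(np.sort(w)[::-1][:top_k].sum())
    return {"effective_n": effective_n, "top_k_share": top_k_share}


# Hypothetical usage against the existing results API:
# conc = weight_concentration(results.get_unit_weights_df())
# print(f"Effective N: {conc['effective_n']:.1f}, "
#       f"top-5 share: {conc['top_k_share']:.2%}")
```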

## Context from prior discussion

The jackknife work created an interesting opportunity. The delete-one-re-estimate
loop already runs for SE computation. The per-unit ATT estimates are stored in
`results.placebo_effects`. The missing piece is a presentation layer that maps
those estimates to unit identities and surfaces the diagnostic interpretation
(which units are influential, how stable is the estimate to unit composition).
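
A minimal sketch of what that presentation layer could look like, assuming the
caller can supply the unit ordering the jackknife loop used and the
full-sample ATT (the ordering, the `ordered_unit_ids` variable, and the
`results.att` attribute are assumptions):

```python
import numpy as np
import pandas as pd


def label_loo_effects(loo_estimates, unit_ids, full_att) -> pd.DataFrame:
    """Attach unit labels to positional leave-one-out ATT estimates.

    loo_estimates : per-iteration ATTs, one per dropped unit (positional)
    unit_ids      : unit identifiers in the same order the LOO loop used
    full_att      : ATT from the full sample, used to measure influence
    """
    df = pd.DataFrame({
        "dropped_unit": list(unit_ids),
        "loo_att": np.asarray(loo_estimates, dtype=float),
    })
    df["influence"] = (df["loo_att"] - full_att).abs()
    return df.sort_values("influence", ascending=False).reset_index(drop=True)


# Hypothetical usage:
# loo_df = label_loo_effects(results.placebo_effects, ordered_unit_ids, results.att)
# print(loo_df.head(10))  # which dropped unit moves the ATT the most?
```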

More broadly, the validation gaps fall into two categories:

1. **Low-marginal-cost additions** - things where the computation already
exists and we just need to expose or label it (LOO diagnostic from
jackknife, weight concentration metrics, trajectory data extraction)

2. **New functionality** - things that require new estimation loops or
helpers (in-time placebo, regularization sensitivity sweep)
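
As a rough illustration of the first item in that second category, a minimal
in-time placebo loop. The `treatment_date` keyword and the `res.att`
attribute are assumptions about the fit API, not its actual signature:

```python
import pandas as pd


def in_time_placebo_atts(estimator_factory, data: pd.DataFrame,
                         fake_treatment_dates, **fit_kwargs) -> pd.DataFrame:
    """Re-estimate the ATT at fake treatment dates inside the pre-period.

    estimator_factory    : callable returning a fresh estimator,
                           e.g. lambda: SyntheticDiD()
    fake_treatment_dates : candidate placebo dates, all strictly before
                           the real treatment date
    """
    rows = []
    for date in fake_treatment_dates:
        est = estimator_factory()
        # Assumption: fit() takes the (placebo) treatment date as a keyword
        # argument; adjust to the library's actual signature.
        res = est.fit(data, treatment_date=date, **fit_kwargs)
        rows.append({"placebo_date": date, "placebo_att": res.att})
    return pd.DataFrame(rows)


# A credible design should produce placebo ATTs near zero across the grid.
```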

The practitioner guidance in `practitioner.py` should evolve alongside any
new tooling so that the recommended steps point to real, runnable code paths.
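
For instance, the regularization sweep named above could follow the same
pattern; a hedged sketch, assuming `zeta_omega` can be injected into the
estimator and that `fit()` returns a results object with an `att` attribute
(both are assumptions):

```python
import pandas as pd


def zeta_omega_sensitivity(estimator_factory, data, zeta_grid,
                           **fit_kwargs) -> pd.DataFrame:
    """Re-fit across a grid of unit-weight regularization values.

    estimator_factory : callable taking a zeta value and returning a fresh
                        estimator, e.g. lambda z: SyntheticDiD(zeta_omega=z)
    zeta_grid         : iterable of candidate zeta_omega values
    """
    rows = []
    for zeta in zeta_grid:
        est = estimator_factory(zeta)
        res = est.fit(data, **fit_kwargs)  # assumed fit(data, ...) signature
        rows.append({"zeta_omega": zeta, "att": res.att})
    return pd.DataFrame(rows)
```

A flat ATT profile across the grid is the reassuring outcome; large swings
suggest the estimate hinges on the regularization choice.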

## What "done" looks like

A practitioner using SyntheticDiD should be able to follow a credible
validation workflow using library-provided tools and guidance, without
needing to reverse-engineer internals or write substantial boilerplate.
The validation steps recognized in the literature should either be directly
supported or have clear, concrete guidance for how to perform them with
the library's API.

This is not about adding visualization or plotting (that's a separate
concern). It's about making the computational and diagnostic building
blocks accessible and well-documented through the results API and
practitioner guidance.
63 changes: 51 additions & 12 deletions diff_diff/practitioner.py
@@ -505,35 +505,74 @@ def _handle_synthetic(results: Any):
steps = [
_step(
baker_step=6,
label="Check pre-treatment fit quality",
label="Check pre-treatment fit and weight concentration",
why=(
"Synthetic DiD relies on pre-treatment fit to construct "
"weights. Poor fit suggests the synthetic control may not "
"approximate the counterfactual well."
"weights. Poor fit or highly concentrated unit weights "
"suggest the synthetic control may not approximate the "
"counterfactual well."
),
code=(
"# Check pre-treatment fit and unit weight concentration:\n"
"print(f'Pre-treatment fit (RMSE): {results.pre_treatment_fit:.4f}')\n"
"# Highly concentrated weights suggest fragile estimates"
"concentration = results.get_weight_concentration()\n"
"print(f\"Effective N: {concentration['effective_n']:.1f}\")\n"
"print(f\"Top-5 weight share: {concentration['top_k_share']:.2%}\")"
),
step_name="sensitivity",
),
_step(
baker_step=6,
label="In-time or in-space placebo",
label="In-time placebo",
why=(
"Test robustness by re-estimating on a placebo treatment "
"period (in-time) or excluding treated units one at a time "
"(leave-one-out). These are the natural falsification "
"checks for synthetic control methods."
"Re-estimate on shifted fake treatment dates in the "
"pre-period. A credible design yields near-zero placebo "
"ATTs — departures signal that something is being picked "
"up pre-treatment, weakening the causal interpretation."
),
code=(
"# In-time placebo: re-estimate with a fake treatment date\n"
"# Leave-one-out: drop each treated unit and re-estimate"
"placebo_df = results.in_time_placebo()\n"
"print(placebo_df)"
),
priority="medium",
step_name="sensitivity",
),
_step(
baker_step=6,
label="Leave-one-out influence (jackknife)",
why=(
"If the estimate is driven by a single unit, robustness "
"is weak. Fit with variance_method='jackknife' and inspect "
"which units move the ATT the most."
),
code=(
"# Requires variance_method='jackknife' AND enough support for LOO\n"
"# (n_treated >= 2 and >= 2 effective-weight controls).\n"
"if getattr(results, '_loo_unit_ids', None) is not None:\n"
" loo_df = results.get_loo_effects_df()\n"
" print(loo_df.head(10))\n"
"else:\n"
" print('LOO not available - re-fit with '\n"
" 'variance_method=\"jackknife\" and ensure >=2 treated units '\n"
" 'with positive effective support.')"
),
priority="medium",
step_name="sensitivity",
),
_step(
baker_step=6,
label="Regularization sensitivity (zeta_omega)",
why=(
"The unit-weight regularization is auto-selected from "
"data. Show whether the ATT moves materially across a "
"grid of values to gauge robustness to this choice."
),
code=(
"sens_df = results.sensitivity_to_zeta_omega()\n"
"print(sens_df)"
),
priority="low",
step_name="sensitivity",
),
_step(
baker_step=8,
label="Compare with staggered estimators (CS, SA)",