Merged
1 change: 1 addition & 0 deletions TODO.md
@@ -98,6 +98,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | Sphinx autodoc fails to import 3 result members: `DiDResults.ci`, `MultiPeriodDiDResults.att`, `CallawaySantAnnaResults.aggregate` — investigate whether these are renamed/removed or just unresolvable from autosummary template | `docs/api/results.rst`, `docs/api/staggered.rst` | — | Medium |
 | `EDiDBootstrapResults` cross-reference is ambiguous — class is exported from both `diff_diff` and `diff_diff.efficient_did_bootstrap`, producing 3 "more than one target found" warnings. Add `:noindex:` to one source or use full-path refs | `diff_diff/efficient_did_results.py`, `docs/api/efficient_did.rst` | — | Low |
 | Tracked Sphinx autosummary stubs in `docs/api/_autosummary/*.rst` are stale — every sphinx build regenerates them with new attributes (e.g., `coef_var`, `survey_metadata`) that have been added to result classes. Either commit a refresh or move the directory to `.gitignore` and treat as build output. Also 6 untracked stubs exist for newer estimators (`WooldridgeDiD`, `SimulationMDEResults`, etc.) that have never been committed. | `docs/api/_autosummary/` | — | Low |
+| HonestDiD `test_m0_short_circuit` uses wall-clock `elapsed < 0.5s` as a proxy for "short-circuit path taken" instead of calling the full optimizer. Replace with a direct correctness signal (mock/spy the optimizer or check a state flag) so the test doesn't depend on CI timing. Not flaky today at 500ms, but load-bearing correctness on a timing proxy is brittle. | `tests/test_methodology_honest_did.py:246` | — | Low |

 ---

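The new TODO row suggests replacing the timing proxy with a mock/spy on the optimizer. A minimal sketch of that idea follows, using only `unittest.mock`. `ToySolver`, `_run_optimizer`, and `bound` are hypothetical stand-ins invented for illustration; they are not the HonestDiD code, whose internals this diff does not show.

```python
from unittest import mock


class ToySolver:
    """Hypothetical stand-in for an estimator with a short-circuit path."""

    def _run_optimizer(self, m):
        # Placeholder for the expensive path the real test wants to avoid.
        return float(sum(i * m for i in range(1000)))

    def bound(self, m):
        if m == 0:
            # Short-circuit: no optimization needed when m is zero.
            return 0.0
        return self._run_optimizer(m)


def test_m0_short_circuit_via_spy():
    solver = ToySolver()
    # wraps= keeps the real behaviour while recording every call.
    with mock.patch.object(
        solver, "_run_optimizer", wraps=solver._run_optimizer
    ) as spy:
        assert solver.bound(0) == 0.0
        spy.assert_not_called()  # direct signal: optimizer was skipped

        solver.bound(1)
        spy.assert_called_once_with(1)  # the full path still reaches it
```

Asserting on the spy makes the test independent of runner speed: it passes or fails on whether the optimizer was invoked, not on how long the call took.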
17 changes: 15 additions & 2 deletions tests/test_se_accuracy.py
@@ -252,12 +252,18 @@ def test_se_vs_r_benchmark(self):
         assert se_diff_pct < 0.01, \
             f"SE differs from R by {se_diff_pct:.4f}%, expected <0.01%"

+    @pytest.mark.slow
     def test_timing_performance(self, cs_results):
         """
         Ensure estimation timing doesn't regress.

         Baseline: ~0.005s for 200 units x 8 periods (small scale)
-        Threshold: <0.1s (20x margin for CI variance)
+        Threshold: <0.1s.
+
+        Excluded from default CI via ``@pytest.mark.slow`` — wall-clock time
+        on shared runners is noisy (BLAS path variation, neighbor VM
+        contention, cold caches) and produces false positives. Run locally
+        with ``pytest -m slow`` for ad-hoc performance sanity checks.
         """
         _, elapsed = cs_results

@@ -398,8 +404,15 @@ def test_influence_function_normalization(self):
             f"Python SE {se_py:.4f} doesn't match standard {se_standard:.4f}"


+@pytest.mark.slow
 class TestPerformanceRegression:
-    """Tests to prevent performance regression."""
+    """Tests to prevent performance regression.
+
+    Excluded from default CI via ``@pytest.mark.slow`` — wall-clock time on
+    shared runners is noisy (BLAS path variation, neighbor VM contention,
+    cold caches) and produces false positives. Run locally with
+    ``pytest -m slow`` for ad-hoc performance sanity checks.
+    """

     @pytest.mark.parametrize("n_units,max_time", [
         (100, 0.15),  # Small: <150ms (CI runners need headroom)
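The docstrings in this diff rely on a `slow` marker, but the diff does not show how that marker is registered or how default CI deselects it. One common arrangement, sketched here as an assumption rather than as this repository's actual setup, is to register the marker in a `conftest.py` hook and have CI invoke `pytest -m "not slow"`. The file location and the help text are hypothetical.

```python
# conftest.py (hypothetical location; the diff does not show the real one)

# Help text shown by `pytest --markers`; the wording is illustrative only.
SLOW_MARKER_HELP = (
    "slow: wall-clock-sensitive tests, excluded from default CI runs"
)


def pytest_configure(config):
    # Registering the marker avoids pytest's unknown-mark warning and
    # makes `-m slow` / `-m "not slow"` selections self-documenting.
    config.addinivalue_line("markers", SLOW_MARKER_HELP)
```

With the marker registered, `pytest -m "not slow"` deselects both the decorated method and every test in the decorated class, while `pytest -m slow` runs only those tests, matching the local workflow the docstrings describe.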