Replace scipy.stats.linregress with inlined OLS in numpy #48

Merged
scottstanie merged 2 commits into opera-adt:main from scottstanie:refactor/drop-scipy-linregress
Apr 24, 2026

@scottstanie
Collaborator

Summary

The only use of scipy in the whole repo was a single stats.linregress(x, y) call in utils.calculate_trend. It needed only three of the five return values (slope, intercept, and r_value, which is squared to give r_squared); p_value and std_err were discarded.

Replaced with 7 lines of numpy. Same formula scipy uses, with dx = x − x̄ and dy = y − ȳ:

  • slope = Σ dx·dy / Σ dx²
  • intercept = ȳ − slope · x̄
  • r² = (Σ dx·dy)² / (Σ dx² · Σ dy²)
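
For readers who want to see the three formulas above as code, here is a minimal numpy sketch. It is an illustration, not the code merged in this PR: the function name `ols_trend` is hypothetical, and dropping NaN pairs before fitting is an assumption about how missing samples are handled.

```python
import numpy as np


def ols_trend(x, y):
    """Hypothetical helper: closed-form OLS returning (slope, intercept, r_squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: ignore any pair where either value is NaN or infinite.
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]

    # Deviations from the means.
    dx = x - x.mean()
    dy = y - y.mean()

    # Sums of products used by all three formulas.
    sxy = np.sum(dx * dy)
    sxx = np.sum(dx * dx)
    syy = np.sum(dy * dy)

    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    r_squared = sxy**2 / (sxx * syy)
    return slope, intercept, r_squared
```

Note that a constant x (Σ dx² = 0) or a constant y (Σ dy² = 0) leads to a division by zero; the sketch does not guard against either case.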

Numerical parity verified: on a synthesized 40-point series with NaN injected, scipy and the new code agree to ~1e-20 on slope, intercept, and r² (below float64 round-off).
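
A check of this kind could look roughly like the sketch below. It is not the script used for this PR: the series, seed, and NaN positions are made up, and it compares against `np.polyfit` and `np.corrcoef` instead of scipy so that the snippet runs without scipy installed. Swapping the reference for `scipy.stats.linregress` would reproduce the comparison described above.

```python
import numpy as np


def parity_check(seed=0):
    """Compare closed-form OLS with a numpy reference on a 40-point series."""
    rng = np.random.default_rng(seed)

    # Synthesized series (assumed shape): linear trend plus noise, 3 NaNs injected.
    x = np.arange(40, dtype=float)
    y = 0.5 * x + 3.0 + rng.normal(scale=0.1, size=x.size)
    y[[5, 17, 30]] = np.nan

    # Keep only samples where y is finite.
    ok = np.isfinite(y)
    xv, yv = x[ok], y[ok]

    # Closed-form OLS, following the formulas in the summary.
    dx, dy = xv - xv.mean(), yv - yv.mean()
    sxy, sxx, syy = (dx * dy).sum(), (dx * dx).sum(), (dy * dy).sum()
    slope = sxy / sxx
    ours = (slope, yv.mean() - slope * xv.mean(), sxy**2 / (sxx * syy))

    # Reference: degree-1 least-squares fit and squared Pearson correlation.
    ref_slope, ref_intercept = np.polyfit(xv, yv, 1)
    ref = (ref_slope, ref_intercept, np.corrcoef(xv, yv)[0, 1] ** 2)
    return ours, ref


ours, ref = parity_check()
# Largest absolute difference across slope, intercept, and r².
print(np.abs(np.subtract(ours, ref)).max())
```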

scipy is still a transitive dep via xarray/matplotlib/etc., so pixi install won't get smaller. This just cuts the direct import — one fewer library bowser needs to cooperate with.

Drive-by

Fixed a pre-existing E501 in an unrelated docstring in the same file; ruff doesn't flag it until a full-file recheck on commit.

Test plan

  • Numerical parity check (see PR body)
  • ruff + ruff-format pass
  • mypy was skipped locally — the pre-commit config for the mypy hook is missing types-python-dateutil in additional_dependencies, so it fails on every commit touching utils.py on current main. Separate cleanup item, not introduced by this PR. CI doesn't run pre-commit.

🤖 Generated with Claude Code

scottstanie and others added 2 commits April 24, 2026 14:43
calculate_trend was the only use of scipy in the repo, and it only
needed three of linregress's five return values (slope, intercept,
r_value → r_squared). p_value and std_err were both discarded.

Drops the `from scipy import stats` import in favour of a 7-line
OLS that matches scipy's formula exactly (verified numerically:
slope/intercept/r² agree to ~1e-20 on a synthesized 40-point
time-series with injected NaNs).

scipy remains a transitive dependency via xarray/matplotlib — this
just cuts the direct coupling.

Also fixes a drive-by E501 in an unrelated docstring that ruff only
flags on full-file recheck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@scottstanie scottstanie merged commit 0d28df2 into opera-adt:main Apr 24, 2026
0 of 2 checks passed
@scottstanie scottstanie deleted the refactor/drop-scipy-linregress branch April 24, 2026 18:44