
test(0.2): memory benchmark suite + ceiling tests (Track 9.10) #153

Open

pmclSF wants to merge 1 commit into main from
feat/0.2-track9-memory-benchmarks

Conversation


@pmclSF pmclSF commented May 3, 2026

Summary

Adds memory benchmarks + ceiling tests to plug the missing axis in
the bench surface. Existing benches measure CPU; this PR adds
allocation reporting + ceiling enforcement so memory regressions
surface as test failures, not adopter complaints.

  • Allocation benchmarks wrap existing analysis paths with
    b.ReportAllocs() for bytes/op + allocs/op tracking
  • Ceiling tests assert peak heap growth at 1k / 5k file
    scales + a leak-detection check across 5 repeated runs
  • Default-skipped behind TERRAIN_MEMORY_BENCH=1 (or the new
    make memory-bench target) so they don't slow the default
    go test loop

Notable finding

The leak-detection test reports ~1.5 GB heap growth across 5
iterations of analyzing a 500-file fixture — much higher than
the FileCache should permit. The test passes (the 2000 MB ceiling
only catches regressions beyond the current state), but an in-test
comment flags the issue as a Track 9.10 follow-up worth
investigating. A fix
that brings growth back to near-zero will pass with massive
headroom; at that point the ceiling should ratchet down so the
gate stays useful.

Test plan

  • Default go test ./internal/analysis/... — memory tests
    skipped, no time impact
  • TERRAIN_MEMORY_BENCH=1 go test -run TestMemory — all 3
    ceiling tests pass
  • make memory-bench — convenience wrapper works
  • go build ./... clean

Plan tracker

Closes Track 9.10. Track 9 remaining: 9.1-9.6 (capability metadata,
panic recovery completion, registry refactor, etc.) + 9.7
(truth-verify). All are explicitly non-blocking for 0.2.0 but safe
to land in either window.

🤖 Generated with Claude Code

Adds the missing axis to the bench surface: existing
BenchmarkFullAnalysis_* measure CPU but never fail on memory
regressions. Real adopter complaints take the shape "Terrain ate
4 GB on my monorepo," not "Terrain was slow"; this suite plugs
the gap.

Two categories:

Allocation benchmarks (Benchmark*_Memory)
  Wrap the existing analysis benches with b.ReportAllocs() so
  bytes/op + allocs/op surface as regression-comparable
  baselines. Run via:
    go test -bench Memory ./internal/analysis/...
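  The wrapping pattern is small enough to sketch in full. This is a
  self-contained illustration, not code from the PR: analyzeFixture
  and the sink variable are hypothetical stand-ins for the real
  analysis entry point and its retained state.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps allocations live past escape analysis so the
// stand-in workload actually hits the heap.
var sink []byte

// analyzeFixture is a hypothetical stand-in for the real analysis
// entry point: it allocates one small buffer per "file".
func analyzeFixture(files int) {
	for i := 0; i < files; i++ {
		sink = make([]byte, 1024)
	}
}

// BenchmarkAnalyze_Memory shows the wrapping pattern this PR
// describes: the body is the existing CPU benchmark plus
// b.ReportAllocs(), which adds bytes/op and allocs/op to the
// benchmark output.
func BenchmarkAnalyze_Memory(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		analyzeFixture(1000)
	}
}

func main() {
	// testing.Benchmark lets the sketch run outside `go test`.
	r := testing.Benchmark(BenchmarkAnalyze_Memory)
	fmt.Println(r.AllocsPerOp() >= 900) // ~1000 allocs/op expected
}
```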

Ceiling tests (TestMemoryCeiling_*, TestMemoryNoLeak_*)
  Run analysis at known scales and assert peak heap growth stays
  under a configured ceiling. Three tests:
    - 1k files: ceiling 250 MB (current observed: ~177 MB)
    - 5k files: ceiling 1300 MB (current observed: ~1050 MB)
    - 5-iter repeated analysis: ceiling 2000 MB (current
      observed: ~1500 MB across 5 iterations on a 500-file fixture)
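  The measurement behind these ceilings can be sketched with the
  standard runtime.ReadMemStats pattern: force a GC before and after
  the workload so only retained growth counts against the ceiling.
  Helper and workload names below are hypothetical, not the PR's
  actual code.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapGrowth runs fn and returns how many bytes the live heap grew,
// forcing a GC before and after so transient garbage does not count
// against the ceiling.
func heapGrowth(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	fn()

	runtime.GC()
	runtime.ReadMemStats(&after)
	if after.HeapAlloc <= before.HeapAlloc {
		return 0
	}
	return after.HeapAlloc - before.HeapAlloc
}

func main() {
	// Hypothetical stand-in for repeated analysis runs: each
	// iteration retains ~8 MB, the way a cache or result set would.
	var retained [][]byte
	grew := heapGrowth(func() {
		for i := 0; i < 5; i++ { // leak-style check: repeat the run
			retained = append(retained, make([]byte, 8<<20))
		}
	})
	const ceiling = 250 << 20 // e.g. the 250 MB ceiling at the 1k-file scale
	// Referencing retained here also keeps it live through the
	// second measurement.
	fmt.Println(grew > 39<<20, grew < ceiling, len(retained) == 5)
}
```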

Skipped by default
  Ceiling tests are gated on TERRAIN_MEMORY_BENCH=1 — they're
  expensive (force GCs, run analysis at scale) and surface ceiling
  regressions per the Track 9.10 baseline rather than smoke
  failures. The new `make memory-bench` target sets the env var
  for you. The default `go test ./...` loop is unaffected.

Track 9.10 follow-up: the leak test reports unexpectedly high
growth (~1.5 GB across 5 iterations on a 500-file fixture) — much
higher than the FileCache amortization should produce. A comment in
the test records the leading hypothesis (something in the per-run
allocation graph holds onto data the cache should amortize) and
points at where to look. Investigation is its own
work; the ceiling here catches regressions BEYOND the current
state, so a future fix that lowers actual growth will pass with
large headroom — at that point the ceiling should ratchet down.

Verification: all three ceiling tests pass under
`make memory-bench`; default `go test ./...` skips them; existing
benchmarks unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 3, 2026

Terrain AI Risk Review

Metric              Value
AI surfaces         13
Eval scenarios      16
Impacted scenarios  0
Uncovered surfaces  13

Decision: PASS — AI surfaces are covered.


github-actions Bot commented May 3, 2026

[PASS] Terrain — Merge with caution

High-severity gaps found in changed code.

Metric          Value
Changed files   2 (0 source · 1 test)
Tests selected  1 of 773 (0% of suite)

Pre-existing issues (1)

  • internal/analysis/memory_bench_test.go [HIGH] — [staticSkippedTest] 4 of 3 tests statically skipped (133%) in internal/analysis/memory_bench_test.go.

Recommended tests

1 test(s) with exact coverage of 0 impacted unit(s).

Test                                    Confidence  Why
internal/analysis/memory_bench_test.go  exact       test file directly changed

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 1 test(s) instead of the full suite.

  • Go tests: passed

