test(0.2): memory benchmark suite + ceiling tests (Track 9.10) #153
Open
Adds the missing axis to the bench surface: existing
BenchmarkFullAnalysis_* measure CPU but never fail on memory
regressions. Real adopter complaints take the shape "Terrain ate
4 GB on my monorepo," not "Terrain was slow"; this suite plugs
the gap.
Two categories:
Allocation benchmarks (Benchmark*_Memory)
Wrap the existing analysis benches with b.ReportAllocs() so
bytes/op + allocs/op surface as regression-comparable
baselines. Run via:
go test -bench Memory ./internal/analysis/...
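A minimal sketch of what an allocation benchmark of this shape could look like. `analyzeFixture` is a hypothetical stand-in for the real analysis entry point, and the benchmark name is illustrative; only the `Benchmark*_Memory` naming and the `b.ReportAllocs()` call come from the description above.

```go
package main

import (
	"strings"
	"testing"
)

// analyzeFixture is a hypothetical placeholder for the real analysis call.
// It only exists so the benchmark has something that allocates.
func analyzeFixture(files int) map[string]int {
	out := make(map[string]int, files)
	for i := 0; i < files; i++ {
		name := "file_" + strings.Repeat("x", i%8)
		out[name] += len(name)
	}
	return out
}

// sink keeps the result reachable so the compiler cannot discard the work.
var sink map[string]int

// BenchmarkAnalyzeFixture_Memory uses the *_Memory suffix so that
// `go test -bench Memory` selects it.
func BenchmarkAnalyzeFixture_Memory(b *testing.B) {
	b.ReportAllocs() // adds B/op and allocs/op columns to the output
	for i := 0; i < b.N; i++ {
		sink = analyzeFixture(1000)
	}
}
```

With `b.ReportAllocs()` set, the benchmark output carries bytes/op and allocs/op next to ns/op, which is what makes two runs comparable for regressions.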
Ceiling tests (TestMemoryCeiling_*, TestMemoryNoLeak_*)
Run analysis at known scales and assert peak heap growth stays
under a configured ceiling. Three tests:
- 1k files: ceiling 250 MB (current observed: ~177 MB)
- 5k files: ceiling 1300 MB (current observed: ~1050 MB)
- 5-iter repeated analysis: ceiling 2000 MB (current
observed: ~1500 MB across 5 iterations on a 500-file fixture)
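The ceiling check can be sketched roughly as follows. This is an assumption about the measurement approach, not the PR's actual code: it compares `runtime.MemStats.HeapAlloc` before and after a run, and the helper names `heapGrowthBytes` and `underCeiling` are hypothetical.

```go
package main

import "runtime"

const mb = 1 << 20

// heapGrowthBytes forces a GC to get a stable baseline, runs fn, and
// returns how much HeapAlloc grew. The result of fn is returned so it
// stays reachable while the second reading is taken.
func heapGrowthBytes(fn func() any) (uint64, any) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	result := fn()
	runtime.ReadMemStats(&after)
	if after.HeapAlloc < before.HeapAlloc {
		return 0, result // heap shrank; report no growth instead of wrapping around
	}
	return after.HeapAlloc - before.HeapAlloc, result
}

// underCeiling reports whether growth stays at or below ceilingMB megabytes.
func underCeiling(growthBytes, ceilingMB uint64) bool {
	return growthBytes <= ceilingMB*mb
}
```

A test built on this would call the analysis inside `fn` and fail with `t.Fatalf` when `underCeiling` returns false, for example with a ceiling of 250 for the 1k-file case.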
Skipped by default
Ceiling tests are gated on TERRAIN_MEMORY_BENCH=1 — they're
expensive (force GCs, run analysis at scale) and surface ceiling
regressions per the Track 9.10 baseline rather than smoke
failures. The new `make memory-bench` target sets the env var
for you. The default `go test ./...` loop is unaffected.
Track 9.10 follow-up: the leak test reports unexpectedly high
growth (~1.5 GB across 5 iterations on a 500-file fixture) — much
higher than the FileCache amortization should produce. Comment in
the test names this as the leading hypothesis (something in the
per-run allocation graph holds onto data the cache should
amortize) and points at where to look. Investigation is its own
work; the ceiling here catches regressions BEYOND the current
state, so a future fix that lowers actual growth will pass with
large headroom — at that point the ceiling should ratchet down.
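The repeated-run check described here can be sketched as below, assuming retained growth is measured as the difference in live heap after a forced GC. Whether the actual test measures after a GC is not stated in this description, and `liveHeap` and `retainedGrowth` are hypothetical names.

```go
package main

import "runtime"

// liveHeap forces a collection and returns the bytes still allocated.
func liveHeap() uint64 {
	var m runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// retainedGrowth runs fn the given number of times and returns how much
// live heap remains above the starting baseline once garbage is collected.
// Memory that a later run could reuse or free does not count; memory that
// something still references does.
func retainedGrowth(iterations int, fn func()) uint64 {
	base := liveHeap()
	for i := 0; i < iterations; i++ {
		fn()
	}
	end := liveHeap()
	if end < base {
		return 0 // heap shrank; avoid unsigned wraparound
	}
	return end - base
}
```

A leak test would compare `retainedGrowth(5, runAnalysis)` against the ceiling. When a fix lowers the observed growth, lowering the ceiling constant alongside it keeps the gate meaningful.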
Verification: all three ceiling tests pass under
`make memory-bench`; default `go test ./...` skips them; existing
benchmarks unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Terrain AI Risk Review
Decision: PASS. AI surfaces are covered.
[PASS] Terrain: Merge with caution
Pre-existing issues (1)
Recommended tests: 1 test(s) with exact coverage of 0 impacted unit(s).
Limitations
Generated by Terrain · Targeted Test Results
Terrain selected 1 test(s) instead of the full suite.
Summary
Adds memory benchmarks + ceiling tests to plug the missing axis in
the bench surface. Existing benches measure CPU; this PR adds
allocation reporting + ceiling enforcement so memory regressions
surface as test failures, not adopter complaints.
- Allocation benchmarks: `b.ReportAllocs()` for bytes/op + allocs/op tracking
- Ceiling tests at the 1k and 5k file scales + a leak-detection check across 5 repeated runs
- Ceiling tests are gated on `TERRAIN_MEMORY_BENCH=1` (or the new `make memory-bench` target) so they don't slow the default `go test` loop
Notable finding
The leak-detection test reports ~1.5 GB heap growth across 5
iterations of analyzing a 500-file fixture — much higher than
the FileCache should permit. The test passes (ceiling at 2000 MB
catches regressions beyond current state) but the comment names
the issue as a Track 9.10 follow-up worth investigating. A fix
that brings growth back to near-zero will pass with massive
headroom; at that point the ceiling should ratchet down so the
gate stays useful.
Test plan
- `go test ./internal/analysis/...`: memory tests skipped, no time impact
- `TERRAIN_MEMORY_BENCH=1 go test -run TestMemory`: all 3 ceiling tests pass
- `make memory-bench`: convenience wrapper works
- `go build ./...` clean
Plan tracker
Closes Track 9.10. Track 9 remaining: 9.1-9.6 (capability metadata,
panic recovery completion, registry refactor, etc.) + 9.7
(truth-verify). All are explicitly post-0.2.0 work, not 0.2.0 blockers, and are safe to land
in either window.
🤖 Generated with Claude Code