docs(perf): Chapter 3 + Task-Manager-first diagnosis in perf-diagnose skill #619
Merged
…osis

Adds the IntersectionObserver / BufferLine retention story to memory-learnings.md (#617, upstream xtermjs/xterm.js#5820 + #5821).

Skill updates:

- New "Ground truth: Task Manager Memory Footprint, not proxies" section. performance.memory, system/Context count, and closure:* count are all proxies that can diverge from Task Manager by 100x. #614 reduced Context growth 89% across six commits with zero Memory Footprint improvement: the cautionary tale.
- New "Quiet-session A/B" requirement. Active agent terminals grow xterm scrollback legitimately, so measurements on a busy session look indistinguishable from retention. #618 closed after a quiet-session A/B showed the +69 MB residual was all agent-stream activity.
- New leak shape "Callback retained past dispose()": an observer's disconnect()/dispose() doesn't fully release the callback closure in practice (DevTools instrumentation, extensions, native registry quirks). Fix pattern: wrap `this` in WeakRef inside the callback. This is what #617 did for xterm's RenderService.
- Promoted diff-heap.mjs + find-retainers.mjs to the top of the analyzer list, with "start here" language. Sort heap diffs by bytes, not count: a 220 MB Uint32Array leak dominates any amount of 40-byte Context churn.

Also commits the two diagnostic scripts that had been sitting untracked in docs/perf-investigations/scripts/. Regenerated .claude/skills/perf-diagnose/SKILL.md via `just ai::apm`.
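A minimal sketch of the "wrap `this` in WeakRef inside the callback" pattern, assuming an observer-style API that takes a callback and exposes `disconnect()`. `RenderServiceLike`, `onVisibilityChange`, and `makeLeakyObserverFactory` are hypothetical names for illustration; this is not the actual #617 patch to xterm's RenderService. The observer factory is injected so the sketch runs outside a browser; in a page it would be something like `(cb) => new IntersectionObserver(cb)`.

```javascript
// Hypothetical sketch of the WeakRef-in-callback fix pattern (not the #617 source).
class RenderServiceLike {
  constructor(createObserver) {
    this.visible = true;
    // Capture only a weak reference. If the observer registry (or a DevTools
    // extension wrapping it) retains this closure past disconnect(), the
    // closure no longer pins `this` and everything reachable from it.
    const weakThis = new WeakRef(this);
    this.observer = createObserver((entries) => {
      const self = weakThis.deref();
      if (self === undefined) return; // service has already been collected
      self.onVisibilityChange(entries);
    });
  }

  onVisibilityChange(entries) {
    const last = entries[entries.length - 1];
    if (last) this.visible = last.isIntersecting;
  }

  dispose() {
    this.observer.disconnect();
  }
}

// Hypothetical stand-in for a misbehaving registry: it keeps every callback
// even after disconnect(), which is the leak shape described above.
function makeLeakyObserverFactory(registry) {
  return (callback) => {
    registry.push(callback);
    return {
      disconnect() {
        // Intentionally leaves `callback` in `registry`.
      },
    };
  };
}

// Usage: the callback still works while the service is alive.
const leakRegistry = [];
const service = new RenderServiceLike(makeLeakyObserverFactory(leakRegistry));
leakRegistry[0]([{ isIntersecting: false }]);
console.log(service.visible); // false
service.dispose();
```

The closure captures only `weakThis`, so even if a registry holds the callback after `disconnect()`, the service and its graph remain collectable. The cost is one `deref()` check per callback invocation.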
5d08b2b to b3b5655
Summary
Documents the Chapter 3 investigation so future agents don't burn three days chasing proxy metrics the way we did.
Source-of-truth edits (regenerated into `.claude/` via `just ai::apm`):

- `docs/perf-investigations/memory-learnings.md`: new Chapter 3 section covering perf(memory): cut canvas-toggle retention (-89% Contexts, -91% SVG bytes) #614 (closed without merge; the false trail) and fix(xterm): patch IntersectionObserver retention (-367 MB/30 mode-toggles) #617 (the one-line `WeakRef` that actually moved Task Manager Memory Footprint by −81%).
- `agents/.apm/skills/perf-diagnose/SKILL.md`: runbook additions:
  - "Ground truth: Task Manager Memory Footprint, not proxies." `performance.memory`, `system/Context` count, and `closure:*` count are all proxies; they can drift 100× from Task Manager and mislead you into declaring a fix that does nothing.
  - New leak shape "Callback retained past `dispose()`". When a `Window.<Observer>` native registry (or a DevTools extension wrapping it) holds the callback closure past the explicit `observer.disconnect()`, the callback keeps `this` (and its whole service graph) reachable. Fix pattern: wrap `this` in `WeakRef` inside the callback.
  - Promoted `diff-heap.mjs` + `find-retainers.mjs` to the top of the analyzer list with "start here" language. Sort heap diffs by bytes, not count: a 220 MB `Uint32Array` leak drowns any amount of 40-byte Context churn.

New committed tooling (had been sitting untracked):
- `docs/perf-investigations/scripts/diff-heap.mjs`: per-class byte delta between two heap snapshots.
- `docs/perf-investigations/scripts/find-retainers.mjs`: BFS from GC roots to every instance of a target class, grouped by path signature.

Why
The three-day version of the story: I jumped straight to heap snapshots, found tens of thousands of retained `system/Context` objects, and shipped six refactoring commits on #614 that reduced that count 89%. Task Manager didn't move. The actual load-bearing retention was a single `IntersectionObserver` callback in `xterm`'s `RenderService` holding 220 MB of `Uint32Array` BufferLines. One heap diff sorted by bytes (not count) named the culprit in one line of output; one `WeakRef` wrap fixed it.

Codifying these lessons so the next agent doesn't re-tread the path.
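A minimal sketch of a byte-sorted heap diff, in the spirit of `diff-heap.mjs` but not its actual source. It assumes both snapshots are already parsed from DevTools `.heapsnapshot` JSON, whose `nodes` array is flat and described by `snapshot.meta.node_fields`, with each node's `name` field indexing into `strings`. `bytesByClass` and `diffHeaps` are hypothetical helper names, and grouping by node name alone is a simplification.

```javascript
// Hypothetical sketch of a per-class byte-delta heap diff (not diff-heap.mjs itself).

// Sum self_size and instance count per node name for one parsed snapshot.
function bytesByClass(snapshot) {
  const fields = snapshot.snapshot.meta.node_fields;
  const stride = fields.length; // each node occupies `stride` slots in `nodes`
  const nameIdx = fields.indexOf("name");
  const sizeIdx = fields.indexOf("self_size");
  const { nodes, strings } = snapshot;
  const totals = new Map();
  for (let i = 0; i < nodes.length; i += stride) {
    const name = strings[nodes[i + nameIdx]];
    const entry = totals.get(name) ?? { count: 0, bytes: 0 };
    entry.count += 1;
    entry.bytes += nodes[i + sizeIdx];
    totals.set(name, entry);
  }
  return totals;
}

// Per-name deltas between two snapshots, largest byte growth first.
function diffHeaps(before, after) {
  const a = bytesByClass(before);
  const b = bytesByClass(after);
  const names = new Set([...a.keys(), ...b.keys()]);
  const rows = [];
  for (const name of names) {
    const x = a.get(name) ?? { count: 0, bytes: 0 };
    const y = b.get(name) ?? { count: 0, bytes: 0 };
    rows.push({
      name,
      countDelta: y.count - x.count,
      bytesDelta: y.bytes - x.bytes,
    });
  }
  // Sort by bytes, not count: one large buffer leak outranks
  // thousands of small objects.
  rows.sort((r1, r2) => r2.bytesDelta - r1.bytesDelta);
  return rows;
}
```

Sorting on `bytesDelta` is what surfaces a single large retained buffer ahead of thousands of tiny Context objects; sorting on `countDelta` would invert that order and point at the churn instead.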
Test plan
- `just fmt` clean
- `just ai::apm` regenerated `.claude/skills/perf-diagnose/SKILL.md` to match APM source
- `ci/apm-sync` validates `.claude/` matches sources
- `ci/fmt` + `ci/nix` green

🤖 Generated with Claude Code