Add per-module memory regression baseline check to CI Host#4430
- Add per-module memory probes via QUnit suiteStart/suiteEnd in setup-qunit.js that log heap delta for each top-level test module (MEMPROBE_FILE lines). Uses double-GC at module boundaries for accurate snapshots.
- Add __shard_warmup__ synthetic module (shard-warmup.ts) that runs first on every shard to absorb shared boot cost (~36MB), giving real test modules clean per-file deltas independent of shard position.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
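The probe logic above can be sketched independently of QUnit by injecting the heap reader, the GC hook, and the logger. This is a minimal sketch, not the code in setup-qunit.js: createMemoryProbe, the MEMPROBE_FILE field layout (module=..., deltaMB=..., heapMB=...), and the suite payload shape (a fullName array whose length is 1 for top-level modules, as in QUnit's suiteStart/suiteEnd events) are all assumptions.

```javascript
// Hypothetical per-top-level-module heap probe (not the PR's implementation).
// Injected dependencies (assumptions):
//   getHeapBytes(): current JS heap size in bytes
//   gc():           forces a garbage collection (or is a no-op)
//   log(line):      receives one MEMPROBE_FILE line per top-level module
// Suite payloads are assumed to carry name and fullName (string[]).
function createMemoryProbe({ getHeapBytes, gc, log }) {
  const MB = 1024 * 1024;
  let startBytes = null;

  // Collect twice before reading: a common heuristic, since a second pass
  // can reclaim objects that only became unreachable during the first.
  function snapshot() {
    gc();
    gc();
    return getHeapBytes();
  }

  return {
    onSuiteStart(suite) {
      if (suite.fullName.length !== 1) return; // skip nested modules
      startBytes = snapshot();
    },
    onSuiteEnd(suite) {
      if (suite.fullName.length !== 1 || startBytes === null) return;
      const endBytes = snapshot();
      const deltaMB = (endBytes - startBytes) / MB;
      log(
        `MEMPROBE_FILE module=${JSON.stringify(suite.name)} ` +
          `deltaMB=${deltaMB.toFixed(1)} heapMB=${(endBytes / MB).toFixed(1)}`,
      );
      startBytes = null;
    },
  };
}
```

Wiring it into QUnit would then look like QUnit.on('suiteStart', probe.onSuiteStart) and QUnit.on('suiteEnd', probe.onSuiteEnd). In Chrome, getHeapBytes might read the non-standard performance.memory.usedJSHeapSize, and gc might call window.gc when the browser is launched with --js-flags=--expose-gc; neither detail is confirmed by this PR.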
Per-shard memory reports are extracted from MEMPROBE_FILE test output and uploaded as artifacts. The merge-reports job aggregates them and compares against a committed baseline (packages/host/memory-baseline.json).

Tiered thresholds:
- Warn: >10% increase or +5MB (whichever is greater)
- Fail: >100% increase or +50MB (whichever is greater)

On main merge (all tests green), the baseline auto-updates so it tracks the current state. On PRs, regressions are flagged in GITHUB_STEP_SUMMARY.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
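Read literally, "whichever is greater" means a module warns when its increase exceeds the larger of 10% of its baseline and 5MB, and fails when it exceeds the larger of 100% of its baseline and 50MB. A minimal sketch of that rule follows; the constant names mirror those in the check-memory-baseline.mjs excerpt further down, but classify itself is a hypothetical helper, not the script's code.

```javascript
// Threshold constants as described in the commit message.
const SOFT_RELATIVE = 0.1;   // warn: >10% of baseline ...
const SOFT_ABSOLUTE_MB = 5;  // ... or +5MB, whichever is greater
const HARD_RELATIVE = 1.0;   // fail: >100% of baseline ...
const HARD_ABSOLUTE_MB = 50; // ... or +50MB, whichever is greater

// Hypothetical helper: classify one module's change (all sizes in MB).
function classify(baselineMB, currentMB) {
  const diff = currentMB - baselineMB;
  // Each limit is the larger of the absolute floor and the relative term.
  const failLimit = Math.max(HARD_ABSOLUTE_MB, baselineMB * HARD_RELATIVE);
  const warnLimit = Math.max(SOFT_ABSOLUTE_MB, baselineMB * SOFT_RELATIVE);
  if (diff > failLimit) return 'fail';
  if (diff > warnLimit) return 'warn';
  return 'ok';
}
```

For a module with a 20MB baseline, the 5MB floor sets the warn threshold (10% would be only 2MB); at a 200MB baseline, the relative term (20MB) takes over.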
Pull request overview
Adds a CI-level per-module memory regression check for Host tests by emitting per-module heap deltas during QUnit runs, aggregating shard reports, and comparing results against a committed baseline to warn/fail on regressions.
Changes:
- Add a synthetic __shard_warmup__ test module to absorb shard boot cost before real modules run.
- Add QUnit suiteStart/suiteEnd hooks to log per-top-level-module heap usage deltas (MEMPROBE_FILE ...).
- Add CI steps and Node scripts to extract per-shard reports, compare to memory-baseline.json, and (on main) auto-update the baseline.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/host/tests/test-helper.js | Side-effect import to register the warmup QUnit module before test partitions load. |
| packages/host/tests/helpers/shard-warmup.ts | New synthetic warmup module that primes common runtime/setup work. |
| packages/host/tests/helpers/setup-qunit.js | Adds per-top-level-module heap delta logging via QUnit suite hooks. |
| packages/host/scripts/extract-memory-report.mjs | Parses test output and emits a per-shard JSON memory report artifact. |
| packages/host/scripts/check-memory-baseline.mjs | Compares merged reports against a committed baseline and emits CI summary + exit code. |
| packages/host/scripts/update-memory-baseline.mjs | Regenerates memory-baseline.json from shard reports (for main-branch updates). |
| packages/host/memory-baseline.json | Introduces initial committed baseline for per-module memory deltas. |
| .github/workflows/ci-host.yaml | Uploads memory artifacts per shard, merges them, checks baseline, and updates baseline on main. |
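The merge and baseline-regeneration steps in the table above can be sketched as plain object manipulation. This is a hypothetical sketch, not the contents of check-memory-baseline.mjs or update-memory-baseline.mjs. It assumes each shard report is a flat { moduleName: deltaMB } object, skips __shard_warmup__ (which runs on every shard and would otherwise collide across shards), and builds a baseline from the per-module median across several runs, as the PR description says the initial baseline did.

```javascript
// Assumed report shape: { [moduleName]: deltaMB }.
const WARMUP_MODULE = '__shard_warmup__';

// Merge one run's shard reports into a single map. Each real module should
// run on exactly one shard, so repeated names are returned as duplicates.
function mergeShardReports(shardReports) {
  const merged = {};
  const duplicates = [];
  for (const report of shardReports) {
    for (const [mod, deltaMB] of Object.entries(report)) {
      if (mod === WARMUP_MODULE) continue; // runs on every shard by design
      if (Object.prototype.hasOwnProperty.call(merged, mod)) duplicates.push(mod);
      merged[mod] = deltaMB;
    }
  }
  return { merged, duplicates };
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Build a baseline from several merged runs using the per-module median,
// which is less sensitive to a single noisy run than the mean.
function buildBaseline(mergedRuns) {
  const samples = new Map();
  for (const run of mergedRuns) {
    for (const [mod, deltaMB] of Object.entries(run)) {
      if (!samples.has(mod)) samples.set(mod, []);
      samples.get(mod).push(deltaMB);
    }
  }
  const baseline = {};
  for (const mod of [...samples.keys()].sort()) {
    baseline[mod] = median(samples.get(mod));
  }
  return baseline;
}
```

Sorting the module names keeps the committed JSON stable across regenerations, which makes baseline-update diffs easier to review.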
```yaml
contents: write
id-token: write
pull-requests: write
```
Workflow-level permissions grant contents: write for all events (including pull_request). That unnecessarily broad scope increases the blast radius of any workflow change in a PR. Consider keeping the workflow default at contents: read and granting contents: write only in the job that performs the baseline commit/push, gated to push on main.
Suggested change:
```diff
-contents: write
+contents: read
 id-token: write
-pull-requests: write
+pull-requests: read
```
```javascript
const JSON_ENVELOPE_RE = /\{"type":"log","text":"(.*?)"\}\s*$/;

const log = readFileSync(inputPath, 'utf8');
const report = {};

for (const rawLine of log.split('\n')) {
  if (!rawLine.includes('MEMPROBE_FILE')) continue;

  let line = rawLine;

  // Unwrap testem JSON envelope if present
  const envMatch = line.match(JSON_ENVELOPE_RE);
  if (envMatch) {
    try {
      line = JSON.parse(`"${envMatch[1]}"`);
    } catch {
      // fall through to raw parse
    }
  }
```
The testem JSON envelope unwrapping is fragile: JSON_ENVELOPE_RE will stop at the first " inside the text field (which is very likely here because MEMPROBE_FILE logs include a quoted module= value), causing the unwrap to fail and the probe line to be skipped. A more robust approach is to JSON.parse(rawLine) when it looks like a JSON object and read .text from { type: "log" }, falling back to the raw line otherwise.
Suggested change:
```diff
-const JSON_ENVELOPE_RE = /\{"type":"log","text":"(.*?)"\}\s*$/;
+function unwrapTestemLogLine(rawLine) {
+  const trimmed = rawLine.trim();
+  if (!trimmed.startsWith('{') || !trimmed.endsWith('}')) {
+    return rawLine;
+  }
+  try {
+    const parsed = JSON.parse(trimmed);
+    if (
+      parsed &&
+      parsed.type === 'log' &&
+      typeof parsed.text === 'string'
+    ) {
+      return parsed.text;
+    }
+  } catch {
+    // fall through to raw parse
+  }
+  return rawLine;
+}
 const log = readFileSync(inputPath, 'utf8');
 const report = {};
 for (const rawLine of log.split('\n')) {
   if (!rawLine.includes('MEMPROBE_FILE')) continue;
-  let line = rawLine;
-  // Unwrap testem JSON envelope if present
-  const envMatch = line.match(JSON_ENVELOPE_RE);
-  if (envMatch) {
-    try {
-      line = JSON.parse(`"${envMatch[1]}"`);
-    } catch {
-      // fall through to raw parse
-    }
-  }
+  const line = unwrapTestemLogLine(rawLine);
```
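A quick way to compare the two unwrapping strategies is to run both on hand-built envelopes. In this sketch, the envelope with an extra trailing field and the MEMPROBE_FILE field layout (module=<JSON string> followed by deltaMB=<number>) are assumptions, not taken from testem or setup-qunit.js; parseProbe is a hypothetical helper. In this check the lazy group, anchored to the closing "}, still captures escaped quotes inside text correctly; the practical difference is that the regex rejects any envelope whose keys are reordered or extended, which JSON.parse tolerates.

```javascript
// Regex unwrap from the excerpt above: only matches when the envelope has
// exactly these keys in this order, with "text" last.
const JSON_ENVELOPE_RE = /\{"type":"log","text":"(.*?)"\}\s*$/;

function unwrapWithRegex(rawLine) {
  const match = rawLine.match(JSON_ENVELOPE_RE);
  if (!match) return rawLine;
  try {
    return JSON.parse(`"${match[1]}"`);
  } catch {
    return rawLine;
  }
}

// Suggested approach: parse the whole line and read .text from a log event.
function unwrapTestemLogLine(rawLine) {
  const trimmed = rawLine.trim();
  if (!trimmed.startsWith('{') || !trimmed.endsWith('}')) return rawLine;
  try {
    const parsed = JSON.parse(trimmed);
    if (parsed && parsed.type === 'log' && typeof parsed.text === 'string') {
      return parsed.text;
    }
  } catch {
    // not JSON: fall through to the raw line
  }
  return rawLine;
}

// Hypothetical probe format: MEMPROBE_FILE module="<name>" deltaMB=<number>
const PROBE_RE = /MEMPROBE_FILE module=("(?:[^"\\]|\\.)*") deltaMB=(-?\d+(?:\.\d+)?)/;

function parseProbe(line) {
  const match = line.match(PROBE_RE);
  if (!match) return null;
  return { module: JSON.parse(match[1]), deltaMB: Number(match[2]) };
}
```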
```javascript
if (failures.length > 0) {
  lines.push(`### Failures (>${HARD_RELATIVE * 100}% increase or +${HARD_ABSOLUTE_MB}MB)\n`);
  lines.push('| Module | Baseline | Current | Change |');
  lines.push('|--------|----------|---------|--------|');
  for (const f of failures.sort((a, b) => b.diff - a.diff)) {
    lines.push(
      `| ${f.mod} | ${f.baseline.toFixed(1)} MB | ${f.current.toFixed(1)} MB | +${f.diff.toFixed(1)} MB (+${f.pct}%) |`,
    );
  }
  lines.push('');
}

if (warnings.length > 0) {
  lines.push(`### Warnings (>${SOFT_RELATIVE * 100}% + ${SOFT_ABSOLUTE_MB}MB increase)\n`);
```
The headings in the step summary output don’t match the implemented thresholds. The code uses Math.max(absolute, relative) (i.e. “whichever is greater”), but the strings currently read like “or” / “10% + 5MB”, which can mislead readers about when a module will actually warn/fail. Update the summary text to reflect the “whichever is greater” behavior (or adjust the threshold logic to match the wording).
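One way to make the headings match the Math.max rule is to generate them from the same constants with explicit "whichever is greater" wording. A minimal sketch: thresholdHeading is a hypothetical helper, and the relative thresholds are assumed to be fractions (0.1 for 10%), as the * 100 in the excerpt implies.

```javascript
// Hypothetical helper: render a step-summary heading whose wording matches
// the Math.max(absolute, relative) rule used for classification.
// `relative` is a fraction (0.1 means 10%); `absoluteMB` is in MB.
function thresholdHeading(label, relative, absoluteMB) {
  const pct = Math.round(relative * 100);
  return `### ${label} (>${pct}% increase or +${absoluteMB}MB, whichever is greater)\n`;
}
```

Rounding the percentage also keeps fractions such as 0.07 from printing as 7.000000000000001%.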
I see the baseline file in this PR, but it doesn’t seem like the comparison or flagging is happening. Is this still in progress, or planned for a follow-up?
Summary
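- Add per-module memory probes (MEMPROBE_FILE) to host tests via QUnit suiteStart/suiteEnd hooks, logging heap usage and delta for each top-level test module
- Add a __shard_warmup__ synthetic module that runs first on every shard, absorbing ~36MB of shared boot cost so real test modules report clean per-file deltas
- Aggregate per-shard reports in CI, compare them against a committed baseline (memory-baseline.json), and flag regressions with tiered thresholds:
  - Warn: >10% increase or +5MB (whichever is greater)
  - Fail: >100% increase or +50MB (whichever is greater)

How it works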
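- Each shard extracts MEMPROBE_FILE lines from test output into a JSON artifact (extract-memory-report.mjs)
- The merge-reports job aggregates the shard artifacts and compares them against the baseline with check-memory-baseline.mjs
- Regressions are reported in GITHUB_STEP_SUMMARY; hard failures block the PR
- On main, update-memory-baseline.mjs regenerates and commits the baseline

Noise validation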
Ran 3 identical CI runs on the same SHA to measure reproducibility:
New files
- packages/host/scripts/extract-memory-report.mjs: per-shard log parser
- packages/host/scripts/check-memory-baseline.mjs: baseline comparison
- packages/host/scripts/update-memory-baseline.mjs: baseline regeneration
- packages/host/memory-baseline.json: initial baseline (170 modules, median of 3 runs)
- packages/host/tests/helpers/shard-warmup.ts: synthetic warmup module

Test plan
- workflow_dispatch run 24538269499

🤖 Generated with Claude Code