Stream-parse Codex session files to fix oversize-cap drops on heavy users#207
Merged
iamtoruk merged 1 commit into getagentseal:main on May 3, 2026
Conversation
Heavy Codex users hit MAX_SESSION_FILE_BYTES (128 MB) on long-running
sessions. The file is read in full via readSessionFile and then split on
'\n', so even bumping the cap eventually runs into V8's 512 MB string
limit (split doubles the high-water mark).
readSessionLines is a streaming generator that already exists in
fs-utils for exactly this case but only readFirstLine was using it.
Switch the Codex provider to consume it and let the cap apply only when
streaming would still be unreasonable.
Changes:
- src/fs-utils.ts: introduce MAX_STREAM_SESSION_FILE_BYTES (2 GB) and
apply it in readSessionLines instead of the full-read cap. Keep
MAX_SESSION_FILE_BYTES for readSessionFile / readSessionFileSync
consumers that materialize the whole file.
- src/providers/codex.ts: replace `readSessionFile -> split('\n')` with
`for await (... of readSessionLines)`. Add sawAnyLine guard so a
failed/empty stream skips cache write, preserving the previous
early-return behavior.
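A minimal sketch of the guard described above, with `readSessionLines` stubbed over an in-memory array so the shape is self-contained (the `parseSession`, `onLine`, and `writeCache` names are hypothetical stand-ins for the provider's actual parsing and cache logic; only `readSessionLines` and `sawAnyLine` come from this PR):

```typescript
// Sketch only: the real readSessionLines streams lines from a session file
// on disk; here it is stubbed over an in-memory array.
async function* readSessionLines(lines: string[]): AsyncGenerator<string> {
  for (const line of lines) yield line;
}

// Hypothetical provider-side loop showing the sawAnyLine guard.
async function parseSession(
  lines: string[],
  onLine: (line: string) => void,
  writeCache: () => void,
): Promise<void> {
  let sawAnyLine = false;
  for await (const line of readSessionLines(lines)) {
    sawAnyLine = true;
    if (line.trim() !== "") onLine(line);
  }
  // A failed or empty stream skips the cache write, preserving the previous
  // early-return behavior: the file will be re-parsed on the next run.
  if (!sawAnyLine) return;
  writeCache();
}
```

The point of the guard is that "no lines seen" is indistinguishable from "stream failed before the first line", so neither case should pin a cached empty result against the file's fingerprint.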
Empirical impact on a real account with one 247 MB rollout: 7-day totals
went from 4,536 calls / €358.69 / 20.1M input tokens to 6,111 calls /
€550.67 / 37.3M input tokens. The previously-skipped session is now
included; no other behavior changes.
Refs getagentseal#204
iamtoruk (Member) approved these changes on May 3, 2026 and left a comment:
Clean, well-scoped fix. Streaming path is behaviorally equivalent for all practical cases — the two minor deltas (empty-file and partial-error caching) both favor the new code. 2 GB cap is numerically safe, fingerprint race is pre-existing and unchanged, memory improvement is substantial for the target use case. LGTM.
Context
Follow-up to #204. After installing 0.9.6 from Homebrew, both fixes from that issue (16 KB read cap + `info: null` estimation) work as advertised on my account. However, while testing with `--verbose` I noticed that one of my recent rollout files was being silently dropped: the session is 247 MB in a single `.jsonl` file. `MAX_SESSION_FILE_BYTES = 128 MB` in `src/fs-utils.ts` rejects it before any parsing happens, so the entire session disappears from the dashboard with no on-screen indication unless the user explicitly passes `--verbose`.

Heavy users on long-running Codex sessions (large file contents in context, multi-hour coding pushes) hit this in practice.
Why bumping the cap isn't enough
The Codex provider reads the file in full via `readSessionFile` and then does `content.split('\n')`. `split` roughly doubles the high-water memory while the new array of line strings is being built, so even with `readViaStream` keeping the read itself bounded, we'd push V8 toward its ~512 MB string limit on a single 250 MB session.

`readSessionLines` already exists in `fs-utils.ts` as the streaming counterpart and is used by `readFirstLine`. Memory there is bounded to one line at a time, which is what we want for full-file parsing too.

Changes
- `src/fs-utils.ts` — introduce `MAX_STREAM_SESSION_FILE_BYTES = 2 GB` and apply it in `readSessionLines` instead of the full-read cap. The smaller `MAX_SESSION_FILE_BYTES` (128 MB) stays in place for the two consumers that materialize the whole file (`readSessionFile`, `readSessionFileSync`), where the V8 string limit is still a real constraint.
- `src/providers/codex.ts` — replace the full read plus `content.split('\n')` with `for await (... of readSessionLines)`.
The
sawAnyLineguard means a failed/oversized/empty stream still skips the cache write, so a transient read failure can't pin an empty result set against a fingerprint that would otherwise be re-parsed on the next run.Empirical impact
On my account with one 247 MB rollout file in the 7-day window:

| 7-day totals | Before | After |
| --- | --- | --- |
| Calls | 4,536 | 6,111 |
| Spend | €358.69 | €550.67 |
| Input tokens | 20.1M | 37.3M |

Sessions under 128 MB show identical numbers before and after, as expected.
Test plan
- `npm run build` clean
- `npm test` — 31 files / 419 tests pass
- `node dist/cli.js report --period week --provider codex --format json` runs to completion on a directory containing the 247 MB rollout
- `CODEBURN_VERBOSE=1` no longer prints `skipped oversize file` for that path
- `CODEBURN_VERBOSE=1` does still print the warning if a file ever exceeds 2 GB (synthetic test by temporarily lowering `MAX_STREAM_SESSION_FILE_BYTES` locally)

Notes
`MAX_SESSION_FILE_BYTES` and the `readSessionFile`/`readSessionFileSync` exports are unchanged. Other providers may rely on them, and the V8 string-limit reasoning still applies there. Only the streaming path is allowed to grow.