Summary
When ProcessMemoryMonitor detects that a process has exceeded a memory threshold, it writes a warn-level log entry and optionally a heap snapshot. Nothing else happens. For an app that can run for days with dozens of background terminal processes, this means the operator is notified (in a log file most users never read) but the memory pressure is never relieved until the app crashes or the user manually intervenes.
Problem Statement
The monitor's response to detected pressure is limited to logging:
The existing HibernationService can already kill terminals in inactive projects to free resources — but it only triggers on a time-based schedule (inactive >24 hours by default):
These two services exist independently with no connection between them. The memory monitor cannot trigger hibernation, and hibernation cannot respond to memory conditions. The result is that a process approaching OOM receives no programmatic relief — the app simply continues until it crashes.
There is also no mechanism to clear Chromium caches (network cache, shader cache) under pressure, despite the fact that these caches can consume hundreds of megabytes and are safe to clear automatically (they rebuild transparently).
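The snippet below is a rough sketch of a Tier 1 response, not the app's code. It clears the HTTP cache with Electron's session.clearCache(), clears the rebuildable storage types with session.clearStorageData(), and requests a collection only if V8 exposes gc(). CacheSession, SAFE_STORAGES, and runTier1 are hypothetical names. The storage list is copied from the edge cases in this issue and should be checked against the Electron version in use.

```typescript
// Structural subset of Electron's Session used by Tier 1 (hypothetical name), so a
// fake can stand in during tests. Both methods exist on Electron's Session.
interface CacheSession {
  clearCache(): Promise<void>;
  clearStorageData(options: { storages: string[] }): Promise<void>;
}

// Storage types copied from the issue's edge-case list. Cookies, localstorage and
// indexdb are deliberately absent because they hold auth state.
const SAFE_STORAGES: readonly string[] = ["appcache", "shadercache", "cachestorage"];

interface Tier1Result {
  clearedStorages: string[];
  gcRequested: boolean;
}

// V8 only exposes gc() when started with --expose-gc; otherwise this returns null.
function exposedGc(): (() => void) | null {
  const g = (globalThis as unknown as { gc?: unknown }).gc;
  return typeof g === "function" ? (g as () => void) : null;
}

// Tier 1: clear rebuildable caches, then request a GC if one is available.
async function runTier1(
  session: CacheSession,
  gc: (() => void) | null = exposedGc(),
): Promise<Tier1Result> {
  await session.clearCache(); // HTTP (network) cache
  const storages = [...SAFE_STORAGES];
  await session.clearStorageData({ storages });
  if (gc !== null) gc();
  return { clearedStorages: storages, gcRequested: gc !== null };
}
```

Because the session is passed in as a parameter, the function can be tested with a fake. In the app, it would presumably receive session.defaultSession or the relevant window's session.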
Desired Behavior
When the memory monitor (or its threshold/trend detection) determines that overall process memory is in a warning state, a graduated mitigation response is initiated automatically:
Tier 1 (soft, invisible to users): Safe, non-destructive actions taken immediately — clearing Chromium-managed caches (not cookies, localStorage, or auth-bearing storage) and requesting V8 garbage collection if available. No user notification required.
Tier 2 (moderate, potentially observable): If pressure continues or worsens, idle background terminals (those in non-active projects that have not been interacted with recently and are not currently running an agent) are gracefully hibernated via the existing HibernationService kill path. The hibernation threshold for memory-triggered hibernation should be much lower than the time-based 24-hour default — terminals idle for even 30–60 minutes in a background project are candidates.
The system should never hibernate terminals where the agent state indicates work is in progress. The active project is never touched.
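The Tier 2 eligibility rules can be expressed as a pure filter. The rules themselves come from this issue: the terminal is not in the active project, its agent is not working, and it has been idle past a threshold. Everything else in this sketch is an assumption standing in for whatever the app actually tracks, including TerminalInfo, AgentState, the state names, and the 30-minute default.

```typescript
// Hypothetical agent states; the app's real agent-state model is not shown in the issue.
type AgentState = "none" | "idle" | "working" | "waiting";

interface TerminalInfo {
  id: string;
  projectId: string;
  lastInteractionAt: number; // epoch milliseconds
  agentState: AgentState;
}

// Allowlist rather than blocklist: any state not known to be idle is treated as busy.
const HIBERNATABLE_STATES: ReadonlySet<AgentState> = new Set<AgentState>(["none", "idle"]);

// Assumed default, at the low end of the issue's suggested 30–60 minute window.
const MEMORY_IDLE_THRESHOLD_MS = 30 * 60 * 1000;

function selectHibernationCandidates(
  terminals: readonly TerminalInfo[],
  activeProjectId: string | null,
  now: number,
  idleThresholdMs: number = MEMORY_IDLE_THRESHOLD_MS,
): TerminalInfo[] {
  return terminals.filter(
    (t) =>
      t.projectId !== activeProjectId && // the active project is never touched
      HIBERNATABLE_STATES.has(t.agentState) && // never interrupt agent work in progress
      now - t.lastInteractionAt >= idleThresholdMs, // idle long enough
  );
}
```

With an allowlist of idle states, a new or unrecognized agent state defaults to "do not hibernate". This matches the requirement that work in progress is never interrupted.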
Context
The PTY host already has a reactive governor that pauses terminal I/O when its own heap exceeds 85%:
That governor is scoped to the PTY host process only and addresses throughput backpressure, not memory reclamation. A complementary mechanism is needed in the main process that actually frees memory rather than just pausing I/O.
The gracefulKillByProject() path that hibernation already uses is designed to give agents time to print session IDs before terminating:
This same path is appropriate for memory-triggered hibernation.
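Because gracefulKillByProject() operates per project, one idle terminal is not enough reason to hibernate its project: another terminal in the same project might be running an agent. The sketch below performs that check at the project level. It assumes, as the method's name suggests, that the kill path stops every terminal in the project. TerminalRef is hypothetical. KillProject stands in for the real gracefulKillByProject() signature, which this issue does not show, and logInfo stands in for the app's info-level logger.

```typescript
// Minimal terminal shape needed here (hypothetical).
interface TerminalRef {
  id: string;
  projectId: string;
}

// Stand-in for HibernationService's graceful kill path; the real signature is not shown.
type KillProject = (projectId: string) => Promise<void>;

// A project qualifies only if every one of its terminals is a candidate, so a single
// busy terminal (or the active project's terminals, which are never candidates) blocks it.
function projectsSafeToHibernate(
  allTerminals: readonly TerminalRef[],
  candidateIds: ReadonlySet<string>,
): string[] {
  const seen = new Set<string>();
  const blocked = new Set<string>();
  for (const t of allTerminals) {
    seen.add(t.projectId);
    if (!candidateIds.has(t.id)) blocked.add(t.projectId);
  }
  return Array.from(seen).filter((p) => !blocked.has(p));
}

// Hibernates projects one at a time and logs each action at info level.
async function hibernateProjects(
  projectIds: readonly string[],
  kill: KillProject,
  logInfo: (msg: string) => void = console.info,
): Promise<void> {
  for (const projectId of projectIds) {
    logInfo(`memory mitigation: hibernating idle project ${projectId}`);
    await kill(projectId);
  }
}
```

Running the kills sequentially is a judgment call. It spreads out the shutdown work, but takes longer when several projects qualify.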
Acceptance Criteria
- When any monitored process exceeds its warn threshold, safe Chromium cache clearing occurs automatically without user visibility
- When memory pressure is sustained (multiple consecutive polling intervals above threshold), terminals in non-active projects that are idle (no recent interaction, agent not working) are hibernated via the existing graceful kill path
- Terminals with an active agent state are never automatically hibernated
- The currently active project is never touched by automated mitigation
- Mitigation actions are logged at info level so they appear in diagnostics reports
- The memory-triggered hibernation path can be disabled via the same user setting that controls time-based hibernation
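The first two criteria could be met with a small counter over poll results. Tier 1 fires on the first poll above the warn threshold. Tier 2 fires once pressure has lasted a configurable number of consecutive polls. The count resets when pressure drops. PressureEscalator and its default of three polls are assumptions, since the issue only says "multiple consecutive polling intervals".

```typescript
type MitigationAction = "none" | "tier1" | "tier2";

// Hypothetical helper: counts consecutive polls above the warn threshold.
class PressureEscalator {
  private readonly sustainedPolls: number;
  private consecutive = 0;

  // Assumed default: three consecutive polls count as "sustained".
  constructor(sustainedPolls = 3) {
    if (!Number.isInteger(sustainedPolls) || sustainedPolls < 2) {
      throw new RangeError("sustainedPolls must be an integer >= 2");
    }
    this.sustainedPolls = sustainedPolls;
  }

  onPoll(aboveWarn: boolean): MitigationAction {
    if (!aboveWarn) {
      this.consecutive = 0; // pressure subsided: the next episode starts again at Tier 1
      return "none";
    }
    this.consecutive += 1;
    if (this.consecutive === 1) return "tier1"; // first poll over threshold
    if (this.consecutive === this.sustainedPolls) return "tier2"; // sustained pressure
    return "none"; // between or after the escalation points: nothing new to do
  }
}
```

Each tier fires at most once per pressure episode. Repeated Tier 2 attempts within an episode are left to a separate cooldown.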
Edge Cases & Risks
- session.clearStorageData() must only target safe storage types (appcache, shadercache, cachestorage), not cookies, localstorage, or indexdb, which would destroy user auth state
- Memory-triggered hibernation should have a cooldown to prevent thrashing: if hibernation was already triggered in the last N minutes, do not trigger again until pressure subsides and returns
- At app startup, memory naturally spikes during V8 JIT compilation; mitigation must not fire until the monitor's warmup period has elapsed (see #3285, "ProcessMemoryMonitor uses workingSetSize instead of privateBytes and lacks trend detection")
- The ResourceGovernor in the PTY host already pauses terminals under heap pressure; mitigation here should not duplicate that by also killing those same terminals
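The warmup, cooldown, and user-setting checks could be combined into a single gate that is consulted before memory-triggered hibernation. HibernationGate and its default durations are hypothetical: the issue leaves the cooldown as N minutes, and the warmup period comes from #3285. This sketch takes the stricter reading of the cooldown rule. After a trigger, the cooldown must have elapsed and pressure must have subsided before hibernation can fire again.

```typescript
// Hypothetical gate for memory-triggered hibernation. Durations are placeholder
// assumptions: warmup comes from #3285, the cooldown is "N minutes" in the issue.
class HibernationGate {
  private readonly startedAt: number;
  private readonly warmupMs: number;
  private readonly cooldownMs: number;
  private lastTriggeredAt: number | null = null;
  private pressureSubsided = true;

  constructor(startedAt: number, warmupMs = 2 * 60_000, cooldownMs = 15 * 60_000) {
    this.startedAt = startedAt;
    this.warmupMs = warmupMs;
    this.cooldownMs = cooldownMs;
  }

  // Call when the monitor sees memory drop back below the warn threshold.
  notePressureSubsided(): void {
    this.pressureSubsided = true;
  }

  // Returns true, and arms the cooldown, only if hibernation may run at `now`.
  tryTrigger(now: number, hibernationEnabled: boolean): boolean {
    if (!hibernationEnabled) return false; // same user setting as time-based hibernation
    if (now - this.startedAt < this.warmupMs) return false; // ignore the startup spike
    if (this.lastTriggeredAt !== null) {
      const coolingDown = now - this.lastTriggeredAt < this.cooldownMs;
      if (coolingDown || !this.pressureSubsided) return false; // avoid thrashing
    }
    this.lastTriggeredAt = now;
    this.pressureSubsided = false;
    return true;
  }
}
```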
Dependencies
Depends on #3285 for reliable pressure detection before mitigation is triggered.