
Memory pressure has no automated mitigation — monitor warns but takes no action #3286

@gregpriday


Summary

When ProcessMemoryMonitor detects that a process has exceeded a memory threshold, it writes a warn-level log entry and optionally a heap snapshot. Nothing else happens. For an app that can run for days with dozens of background terminal processes, this means the operator is notified (in a log file most users never read) but the memory pressure is never relieved until the app crashes or the user manually intervenes.

Problem Statement

The monitor's response to detected pressure is limited to logging; it takes no corrective action.

The existing HibernationService can already kill terminals in inactive projects to free resources, but it only triggers on a time-based schedule (projects inactive for more than 24 hours by default).

These two services exist independently with no connection between them. The memory monitor cannot trigger hibernation, and hibernation cannot respond to memory conditions. The result is that a process approaching OOM receives no programmatic relief — the app simply continues until it crashes.
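One way to picture the missing connection is an event bridge: the monitor publishes each threshold evaluation instead of only logging it, and mitigation code subscribes. The sketch below assumes a hypothetical `MemoryPressureSample` shape and `MemoryPressureBus` class; neither exists in the codebase, and the real monitor's data model may differ.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical payload; the real ProcessMemoryMonitor data shape is not described in this issue.
interface MemoryPressureSample {
  pid: number;
  bytes: number;
  thresholdBytes: number;
  timestamp: number; // epoch ms
}

// Hypothetical bridge: the monitor calls publish() on every poll, and any
// mitigation component (cache clearing, hibernation) can subscribe.
class MemoryPressureBus extends EventEmitter {
  publish(sample: MemoryPressureSample): void {
    // "pressure" when over threshold, "relieved" otherwise, so listeners can
    // track both the onset and the end of a pressure episode.
    const event = sample.bytes > sample.thresholdBytes ? "pressure" : "relieved";
    this.emit(event, sample);
  }
}
```

A mitigation controller would then register with something like `bus.on("pressure", handler)`, keeping the monitor unaware of what the mitigation actually does.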

There is also no mechanism to clear Chromium caches (network cache, shader cache) under pressure, despite the fact that these caches can consume hundreds of megabytes and are safe to clear automatically (they rebuild transparently).

Desired Behavior

When the memory monitor (or its threshold/trend detection) determines that overall process memory is in a warning state, a graduated mitigation response is initiated automatically:

Tier 1 (soft, invisible to users): Safe, non-destructive actions taken immediately — clearing Chromium-managed caches (not cookies, localStorage, or auth-bearing storage) and requesting V8 garbage collection if available. No user notification required.
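A minimal Tier 1 sketch, assuming an Electron `Session` such as `session.defaultSession`. The `CacheClearableSession` interface is a hypothetical stand-in that mirrors the two `Session` methods used (`clearCache()` and `clearStorageData()`) so the example runs without Electron; `runTier1` and the log wording are also illustrative.

```typescript
// Subset of Electron's Session API used here. In the app this would be a real
// Session (e.g. session.defaultSession); an interface keeps the sketch runnable.
interface CacheClearableSession {
  clearCache(): Promise<void>;
  clearStorageData(options?: { storages?: string[] }): Promise<void>;
}

// Only storage types that Chromium rebuilds transparently. cookies, localstorage
// and indexdb are deliberately absent because they hold auth and user state.
const SAFE_STORAGES = ["shadercache", "cachestorage"];

type Logger = (message: string) => void;

async function runTier1(ses: CacheClearableSession, logInfo: Logger): Promise<void> {
  // HTTP (network) cache; this is not one of the clearStorageData() storage types.
  await ses.clearCache();
  // Shader cache and Cache Storage API data.
  await ses.clearStorageData({ storages: SAFE_STORAGES });
  logInfo(`memory mitigation tier 1: cleared HTTP cache and ${SAFE_STORAGES.join(", ")}`);

  // globalThis.gc only exists when V8 was started with --expose-gc; otherwise skip.
  const gc = (globalThis as unknown as { gc?: () => void }).gc;
  if (typeof gc === "function") {
    gc();
    logInfo("memory mitigation tier 1: requested V8 garbage collection");
  }
}
```

Garbage collection is treated as best-effort here: without `--expose-gc` there is no `gc` function on `globalThis`, and the sketch silently skips that step.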

Tier 2 (moderate, potentially observable): If pressure continues or worsens, idle background terminals (those in non-active projects that have not been interacted with recently and are not currently running an agent) are gracefully hibernated via the existing HibernationService kill path. The hibernation threshold for memory-triggered hibernation should be much lower than the time-based 24-hour default — terminals idle for even 30–60 minutes in a background project are candidates.

The system should never hibernate terminals where the agent state indicates work is in progress. The active project is never touched.
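The Tier 2 eligibility rules can be expressed as a pure filter. `TerminalInfo`, `AgentState`, and `selectHibernationCandidates` are hypothetical; the real terminal model and agent-state names may differ, and the 45-minute default is an assumption picked from the 30–60 minute range above.

```typescript
// Hypothetical agent states; the app's real state names are not given in this issue.
type AgentState = "idle" | "working" | "waiting" | "none";

// Hypothetical terminal record with only the fields the rules need.
interface TerminalInfo {
  id: string;
  projectId: string;
  lastInteractionAt: number; // epoch ms of the last user interaction
  agentState: AgentState;
}

// Returns terminals that memory-triggered hibernation may touch: in a background
// project, idle for at least idleThresholdMs, and with no agent work in progress.
function selectHibernationCandidates(
  terminals: TerminalInfo[],
  activeProjectId: string,
  now: number,
  idleThresholdMs = 45 * 60_000, // assumption: within the 30–60 minute range
): TerminalInfo[] {
  return terminals.filter(
    (t) =>
      t.projectId !== activeProjectId && // the active project is never touched
      t.agentState !== "working" && // never interrupt in-progress agent work
      now - t.lastInteractionAt >= idleThresholdMs,
  );
}
```

Keeping selection pure makes the two hard rules (active project, working agent) easy to unit-test separately from the kill path.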

Context

The PTY host already has a reactive governor (ResourceGovernor) that pauses terminal I/O when its own heap usage exceeds 85%.

That governor is scoped to the PTY host process only and addresses throughput backpressure, not memory reclamation. A complementary mechanism is needed in the main process that actually frees memory rather than just pausing I/O.

The gracefulKillByProject() path that hibernation already uses is designed to give agents time to print session IDs before terminating.

This same path is appropriate for memory-triggered hibernation.

Acceptance Criteria

  • When any monitored process exceeds its warn threshold, safe Chromium cache clearing occurs automatically without user visibility
  • When memory pressure is sustained (multiple consecutive polling intervals above threshold), terminals in non-active projects that are idle (no recent interaction, agent not working) are hibernated via the existing graceful kill path
  • Terminals with an active agent state are never automatically hibernated
  • The currently active project is never touched by automated mitigation
  • Mitigation actions are logged at info level so they appear in diagnostics reports
  • The memory-triggered hibernation path can be disabled via the same user setting that controls time-based hibernation
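The "sustained pressure" criterion implies a counter of consecutive over-threshold polls. In this sketch, `PressureEscalator`, `EscalationConfig`, and `sustainedIntervals` are hypothetical names; `hibernationEnabled` stands in for the existing hibernation user setting, whose real name is not given here.

```typescript
// 0 = no action, 1 = cache clearing, 2 = cache clearing plus hibernation.
type MitigationTier = 0 | 1 | 2;

interface EscalationConfig {
  sustainedIntervals: number; // assumption: consecutive over-threshold polls before tier 2
  hibernationEnabled: boolean; // mirrors the existing time-based hibernation setting
}

class PressureEscalator {
  private consecutive = 0;
  private readonly config: EscalationConfig;

  constructor(config: EscalationConfig) {
    this.config = config;
  }

  // Called once per monitor polling interval; returns the tier to apply now.
  onPoll(overThreshold: boolean): MitigationTier {
    if (!overThreshold) {
      this.consecutive = 0; // any healthy poll resets the streak
      return 0;
    }
    this.consecutive += 1;
    if (this.consecutive >= this.config.sustainedIntervals && this.config.hibernationEnabled) {
      return 2;
    }
    // Any breach gets tier 1; with hibernation disabled this is the ceiling.
    return 1;
  }
}
```

Whether tier 1 should re-run on every over-threshold poll or only once per episode is left to the caller in this sketch.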

Edge Cases & Risks

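  • session.clearStorageData() must only target safe storage types (appcache, shadercache, cachestorage), never cookies, localstorage, or indexdb, which would destroy user auth state. The HTTP network cache is not one of the clearStorageData() storage types; it is cleared separately with session.clearCache()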
  • Memory-triggered hibernation should have a cooldown to prevent thrashing: if hibernation was already triggered in the last N minutes, do not trigger again until pressure subsides and returns
  • At app startup, memory naturally spikes during V8 JIT compilation; mitigation must not fire until the monitor's warmup period has elapsed (see #3285, "ProcessMemoryMonitor uses workingSetSize instead of privateBytes and lacks trend detection")
  • The ResourceGovernor in the PTY host already pauses terminals under heap pressure — mitigation here should not duplicate that by also killing those same terminals
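The warmup and cooldown rules can be combined in one gate. This sketch reads the cooldown rule literally: within `cooldownMs` of the last trigger, hibernation may fire again only after at least one below-threshold poll (pressure subsided and returned). `HibernationGate` and its parameters are hypothetical, and the timing values in any real configuration would need tuning.

```typescript
class HibernationGate {
  private lastTriggeredAt: number | null = null;
  private sawRelief = true; // set once a below-threshold poll follows a trigger
  private readonly startedAt: number;
  private readonly warmupMs: number;
  private readonly cooldownMs: number;

  constructor(startedAt: number, warmupMs: number, cooldownMs: number) {
    this.startedAt = startedAt;
    this.warmupMs = warmupMs;
    this.cooldownMs = cooldownMs;
  }

  // Feed every poll result; a healthy poll re-arms the gate.
  observe(overThreshold: boolean): void {
    if (!overThreshold) this.sawRelief = true;
  }

  // True if memory-triggered hibernation may fire at `now` (epoch ms).
  canTrigger(now: number): boolean {
    if (now - this.startedAt < this.warmupMs) return false; // startup JIT spike
    const inCooldown =
      this.lastTriggeredAt !== null && now - this.lastTriggeredAt < this.cooldownMs;
    // Inside the cooldown window, pressure must have subsided and returned first.
    return !inCooldown || this.sawRelief;
  }

  recordTrigger(now: number): void {
    this.lastTriggeredAt = now;
    this.sawRelief = false;
  }
}
```

A stricter variant would require both an elapsed cooldown and a relief period; which reading the issue intends is not settled here.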

Dependencies

Depends on #3285 for reliable pressure detection before mitigation is triggered.

Metadata

Labels: backend (Main process / backend), enhancement (New feature or request), performance (Performance optimization)
