
feat: effort-aware auto-compact and budget caps #702

@kokevidaurre

Context

Claude Code has sophisticated context window management that we're not tuning:

  • Auto-compact: triggers when context usage reaches the effective window minus a 13K-token buffer, then summarizes older messages via a side query
  • Budget caps: maxBudgetUsd stops execution when cost exceeds threshold
  • Context window override: CLAUDE_CODE_AUTO_COMPACT_WINDOW controls when compaction fires

Our agents run with defaults regardless of effort level. Quick tasks waste tokens on 200K context windows; deep tasks could use 1M.

From Claude Code Source

// autoCompact.ts
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000;

function getEffectiveContextWindowSize(model) {
  // Respects CLAUDE_CODE_AUTO_COMPACT_WINDOW env var
  // Falls back to model's native context window
}

Circuit breaker: after 3 consecutive auto-compact failures, Claude Code stops retrying.
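
To make the trigger arithmetic concrete, here is a minimal sketch of how the override, the buffer, and the circuit breaker could fit together. It is not the Claude Code source: only the 13K buffer, the env var's role, and the 3-failure limit come from the excerpt above. The names `effectiveWindow`, `compactThreshold`, and `CompactBreaker` are hypothetical, and the parsing rule for the override is an assumption.

```typescript
// Hypothetical sketch, not the Claude Code implementation.
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
const MAX_CONSECUTIVE_FAILURES = 3;

// Assumption: the override is used when it parses as a positive integer,
// otherwise the model's native context window applies.
function effectiveWindow(nativeWindow: number, override?: string): number {
  const parsed = Number.parseInt(override ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : nativeWindow;
}

// Token count at which auto-compact would fire for a given window.
function compactThreshold(window: number): number {
  return window - AUTOCOMPACT_BUFFER_TOKENS;
}

// Minimal circuit breaker: opens after 3 consecutive failures,
// and any success resets the counter.
class CompactBreaker {
  private failures = 0;

  get open(): boolean {
    return this.failures >= MAX_CONSECUTIVE_FAILURES;
  }

  record(success: boolean): void {
    this.failures = success ? 0 : this.failures + 1;
  }
}
```

Under these assumptions, an 80K override would trigger compaction at 67K tokens and a 200K native window at 187K.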

Proposed Changes

In buildAgentEnv(), inject tuning env vars based on effort level:

const COMPACT_WINDOW = { low: '80000', medium: '180000', high: '800000' };
const MAX_BUDGET = { low: '0.50', medium: '2.00', high: '10.00' };

// In buildAgentEnv():
if (effort) {
  env.CLAUDE_CODE_AUTO_COMPACT_WINDOW = COMPACT_WINDOW[effort] ?? '180000';
  // TODO: apply MAX_BUDGET[effort] once the budget-cap mechanism is confirmed (see Open questions)
}
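
A slightly fuller, typed sketch of the same idea follows. The real `buildAgentEnv()` signature is not shown in this issue, so `buildAgentTuning`, `AgentTuning`, and `isEffort` are hypothetical stand-ins. Because the budget env var is unverified, the budget is returned as a plain value rather than written into the environment. Unknown effort strings fall back to the medium values, mirroring the `?? '180000'` fallback above; extending that fallback to the budget is an assumption.

```typescript
type Effort = "low" | "medium" | "high";

const COMPACT_WINDOW: Record<Effort, string> = {
  low: "80000",
  medium: "180000",
  high: "800000",
};
const MAX_BUDGET_USD: Record<Effort, number> = { low: 0.5, medium: 2.0, high: 10.0 };

// Hypothetical return shape: env vars plus a budget the caller can enforce.
interface AgentTuning {
  env: Record<string, string>;
  maxBudgetUsd?: number;
}

function isEffort(value: string): value is Effort {
  return value === "low" || value === "medium" || value === "high";
}

// Hypothetical stand-in for the effort branch of buildAgentEnv().
function buildAgentTuning(baseEnv: Record<string, string>, effort?: string): AgentTuning {
  const env: Record<string, string> = { ...baseEnv }; // never mutate the caller's env
  if (!effort) return { env }; // no effort given: keep Claude Code defaults

  const level: Effort = isEffort(effort) ? effort : "medium"; // assumed fallback
  env["CLAUDE_CODE_AUTO_COMPACT_WINDOW"] = COMPACT_WINDOW[level];
  return { env, maxBudgetUsd: MAX_BUDGET_USD[level] };
}
```

Returning the budget separately keeps the window tuning shippable on its own while the budget-cap question is still open.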

Implementation

  1. Add effort→env-var mapping constants
  2. Inject in buildAgentEnv() (already has effort param)
  3. Document the relationship between effort and context management

Files to change

  • src/lib/execution-engine.ts: buildAgentEnv() (~L316-350)

Expected impact

  • Low effort: Agents compact against an 80K window instead of 200K → shorter conversations, faster completion
  • High effort: Agents can use a window of up to 800K before compacting → deeper reasoning without losing context
  • Budget caps: Prevent runaway costs from agents stuck in loops

Risks

  • Too-aggressive compaction (low effort) might lose important context. Mitigation: 80K is still substantial; with the 13K buffer, compaction fires at roughly 67K tokens.
  • Budget caps too low could cut off agents mid-task. Mitigation: start conservative, adjust based on observability data.

Open questions

  • Does Claude Code expose a budget cap env var? Need to verify CLAUDE_CODE_MAX_BUDGET or equivalent exists. If not, we'd need to implement timeout-based cost control.
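
If no such env var exists, one possible fallback is a guard in our own orchestrator that tracks both accumulated cost and elapsed time. The sketch below is hypothetical: `RunGuard` and `Limits` are illustrative names, and it assumes the caller can obtain a per-turn cost figure from somewhere, which this issue does not confirm.

```typescript
// Hypothetical fallback guard, enforced by the orchestrator rather than Claude Code.
interface Limits {
  maxBudgetUsd: number;
  maxDurationMs: number;
}

class RunGuard {
  private spentUsd = 0;
  private readonly limits: Limits;
  private readonly startedAt: number;

  constructor(limits: Limits, startedAt: number = Date.now()) {
    this.limits = limits;
    this.startedAt = startedAt;
  }

  // Assumption: the caller reports the cost of each completed turn.
  addCost(usd: number): void {
    this.spentUsd += usd;
  }

  // Returns the reason the run should stop, or null to continue.
  shouldStop(now: number = Date.now()): "budget" | "timeout" | null {
    if (this.spentUsd >= this.limits.maxBudgetUsd) return "budget";
    if (now - this.startedAt >= this.limits.maxDurationMs) return "timeout";
    return null;
  }
}
```

The orchestrator would call `shouldStop()` between turns and terminate the agent process when it returns a reason. The timeout covers the case where cost data is unavailable.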

Labels: enhancement
