feat: effort-aware auto-compact and budget caps #702
Open
Labels: enhancement (New feature or request)
Description
Context
Claude Code has sophisticated context window management that we're not tuning:
- Auto-compact: triggers at effective_window - 13K buffer, summarizes old messages via side-query
- Budget caps: maxBudgetUsd stops execution when cost exceeds threshold
- Context window override: CLAUDE_CODE_AUTO_COMPACT_WINDOW controls when compaction fires
Our agents run with defaults regardless of effort level. Quick tasks waste tokens on 200K context windows; deep tasks could use 1M.
From Claude Code Source
// autoCompact.ts
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000;
function getEffectiveContextWindowSize(model) {
// Respects CLAUDE_CODE_AUTO_COMPACT_WINDOW env var
// Falls back to model's native context window
}
Circuit breaker: after 3 consecutive autocompact failures, it stops trying.
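To make the trigger arithmetic and the circuit breaker concrete, here is a minimal sketch. Only AUTOCOMPACT_BUFFER_TOKENS and the "3 consecutive failures" rule come from the excerpt above; the helper names (effectiveWindow, autoCompactThreshold, CompactCircuitBreaker, MAX_CONSECUTIVE_FAILURES) are hypothetical, and the override handling is an assumption about how the env var is parsed, not Claude Code's actual implementation.

```typescript
// Constant taken from the excerpt above.
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
// Hypothetical name for the "3 consecutive failures" rule.
const MAX_CONSECUTIVE_FAILURES = 3;

// Assumption: the override is a decimal token count; anything unparsable
// or non-positive falls back to the model's native window.
function effectiveWindow(nativeWindow: number, override?: string): number {
  const parsed = override === undefined ? NaN : parseInt(override, 10);
  return !isNaN(parsed) && parsed > 0 ? parsed : nativeWindow;
}

// Auto-compact fires once the conversation reaches window minus buffer.
function autoCompactThreshold(nativeWindow: number, override?: string): number {
  return effectiveWindow(nativeWindow, override) - AUTOCOMPACT_BUFFER_TOKENS;
}

// Stops further compaction attempts after repeated consecutive failures.
class CompactCircuitBreaker {
  private failures = 0;

  recordSuccess(): void {
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures += 1;
  }

  isOpen(): boolean {
    return this.failures >= MAX_CONSECUTIVE_FAILURES;
  }
}
```

Under these assumptions, a 200K native window compacts at 187,000 tokens, and an override of '80000' moves that to 67,000. Whether the real implementation caps the override at the native window is not shown in the excerpt.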
Proposed Changes
In buildAgentEnv(), inject tuning env vars based on effort level:
const COMPACT_WINDOW = { low: '80000', medium: '180000', high: '800000' };
const MAX_BUDGET = { low: '0.50', medium: '2.00', high: '10.00' };
// In buildAgentEnv():
if (effort) {
env.CLAUDE_CODE_AUTO_COMPACT_WINDOW = COMPACT_WINDOW[effort] ?? '180000';
// Budget cap via Claude Code's internal tracking
}
Implementation
- Add effort→env-var mapping constants
- Inject in buildAgentEnv() (already has effort param)
- Document the relationship between effort and context management
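The first two steps above could look roughly like the following typed sketch. COMPACT_WINDOW and its values come from the proposal; buildAgentEnvSketch is a simplified, hypothetical stand-in for the real buildAgentEnv() in src/lib/execution-engine.ts, whose actual signature is not shown in this issue, and the Effort type and isEffort guard are assumptions added for illustration.

```typescript
// Hypothetical effort type; the real code may model effort differently.
type Effort = 'low' | 'medium' | 'high';

// Values taken from the proposal above.
const COMPACT_WINDOW: Record<Effort, string> = {
  low: '80000',
  medium: '180000',
  high: '800000',
};
const DEFAULT_COMPACT_WINDOW = COMPACT_WINDOW.medium;

// Narrows an arbitrary string to a known effort level.
function isEffort(value: string): value is Effort {
  return value === 'low' || value === 'medium' || value === 'high';
}

// Simplified stand-in for buildAgentEnv(): copies the base env and, when an
// effort is given, sets the compaction window (unknown efforts get the default).
function buildAgentEnvSketch(
  baseEnv: Record<string, string>,
  effort?: string,
): Record<string, string> {
  const env: Record<string, string> = { ...baseEnv };
  if (effort) {
    env['CLAUDE_CODE_AUTO_COMPACT_WINDOW'] = isEffort(effort)
      ? COMPACT_WINDOW[effort]
      : DEFAULT_COMPACT_WINDOW;
  }
  return env;
}
```

Copying the base environment rather than mutating it keeps the injection side-effect free, so an agent launched without an effort level still sees the untouched defaults.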
Files to change
- src/lib/execution-engine.ts: buildAgentEnv() (~L316-350)
Expected impact
- Low effort: Agents compact at 80K instead of 200K → shorter conversations, faster completion
- High effort: Agents use up to 800K before compacting → deeper reasoning without losing context
- Budget caps: Prevent runaway costs from agents stuck in loops
Risks
- Too-aggressive compaction (low effort) might lose important context. Mitigation: 80K is still substantial (~20K tokens of usable conversation).
- Budget caps too low could cut off agents mid-task. Mitigation: start conservative, adjust based on observability data.
Open questions
- Does Claude Code expose a budget cap env var? Need to verify that CLAUDE_CODE_MAX_BUDGET or an equivalent exists. If not, we'd need to implement timeout-based cost control.
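If no such env var exists, the timeout-based fallback could start from something like the sketch below. Everything here is hypothetical: the TIME_LIMIT_MS values are placeholders rather than measured numbers, and EffortLevel and isOverTimeLimit are illustrative names. Wall-clock time is only a rough proxy for cost.

```typescript
// Hypothetical effort type, mirroring the low/medium/high levels in the proposal.
type EffortLevel = 'low' | 'medium' | 'high';

// Placeholder wall-clock limits per effort level (assumed, not from the issue).
const TIME_LIMIT_MS: Record<EffortLevel, number> = {
  low: 5 * 60_000,
  medium: 20 * 60_000,
  high: 60 * 60_000,
};

// Returns true once an agent has run longer than its effort level allows.
// Timestamps are passed in so the check stays pure and easy to test.
function isOverTimeLimit(
  effort: EffortLevel,
  startedAtMs: number,
  nowMs: number,
): boolean {
  return nowMs - startedAtMs > TIME_LIMIT_MS[effort];
}
```

A supervisor loop could call isOverTimeLimit(effort, startedAt, Date.now()) on each poll and terminate the agent process when it returns true.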