Feature: Real-time Context Budget Guards #71

@Sahil5963

Description

Summary

Implement configurable real-time context budget enforcement to prevent context overflow during chat sessions.

Problem

Without budget enforcement:

  • Large tool results can overflow context mid-conversation
  • No visibility into current context usage
  • Failures happen unexpectedly when limits are hit
  • Difficult to debug context-related issues

Proposed Solution

Configuration Options

interface ContextBudgetConfig {
  enabled?: boolean;                    // Default: false
  
  budget?: {
    contextWindowTokens?: number;       // Your model's context window
    inputHeadroomRatio?: number;        // Default: 0.75 (reserve 25% for output)
    
    // Per-category limits (% of available budget)
    systemPromptShare?: number;         // Default: 0.15 (15%)
    historyShare?: number;              // Default: 0.50 (50%)
    toolResultsShare?: number;          // Default: 0.30 (30%)
    skillsShare?: number;               // Default: 0.05 (5%)
  };
  
  enforcement?: {
    mode?: 'warn' | 'truncate' | 'error';  // Default: 'truncate'
    onBudgetExceeded?: (info: BudgetInfo) => void;
  };
  
  monitoring?: {
    enabled?: boolean;                  // Track usage over time
    onUsageUpdate?: (usage: ContextUsage) => void;
  };
}

// Usage
const chat = createChatWithTools({
  contextBudget: {
    enabled: true,
    budget: {
      contextWindowTokens: 128_000,     // e.g., GPT-4 Turbo
      inputHeadroomRatio: 0.75,
      historyShare: 0.5,
      toolResultsShare: 0.3,
    },
    enforcement: {
      mode: 'truncate',
      onBudgetExceeded: (info) => {
        console.warn('Context budget exceeded:', info);
      }
    },
    monitoring: {
      enabled: true,
      onUsageUpdate: (usage) => {
        // Update UI with current usage
        updateUsageIndicator(usage);
      }
    }
  }
});
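To make the budget math concrete, here is a sketch (not the actual implementation) of how the share ratios could resolve into per-category token budgets. The function and type names are illustrative only:

```typescript
interface ResolvedBudget {
  available: number;                     // tokens usable for input
  perCategory: Record<string, number>;   // token budget per category
}

// Hypothetical resolver using the defaults from ContextBudgetConfig.
function resolveBudget(
  contextWindowTokens: number,
  inputHeadroomRatio = 0.75,
  shares: Record<string, number> = {
    systemPrompt: 0.15,
    history: 0.5,
    toolResults: 0.3,
    skills: 0.05,
  },
): ResolvedBudget {
  // Reserve (1 - inputHeadroomRatio) of the window for model output.
  const available = Math.floor(contextWindowTokens * inputHeadroomRatio);
  const perCategory: Record<string, number> = {};
  for (const [name, share] of Object.entries(shares)) {
    perCategory[name] = Math.floor(available * share);
  }
  return { available, perCategory };
}
```

With a 128k window and the defaults, 96,000 tokens are available for input: 48,000 for history, 28,800 for tool results, 14,400 for the system prompt, and 4,800 for skills. Note the default shares sum to exactly 1.0.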

Context Usage Tracking

interface ContextUsage {
  total: { tokens: number; percent: number };
  breakdown: {
    systemPrompt: { tokens: number; percent: number };
    history: { tokens: number; percent: number };
    toolResults: { tokens: number; percent: number };
    skills: { tokens: number; percent: number };
  };
  budget: {
    available: number;
    remaining: number;
  };
}

// Get current usage
const usage = chat.getContextUsage();
console.log(`Using ${usage.total.percent}% of context budget`);
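A sketch of how a ContextUsage snapshot could be assembled from raw per-category token counts, assuming percentages are computed against the available input budget (the helper name and rounding are assumptions):

```typescript
// Hypothetical helper: build a ContextUsage-shaped object from counts.
function computeUsage(
  counts: { systemPrompt: number; history: number; toolResults: number; skills: number },
  available: number,
) {
  // Percent of the available budget, rounded to one decimal place.
  const pct = (t: number) => Math.round((t / available) * 1000) / 10;
  const total = counts.systemPrompt + counts.history + counts.toolResults + counts.skills;
  return {
    total: { tokens: total, percent: pct(total) },
    breakdown: {
      systemPrompt: { tokens: counts.systemPrompt, percent: pct(counts.systemPrompt) },
      history: { tokens: counts.history, percent: pct(counts.history) },
      toolResults: { tokens: counts.toolResults, percent: pct(counts.toolResults) },
      skills: { tokens: counts.skills, percent: pct(counts.skills) },
    },
    budget: { available, remaining: available - total },
  };
}
```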

Enforcement Modes

Mode        Behavior                          Use Case
warn        Log a warning, continue anyway    Development/debugging
truncate    Auto-truncate to fit the budget   Production (recommended)
error       Throw an error when exceeded      Strict environments
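Illustrative only: how the three modes might be dispatched by the guard. The function signature and the truncation callback are assumptions, not the library's API:

```typescript
type Mode = 'warn' | 'truncate' | 'error';

// Hypothetical dispatcher: overBy is how many tokens we are over budget.
function enforce(mode: Mode, overBy: number, truncate: () => void): void {
  if (overBy <= 0) return;               // within budget, nothing to do
  switch (mode) {
    case 'warn':
      console.warn(`Context budget exceeded by ${overBy} tokens`);
      return;                            // log and continue anyway
    case 'truncate':
      truncate();                        // drop content until it fits
      return;
    case 'error':
      throw new Error(`Context budget exceeded by ${overBy} tokens`);
  }
}
```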

Budget Guard Middleware

// Automatically installed when enabled
// Runs before every LLM call to ensure budget compliance

contextBudget: {
  enforcement: {
    mode: 'truncate',
    // Order of truncation when over budget:
    // 1. Oldest tool results
    // 2. Oldest history messages  
    // 3. Trim current tool results
  }
}
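The truncation order above could be sketched as follows, under the assumption that each message carries a token count and a kind. This is not the library's internals, just the ordering idea:

```typescript
interface Msg { kind: 'toolResult' | 'history'; tokens: number }

// Hypothetical: drop messages until the total fits the budget, removing
// oldest tool results first, then oldest history (array front = oldest).
function truncateToBudget(msgs: Msg[], budget: number): Msg[] {
  const total = (xs: Msg[]) => xs.reduce((n, m) => n + m.tokens, 0);
  const result = [...msgs];
  for (const kind of ['toolResult', 'history'] as const) {
    while (total(result) > budget) {
      const i = result.findIndex((m) => m.kind === kind);
      if (i === -1) break;               // none of this kind left to drop
      result.splice(i, 1);
    }
  }
  return result;                         // step 3 (trimming within a single
                                         // tool result) is omitted here
}
```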

Use Cases

  • Production deployments needing reliability
  • Cost-conscious applications
  • Real-time usage monitoring in UI
  • Debugging context-related issues

Benefits

  • ✅ Fully optional - disabled by default
  • ✅ Configurable budgets - tune per model/use case
  • ✅ Prevents unexpected context overflow
  • ✅ Real-time usage visibility
  • ✅ Multiple enforcement modes
  • ✅ Callback hooks for custom handling
  • ✅ Per-category budget allocation

Metadata

Assignees: no one assigned
Labels: enhancement (New feature or request)
Type: none
Projects: none
Milestone: none
Relationships: none
Development: no branches or pull requests