## Summary
Implement configurable real-time context budget enforcement to prevent context overflow during chat sessions.
## Problem
Without budget enforcement:
- Large tool results can overflow context mid-conversation
- No visibility into current context usage
- Failures happen unexpectedly when limits are hit
- Difficult to debug context-related issues
## Proposed Solution

### Configuration Options
```typescript
interface ContextBudgetConfig {
  enabled?: boolean; // Default: false

  budget?: {
    contextWindowTokens?: number; // Your model's context window
    inputHeadroomRatio?: number;  // Default: 0.75 (reserve 25% for output)

    // Per-category limits (% of available budget)
    systemPromptShare?: number; // Default: 0.15 (15%)
    historyShare?: number;      // Default: 0.50 (50%)
    toolResultsShare?: number;  // Default: 0.30 (30%)
    skillsShare?: number;       // Default: 0.05 (5%)
  };

  enforcement?: {
    mode?: 'warn' | 'truncate' | 'error'; // Default: 'truncate'
    onBudgetExceeded?: (info: BudgetInfo) => void;
  };

  monitoring?: {
    enabled?: boolean; // Track usage over time
    onUsageUpdate?: (usage: ContextUsage) => void;
  };
}
```
```typescript
// Usage
const chat = createChatWithTools({
  contextBudget: {
    enabled: true,
    budget: {
      contextWindowTokens: 128_000, // e.g., GPT-4 Turbo
      inputHeadroomRatio: 0.75,
      historyShare: 0.5,
      toolResultsShare: 0.3,
    },
    enforcement: {
      mode: 'truncate',
      onBudgetExceeded: (info) => {
        console.warn('Context budget exceeded:', info);
      },
    },
    monitoring: {
      enabled: true,
      onUsageUpdate: (usage) => {
        // Update UI with current usage
        updateUsageIndicator(usage);
      },
    },
  },
});
```
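For concreteness, the shares in the example config above resolve to per-category token limits roughly as follows. This is only a sketch of the arithmetic implied by the defaults; the actual implementation's rounding and accounting may differ.

```typescript
// With a 128k context window and 0.75 input headroom:
const contextWindowTokens = 128_000;
const inputHeadroomRatio = 0.75;

// Tokens available for input (the rest is reserved for model output)
const available = contextWindowTokens * inputHeadroomRatio; // 96_000

// Per-category limits using the default shares
const limits = {
  systemPrompt: available * 0.15, // 14_400
  history: available * 0.5,       // 48_000
  toolResults: available * 0.3,   // 28_800
  skills: available * 0.05,       //  4_800
};
```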
### Context Usage Tracking
```typescript
interface ContextUsage {
  total: { tokens: number; percent: number };
  breakdown: {
    systemPrompt: { tokens: number; percent: number };
    history: { tokens: number; percent: number };
    toolResults: { tokens: number; percent: number };
    skills: { tokens: number; percent: number };
  };
  budget: {
    available: number;
    remaining: number;
  };
}

// Get current usage
const usage = chat.getContextUsage();
console.log(`Using ${usage.total.percent}% of context budget`);
```
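A monitoring callback like `onUsageUpdate` could drive a simple UI indicator. The helper below is a hypothetical sketch (not part of the proposed API) that formats a `ContextUsage` snapshot as a one-line summary string:

```typescript
interface Share { tokens: number; percent: number }

// Mirrors the ContextUsage interface above
interface ContextUsage {
  total: Share;
  breakdown: { systemPrompt: Share; history: Share; toolResults: Share; skills: Share };
  budget: { available: number; remaining: number };
}

// Renders a usage snapshot as a compact one-line summary
function formatUsage(usage: ContextUsage): string {
  const parts = Object.entries(usage.breakdown)
    .map(([name, share]) => `${name} ${share.percent}%`)
    .join(', ');
  return `context: ${usage.total.tokens}/${usage.budget.available} tokens (${parts})`;
}
```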
Enforcement Modes
| Mode |
Behavior |
Use Case |
warn |
Log warning, continue anyway |
Development/debugging |
truncate |
Auto-truncate to fit budget |
Production (recommended) |
error |
Throw error when exceeded |
Strict environments |
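The three modes could be dispatched along these lines. This is an illustrative sketch only; `BudgetInfo`'s shape and the `applyTruncation` hook are assumptions, not the library's actual internals.

```typescript
type Mode = 'warn' | 'truncate' | 'error';

// Assumed shape of the info passed to onBudgetExceeded
interface BudgetInfo { usedTokens: number; budgetTokens: number; overBy: number }

function enforceBudget(
  mode: Mode,
  info: BudgetInfo,
  applyTruncation: (overBy: number) => void,
): void {
  if (info.overBy <= 0) return; // within budget, nothing to do
  switch (mode) {
    case 'warn':
      console.warn(`Context budget exceeded by ${info.overBy} tokens`);
      return; // continue anyway
    case 'truncate':
      applyTruncation(info.overBy); // drop content until it fits
      return;
    case 'error':
      throw new Error(`Context budget exceeded by ${info.overBy} tokens`);
  }
}
```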
### Budget Guard Middleware

```typescript
// Automatically installed when enabled
// Runs before every LLM call to ensure budget compliance
contextBudget: {
  enforcement: {
    mode: 'truncate',
    // Order of truncation when over budget:
    // 1. Oldest tool results
    // 2. Oldest history messages
    // 3. Trim current tool results
  }
}
```
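The guard's truncation order could be sketched as below: drop oldest tool results first, then oldest history messages, until the conversation fits. `Entry` and the token counts are illustrative assumptions; stage 3 (trimming the current tool result's text) is omitted for brevity.

```typescript
interface Entry { kind: 'toolResult' | 'message'; tokens: number; text: string }

// Removes entries in the documented priority order until the total
// token count fits within `budget`. Returns the surviving entries.
function truncateToBudget(entries: Entry[], budget: number): Entry[] {
  const total = (es: Entry[]) => es.reduce((sum, e) => sum + e.tokens, 0);
  const kept = [...entries];

  // Stage 1: oldest tool results; Stage 2: oldest history messages
  for (const kind of ['toolResult', 'message'] as const) {
    while (total(kept) > budget) {
      const idx = kept.findIndex((e) => e.kind === kind); // oldest of this kind
      if (idx === -1) break; // none left; fall through to the next stage
      kept.splice(idx, 1);
    }
  }
  // Stage 3 (trimming the current tool result's text) not shown here
  return kept;
}
```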
## Use Cases
- Production deployments needing reliability
- Cost-conscious applications
- Real-time usage monitoring in UI
- Debugging context-related issues
## Benefits
- ✅ Fully optional - disabled by default
- ✅ Configurable budgets - tune per model/use case
- ✅ Prevents unexpected context overflow
- ✅ Real-time usage visibility
- ✅ Multiple enforcement modes
- ✅ Callback hooks for custom handling
- ✅ Per-category budget allocation