Feature: Context & History Management with Auto-Summarization #68

@Sahil5963

Description

Summary

Implement configurable context and conversation history management with optional auto-summarization to handle long-running chat sessions efficiently.

Problem

Long-running conversations can:

  • Exceed context window limits
  • Accumulate irrelevant old messages
  • Increase token costs unnecessarily
  • Degrade LLM response quality with too much context

Proposed Solution

Configuration Options

```ts
interface ContextManagementConfig {
  enabled?: boolean;                    // Default: false (opt-in)

  history?: {
    maxMessages?: number;               // Max messages to keep (e.g., 50)
    maxTokens?: number;                 // Max tokens for history
    maxContextShare?: number;           // Default: 0.5 (50% of context for history)
    pruneStrategy?: 'oldest' | 'least-relevant' | 'summarize';
  };

  summarization?: {
    enabled?: boolean;                  // Default: false
    triggerAt?: number;                 // Summarize when messages exceed this
    chunkSize?: number;                 // Messages per summary chunk
    preserveRecent?: number;            // Always keep last N messages intact
    fallbackBehavior?: 'truncate' | 'statistical' | 'error';
  };

  tokenEstimation?: {
    safetyMargin?: number;              // Default: 1.2 (20% buffer)
    charsPerToken?: number;             // Default: 4
  };
}
```

```ts
// Usage - opt in to context management
const chat = createChatWithTools({
  contextManagement: {
    enabled: true,
    history: {
      maxContextShare: 0.5,
      pruneStrategy: 'oldest',
    },
    summarization: {
      enabled: true,
      triggerAt: 30,
      preserveRecent: 5,
    }
  }
});

// Or keep the current behavior (no management)
const chatUnmanaged = createChatWithTools({
  contextManagement: { enabled: false }
});
```
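For reference, the `tokenEstimation` options could combine along these lines. This is a minimal sketch, assuming a character-ratio heuristic; `estimateTokens` is a hypothetical helper, not part of the proposed API, and real token counts depend on the model's tokenizer:

```typescript
// Hypothetical character-based token estimator using the charsPerToken
// and safetyMargin options from the config above (assumed defaults: 4, 1.2).
function estimateTokens(
  text: string,
  charsPerToken = 4,
  safetyMargin = 1.2,
): number {
  // Rough estimate: characters / charsPerToken, inflated by the safety margin.
  return Math.ceil((text.length / charsPerToken) * safetyMargin);
}

// A 400-character message estimates to 100 raw tokens, 120 with the 20% buffer.
const estimated = estimateTokens("x".repeat(400));
```

Overestimating on purpose (the safety margin) trades a few wasted tokens for never blowing past the context window.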

Summarization Strategies

| Strategy | Description | When to use |
| --- | --- | --- |
| `oldest` | Drop oldest messages first | Simple, predictable |
| `least-relevant` | Score messages by relevance to recent context | Better quality |
| `summarize` | Compress old messages into a summary | Best context retention |
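As a rough sketch of the `oldest` strategy (`Message` and `pruneOldest` are hypothetical names for illustration; a real implementation would also honor token budgets, not just message counts):

```typescript
interface Message { role: string; content: string }

// Drop the oldest messages until the history fits maxMessages.
// The first message (assumed to be the system prompt) always survives pruning.
function pruneOldest(history: Message[], maxMessages: number): Message[] {
  if (history.length <= maxMessages) return history;
  const [system, ...rest] = history;
  return [system, ...rest.slice(rest.length - (maxMessages - 1))];
}
```

Example: pruning a 10-message history to 5 keeps the system prompt plus the 4 most recent messages.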

Multi-Stage Fallback

```ts
summarization: {
  fallbackBehavior: 'statistical',
  // If full summarization fails:
  // 1. Retry summarization without the oversized messages
  // 2. Fall back to a statistical summary ("20 messages, 3 tool calls...")
  // 3. Or simply truncate, depending on fallbackBehavior
}
```
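A minimal sketch of what the statistical summary in step 2 might produce; `statisticalSummary` and the `Msg` shape are assumptions for illustration, not part of the proposed API:

```typescript
interface Msg { role: string; content: string; toolCall?: boolean }

// Cheap, LLM-free summary of pruned history: counts only, no message
// content, so it can never fail and always fits in a tiny token budget.
function statisticalSummary(messages: Msg[]): string {
  const toolCalls = messages.filter((m) => m.toolCall).length;
  const userTurns = messages.filter((m) => m.role === "user").length;
  return `[Summary: ${messages.length} messages pruned ` +
    `(${userTurns} user turns, ${toolCalls} tool calls)]`;
}
```

Because it involves no model call, this tier is a safe terminal fallback before outright truncation.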

Use Cases

  • Customer support bots with long sessions
  • Coding assistants with ongoing context
  • Research assistants accumulating findings
  • Any multi-turn conversation exceeding context limits

Benefits

  • ✅ Fully optional - disabled by default
  • ✅ Configurable - tune for your use case
  • ✅ Handles long conversations gracefully
  • ✅ Preserves important context while reducing tokens
  • ✅ Graceful fallback when summarization fails
  • ✅ Configurable safety margins for token estimation

Metadata

Labels: enhancement (New feature or request)