Conversation

@eneufeld (Contributor)
What it does

Automatically summarizes chat sessions when token usage approaches the context limit (90% of 200k tokens), enabling continued conversations without losing context from earlier messages.

Core functionality:

  • Add `ChatSessionSummarizationService` to orchestrate summarization
  • Add `insertSummary()` method to `MutableChatModel` for inserting summary nodes
  • Add `isStale` flag to mark pre-summary messages (excluded from future prompts)
  • Add `kind` field to the `ChatRequest` interface (`'user' | 'summary'`)
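
The staleness mechanism in the bullets above can be sketched roughly as follows (the names mirror the PR description; the actual Theia types and signatures differ):

```typescript
// Hypothetical sketch of the request shape and stale filtering, based on the
// PR description -- not the actual Theia API.
type ChatRequestKind = 'user' | 'summary';

interface ChatRequestNode {
    kind: ChatRequestKind;
    text: string;
    isStale?: boolean; // set on pre-summary messages so they drop out of prompts
}

// Only non-stale nodes are included when assembling the next prompt.
function buildPrompt(nodes: ChatRequestNode[]): string[] {
    return nodes.filter(n => !n.isStale).map(n => n.text);
}

// Inserting a summary marks all earlier nodes stale and appends a summary node,
// so the next prompt contains only the summary (plus any later messages).
function insertSummary(nodes: ChatRequestNode[], summary: string): ChatRequestNode[] {
    const stale = nodes.map(n => ({ ...n, isStale: true }));
    return [...stale, { kind: 'summary' as const, text: summary }];
}
```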

Budget-aware tool loop:

  • Add `singleRoundTrip` flag to `UserRequest` for controlled tool execution
  • Extend `ChatLanguageModelServiceImpl` with budget checking before/during requests
  • Trigger mid-turn summarization when threshold exceeded during tool loops
  • Support both threshold-triggered and explicit summarization

Token usage tracking:

  • Add `TokenUsageService` for recording token usage across providers
  • Add `TokenUsageServiceClient` for frontend notification of usage updates
  • Display token count indicator in chat UI with session switching support
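
A minimal sketch of the usage tracking and client notification described above (the interface names follow the bullets; the method signatures are illustrative, not Theia's actual RPC API):

```typescript
// Hypothetical shapes -- the real services are wired over Theia's RPC layer.
interface TokenUsageServiceClient {
    notifyTokenUsage(sessionId: string, total: number): void;
}

class TokenUsageService {
    private readonly usage = new Map<string, number>();
    private readonly clients: TokenUsageServiceClient[] = [];

    addClient(client: TokenUsageServiceClient): void {
        this.clients.push(client);
    }

    // Called by provider integrations after each request; accumulates per
    // session and pushes the new total to frontend clients (e.g. the chat
    // UI's token count indicator).
    recordTokenUsage(sessionId: string, tokens: number): void {
        const total = (this.usage.get(sessionId) ?? 0) + tokens;
        this.usage.set(sessionId, total);
        this.clients.forEach(c => c.notifyTokenUsage(sessionId, total));
    }

    getTotal(sessionId: string): number {
        return this.usage.get(sessionId) ?? 0;
    }
}
```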

UI components:

  • Add collapsible summary node rendering with bookmark icon
  • Add `SummaryPartRenderer` for displaying summary content
  • Add token usage indicator showing current session token count

fixes #16703
fixes #16724

Current limitations:

  • Only supported by the Anthropic provider
  • Hard-coded budget of 200k tokens
  • Hard-coded trigger at 90% of the budget

How to test

Enable budget awareness for Anthropic in the settings.
Start a chat using an Anthropic model and let it grow. A summary should be triggered automatically once the session reaches 180k tokens.

Follow-ups

Extend the tool handling to all other LLM wrappers.

Breaking changes

  • This PR introduces breaking changes and requires careful review. If yes, the breaking changes section in the changelog has been updated.

@eneufeld eneufeld requested review from planger and sdirix December 11, 2025 13:56
@github-project-automation github-project-automation bot moved this to Waiting on reviewers in PR Backlog Dec 11, 2025
@eneufeld eneufeld force-pushed the feat/budget-aware-chat branch from 9e10da0 to 1cf9d64 Compare January 7, 2026 16:40
@eneufeld eneufeld force-pushed the feat/budget-aware-chat branch from 1cf9d64 to 48e23e5 Compare January 7, 2026 23:06
@sdirix (Member) commented Jan 8, 2026

I will review at the latest next week.

@sdirix (Member) left a comment:

I did a quick test; sadly it does not work for me: the tokens are not counted correctly. They reset all the time, so they never go above 500.

I tried with

@Coder Check all typescript files for spelling errors

and Opus 4.5

I had a rough look over the code and left some comments.


| Command (from root) | Purpose |
|---------------------|---------|
| `npm install` | Install dependencies (required first) |

Suggested change:
- | `npm install` | Install dependencies (required first) |
+ | `npm ci` | Install dependencies (required first) |

| `npm install` | Install dependencies (required first) |
| `npm run build:browser` | Build all packages + browser app |
| `npm run start:browser` | Start browser example at localhost:3000 |
| `npm run start:electron` | Start Electron desktop app |

Suggested change:
- | `npm run start:electron` | Start Electron desktop app |

}

const SummaryContent: React.FC<SummaryContentProps> = ({ content, openerService }) => {
    const contentRef = useMarkdownRendering(content, openerService);

Likely a follow-up, but it would be amazing if the summary were editable, in case the user is not satisfied with it afterwards and, for example, wants to highlight a specific fact.

Comment on lines 488 to 491
// Skip empty branches (can occur during insertSummary operations)
if (branch.items.length === 0) {
    return;
}

The whole empty-branch situation seems a bit brittle. Can we switch to more deterministic and stable invariants so that code like this is not necessary? It should be possible to guarantee a proper branch structure throughout.

Comment on lines +78 to +80
if (budgetAwareEnabled && request.tools?.length) {
    return this.sendRequestWithBudgetAwareness(languageModel, request);
}

This new budget loop does not properly handle the history mechanism, leading to weird history view behavior, rendering a lot of requests without responses.

const budgetAwareEnabled = this.preferenceService.get<boolean>(BUDGET_AWARE_TOOL_LOOP_PREF, false);

if (budgetAwareEnabled && request.tools?.length) {
    return this.sendRequestWithBudgetAwareness(languageModel, request);

The same kind of strategy pattern would be good here, I think. Everyone will need to handle the tool loop, but adopters might want to do different things than handle summarization.

@github-project-automation github-project-automation bot moved this from Waiting on reviewers to Waiting on author in PR Backlog Jan 15, 2026
@eneufeld eneufeld requested a review from sdirix January 22, 2026 14:10
@planger planger removed their request for review January 22, 2026 15:01

Labels

None yet

Projects

Status: Waiting on author

Development

Successfully merging this pull request may close these issues:

  • Unable to edit and send messages in restored AI Chat session
  • Auto session summary

3 participants