feat(ai-chat): add automatic chat session summarization for long conversations #16742
base: master
Conversation
Automatically summarizes chat sessions when token usage approaches the context
limit (90% of 200k tokens), enabling continued conversations without losing
context from earlier messages.
Core functionality:
- Add `ChatSessionSummarizationService` to orchestrate summarization
- Add `insertSummary()` method to `MutableChatModel` for inserting summary nodes
- Add `isStale` flag to mark pre-summary messages (excluded from future prompts)
- Add `kind` field to `ChatRequest` interface ('user' | 'summary')
Budget-aware tool loop:
- Add `singleRoundTrip` flag to `UserRequest` for controlled tool execution
- Extend `ChatLanguageModelServiceImpl` with budget checking before/during requests
- Trigger mid-turn summarization when threshold exceeded during tool loops
- Support both threshold-triggered and explicit summarization
Token usage tracking:
- Add `TokenUsageService` for recording token usage across providers
- Add `TokenUsageServiceClient` for frontend notification of usage updates
- Display token count indicator in chat UI with session switching support
UI components:
- Add collapsible summary node rendering with bookmark icon
- Add `SummaryPartRenderer` for displaying summary content
- Add token usage indicator showing current session token count
fixes #16703
fixes #16724
Current Limitations:
- Only supported by Anthropic
- Hard-coded budget of 200k tokens
- Hard-coded trigger when reaching 90% of the budget
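As a rough sketch of the trigger logic described above (the constant and function names here are illustrative, not the PR's actual identifiers):

```typescript
// Illustrative sketch of the hard-coded threshold check; names are hypothetical.
const TOKEN_BUDGET = 200_000;  // hard-coded context budget
const TRIGGER_RATIO = 0.9;     // summarize when 90% of the budget is used

function shouldSummarize(usedTokens: number): boolean {
    // 90% of 200k = 180k tokens triggers automatic summarization
    return usedTokens >= TOKEN_BUDGET * TRIGGER_RATIO;
}
```

The limitations above would disappear if the budget and ratio were read from preferences instead of constants.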
I will review this next week at the latest.
I did a quick test; sadly, it does not work for me: the tokens are not counted correctly. They reset all the time, so they never go above 500.
I tried with
@Coder Check all typescript files for spelling errors
and Opus 4.5.
I had a rough look over the code and left some comments.
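One plausible cause of the reset symptom described above is each response's usage overwriting the session counter instead of adding to it. A minimal session-scoped accumulator, with hypothetical names (not the PR's actual `TokenUsageService` code), would look like:

```typescript
// Illustrative session-scoped accumulator; identifiers are hypothetical.
const usageBySession = new Map<string, number>();

function recordTokenUsage(sessionId: string, inputTokens: number, outputTokens: number): number {
    // Accumulate onto the running session total rather than overwriting it,
    // so the indicator grows monotonically within a session.
    const total = (usageBySession.get(sessionId) ?? 0) + inputTokens + outputTokens;
    usageBySession.set(sessionId, total);
    return total;
}
```

If the real service instead stores only the latest response's usage, the UI would keep snapping back to small per-request values, matching the observed behavior.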
.prompts/project-info.prompttemplate (outdated)

| Command (from root) | Purpose |
|---------------------|---------|
| `npm install` | Install dependencies (required first) |
Suggested change: use `npm ci` instead of `npm install`:

| `npm ci` | Install dependencies (required first) |
| `npm run build:browser` | Build all packages + browser app |
| `npm run start:browser` | Start browser example at localhost:3000 |
| `npm run start:electron` | Start Electron desktop app |
const SummaryContent: React.FC<SummaryContentProps> = ({ content, openerService }) => {
    const contentRef = useMarkdownRendering(content, openerService);
Likely a follow-up, but it would be amazing if the summary were editable, in case the user is not satisfied with it afterwards and, for example, wants to highlight a specific fact.
// Skip empty branches (can occur during insertSummary operations)
if (branch.items.length === 0) {
    return;
}
The whole empty-branch situation seems a bit brittle. Can we switch to more deterministic and stable invariants so that code like this is not necessary? It should be possible to guarantee a proper branch structure throughout.
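A minimal sketch of the kind of invariant asked for here, using a simplified branch model (not the actual `MutableChatModel` types): a branch is only ever created together with its first item, so an empty branch cannot exist and the defensive check above becomes unnecessary.

```typescript
// Simplified illustration of a "branches are never empty" invariant.
interface Branch<T> {
    items: T[];
}

// The only way to create a branch is with its first item,
// so `branch.items.length === 0` can never be observed.
function addBranch<T>(branches: Branch<T>[], firstItem: T): Branch<T> {
    const branch: Branch<T> = { items: [firstItem] };
    branches.push(branch);
    return branch;
}
```

With this construction rule, `insertSummary()` would split or move items between branches but never leave one behind with zero items.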
if (budgetAwareEnabled && request.tools?.length) {
    return this.sendRequestWithBudgetAwareness(languageModel, request);
}
This new budget loop does not properly handle the history mechanism, leading to odd history-view behavior: a lot of requests are rendered without responses.
const budgetAwareEnabled = this.preferenceService.get<boolean>(BUDGET_AWARE_TOOL_LOOP_PREF, false);

if (budgetAwareEnabled && request.tools?.length) {
    return this.sendRequestWithBudgetAwareness(languageModel, request);
The same kind of strategy pattern would be good here, I think. Everyone will need to handle the tool loop, but adopters might want to do different things than summarization.
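A minimal sketch of such a strategy pattern, with hypothetical names (not Theia APIs): the service keeps running the tool loop itself and delegates only the budget-exceeded decision to a pluggable strategy, so adopters can summarize, abort, or do something else entirely.

```typescript
// Hypothetical strategy interface for budget handling in the tool loop.
interface ToolLoopStrategy {
    onBudgetExceeded(usedTokens: number): 'summarize' | 'abort' | 'continue';
}

// The default behavior from this PR: trigger mid-turn summarization.
class SummarizingStrategy implements ToolLoopStrategy {
    onBudgetExceeded(): 'summarize' {
        return 'summarize';
    }
}

// An adopter that prefers to stop the loop instead of summarizing.
class AbortingStrategy implements ToolLoopStrategy {
    onBudgetExceeded(): 'abort' {
        return 'abort';
    }
}

// The service consults the strategy only when the budget is hit.
function handleBudget(strategy: ToolLoopStrategy, used: number, budget: number): string {
    return used >= budget ? strategy.onBudgetExceeded(used) : 'continue';
}
```

The preference check in the snippet above would then select a bound strategy instead of branching into a hard-coded `sendRequestWithBudgetAwareness` path.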
How to test
Enable budget awareness for Anthropic in the settings. Start a chat using an Anthropic model and let it grow. Verify that a summary is automatically triggered when the session reaches 180k tokens.
Follow-ups
Extend the tool handling to all other LLM wrappers.
Breaking changes
Attribution
Review checklist
- New UI strings are localized via the `nls` service (for details, please see the Internationalization/Localization section in the Coding Guidelines)
Reminder for reviewers