From 165d60fdb937ee4f922cb1efe622907435c99140 Mon Sep 17 00:00:00 2001 From: Ahmed Hesham Abdelkader <23265119+ahmedhesham6@users.noreply.github.com> Date: Sun, 7 Dec 2025 01:51:17 +0100 Subject: [PATCH 1/2] docs(rfd): Add session usage and context status RFD Proposes standardized tracking of token usage, cost estimation, and context window status across ACP implementations. - Token usage reported in PromptResponse (per-turn data) - Context window and cost reported in session/status (session state) --- docs/rfds/session-usage-context-status.mdx | 288 +++++++++++++++++++++ 1 file changed, 288 insertions(+) create mode 100644 docs/rfds/session-usage-context-status.mdx diff --git a/docs/rfds/session-usage-context-status.mdx b/docs/rfds/session-usage-context-status.mdx new file mode 100644 index 0000000..f3bb825 --- /dev/null +++ b/docs/rfds/session-usage-context-status.mdx @@ -0,0 +1,288 @@ +--- +title: "Session Usage and Context Status" +--- + +- Author(s): [@ahmedhesham6](https://github.com/ahmedhesham6) + +## Elevator pitch + +> What are you proposing to change? + +Add standardized usage and context window tracking to the Agent Client Protocol, enabling agents to report token consumption, cost estimates, and context window utilization in a consistent way across implementations. + +## Status quo + +> How do things work today and what problems does this cause? Why would we change things? + +Currently, the ACP protocol has no standardized way for agents to communicate: + +1. **Token usage** - How many tokens were consumed in a turn or cumulatively +2. **Context window status** - How much of the model's context window is being used +3. **Cost information** - Estimated costs for API usage +4. 
**Prompt caching metrics** - Cache hits/misses for models that support caching + +This creates several problems: + +- **No visibility into resource consumption** - Clients can't show users how much of their context budget is being used +- **No cost transparency** - Users can't track spending or estimate costs before operations +- **No context management** - Clients can't warn users when approaching context limits or suggest compaction +- **Inconsistent implementations** - Each agent implements usage tracking differently (if at all) + +Industry research shows common patterns across AI coding tools: + +- LLM providers return cumulative token counts in API responses +- IDE extensions display context percentage prominently (e.g., radial progress showing "19%") +- Clients show absolute numbers on hover/detail (e.g., "31.4K of 200K tokens") +- Tools warn users at threshold percentages (75%, 90%, 95%) +- Auto-compaction features trigger when approaching context limits +- Cost tracking focuses on cumulative session totals rather than per-turn breakdowns + +## What we propose to do about it + +> What are you proposing to improve the situation? + +We propose separating usage tracking into two distinct concerns: + +1. **Token usage** - Reported in `PromptResponse` after each turn (per-turn data) +2. 
**Context window and cost** - Reported in `session/status` for on-demand queries (session state) + +This separation reflects how users consume this information: +- Token counts are tied to specific turns and useful immediately after a prompt +- Context window and cost are cumulative session state that users may want to check at any time + +### Token Usage in `PromptResponse` + +Add a `usage` field to `PromptResponse` for token consumption tracking: + +```json +{ + "jsonrpc": "2.0", + "id": 1, + "result": { + "sessionId": "sess_abc123", + "stopReason": "end_turn", + "usage": { + "total_tokens": 53000, + "input_tokens": 35000, + "output_tokens": 12000, + "reasoning_tokens": 5000, + "cached_read_tokens": 5000, + "cached_write_tokens": 1000 + } + } +} +``` + +#### Usage Fields + +- `total_tokens` (number, required) - Sum of all token types across session +- `input_tokens` (number, required) - Total input tokens across all turns +- `output_tokens` (number, required) - Total output tokens across all turns +- `reasoning_tokens` (number, optional) - Total reasoning tokens (for o1/o3 models) +- `cached_read_tokens` (number, optional) - Total cache read tokens +- `cached_write_tokens` (number, optional) - Total cache write tokens + +### Context Window and Cost in `session/status` + +Add `context_window` and `cost` fields to `session/status` response: + +```json +{ + "jsonrpc": "2.0", + "id": 2, + "method": "session/status", + "params": { + "sessionId": "sess_abc123" + } +} +``` + +```json +{ + "jsonrpc": "2.0", + "id": 2, + "result": { + "sessionId": "sess_abc123", + "status": "idle", + "context_window": { + "size": 200000, + "used": 53000, + "percentage": 26.5, + "remaining": 147000 + }, + "cost": { + "amount": 0.045, + "currency": "USD" + } + } +} +``` + +#### Context Window Fields (optional) + +- `size` (number, required) - Total context window size in tokens +- `used` (number, required) - Tokens currently in context +- `percentage` (number, required) - Percentage used 
(0-100) +- `remaining` (number, required) - Tokens remaining + +#### Cost Fields (optional) + +- `amount` (number, required) - Total cumulative cost for session +- `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR") + +### Design Principles + +1. **Separation of concerns** - Token usage is per-turn data, context window and cost are session state +2. **Agent calculates, client can verify** - Agent knows its model best and provides calculations, but includes raw data for client verification +3. **Flexible cost reporting** - Support any currency, don't assume USD +4. **Prompt caching support** - Include cache read/write tokens for models that support it +5. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility + +## Shiny future + +> How will things play out once this feature exists? + +**For Users:** + +- **Visibility**: Users see real-time context window usage with percentage indicators +- **Cost awareness**: Users can track spending and check cumulative cost at any time +- **Better planning**: Users know when to start new sessions or compact context +- **Transparency**: Clear understanding of resource consumption + +**For Client Implementations:** + +- **Consistent UI**: All clients can show usage in a standard way (progress bars, percentages, warnings) +- **Smart warnings**: Clients can warn users at 75%, 90% context usage +- **Cost controls**: Clients can implement budget limits and alerts +- **Analytics**: Clients can track usage patterns and optimize +- **On-demand checks**: Clients can poll `session/status` to update context and cost indicators without issuing prompts + +**For Agent Implementations:** + +- **Standard reporting**: Clear contract for what to report and when +- **Flexibility**: Optional fields allow agents to report what they can calculate +- **Model diversity**: Works with any model (GPT, Claude, Llama, etc.)
+- **Caching support**: First-class support for prompt caching + +## Implementation details and plan + +> Tell me more about your implementation. What is your detailed implementation plan? + +1. **Update schema.json** to add: + - `Usage` type with token fields + - `ContextWindow` type with `size`, `used`, `percentage`, `remaining` fields + - `Cost` type with `amount` and `currency` fields + - Add optional `usage` field to `PromptResponse` + - Add optional `context_window` and `cost` fields to `SessionStatusResponse` + +2. **Update protocol documentation**: + - Document `usage` field in `/docs/protocol/prompt-turn.mdx` + - Document `context_window` and `cost` fields in session status documentation + - Add examples showing typical usage patterns + +## Frequently asked questions + +> What questions have arisen over the course of authoring this document or during subsequent discussions? + +### Why separate token usage from context window and cost? + +Different users care about different things at different times: + +- **Token counts**: Relevant immediately after a turn completes to understand the breakdown +- **Context window remaining**: Relevant at any time, especially before issuing a large prompt. "Do I need to hand off or compact?" +- **Cumulative cost**: Session-level state users want to check without issuing new prompts + +Separating them allows: +- Clients to poll context and cost status without issuing prompts +- Cleaner data model where per-turn data stays in turn responses +- Users to check session state (context, cost) before deciding on actions + +### Why is cost in session/status instead of PromptResponse? + +Cost is cumulative session state, similar to context window: +- Users want to check total spending at any time, not just after turns +- Keeps `PromptResponse` focused on per-turn token breakdown +- Both cost and context window are session-level metrics that belong together + +### How do users know when to hand off or compact the context?
+ +The `context_window` object in `session/status` provides everything needed: + +- `used` and `remaining` give absolute numbers for precise tracking +- `percentage` enables simple threshold-based warnings +- `size` lets clients understand the total budget + +**Recommended client behavior:** + +| Percentage | Action | +|------------|--------| +| < 75% | Normal operation | +| 75-90% | Yellow indicator, suggest "Context filling up" | +| 90-95% | Orange indicator, recommend "Start new session or summarize" | +| > 95% | Red indicator, warn "Next prompt may fail - handoff recommended" | + +Clients can also: +- Offer "Compact context" or "Summarize conversation" actions +- Auto-suggest starting a new session +- Implement automatic handoff when approaching limits + +### Why does the agent calculate percentage instead of the client? + +Agent knows its model best: + +- Agent knows exact context window size (varies by model) +- Agent knows how it counts tokens (different tokenizers) +- Agent knows about special tokens, system messages, etc. +- Client can still recalculate if needed (all raw data provided) + +### Why not assume USD for cost? + +Agents may bill in different currencies: + +- European agents might bill in EUR +- Asian agents might bill in JPY or CNY +- Some agents might use credits or points +- Currency conversion rates change + +Better to report actual billing currency and let clients convert if needed. + +### What if the agent can't calculate some fields? + +All fields except the basic token counts are optional. Agents report what they can calculate. Clients handle missing fields gracefully. + +### How does this work with streaming responses? + +- During streaming: Send progressive updates via `session/update` notifications +- Final response: Include complete token usage in `PromptResponse` +- Context window and cost: Always available via `session/status` + +### What about models without fixed context windows? 
+ +- Report effective context window size +- For models with dynamic windows, report current limit +- Update size if it changes +- Set to `null` if truly unlimited (rare) + +### What about rate limits and quotas? + +This RFD focuses on token usage and context windows. Rate limits and quotas are a separate concern that could be addressed in a future RFD. However, the cost tracking here helps users understand their usage against quota limits. + +### Should cached tokens count toward context window? + +Yes, cached tokens still occupy context window space. They're just cheaper to process. The context window usage should include all tokens (regular + cached). + +### What alternative approaches did you consider, and why did you settle on this one? + +**Alternatives considered:** + +1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to check independently of turns. + +2. **Everything in session/status** - Requires extra round-trip after every prompt to get token usage. Inconsistent with how LLM APIs work. + +3. **Client calculates everything** - Rejected because client doesn't know model's tokenizer, exact context window size, or pricing. + +4. **Only percentage, no raw tokens** - Rejected because users want absolute numbers, clients can't verify calculations, and it's less transparent. + +## Revision history + +- 2025-12-07: Initial draft From 4c879fbe1e173fab77fef2c74a5aa47709532967 Mon Sep 17 00:00:00 2001 From: Ahmed Hesham Abdelkader <23265119+ahmedhesham6@users.noreply.github.com> Date: Sat, 13 Dec 2025 15:28:29 +0100 Subject: [PATCH 2/2] docs(rfd): Update session usage tracking to use session/update notifications Refines the tracking of context window and cost information by transitioning from `session/status` requests to `session/update` notifications. This change allows agents to proactively push updates, enhancing flexibility and real-time data availability for clients. 
The `cost` field is now optional, and the `remaining` field has been removed, as clients can compute it from `size` and `used`. Updated documentation to reflect these changes and provide clearer usage patterns. --- docs/rfds/session-usage-context-status.mdx | 136 +++++++++++++-------- 1 file changed, 87 insertions(+), 49 deletions(-) diff --git a/docs/rfds/session-usage-context-status.mdx b/docs/rfds/session-usage-context-status.mdx index f3bb825..1ca0a8f 100644 --- a/docs/rfds/session-usage-context-status.mdx +++ b/docs/rfds/session-usage-context-status.mdx @@ -44,11 +44,21 @@ Industry research shows common patterns across AI coding tools: We propose separating usage tracking into two distinct concerns: 1. **Token usage** - Reported in `PromptResponse` after each turn (per-turn data) -2. **Context window and cost** - Reported in `session/status` for on-demand queries (session state) +2. **Context window and cost** - Reported via `session/update` notifications with `sessionUpdate: "context_update"` (session state) This separation reflects how users consume this information: - Token counts are tied to specific turns and useful immediately after a prompt -- Context window and cost are cumulative session state that users may want to check at any time +- Context window and cost are cumulative session state that agents push proactively when available + +Agents send context updates at appropriate times: +- On `session/new` response (if agent can query usage immediately) +- On `session/load` / `session/resume` (for resumed/forked sessions) +- After each `session/prompt` response (when usage data becomes available) +- Anytime context window state changes significantly + +This approach provides flexibility for different agent implementations: +- Agents that support getting current usage without a prompt can immediately send updates when creating, resuming, or forking chats +- Agents that only provide usage when actively prompting can send updates after sending a new prompt 
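To make the timing rules above concrete, here is a minimal agent-side sketch of building such a notification. The helper name `make_context_update` is illustrative and not part of any ACP library; only the JSON-RPC shape follows this proposal.

```python
def make_context_update(session_id, used, size, cost=None):
    """Build a session/update notification carrying a context_update payload."""
    update = {
        "sessionUpdate": "context_update",
        "used": used,
        "size": size,
        # Percentage is agent-calculated, per the design principles below.
        "percentage": round(used / size * 100, 1),
    }
    if cost is not None:
        # Optional cumulative session cost, e.g. {"amount": 0.045, "currency": "USD"}.
        update["cost"] = cost
    # session/update is a notification: no "id" member, no response expected.
    return {
        "jsonrpc": "2.0",
        "method": "session/update",
        "params": {"sessionId": session_id, "update": update},
    }
```

An agent following this sketch would call the helper at each of the moments listed above (session creation, resume, after a prompt, or on significant change) and write the result to its transport.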
### Token Usage in `PromptResponse` @@ -82,61 +92,71 @@ Add a `usage` field to `PromptResponse` for token consumption tracking: - `cached_read_tokens` (number, optional) - Total cache read tokens - `cached_write_tokens` (number, optional) - Total cache write tokens -### Context Window and Cost in `session/status` +### Context Window and Cost via `session/update` -Add `context_window` and `cost` fields to `session/status` response: +Agents send context window and cost information via `session/update` notifications with `sessionUpdate: "context_update"`: ```json { "jsonrpc": "2.0", - "id": 2, - "method": "session/status", + "method": "session/update", "params": { - "sessionId": "sess_abc123" + "sessionId": "sess_abc123", + "update": { + "sessionUpdate": "context_update", + "used": 53000, + "size": 200000, + "percentage": 26.5 + } } } ``` +#### Context Window Fields (required) + +- `used` (number, required) - Tokens currently in context +- `size` (number, required) - Total context window size in tokens +- `percentage` (number, required) - Percentage used (0-100) + +Note: Clients can compute `remaining` as `size - used` if needed. 
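As one example of how a client might consume these fields, the following sketch derives `remaining` and applies the 75/90/95% warning thresholds this RFD recommends for client behavior. The function and level names are illustrative, not part of the protocol.

```python
def summarize_context(update):
    """Derive remaining tokens and a coarse warning level from a context_update."""
    remaining = update["size"] - update["used"]
    pct = update["percentage"]
    if pct < 75:
        level = "normal"
    elif pct < 90:
        level = "warn"        # "Context filling up"
    elif pct < 95:
        level = "recommend"   # "Start new session or summarize"
    else:
        level = "critical"    # "Next prompt may fail"
    return {"remaining": remaining, "level": level}
```

A client would run this on every `context_update` it receives and update its indicator accordingly, without ever needing to poll the agent.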
+ +#### Cost Fields (optional) + +- `cost` (object, optional) - Cumulative session cost + - `amount` (number, required) - Total cumulative cost for session + - `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR") + +Example with optional cost: + ```json { "jsonrpc": "2.0", - "id": 2, - "result": { + "method": "session/update", + "params": { "sessionId": "sess_abc123", - "status": "idle", - "context_window": { - "size": 200000, + "update": { + "sessionUpdate": "context_update", "used": 53000, + "size": 200000, "percentage": 26.5, - "remaining": 147000 - }, - "cost": { - "amount": 0.045, - "currency": "USD" + "cost": { + "amount": 0.045, + "currency": "USD" + } } } } ``` -#### Context Window Fields (optional) - -- `size` (number, required) - Total context window size in tokens -- `used` (number, required) - Tokens currently in context -- `percentage` (number, required) - Percentage used (0-100) -- `remaining` (number, required) - Tokens remaining - -#### Cost Fields (optional) - -- `amount` (number, required) - Total cumulative cost for session -- `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR") - ### Design Principles 1. **Separation of concerns** - Token usage is per-turn data, context window and cost are session state -2. **Agent calculates, client can verify** - Agent knows its model best and provides calculations, but includes raw data for client verification -3. **Flexible cost reporting** - Support any currency, don't assume USD -4. **Prompt caching support** - Include cache read/write tokens for models that support it -5. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility +2. **Agent-pushed notifications** - Agents proactively send context updates when data becomes available, following the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`) +3. 
**Agent calculates, client can verify** - Agent knows its model best and provides calculations, but includes raw data for client verification +4. **Flexible cost reporting** - Cost is optional since not all agents track it. Support any currency, don't assume USD +5. **Prompt caching support** - Include cache read/write tokens for models that support it +6. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility +7. **Flexible timing** - Agents send updates when they can: immediately for agents with on-demand APIs, or after prompts for agents that only provide usage during active prompting ## Shiny future @@ -155,7 +175,8 @@ Add `context_window` and `cost` fields to `session/status` response: - **Smart warnings**: Clients can warn users at 75%, 90% context usage - **Cost controls**: Clients can implement budget limits and alerts - **Analytics**: Clients can track usage patterns and optimize -- **On-demand checks**: Clients can poll `session/status` to update context and cost indicators without issuing prompts +- **Reactive updates**: Clients receive context updates reactively via notifications, updating UI immediately when agents push new data +- **No polling needed**: Updates arrive automatically when agents have new information, eliminating the need for clients to poll **For Agent Implementations:** @@ -170,15 +191,15 @@ Add `context_window` and `cost` fields to `session/status` response: 1. **Update schema.json** to add: - `Usage` type with token fields - - `ContextWindow` type with `size`, `used`, `percentage`, `remaining` fields - `Cost` type with `amount` and `currency` fields + - `ContextUpdate` type with `used`, `size`, `percentage` (required) and optional `cost` field - Add optional `usage` field to `PromptResponse` - - Add optional `context_window` and `cost` fields to `SessionStatusResponse` + - Add `ContextUpdate` variant to `SessionUpdate` oneOf array (with `sessionUpdate: "context_update"`) 2. 
**Update protocol documentation**: - Document `usage` field in `/docs/protocol/prompt-turn.mdx` - - Document `context_window` and `cost` fields in session status documentation - - Add examples showing typical usage patterns + - Document `session/update` notification with `sessionUpdate: "context_update"` variant + - Add examples showing typical usage patterns and when agents send context updates ## Frequently asked questions @@ -190,25 +211,26 @@ Different users care about different things at different times: - **Token counts**: Relevant immediately after a turn completes to understand the breakdown - **Context window remaining**: Relevant at any time, especially before issuing a large prompt. "Do I need to hand off or compact?" -- **Cumulative cost**: Session-level state users want to check without issuing new prompts +- **Cumulative cost**: Session-level state that agents push when available Separating them allows: -- Clients to poll context and cost status without issuing prompts - Cleaner data model where per-turn data stays in turn responses -- Users to check session state (context, cost) before deciding on actions +- Agents to push context updates proactively when data becomes available +- Clients to receive updates reactively without needing to poll -### Why is cost in session/status instead of PromptResponse? +### Why is cost in session/update instead of PromptResponse? Cost is cumulative session state, similar to context window: -- Users want to check total spending at any time, not just after turns +- Users want to track total spending, not just per-turn costs - Keeps `PromptResponse` focused on per-turn token breakdown - Both cost and context window are session-level metrics that belong together +- Cost is optional since not all agents track it ### How do users know when to hand off or compact the context?
-The `context_window` object in `session/status` provides everything needed: +The context update notification provides everything needed: -- `used` and `remaining` give absolute numbers for precise tracking +- `used` and `size` give absolute numbers for precise tracking (clients can compute `remaining` as `size - used`) - `percentage` enables simple threshold-based warnings - `size` lets clients understand the total budget @@ -252,9 +274,9 @@ All fields except the basic token counts are optional. Agents report what they c ### How does this work with streaming responses? -- During streaming: Send progressive updates via `session/update` notifications +- During streaming: Agents may send progressive context updates via `session/update` notifications as usage changes - Final response: Include complete token usage in `PromptResponse` -- Context window and cost: Always available via `session/status` +- Context window and cost: Agents send `session/update` notifications with `sessionUpdate: "context_update"` when data becomes available (after prompt completion, on session creation/resume, or when context state changes significantly) ### What about models without fixed context windows? @@ -271,13 +293,28 @@ This RFD focuses on token usage and context windows. Rate limits and quotas are Yes, cached tokens still occupy context window space. They're just cheaper to process. The context window usage should include all tokens (regular + cached). +### Why notification instead of request? + +Using `session/update` notifications instead of a `session/status` request provides several benefits: + +1. **Consistency**: Follows the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`) +2. **Agent flexibility**: Agents can send updates when they have data available, whether that's immediately (for agents with on-demand APIs) or after prompts (for agents that only provide usage during active prompting) +3. 
**No polling**: Clients receive updates reactively without needing to poll +4. **Real-time updates**: Updates flow naturally as part of the session lifecycle + +### What if the client connects mid-session? + +When a client connects to an existing session (via `session/load` or `session/resume`), agents **SHOULD** send a context update notification if they have current usage data available. This ensures the client UI can immediately display accurate context window and cost information. + +For agents that only provide usage during active prompting, the client UI may not show usage until after the first prompt is sent, which is acceptable given the agent's capabilities. + ### What alternative approaches did you consider, and why did you settle on this one? **Alternatives considered:** -1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to check independently of turns. +1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to track independently of turns. -2. **Everything in session/status** - Requires extra round-trip after every prompt to get token usage. Inconsistent with how LLM APIs work. +2. **Request/response (`session/status`)** - Requires clients to poll, and some agents don't have APIs to query current status without a prompt. The notification approach is more flexible and consistent with other dynamic session properties. 3. **Client calculates everything** - Rejected because client doesn't know model's tokenizer, exact context window size, or pricing. @@ -286,3 +323,4 @@ Yes, cached tokens still occupy context window space. They're just cheaper to pr ## Revision history - 2025-12-07: Initial draft +- 2025-12-13: Changed from `session/status` request method to `session/update` notification with `sessionUpdate: "context_update"`. Made `cost` optional and removed `remaining` field (clients can compute as `size - used`). 
This approach provides more flexibility for agents and follows the same pattern as other dynamic session properties.