agentclientprotocol · ahmedhesham6 · Dec 7, 2025 · Dec 13, 2025 · benbrandt · Dec 15, 2025
@@ -0,0 +1,326 @@
+---
+title: "Session Usage and Context Status"
+---
+
+- Author(s): [@ahmedhesham6](https://github.com/ahmedhesham6)
+
+## Elevator pitch
+
+> What are you proposing to change?
+
+Add standardized usage and context window tracking to the Agent Client Protocol, enabling agents to report token consumption, cost estimates, and context window utilization in a consistent way across implementations.
+
+## Status quo
+
+> How do things work today and what problems does this cause? Why would we change things?
+
+Currently, ACP has no standardized way for agents to communicate:
+
+1. **Token usage** - How many tokens were consumed in a turn or cumulatively
+2. **Context window status** - How much of the model's context window is being used
+3. **Cost information** - Estimated costs for API usage
+4. **Prompt caching metrics** - Cache hits/misses for models that support caching
+
+This creates several problems:
+
+- **No visibility into resource consumption** - Clients can't show users how much of their context budget is being used
+- **No cost transparency** - Users can't track spending or estimate costs before operations
+- **No context management** - Clients can't warn users when approaching context limits or suggest compaction
+- **Inconsistent implementations** - Each agent implements usage tracking differently (if at all)
+
+Industry research shows common patterns across AI coding tools:
+
+- LLM providers return cumulative token counts in API responses
+- IDE extensions display context percentage prominently (e.g., a radial progress indicator showing "19%")
+- Clients show absolute numbers on hover/detail (e.g., "31.4K of 200K tokens")
+- Tools warn users at threshold percentages (75%, 90%, 95%)
+- Auto-compaction features trigger when approaching context limits
+- Cost tracking focuses on cumulative session totals rather than per-turn breakdowns
+
+## What we propose to do about it
+
+> What are you proposing to improve the situation?
+
+We propose separating usage tracking into two distinct concerns:
+
+1. **Token usage** - Reported in `PromptResponse` after each turn (per-turn data)
+2. **Context window and cost** - Reported via `session/update` notifications with `sessionUpdate: "context_update"` (session state)
+
+This separation reflects how users consume this information:
+- Token counts are tied to specific turns and useful immediately after a prompt
+- Context window and cost are cumulative session state that agents push proactively when available
+
+Agents send context updates at appropriate times:
+- After responding to `session/new` (if the agent can query usage immediately)
+- On `session/load` / `session/resume` (for resumed/forked sessions)
+- After each `session/prompt` response (when usage data becomes available)
+- Any time the context window state changes significantly
+
+This approach provides flexibility for different agent implementations:
+- Agents that can query current usage without a prompt can send updates immediately when creating, resuming, or forking sessions
+- Agents that only report usage while actively prompting can send updates after each prompt turn completes
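+
+As a minimal sketch of these timing rules, an agent might remember the last values it pushed and only re-send when something changes meaningfully. The `ContextState` type, the `shouldSendContextUpdate` helper, and the one-percentage-point threshold are hypothetical; this RFD does not define what counts as a "significant" change.
+
+```typescript
+// Hypothetical record of the values an agent last sent in a context_update.
+interface ContextState {
+  used: number;
+  size: number;
+  percentage: number;
+  costAmount?: number;
+}
+
+// Assumed threshold in percentage points; the RFD leaves "significant" to the agent.
+const MIN_PERCENTAGE_DELTA = 1;
+
+function shouldSendContextUpdate(
+  previous: ContextState | undefined,
+  next: ContextState,
+): boolean {
+  // Nothing sent yet (e.g. right after session/new or session/load): always send.
+  if (previous === undefined) return true;
+  // A different window size (e.g. after a model switch) is always reported.
+  if (previous.size !== next.size) return true;
+  // Cost changed, including cost appearing or disappearing.
+  if (previous.costAmount !== next.costAmount) return true;
+  // Otherwise only send when usage moved by at least the threshold.
+  return Math.abs(next.percentage - previous.percentage) >= MIN_PERCENTAGE_DELTA;
+}
+```
+
+An agent that only sees usage after a prompt would call this once per completed turn; an agent with an on-demand usage API could also call it when a session is created or resumed.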
+
+### Token Usage in `PromptResponse`
+
+Add a `usage` field to `PromptResponse` for token consumption tracking:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 1,
+  "result": {
+    "sessionId": "sess_abc123",
+    "stopReason": "end_turn",
+    "usage": {
+      "total_tokens": 53000,
+      "input_tokens": 35000,
+      "output_tokens": 12000,
+      "reasoning_tokens": 5000,
+      "cached_read_tokens": 5000,
+      "cached_write_tokens": 1000
+    }
+  }
+}
+```
+
+#### Usage Fields
+
+- `total_tokens` (number, required) - Sum of all token types for this turn
+- `input_tokens` (number, required) - Input tokens consumed in this turn
+- `output_tokens` (number, required) - Output tokens generated in this turn
+- `reasoning_tokens` (number, optional) - Reasoning tokens for this turn (for models that report them, such as OpenAI's o1 and o3)
+- `cached_read_tokens` (number, optional) - Tokens read from the prompt cache in this turn
+- `cached_write_tokens` (number, optional) - Tokens written to the prompt cache in this turn
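+
+A TypeScript view of the `usage` object may help SDK authors. This is a sketch that assumes the snake_case field names from the example above: `Usage` mirrors the listed fields, and `parseUsage` is a hypothetical defensive parser a client might use so that missing optional fields are tolerated while malformed required fields are rejected.
+
+```typescript
+// Mirrors the usage fields listed above.
+interface Usage {
+  total_tokens: number;
+  input_tokens: number;
+  output_tokens: number;
+  reasoning_tokens?: number;
+  cached_read_tokens?: number;
+  cached_write_tokens?: number;
+}
+
+// A token count must be a finite, non-negative number.
+function isCount(value: unknown): value is number {
+  return typeof value === "number" && Number.isFinite(value) && value >= 0;
+}
+
+// Hypothetical helper: returns a Usage when the required fields are valid,
+// otherwise undefined. Optional fields are copied only if they are valid counts.
+function parseUsage(raw: unknown): Usage | undefined {
+  if (typeof raw !== "object" || raw === null) return undefined;
+  const r = raw as Record<string, unknown>;
+  const total = r.total_tokens;
+  const input = r.input_tokens;
+  const output = r.output_tokens;
+  if (!isCount(total) || !isCount(input) || !isCount(output)) return undefined;
+
+  const usage: Usage = { total_tokens: total, input_tokens: input, output_tokens: output };
+  const optionalKeys = ["reasoning_tokens", "cached_read_tokens", "cached_write_tokens"] as const;
+  for (const key of optionalKeys) {
+    const value = r[key];
+    if (isCount(value)) usage[key] = value;
+  }
+  return usage;
+}
+```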
+
+### Context Window and Cost via `session/update`
+
+Agents send context window and cost information via `session/update` notifications with `sessionUpdate: "context_update"`:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "session/update",
+  "params": {
+    "sessionId": "sess_abc123",
+    "update": {
+      "sessionUpdate": "context_update",
+      "used": 53000,
+      "size": 200000,
+      "percentage": 26.5
+    }
+  }
+}
+```
+
+#### Context Window Fields (required)
+
+- `used` (number, required) - Tokens currently in context
+- `size` (number, required) - Total context window size in tokens
+- `percentage` (number, required) - Percentage used (0-100)
+
+Note: Clients can compute `remaining` as `size - used` if needed.
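+
+On the agent side, producing this notification is mostly arithmetic. The sketch below uses a hypothetical `buildContextUpdate` helper; only the JSON shape comes from the examples in this section. Rounding `percentage` to one decimal place is an illustrative choice, not something this RFD specifies.
+
+```typescript
+interface Cost {
+  amount: number;
+  currency: string; // ISO 4217 code, e.g. "USD"
+}
+
+interface ContextUpdateNotification {
+  jsonrpc: "2.0";
+  method: "session/update";
+  params: {
+    sessionId: string;
+    update: {
+      sessionUpdate: "context_update";
+      used: number;
+      size: number;
+      percentage: number;
+      cost?: Cost;
+    };
+  };
+}
+
+// Hypothetical helper that builds a context_update notification.
+function buildContextUpdate(
+  sessionId: string,
+  used: number,
+  size: number,
+  cost?: Cost,
+): ContextUpdateNotification {
+  if (size <= 0) throw new RangeError("size must be positive");
+  // Share of the window in use, rounded to one decimal place (assumed).
+  const percentage = Math.round((used / size) * 1000) / 10;
+  const update: ContextUpdateNotification["params"]["update"] = {
+    sessionUpdate: "context_update",
+    used,
+    size,
+    percentage,
+  };
+  // Leave cost out entirely when the agent does not track it.
+  if (cost !== undefined) update.cost = cost;
+  return { jsonrpc: "2.0", method: "session/update", params: { sessionId, update } };
+}
+```
+
+With `used = 53000` and `size = 200000`, this yields `percentage: 26.5`, matching the example notification.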
+
+#### Cost Fields (optional)
+
+- `cost` (object, optional) - Cumulative session cost
+  - `amount` (number, required) - Total cumulative cost for session
+  - `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR")
+
+Example with optional cost:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "session/update",
+  "params": {
+    "sessionId": "sess_abc123",
+    "update": {
+      "sessionUpdate": "context_update",
+      "used": 53000,
+      "size": 200000,
+      "percentage": 26.5,
+      "cost": {
+        "amount": 0.045,
+        "currency": "USD"
+      }
+    }
+  }
+}
+```
+
+### Design Principles
+
+1. **Separation of concerns** - Token usage is per-turn data; context window and cost are session state
+2. **Agent-pushed notifications** - Agents proactively send context updates when data becomes available, following the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
+3. **Agent calculates, client can verify** - The agent knows its model best and provides the calculations, but includes the raw data so clients can verify them
+4. **Flexible cost reporting** - Cost is optional since not all agents track it. Any currency is supported; USD is not assumed
+5. **Prompt caching support** - Include cache read/write tokens for models that support it
+6. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility
+7. **Flexible timing** - Agents send updates when they can: immediately for agents with on-demand APIs, or after prompts for agents that only provide usage during active prompting
+
+## Shiny future
+
+> How will things play out once this feature exists?
+
+**For Users:**
+
+- **Visibility**: Users see real-time context window usage with percentage indicators
+- **Cost awareness**: Users can track spending and check cumulative cost at any time
+- **Better planning**: Users know when to start new sessions or compact context
+- **Transparency**: Clear understanding of resource consumption
+
+**For Client Implementations:**
+
+- **Consistent UI**: All clients can show usage in a standard way (progress bars, percentages, warnings)
+- **Smart warnings**: Clients can warn users at thresholds such as 75% and 90% context usage
+- **Cost controls**: Clients can implement budget limits and alerts
+- **Analytics**: Clients can track usage patterns across sessions
+- **Reactive updates**: Clients receive context updates as notifications and can refresh the UI as soon as agents push new data
+- **No polling needed**: Updates arrive automatically when agents have new information, eliminating the need for clients to poll
+
+**For Agent Implementations:**
+
+- **Standard reporting**: Clear contract for what to report and when
+- **Flexibility**: Optional fields allow agents to report what they can calculate
+- **Model diversity**: Works with any model (GPT, Claude, Llama, etc.)
+- **Caching support**: First-class support for prompt caching
+
+## Implementation details and plan
+
+> Tell me more about your implementation. What is your detailed implementation plan?
+
+1. **Update schema.json** to add:
+   - `Usage` type with token fields
+   - `Cost` type with `amount` and `currency` fields
+   - `ContextUpdate` type with `used`, `size`, `percentage` (required) and optional `cost` field
+   - Add optional `usage` field to `PromptResponse`
+   - Add `ContextUpdate` variant to `SessionUpdate` oneOf array (with `sessionUpdate: "context_update"`)
+
+2. **Update protocol documentation**:
+   - Document `usage` field in `/docs/protocol/prompt-turn.mdx`
+   - Document the `context_update` variant of the `session/update` notification
+   - Add examples showing typical usage patterns and when agents send context updates
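+
+Step 1 can be pictured in TypeScript terms as a discriminated union on `sessionUpdate`. This is a sketch, not generated schema output: only the `context_update` fields come from this RFD, the existing variants are reduced to a placeholder, and `ClientUsageState` and `applySessionUpdate` are hypothetical client-side names.
+
+```typescript
+interface ContextUpdate {
+  sessionUpdate: "context_update";
+  used: number;
+  size: number;
+  percentage: number;
+  cost?: { amount: number; currency: string };
+}
+
+// Placeholder for existing variants; their real fields are defined in the ACP schema.
+interface OtherUpdate {
+  sessionUpdate: "available_commands_update" | "current_mode_update" | "session_info_update";
+}
+
+type SessionUpdate = ContextUpdate | OtherUpdate;
+
+// What a client might keep per session for its usage indicator.
+interface ClientUsageState {
+  used?: number;
+  size?: number;
+  percentage?: number;
+  cost?: { amount: number; currency: string };
+}
+
+// context_update carries cumulative session state, so the latest
+// notification replaces the previous values instead of adding to them.
+function applySessionUpdate(state: ClientUsageState, update: SessionUpdate): ClientUsageState {
+  switch (update.sessionUpdate) {
+    case "context_update":
+      return { used: update.used, size: update.size, percentage: update.percentage, cost: update.cost };
+    default:
+      // Other variants do not affect usage state.
+      return state;
+  }
+}
+```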
+
+## Frequently asked questions
+
+> What questions have arisen over the course of authoring this document or during subsequent discussions?
+
+### Why separate token usage from context window and cost?
+
+Different users care about different things at different times:
+
+- **Token counts**: Relevant immediately after a turn completes to understand the breakdown
+- **Context window remaining**: Relevant at any time, especially before issuing a large prompt. "Do I need to hand off or compact?"
+- **Cumulative cost**: Session-level state that agents push when available
+
+Separating them allows:
+- Cleaner data model where per-turn data stays in turn responses
+- Agents to push context updates proactively when data becomes available
+- Clients to receive updates reactively without needing to poll
+
+### Why is cost in session/update instead of PromptResponse?
+
+Cost is cumulative session state, similar to context window:
+- Users want to track total spending, not just per-turn costs
+- Keeps `PromptResponse` focused on per-turn token breakdown
+- Both cost and context window are session-level metrics that belong together
+- Cost is optional since not all agents track it
+
+### How do users know when to hand off or compact the context?
+
+The context update notification provides everything needed:
+
+- `used` and `size` give absolute numbers for precise tracking (clients can compute `remaining` as `size - used`)
+- `percentage` enables simple threshold-based warnings
+- `size` lets clients understand the total budget
+
+**Recommended client behavior:**
+
+| Percentage | Action |
+|------------|--------|
+| < 75% | Normal operation |
+| 75% to < 90% | Yellow indicator, suggest "Context filling up" |
+| 90% to 95% | Orange indicator, recommend "Start new session or summarize" |
+| > 95% | Red indicator, warn "Next prompt may fail - handoff recommended" |
+
+Clients can also:
+- Offer "Compact context" or "Summarize conversation" actions
+- Auto-suggest starting a new session
+- Implement automatic handoff when approaching limits
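+
+The threshold table maps directly to a small client-side function. This sketch assumes that 75% and 90% each start a new band and that exactly 95% is still orange, since a range like "90-95%" does not say which band a boundary value belongs to; `ContextLevel` and `contextLevel` are hypothetical names.
+
+```typescript
+type ContextLevel = "normal" | "yellow" | "orange" | "red";
+
+// Maps the agent-reported percentage (0-100) to an indicator level.
+// Boundary handling is an assumption: 75 and 90 open new bands,
+// and only values strictly above 95 are red.
+function contextLevel(percentage: number): ContextLevel {
+  if (percentage > 95) return "red";
+  if (percentage >= 90) return "orange";
+  if (percentage >= 75) return "yellow";
+  return "normal";
+}
+```
+
+For example, the 26.5% from the earlier notification example maps to `"normal"`.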
+
+### Why does the agent calculate percentage instead of the client?
+
+The agent knows its model best:
+
+- It knows the exact context window size (which varies by model)
+- It knows how tokens are counted (tokenizers differ)
+- It knows about special tokens, system messages, etc.
+- The client can still recalculate if needed, since `used` and `size` are always provided
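+
+Because `used` and `size` accompany every `percentage`, a client that wants to double-check the agent's figure can recompute it. The tolerance below is an assumed allowance for agents that round `percentage`, and `percentageMatches` is a hypothetical helper.
+
+```typescript
+// Assumed tolerance, in percentage points, to allow for rounding by the agent.
+const PERCENTAGE_TOLERANCE = 0.1;
+
+// Returns true when the reported percentage agrees with used / size.
+function percentageMatches(used: number, size: number, reported: number): boolean {
+  if (size <= 0) return false; // Cannot verify without a positive window size.
+  const computed = (used / size) * 100;
+  return Math.abs(computed - reported) <= PERCENTAGE_TOLERANCE;
+}
+```
+
+With the values from the notification example, `percentageMatches(53000, 200000, 26.5)` returns `true`.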
+
+### Why not assume USD for cost?
+
+Agents may bill in different currencies:
+
+- European agents might bill in EUR
+- Asian agents might bill in JPY or CNY
+- Some agents might use credits or points
+- Currency conversion rates change
+
+Better to report actual billing currency and let clients convert if needed.
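+
+Since `currency` is reported as-is, a client can format the amount with the standard `Intl.NumberFormat` API instead of hard-coding a dollar sign. This is a sketch: the `formatCost` name, the `en-US` default locale, and the four-digit fraction limit (so small session costs such as 0.045 are not rounded away) are assumptions. Units that `Intl` rejects, such as a non-ISO "credits" label, fall back to plain text.
+
+```typescript
+interface Cost {
+  amount: number;
+  currency: string;
+}
+
+// Formats a cost in its reported currency without converting it.
+function formatCost(cost: Cost, locale = "en-US"): string {
+  try {
+    return new Intl.NumberFormat(locale, {
+      style: "currency",
+      currency: cost.currency,
+      maximumFractionDigits: 4,
+    }).format(cost.amount);
+  } catch {
+    // Intl throws a RangeError for malformed currency codes; show the raw value.
+    return `${cost.amount} ${cost.currency}`;
+  }
+}
+```
+
+Any conversion to the user's preferred currency stays a separate, client-side step.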
+
+### What if the agent can't calculate some fields?
+
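+All fields except the basic token counts (`total_tokens`, `input_tokens`, `output_tokens`) and the context window fields (`used`, `size`, `percentage`) are optional. Agents report what they can calculate, and clients should handle missing fields gracefully.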
+
+### How does this work with streaming responses?
+
+- During streaming: Agents may send progressive context updates via `session/update` notifications as usage changes
+- Final response: Include complete token usage in `PromptResponse`
+- Context window and cost: Agents send `session/update` notifications with `sessionUpdate: "context_update"` when data becomes available (after prompt completion, on session creation/resume, or when context state changes significantly)
+
+### What about models without fixed context windows?
+
+- Report effective context window size
+- For models with dynamic windows, report current limit
+- Update size if it changes
+- Set `size` to `null` if truly unlimited (rare)
+
+### What about rate limits and quotas?
+
+This RFD focuses on token usage and context windows. Rate limits and quotas are a separate concern that could be addressed in a future RFD. However, the cost tracking here helps users understand their usage against quota limits.
+
+### Should cached tokens count toward context window?
+
+Yes, cached tokens still occupy context window space. They're just cheaper to process. The context window usage should include all tokens (regular + cached).
+
+### Why notification instead of request?
+
+Using `session/update` notifications instead of a `session/status` request provides several benefits:
+
+1. **Consistency**: Follows the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
+2. **Agent flexibility**: Agents can send updates when they have data available, whether that's immediately (for agents with on-demand APIs) or after prompts (for agents that only provide usage during active prompting)
+3. **No polling**: Clients receive updates reactively without needing to poll
+4. **Real-time updates**: Updates flow naturally as part of the session lifecycle
+
+### What if the client connects mid-session?
+
+When a client connects to an existing session (via `session/load` or `session/resume`), agents **SHOULD** send a context update notification if they have current usage data available. This ensures the client UI can immediately display accurate context window and cost information.
+
+For agents that only provide usage during active prompting, the client UI may not show usage until after the first prompt is sent, which is acceptable given the agent's capabilities.
+
+### What alternative approaches did you consider, and why did you settle on this one?
+
+**Alternatives considered:**
+
+1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to track independently of turns.
+
+2. **Request/response (`session/status`)** - Requires clients to poll, and some agents don't have APIs to query current status without a prompt. The notification approach is more flexible and consistent with other dynamic session properties.
+
+3. **Client calculates everything** - Rejected because the client doesn't know the model's tokenizer, exact context window size, or pricing.
+
+4. **Only percentage, no raw tokens** - Rejected because users want absolute numbers, clients can't verify calculations, and it's less transparent.
+
+## Revision history
+
+- 2025-12-07: Initial draft
+- 2025-12-13: Changed from `session/status` request method to `session/update` notification with `sessionUpdate: "context_update"`. Made `cost` optional and removed `remaining` field (clients can compute as `size - used`). This approach provides more flexibility for agents and follows the same pattern as other dynamic session properties.