326 changes: 326 additions & 0 deletions docs/rfds/session-usage-context-status.mdx
---
title: "Session Usage and Context Status"
---

- Author(s): [@ahmedhesham6](https://github.com/ahmedhesham6)

## Elevator pitch

> What are you proposing to change?

Add standardized usage and context window tracking to the Agent Client Protocol, enabling agents to report token consumption, cost estimates, and context window utilization in a consistent way across implementations.

## Status quo

> How do things work today and what problems does this cause? Why would we change things?

Currently, ACP has no standardized way for agents to communicate:

1. **Token usage** - How many tokens were consumed in a turn or cumulatively
2. **Context window status** - How much of the model's context window is being used
3. **Cost information** - Estimated costs for API usage
4. **Prompt caching metrics** - Cache hits/misses for models that support caching

This creates several problems:

- **No visibility into resource consumption** - Clients can't show users how much of their context budget is being used
- **No cost transparency** - Users can't track spending or estimate costs before operations
- **No context management** - Clients can't warn users when approaching context limits or suggest compaction
- **Inconsistent implementations** - Each agent implements usage tracking differently (if at all)

Industry research shows common patterns across AI coding tools:

- LLM providers return cumulative token counts in API responses
- IDE extensions display context percentage prominently (e.g., radial progress showing "19%")
- Clients show absolute numbers on hover/detail (e.g., "31.4K of 200K tokens")
- Tools warn users at threshold percentages (75%, 90%, 95%)
- Auto-compaction features trigger when approaching context limits
- Cost tracking focuses on cumulative session totals rather than per-turn breakdowns

## What we propose to do about it

> What are you proposing to improve the situation?

We propose separating usage tracking into two distinct concerns:

1. **Token usage** - Reported in `PromptResponse` after each turn (per-turn data)
2. **Context window and cost** - Reported via `session/update` notifications with `sessionUpdate: "context_update"` (session state)

This separation reflects how users consume this information:
- Token counts are tied to specific turns and useful immediately after a prompt
- Context window and cost are cumulative session state that agents push proactively when available

Agents send context updates at appropriate times:
- When responding to `session/new` (if the agent can query usage immediately)
- On `session/load` / `session/resume` (for resumed or forked sessions)
- After each `session/prompt` response (when usage data becomes available)
- Any time the context window state changes significantly

This approach provides flexibility for different agent implementations:
- Agents that can report current usage without a prompt can send updates immediately when creating, resuming, or forking sessions
- Agents that only report usage while processing a prompt can send updates once a prompt has been processed

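The timing rules above can be sketched from the agent side. This is a minimal TypeScript sketch under stated assumptions: `UsageSource` is a hypothetical adapter over the agent's model backend, `send` stands in for whatever transport emits JSON-RPC notifications, and `maybeSendContextUpdate` is an illustrative name rather than an ACP SDK function.

```typescript
// Context window state, mirroring the proposed context_update fields.
interface ContextSnapshot {
  used: number;
  size: number;
}

// Hypothetical adapter over the agent's model backend. It returns undefined
// when the backend can only report usage while a prompt is being processed.
interface UsageSource {
  currentContext(): ContextSnapshot | undefined;
}

interface ContextUpdateNotification {
  jsonrpc: "2.0";
  method: "session/update";
  params: {
    sessionId: string;
    update: {
      sessionUpdate: "context_update";
      used: number;
      size: number;
      percentage: number;
    };
  };
}

// Call after session/new, session/load, session/resume, and each prompt turn.
// Sends a context_update only when usage data is available; returns whether
// a notification was sent.
function maybeSendContextUpdate(
  sessionId: string,
  source: UsageSource,
  send: (notification: ContextUpdateNotification) => void,
): boolean {
  const snapshot = source.currentContext();
  if (snapshot === undefined || snapshot.size <= 0) {
    return false;
  }
  send({
    jsonrpc: "2.0",
    method: "session/update",
    params: {
      sessionId,
      update: {
        sessionUpdate: "context_update",
        used: snapshot.used,
        size: snapshot.size,
        percentage: (snapshot.used * 100) / snapshot.size,
      },
    },
  });
  return true;
}
```

An agent whose backend only reports usage during prompting would return `undefined` from `currentContext()` until the first prompt completes, so calling this after `session/new` simply sends nothing.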
### Token Usage in `PromptResponse`

Add a `usage` field to `PromptResponse` for token consumption tracking:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "sessionId": "sess_abc123",
    "stopReason": "end_turn",
    "usage": {
      "total_tokens": 53000,
      "input_tokens": 35000,
      "output_tokens": 12000,
      "reasoning_tokens": 5000,
      "cached_read_tokens": 5000,
      "cached_write_tokens": 1000
    }
  }
}
```

#### Usage Fields

- `total_tokens` (number, required) - Sum of all token types for this turn
- `input_tokens` (number, required) - Input tokens consumed in this turn
- `output_tokens` (number, required) - Output tokens generated in this turn
- `reasoning_tokens` (number, optional) - Total reasoning tokens (for o1/o3 models)
- `cached_read_tokens` (number, optional) - Total cache read tokens
- `cached_write_tokens` (number, optional) - Total cache write tokens

> **Review comment (Member), on `reasoning_tokens`:** Since in ACP we usually refer to this as "thought", I wonder if we could align that?

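A client receiving `PromptResponse` cannot assume every agent fills `usage` correctly, so a runtime check before rendering is useful. The TypeScript sketch below mirrors the usage field list; the `Usage` interface and the `isUsage` guard are illustrative names, not types from an ACP SDK, and requiring non-negative integers is an assumption, since the RFD only says `number`.

```typescript
// Token usage reported in PromptResponse.usage (field names from this RFD).
interface Usage {
  total_tokens: number;
  input_tokens: number;
  output_tokens: number;
  reasoning_tokens?: number;
  cached_read_tokens?: number;
  cached_write_tokens?: number;
}

const REQUIRED_FIELDS = ["total_tokens", "input_tokens", "output_tokens"] as const;
const OPTIONAL_FIELDS = ["reasoning_tokens", "cached_read_tokens", "cached_write_tokens"] as const;

// Assumption: a token count is a non-negative integer.
function isTokenCount(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0;
}

// Runtime check for an untrusted `usage` object. Required counts must be
// present; optional counts may be absent but must be valid when present.
function isUsage(value: unknown): value is Usage {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    REQUIRED_FIELDS.every((key) => isTokenCount(record[key])) &&
    OPTIONAL_FIELDS.every((key) => record[key] === undefined || isTokenCount(record[key]))
  );
}
```

A client could hide the token breakdown whenever `isUsage` returns `false`, which matches the RFD's position that usage reporting is optional.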
### Context Window and Cost via `session/update`

Agents send context window and cost information via `session/update` notifications with `sessionUpdate: "context_update"`:

```json
{
  "jsonrpc": "2.0",
  "method": "session/update",
  "params": {
    "sessionId": "sess_abc123",
    "update": {
      "sessionUpdate": "context_update",
      "used": 53000,
      "size": 200000,
      "percentage": 26.5
    }
  }
}
```

#### Context Window Fields (required)

- `used` (number, required) - Tokens currently in context
- `size` (number, required) - Total context window size in tokens
- `percentage` (number, required) - Percentage used (0-100)
> **Review comment (Member), on `percentage`:** Do we send this just to save a calculation on the client? Since we allow clients to calculate `remaining`, maybe we let them calculate this too?


Note: Clients can compute `remaining` as `size - used` if needed.

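The client-side derivation of `remaining` is a one-liner, and `percentage` can be recomputed from the same two fields. A minimal sketch, assuming a hypothetical `ContextWindow` shape holding just `used` and `size`; clamping the remainder at zero and the percentage to the 0-100 range is a defensive assumption, not something the RFD specifies.

```typescript
// The two raw fields from a context_update notification.
interface ContextWindow {
  used: number; // tokens currently in context
  size: number; // total context window size in tokens
}

// Remaining capacity, computed client-side as `size - used`.
// Clamped at 0 in case an agent reports `used` above `size`.
function remainingTokens(w: ContextWindow): number {
  return Math.max(0, w.size - w.used);
}

// Percentage used, clamped to the 0-100 range the RFD specifies.
// Returns 0 for a non-positive size instead of dividing by zero.
function percentageUsed(w: ContextWindow): number {
  if (w.size <= 0) return 0;
  return Math.min(100, (w.used * 100) / w.size);
}
```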
#### Cost Fields (optional)

- `cost` (object, optional) - Cumulative session cost
  - `amount` (number, required) - Total cumulative cost for the session
  - `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR")

> **Review comment (Member), on `cost`:** Naming nitpick: it seems weird that this is part of a "context" update. I wonder if all of this is just usage from a conceptual point of view? Roughly the same data could be sent at the end of the turn, with mid-turn updates, so maybe merge these?
>
> It seems you want to distinguish between turn usage and total usage, which makes sense, but I wonder if we can then distinguish between turn usage and session usage?

Example with optional cost:

```json
{
  "jsonrpc": "2.0",
  "method": "session/update",
  "params": {
    "sessionId": "sess_abc123",
    "update": {
      "sessionUpdate": "context_update",
      "used": 53000,
      "size": 200000,
      "percentage": 26.5,
      "cost": {
        "amount": 0.045,
        "currency": "USD"
      }
    }
  }
}
```
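On the agent side, producing this notification is mostly a matter of filling in the shape shown in the example. This sketch uses hypothetical names (`buildContextUpdate`, `SessionUpdateNotification`) and assumes that an agent which does not track cost omits the `cost` key entirely rather than sending `null`.

```typescript
// Optional cumulative session cost.
interface Cost {
  amount: number; // total cumulative cost for the session
  currency: string; // ISO 4217 code, e.g. "USD" or "EUR"
}

interface ContextUpdate {
  sessionUpdate: "context_update";
  used: number;
  size: number;
  percentage: number;
  cost?: Cost;
}

interface SessionUpdateNotification {
  jsonrpc: "2.0";
  method: "session/update";
  params: { sessionId: string; update: ContextUpdate };
}

// Builds a context_update notification; `cost` is only included when given.
function buildContextUpdate(
  sessionId: string,
  used: number,
  size: number,
  cost?: Cost,
): SessionUpdateNotification {
  const update: ContextUpdate = {
    sessionUpdate: "context_update",
    used,
    size,
    percentage: size > 0 ? (used * 100) / size : 0,
  };
  if (cost !== undefined) {
    update.cost = cost;
  }
  return { jsonrpc: "2.0", method: "session/update", params: { sessionId, update } };
}
```

Deriving `percentage` from `used` and `size` in one place keeps the three reported values consistent by construction.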

### Design Principles

1. **Separation of concerns** - Token usage is per-turn data, context window and cost are session state

   > **Review comment:** How will this work with subagent "tools" that aren't performing full turns but are actively updating token usage frequently?

2. **Agent-pushed notifications** - Agents proactively send context updates when data becomes available, following the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
3. **Agent calculates, client can verify** - Agent knows its model best and provides calculations, but includes raw data for client verification
4. **Flexible cost reporting** - Cost is optional since not all agents track it. Support any currency, don't assume USD
5. **Prompt caching support** - Include cache read/write tokens for models that support it
6. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility
7. **Flexible timing** - Agents send updates when they can: immediately for agents with on-demand APIs, or after prompts for agents that only provide usage during active prompting

## Shiny future

> How will things play out once this feature exists?

**For Users:**

- **Visibility**: Users see real-time context window usage with percentage indicators
- **Cost awareness**: Users can track spending and check cumulative cost at any time
- **Better planning**: Users know when to start new sessions or compact context
- **Transparency**: Clear understanding of resource consumption

**For Client Implementations:**

- **Consistent UI**: All clients can show usage in a standard way (progress bars, percentages, warnings)
- **Smart warnings**: Clients can warn users at 75% and 90% context usage
- **Cost controls**: Clients can implement budget limits and alerts
- **Analytics**: Clients can track usage patterns and optimize
- **Reactive updates**: Clients receive context updates reactively via notifications, updating UI immediately when agents push new data
- **No polling needed**: Updates arrive automatically when agents have new information, eliminating the need for clients to poll

**For Agent Implementations:**

- **Standard reporting**: Clear contract for what to report and when
- **Flexibility**: Optional fields allow agents to report what they can calculate
- **Model diversity**: Works with any model (GPT, Claude, Llama, etc.)
- **Caching support**: First-class support for prompt caching

## Implementation details and plan

> Tell me more about your implementation. What is your detailed implementation plan?

1. **Update schema.json** to add:
   - `Usage` type with token fields
   - `Cost` type with `amount` and `currency` fields
   - `ContextUpdate` type with `used`, `size`, `percentage` (required) and optional `cost` field
   - Optional `usage` field on `PromptResponse`
   - `ContextUpdate` variant in the `SessionUpdate` oneOf array (with `sessionUpdate: "context_update"`)

2. **Update protocol documentation**:
   - Document `usage` field in `/docs/protocol/prompt-turn.mdx`
   - Document `session/update` notification with `sessionUpdate: "context_update"` variant
   - Add examples showing typical usage patterns and when agents send context updates

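The planned `ContextUpdate` variant is distinguished from the other `SessionUpdate` variants only by its `sessionUpdate` discriminator, so a client can narrow on that field. The sketch below is a hypothetical TypeScript shape, not the generated schema types; it assumes that clients ignore discriminator values they do not recognize.

```typescript
// Minimal shape shared by every session/update payload: the discriminator.
interface SessionUpdateBase {
  sessionUpdate: string;
}

interface Cost {
  amount: number;
  currency: string;
}

interface ContextUpdate extends SessionUpdateBase {
  sessionUpdate: "context_update";
  used: number;
  size: number;
  percentage: number;
  cost?: Cost;
}

// Narrows an incoming update to the proposed context_update variant.
function isContextUpdate(update: SessionUpdateBase): update is ContextUpdate {
  return update.sessionUpdate === "context_update";
}

// Routes one session/update payload. Unrecognized variants fall through
// untouched, so a client without context UI simply ignores them.
function handleSessionUpdate(
  update: SessionUpdateBase,
  render: (u: ContextUpdate) => void,
): void {
  if (isContextUpdate(update)) {
    render(update);
  }
  // Other variants (available_commands_update, current_mode_update, ...) go here.
}
```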
## Frequently asked questions

> What questions have arisen over the course of authoring this document or during subsequent discussions?

### Why separate token usage from context window and cost?

Different users care about different things at different times:

- **Token counts**: Relevant immediately after a turn completes to understand the breakdown
- **Context window remaining**: Relevant at any time, especially before issuing a large prompt: "Do I need to hand off or compact?"
- **Cumulative cost**: Session-level state that agents push when available

Separating them allows:
- Cleaner data model where per-turn data stays in turn responses
- Agents to push context updates proactively when data becomes available
- Clients to receive updates reactively without needing to poll

### Why is cost in session/update instead of PromptResponse?

Cost is cumulative session state, similar to context window:
- Users want to track total spending, not just per-turn costs
- Keeps `PromptResponse` focused on per-turn token breakdown
- Both cost and context window are session-level metrics that belong together
- Cost is optional since not all agents track it

### How do users know when to hand off or compact the context?

The context update notification provides everything needed:

- `used` and `size` give absolute numbers for precise tracking (clients can compute `remaining` as `size - used`)
- `percentage` enables simple threshold-based warnings
- `size` lets clients understand the total budget

**Recommended client behavior:**

| Percentage | Action |
|------------|--------|
| < 75% | Normal operation |
| 75-90% | Yellow indicator, suggest "Context filling up" |
| 90-95% | Orange indicator, recommend "Start new session or summarize" |
| > 95% | Red indicator, warn "Next prompt may fail - handoff recommended" |

Clients can also:
- Offer "Compact context" or "Summarize conversation" actions
- Auto-suggest starting a new session
- Implement automatic handoff when approaching limits

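The recommended thresholds translate directly into a lookup. In this sketch, `ContextLevel`, `contextLevel`, and `LEVEL_MESSAGES` are illustrative names; because the table's ranges share endpoints, treating exactly 75% as yellow and exactly 90% and 95% as orange is an assumption.

```typescript
type ContextLevel = "normal" | "yellow" | "orange" | "red";

// Maps a context percentage (0-100) to the indicator levels in the table.
// Boundary choice (assumption): 75 -> yellow, 90 -> orange, 95 -> orange.
function contextLevel(percentage: number): ContextLevel {
  if (percentage < 75) return "normal";
  if (percentage < 90) return "yellow";
  if (percentage <= 95) return "orange";
  return "red";
}

// Suggested message per level, taken from the table's Action column.
const LEVEL_MESSAGES: Record<ContextLevel, string | undefined> = {
  normal: undefined,
  yellow: "Context filling up",
  orange: "Start new session or summarize",
  red: "Next prompt may fail - handoff recommended",
};
```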
### Why does the agent calculate percentage instead of the client?

Agent knows its model best:

- Agent knows exact context window size (varies by model)
- Agent knows how it counts tokens (different tokenizers)
- Agent knows about special tokens, system messages, etc.
- Client can still recalculate if needed (all raw data provided)
> **Review comment (Member):** I don't really understand these arguments. Unless `percentage` will somehow be different from used / total (which would be weird), I feel the agent already gets all of this by controlling the token counts.


### Why not assume USD for cost?

Agents may bill in different currencies:

- European agents might bill in EUR
- Asian agents might bill in JPY or CNY
- Some agents might use credits or points
- Currency conversion rates change

Better to report actual billing currency and let clients convert if needed.

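Reporting the billing currency leaves formatting to the client, and the standard `Intl.NumberFormat` API already formats ISO 4217 codes. A sketch, assuming a hypothetical `formatCost` helper; the plain-text fallback for non-ISO units such as credits is an assumption, since the `currency` field is specified as an ISO 4217 code.

```typescript
interface Cost {
  amount: number;
  currency: string;
}

// Formats a cost in the currency the agent reported, without converting it.
// Intl.NumberFormat throws a RangeError for codes that are not three ASCII
// letters (e.g. "credits"); those fall back to a plain "<amount> <unit>" label.
function formatCost(cost: Cost, locale = "en-US"): string {
  try {
    return new Intl.NumberFormat(locale, {
      style: "currency",
      currency: cost.currency,
      maximumFractionDigits: 4, // small per-session costs such as 0.045
    }).format(cost.amount);
  } catch (err) {
    if (err instanceof RangeError) {
      return `${cost.amount} ${cost.currency}`;
    }
    throw err;
  }
}
```

No conversion happens here, which keeps the displayed amount identical to what the agent reported.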
### What if the agent can't calculate some fields?

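Usage reporting as a whole is optional. Within it, the required fields are the basic token counts (`total_tokens`, `input_tokens`, `output_tokens`) and, in a context update, `used`, `size`, and `percentage`; everything else is optional. Agents report what they can calculate, and clients should handle missing fields gracefully.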

### How does this work with streaming responses?

- During streaming: Agents may send progressive context updates via `session/update` notifications as usage changes
- Final response: Include complete token usage in `PromptResponse`
- Context window and cost: Agents send `session/update` notifications with `sessionUpdate: "context_update"` when data becomes available (after prompt completion, on session creation/resume, or when context state changes significantly)

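Progressive updates during streaming could flood the client if an agent sent one per chunk. The RFD leaves "changes significantly" undefined; this sketch assumes a hypothetical `ContextUpdateThrottle` that sends when usage moves by at least a configurable number of percentage points (default 1) or when `size` changes.

```typescript
interface ContextSnapshot {
  used: number;
  size: number;
}

// Percentage of the window in use; 0 for a non-positive size.
function percentOf(s: ContextSnapshot): number {
  return s.size > 0 ? (s.used * 100) / s.size : 0;
}

// Decides whether a new snapshot warrants a progressive context_update.
// Assumed policy: send the first snapshot, any change to `size`, or a move of
// at least `minDeltaPoints` percentage points since the last snapshot sent.
class ContextUpdateThrottle {
  private readonly minDeltaPoints: number;
  private lastSent: ContextSnapshot | undefined = undefined;

  constructor(minDeltaPoints = 1) {
    this.minDeltaPoints = minDeltaPoints;
  }

  shouldSend(next: ContextSnapshot): boolean {
    const prev = this.lastSent;
    const significant =
      prev === undefined ||
      prev.size !== next.size ||
      Math.abs(percentOf(next) - percentOf(prev)) >= this.minDeltaPoints;
    if (significant) {
      this.lastSent = next;
    }
    return significant;
  }
}
```

Comparing against the last snapshot actually sent, rather than the last one seen, prevents a slow drift of small increments from never triggering an update.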
### What about models without fixed context windows?

- Report the effective context window size
- For models with dynamic windows, report the current limit
- Send a new context update with the new `size` if it changes
- Set `size` to `null` if the window is truly unlimited (rare)

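For the hover or detail text described in the status quo (e.g., "31.4K of 200K tokens"), a client also has to cope with a `null` size. A minimal sketch with hypothetical helper names; accepting `null` assumes the schema would type `size` as nullable, whereas the context window field list marks it as a required number.

```typescript
// Compact token count: 950 -> "950", 31400 -> "31.4K", 200000 -> "200K".
function compactTokens(n: number): string {
  if (n < 1000) return String(n);
  const thousands = (n / 1000).toFixed(1).replace(/\.0$/, "");
  return `${thousands}K`;
}

// Detail label for hover text. A null `size` means an unlimited window,
// so only the used count is shown.
function contextLabel(used: number, size: number | null): string {
  return size === null
    ? `${compactTokens(used)} tokens`
    : `${compactTokens(used)} of ${compactTokens(size)} tokens`;
}
```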
### What about rate limits and quotas?

This RFD focuses on token usage and context windows. Rate limits and quotas are a separate concern that could be addressed in a future RFD. However, the cost tracking here helps users understand their usage against quota limits.

### Should cached tokens count toward context window?

Yes, cached tokens still occupy context window space. They're just cheaper to process. The context window usage should include all tokens (regular + cached).

### Why notification instead of request?

Using `session/update` notifications instead of a `session/status` request provides several benefits:

1. **Consistency**: Follows the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
2. **Agent flexibility**: Agents can send updates when they have data available, whether that's immediately (for agents with on-demand APIs) or after prompts (for agents that only provide usage during active prompting)
3. **No polling**: Clients receive updates reactively without needing to poll
4. **Real-time updates**: Updates flow naturally as part of the session lifecycle

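Because updates are pushed, a client only needs to cache the latest state per session and re-render when it changes. The sketch below assumes a hypothetical `SessionUsageStore`; `get` returning `undefined` covers a client that attaches mid-session before the agent has pushed any usage data.

```typescript
interface ContextState {
  used: number;
  size: number;
  percentage: number;
  cost?: { amount: number; currency: string };
}

type Listener = (sessionId: string, state: ContextState) => void;

// Client-side cache of the latest context_update per session. The UI reads
// from here and re-renders when notified; nothing is polled.
class SessionUsageStore {
  private readonly states = new Map<string, ContextState>();
  private readonly listeners = new Set<Listener>();

  // Call from the session/update notification handler.
  apply(sessionId: string, state: ContextState): void {
    this.states.set(sessionId, state);
    for (const listener of this.listeners) {
      listener(sessionId, state);
    }
  }

  // undefined until the agent has pushed at least one update for the session.
  get(sessionId: string): ContextState | undefined {
    return this.states.get(sessionId);
  }

  // Registers a listener and returns a function that removes it.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}
```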
### What if the client connects mid-session?

When a client connects to an existing session (via `session/load` or `session/resume`), agents **SHOULD** send a context update notification if they have current usage data available. This ensures the client UI can immediately display accurate context window and cost information.

For agents that only provide usage during active prompting, the client UI may not show usage until after the first prompt is sent, which is acceptable given the agent's capabilities.

### What alternative approaches did you consider, and why did you settle on this one?

**Alternatives considered:**

1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to track independently of turns.

2. **Request/response (`session/status`)** - Requires clients to poll, and some agents don't have APIs to query current status without a prompt. The notification approach is more flexible and consistent with other dynamic session properties.

3. **Client calculates everything** - Rejected because client doesn't know model's tokenizer, exact context window size, or pricing.

4. **Only percentage, no raw tokens** - Rejected because users want absolute numbers, clients can't verify calculations, and it's less transparent.

## Revision history

- 2025-12-07: Initial draft
- 2025-12-13: Changed from `session/status` request method to `session/update` notification with `sessionUpdate: "context_update"`. Made `cost` optional and removed `remaining` field (clients can compute as `size - used`). This approach provides more flexibility for agents and follows the same pattern as other dynamic session properties.