Merged
12 changes: 12 additions & 0 deletions README.md
@@ -65,6 +65,18 @@ OpenClaw now supports **provider-agnostic named model profiles**. This keeps mod

See [Model Profiles and Gemma](docs/MODEL_PROFILES.md) for the full configuration and evaluation guide.

### Prompt Caching

OpenClaw can now attach **provider-aware prompt caching hints** through the existing model-profile and provider seams rather than introducing a cache-specific runtime path.

- Configure prompt caching globally under `OpenClaw:Llm:PromptCaching` or per named model profile
- Supported cache dialects are normalized as `openai`, `anthropic`, `gemini`, or `none`
- `openai-compatible` and dynamic providers must opt into a dialect explicitly before cache hints are sent
- Cache usage is normalized into `cacheRead` / `cacheWrite` counters and exposed through diagnostics, session status, and provider usage summaries
- Keep-warm is intentionally selective in v1 and only applies to providers with explicit cache TTL/resource semantics

See [Prompt Caching](docs/PROMPT_CACHING.md) for configuration, provider behavior, and diagnostics details.

### Review-First Learning

- The runtime can observe completed sessions and create **pending learning proposals** instead of auto-mutating behavior
190 changes: 190 additions & 0 deletions docs/PROMPT_CACHING.md
@@ -0,0 +1,190 @@
# Prompt Caching

OpenClaw.NET supports prompt caching as a provider-aware optimization layered on top of the existing provider and model-profile architecture. The runtime still talks to providers through the same `ILlmExecutionService` and model-selection flow. Prompt caching only changes request shaping, normalized usage accounting, and optional keep-warm behavior.

## Why it exists

Prompt caching helps when a large prefix of the request stays stable across turns:

- base system prompt
- tool declarations
- skill prompt content
- stable workspace prompt files

When the upstream provider supports prompt caching, OpenClaw can attach cache hints and normalize returned cache usage as:

- `cacheRead`
- `cacheWrite`

This improves cost and latency visibility for long-running sessions without introducing a provider-specific runtime fork.
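The normalization step can be sketched as follows. This is a minimal illustration in Python (standing in for the .NET extractor); the raw provider field names (`prompt_tokens_details`, `cache_read_input_tokens`, and so on) are assumptions based on common provider usage payloads, not OpenClaw's actual implementation.

```python
# Hedged sketch: map provider-specific cache usage fields onto the
# normalized cacheRead / cacheWrite counters described above.
# Raw field names are illustrative assumptions.

def normalize_cache_usage(provider, usage):
    """Return normalized cache counters for one provider usage payload."""
    if provider == "openai":
        # OpenAI-style: cached prompt tokens count as reads; there is no
        # write signal, and none is fabricated.
        details = usage.get("prompt_tokens_details", {})
        return {"cacheRead": details.get("cached_tokens", 0), "cacheWrite": 0}
    if provider == "anthropic":
        # Anthropic-style: distinct read and cache-creation (write) counters.
        return {
            "cacheRead": usage.get("cache_read_input_tokens", 0),
            "cacheWrite": usage.get("cache_creation_input_tokens", 0),
        }
    # Providers without cache reporting contribute zero counters.
    return {"cacheRead": 0, "cacheWrite": 0}
```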

## Configuration

Prompt caching can be configured globally:

```json
{
"OpenClaw": {
"Llm": {
"Provider": "openai",
"Model": "gpt-4.1",
"PromptCaching": {
"Enabled": true,
"Retention": "auto",
"Dialect": "openai",
"KeepWarmEnabled": false,
"KeepWarmIntervalMinutes": 55,
"TraceEnabled": false,
"TraceFilePath": "./memory/logs/cache-trace.jsonl"
}
}
}
}
```

Or per model profile:

```json
{
"OpenClaw": {
"Models": {
"DefaultProfile": "gemma4-prod",
"Profiles": [
{
"Id": "gemma4-prod",
"Provider": "openai-compatible",
"Model": "gemma-4",
"BaseUrl": "https://gateway.example.com/v1",
"ApiKey": "env:MODEL_PROVIDER_KEY",
"PromptCaching": {
"Enabled": true,
"Dialect": "openai",
"Retention": "auto"
}
},
{
"Id": "claude-research",
"Provider": "anthropic",
"Model": "claude-sonnet-4.5",
"PromptCaching": {
"Enabled": true,
"Dialect": "anthropic",
"Retention": "long",
"KeepWarmEnabled": true,
"KeepWarmIntervalMinutes": 55
}
}
]
}
}
}
```

Profile settings override the global `OpenClaw:Llm:PromptCaching` values field-by-field.
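In spirit, the field-by-field merge works like the sketch below (Python stands in for the .NET configuration binding; the helper itself is illustrative, not OpenClaw's actual code):

```python
# Hedged sketch of field-by-field override: a profile's PromptCaching
# block wins per field, and any field the profile leaves unset falls
# back to the global OpenClaw:Llm:PromptCaching value.

def merge_prompt_caching(global_cfg, profile_cfg):
    """Merge a profile's PromptCaching settings over the global defaults."""
    merged = dict(global_cfg)
    for key, value in (profile_cfg or {}).items():
        if value is not None:
            merged[key] = value  # profile value overrides the global one
    return merged
```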

## Supported fields

- `Enabled`: turns prompt caching behavior on for that scope
- `Retention`: `none`, `short`, `long`, or `auto`
- `Dialect`: `auto`, `openai`, `anthropic`, `gemini`, or `none`
- `KeepWarmEnabled`: enables selective keep-warm for eligible providers
- `KeepWarmIntervalMinutes`: minimum interval, in minutes, between keep-warm refreshes
- `TraceEnabled`: emits cache-trace JSONL entries
- `TraceFilePath`: optional trace output path

## Provider behavior

### OpenAI and Azure OpenAI

- Uses deterministic cache-key hints through request additional properties
- Normalizes provider-reported cached prompt tokens into `cacheRead`
- Does not fabricate `cacheWrite` when the provider does not report it

### OpenAI-compatible

- Prompt caching is only enabled when `Dialect` is explicitly set to `openai`
- If prompt caching is enabled but the dialect stays `auto`, config validation and doctor mode warn before runtime

### Anthropic and Anthropic Vertex

- Uses Anthropic-style cache hints
- Maps provider cache read and cache creation/write usage when reported
- Eligible for keep-warm when explicitly enabled

### Amazon Bedrock

- Bedrock is available as a provider id for cache-policy routing and validation
- Anthropic-style cache behavior is only meaningful for Anthropic Claude models behind a Bedrock-compatible endpoint or adapter
- Non-Anthropic Bedrock models are treated as no-cache for retention/keep-warm purposes

### Gemini

- Uses Gemini cache dialect hints and normalized cache accounting
- Eligible for keep-warm when explicitly enabled

### Ollama

- No prompt caching behavior in v1
- Model capabilities reflect that prompt caching is unsupported

### Dynamic / plugin providers

- Prompt cache hints are passed through `ChatOptions.AdditionalProperties`
- The provider must opt into a cache dialect explicitly
- If the provider returns usage counters with cache fields, OpenClaw normalizes them into `cacheRead` / `cacheWrite`
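The opt-in pass-through above can be sketched as follows. Python stands in for the .NET `ChatOptions.AdditionalProperties` bag, and the property key names are illustrative assumptions:

```python
# Hedged sketch: cache hints are attached as loosely typed additional
# properties, but only when a concrete dialect has been opted into.
# The "openclaw.cache.*" key names are invented for illustration.

def attach_cache_hints(additional_properties, dialect):
    """Return a copy of the properties bag with cache hints attached."""
    if dialect in (None, "auto", "none"):
        # No explicit opt-in: send the request untouched, with no hints.
        return dict(additional_properties)
    props = dict(additional_properties)
    props["openclaw.cache.dialect"] = dialect
    props["openclaw.cache.enabled"] = True
    return props
```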

## Diagnostics

Prompt cache usage is surfaced in:

- `/metrics/providers`
- `/doctor/text`
- session status summaries
- `/status` and `/usage` command output

If live session cache totals are missing, OpenClaw falls back to the most recent nonzero cache counters recorded in provider usage history for that session.
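The fallback described above amounts to the following sketch (Python stand-in; the record shape is an assumption for illustration):

```python
# Hedged sketch: prefer live session cache totals; otherwise walk the
# provider usage history newest-first and reuse the most recent entry
# with nonzero cache counters.

def resolve_cache_totals(live, history):
    """Return cache totals for display, falling back to usage history."""
    if live and (live.get("cacheRead", 0) or live.get("cacheWrite", 0)):
        return live
    for entry in reversed(history):  # newest entries last in the list
        if entry.get("cacheRead", 0) or entry.get("cacheWrite", 0):
            return {"cacheRead": entry["cacheRead"],
                    "cacheWrite": entry["cacheWrite"]}
    return {"cacheRead": 0, "cacheWrite": 0}
```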

## Cache tracing

Cache tracing can be enabled with config:

```json
{
"OpenClaw": {
"Diagnostics": {
"CacheTrace": {
"Enabled": true,
"FilePath": "./memory/logs/cache-trace.jsonl",
"IncludeMessages": true,
"IncludePrompt": true,
"IncludeSystem": true
}
}
}
}
```

Or with environment variables:

- `OPENCLAW_CACHE_TRACE=1`
- `OPENCLAW_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl`
- `OPENCLAW_CACHE_TRACE_PROMPT=0|1`
- `OPENCLAW_CACHE_TRACE_SYSTEM=0|1`

Trace output is JSONL and includes:

- selected profile/provider/model
- dialect and retention
- stable fingerprint
- normalized cache usage counters
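A single trace line might look like the following. This entry is illustrative only; the exact field names may differ, so consult your actual trace output:

```json
{"profile": "claude-research", "provider": "anthropic", "model": "claude-sonnet-4.5", "dialect": "anthropic", "retention": "long", "fingerprint": "sha256:ab12...", "cacheRead": 1024, "cacheWrite": 0}
```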

## Keep-warm

Keep-warm is intentionally conservative in v1.

- It runs in a dedicated background service
- It only warms active sessions with recent stable prompt fingerprints
- It only warms profiles that explicitly set `KeepWarmEnabled=true`
- It only applies to providers with explicit TTL or cache-resource semantics

Providers that are not explicitly eligible are skipped without failing normal requests.
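The gating above can be sketched as follows. Python stands in for the background service's eligibility check; the provider list and record fields are assumptions for illustration, not OpenClaw's actual code:

```python
# Hedged sketch of keep-warm gating: warm only profiles that opt in,
# whose provider has explicit cache TTL/resource semantics, and whose
# session fingerprint has stayed stable; respect the minimum interval.
# The eligible-provider set is an illustrative assumption.

TTL_PROVIDERS = {"anthropic", "anthropic-vertex", "gemini"}

def should_keep_warm(profile, session, now_min):
    """Decide whether to issue a keep-warm request for this session."""
    if not profile.get("KeepWarmEnabled", False):
        return False
    if profile.get("Provider") not in TTL_PROVIDERS:
        return False  # skipped silently; normal requests are unaffected
    interval = profile.get("KeepWarmIntervalMinutes", 55)
    return (
        session.get("fingerprint_stable", False)
        and now_min - session.get("last_warm_min", float("-inf")) >= interval
    )
```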
39 changes: 38 additions & 1 deletion src/OpenClaw.Agent/AgentRuntime.cs
@@ -65,6 +65,7 @@ public sealed class AgentRuntime : IAgentRuntime
private readonly SkillsConfig? _skillsConfig;
private readonly string? _skillWorkspacePath;
private readonly IReadOnlyList<string> _pluginSkillDirs;
private readonly string? _memoryRecallPrefix;
private readonly object _skillGate = new();
private string[] _loadedSkillNames = [];
private int _skillPromptLength;
@@ -158,6 +159,9 @@ public AgentRuntime(
_isContractRuntimeBudgetExceeded = isContractRuntimeBudgetExceeded;
_recordContractTurnUsage = recordContractTurnUsage;
_appendContractSnapshot = appendContractSnapshot;
var projectId = gatewayConfig?.Memory.ProjectId
?? Environment.GetEnvironmentVariable("OPENCLAW_PROJECT");
_memoryRecallPrefix = string.IsNullOrWhiteSpace(projectId) ? null : $"project:{projectId.Trim()}:";
ApplySkills(skills ?? []);
}

@@ -326,22 +330,29 @@ public async Task<string> RunAsync(
// Extract token usage from response
var inputTokens = response.Usage?.InputTokenCount ?? 0;
var outputTokens = response.Usage?.OutputTokenCount ?? 0;
var cacheUsage = PromptCacheUsageExtractor.FromUsage(response.Usage);
turnCtx.RecordLlmCall(llmSw.Elapsed, inputTokens, outputTokens);
_metrics?.IncrementLlmCalls();
_metrics?.AddInputTokens(inputTokens);
_metrics?.AddOutputTokens(outputTokens);
_metrics?.AddPromptCacheReads(cacheUsage.CacheReadTokens);
_metrics?.AddPromptCacheWrites(cacheUsage.CacheWriteTokens);
_providerUsage?.AddTokens(executionResult.ProviderId, executionResult.ModelId, inputTokens, outputTokens);
_providerUsage?.AddCacheTokens(executionResult.ProviderId, executionResult.ModelId, cacheUsage.CacheReadTokens, cacheUsage.CacheWriteTokens);
_providerUsage?.RecordTurn(
session.Id,
session.ChannelId,
executionResult.ProviderId,
executionResult.ModelId,
inputTokens,
outputTokens,
cacheUsage.CacheReadTokens,
cacheUsage.CacheWriteTokens,
LlmExecutionEstimateBuilder.BuildInputTokenEstimate(messages, inputTokens, _skillPromptLength));

// Track token usage on the session
session.AddTokenUsage(inputTokens, outputTokens);
session.AddCacheUsage(cacheUsage.CacheReadTokens, cacheUsage.CacheWriteTokens);
_recordContractTurnUsage?.Invoke(session, executionResult.ProviderId, executionResult.ModelId, inputTokens, outputTokens);

if (TryRejectContractBudget(session, out contractBudgetMessage))
@@ -496,6 +507,7 @@ public async IAsyncEnumerable<AgentStreamEvent> RunStreamingAsync(
}

session.AddTokenUsage(streamResult.InputTokens, streamResult.OutputTokens);
session.AddCacheUsage(streamResult.CacheReadTokens, streamResult.CacheWriteTokens);
if (!string.IsNullOrWhiteSpace(streamResult.ProviderId) && !string.IsNullOrWhiteSpace(streamResult.ModelId))
_recordContractTurnUsage?.Invoke(session, streamResult.ProviderId, streamResult.ModelId, streamResult.InputTokens, streamResult.OutputTokens);
if (!string.IsNullOrWhiteSpace(streamResult.ProviderId) && !string.IsNullOrWhiteSpace(streamResult.ModelId))
@@ -507,6 +519,8 @@
streamResult.ModelId,
streamResult.InputTokens,
streamResult.OutputTokens,
streamResult.CacheReadTokens,
streamResult.CacheWriteTokens,
LlmExecutionEstimateBuilder.BuildInputTokenEstimate(messages, streamResult.InputTokens, _skillPromptLength));
}

@@ -632,9 +646,16 @@ private async ValueTask TryInjectRecallAsync(List<ChatMessage> messages, string
try
{
var limit = Math.Clamp(_recall.MaxNotes, 1, 32);
var hits = await search.SearchNotesAsync(userMessage, prefix: null, limit, ct);
_metrics?.IncrementMemoryRecallSearches();
var hits = await search.SearchNotesAsync(userMessage, _memoryRecallPrefix, limit, ct);
if (hits.Count == 0 && !string.IsNullOrWhiteSpace(_memoryRecallPrefix))
{
_metrics?.IncrementMemoryRecallSearches();
hits = await search.SearchNotesAsync(userMessage, prefix: null, limit, ct);
}
if (hits.Count == 0)
return;
_metrics?.AddMemoryRecallHits(hits.Count);

var maxChars = Math.Clamp(_recall.MaxChars, 256, 100_000);

@@ -743,6 +764,8 @@ private sealed class StreamCollectResult
public List<FunctionCallContent> ToolCalls { get; } = [];
public int InputTokens { get; set; }
public int OutputTokens { get; set; }
public int CacheReadTokens { get; set; }
public int CacheWriteTokens { get; set; }
public string? ProviderId { get; set; }
public string? ModelId { get; set; }
public string? Error { get; set; }
@@ -789,6 +812,11 @@ private async Task<StreamCollectResult> StreamLlmCollectAsync(
result.InputTokens = (int)usage.Details.InputTokenCount.Value;
if (usage.Details.OutputTokenCount is > 0)
result.OutputTokens = (int)usage.Details.OutputTokenCount.Value;
var cacheUsage = PromptCacheUsageExtractor.FromUsage(usage.Details);
if (cacheUsage.CacheReadTokens > 0)
result.CacheReadTokens = (int)cacheUsage.CacheReadTokens;
if (cacheUsage.CacheWriteTokens > 0)
result.CacheWriteTokens = (int)cacheUsage.CacheWriteTokens;
}
}
}
@@ -884,6 +912,11 @@ private async Task<StreamCollectResult> StreamLlmCollectAsync(
result.InputTokens = (int)usage.Details.InputTokenCount.Value;
if (usage.Details.OutputTokenCount is > 0)
result.OutputTokens = (int)usage.Details.OutputTokenCount.Value;
var cacheUsage = PromptCacheUsageExtractor.FromUsage(usage.Details);
if (cacheUsage.CacheReadTokens > 0)
result.CacheReadTokens = (int)cacheUsage.CacheReadTokens;
if (cacheUsage.CacheWriteTokens > 0)
result.CacheWriteTokens = (int)cacheUsage.CacheWriteTokens;
}
}
}
@@ -936,7 +969,10 @@ private async Task<StreamCollectResult> StreamLlmCollectAsync(
_metrics?.IncrementLlmCalls();
_metrics?.AddInputTokens(result.InputTokens);
_metrics?.AddOutputTokens(result.OutputTokens);
_metrics?.AddPromptCacheReads(result.CacheReadTokens);
_metrics?.AddPromptCacheWrites(result.CacheWriteTokens);
_providerUsage?.AddTokens(_config.Provider, options.ModelId ?? _config.Model, result.InputTokens, result.OutputTokens);
_providerUsage?.AddCacheTokens(_config.Provider, options.ModelId ?? _config.Model, result.CacheReadTokens, result.CacheWriteTokens);
result.ProviderId = _config.Provider;
result.ModelId = options.ModelId ?? _config.Model;

@@ -1258,6 +1294,7 @@ public async Task CompactHistoryAsync(Session session, CancellationToken ct)

if (!string.IsNullOrWhiteSpace(summary))
{
_metrics?.IncrementMemoryCompactions();
session.History.RemoveRange(0, toSummarizeCount);
session.History.Insert(0, new ChatTurn
{