Merged
12 changes: 12 additions & 0 deletions README.md
@@ -65,6 +65,18 @@ OpenClaw now supports **provider-agnostic named model profiles**. This keeps mod

See [Model Profiles and Gemma](docs/MODEL_PROFILES.md) for the full configuration and evaluation guide.

### Prompt Caching

OpenClaw can now attach **provider-aware prompt caching hints** through the existing model-profile and provider seams rather than introducing a cache-specific runtime path.

- Configure prompt caching globally under `OpenClaw:Llm:PromptCaching` or per named model profile
- Supported cache dialects are normalized as `openai`, `anthropic`, `gemini`, or `none`
- `openai-compatible` and dynamic providers must opt into a dialect explicitly before cache hints are sent
- Cache usage is normalized into `cacheRead` / `cacheWrite` counters and exposed through diagnostics, session status, and provider usage summaries
- Keep-warm is intentionally selective in v1 and only applies to providers with explicit cache TTL/resource semantics

See [Prompt Caching](docs/PROMPT_CACHING.md) for configuration, provider behavior, and diagnostics details.

### Review-First Learning

- The runtime can observe completed sessions and create **pending learning proposals** instead of auto-mutating behavior
190 changes: 190 additions & 0 deletions docs/PROMPT_CACHING.md
@@ -0,0 +1,190 @@
# Prompt Caching

OpenClaw.NET supports prompt caching as a provider-aware optimization layered on top of the existing provider and model-profile architecture. The runtime still talks to providers through the same `ILlmExecutionService` and model-selection flow. Prompt caching only changes request shaping, normalized usage accounting, and optional keep-warm behavior.

## Why it exists

Prompt caching helps when a large prefix of the request stays stable across turns:

- base system prompt
- tool declarations
- skill prompt content
- stable workspace prompt files

When the upstream provider supports prompt caching, OpenClaw can attach cache hints and normalize returned cache usage as:

- `cacheRead`
- `cacheWrite`

This improves cost and latency visibility for long-running sessions without introducing a provider-specific runtime fork.
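The normalization step can be sketched as follows. This is a minimal illustration in Python (standing in for the .NET extractor); the raw provider field names (`prompt_tokens_details`, `cache_read_input_tokens`, and so on) are assumptions based on common provider usage payloads, not OpenClaw's actual implementation.

```python
# Hedged sketch: map provider-specific cache usage fields onto the
# normalized cacheRead / cacheWrite counters described above.
# Raw field names are illustrative assumptions.

def normalize_cache_usage(provider, usage):
    """Return normalized cache counters for one provider usage payload."""
    if provider == "openai":
        # OpenAI-style: cached prompt tokens count as reads; there is no
        # write signal, and none is fabricated.
        details = usage.get("prompt_tokens_details", {})
        return {"cacheRead": details.get("cached_tokens", 0), "cacheWrite": 0}
    if provider == "anthropic":
        # Anthropic-style: distinct read and cache-creation (write) counters.
        return {
            "cacheRead": usage.get("cache_read_input_tokens", 0),
            "cacheWrite": usage.get("cache_creation_input_tokens", 0),
        }
    # Providers without cache reporting contribute zero counters.
    return {"cacheRead": 0, "cacheWrite": 0}
```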

## Configuration

Prompt caching can be configured globally:

```json
{
"OpenClaw": {
"Llm": {
"Provider": "openai",
"Model": "gpt-4.1",
"PromptCaching": {
"Enabled": true,
"Retention": "auto",
"Dialect": "openai",
"KeepWarmEnabled": false,
"KeepWarmIntervalMinutes": 55,
"TraceEnabled": false,
"TraceFilePath": "./memory/logs/cache-trace.jsonl"
}
}
}
}
```

Or per model profile:

```json
{
"OpenClaw": {
"Models": {
"DefaultProfile": "gemma4-prod",
"Profiles": [
{
"Id": "gemma4-prod",
"Provider": "openai-compatible",
"Model": "gemma-4",
"BaseUrl": "https://gateway.example.com/v1",
"ApiKey": "env:MODEL_PROVIDER_KEY",
"PromptCaching": {
"Enabled": true,
"Dialect": "openai",
"Retention": "auto"
}
},
{
"Id": "claude-research",
"Provider": "anthropic",
"Model": "claude-sonnet-4.5",
"PromptCaching": {
"Enabled": true,
"Dialect": "anthropic",
"Retention": "long",
"KeepWarmEnabled": true,
"KeepWarmIntervalMinutes": 55
}
}
]
}
}
}
```

Profile settings override the global `OpenClaw:Llm:PromptCaching` values field-by-field.
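In spirit, the field-by-field merge works like the sketch below (Python stands in for the .NET configuration binding; the helper itself is illustrative, not OpenClaw's actual code):

```python
# Hedged sketch of field-by-field override: a profile's PromptCaching
# block wins per field, and any field the profile leaves unset falls
# back to the global OpenClaw:Llm:PromptCaching value.

def merge_prompt_caching(global_cfg, profile_cfg):
    """Merge a profile's PromptCaching settings over the global defaults."""
    merged = dict(global_cfg)
    for key, value in (profile_cfg or {}).items():
        if value is not None:
            merged[key] = value  # profile value overrides the global one
    return merged
```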

## Supported fields

- `Enabled`: turns prompt caching behavior on for that scope
- `Retention`: `none`, `short`, `long`, or `auto`
- `Dialect`: `auto`, `openai`, `anthropic`, `gemini`, or `none`
- `KeepWarmEnabled`: enables selective keep-warm for eligible providers
- `KeepWarmIntervalMinutes`: minimum interval, in minutes, between keep-warm refreshes
- `TraceEnabled`: emits cache-trace JSONL entries
- `TraceFilePath`: optional trace output path

## Provider behavior

### OpenAI and Azure OpenAI

- Uses deterministic cache-key hints through request additional properties
- Normalizes provider-reported cached prompt tokens into `cacheRead`
- Does not fabricate `cacheWrite` when the provider does not report it

### OpenAI-compatible

- Prompt caching is only enabled when `Dialect` is explicitly set to `openai`
- If prompt caching is enabled but the dialect stays `auto`, config validation and doctor mode warn before runtime

### Anthropic and Anthropic Vertex

- Uses Anthropic-style cache hints
- Maps provider cache read and cache creation/write usage when reported
- Eligible for keep-warm when explicitly enabled

### Amazon Bedrock

- Bedrock is available as a provider id for cache-policy routing and validation
- Anthropic-style cache behavior is only meaningful for Anthropic Claude models behind a Bedrock-compatible endpoint or adapter
- Non-Anthropic Bedrock models are treated as no-cache for retention/keep-warm purposes

### Gemini

- Uses Gemini cache dialect hints and normalized cache accounting
- Eligible for keep-warm when explicitly enabled

### Ollama

- No prompt caching behavior in v1
- Model capabilities reflect that prompt caching is unsupported

### Dynamic / plugin providers

- Prompt cache hints are passed through `ChatOptions.AdditionalProperties`
- The provider must opt into a cache dialect explicitly
- If the provider returns usage counters with cache fields, OpenClaw normalizes them into `cacheRead` / `cacheWrite`
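The opt-in pass-through above can be sketched as follows. Python stands in for the .NET `ChatOptions.AdditionalProperties` bag, and the property key names are illustrative assumptions:

```python
# Hedged sketch: cache hints are attached as loosely typed additional
# properties, but only when a concrete dialect has been opted into.
# The "openclaw.cache.*" key names are invented for illustration.

def attach_cache_hints(additional_properties, dialect):
    """Return a copy of the properties bag with cache hints attached."""
    if dialect in (None, "auto", "none"):
        # No explicit opt-in: send the request untouched, with no hints.
        return dict(additional_properties)
    props = dict(additional_properties)
    props["openclaw.cache.dialect"] = dialect
    props["openclaw.cache.enabled"] = True
    return props
```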

## Diagnostics

Prompt cache usage is surfaced in:

- `/metrics/providers`
- `/doctor/text`
- session status summaries
- `/status` and `/usage` command output

If live session cache totals are missing, OpenClaw falls back to the most recent nonzero cache counters recorded in provider usage history for that session.
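The fallback described above amounts to the following sketch (Python stand-in; the record shape is an assumption for illustration):

```python
# Hedged sketch: prefer live session cache totals; otherwise walk the
# provider usage history newest-first and reuse the most recent entry
# with nonzero cache counters.

def resolve_cache_totals(live, history):
    """Return cache totals for display, falling back to usage history."""
    if live and (live.get("cacheRead", 0) or live.get("cacheWrite", 0)):
        return live
    for entry in reversed(history):  # newest entries last in the list
        if entry.get("cacheRead", 0) or entry.get("cacheWrite", 0):
            return {"cacheRead": entry["cacheRead"],
                    "cacheWrite": entry["cacheWrite"]}
    return {"cacheRead": 0, "cacheWrite": 0}
```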

## Cache tracing

Cache tracing can be enabled with config:

```json
{
"OpenClaw": {
"Diagnostics": {
"CacheTrace": {
"Enabled": true,
"FilePath": "./memory/logs/cache-trace.jsonl",
"IncludeMessages": true,
"IncludePrompt": true,
"IncludeSystem": true
}
}
}
}
```

Or with environment variables:

- `OPENCLAW_CACHE_TRACE=1`
- `OPENCLAW_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl`
- `OPENCLAW_CACHE_TRACE_PROMPT=0|1`
- `OPENCLAW_CACHE_TRACE_SYSTEM=0|1`

Trace output is JSONL and includes:

- selected profile/provider/model
- dialect and retention
- stable fingerprint
- normalized cache usage counters
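A single trace line might look like the following. This entry is illustrative only; the exact field names may differ, so consult your actual trace output:

```json
{"profile": "claude-research", "provider": "anthropic", "model": "claude-sonnet-4.5", "dialect": "anthropic", "retention": "long", "fingerprint": "sha256:ab12...", "cacheRead": 1024, "cacheWrite": 0}
```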

## Keep-warm

Keep-warm is intentionally conservative in v1.

- It runs in a dedicated background service
- It only warms active sessions with recent stable prompt fingerprints
- It only warms profiles that explicitly set `KeepWarmEnabled=true`
- It only applies to providers with explicit TTL or cache-resource semantics

Providers that are not explicitly eligible are skipped without failing normal requests.
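The gating above can be sketched as follows. Python stands in for the background service's eligibility check; the provider list and record fields are assumptions for illustration, not OpenClaw's actual code:

```python
# Hedged sketch of keep-warm gating: warm only profiles that opt in,
# whose provider has explicit cache TTL/resource semantics, and whose
# session fingerprint has stayed stable; respect the minimum interval.
# The eligible-provider set is an illustrative assumption.

TTL_PROVIDERS = {"anthropic", "anthropic-vertex", "gemini"}

def should_keep_warm(profile, session, now_min):
    """Decide whether to issue a keep-warm request for this session."""
    if not profile.get("KeepWarmEnabled", False):
        return False
    if profile.get("Provider") not in TTL_PROVIDERS:
        return False  # skipped silently; normal requests are unaffected
    interval = profile.get("KeepWarmIntervalMinutes", 55)
    return (
        session.get("fingerprint_stable", False)
        and now_min - session.get("last_warm_min", float("-inf")) >= interval
    )
```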
39 changes: 38 additions & 1 deletion src/OpenClaw.Agent/AgentRuntime.cs
@@ -65,6 +65,7 @@ public sealed class AgentRuntime : IAgentRuntime
private readonly SkillsConfig? _skillsConfig;
private readonly string? _skillWorkspacePath;
private readonly IReadOnlyList<string> _pluginSkillDirs;
private readonly string? _memoryRecallPrefix;
private readonly object _skillGate = new();
private string[] _loadedSkillNames = [];
private int _skillPromptLength;
@@ -158,6 +159,9 @@ public AgentRuntime(
_isContractRuntimeBudgetExceeded = isContractRuntimeBudgetExceeded;
_recordContractTurnUsage = recordContractTurnUsage;
_appendContractSnapshot = appendContractSnapshot;
var projectId = gatewayConfig?.Memory.ProjectId
?? Environment.GetEnvironmentVariable("OPENCLAW_PROJECT");
_memoryRecallPrefix = string.IsNullOrWhiteSpace(projectId) ? null : $"project:{projectId.Trim()}:";
ApplySkills(skills ?? []);
}

@@ -326,22 +330,29 @@ public async Task<string> RunAsync(
// Extract token usage from response
var inputTokens = response.Usage?.InputTokenCount ?? 0;
var outputTokens = response.Usage?.OutputTokenCount ?? 0;
var cacheUsage = PromptCacheUsageExtractor.FromUsage(response.Usage);
turnCtx.RecordLlmCall(llmSw.Elapsed, inputTokens, outputTokens);
_metrics?.IncrementLlmCalls();
_metrics?.AddInputTokens(inputTokens);
_metrics?.AddOutputTokens(outputTokens);
_metrics?.AddPromptCacheReads(cacheUsage.CacheReadTokens);
_metrics?.AddPromptCacheWrites(cacheUsage.CacheWriteTokens);
_providerUsage?.AddTokens(executionResult.ProviderId, executionResult.ModelId, inputTokens, outputTokens);
_providerUsage?.AddCacheTokens(executionResult.ProviderId, executionResult.ModelId, cacheUsage.CacheReadTokens, cacheUsage.CacheWriteTokens);
_providerUsage?.RecordTurn(
session.Id,
session.ChannelId,
executionResult.ProviderId,
executionResult.ModelId,
inputTokens,
outputTokens,
cacheUsage.CacheReadTokens,
cacheUsage.CacheWriteTokens,
LlmExecutionEstimateBuilder.BuildInputTokenEstimate(messages, inputTokens, _skillPromptLength));

// Track token usage on the session
session.AddTokenUsage(inputTokens, outputTokens);
session.AddCacheUsage(cacheUsage.CacheReadTokens, cacheUsage.CacheWriteTokens);
_recordContractTurnUsage?.Invoke(session, executionResult.ProviderId, executionResult.ModelId, inputTokens, outputTokens);

if (TryRejectContractBudget(session, out contractBudgetMessage))
@@ -496,6 +507,7 @@ public async IAsyncEnumerable<AgentStreamEvent> RunStreamingAsync(
}

session.AddTokenUsage(streamResult.InputTokens, streamResult.OutputTokens);
session.AddCacheUsage(streamResult.CacheReadTokens, streamResult.CacheWriteTokens);
if (!string.IsNullOrWhiteSpace(streamResult.ProviderId) && !string.IsNullOrWhiteSpace(streamResult.ModelId))
_recordContractTurnUsage?.Invoke(session, streamResult.ProviderId, streamResult.ModelId, streamResult.InputTokens, streamResult.OutputTokens);
if (!string.IsNullOrWhiteSpace(streamResult.ProviderId) && !string.IsNullOrWhiteSpace(streamResult.ModelId))
@@ -507,6 +519,8 @@
streamResult.ModelId,
streamResult.InputTokens,
streamResult.OutputTokens,
streamResult.CacheReadTokens,
streamResult.CacheWriteTokens,
LlmExecutionEstimateBuilder.BuildInputTokenEstimate(messages, streamResult.InputTokens, _skillPromptLength));
}

@@ -632,9 +646,16 @@ private async ValueTask TryInjectRecallAsync(List<ChatMessage> messages, string
try
{
var limit = Math.Clamp(_recall.MaxNotes, 1, 32);
var hits = await search.SearchNotesAsync(userMessage, prefix: null, limit, ct);
_metrics?.IncrementMemoryRecallSearches();
var hits = await search.SearchNotesAsync(userMessage, _memoryRecallPrefix, limit, ct);
if (hits.Count == 0 && !string.IsNullOrWhiteSpace(_memoryRecallPrefix))
{
_metrics?.IncrementMemoryRecallSearches();
hits = await search.SearchNotesAsync(userMessage, prefix: null, limit, ct);
}
if (hits.Count == 0)
return;
_metrics?.AddMemoryRecallHits(hits.Count);

var maxChars = Math.Clamp(_recall.MaxChars, 256, 100_000);

@@ -743,6 +764,8 @@ private sealed class StreamCollectResult
public List<FunctionCallContent> ToolCalls { get; } = [];
public int InputTokens { get; set; }
public int OutputTokens { get; set; }
public int CacheReadTokens { get; set; }
public int CacheWriteTokens { get; set; }
public string? ProviderId { get; set; }
public string? ModelId { get; set; }
public string? Error { get; set; }
@@ -789,6 +812,11 @@ private async Task<StreamCollectResult> StreamLlmCollectAsync(
result.InputTokens = (int)usage.Details.InputTokenCount.Value;
if (usage.Details.OutputTokenCount is > 0)
result.OutputTokens = (int)usage.Details.OutputTokenCount.Value;
var cacheUsage = PromptCacheUsageExtractor.FromUsage(usage.Details);
if (cacheUsage.CacheReadTokens > 0)
result.CacheReadTokens = (int)cacheUsage.CacheReadTokens;
if (cacheUsage.CacheWriteTokens > 0)
result.CacheWriteTokens = (int)cacheUsage.CacheWriteTokens;
}
}
}
@@ -884,6 +912,11 @@ private async Task<StreamCollectResult> StreamLlmCollectAsync(
result.InputTokens = (int)usage.Details.InputTokenCount.Value;
if (usage.Details.OutputTokenCount is > 0)
result.OutputTokens = (int)usage.Details.OutputTokenCount.Value;
var cacheUsage = PromptCacheUsageExtractor.FromUsage(usage.Details);
if (cacheUsage.CacheReadTokens > 0)
result.CacheReadTokens = (int)cacheUsage.CacheReadTokens;
if (cacheUsage.CacheWriteTokens > 0)
result.CacheWriteTokens = (int)cacheUsage.CacheWriteTokens;
}
}
}
@@ -936,7 +969,10 @@ private async Task<StreamCollectResult> StreamLlmCollectAsync(
_metrics?.IncrementLlmCalls();
_metrics?.AddInputTokens(result.InputTokens);
_metrics?.AddOutputTokens(result.OutputTokens);
_metrics?.AddPromptCacheReads(result.CacheReadTokens);
_metrics?.AddPromptCacheWrites(result.CacheWriteTokens);
_providerUsage?.AddTokens(_config.Provider, options.ModelId ?? _config.Model, result.InputTokens, result.OutputTokens);
_providerUsage?.AddCacheTokens(_config.Provider, options.ModelId ?? _config.Model, result.CacheReadTokens, result.CacheWriteTokens);
result.ProviderId = _config.Provider;
result.ModelId = options.ModelId ?? _config.Model;

@@ -1258,6 +1294,7 @@ public async Task CompactHistoryAsync(Session session, CancellationToken ct)

if (!string.IsNullOrWhiteSpace(summary))
{
_metrics?.IncrementMemoryCompactions();
session.History.RemoveRange(0, toSummarizeCount);
session.History.Insert(0, new ChatTurn
{