# Prompt Caching Examples (TypeScript + fetch)

Examples demonstrating prompt caching with the fetch API.

## What is Prompt Caching?

Prompt caching lets you cache large portions of your prompts (such as system messages or context documents) to:

- **Reduce costs** - cached tokens cost significantly less than regular tokens
- **Improve latency** - cached content is processed faster on subsequent requests
- **Enable larger contexts** - use more context without a proportional cost increase

Ephemeral caches have a TTL of 5 minutes.

## Documentation

For full prompt caching documentation, including all providers, pricing, and configuration details, see:

- **[Prompt Caching Guide](../../../../docs/prompt-caching.md)**

## Examples in This Directory

- `user-message-cache.ts` - Caches large context (e.g., an uploaded document) in a user message, with content-level `cache_control` on the context block (see the sketch after Quick Start)
- `multi-message-cache.ts` - Caches the system prompt across a multi-turn conversation; the cached system message persists through the conversation history
- `no-cache-control.ts` - Control scenario: the same structure with no `cache_control` markers, so no caching should occur (validates the methodology)

## Quick Start

```bash
# Run an example
bun run typescript/fetch/src/prompt-caching/user-message-cache.ts
```
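
For the user-message pattern that `user-message-cache.ts` demonstrates, the `cache_control` marker goes on the large context block inside the user message, while the actual question stays in a separate, uncached block. A minimal sketch of the message shape (the document text and question are placeholders):

```typescript
// Sketch: content-level cache_control inside a user message.
// The first block (the large document) is cached; the question block is not.
const messages = [
  {
    role: 'user',
    content: [
      {
        type: 'text',
        text: '<large context document, 2048+ tokens>',
        cache_control: { type: 'ephemeral' }, // cache only this block
      },
      {
        type: 'text',
        text: 'Summarize the key points of this document.',
      },
    ],
  },
];
```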

## How to Use Cache Control

```typescript
const requestBody = {
  model: 'anthropic/claude-3.5-sonnet',
  stream: true, // stream_options is only honored on streaming requests
  stream_options: {
    include_usage: true, // CRITICAL: required for cache metrics
  },
  messages: [
    {
      role: 'system',
      content: [
        {
          type: 'text',
          text: 'Your large system prompt here...',
          cache_control: { type: 'ephemeral' }, // Cache this block
        },
      ],
    },
    {
      role: 'user',
      content: 'Your question here',
    },
  ],
};
```
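
To send that body and read the cache metrics back, you can POST it to OpenRouter's chat completions endpoint and pull the usage payload out of the streamed response. A minimal sketch, assuming the standard `https://openrouter.ai/api/v1/chat/completions` endpoint and an `OPENROUTER_API_KEY` environment variable (both assumptions; this directory's example files may wire things up differently):

```typescript
// Hypothetical helper: sends one request and returns
// usage.prompt_tokens_details.cached_tokens (0 if absent).
async function getCachedTokens(requestBody: unknown): Promise<number> {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(requestBody),
  });

  // With stream: true the body is server-sent events; the usage payload
  // arrives in the final data chunk, just before "data: [DONE]".
  const text = await res.text();
  for (const line of text.split('\n')) {
    if (!line.startsWith('data: ') || line.includes('[DONE]')) continue;
    const chunk = JSON.parse(line.slice('data: '.length));
    if (chunk.usage) return chunk.usage.prompt_tokens_details?.cached_tokens ?? 0;
  }
  return 0;
}
```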
74 | | - |
75 | | -## Important Notes |
76 | | - |
77 | | -### OpenRouter Format Transformation |
78 | | -OpenRouter transforms Anthropic's native response format to OpenAI-compatible format: |
79 | | -- **Anthropic native**: `usage.cache_read_input_tokens`, `usage.cache_creation_input_tokens` |
80 | | -- **OpenRouter returns**: `usage.prompt_tokens_details.cached_tokens` (OpenAI-compatible) |
81 | | - |
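
As a reference, the two usage shapes look roughly like this (illustrative types built from the field names above, not official SDK definitions):

```typescript
// Anthropic native usage fields (what Anthropic itself reports).
interface AnthropicCacheUsage {
  cache_creation_input_tokens: number; // tokens written to the cache
  cache_read_input_tokens: number; // tokens served from the cache
}

// What OpenRouter returns (OpenAI-compatible).
interface OpenRouterUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: {
    cached_tokens: number; // > 0 on a cache hit
  };
}
```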

### Key Requirements (Anthropic)

1. **`stream_options.include_usage = true`** - CRITICAL; without it no usage details are returned
2. **Minimum 2048+ tokens** - smaller content may not be cached reliably
3. **`cache_control` on content blocks** - not at the message level
4. **Exact match** - the cache only hits on identical content

### Expected Behavior

- **First call**: `cached_tokens = 0` (cache miss, creates the cache)
- **Second call**: `cached_tokens > 0` (cache hit, reads from the cache)
- **Control**: `cached_tokens = 0` on both calls (no `cache_control`)

## Scientific Method

All examples follow scientific-method principles:

- **Hypothesis**: `cache_control` triggers Anthropic caching
- **Experiment**: make two identical calls
- **Evidence**: measure `usage.prompt_tokens_details.cached_tokens`
- **Analysis**: compare the first call (cache miss) against the second call (cache hit)
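
A minimal sketch of that experiment, reusing the hypothetical `getCachedTokens` helper from the sketch above:

```typescript
// Two identical calls: the first should miss (and create the cache),
// the second should hit. requestBody is the cached request shown earlier.
const first = await getCachedTokens(requestBody); // expected: 0 (miss)
const second = await getCachedTokens(requestBody); // expected: > 0 (hit)

console.log(`first call  cached_tokens: ${first}`);
console.log(`second call cached_tokens: ${second}`);
console.log(second > 0 ? 'Cache hit confirmed.' : 'No caching observed.');
```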