# Prompt Caching Examples (Effect AI)

Examples demonstrating prompt caching via OpenRouter with `@effect/ai` and `@effect/ai-openrouter`.

## Documentation

For full prompt caching documentation, including all providers, pricing, and configuration details, see:
- **[Prompt Caching Guide](../../../../docs/prompt-caching.md)**

## Examples in This Directory

- `user-message-cache.ts` - Cache a large context block in user messages
- `multi-message-cache.ts` - Cache a system prompt across multi-turn conversations
- `no-cache-control.ts` - Control scenario with no caching (validates the measurement methodology)

## Quick Start

```bash
# Run an example
bun run typescript/effect-ai/src/prompt-caching/user-message-cache.ts
```

## Effect AI Usage

```typescript
import * as OpenRouterLanguageModel from '@effect/ai-openrouter/OpenRouterLanguageModel';
import * as LanguageModel from '@effect/ai/LanguageModel';
import * as Prompt from '@effect/ai/Prompt';
import { Effect } from 'effect';

const OpenRouterModelLayer = OpenRouterLanguageModel.layer({
  model: 'anthropic/claude-3.5-sonnet',
  config: {
    stream_options: { include_usage: true }, // Required for cache metrics
  },
});

const program = Effect.gen(function* () {
  const response = yield* LanguageModel.generateText({
    prompt: Prompt.make([{
      role: 'user',
      content: [{
        type: 'text',
        text: 'Large context...',
        options: {
          openrouter: { cacheControl: { type: 'ephemeral' } }, // Cache this block
        },
      }],
    }]),
  });

  // Check cache metrics
  const cached = response.usage.cachedInputTokens ?? 0;
});
```

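The snippet above defines the model layer but does not show how to supply its dependencies or run the program. Below is a minimal sketch of the remaining wiring, assuming `OPENROUTER_API_KEY` is set in the environment; the `FetchHttpClient` import from `@effect/platform` is an assumption here, and the example files may wire the layers slightly differently.

```typescript
// In addition to the imports above
import * as OpenRouterClient from '@effect/ai-openrouter/OpenRouterClient';
import { FetchHttpClient } from '@effect/platform';
import { Layer, Redacted } from 'effect';

// OpenRouter client layer backed by the fetch-based HTTP client
const OpenRouterClientLayer = OpenRouterClient.layer({
  apiKey: Redacted.make(process.env.OPENROUTER_API_KEY!),
}).pipe(Layer.provide(FetchHttpClient.layer));

// Provide the model layer (wired to the client layer) and run the program
await program.pipe(
  Effect.provide(OpenRouterModelLayer.pipe(Layer.provide(OpenRouterClientLayer))),
  Effect.runPromise,
);
```

Providing the client layer at the call site keeps the model layer definition above unchanged.
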
## Effect-Specific Notes

- Use layer-based dependency injection for the client and model configuration
- `stream_options.include_usage` must be set in the model layer config; without it, `response.usage.cachedInputTokens` will be undefined
- Cache metrics appear in `response.usage.cachedInputTokens`: expect `0` on the first call (cache miss) and a value greater than `0` on an identical second call (cache hit)
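
To check that caching actually takes effect, make the same call twice and compare `cachedInputTokens`, as the example scripts do. A rough sketch of that check using the same API as above; the `callOnce` and `verifyCaching` names are illustrative, not part of the example files.

```typescript
// One generateText call whose large content block is marked for caching
const callOnce = Effect.gen(function* () {
  const response = yield* LanguageModel.generateText({
    prompt: Prompt.make([{
      role: 'user',
      content: [{
        type: 'text',
        text: 'Large context...', // must be identical across calls to hit the cache
        options: { openrouter: { cacheControl: { type: 'ephemeral' } } },
      }],
    }]),
  });
  return response.usage.cachedInputTokens ?? 0;
});

const verifyCaching = Effect.gen(function* () {
  const first = yield* callOnce;  // expected: 0 (cache miss, cache gets created)
  const second = yield* callOnce; // expected: > 0 (cache hit on the identical prefix)
  yield* Effect.log(`cached tokens: first=${first}, second=${second}`);
});
```

Keep in mind that the cached block must meet the provider's minimum cacheable size, and the first call always reports a miss while it creates the cache; see the Prompt Caching Guide for exact thresholds and TTLs.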