Save 50-90% on Claude API costs: real production data and proven techniques
| Discovery | Details |
|---|---|
| Bigger batches = Faster | 294 requests finished before 10 requests! |
| 22x efficiency gap | Large batch: 0.45 min/req vs Small: 9.84 min/req |
| Not FIFO | Anthropic prioritizes bigger batches |
| Image cache = only 14% | Not 90%! (Images can't be cached, only text prompts) |
Want to save money? → Batch 100+ requests together
Want to save time? → Also batch 100+ (they finish first!)
Working with images? → Batch API is enough (caching doesn't help much)
Working with text? → Use both Batch + Cache (up to 95% savings)
```bash
# Install the skill
cp claude-api-cost-optimization.skill.md ~/.claude/skills/

# Calculate your potential savings (the math is sketched below the table)
python scripts/calculate_savings.py --input 10000 --output 5000 --requests 100
```

| Technique | Savings | Best For | Docs |
|---|---|---|---|
| Batch API | 50% off | Non-urgent bulk tasks | Reference |
| Prompt Caching | 90% off | Repeated system prompts | Reference |
| Extended Thinking | ~80% off | Complex reasoning | Reference |
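If you just want the arithmetic, here is a minimal sketch of what such a calculator does, using the Sonnet 4.5 rates from the pricing tables at the bottom of this README. `estimate_cost` is illustrative, not the actual script, and it assumes the batch discount stacks on cache pricing (as in the case study below) while ignoring the one-time cache-write premium.

```python
# Minimal sketch of the savings arithmetic (Sonnet 4.5 rates, USD per MTok).
# Mirrors the idea behind scripts/calculate_savings.py, not its exact code.
INPUT_RATE, OUTPUT_RATE = 3.00, 15.00
CACHE_READ_RATE = 0.30   # cached input reads: 90% off
BATCH_DISCOUNT = 0.50    # Batch API: flat 50% off (assumed to stack on caching)

def estimate_cost(input_tok: int, output_tok: int, requests: int,
                  batch: bool = False, cached_frac: float = 0.0) -> float:
    """USD for `requests` calls; ignores the one-time cache-write premium."""
    in_rate = INPUT_RATE * (1 - cached_frac) + CACHE_READ_RATE * cached_frac
    total = (input_tok * in_rate + output_tok * OUTPUT_RATE) / 1e6 * requests
    return total * BATCH_DISCOUNT if batch else total

base = estimate_cost(10_000, 5_000, 100)
best = estimate_cost(10_000, 5_000, 100, batch=True, cached_frac=0.9)
print(f"${base:.2f} -> ${best:.2f} ({1 - best / base:.0%} saved)")
```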
From: Washin Village, an animal sanctuary on the Boso Peninsula, Japan
Mission: AI behavior tagging for 294 daily videos of 28 rescue cats & dogs
| Metric | Value |
|---|---|
| Batches | 2 × 283 requests = 100% success |
| Cost | $5.32 vs Standard API: $714.86 |
| Savings | 99.3% ($709.54 saved) |
| Time | Completed within 24 hours (fully automated) |
Key Strategy:
Batch API (50% discount)
+ Prompt Caching (90% discount)
+ Structured JSON output
= From $714.86 → $5.32

Technical Highlights:
- ✅ Unified 3,500-token system prompt (annotation guidelines)
- ✅ Near-100% cache hit rate (283× reuse)
- ✅ Split batching strategy (avoid a single large batch)
- ✅ JSON structured output (reduces output tokens)
Cost Breakdown:
- Input: $0.51 + Cache: $0.39 = $0.90 (16.9%)
- Output: $4.42 (83.1%)
Key Finding: even with extreme input optimization via caching, output tokens still dominate costs. The next optimization: shorten responses with structured JSON (sketched below).
Full case study: examples/jelly-294-gaia-tagging-batch.md
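Since output tokens dominate, the next lever is constraining the response itself. A hedged sketch of the idea, with a hypothetical tag schema (not the exact one from the case study):

```python
import anthropic

client = anthropic.Anthropic()

# Two knobs that shrink output spend: a strict "minified JSON only" format
# instruction, and a hard max_tokens ceiling. The schema is illustrative.
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=300,  # hard cap on output tokens
    system='Reply ONLY with minified JSON matching '
           '{"tags": [string], "confidence": float}. No prose, no markdown.',
    messages=[{"role": "user", "content": "Tag the behaviors in this video: ..."}],
)
print(response.content[0].text)
```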
| Item | Value |
|---|---|
| Files Processed | 294 |
| Total Tokens | 1,500,944 |
| Original Cost | $11.04 |
| Batch Cost | $5.52 |
| 💰 Savings | $5.52 (50%) |
| Per Request | $0.0188 |
| Token Type | Count | Cost |
|---|---|---|
| Input (no cache) | 365,624 | $0.55 |
| Cache write (1h) | 106,920 | $0.32 |
| Cache read | 416,988 | $0.06 |
| Output | 611,412 | $4.59 |
| Batch | Requests | Sent | Done | Per Request |
|---|---|---|---|---|
| Large | 294 | 10:22 | 12:35 | 0.45 min |
| Small | 10 | 11:50 | 13:28 | 9.84 min |
| Test | 3 | 01:20 | 02:23 | 20.77 min |
Key findings:
- ✅ The large batch finished 53 minutes before the small one, despite carrying 29× as many requests
- ✅ The large batch is 22x more efficient per request!
- ✅ Anthropic does NOT process batches strictly in order (FIFO): bigger batches appear to get priority
Think of the GPU like an oven:
🔥 Preheat = 15 min (fixed cost)
Large (294): Preheat → Bake all 294 → 0.45 min each ✅
Small (10): Preheat → Bake only 10 → 9.84 min each ❌
The more you bake, the cheaper per item!
Full case study: examples/batch-294-videos-case-study.md
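For reference, the wall-clock numbers above come from simply polling until the batch reports `ended`. A sketch of that loop, assuming the `client` and `batch` objects from the code examples further down:

```python
import time

started = time.time()
while True:
    status = client.messages.batches.retrieve(batch.id)
    if status.processing_status == "ended":
        break
    time.sleep(60)  # batches take minutes to hours; poll gently

elapsed_min = (time.time() - started) / 60
done = sum(1 for _ in client.messages.batches.results(batch.id))
print(f"{done} requests in {elapsed_min:.0f} min "
      f"({elapsed_min / max(done, 1):.2f} min/request)")
```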
| Usage Type | Input Tokens | Cache Read | Output | Savings Applied |
|---|---|---|---|---|
| Sonnet (standard) | 79,224 | 39,204 ✅ | 71,608 | Caching working! |
| Sonnet (batch) | 3,612 | 3,564 ✅ | 6,016 | Batch + Cache! |
Full analysis: examples/billing-data-analysis.md
| Optimization | Cost/Video | Total | Savings |
|---|---|---|---|
| None | $0.038 | $11.14 | – |
| + Caching | $0.033 | $9.62 | 14% |
| + Batch | $0.019 | $5.57 | 50% |
| + Both | $0.016 | $4.79 | 57% 🔥 |
Full report: examples/GAIA-savings-report.md
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,  # required parameter
    system=[{
        "type": "text",
        "text": "Your long system prompt (>1024 tokens)...",
        "cache_control": {"type": "ephemeral"}  # ← This saves 90%!
    }],
    messages=[{"role": "user", "content": "Hello"}]
)
```

```python
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate..."}]
            }
        }
        # Add up to 100,000 requests!
    ]
)
```

Full scripts: scripts/
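The two discounts stack: put the shared system prompt, with `cache_control`, inside every batch request's `params`. A minimal sketch of how the 294-video case study combined them (the `custom_id` naming is hypothetical); cache hits inside a batch are best-effort since requests can run in parallel, but at 283× reuse we still saw a near-100% hit rate.

```python
# One cached system prompt shared across all batch requests.
shared_system = [{
    "type": "text",
    "text": "Your 3,500-token annotation guidelines...",
    "cache_control": {"type": "ephemeral"},  # written once, then cheap reads
}]

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"video-{i:03d}",  # hypothetical naming scheme
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "system": shared_system,
                "messages": [{"role": "user",
                              "content": f"Tag the behaviors in video {i}."}],
            },
        }
        for i in range(294)
    ]
)
```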
Why only 14% caching savings instead of 90%?
In image tasks, images account for ~85% of input tokens; only the system prompt (~15%) is cacheable.
```
Input Composition:
├── System Prompt: ~15% → ✅ Cacheable (90% off)
└── Image Data:    ~85% → ❌ Cannot cache

Actual Savings: 15% × 90% = ~14%
```
This is NOT in the official docs; we learned it the hard way!
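The general rule, as a one-liner: effective savings = cacheable fraction × cache discount.

```python
def effective_cache_savings(cacheable_frac: float, discount: float = 0.90) -> float:
    """Overall input savings when only part of the prompt is cacheable."""
    return cacheable_frac * discount

print(effective_cache_savings(0.15))  # 0.135 -> ~14% for image-heavy requests
print(effective_cache_savings(0.95))  # 0.855 -> text-heavy prompts cache well
```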
```
├── claude-api-cost-optimization.skill.md   # ← Install this!
│
├── examples/                        # Real evidence
│   ├── billing-data-analysis.md     # Anthropic Console CSV
│   ├── real-batch-results.md        # Actual API response
│   └── GAIA-savings-report.md       # 294-video case study
│
├── scripts/                         # Ready-to-run code
│   ├── batch_example.py
│   ├── cache_example.py
│   └── calculate_savings.py
│
└── references/                      # Quick cheatsheets
    ├── batch-api.md
    ├── prompt-caching.md
    └── extended-thinking.md
```
| Model | Input | Output | Batch Input | Batch Output |
|---|---|---|---|---|
| Opus 4.5 | $5/MTok | $25/MTok | $2.50/MTok | $12.50/MTok |
| Sonnet 4.5 | $3/MTok | $15/MTok | $1.50/MTok | $7.50/MTok |
| Haiku 4.5 | $1/MTok | $5/MTok | $0.50/MTok | $2.50/MTok |
| Cache Type (Sonnet 4.5) | Price | vs Normal Input |
|---|---|---|
| Cache write | $3.75/MTok | +25% (first time only) |
| Cache read | $0.30/MTok | -90% ✅ |
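At these rates a cache write costs 25% extra, so caching pays for itself from the first reuse inside the TTL. A quick check of the relative input cost:

```python
# Relative input cost at Sonnet 4.5 rates: the first use writes at 1.25x,
# every reuse inside the TTL reads at 0.10x.
def input_savings(reuses: int) -> float:
    plain = 1.0 + reuses             # every request pays full price
    cached = 1.25 + 0.10 * reuses    # one write, then cheap reads
    return 1 - cached / plain

for n in (0, 1, 10, 283):
    print(f"{n:>3} reuses: {input_savings(n):+.0%}")
# 0 reuses: -25% (caching alone costs you); 283 reuses: +90% (the case study)
```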
This skill was born at Washin Village, home of 28 cats & dogs in Japan. While building our AI pet-recognition system, the API bills added up quickly, so we researched every cost-saving technique and compiled them here.
Full story: STORY.md
Made with 💰 by Washin Village. Save money, make more content!