Transparent proxy that sits between Claude CLI and Anthropic's API. It captures every request, runs a compaction pipeline to shrink the conversation context, and forwards the smaller payload upstream — saving tokens and money.
```
Claude CLI ──► Proxy (:4000) ──► Anthropic API
               │                           │
               ├─ parse body               │
               ├─ extract session ID       │
               ├─ save raw-request.json    │
               ├─ compact messages         │
               ├─ save artifacts           │
               ├─ forward upstream ────────┘
               │
               ◄─ stream response back
               ├─ inject stats line into first text delta (SSE)
               └─ add X-Compacted-* headers
```
**1. Intercept.** The proxy catches every HTTP request. Only `POST /v1/messages` with more than 6 messages gets the full treatment; everything else passes through untouched.
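The routing rule fits in a small predicate. This is a minimal sketch that assumes the body has already been parsed as JSON and the query string has been stripped from the path. `wantsFullTreatment` and `MESSAGE_THRESHOLD` are hypothetical names, not exports of `@context-compactor/proxy`.

```typescript
// Hypothetical helper mirroring the routing rule: only POST /v1/messages with
// more than 6 messages is captured and compacted; everything else is proxied as-is.
const MESSAGE_THRESHOLD = 6; // compaction needs strictly more than this many messages

function wantsFullTreatment(
  method: string,
  pathname: string,
  body: { messages?: unknown } | null,
): boolean {
  if (method.toUpperCase() !== "POST") return false;
  // Exact match, so /v1/messages/count_tokens and other endpoints pass through.
  if (pathname !== "/v1/messages") return false;
  const messages = body?.messages;
  return Array.isArray(messages) && messages.length > MESSAGE_THRESHOLD;
}
```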
**2. Skip check.** Requests are skipped (no compaction, no capture) when:

- The model is `haiku` (a cheap model, not worth compacting).
- The last user message contains `[SUGGESTION MODE` (autocomplete, not a real turn).
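Both skip conditions can be checked with a short function. This is a sketch under two assumptions: any model id containing `haiku` counts as the cheap model, and message `content` may be either a string or an array of blocks, as in the Messages API. The types and the `skipReason` name are hypothetical.

```typescript
// Minimal, hypothetical types for the parts of a Messages API body this check reads.
type ContentBlock = { type: string; text?: string };
type ChatMessage = { role: "user" | "assistant"; content: string | ContentBlock[] };
type MessagesRequest = { model: string; messages: ChatMessage[] };

// Joins the text of a message whether `content` is a plain string or an array of blocks.
function messageText(msg: ChatMessage): string {
  if (typeof msg.content === "string") return msg.content;
  return msg.content
    .filter((b) => b.type === "text" && typeof b.text === "string")
    .map((b) => b.text ?? "")
    .join("\n");
}

// Returns why a request should bypass capture and compaction, or null to process it.
function skipReason(body: MessagesRequest): string | null {
  // Assumption: any model id containing "haiku" counts as the cheap model.
  if (body.model.toLowerCase().includes("haiku")) return "haiku model";
  // Find the most recent user message without mutating the original array.
  const lastUser = [...body.messages].reverse().find((m) => m.role === "user");
  if (lastUser && messageText(lastUser).includes("[SUGGESTION MODE")) return "suggestion mode";
  return null;
}
```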
**3. Session extraction.** The session UUID is pulled from `metadata.user_id` (format: `session_<uuid>`). This groups all requests from the same Claude CLI session into one directory.
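Pulling the UUID out of `metadata.user_id` is a single regular expression. This sketch assumes the UUID is in the canonical 8-4-4-4-12 hex form and searches anywhere in the string, in case the field carries other segments besides `session_<uuid>`. `extractSessionId` is a hypothetical name.

```typescript
// Matches "session_" followed by a canonical 8-4-4-4-12 hex UUID anywhere in the string.
const SESSION_RE = /session_([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i;

interface RequestMetadata {
  user_id?: string;
}

// Returns the session UUID (lowercased), or null if metadata.user_id is absent or malformed.
function extractSessionId(metadata: RequestMetadata | undefined): string | null {
  const userId = metadata?.user_id;
  if (typeof userId !== "string") return null;
  const match = SESSION_RE.exec(userId);
  return match ? match[1].toLowerCase() : null;
}
```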
**4. Capture.** The full raw request (URL, method, headers, body) is saved to `data/sessions/<id>/raw-request.json`. This is the unmodified payload, exactly as Claude CLI sent it.
**5. Compaction.** When `COMPACT=true` (the default) and the conversation has more than 6 messages, the messages array is run through the transform pipeline. Messages are split into three zones:
```
┌────────────────────────┬───────────┬─────────────────────┐
│ FROZEN (first 2 turns) │ MIDDLE    │ HOT (last 4 turns)  │
│ never modified         │ compacted │ never modified      │
│ = cache-hit stable     │           │ = model needs these │
└────────────────────────┴───────────┴─────────────────────┘
```
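The zone split itself is a pair of slices. This sketch assumes a "turn" here means one message; that reading lines up with the threshold, since 2 frozen plus 4 hot is 6, so the middle zone is only non-empty once there are more than 6 messages. `splitZones` and the constants are hypothetical names.

```typescript
// Hypothetical zone sizes from the diagram; a "turn" is assumed to be one message.
const FROZEN_COUNT = 2;
const HOT_COUNT = 4;

interface Zones<T> {
  frozen: T[]; // never modified, keeps the prompt-cache prefix stable
  middle: T[]; // the only zone the middle-zone transforms touch
  hot: T[]; // most recent context, never modified
}

function splitZones<T>(messages: T[], frozenCount = FROZEN_COUNT, hotCount = HOT_COUNT): Zones<T> {
  // With frozenCount + hotCount messages or fewer there is nothing in the middle to compact.
  if (messages.length <= frozenCount + hotCount) {
    return { frozen: messages.slice(0, frozenCount), middle: [], hot: messages.slice(frozenCount) };
  }
  return {
    frozen: messages.slice(0, frozenCount),
    middle: messages.slice(frozenCount, messages.length - hotCount),
    hot: messages.slice(messages.length - hotCount),
  };
}
```

Concatenating `frozen`, the transformed `middle`, and `hot` rebuilds the messages array in its original order.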
Transforms run sequentially on the MIDDLE zone, though some run globally (see the Zone column):

| # | Transform | What it does | Zone |
|---|---|---|---|
| 1 | `stripThinking` | Removes thinking blocks from all but the last assistant message | Global |
| 2 | `compactToolResults` | Replaces large tool results with `[Compacted: N lines]` plus the first line | Middle |
| 3 | `truncateToolInputs` | Strips execution metadata from commands; truncates large inputs, using the `description` field when available | Middle |
| 4 | `truncateLogDumps` | Detects repetitive log output (>30% shared prefixes) and keeps head, tail, and error lines | Global |
| 5 | `compactContinuations` | Truncates "This session is being continued..." resumption blobs to their first paragraph | Middle |
| 6 | `stripNarration` | Removes short filler text ("Let me check that", "Sure") from assistant messages | Middle |
| 7 | `collapseToolRuns` | Collapses all `tool_use` blocks into grouped summaries like `[3 tool calls: exec_command x2, Read x1]` | Global |
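A transform can be modeled as a function from messages to messages, with the pipeline folding over a list of them. The sketch below is a `compactToolResults`-style transform, not the package's implementation: the 20-line threshold, the space between the stub and the first line, and the handling of string-only tool results are all assumptions. `Block`, `Msg`, `Transform`, and `runPipeline` are hypothetical names; the caller is expected to pass in only the middle zone for middle-zone transforms.

```typescript
// Hypothetical minimal types; real Messages API blocks carry more fields.
interface Block {
  type: string;
  text?: string;
  content?: unknown;
  [key: string]: unknown;
}
interface Msg {
  role: "user" | "assistant";
  content: string | Block[];
}
type Transform = (messages: Msg[]) => Msg[];

const MAX_LINES = 20; // assumed threshold for a "large" tool result

// Replaces long string tool results with "[Compacted: N lines] <first line>".
// Array-valued tool_result content is left alone in this sketch.
const compactToolResults: Transform = (messages) =>
  messages.map((msg): Msg => {
    if (typeof msg.content === "string") return msg;
    const content = msg.content.map((block): Block => {
      if (block.type !== "tool_result" || typeof block.content !== "string") return block;
      const lines = block.content.split("\n");
      if (lines.length <= MAX_LINES) return block;
      return { ...block, content: `[Compacted: ${lines.length} lines] ${lines[0]}` };
    });
    return { ...msg, content };
  });

// Runs transforms left to right, each receiving the previous transform's output.
function runPipeline(messages: Msg[], transforms: Transform[]): Msg[] {
  return transforms.reduce((acc, transform) => transform(acc), messages);
}
```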
**6. Save artifacts.** Alongside `raw-request.json`, the proxy writes:

- `compacted-request.json`: the payload actually sent to Anthropic
- `stats.json`: byte counts, estimated tokens, reduction percentages
- `diff.json`: only the changed parts, showing original vs. compacted along with which rule fired
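The numbers in `stats.json` can be derived from the two payloads. This sketch assumes bytes are measured on the JSON-serialized payload and that tokens are estimated at roughly 4 bytes per token, which is consistent with the response-header example (14208 bytes saved → 3552 tokens); the field names and `computeStats` are hypothetical.

```typescript
interface CompactionStats {
  originalBytes: number;
  compactedBytes: number;
  bytesSaved: number;
  estTokensSaved: number;
  reductionPct: number;
}

// Rough heuristic, not a tokenizer: about 4 bytes of JSON per token.
const BYTES_PER_TOKEN = 4;

function computeStats(original: unknown, compacted: unknown): CompactionStats {
  const enc = new TextEncoder(); // UTF-8 byte length, not JS string length
  const originalBytes = enc.encode(JSON.stringify(original)).length;
  const compactedBytes = enc.encode(JSON.stringify(compacted)).length;
  const bytesSaved = originalBytes - compactedBytes;
  return {
    originalBytes,
    compactedBytes,
    bytesSaved,
    estTokensSaved: Math.round(bytesSaved / BYTES_PER_TOKEN),
    // One decimal place, like the x-compacted-reduction-pct example (12.3).
    reductionPct: originalBytes === 0 ? 0 : Math.round((bytesSaved / originalBytes) * 1000) / 10,
  };
}
```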
**7. Forward.** The compacted body (or the original, if compaction was skipped) is sent to Anthropic.
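Forwarding amounts to rebuilding the request against the upstream origin. This sketch assumes the incoming request is a standard Fetch API `Request` (which is what Bun's server and Hono hand to handlers) and that `upstreamBase` comes from `UPSTREAM_URL`. Which headers to drop is an assumption, and `buildUpstreamRequest` is a hypothetical name. Note that an absolute path replaces any path prefix on `upstreamBase`.

```typescript
// Connection-specific headers that should not be copied upstream;
// fetch recomputes Host and Content-Length for the new request.
const DROP_HEADERS = ["host", "content-length", "connection"];

function buildUpstreamRequest(incoming: Request, upstreamBase: string, body: string | null): Request {
  const src = new URL(incoming.url);
  // Keep the path and any query string, but swap the origin for the upstream one.
  const target = new URL(src.pathname + src.search, upstreamBase);
  // Copy everything else, including the x-api-key or authorization header from Claude CLI.
  const headers = new Headers(incoming.headers);
  for (const name of DROP_HEADERS) headers.delete(name);
  const method = incoming.method.toUpperCase();
  // GET and HEAD requests cannot carry a body in the Fetch API.
  const hasBody = method !== "GET" && method !== "HEAD";
  return new Request(target, { method, headers, body: hasBody ? body : null });
}
```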
**8. SSE injection.** When `NOTIFY=true` (the default), the proxy intercepts the streaming response and injects a stats line into the first `text_delta` SSE event. This is deterministic: no prompt injection, no relying on Claude to follow instructions. The user sees the line at the very top of Claude's response:

> 🗜️ 12% compressed · ~3,240 tokens saved · ~48,000 tokens in session

Injection only fires on human-typed turns (not automatic tool-result continuations), so you get exactly one stats line per message you send. Set `NOTIFY=false` to disable.
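Injecting into the first `text_delta` requires buffering the stream into whole SSE events, since a network chunk can end mid-event. This is a sketch, assuming chunks have already been decoded to text (for example with `TextDecoder`), events use `\n` line endings, and the caller only creates an injector when `NOTIFY` is on and the turn is human-typed. `StatsInjector` and `injectIntoEvent` are hypothetical names; in a real proxy, `push` and `flush` would sit inside a `TransformStream`.

```typescript
// Minimal view of an Anthropic streaming event payload; only the fields we touch.
type SseData = { type?: string; delta?: { type?: string; text?: string } } | null;

// Rewrites one SSE event block ("event: ...\ndata: {...}") if it is a text_delta.
function injectIntoEvent(block: string, statsLine: string): { block: string; injected: boolean } {
  const lines = block.split("\n");
  const i = lines.findIndex((l) => l.startsWith("data:"));
  if (i === -1) return { block, injected: false };
  let payload: SseData;
  try {
    payload = JSON.parse(lines[i].slice("data:".length).trim());
  } catch {
    return { block, injected: false }; // not JSON: leave the event untouched
  }
  const delta = payload?.delta;
  if (payload?.type !== "content_block_delta" || !delta || delta.type !== "text_delta") {
    return { block, injected: false };
  }
  delta.text = `${statsLine}\n\n${delta.text ?? ""}`; // mutates payload.delta in place
  lines[i] = `data: ${JSON.stringify(payload)}`;
  return { block: lines.join("\n"), injected: true };
}

// Buffers streamed text into complete events (separated by a blank line)
// and injects the stats line into the first text_delta only.
class StatsInjector {
  private buffer = "";
  private done = false;
  private readonly statsLine: string;

  constructor(statsLine: string) {
    this.statsLine = statsLine;
  }

  // Feed a decoded chunk; returns the complete events that are ready to send on.
  push(chunk: string): string {
    this.buffer += chunk;
    const parts = this.buffer.split("\n\n");
    this.buffer = parts.pop() ?? ""; // the tail may be an incomplete event
    return parts.map((p) => this.handle(p) + "\n\n").join("");
  }

  // Call once upstream ends to emit any trailing partial event.
  flush(): string {
    const rest = this.buffer;
    this.buffer = "";
    return rest === "" ? "" : this.handle(rest);
  }

  private handle(block: string): string {
    if (this.done) return block;
    const result = injectIntoEvent(block, this.statsLine);
    if (result.injected) this.done = true;
    return result.block;
  }
}
```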
Each session gets its own artifact directory:

```
data/sessions/<session-uuid>/
├── raw-request.json        # exact payload from Claude CLI
├── compacted-request.json  # what was sent to Anthropic
├── stats.json              # byte savings + token estimates
└── diff.json               # per-block changelog with rule attribution
```
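Writing that layout takes a few filesystem calls. This is a sketch using synchronous `node:fs` calls for brevity (Bun supports them; a server would more likely use async writes). It assumes each request overwrites the previous files, matching the one-file-per-kind layout, and rejects anything that is not a UUID so a crafted `user_id` cannot escape the data directory. `saveArtifacts` and `Artifacts` are hypothetical names.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shape of what gets persisted for one request.
interface Artifacts {
  rawRequest: unknown; // -> raw-request.json
  compactedRequest?: unknown; // -> compacted-request.json
  stats?: unknown; // -> stats.json
  diff?: unknown; // -> diff.json
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Writes whichever artifacts are present into <dataDir>/sessions/<sessionId>/ and returns that path.
function saveArtifacts(dataDir: string, sessionId: string, artifacts: Artifacts): string {
  if (!UUID_RE.test(sessionId)) throw new Error(`invalid session id: ${sessionId}`);
  const dir = join(dataDir, "sessions", sessionId);
  mkdirSync(dir, { recursive: true });
  const files: Array<[string, unknown]> = [
    ["raw-request.json", artifacts.rawRequest],
    ["compacted-request.json", artifacts.compactedRequest],
    ["stats.json", artifacts.stats],
    ["diff.json", artifacts.diff],
  ];
  for (const [name, value] of files) {
    // Skipped artifacts (undefined) are simply not written.
    if (value !== undefined) writeFileSync(join(dir, name), JSON.stringify(value, null, 2));
  }
  return dir;
}
```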
A single assistant message from a real Codex session, raw vs. compacted:

**Raw** (35 blocks: 7 thinking, 7 text, 21 `tool_use`):

```json
{
  "type": "text",
  "text": "I stopped the stuck parent process. Next I'm running only batch `2` ..."
},
{
  "type": "thinking",
  "thinking": "**Setting 20-second interval**\n\n**Setting 20-second interval**"
},
{
  "type": "tool_use",
  "name": "exec_command",
  "input": { "command": "{'cmd': 'desloppify review --run-batches ... --only-batches 2', ...}" }
}
```

**Compacted** (text + summaries):

```json
{
  "type": "text",
  "text": "I stopped the stuck parent process. Next I'm running only batch `2` ..."
},
{
  "type": "text",
  "text": "[3 tool calls: exec_command x2, write_stdin x1]"
}
```

Thinking gone. Tool chatter collapsed. Intent preserved. This session went from 852k tokens to 116k (83.5% reduction).
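The raw-to-compacted change above can be approximated for a single (non-final) assistant message: drop thinking blocks, keep text, and replace each run of `tool_use` blocks with one summary. This is a sketch, not the `collapseToolRuns` source. Grouping by consecutive run, listing names in order of first appearance, letting thinking blocks not interrupt a run, and the singular "tool call" wording are all assumptions. The Messages API rejects a `tool_result` whose `tool_use_id` has no matching `tool_use`, so the corresponding results in the following user message also need handling; this sketch does not do that. `collapseMessage` and `summarize` are hypothetical names.

```typescript
type Block = { type: string; name?: string; text?: string; [key: string]: unknown };

// Builds "[N tool calls: a xK, b xM]" with names in order of first appearance.
function summarize(run: Block[]): Block {
  const counts = new Map<string, number>();
  for (const b of run) {
    const name = b.name ?? "unknown";
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  const parts = Array.from(counts, ([name, n]) => `${name} x${n}`);
  const noun = run.length === 1 ? "tool call" : "tool calls";
  return { type: "text", text: `[${run.length} ${noun}: ${parts.join(", ")}]` };
}

// Drops thinking blocks (they do not break a run) and replaces each run of
// tool_use blocks with one summary; text blocks are kept so intent survives.
function collapseMessage(content: Block[]): Block[] {
  const out: Block[] = [];
  let run: Block[] = [];
  const flush = () => {
    if (run.length > 0) out.push(summarize(run));
    run = [];
  };
  for (const block of content) {
    if (block.type === "thinking") continue;
    if (block.type === "tool_use") {
      run.push(block);
    } else {
      flush();
      out.push(block);
    }
  }
  flush();
  return out;
}
```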
More examples and stats in COMPACTION_SHOWCASE.md.
| Package | Description |
|---|---|
| `@context-compactor/core` | Transform pipeline: 7 composable transforms, zone splitting, diff engine |
| `@context-compactor/proxy` | Hono/Bun HTTP proxy: capture, compact, forward, save artifacts |
Requires Bun (v1.0+) — used as both the runtime and package manager. The proxy uses Bun's native HTTP server, so Node won't work here.
```bash
curl -fsSL https://bun.sh/install | bash
git clone https://github.com/wjessup/context-compactor.git
cd context-compactor
bun install
bun run start
```

Then point Claude CLI at the proxy:

```bash
ANTHROPIC_BASE_URL=http://localhost:4000 claude
```

Or in YOLO mode:

```bash
ANTHROPIC_BASE_URL=http://localhost:4000 claude --dangerously-skip-permissions
```

That's it. The proxy forwards your API key from Claude CLI's headers, so no additional configuration is needed.
| Variable | Default | Description |
|---|---|---|
| `UPSTREAM_URL` | `https://api.anthropic.com` | Upstream API base URL |
| `PORT` | `4000` | Proxy listen port |
| `COMPACT` | `true` | Set `false` to disable compaction (passthrough only) |
| `NOTIFY` | `true` | Set `false` to disable the in-response compression summary |
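Reading these variables with their defaults might look like the following. This sketch takes the environment as a plain record (under Bun you would pass `process.env` or `Bun.env`). Treating only the string `false` (case-insensitive) as "off" and validating the port range are assumptions; `readConfig` and `ProxyConfig` are hypothetical names.

```typescript
interface ProxyConfig {
  upstreamUrl: string;
  port: number;
  compact: boolean;
  notify: boolean;
}

type Env = Record<string, string | undefined>;

// Only "false" (any case) disables a flag; unset or any other value keeps the default of true.
const flag = (value: string | undefined): boolean => value?.trim().toLowerCase() !== "false";

function readConfig(env: Env): ProxyConfig {
  const port = Number(env["PORT"] ?? "4000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer in 1..65535, got ${env["PORT"]}`);
  }
  return {
    upstreamUrl: env["UPSTREAM_URL"] ?? "https://api.anthropic.com",
    port,
    compact: flag(env["COMPACT"]),
    notify: flag(env["NOTIFY"]),
  };
}
```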
```bash
curl http://localhost:4000/stats
```

Returns cumulative compression stats per active session (bytes saved, token estimates, rules fired).
When compaction fires, the proxy adds these headers to the response:

| Header | Example | Description |
|---|---|---|
| `x-compacted-bytes-saved` | `14208` | Bytes saved this turn |
| `x-compacted-tokens-saved` | `3552` | Estimated tokens saved this turn |
| `x-compacted-reduction-pct` | `12.3` | Reduction percentage |
| `x-compacted-rules` | `stripThinking,compactToolResults` | Transforms that fired |
| `x-compacted-session-total-saved` | `48000` | Cumulative bytes saved this session |
| `x-compacted-session-requests` | `5` | Compacted requests so far |
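A client that talks to the proxy directly can read these headers off any Fetch API `Response` (for example `res.headers` after `fetch`). This is a minimal sketch; `readCompactionHeaders` and the field names are hypothetical, and a missing `x-compacted-bytes-saved` header is taken to mean compaction did not fire.

```typescript
interface CompactionHeaders {
  bytesSaved: number;
  tokensSaved: number;
  reductionPct: number;
  rules: string[];
  sessionTotalSaved: number;
  sessionRequests: number;
}

// Returns null when the response carries no x-compacted-* headers.
function readCompactionHeaders(headers: Headers): CompactionHeaders | null {
  const bytes = headers.get("x-compacted-bytes-saved");
  if (bytes === null) return null;
  const num = (name: string): number => Number(headers.get(name) ?? "0");
  return {
    bytesSaved: Number(bytes),
    tokensSaved: num("x-compacted-tokens-saved"),
    reductionPct: num("x-compacted-reduction-pct"),
    // Comma-separated list of transform names, e.g. "stripThinking,compactToolResults".
    rules: (headers.get("x-compacted-rules") ?? "")
      .split(",")
      .map((r) => r.trim())
      .filter((r) => r.length > 0),
    sessionTotalSaved: num("x-compacted-session-total-saved"),
    sessionRequests: num("x-compacted-session-requests"),
  };
}
```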
License: MIT