Skip to content

wjessup/context-compactor

Repository files navigation

context-compactor

CI License: MIT

Transparent proxy that sits between Claude CLI and Anthropic's API. It captures every request, runs a compaction pipeline to shrink the conversation context, and forwards the smaller payload upstream — saving tokens and money.

How it works

Claude CLI ──► Proxy (:4000) ──► Anthropic API
                 │                     │
                 ├─ parse body            │
                 ├─ extract session ID    │
                 ├─ save raw-request.json │
                 ├─ compact messages      │
                 ├─ save artifacts        │
                 ├─ forward upstream ─────┘
                 │
                 ◄─ stream response back
                 ├─ inject stats line into first text delta (SSE)
                 └─ add X-Compacted-* headers

Benchmarking

image

Request flow (step by step)

  1. Intercept — the proxy catches every HTTP request. Only POST /v1/messages with >6 messages gets the full treatment; everything else passes through untouched.

  2. Skip check — requests are skipped (no compaction, no capture) when:

    • The model is haiku (cheap model, not worth compacting)
    • The last user message contains [SUGGESTION MODE (autocomplete, not a real turn)
  3. Session extraction — the session UUID is pulled from metadata.user_id (format: session_<uuid>). This groups all requests from the same Claude CLI session into one directory.

  4. Capture — the full raw request (URL, method, headers, body) is saved to data/sessions/<id>/raw-request.json. This is the unmodified payload exactly as Claude CLI sent it.

  5. Compaction — when COMPACT=true (default) and the conversation has >6 messages, the messages array is run through the transform pipeline. Messages are split into three zones:

┌─────────────────────────────────────────────────┐
│  FROZEN (first 2 turns)  │  MIDDLE  │  HOT (last 4 turns)  │
│  never modified          │  compacted│  never modified       │
│  = cache-hit stable      │          │  = model needs these  │
└─────────────────────────────────────────────────┘

Transforms run sequentially on the MIDDLE zone (some run globally):

# Transform What it does Zone
1 stripThinking Removes thinking blocks from all but the last assistant message Global
2 compactToolResults Replaces large tool results with [Compacted: N lines] + first line Middle
3 truncateToolInputs Strips execution metadata from commands; truncates large inputs using the description field when available Middle
4 truncateLogDumps Detects repetitive log output (>30% shared prefixes) and keeps head + tail + error lines Global
5 compactContinuations Truncates "This session is being continued..." resumption blobs to first paragraph Middle
6 stripNarration Removes short filler text ("Let me check that", "Sure") from assistant messages Middle
7 collapseToolRuns Collapses all tool_use blocks into grouped summaries like [3 tool calls: exec_command x2, Read x1] Global
  1. Save artifacts — alongside raw-request.json, the proxy writes:

    • compacted-request.json — the payload actually sent to Anthropic
    • stats.json — byte counts, estimated tokens, reduction percentages
    • diff.json — only the changed parts, showing original vs compacted with which rule fired
  2. Forward — the compacted body (or original if compaction was skipped) is sent to Anthropic.

  3. SSE injection — when NOTIFY=true (default), the proxy intercepts the streaming response and injects a stats line into the first text_delta SSE event. This is deterministic — no prompt injection, no relying on Claude to follow instructions. The user sees the line at the very top of Claude's response:

    🗜️ 12% compressed · ~3,240 tokens saved · ~48,000 tokens in session

    Injection only fires on human-typed turns (not automatic tool-result continuations), so you get exactly one stats line per message you send. Set NOTIFY=false to disable.

Session directory structure

data/sessions/<session-uuid>/
├── raw-request.json          # exact payload from Claude CLI
├── compacted-request.json    # what was sent to Anthropic
├── stats.json                # byte savings + token estimates
└── diff.json                 # per-block changelog with rule attribution

Real-world example

A single assistant message from a real Codex session — raw vs compacted:

Raw (35 blocks: 7 thinking, 7 text, 21 tool_use):

{
  "type": "text",
  "text": "I stopped the stuck parent process. Next I'm running only batch `2` ..."
},
{
  "type": "thinking",
  "thinking": "**Setting 20-second interval**\n\n**Setting 20-second interval**"
},
{
  "type": "tool_use",
  "name": "exec_command",
  "input": { "command": "{'cmd': 'desloppify review --run-batches ... --only-batches 2', ...}" }
}

Compacted (text + summaries):

{
  "type": "text",
  "text": "I stopped the stuck parent process. Next I'm running only batch `2` ..."
},
{
  "type": "text",
  "text": "[3 tool calls: exec_command x2, write_stdin x1]"
}

Thinking gone. Tool chatter collapsed. Intent preserved. This session went from 852k tokens to 116k (83.5% reduction).

More examples and stats in COMPACTION_SHOWCASE.md.

Packages

Package Description
@context-compactor/core Transform pipeline — 7 composable transforms, zone splitting, diff engine
@context-compactor/proxy Hono/Bun HTTP proxy — capture, compact, forward, save artifacts

Quick start

Requires Bun (v1.0+) — used as both the runtime and package manager. The proxy uses Bun's native HTTP server, so Node won't work here.

curl -fsSL https://bun.sh/install | bash

git clone https://github.com/wjessup/context-compactor.git
cd context-compactor
bun install
bun run start

Then point Claude CLI at the proxy:

ANTHROPIC_BASE_URL=http://localhost:4000 claude

Or in YOLO mode:

ANTHROPIC_BASE_URL=http://localhost:4000 claude --dangerously-skip-permissions

That's it. The proxy forwards your API key from Claude CLI's headers — no additional configuration needed.

Environment variables

Variable Default Description
UPSTREAM_URL https://api.anthropic.com Upstream API base URL
PORT 4000 Proxy listen port
COMPACT true Set false to disable compaction (passthrough only)
NOTIFY true Set false to disable the in-response compression summary

Live stats endpoint

curl http://localhost:4000/stats

Returns cumulative compression stats per active session (bytes saved, token estimates, rules fired).

Response headers

When compaction fires, the proxy adds these headers to the response:

Header Example Description
x-compacted-bytes-saved 14208 Bytes saved this turn
x-compacted-tokens-saved 3552 Estimated tokens saved this turn
x-compacted-reduction-pct 12.3 Reduction percentage
x-compacted-rules stripThinking,compactToolResults Transforms that fired
x-compacted-session-total-saved 48000 Cumulative bytes saved this session
x-compacted-session-requests 5 Compacted requests so far

License

MIT

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors