Skip to content

feat(proxy): log compressed messages alongside original request#261

Open
gglucass wants to merge 3 commits intochopratejas:mainfrom
gglucass:feat/compressed-messages-log
Open

feat(proxy): log compressed messages alongside original request#261
gglucass wants to merge 3 commits intochopratejas:mainfrom
gglucass:feat/compressed-messages-log

Conversation

@gglucass
Copy link
Copy Markdown
Contributor

@gglucass gglucass commented Apr 24, 2026

Description

Expose the post-compression message list that was actually sent upstream as a new compressed_messages field on RequestLog, paired with the existing (now consistently pre-compression) request_messages. Consumers of /transformations/feed — dashboards and any downstream observability — can now diff the two sides of a compression to see exactly what the pipeline stripped, replaced, or kept. Turns an abstract "saved N tokens" into a legible before/after.

Gated by the same log_full_messages flag as request_messages so the two sides stay in sync; it's pointless to store one without the other.

Also fixes a latent correctness bug: today's request_messages field is inconsistent across the four RequestLog construction sites — sometimes it's the pre-compression snapshot, sometimes it's the mutated body["messages"] (which is the compressed list, because the proxy mutates body in place before the log call). After this change, request_messages always means pre-compression and compressed_messages always means what went upstream.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Note on "breaking": strictly speaking this is a semantic correction of an inconsistently-populated field, not a schema break. The field name request_messages is unchanged and the JSON shape is unchanged; what changes is that the field now consistently holds the pre-compression list. Consumers that treated it as "whatever messages we have" continue to work. Consumers that depended on the accidental post-compression value (if any existed) would shift to compressed_messages.

Changes Made

  • headroom/proxy/models.py: RequestLog gains compressed_messages: list[dict] | None = None. Doc comment explains it's paired with request_messages and gated by the same log_full_messages flag.
  • headroom/proxy/handlers/anthropic.py (2 sites — Bedrock non-streaming and main non-streaming): request_messages now consistently sources from original_messages (the pre-compression snapshot at line 724), compressed_messages sources from body["messages"] (the compressed list after in-place mutation at line 1189). Both gated symmetrically.
  • headroom/proxy/handlers/streaming.py (2 sites — main streaming in _finalize_stream_response, Bedrock streaming in _stream_response_bedrock): same treatment. _stream_response_bedrock gains a new original_messages: list[dict] | None = None parameter so it has access to the pre-compression snapshot; the sole caller in anthropic.py now threads it through.
  • headroom/proxy/server.py: /transformations/feed adds compressed_messages to the JSON payload alongside the existing request_messages / response_content. Split into a separate preceding commit is a one-time EOL normalization to LF — the file blob in history carries CRLF but .gitattributes declares *.py text eol=lf, so any contributor editing server.py triggers the same whole-file renormalization. Separating the two commits keeps this feature commit's diff at a single line.
  • headroom/proxy/request_logger.py: compressed_messages is stripped from the JSONL file log and from get_recent() alongside the existing request_messages / response_content stripping. get_memory_stats() also counts it toward the deque's byte budget.

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check . and ruff format --check .)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed (via the Headroom Desktop client that consumes /transformations/feed — confirmed both fields arrive and render)

Test coverage added/extended:

  • tests/test_proxy/test_request_logger.py (new file): round-trip unit tests for RequestLogger. Confirms get_recent strips both sides (pre + post), get_recent_with_messages exposes both, and the JSONL file log drops both when log_full_messages=False.
  • tests/test_proxy/test_transformations_feed.py: extended to assert compressed_messages appears in the endpoint payload alongside request_messages / response_content.
  • tests/test_proxy_streaming_request_logger.py: existing include/omit tests updated to assert both sides populate when the flag is on and both are None when it's off.

Test Output

$ uv run ruff check headroom tests
All checks passed!
$ uv run ruff format --check headroom tests
614 files already formatted
$ uv run pytest tests/test_proxy/test_request_logger.py tests/test_proxy_streaming_request_logger.py tests/test_proxy/test_transformations_feed.py -v
...
tests/test_proxy/test_request_logger.py::test_get_recent_strips_compressed_messages_alongside_request_and_response PASSED
tests/test_proxy/test_request_logger.py::test_get_recent_with_messages_returns_compressed_messages PASSED
tests/test_proxy/test_request_logger.py::test_jsonl_file_strips_both_sides_when_log_full_messages_disabled PASSED
tests/test_proxy_streaming_request_logger.py::test_finalize_stream_response_logs_original_and_compressed_messages PASSED
tests/test_proxy_streaming_request_logger.py::test_finalize_stream_response_omits_messages_when_log_full_messages_disabled PASSED
tests/test_proxy/test_transformations_feed.py::test_transformations_feed_returns_messages PASSED
...
============================== 11 passed in 5.36s ==============================

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas (the two-sided gating at each log site, the _stream_response_bedrock parameter addition, and the get_memory_stats accounting)
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable

Additional Notes

Non-Anthropic backends

handlers/openai.py and handlers/gemini.py do not currently emit RequestLog entries at all — only Anthropic and the shared streaming paths do. This PR therefore only populates compressed_messages on Anthropic traffic (which is what /transformations/feed shows today). Wiring OpenAI and Gemini into RequestLogger end-to-end is a separate, larger gap worth its own PR.

server.py EOL normalization

The feature change in server.py is a single line. To keep the diff readable, the preceding commit is a whitespace-only chore(proxy): normalize server.py to LF per .gitattributes — the file blob was stored with CRLF terminators but .gitattributes declares *.py text eol=lf. Any contributor touching server.py triggers this renormalization; isolating it here keeps the feature commit reviewable. Happy to rebase / drop / reshape as preferred.

Downstream desktop compatibility

The Headroom Desktop client I work on now consumes compressed_messages and renders the pre/post pair side-by-side on the "Recent large compression" card. The desktop was updated to handle both shapes: proxies without the field render the legacy single "Request" block; proxies with the field render "Request (original, N tokens)" + "Request (compressed, M tokens)" where N/M come from input_tokens_original / input_tokens_optimized. No changes needed downstream if this PR lands.

🤖 Generated with Claude Code

The server.py blob in history carries CRLF line terminators, but
.gitattributes declares `*.py text eol=lf`. Any edit to server.py
therefore produces a whole-file diff as git renormalizes. Split this
one-time cleanup into its own commit so feature diffs stay readable.
Expose the post-compression message list that was actually sent
upstream as a new `compressed_messages` field on `RequestLog`, paired
with the existing (now consistently pre-compression) `request_messages`.
Consumers of `/transformations/feed` can now diff the two sides of a
compression to see exactly what the pipeline stripped, replaced, or
kept — turning an abstract "saved N tokens" into a legible before/after.

Gated by the same `log_full_messages` flag as `request_messages` so the
two sides stay in sync; it's pointless to store one without the other.

Fixes a latent correctness bug along the way: today's `request_messages`
field is inconsistent across the four `RequestLog` construction sites —
sometimes it's the pre-compression snapshot, sometimes it's the mutated
`body["messages"]` (which is the compressed list, because the proxy
mutates `body` in place before the log call). After this change,
`request_messages` is always pre-compression and `compressed_messages`
is always what went upstream.

Changes
- `RequestLog` gains `compressed_messages: list[dict] | None = None`.
- All four `RequestLog(...)` sites (`anthropic.py` non-streaming ×2,
  `streaming.py` streaming ×2) thread both fields through.
- `_stream_response_bedrock` gains an `original_messages` parameter so
  the Bedrock streaming path has access to the pre-compression snapshot
  (the only caller now passes it in).
- `/transformations/feed` adds `compressed_messages` to its JSON payload.
- `RequestLogger` strips `compressed_messages` from the JSONL file log
  and from `get_recent` the same way it strips `request_messages`.
- Memory accounting in `RequestLogger.get_memory_stats` now counts
  `compressed_messages` toward the deque's byte total.

Tests
- `test_proxy/test_request_logger.py` (new): strip behaviour on
  `get_recent`, presence on `get_recent_with_messages`, and JSONL file
  gating.
- `test_proxy/test_transformations_feed.py`: extended to assert the new
  key is present/null alongside `request_messages` / `response_content`.
- `test_proxy_streaming_request_logger.py`: existing include/omit tests
  updated to assert both sides (pre and post) are populated when the
  flag is on and dropped when it's off.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
headroom/proxy/request_logger.py 33.33% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

…stats

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant