# feat(proxy): log compressed messages alongside original request #261
gglucass wants to merge 3 commits into chopratejas:main
### Description
Expose the post-compression message list that was actually sent upstream as a new `compressed_messages` field on `RequestLog`, paired with the existing (now consistently pre-compression) `request_messages`. Consumers of `/transformations/feed` — dashboards and any downstream observability — can now diff the two sides of a compression to see exactly what the pipeline stripped, replaced, or kept. Turns an abstract "saved N tokens" into a legible before/after.

Gated by the same `log_full_messages` flag as `request_messages` so the two sides stay in sync; it's pointless to store one without the other.

Also fixes a latent correctness bug: today's `request_messages` field is inconsistent across the four `RequestLog` construction sites — sometimes it's the pre-compression snapshot, sometimes it's the mutated `body["messages"]` (which is the compressed list, because the proxy mutates `body` in place before the log call). After this change, `request_messages` always means pre-compression and `compressed_messages` always means what went upstream.
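To make the before/after concrete, here is a consumer-side sketch of diffing the two fields from `/transformations/feed`. The `diff_compression` helper is hypothetical and not part of this PR; it only illustrates the kind of analysis the paired fields enable.

```python
def diff_compression(original: list[dict], compressed: list[dict]) -> dict:
    """Summarize what a compression pass did to a message list."""
    kept = [m for m in original if m in compressed]
    dropped = [m for m in original if m not in compressed]
    replaced = [m for m in compressed if m not in original]
    return {"kept": len(kept), "dropped": len(dropped), "replaced": len(replaced)}

original = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "tool output " * 500},   # bulky tool result
    {"role": "user", "content": "actual question"},
]
compressed = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "[tool output elided]"},  # pipeline's replacement
    {"role": "user", "content": "actual question"},
]
print(diff_compression(original, compressed))
# {'kept': 2, 'dropped': 1, 'replaced': 1}
```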
### Type of Change

Note on "breaking": strictly speaking this is a semantic correction of an inconsistently populated field, not a schema break. The field name `request_messages` is unchanged and the JSON shape is unchanged; what changes is that the field now consistently holds the pre-compression list. Consumers that treated it as "whatever messages we have" continue to work. Consumers that depended on the accidental post-compression value (if any existed) would shift to `compressed_messages`.
### Changes Made

- `headroom/proxy/models.py`: `RequestLog` gains `compressed_messages: list[dict] | None = None`. Doc comment explains it's paired with `request_messages` and gated by the same `log_full_messages` flag.
- `headroom/proxy/handlers/anthropic.py` (2 sites — Bedrock non-streaming and main non-streaming): `request_messages` now consistently sources from `original_messages` (the pre-compression snapshot at line 724); `compressed_messages` sources from `body["messages"]` (the compressed list after in-place mutation at line 1189). Both gated symmetrically.
- `headroom/proxy/handlers/streaming.py` (2 sites — main streaming in `_finalize_stream_response`, Bedrock streaming in `_stream_response_bedrock`): same treatment. `_stream_response_bedrock` gains a new `original_messages: list[dict] | None = None` parameter so it has access to the pre-compression snapshot; the sole caller in `anthropic.py` now threads it through.
- `headroom/proxy/server.py`: `/transformations/feed` adds `compressed_messages` to the JSON payload alongside the existing `request_messages` / `response_content`. Split into a separate preceding commit is a one-time EOL normalization to LF — the file blob in history carries CRLF but `.gitattributes` declares `*.py text eol=lf`, so any contributor editing `server.py` triggers the same whole-file renormalization. Separating the two commits keeps this feature commit's diff at a single line.
- `headroom/proxy/request_logger.py`: `compressed_messages` is stripped from the JSONL file log and from `get_recent()` alongside the existing `request_messages` / `response_content` stripping. `get_memory_stats()` also counts it toward the deque's byte budget.
### Testing

- All tests pass (`pytest`)
- Lint and formatting clean (`ruff check .` and `ruff format --check .`)
- Type checks pass (`mypy headroom`)
- Manually exercised `/transformations/feed` — confirmed both fields arrive and render

Test coverage added/extended:

- `tests/test_proxy/test_request_logger.py` (new file): round-trip unit tests for `RequestLogger`. Confirms `get_recent` strips both sides (pre + post), `get_recent_with_messages` exposes both, and the JSONL file log drops both when `log_full_messages=False`.
- `tests/test_proxy/test_transformations_feed.py`: extended to assert `compressed_messages` appears in the endpoint payload alongside `request_messages` / `response_content`.
- `tests/test_proxy_streaming_request_logger.py`: existing include/omit tests updated to assert both sides populate when the flag is on and both are `None` when it's off.
### Test Output

### Checklist

- … the `_stream_response_bedrock` parameter addition, and the `get_memory_stats` accounting
### Additional Notes

#### Non-Anthropic backends

`handlers/openai.py` and `handlers/gemini.py` do not currently emit `RequestLog` entries at all — only Anthropic and the shared streaming paths do. This PR therefore only populates `compressed_messages` on Anthropic traffic (which is what `/transformations/feed` shows today). Wiring OpenAI and Gemini into `RequestLogger` end-to-end is a separate, larger gap worth its own PR.

#### `server.py` EOL normalization

The feature change in `server.py` is a single line. To keep the diff readable, the preceding commit is a whitespace-only `chore(proxy): normalize server.py to LF per .gitattributes` — the file blob was stored with CRLF terminators but `.gitattributes` declares `*.py text eol=lf`. Any contributor touching `server.py` triggers this renormalization; isolating it here keeps the feature commit reviewable. Happy to rebase / drop / reshape as preferred.
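For anyone who wants to reproduce the renormalization mechanics in isolation, here is a throwaway-repo sketch. The file contents are illustrative, not the actual `server.py`; only the git commands mirror what the cleanup commit does.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
# Commit a blob with CRLF terminators before any .gitattributes exists
printf 'print("hi")\r\n' > server.py
git -c core.autocrlf=false add server.py
git commit -qm 'blob with CRLF terminators'
# Now declare the LF policy, as the repo's .gitattributes does
printf '*.py text eol=lf\n' > .gitattributes
git add .gitattributes
git commit -qm 'declare eol=lf'
git ls-files --eol server.py    # index side still shows i/crlf
# The one-time cleanup: rewrite the index copy to LF
git add --renormalize server.py
git commit -qm 'chore: normalize server.py to LF per .gitattributes'
git ls-files --eol server.py    # index side now shows i/lf
```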
#### Downstream desktop compatibility

The Headroom Desktop client I work on now consumes `compressed_messages` and renders the pre/post pair side-by-side on the "Recent large compression" card. The desktop was updated to handle both shapes: proxies without the field render the legacy single "Request" block; proxies with the field render "Request (original, N tokens)" + "Request (compressed, M tokens)" where N/M come from `input_tokens_original` / `input_tokens_optimized`. No changes needed downstream if this PR lands.

🤖 Generated with Claude Code
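The both-shapes handling above reduces to something like this hypothetical consumer-side helper (not the actual desktop source); it treats a missing field and an explicit null the same way, so older proxies keep working.

```python
def render_blocks(log: dict) -> list[str]:
    """Pick card blocks for a feed entry, tolerating proxies without the field."""
    compressed = log.get("compressed_messages")
    if compressed is None:  # legacy proxy, or log_full_messages off
        return ["Request"]
    n = log.get("input_tokens_original")
    m = log.get("input_tokens_optimized")
    return [f"Request (original, {n} tokens)",
            f"Request (compressed, {m} tokens)"]

print(render_blocks({"request_messages": [{"role": "user", "content": "hi"}]}))
# ['Request']
print(render_blocks({"compressed_messages": [],
                     "input_tokens_original": 120,
                     "input_tokens_optimized": 45}))
# ['Request (original, 120 tokens)', 'Request (compressed, 45 tokens)']
```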