
feat(transforms): attribute read_lifecycle + tool_crush tags#249

Open
gglucass wants to merge 1 commit into chopratejas:main from gglucass:feat/transforms-attribution

Conversation


@gglucass gglucass commented Apr 23, 2026

Description

Enrich the transforms_applied tags emitted by ReadLifecycleManager and ToolCrusher so each tag carries the specific target it acted on — the source file for a stale/superseded Read replacement, and the tool names for crushed tool outputs. Today these tags are opaque counters (read_lifecycle:stale, tool_crush:42), which tells you that a transform ran but not what it operated on. Any consumer of /transformations/feed — dashboards, logs, metrics, or downstream UIs — has to treat these as black boxes. Threading the per-item target through the existing tag vocabulary is the smallest-surface way to expose that information without adding new fields or endpoints.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • headroom/transforms/read_lifecycle.py: read_lifecycle:<state> → read_lifecycle:<state>:<file_path>. A new _format_read_lifecycle_transform helper composes the tag from the existing ReadClassification.file_path, so no new data collection is needed. Paths containing : are preserved — consumers are expected to bound their split to 3 parts (s.split(":", 2)).
  • headroom/transforms/tool_crusher.py: tool_crush:<n> → tool_crush:<n>:<tool1,tool2,...> when the assistant's tool_use blocks (Anthropic) or tool_calls entries (OpenAI) resolve the crushed tool names. A new _build_tool_name_index helper builds the id→name map in one pass over assistant messages; a _format_tool_crush_transform helper emits the tail only when at least one name resolves. Tags fall back to the legacy tool_crush:<n> shape when no names are available (e.g. orphaned tool results).

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check . and ruff format --check .)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed

8 new tests cover: enriched read_lifecycle tag shape (OpenAI + Anthropic formats), file paths containing :, enriched tool_crush tag shape (OpenAI + Anthropic formats), duplicate-tool dedup, the no-name fallback, and the name-index skipping entries that are missing id / name. All 25 existing tests in test_read_lifecycle.py and test_tool_crusher.py continue to pass unchanged — the substring assertions in TestTransformTracking ("stale" in t) are compatible with the enriched shape by design.
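Two of those compatibility claims are easy to illustrate. The sketch below (tag values are made up for illustration, not copied from the test suite) shows why the legacy substring assertions keep passing and why a bounded split keeps colon-bearing paths intact:

```python
def test_legacy_substring_assertion_still_matches():
    # The enriched tag still contains the state word, so the existing
    # TestTransformTracking-style assertion ("stale" in t) is unaffected.
    enriched = "read_lifecycle:stale:/src/app.py"
    assert "stale" in enriched


def test_colons_in_path_survive_bounded_split():
    # maxsplit=2 keeps everything after the second colon as one path segment.
    tag = "read_lifecycle:stale:C:/Users/x/app.py"
    kind, state, path = tag.split(":", 2)
    assert (kind, state, path) == ("read_lifecycle", "stale", "C:/Users/x/app.py")
```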

Test Output

$ uv run pytest tests/test_transforms/test_read_lifecycle.py tests/test_transforms/test_tool_crusher.py -v
...
tests/test_transforms/test_read_lifecycle.py::TestTransformTracking::test_transform_tag_includes_file_path_openai PASSED
tests/test_transforms/test_read_lifecycle.py::TestTransformTracking::test_transform_tag_includes_file_path_anthropic PASSED
tests/test_transforms/test_read_lifecycle.py::TestTransformTracking::test_transform_tag_preserves_colons_in_path PASSED
tests/test_transforms/test_tool_crusher.py::TestToolCrusher::test_transform_tag_includes_tool_names_openai PASSED
tests/test_transforms/test_tool_crusher.py::TestToolCrusher::test_transform_tag_includes_tool_names_anthropic PASSED
tests/test_transforms/test_tool_crusher.py::TestToolCrusher::test_transform_tag_dedupes_repeated_tool PASSED
tests/test_transforms/test_tool_crusher.py::TestToolCrusher::test_tool_name_index_skips_entries_missing_id_or_name PASSED
tests/test_transforms/test_tool_crusher.py::TestToolCrusher::test_transform_tag_falls_back_when_no_names PASSED

============================== 33 passed in 0.32s ==============================

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable

Additional Notes

Compatibility considerations

  • transforms_applied consumers in this repo:
    • proxy/cost.py::_summarize_transforms — uses a plain Counter. Enriched tags are distinct keys, so two stale reads on different files now count as two entries instead of one key with count=2. This is arguably more faithful to what happened; callers that want a coarser roll-up can still bucket by prefix. The docstring example was already illustrative, not asserted.
    • proxy/cost.py uncompressed-reason categorisation — uses substring matches for "excluded" / "protected". Unaffected.
    • x-headroom-transforms response header (gemini/openai/anthropic handlers) — emitted via ",".join(transforms_applied). The new tool_crush:<n>:<A,B,C> shape contains commas, making the joined header ambiguous if re-split. Grep confirms the header is only written and never parsed in this repo or by any known downstream, so this is documented as a latent concern rather than a break.
  • No schema change to /transformations/feed — the JSON field remains transforms_applied: string[]. Consumers that treat entries as opaque strings continue to work.
  • No config flag. The enriched shape replaces the legacy shape unconditionally. Rolling-upgrade safety is in the hands of consumers, which (a) should tolerate unknown strings and (b) can easily parse both shapes with a bounded split.
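One possible consumer-side roll-up, sketched under the same bounded-split convention (the tag values below are illustrative, not from a real feed): enriched tags become distinct Counter keys, but bucketing by the transform-kind prefix recovers the old coarse counts and handles both shapes.

```python
from collections import Counter


def rollup(transforms_applied: list[str]) -> Counter:
    # Bounded split: colons inside a file path or tool list stay in the tail,
    # so the first segment is always the transform kind.
    return Counter(tag.split(":", 2)[0] for tag in transforms_applied)


tags = [
    "read_lifecycle:stale:/src/a.py",
    "read_lifecycle:stale:/src/b.py",   # distinct key in a plain Counter
    "tool_crush:4:Read,Bash",
    "tool_crush:2",                     # legacy fallback shape
]
assert rollup(tags) == Counter({"read_lifecycle": 2, "tool_crush": 2})
```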

Why tag enrichment rather than a new structured field

Adding savings_breakdown: [{category, target, tokens}] on the request log was the alternative I considered. It's cleaner but a bigger surface: a new field on the hot log schema, new JSON shape on /transformations/feed, and mandatory churn for every consumer. Enriching the existing tags reuses infrastructure that's already downstream-compatible. If a structured breakdown ever lands, these tags remain a useful compact summary alongside it.

🤖 Generated with Claude Code

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Enrich `transforms_applied` tags so downstream UIs can show WHAT a
compression acted on, not just that it happened:

- `read_lifecycle:<state>` → `read_lifecycle:<state>:<file_path>`
- `tool_crush:<n>` → `tool_crush:<n>:<tool1,tool2,...>` when the
  assistant's tool_use metadata resolves the crushed tool names.
  Falls back to `tool_crush:<n>` when no names can be resolved.

Paths containing `:` survive because consumers are expected to bound
their split to 3 parts. Only written to `transforms_applied`; the
opaque `x-headroom-transforms` response header is unaffected (no
consumers parse it).

Adds 8 tests covering OpenAI + Anthropic formats, paths with colons,
dedup of repeated tools, the no-name fallback, and name-index skips
when id / name are missing.
@gglucass gglucass force-pushed the feat/transforms-attribution branch from 7a53ede to 8b39dc2 on April 23, 2026 at 15:34

gglucass commented Apr 23, 2026

@chopratejas in user conversations it often comes up that users (a) don't understand where the token savings are coming from and (b) are unsure the token savings are even real. This is a first attempt at bringing more transparency to this, which will be vital in sales conversations where you need to convince skeptical committees.

That said, I'm not 100% sure this is the right approach. It might be better to add a savings_breakdown field on the transformation log ([{category, target, tokens}]) to showcase proper insights. That's a pretty big lift though and I wonder if it might introduce extra overhead we don't want.

What do you think? Shall I also raise a PR to add savings_breakdown and then we can compare?

