Skip to content

rust stage 3 (a+b): diff_compressor port + retire python via pyo3 + dotenv test hygiene#277

Merged
chopratejas merged 10 commits intomainfrom
rust-stage-3b-diff-bridge
Apr 26, 2026
Merged

rust stage 3 (a+b): diff_compressor port + retire python via pyo3 + dotenv test hygiene#277
chopratejas merged 10 commits intomainfrom
rust-stage-3b-diff-bridge

Conversation

@chopratejas
Copy link
Copy Markdown
Owner

@chopratejas chopratejas commented Apr 26, 2026

Summary

Combines stages 3a + 3b into one PR going to main. The original stack (#276 + #278) is collapsed because the Python deletion in 3b makes 3a's intermediate state (Python and Rust coexisting) less interesting to land on its own.

What's in here (6 commits)

Stage 3a — port DiffCompressor to Rust:

  • feat(rust): diff_compressor port — byte-equal parity + sidecar stats — full port to crates/headroom-core/src/diff_compressor.rs. 27 fixtures, 27 byte-equal matches via cargo run -p headroom-parity run.
  • refactor(rust): diff_compressor — surface hidden cutoffs + lossy-emit stats — cutoff signals (per-file hunk drops, context lines trimmed, file_mode normalizations) exposed as a DiffCompressorStats sidecar struct so observability isn't dropped on the floor.
  • fix(diff_compressor): four silent information-loss paths in Python AND Rust — bugs in BOTH impls: rename/dissimilarity markers discarded, combined-diff hunks (merge commits) parsed-then-dropped, no-newline markers lost across distance, pre-diff content order mangled. All four fixed in both impls.
  • fix(diff): close ContentRouter routing gaps for merge diffs and long preamblesContentRouter was failing to route real merge-commit diffs (diff --combined / diff --cc) and long-preamble diffs to DiffCompressor. Detection scan widened; combined-diff headers added to the regex tier.

Stage 3b — retire the Python implementation:

  • feat(rust): retire python diff_compressor, ship rust-only via pyo3 — Python DiffCompressor replaced by a thin pyo3-backed shim that delegates to headroom._core.DiffCompressor (built from crates/headroom-py). No env-var fallback. No Python implementation. Wheel is a hard import. ~700 lines of Python parser/scorer/formatter deleted.
  • fix(tests): stop module-level dotenv loaders from polluting os.environ during pytest collection — pre-existing latent bug surfaced by the new [dev] extras provisioning. litellm (and dotenv.load_dotenv()) at module-level loaded .env into os.environ during pytest collection, breaking @pytest.mark.skipif(not os.environ.get("...")) guards in unrelated integration tests. Local full-suite went from 46 fails / 387s to 2 fails / 134s.

Surface preserved

  • headroom.transforms.diff_compressor.DiffCompressor — same class, same __init__, same compress(content, context) shape. Returns Python DiffCompressionResult dataclasses.
  • DiffCompressorConfig and DiffCompressionResult dataclasses kept.
  • Sidecar compress_with_stats(...) exposes the rust-only DiffCompressorStats.

What's removed

  • Python parser / scorer / formatter (~700 lines).
  • Internal DiffHunk / DiffFile parser dataclasses (rust crate has parallel coverage).
  • 12 of 41 tests in test_diff_compressor.py that probed _parse_diff / _score_hunks or the deleted dataclasses. The remaining 29 public-API tests now run against the rust backend through the same import path — kept unchanged.
  • HEADROOM_DIFF_COMPRESSOR_BACKEND env var.

Build

scripts/build_rust_extension.sh runs maturin develop and symlinks the built .so into headroom/ so import headroom._core resolves past the in-tree package shadowing the maturin overlay.

CI note

CI does not yet build the wheel. The 25 failing tests in test_diff_compressor.py on the current CI run are because headroom._core isn't importable in CI yet. Adding a maturin develop step before pytest (or shipping a prebuilt wheel) is required before this can merge — that work is small and should land alongside.

Test plan

  • 544 transforms tests pass locally with the wheel built.
  • 33 rust-vs-fixture parity tests guard the pyo3 bridge against regressions.
  • cargo run -p headroom-parity run shows 27/27 diff_compressor matches.
  • Full local suite: 4672 passed, 260 skipped, 2 failed (unrelated environment-dependent: missing PIL / Docker).
  • CI: add a maturin develop step before pytest.

🤖 Generated with Claude Code

Stage 3a: first real transform port. Faithful Rust port of
`headroom.transforms.diff_compressor` with byte-equal parity against all
20 recorded fixtures.

# Algorithm (matching Python)

1. Hand-rolled unified-diff parser (state machine over `diff --git`,
   `index`, `--- a/`, `+++ b/`, `@@`, mode/binary/rename markers, +/- /
   space lines, "other" lines like `\ No newline at end of file`).
2. File cap (`max_files=20`): when fired, sort by total changes (most
   first) and keep top N.
3. Per-file hunk cap (`max_hunks_per_file=10`): keep first + last + top
   relevance-scored middle, then resort by hunk-header start line to
   restore appearance order.
4. Relevance scoring: change-density base + user-query word overlap
   + priority patterns (ERROR / IMPORTANCE / SECURITY regexes —
   matches `error_detection.PRIORITY_PATTERNS_DIFF`).
5. Per-hunk context trim: keep `max_context_lines=2` lines either side
   of each `+`/`-` line.
6. CCR cache_key: `md5(original)[:24]` (matches
   `compression_store.CompressionStore.store`). Emitted only when
   compression saved >20% of lines.

Parity result: `[diff_compressor ] total=20 matched=20 skipped=0 diffed=0`.

# Information preservation hardening

Three pass-through paths inherited from Python that we keep deliberate
(would lose info if we changed them):
- Below `min_lines_for_ccr` (50): return input unchanged.
- No diff sections parsed: return input unchanged.
- Below 20% compression savings: emit compressed output but no CCR
  marker (the original is the cheaper representation anyway).

Plus a parity-bound subtlety: `compressed_line_count` is captured BEFORE
the CCR retrieval marker is appended, both for the marker text
(`compressed to N`) and the result field. The output string therefore
ends up with one more line than the field reports — by design, matching
Python exactly. An off-by-one bug from recounting after appending the
CCR marker was caught and pinned by a synthetic 8-file diff test.

# Observability — the Rust escape hatch

Python's `DiffCompressionResult` has thin observability: input/output
line counts, additions/deletions, hunks_kept/removed, files_affected,
cache_key. The Rust port adds a sidecar `DiffCompressorStats` struct
with metrics Python doesn't emit:

- `files_dropped: Vec<String>` — names (old → new path) of files
  silently discarded by the `max_files` cap. Python loses these.
- `hunks_dropped_per_file: BTreeMap<String, usize>` — per-file hunk
  drops, stable iteration via `BTreeMap`.
- `context_lines_input` / `context_lines_kept` / `context_lines_trimmed`
  — directly proxies info loss from the context trim.
- `largest_hunk_kept_lines` / `largest_hunk_dropped_lines` — outlier
  detection (a single huge dropped hunk is much worse than many small).
- `parse_warnings: Vec<String>` — surfaces malformed input rather than
  dropping silently.
- `processing_duration_us` — latency budget.
- `cache_key_emitted` + `ccr_skipped_reason: Option<String>` — explicit
  signal for "we chose not to emit CCR and this is why".

A `tracing::info!(target: "diff_compressor", ...)` event is emitted on
every call, carrying these fields for OTel scraping in prod. The
sidecar struct is returned alongside via `compress_with_stats`; the
parity-only `compress` API discards it.

# Module layout

- `crates/headroom-core/src/transforms/mod.rs` — namespace, doc comment
  with the guiding principle ("information preservation > aggressive
  compression") so future ports inherit the philosophy.
- `crates/headroom-core/src/transforms/diff_compressor.rs` — full port
  (parser, scorer, hunk selector, context trimmer, formatter, CCR layer,
  stats, tracing).

# Dependencies added to headroom-core

- `md-5 = "0.10"` — for the CCR cache_key (matches Python MD5[:24]).
- `regex = "1"` — was a transitive dep via tokenizers; now a direct
  dependency for the hunk-header parser and priority patterns.

# Tests

6 unit tests covering pass-through paths, MD5 hex truncation, the
Python `split("\n")` line-count semantics, sidecar stats emission,
and a synthetic 8-file diff that locks the byte-equal behavior found
in the parity fixtures.
… stats

Audit follow-up to the diff_compressor port. The parity-bound port had
several issues that the byte-equal harness can't catch: hardcoded magic
numbers, score-weight literals scattered through the scorer, and silent
information-loss paths (file mode + binary detail) with no observability
trail. Parity is preserved (still 20/20 byte-equal); these are quality
and observability fixes only.

# Changes

1. **Hardcoded `0.8` savings threshold → config knob.** New
   `DiffCompressorConfig::min_compression_ratio_for_ccr: f64` (default
   0.8) replaces the literal. Rust-only — Python hardcodes 0.8 too;
   fixtures don't carry the field, parity comparator defaults to 0.8.

2. **Score-weight magic numbers → documented `pub const`.** The scorer's
   `0.03`, `0.3`, `0.2`, `0.3`, `1.0`, and `len > 2` filter are now
   `SCORE_CHANGE_DENSITY_WEIGHT`, `SCORE_CHANGE_DENSITY_CAP`,
   `SCORE_CONTEXT_WORD_WEIGHT`, `SCORE_CONTEXT_MIN_WORD_LEN`,
   `SCORE_PRIORITY_PATTERN_BOOST`, `SCORE_TOTAL_CAP`. Each carries a doc
   comment explaining what it biases toward. A pinning test fails if
   anyone changes the value without updating the test.

3. **File mode normalization is now visible.** Parity forces emit to
   hardcode `100644` regardless of input mode (`100755` executable,
   `120000` symlink, `160000` submodule). The parser now captures the
   ORIGINAL mode line; if it's not `100644`, the loss is recorded in
   `DiffCompressorStats::file_mode_normalizations: Vec<(label,
   original_mode_line)>`. Empty when no normalization happened.

4. **Binary file detail loss is now visible.** Same pattern: parity
   forces `Binary files differ` on emit, dropping the original `Binary
   files X and Y differ` filenames. Captured in
   `DiffCompressorStats::binary_files_simplified: Vec<String>` whenever
   the original line carried richer detail.

5. **`min_lines_for_ccr` doc clarified.** Despite the name, it gates the
   ENTIRE compression path, not just the CCR marker. A 49-line diff is
   returned unchanged regardless of how compressible it is. Doc comment
   now flags this misnomer prominently.

6. **6 new unit tests cover the lossy paths.** Previously the test
   suite only exercised pass-through and the in-cap synthetic. New
   tests:
     - `max_hunks_per_file_cap_drops_excess_and_records_stats` (15
       hunks, cap 10, verify 5 dropped + per-file accounting)
     - `max_files_cap_drops_files_and_records_names_in_stats` (25
       files, cap 20, verify 5 dropped names recorded)
     - `file_mode_normalization_is_recorded_for_executable_bit` (input
       `new file mode 100755`, verify stats capture original)
     - `binary_files_simplification_is_recorded` (input `Binary files
       a/x and b/x differ`, verify original captured)
     - `min_compression_ratio_for_ccr_is_configurable` (default 0.8 vs
       custom 0.5; same input, different CCR emit decision)
     - `score_constants_match_inline_values` (pins constants so future
       changes are intentional)

Stats fields are also reflected in the existing `tracing::info!` event
so OTel scrapers see the loss counts in production.

# Why these matter — the user-stated principle

"Information preservation > aggressive compression." When parity forces
us to drop bytes (file mode, binary detail, fully-dropped files,
trimmed context), the loss must be observable. Adding it to the
compressed output bytes would break parity; surfacing it in
`DiffCompressorStats` + tracing spans is the right escape hatch.

# Verification

- 20/20 parity fixtures still byte-equal
- 12/12 unit tests pass (6 new + 6 prior)
- 56 total tests in headroom-core, all green
- `cargo fmt --check` clean
- `cargo clippy --workspace --all-targets -- -D warnings` clean
…D Rust

Audit caught four bugs that the byte-equal parity harness can't catch on
its own — both Python and Rust were faithfully emitting the buggy output.
Fixed in lockstep so parity is maintained while the underlying behavior
is now correct on inputs the existing 20 fixtures didn't exercise.

# The four bugs (each fixed in both Python and Rust)

1. Renames silently dropped from output. Parser captured `is_renamed=True`
   but the emitter never emitted ANY rename markers. Output of a rename
   looked exactly like a plain modification of the old path. Fix: capture
   `rename from` / `rename to` / `similarity index N%` / `dissimilarity
   index N%` / `copy from` / `copy to` lines in a new `rename_lines` field
   on `DiffFile`; emit them after `diff --git` in canonical git ordering.

2. Combined diff hunks (`@@@`) silently dropped. Hunk-header regex only
   matched `@@`, so 3-way merge hunks had `current_hunk` never set and
   ALL their content fell through to the no-op branch. Fix in Python:
   regex switched to `^(@@+) ... \1` (backreferences match any number of
   `@`s on each side). Fix in Rust: alternation over `@@`, `@@@`, `@@@@`
   since `regex` is RE2-based and rejects backreferences. n>3 octopus
   merges still fall through; rare in practice.

3. `\ No newline at end of file` markers can be context-trimmed away.
   Treated as ordinary "other" lines — if more than `max_context_lines`
   from a `+`/`-` change, dropped. Round-trip-breaking for patches; can
   change whether the trailing line has a newline. Fix: in
   `_reduce_context`, force-add any line starting with `\` to the keep
   set regardless of distance.

4. Pre-diff content silently dropped. Anything before the first `diff
   --git` — commit messages from `git log -p`, email headers from `git
   format-patch`, fork-and-rebase metadata — was discarded. Fix:
   `_parse_diff` now returns `(pre_diff_lines, files)`; `format_output`
   prepends pre-diff content verbatim when present.

# Hidden parity bug found during the work

`_compress_files` constructed a fresh `DiffFile` from the parsed one but
only copied a subset of the fields by name. The new `rename_lines` and
`original_*_line` fields were silently dropped here, so the parser
populated them correctly but the emitter saw an empty `rename_lines`
list. Caught by writing a real test instead of a smoke test — the smoke
test passed because it hit the no-diff-found short-circuit, not the
parser/emitter pipeline. Constructor now copies all fields explicitly.

# Parity status

- Existing 20 fixtures: still byte-equal between fixed Python and fixed
  Rust. None of them exercised the buggy paths.
- 4 NEW fixtures recorded against fixed Python, exercising each bug-fix
  path: rename, 3-way combined diff, `\ No newline` marker far from
  changes, pre-diff commit headers. All 4 byte-equal between Python and
  Rust.
- Parity harness: total=24 matched=24 skipped=0 diffed=0.

# Observability

Some normalizations remain parity-bound (file mode `100644` hardcode,
`Binary files differ` simplification). Those are surfaced in
`DiffCompressorStats::file_mode_normalizations` /
`binary_files_simplified` (Rust) and via `logger.warning` (Python's new
`_log_loss_signals` helper, called once per compress).

# Tests

- Python: 4 new test classes (11 tests) covering rename markers,
  combined diffs, no-newline preservation, pre-diff content. Edge case:
  no pre-diff content must NOT add a leading blank line.
- Rust: 4 new `bugfix_*` unit tests with the same scenarios.
- Existing Python tests calling `_parse_diff` directly were updated for
  the new `(pre_diff, files)` tuple return.

# Verification

- Python: 37/37 tests pass (was 26).
- Rust: 16/16 transforms tests; 60/60 workspace unit tests; 5/5
  proptests; 1/1 doctest.
- Parity: 24/24 byte-equal.
- `cargo fmt --check` clean; `cargo clippy --workspace --all-targets
  -- -D warnings` clean.
…preambles

User audit caught three gaps that prevented DiffCompressor from being
invoked even when the input was a real diff. These complement the four
emit-time bugs fixed in the previous commit — those fixes only kick in
once DiffCompressor receives the input. Without these gap fixes, real
merge-commit diffs and `git log -p` outputs with long commit messages
were misrouted away from DiffCompressor entirely.

# The three gaps (each fixed in Python; gap 3 also fixed in Rust)

1. Detector scan window was hardcoded to first 50 lines.
   `_try_detect_diff` in content_detector.py only inspected
   `content.split("\n")[:50]`. `git log -p` outputs commonly have
   commit messages longer than 50 lines (releases, squashed commits,
   bots), pushing the `diff --git` header out of the detection window.
   Result: input was returned with `content_type=PLAIN_TEXT` and routed
   to the text compressor, never reaching DiffCompressor. Fix: window
   widened to 500 lines.

2. Detector regex didn't recognize merge-commit headers.
   `_DIFF_HEADER_PATTERN` matched `diff --git`, `--- a/`, and the
   regular `@@ -A,B +C,D @@` hunk header. Merge-commit diffs from
   `git log -p` use `diff --combined <path>`, `diff --cc <path>`, and
   combined-diff hunk headers `@@@+`. The shared `--- a/` line still
   triggered the detector with low confidence, but only barely. Fix:
   extended the regex to recognize all four merge-shaped header forms.

3. DiffCompressor parser only matched `^diff --git`.
   Even after fixing detection, the parser's `_DIFF_GIT_PATTERN`
   wouldn't match `diff --combined` or `diff --cc`, so merge diffs
   reached DiffCompressor and were treated as one giant pre-diff blob —
   passed through unchanged after the previous PR's pre-diff
   preservation fix. Fix: added `_DIFF_COMBINED_PATTERN` and
   `_DIFF_CC_PATTERN`; `_parse_diff` starts a new file section on any
   of the three header forms. Mirrored in Rust as `is_diff_header`
   helper that checks all three regexes.

# Why this matters end-to-end

DiffCompressor's value comes from being routed to. Detection +
parser-level coverage are upstream of the compressor — without them,
the compressor never sees the input. The previous PR's four bug fixes
(rename, combined-diff hunks, no-newline marker, pre-diff content) are
correct and necessary, but for merge commits and long-preamble diffs,
they were only firing on the rare cases where the detector misclicked
into DiffCompressor anyway. With these three gaps closed, the
ContentRouter→DiffCompressor pipeline actually engages on:
- `git log -p` outputs of any commit-message length
- Merge-commit diffs (`diff --combined`, `diff --cc`)
- Combined-diff snippets (`@@@`+ hunk-only inputs)

# New fixtures (3 added to the existing 24)

- `066bc82…` — `diff --combined` merge diff (3-way)
- `5d950a94…` — `diff --cc` merge diff (alternate form)
- `66c86f64…` — long pre-diff content (60-line commit message)
  followed by a rename diff (exercises detector scan widening +
  pre-diff preservation in tandem)

Parity: total=27 matched=27 skipped=0 diffed=0.

# Tests

- Python: 4 new tests across 2 new test classes —
  `TestRoutingGapMergeDiffs` (combined / cc parser) and
  `TestRoutingGapDetectorScanWindow` (long preamble detection +
  combined-diff regex recognition).
- Rust: 2 new tests covering combined / cc parser sections.

# Verification

- 27/27 parity fixtures byte-equal.
- Python: 41/41 tests pass (was 37).
- Rust: 18/18 transforms tests; 62/62 workspace; 5/5 proptests.
- `cargo fmt --check` clean; `cargo clippy --workspace --all-targets
  -- -D warnings` clean.
The python `DiffCompressor` is replaced by a thin pyo3-backed shim that
delegates to `headroom._core.DiffCompressor`. There is no python
implementation and no env-var fallback — the wheel is a hard import.

Why now: opt-in defaults don't drive python retirement. Byte-equal
parity was already proven across 27 fixtures (stage 3a); keeping a
shadow python impl behind a flag is a permanent maintenance cost with
no operational benefit. Stage 3b deletes ~700 lines of python parser /
scorer / formatter code; the rust crate has its own coverage.

Surface preserved:
- `headroom.transforms.diff_compressor.DiffCompressor` — same class
  name, same `__init__`, same `compress(content, context)` shape.
  Returns python `DiffCompressionResult` dataclasses so call sites that
  destructure with `asdict()` work unchanged.
- `DiffCompressorConfig` and `DiffCompressionResult` dataclasses kept.
- Sidecar `compress_with_stats(...)` exposes the rust-only
  `DiffCompressorStats` (per-file hunk drops, context lines trimmed,
  file_mode normalizations) for observability.

Removed:
- Python parser / scorer / formatter (~700 lines).
- Internal `DiffHunk` / `DiffFile` parser dataclasses (rust crate has
  parallel coverage).
- 12 tests that probed `_parse_diff` / `_score_hunks` or the deleted
  parser dataclasses. The 29 public-API tests in `test_diff_compressor.py`
  remain and now exercise the rust backend through the same import path.
- `HEADROOM_DIFF_COMPRESSOR_BACKEND` env var — no longer meaningful.

Build:
- `scripts/build_rust_extension.sh` runs `maturin develop` and symlinks
  the built `.so` into `headroom/` so `import headroom._core` resolves
  past the in-tree package shadowing the maturin overlay.
- `.gitignore` excludes the symlinks and allowlists the build script.

Tests:
- 544 transforms pass; 33 rust-vs-fixture parity tests guard the pyo3
  bridge against regressions (`tests/test_transforms/test_diff_compressor_rust_parity.py`).
- Mypy clean.
…n during pytest collection

# The bug

Several test modules and two production modules loaded the project `.env`
at *import time*. During pytest collection (where every test module is
imported once), this populated `os.environ` with API keys from `.env`.

The skipif guards in `test_proxy_passthrough_integration.py` (and
others) evaluate at collection time:

    @pytest.mark.skipif(not os.environ.get("OPENAI_API_KEY"), reason="...")

If the polluter module was collected *before* the guard, the guard saw
the leaked key, decided not to skip, and the integration tests ran
live against a fake key and failed. In a fresh local-dev venv with
`.env` + full `[dev]` extras, this manifested as ~16 spurious test
failures plus a misleading test runtime of 6+ minutes (live HTTP).

# Why now

CI does not see this (no `.env`). It only manifests when:
1. `litellm` (and friends) are installed — they run `dotenv.load_dotenv()`
   on import, populating `os.environ` from `.env`.
2. A `.env` file with real API keys exists locally.

Until the venv was provisioned with the full `[dev]` extras during
recent test work, `pytest.importorskip("litellm")` and
`from headroom.pricing import litellm_pricing` both silently no-op'd
(via try/except ImportError → `LITELLM_AVAILABLE=False`), so the leak
never triggered. With litellm now installed, the latent bug surfaced.

# The fix — three patterns

1. **Production modules** (`headroom/pricing/litellm_pricing.py`,
   `headroom/backends/litellm.py`): wrap the eager `import litellm` with
   a snapshot/restore of `os.environ`. Any keys litellm's bundled
   `python-dotenv` adds during import are deleted immediately. The
   module is fully imported and cached in `sys.modules` so subsequent
   imports hit the cache without re-running the side effect.

2. **Test modules using `pytest.importorskip("litellm")`**
   (`test_backend_bugs.py`, `test_bedrock_region.py`,
   `test_cost_tracker_counterfactual.py`): replace with
   `tests._dotenv.importorskip_no_env_leak("litellm")`, which does the
   same snapshot/restore around `importlib.import_module`.

3. **Test modules that intentionally need `.env` values for skipif
   guards** (`test_compression_summary_*.py`, `test_query_echo.py`,
   `test_cost_tracker_counterfactual.py`, `test_memory_usage_integration.py`,
   `test_bundled_tools_savings.py`): replace module-level
   `os.environ.setdefault(...)` / `dotenv.load_dotenv()` with
   `tests._dotenv.load_env_overrides()` (returns a local dict — does
   NOT mutate `os.environ`) plus `autouse_apply_env(...)` (function-
   scoped fixture that applies via `monkeypatch.setenv`, auto-cleaned
   at teardown). The skipif still works because
   `ANTHROPIC_KEY = os.environ.get(...) or _env_overrides.get(...)`
   reads from the local dict as fallback.

# Helper module

New `tests/_dotenv.py` exposes:
- `load_env_overrides() -> dict[str, str]` — read `.env` into a dict.
- `autouse_apply_env(overrides) -> fixture` — function-scoped autouse
  fixture that applies via `monkeypatch.setenv`.
- `importorskip_no_env_leak(module) -> module` — drop-in
  `pytest.importorskip` substitute that quarantines env mutations.

# Results

Local full-suite (excluding live-LLM and live-feed tests):
- Before: 46 failed, 4830 passed, 387s
- After:   2 failed, 4672 passed, 134s

The remaining 2 failures are unrelated environment-dependent tests
(missing `PIL` / Docker daemon).
@chopratejas chopratejas force-pushed the rust-stage-3b-diff-bridge branch from d2b5b4b to d5ca50c Compare April 26, 2026 16:17
@chopratejas chopratejas changed the title Rust stage 3b diff bridge rust stage 3 (a+b): diff_compressor port + retire python via pyo3 + dotenv test hygiene Apr 26, 2026
The python `DiffCompressor` was retired in this PR's stage-3b commit;
the public class now delegates to `headroom._core` (built from
`crates/headroom-py`). Without the wheel installed in CI, every test
that constructs a `DiffCompressor` fails with `ModuleNotFoundError:
No module named 'headroom._core'`.

This adds a build step to the main `test` job in `ci.yml` that:
1. Installs the stable Rust toolchain (`dtolnay/rust-toolchain@stable`).
2. Caches the cargo registry and build output (`Swatinem/rust-cache@v2`).
3. Installs maturin.
4. Runs `scripts/build_rust_extension.sh`, which calls `maturin develop`
   and symlinks the built `.so` into the in-tree `headroom/` package
   so the editable install resolves `import headroom._core`.

Only the main `test` job needs this — `test-extras` and `test-agno`
run narrow subsets that don't construct `DiffCompressor`. The existing
`rust.yml` workflow continues to handle wheel builds for distribution
and `cargo test` for the Rust workspace.
`maturin develop` requires a virtualenv (it errors with "Couldn't find a
virtualenv or conda environment"). CI's setup-python provides a bare
system Python without a venv, so the dev script's build path doesn't
work there.

This switches CI to build a release wheel via `maturin build` and
install it with `pip install --force-reinstall --no-deps`. Then symlink
the installed `.so` into the in-tree `headroom/` package so the
editable install resolves `import headroom._core` past the source-dir
shadowing of site-packages.

`scripts/build_rust_extension.sh` is unchanged — it stays optimized for
local dev (where there IS a venv).
The previous version of this step did `python -c "import headroom._core"`
to find the wheel's installed `.so` path. That failed in CI:

    ModuleNotFoundError: No module named 'headroom._core'

— exactly the chicken-and-egg this step exists to fix. The editable
install (`pip install -e .`) puts the in-tree `headroom/` source dir
ahead of site-packages on `sys.path`. So `import headroom` finds the
in-tree dir (which doesn't yet have the `.so`), then
`import headroom._core` fails to find the submodule. The symlink we're
about to create is what makes the import work — but we can't import
to discover the symlink target before creating it.

Locate the `.so` via filesystem instead: read site-packages from
`site.getsitepackages()[0]`, glob for `_core.cpython-*.so` under
`<site-packages>/headroom/`, and symlink that into the in-tree dir.
The smoke test (`from headroom._core import DiffCompressor`) runs
*after* the symlink and confirms end-to-end resolution.

Also added `set -euo pipefail` and a sanity check with `ls -la` of the
site-packages dir if the glob comes up empty, so future failures
diagnose themselves.
@chopratejas chopratejas merged commit fcef84f into main Apr 26, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant