
fix: release image router models after compression#280

Open
eggrollofchaos wants to merge 1 commit into chopratejas:main from eggrollofchaos:codex/fix-image-model-release-clean

Conversation


@eggrollofchaos eggrollofchaos commented Apr 26, 2026

Description

Releases image-router model resources after image compression so the proxy does not keep one-off vision model loads resident for the lifetime of the process.

Fixes: N/A

Impact

Memory

With Headroom's default image stack, the retained models are google/siglip-base-patch16-224 (~400 MB) plus chopratejas/technique-router (~100 MB), using the size estimates already tracked in headroom/models/config.py. So the expected steady-state improvement after an image request is ~500 MB of reclaimable model memory, plus whatever the PyTorch/CPU/GPU/MPS backend allocator decides to hand back. Not a precise benchmark, but certainly nonzero.

Runtime behavior

For normal text-only proxy traffic, this should be a no-op. The change matters after image compression has caused the trained router / SigLIP stack to load; once that request finishes, Headroom drops those model refs and asks the shared registry + backend allocators to give the memory back.
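A minimal sketch of that per-request lifecycle (class and function names are stand-ins, not Headroom's real API):

```python
# Sketch of the create-use-close lifecycle described above.
# Names are stand-ins, not Headroom's real classes.
class ImageCompressor:
    def __init__(self):
        self._router = None  # lazy: loaded only when an image is actually seen

    def compress(self, messages):
        if self._router is None:
            self._router = object()  # stand-in for the SigLIP + router load
        return messages

    def close(self):
        # Drop model refs so the registry/allocator can reclaim the memory.
        self._router = None


def handle_image_request(messages):
    compressor = ImageCompressor()
    try:
        return compressor.compress(messages)
    finally:
        compressor.close()  # release once this request finishes
```

Text-only requests pay only the object allocation; the model load never triggers, and close() is a no-op.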

Tradeoff

The tradeoff is that bursty image-heavy workloads may reload the image stack more often. For this support-infra / local-agent case, that is the right default: don't keep half a gig warm forever because one image request went through once. If someone is running sustained image traffic, a future keep-warm knob may make sense.

Provenance

I was running the Headroom proxy with Codex on a machine that was also doing local LLM code review and other model-heavy work, and the goal was to track and reduce memory footprint. Codex noticed the image route could leave the image-model stack resident after a one-off image request, and decided (autonomously) to clone the source checkout and patch it locally. It seemed like a good idea, so I opened this PR to share the fix.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • Add targeted unload support to the ML model registry for image-related model prefixes
  • Add close/release paths for TrainedRouter and ImageCompressor
  • Avoid retaining proxy-level image compressors longer than needed in OpenAI, Anthropic, and content-router flows
  • Release image optimizer resources during proxy shutdown
  • Add regression coverage for image compressor release behavior, proxy helper lifecycle behavior, and shutdown cleanup

Testing

Describe the tests you ran to verify your changes:

  • Unit tests pass (pytest)
  • Linting passes (ruff check . targeted to changed files)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed

Test Output

uv run --frozen --extra dev python -m pytest \
  tests/test_image_compressor.py \
  tests/test_proxy_handler_helpers.py \
  tests/test_proxy_pipeline_lifecycle.py \
  -q

40 passed, 25 skipped in 3.93s
No warnings summary was reported by pytest.
uv run --frozen --extra dev ruff check \
  headroom/image/compressor.py \
  headroom/image/trained_router.py \
  headroom/models/ml_models.py \
  headroom/proxy/handlers/anthropic.py \
  headroom/proxy/handlers/openai.py \
  headroom/proxy/helpers.py \
  headroom/proxy/server.py \
  headroom/transforms/content_router.py \
  tests/test_image_compressor.py \
  tests/test_proxy_handler_helpers.py \
  tests/test_proxy_pipeline_lifecycle.py

All checks passed!

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable

Screenshots (if applicable)

N/A

Additional Notes

This PR intentionally avoids repo-wide formatting or line-ending normalization. The local checkout currently reports pre-existing CRLF/LF noise because .gitattributes declares *.py text eol=lf while some committed Python blobs still contain CRLF or mixed endings. The committed diff is limited to the image-router lifecycle fix and related tests.

The lifecycle change is intentionally boring in the best way: load image models when an image request needs them, then hand the memory back.

That keeps the proxy a better neighbor on machines already juggling local review or other model-heavy work.


@eggrollofchaos eggrollofchaos left a comment


Reviewing the lifecycle change. Overall the per-request create-and-close pattern is sound and the registry unload_prefix shutdown hook is well-placed. Two findings that look worth addressing before merge, plus a few smaller notes.

High — ContentRouter._image_optimizer cached, then closed in finally

File: headroom/transforms/content_router.py:1349,1413-1415

_get_image_optimizer caches the compressor on self._image_optimizer, but optimize_images_in_messages now always closes it in the finally. Two concurrent calls against the same ContentRouter instance both grab the cached compressor; whichever returns first calls compressor.close() → TrainedRouter.release_models(), which sets _classifier, _siglip_model, etc. to None while the second call is still inside compressor.compress(...). That second call will then hit AttributeError/NoneType failures partway through.

Today the only callers are tests, so it does not blow up in production — but the method is public on a hot-path class and the footgun is shaped exactly like _get_image_compressor() in proxy/helpers.py, which sidesteps the issue by creating a fresh instance every call. Two viable fixes:

  • Mirror the helpers.py pattern: drop the self._image_optimizer cache, instantiate per call, close in finally.
  • Keep the cache but stop closing it inline; release on ContentRouter shutdown instead.

Either is fine; the current shape is the worst of both worlds.
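The first option can be sketched as follows, with a minimal stand-in for ImageCompressor; the point is that nothing is cached on self, so no caller can close an instance another caller holds:

```python
class ImageCompressor:  # minimal stand-in for the real class
    def __init__(self):
        self.closed = False

    def compress(self, messages):
        return messages

    def close(self):
        self.closed = True


class ContentRouter:
    def _get_image_optimizer(self):
        # Fresh, caller-owned instance per optimization pass: mirrors the
        # helpers.py pattern, so a concurrent call can never close a
        # compressor another call is still using.
        return ImageCompressor()

    def optimize_images_in_messages(self, messages):
        compressor = self._get_image_optimizer()
        try:
            return compressor.compress(messages)
        finally:
            compressor.close()
```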

High — _default_compressor singleton contradicts the PR's intent

File: headroom/image/compressor.py:574-583, plus tests/test_image_compressor.py:780-784

The PR description says "Avoid retaining proxy-level image compressors longer than needed", but the module-level _default_compressor + get_compressor() still hand out a process-lifetime singleton that nothing ever closes. Any code path that touches get_compressor() (and the test test_get_compressor_singleton actively pins this behavior) keeps a TrainedRouter resident for the lifetime of the process — exactly what the rest of the diff is trying to prevent.

Two cleanups, pick one:

  • Remove _default_compressor, get_compressor(), and test_get_compressor_singleton. Repoint any remaining callers at compress_images() (which already does create-use-close) or have them instantiate directly.
  • If a singleton must stay for back-compat, register its close() with the proxy shutdown() next to the unload_prefix calls so it is not unbounded.
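The second option might look like this sketch; atexit stands in for the proxy's shutdown() hook, and the class is a stub:

```python
import atexit

class ImageCompressor:  # stand-in for the real class
    def close(self):
        pass  # would release the TrainedRouter/SigLIP refs


_default_compressor = None

def get_compressor():
    # Back-compat singleton, but with a bounded lifetime: close() is
    # registered with process shutdown so the retained models are not
    # resident forever. Registering with the proxy's shutdown() next to
    # the unload_prefix calls would work the same way.
    global _default_compressor
    if _default_compressor is None:
        _default_compressor = ImageCompressor()
        atexit.register(_default_compressor.close)
    return _default_compressor
```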

Medium — MLModelRegistry._release_runtime_memory hard-imports torch

File: headroom/models/ml_models.py:82-93

The bare import torch at the top of _release_runtime_memory will raise ImportError if the optional image stack isn't installed, which then propagates out of unload/unload_prefix. The rest of the codebase treats torch as optional (helpers.py wraps the image import in try/except ImportError). Suggest wrapping the torch block in with contextlib.suppress(ImportError): so a config without torch can still call unload_prefix during shutdown without crashing.

While here, when release_models unloads both classifier and siglip the registry runs gc.collect() + torch.cuda.empty_cache() twice in a row. Not a bug, but unload_prefix already exists — feel free to consolidate to one call so the runtime-memory pass runs once.

Medium — Per-request reload thrashing under image-heavy load

File: headroom/proxy/handlers/anthropic.py:712-725, openai.py:240-255

Every image request now loads classifier + SigLIP, runs once, releases, and the next image request reloads from scratch. The PR description acknowledges this; flagging it because under burst image traffic the cost is real (model load + torch device transfer per request). Worth either:

  • Adding a config knob (image_release_after_use: bool, default True) so heavy-image deployments can opt back into a longer-lived compressor.
  • Or an idle-timer release: keep the compressor warm for N seconds after last use, then unload.

Not a blocker, but I would not merge without at least the config knob if you expect any image-heavy users.
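The idle-timer option could be sketched like this; the names and the ttl default are hypothetical, and a production version would also need to avoid releasing a compressor that is mid-request:

```python
import threading
import time

class IdleReleaseCache:
    """Keep an object warm for `ttl` seconds after last use, then release."""

    def __init__(self, factory, ttl=30.0):
        self._factory = factory
        self._ttl = ttl
        self._obj = None
        self._last_use = 0.0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._obj is None:
                self._obj = self._factory()  # reload only after an idle gap
            self._last_use = time.monotonic()
            return self._obj

    def release_if_idle(self):
        # Call periodically, e.g. from a background timer thread.
        with self._lock:
            if self._obj is not None and time.monotonic() - self._last_use >= self._ttl:
                self._obj.close()
                self._obj = None
```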

Low

  • headroom/proxy/helpers.py:66 — name _get_image_compressor reads like a getter; it allocates per call. Consider _make_image_compressor / _create_image_compressor to make the lifecycle obvious to future readers.
  • headroom/proxy/handlers/anthropic.py:712 — compressor = _get_image_compressor() is outside the try. If that helper ever starts raising anything other than ImportError (which it currently catches), the compressor leaks. Cheap fix: pull the call inside the try.
  • headroom/models/ml_models.py:111-125 — unload_prefix returns list[str] of removed keys but neither shutdown caller logs them. A single logger.info at shutdown listing what was released would make the new behavior observable in proxy logs.

Nits

  • The CHANGELOG entry would be useful given this changes user-visible memory behavior.
  • release_models zeroes _text_embeddings etc. before calling MLModelRegistry.unload. The registry path doesn't depend on those being None, but the test test_release_models_unloads_registry_entries only checks the post-state — adding an ordering assertion (or a comment about why state-then-registry) would freeze the intent.

Tests look good — the lifecycle coverage in test_proxy_pipeline_lifecycle.py and test_proxy_handler_helpers.py is the right shape. Once the two High items are addressed, this is close to mergeable.

@eggrollofchaos eggrollofchaos force-pushed the codex/fix-image-model-release-clean branch 2 times, most recently from cc2efdf to e8d9b6f on April 26, 2026 at 23:58

Codex Fix — Round 1

Addressed the 2 High findings from the review and picked up the small safety notes that fit the same lifecycle change.

Changes

  • ContentRouter cached compressor then closed it per call — changed ContentRouter._get_image_optimizer() to create a fresh one-shot ImageCompressor for each optimization pass, so concurrent callers cannot share a compressor that another caller closes.
  • Module-level default compressor retained models indefinitely — removed the process-lifetime singleton behavior from get_compressor(); it now returns a fresh caller-owned compressor while preserving the public import.
  • Helper allocation outside cleanup guard — moved OpenAI and Anthropic compressor creation inside the try block with compressor = None initialization.
  • Optional torch cleanup clarity — made the torch import path explicit in MLModelRegistry._release_runtime_memory() so no-image-stack installs simply return after GC.
  • Duplicate runtime cleanup — added MLModelRegistry.unload_many() and routed TrainedRouter.release_models() through it, so classifier + SigLIP unload with one cleanup pass.
  • Shutdown observability — proxy shutdown now logs released image optimizer model keys when any are unloaded.

Verification

  • uv run --frozen --extra dev python -m pytest tests/test_image_compressor.py tests/test_proxy_handler_helpers.py tests/test_proxy_pipeline_lifecycle.py -q — 40 passed, 25 skipped
  • uv run --frozen --extra dev ruff check headroom/image/compressor.py headroom/image/trained_router.py headroom/models/ml_models.py headroom/proxy/handlers/anthropic.py headroom/proxy/handlers/openai.py headroom/proxy/server.py headroom/transforms/content_router.py tests/test_image_compressor.py — all checks passed

Deferred

  • Did not add an image-heavy deployment knob or idle timer in this PR. That is a product/performance policy choice; this PR stays focused on safe default release behavior and avoids widening config surface.

Fixes pushed in e8d9b6f.


@eggrollofchaos eggrollofchaos left a comment


Independent review for #280. Covers full file context, not just diff. No Critical/High findings — verdict: LGTM (commenting for transparency on Mediums/Lows).

Summary

Severity Count
Critical 0
High 0
Medium 4
Low 2

Medium

M1 — Per-request load/unload thrashing on bursty image traffic. Files: headroom/image/compressor.py:105-111, headroom/image/trained_router.py:207-229, headroom/proxy/handlers/{anthropic,openai}.py, headroom/transforms/content_router.py:1411-1414. With the new lifecycle, every image request loads technique-router + SigLIP, runs one compression pass, then immediately calls MLModelRegistry.unload_many(...) and best-effort gc.collect() / torch.{cuda,mps}.empty_cache(). For sparse image traffic this is the right call (and the PR description names it explicitly). For sustained image traffic (e.g. repeated multi-modal prompts in one agent loop), each request pays the full from_pretrained allocation cost and the runtime memory cleanup. There is no idle-timeout / TTL middle ground between "release immediately" and "pin forever". Suggested fix (follow-up, not blocking): add an optional idle-timeout cache so back-to-back image requests reuse models, while a configurable idle window (e.g. 30–60s) lets memory return after a quiet period. Worth a pm/backlog.md entry.

M2 — Dead self._image_optimizer = None in finally. File: headroom/transforms/content_router.py:1414. After the refactor, _get_image_optimizer() no longer assigns to self._image_optimizer (it constructs and returns directly). The _image_optimizer instance attribute is initialized to None at content_router.py:662 and never set to anything else, so resetting it in the finally is a no-op. Suggested fix: remove the line (and ideally remove the unused instance attribute at content_router.py:662), or — if you want to keep a defensive guard against future reintroduction — leave a one-line comment explaining the assignment is intentional. Test test_content_router_releases_image_optimizer_after_use asserts router._image_optimizer is None post-call, so removing the line still passes the assertion (the attribute starts at None), but the assertion stops verifying anything meaningful — consider replacing it with fake.close.assert_called_once() only.

M3 — Test mutates module global without cleanup. File: tests/test_proxy_handler_helpers.py:159-172 (test_proxy_helper_creates_fresh_image_compressors). Sets helpers._image_compressor_available = None to reset state but never restores it after the test. After this test runs, _image_compressor_available is left as True (the import inside the patch context succeeds, since patching the module attribute does take effect). Subsequent tests that depend on the "first-call logs once" behavior or on the import-failure short-circuit may see stale memo state. Suggested fix: wrap in a small fixture that snapshots/restores helpers._image_compressor_available, or use monkeypatch.setattr(helpers, "_image_compressor_available", None). Not a current failure, just an order-dependence trap.
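The snapshot/restore shape, sketched against a stand-in helpers namespace (pytest's monkeypatch.setattr performs the same teardown automatically):

```python
import contextlib
import types

# Stand-in for the real helpers module and its memo attribute.
helpers = types.SimpleNamespace(_image_compressor_available=True)

@contextlib.contextmanager
def reset_compressor_memo():
    # Snapshot the module global, reset it for the test body, and
    # restore it on exit even if the test fails.
    saved = helpers._image_compressor_available
    helpers._image_compressor_available = None
    try:
        yield
    finally:
        helpers._image_compressor_available = saved
```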

M4 — Unnecessary contextlib.suppress(Exception) around gc.collect(). File: headroom/image/trained_router.py:226-229. The non-registry branch of release_models does:

with contextlib.suppress(Exception):
    import gc
    gc.collect()

import gc is a stdlib import that cannot fail at runtime in a working interpreter, and gc.collect() returns an int — it doesn't raise. The suppress masks no real failure mode. Suggested fix: drop the suppress and the local import (move import gc to module top, where import gc already exists in ml_models.py — and could exist here too). Minor, but error-masking patterns are worth pruning.

Low

L1 — Inconsistent default local_path resolution between callers. headroom/image/trained_router.py:174-179 resolves the local technique-router path as Path(__file__).parent.parent.parent / "models" / "technique-router-mini" / "final" (file-relative absolute path), while headroom/models/ml_models.py:282 resolves it as Path("headroom/models/technique-router-mini/final/") (CWD-relative). Today this never triggers a mismatch because TrainedRouter._load_models always passes a fully-resolved model_id to MLModelRegistry.get_technique_router(model_path=...), so the registry never reaches its own if model_path is None branch. If any other caller invokes MLModelRegistry.get_technique_router(model_path=None) directly while the local path exists, it will use a different cache key than TrainedRouter would produce, creating a duplicate entry the prefix-based shutdown unload would still cover via unload_prefix("technique_router:") — so still safe at shutdown, but routers and registry could end up with two entries for the "same" model. Not introduced by this PR; worth a small follow-up to centralize the resolution. Suggested fix: extract _resolve_default_technique_router_path() and reuse it from both sites.
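The suggested extraction could be as small as this sketch (name and signature hypothetical; the real helper would anchor on the package's __file__ rather than take an argument):

```python
from pathlib import Path

def resolve_default_technique_router_path(package_root: Path) -> Path:
    # One shared resolution both TrainedRouter and MLModelRegistry call,
    # so the registry cache key is identical regardless of working
    # directory. resolve() pins the path to an absolute form.
    return (package_root / "models" / "technique-router-mini" / "final").resolve()
```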

L2 — Public API semantic change for get_compressor(). File: headroom/image/compressor.py:574-580. Previous behavior: returned a process-wide singleton. New behavior: returns a fresh instance each call. The new docstring notes "Kept for backwards-compatible imports", but this is a quiet behavior change — any external consumer that imports get_compressor and relies on identity / shared last_result state will silently break. Suggested fix: optional — add a DeprecationWarning on get_compressor() pointing callers at ImageCompressor() directly, since the helper now adds nothing. Or note in CHANGELOG that the function returns a fresh instance now.
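A sketch of the DeprecationWarning option, with a stub class standing in for the real ImageCompressor:

```python
import warnings

class ImageCompressor:  # stand-in for the real class
    pass

def get_compressor() -> ImageCompressor:
    # Kept for backwards-compatible imports; no longer a singleton.
    warnings.warn(
        "get_compressor() now returns a fresh instance; "
        "construct ImageCompressor() directly.",
        DeprecationWarning,
        stacklevel=2,
    )
    return ImageCompressor()
```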

Verified positively

  • Cache key consistency across trained_router._classifier_key / _siglip_key and MLModelRegistry.get_technique_router / get_siglip keys (both build f"technique_router:{model_path}" and f"siglip:{model_name}" from the same resolved values when TrainedRouter is the caller).
  • Idempotence of release_models (re-call sees classifier_key=None, filters to [], unload_many([]) short-circuits without runtime cleanup).
  • Order in release_models — local refs nulled before registry unload, so registry can actually free memory.
  • Order in proxy/server.py:shutdown — http client + memory handler closed before model unload, so no in-flight request is using a freshly-evicted registry entry. contextlib.suppress(Exception) around the unload is defensible (shutdown should not raise).
  • unload_many and unload_prefix thread-safety via instance._model_lock.
  • _release_runtime_memory defensively guards torch.cuda.is_available() and torch.mps.empty_cache() (handles both backends and missing-attr versions).
  • Try/finally placement in both proxy handlers is correct: compressor = None defaulted before the try, _get_image_compressor() inside the try (so a future failure during construction also closes any partial state), compressor.close() guarded by compressor and hasattr(compressor, "close") in the finally.
  • Behaviour on text-only (no-images) requests: fresh ImageCompressor() is constructed → has_images() returns False → close() early-returns because self._router is None. Cost is one object allocation; no model load triggered. Good.
  • compress_images convenience function constructs ImageCompressor() directly rather than going through get_compressor(), decoupling it from the L2 behavior change.
  • Test coverage for the close path: test_release_models_unloads_registry_entries, test_compress_images_function_closes_temporary_compressor, test_proxy_helper_creates_fresh_image_compressors, test_content_router_releases_image_optimizer_after_use, test_proxy_shutdown_unloads_image_models — comprehensive.

Verdict

LGTM — approve. The PR is well-scoped to the stated lifecycle fix, the tradeoff (per-request load vs. memory pinning) is intentional and documented, and the test surface covers the new release path on every code site that constructs a compressor. None of the Mediums block merge; M1 is worth a follow-up backlog entry, M2/M3/M4 are cleanups, L1/L2 are defensive notes.

@eggrollofchaos eggrollofchaos force-pushed the codex/fix-image-model-release-clean branch from e8d9b6f to 3501616 on April 27, 2026 at 00:15

Codex Fix — Round 2

Cleaned up the easy Medium notes from the independent review.

Changes

  • M2 dead ContentRouter assignment — removed the unused _image_optimizer attribute and the no-op reset in finally; the close assertion now carries the test.
  • M3 test global restore — changed test_proxy_helper_creates_fresh_image_compressors to use monkeypatch so _image_compressor_available is restored after the test.
  • M4 unnecessary suppress — replaced the contextlib.suppress(Exception) wrapper in release_models(unload_registry=False) with a direct gc.collect() import/use.
  • L2 public behavior note — expanded the CHANGELOG entry to call out that get_compressor() now returns a fresh caller-owned compressor instead of a process-lifetime singleton.

Verification

  • uv run --frozen --extra dev python -m pytest tests/test_image_compressor.py tests/test_proxy_handler_helpers.py tests/test_proxy_pipeline_lifecycle.py -q — 40 passed, 25 skipped
  • uv run --frozen --extra dev ruff check headroom/image/compressor.py headroom/image/trained_router.py headroom/models/ml_models.py headroom/proxy/handlers/anthropic.py headroom/proxy/handlers/openai.py headroom/proxy/server.py headroom/transforms/content_router.py tests/test_image_compressor.py tests/test_proxy_handler_helpers.py — all checks passed

Remaining deferred notes

  • M1 image-heavy idle cache/config knob — deferred as follow-up/product policy. This PR intentionally favors returning memory after image work.
  • L1 default local path resolution cleanup — pre-existing and safe at shutdown; deferred.

Fixes pushed in 3501616.


@eggrollofchaos eggrollofchaos left a comment


Final Confirmation Review (commit 3501616)

Independent re-pass against the full PR, not just the Round 2 delta. No new Critical/High findings — verdict: LGTM (commenting for transparency on the small deltas).

Summary

Severity Count
Critical 0
High 0
Medium 0
Low 2

Round 2 fixes verified

  • M2 (ContentRouter dead _image_optimizer) — instance attribute removed at headroom/transforms/content_router.py:654-664; optimize_images_in_messages no longer touches self._image_optimizer in finally. The matching test test_content_router_releases_image_optimizer_after_use (tests/test_image_compressor.py:970-988) now asserts fake.close.assert_called_once() only — the meaningless post-state check is gone. ✅
  • M3 (test global mutation)tests/test_proxy_handler_helpers.py:159-172 uses monkeypatch.setattr(helpers, "_image_compressor_available", None). Restores cleanly after the test. ✅
  • M4 (unnecessary contextlib.suppress(Exception) around gc.collect())headroom/image/trained_router.py:220-226 now does else: gc.collect() directly; module-level import gc is in place at headroom/image/trained_router.py:13. No suppress, no re-import. ✅
  • L2 (public-API behaviour change for get_compressor())CHANGELOG.md:19-23 explicitly calls out the singleton → fresh-instance shift. ✅

Other invariants spot-checked on this commit

  • ImageCompressor.close() at headroom/image/compressor.py:105-111 is a no-op when _router is None, so text-only requests cost only one allocation.
  • TrainedRouter.release_models() at headroom/image/trained_router.py:207-228 nulls local refs before calling MLModelRegistry.unload_many — registry can free, idempotent on repeat calls (classifier_key=None → keys=[] → unload_many([]) short-circuits).
  • MLModelRegistry.unload_many / unload_prefix (headroom/models/ml_models.py:103-136) hold instance._model_lock during mutation and only run _release_runtime_memory() when something was actually evicted.
  • Proxy handlers (headroom/proxy/handlers/anthropic.py:712-726, headroom/proxy/handlers/openai.py:242-256) initialise compressor = None before the try, allocate inside the try, and guard the finally with compressor and hasattr(compressor, "close"). Construction failures cannot leak.
  • ProxyServer.shutdown (headroom/proxy/server.py:846-853) wraps the unload pass in contextlib.suppress(Exception) (defensible at shutdown) and logs the released keys.

Low

  • L-new-1 — test_content_router_loads_image_compressor / ..._is_image_compressor allocate but never close. tests/test_image_compressor.py:920-943. With the new lifecycle each _get_image_optimizer() call is supposed to be caller-owned. These two tests construct compressors and discard them. No model load happens (compress isn't called), so it's hygiene rather than a leak. Optional follow-up: add compressor.close() for symmetry, or wrap in a try/finally.
  • L-new-2 — PR currently mergeable: CONFLICTING against main. Behind origin/main by ~10 commits (last merged: #277 rust-stage-3b-diff-bridge). The PR body documents pre-existing CRLF/LF noise as intentional, but the underlying CONFLICTING state still needs a rebase before squash-merge. Not a code finding; flagging so it isn't missed at merge time.

Carried-over deferred items (acknowledged, not blocking)

  • M1 (carry-forward) — per-request load/unload thrashing on bursty image traffic. Idle-timeout/TTL warm cache is a worthwhile follow-up; backlog entry recommended.
  • L1 (carry-forward) — default local technique-router path drift between TrainedRouter._load_models and MLModelRegistry.get_technique_router(model_path=None). Pre-existing; safe at shutdown via unload_prefix("technique_router:"). Centralise in a follow-up.

Verdict

LGTM — approve. Round 2 fixes are correctly applied, no regressions introduced, lifecycle invariants hold. Triggering auto-approval workflow.

@eggrollofchaos eggrollofchaos force-pushed the codex/fix-image-model-release-clean branch from 3501616 to 4ff2e8e on April 27, 2026 at 02:45

Rebase Update

Rebased the PR onto current origin/main after upstream merged the Rust-stage work and line-ending renormalization.

Conflict resolution

Resolved conflicts in:

  • headroom/proxy/handlers/openai.py
  • headroom/proxy/server.py
  • headroom/transforms/content_router.py
  • tests/test_proxy_handler_helpers.py
  • tests/test_proxy_pipeline_lifecycle.py

Kept upstream's current main changes and reapplied only the image-model lifecycle cleanup.

Verification

  • uv run --frozen --extra dev python -m pytest tests/test_image_compressor.py tests/test_proxy_handler_helpers.py tests/test_proxy_pipeline_lifecycle.py -q — 40 passed, 25 skipped
  • uv run --frozen --extra dev ruff check headroom/image/compressor.py headroom/image/trained_router.py headroom/models/ml_models.py headroom/proxy/handlers/anthropic.py headroom/proxy/handlers/openai.py headroom/proxy/helpers.py headroom/proxy/server.py headroom/transforms/content_router.py tests/test_image_compressor.py tests/test_proxy_handler_helpers.py tests/test_proxy_pipeline_lifecycle.py — all checks passed

Pushed as 4ff2e8e.


@eggrollofchaos eggrollofchaos left a comment


Re-reviewed against 4ff2e8e. All v1 findings addressed.

Resolved

  • High — ContentRouter._image_optimizer race: cache attribute removed entirely; _get_image_optimizer returns fresh instance. New test_content_router_compressor_is_fresh_per_call and test_content_router_releases_image_optimizer_after_use pin both halves of the contract.
  • High — _default_compressor singleton leak: singleton dropped, get_compressor() returns fresh instance, compress_images() wraps in try/finally with close(). Old singleton test replaced with test_get_compressor_returns_fresh_instance; new test_compress_images_function_closes_temporary_compressor pins the close path.
  • Medium — torch hard-import: wrapped in try/except ImportError with early return. Bonus: outer torch ops now wrapped in contextlib.suppress(Exception) for defensive teardown.
  • Medium — duplicate gc/empty_cache passes: new unload_many consolidates both keys behind one runtime cleanup. release_models uses it.
  • Low — _get_image_compressor() outside try: both anthropic and openai handlers now init compressor = None first, assign inside try. Cleaner.
  • Low — silent unload: server.py shutdown now logs Released image optimizer models: {keys}.
  • Nit — CHANGELOG: entry added explaining lifecycle change and get_compressor() semantics shift.

Deferred by design

  • Medium — per-request reload thrashing: no config knob added. PR description acknowledges the trade-off and the user-facing tone is "good neighbor on memory". Acceptable as-is; flag for follow-up if image-burst workloads ever become a deployment target.

New observations on v2

  • close = release_models alias on TrainedRouter (headroom/image/trained_router.py:225) is a direct attribute bind rather than a method wrapper. Subclasses overriding release_models won't see the override via close. Tiny footgun, only matters if subclassing happens later — feel free to convert to a one-line method if you want subclass-safety, otherwise fine.
  • helpers._image_compressor_available stays sticky across the process lifetime. If ImageCompressor import ever fails once and then becomes available later (e.g., user installs the optional extra mid-session), the helper will keep returning None. Practically never happens; skipping unless someone hits it.
  • Shutdown's contextlib.suppress(Exception) wraps both registry calls and the logger.info. If logging happens to fail during teardown, suppression hides it — defensible for a shutdown path; documenting here so reviewers don't get surprised.
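The alias-vs-wrapper footgun from the first bullet can be demonstrated with stand-in classes:

```python
class AliasedRouter:
    def release_models(self):
        return "base release"

    close = release_models  # direct bind: fixed at class-creation time


class WrappedRouter:
    def release_models(self):
        return "base release"

    def close(self):
        # One-line wrapper: dispatches through self, so a subclass
        # override of release_models is honored via close().
        return self.release_models()


class AliasedSub(AliasedRouter):
    def release_models(self):
        return "subclass release"


class WrappedSub(WrappedRouter):
    def release_models(self):
        return "subclass release"
```

With the alias, AliasedSub().close() still runs the base release_models; the wrapper form picks up the override.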

LGTM

Clean lifecycle, tests cover the intent, shutdown observable. Approve from my side.

@eggrollofchaos eggrollofchaos force-pushed the codex/fix-image-model-release-clean branch from 4ff2e8e to 8a3591d on April 27, 2026 at 03:14
@eggrollofchaos eggrollofchaos marked this pull request as ready for review April 27, 2026 03:15