
feat(proxy): support production multi-worker proxy startup + CLI/env tuning#279

Open
Kayzo wants to merge 2 commits into chopratejas:main from Kayzo:feature/production-multi-worker-proxy-startup

Conversation

Contributor

@Kayzo Kayzo commented Apr 26, 2026

Summary

Adds first-class support for running the Headroom proxy with multiple Uvicorn workers through the official headroom proxy CLI and the Docker image, so Headroom can be used as a shared local proxy for multi-session / multi-subagent
coding-agent setups without needing a custom launcher.

This was requested in the issue that proposed CLI flags + env vars for workers and connection pool sizing on headroom proxy, and documented the workaround of running a separate ASGI launcher under python -m uvicorn.

Motivation

Running Headroom as a shared proxy in front of multiple agent sessions (Codex/OpenAI, Anthropic, Gemini) is CPU-bound during compression/tokenization. With a single worker, the proxy can bottleneck even when the host has many cores and RAM
available.

Before this PR:

  • headroom proxy offered no way to scale workers, concurrency, or connection pool.
  • Setting HEADROOM_WORKERS as an env var had no effect because the CLI did not read it.
  • Passing --workers N directly to python -m headroom.proxy.server failed because Uvicorn requires an import string to enable multiple workers, not an instantiated app.
  • Users had to write a custom ASGI module + override the Docker entrypoint to scale at all.

Changes

headroom/cli/proxy.py

  • New CLI flags + env vars on the headroom proxy command:
    • --workers / HEADROOM_WORKERS (default 1)
    • --limit-concurrency / HEADROOM_LIMIT_CONCURRENCY (default 1000)
    • --max-connections / HEADROOM_MAX_CONNECTIONS (default 500)
    • --max-keepalive / HEADROOM_MAX_KEEPALIVE (default 100)
  • CLI flags override env vars via Click precedence.
  • workers and limit_concurrency are forwarded to run_server(...) only when non-default, so default single-worker behavior is byte-identical to before.
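The CLI > env > default precedence can be sketched as follows. This is a minimal stdlib illustration under stated assumptions: `resolve_setting` is a hypothetical helper, not Headroom's actual code; the real CLI gets the same behavior from Click's `envvar=` option support.

```python
import os
from typing import Optional

# Sketch of the precedence the CLI implements: an explicit flag value
# wins over the env var, which wins over the built-in default.
# (`resolve_setting` is illustrative only; Click handles this natively.)
def resolve_setting(cli_value: Optional[int], env_name: str, default: int) -> int:
    if cli_value is not None:         # explicit --flag beats everything
        return cli_value
    env_value = os.environ.get(env_name)
    if env_value is not None:         # env var beats the default
        return int(env_value)
    return default
```

For example, with `HEADROOM_WORKERS=4` exported, `resolve_setting(None, "HEADROOM_WORKERS", 1)` yields 4, while an explicit `--workers 8` (a non-None `cli_value`) yields 8.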

headroom/proxy/server.py

  • When workers > 1, run_server(...) serializes the full ProxyConfig into an internal env var HEADROOM_PROXY_CONFIG_JSON and starts Uvicorn with:
    • app_target = "headroom.proxy.server:create_app_from_env"
    • factory=True
    • workers=N, limit_concurrency=M, plus the existing proxy_headers=False.
  • When workers == 1, the old behavior is preserved: create_app(config) is called directly and passed as the app object.
  • New helpers:
    • _json_ready(value) — recursive dataclass → JSON-safe conversion.
    • _proxy_config_payload(config) — builds a serializable dict for all ProxyConfig fields. Verified to preserve all 92 fields.
    • proxy_config_from_env() — reconstructs ProxyConfig from HEADROOM_PROXY_CONFIG_JSON; falls back to key HEADROOM* env vars for direct Uvicorn entrypoint usage.
    • create_app_from_env() — public ASGI factory referenced by the import string.

This means in multi-worker mode, each Uvicorn worker process re-creates the exact same Headroom app with the exact same ProxyConfig used by the parent CLI process.
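The parent-to-worker config handoff can be sketched like this. It is a simplified stand-in: the real ProxyConfig has 92 fields and nested dataclasses (hence the `_json_ready` helper), while this two-field version only demonstrates the env-var round trip.

```python
import dataclasses
import json
import os

@dataclasses.dataclass
class ProxyConfig:  # stand-in for Headroom's real 92-field config
    host: str = "127.0.0.1"
    port: int = 8787

def publish_config(config: ProxyConfig) -> None:
    # parent CLI process: runs just before uvicorn.run(..., factory=True)
    os.environ["HEADROOM_PROXY_CONFIG_JSON"] = json.dumps(
        dataclasses.asdict(config)
    )

def proxy_config_from_env() -> ProxyConfig:
    # each worker process: rebuild the exact config the parent serialized
    payload = json.loads(os.environ["HEADROOM_PROXY_CONFIG_JSON"])
    return ProxyConfig(**payload)
```

With the real code, `uvicorn.run("headroom.proxy.server:create_app_from_env", factory=True, workers=N, ...)` imports the factory fresh in every worker process, and the factory reads the env var before building the app, so all workers start from an identical config.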

headroom/proxy/loopback_guard.py

  • Bugfix: is_loopback_host("::ffff:127.0.0.1") now returns True on Linux dual-stack sockets as its docstring and existing tests promised. Uses IPv6Address.ipv4_mapped.is_loopback explicitly instead of relying on IPv6Address.is_loopback,
    which returns False for mapped literals.
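The fix can be sketched with the stdlib ipaddress module. This is a minimal reimplementation for illustration; the function name mirrors the PR, but the body is an assumption about how the guard works.

```python
import ipaddress

def is_loopback_host(host: str) -> bool:
    """True for loopback literals, including IPv4-mapped ones like ::ffff:127.0.0.1."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # not an IP literal; treat the conventional hostname as loopback
        return host.lower() == "localhost"
    if isinstance(addr, ipaddress.IPv6Address):
        mapped = addr.ipv4_mapped   # IPv4Address for ::ffff:a.b.c.d, else None
        if mapped is not None:
            return mapped.is_loopback  # check the embedded IPv4 address
    return addr.is_loopback
```

The explicit `ipv4_mapped` check matters because `IPv6Address("::ffff:127.0.0.1").is_loopback` only tests against `::1` on the affected Python versions, so mapped loopback literals report False without it.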

Tests

  • tests/test_proxy_scalability.py::test_run_server_uses_import_string_for_multiple_workers:
    • Verifies that with workers > 1, uvicorn.run receives the import string, factory=True, correct workers/limit_concurrency, and that HEADROOM_PROXY_CONFIG_JSON contains the expected config.
  • tests/test_cli_proxy_env.py::test_production_scaling_env_vars and
    tests/test_cli_proxy_env.py::test_production_scaling_cli_flags_override_env_vars:
    • Verify env → CLI → ProxyConfig + run_server(...) kwargs wiring and precedence.
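The shape of the scalability test can be sketched with a mock standing in for `uvicorn.run`. This is fully self-contained and simplified: `run_server` below takes its serve callable as a parameter so the sketch needs no real server, unlike Headroom's actual `run_server(...)`.

```python
from unittest import mock

def run_server(serve, workers: int = 1, limit_concurrency: int = 1000) -> None:
    # simplified dispatch: import string + factory for multi-worker,
    # direct app object for the single-worker path
    if workers > 1:
        serve("headroom.proxy.server:create_app_from_env",
              factory=True, workers=workers,
              limit_concurrency=limit_concurrency)
    else:
        serve(object())

# the test patches the serve callable and asserts on what it received
fake_run = mock.Mock()
run_server(fake_run, workers=4, limit_concurrency=250)
fake_run.assert_called_once_with(
    "headroom.proxy.server:create_app_from_env",
    factory=True, workers=4, limit_concurrency=250,
)
```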

Backwards compatibility

  • Single-worker path is unchanged; same create_app(config) + direct Uvicorn call.
  • run_server(...) signature preserved (workers=1, limit_concurrency=1000).
  • No route or response-shape changes.
  • No changes to request handlers, compression pipeline, memory, or backends.

Docker / production usage

  services:
    headroom-proxy:
      image: ghcr.io/chopratejas/headroom:latest
      environment:
        HEADROOM_WORKERS: "4"
        HEADROOM_MAX_CONNECTIONS: "200"
        HEADROOM_MAX_KEEPALIVE: "50"
        HEADROOM_LIMIT_CONCURRENCY: "250"
      ports:
        - "8787:8787"

Verified locally: built the image from this branch with the runtime stage; container boots workers=2, shows two real child processes, and serves /health from both workers. Banner reports the configured values:

  Workers: 2    Concurrency Limit: 25
  Conn Pool: max=33, keepalive=7

Caveats

Known behavior in multi-worker mode, not a regression from this PR but worth calling out for users:

  • In-memory state (cache, rate limiter buckets, WebSocket registry, display session, recent requests) is per worker.
  • Dashboard and stats views may appear to change as requests land on different workers.
  • Worker-aware session management and cross-worker stats aggregation are planned as a separate follow-up issue/branch — intentionally out of scope here to keep this change focused on startup.

Validation

  • Config round-trip: all 92 ProxyConfig fields survive JSON round-trip.
  • Full proxy/CLI/health/debug/WS/backpressure targeted test sets pass:
    • test_cli_proxy_env.py, test_proxy_scalability.py, test_proxy_healthchecks.py, test_proxy_debug_endpoints.py, test_ws_session_registry.py, test_anthropic_pre_upstream_backpressure.py.
  • Broader proxy + Codex/WS suite: 273 passed, 103 skipped (skipped are environment-dependent / external).
  • Runtime startup smoke: 2-worker host startup, two real workers serving /health.
  • Docker startup smoke: 2-worker container from this branch, two real workers serving /health, env vars honored by headroom proxy.


codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 68.00000% with 16 lines in your changes missing coverage. Please review.

  Files with missing lines    Patch %   Lines
  headroom/proxy/server.py    55.55%    13 Missing and 3 partials ⚠️


@chopratejas
Owner

Fantastic work, @Kayzo, thank you!

There seems to be a linting error; can you help fix that?

