add gateway request hedging, coalescing, health checks, admission control #130

@cchinchilla-dev

Description

ProviderGateway (src/agentloom/providers/gateway.py) routes by static priority + fallback chain. #50 (strategy-based provider selection) covers cost/latency/priority strategies and #49 covers multi-key round-robin for rate limit distribution. Even with both of those, the gateway lacks several production-grade routing primitives:

  • Request hedging. Send the same request to two providers in parallel; use whichever responds first; cancel the other. Standard SRE pattern for tail-latency reduction. Today the gateway tries one provider at a time — if the first hangs near its timeout, the user waits the full timeout before fallback kicks in.
  • Request coalescing. Two callers with identical (messages, model, params) arriving simultaneously result in two upstream API calls. Coalescing collapses them to one, fanning out the response. For batched evaluation harnesses (PhD's H4 testing the same prompt across many scenarios) this can halve cost.
  • Proactive health checks. Today the circuit breaker only learns about provider health by attempting real requests. A provider that's been down for 30 minutes only gets retested when the next workflow happens to route to it. Periodic background probes detect recovery faster.
  • Admission control / backpressure. Under burst load (1000 workflows started in the same second), all 1000 hit the gateway, all 1000 enqueue at the rate limiter, all 1000 wait indefinitely. There is no "reject if queue depth > N" — the system has no graceful degradation under overload.

Proposal

Four independent gateway features that can ship in any order. Each is small enough for a single PR.

1. Request hedging:

- id: critical_step
  type: llm_call
  prompt: "..."
  hedge:
    enabled: true
    delay_ms: 500          # wait this long before firing the second request
    max_parallel: 2        # maximum number of providers in the race

Implementation: in gateway.complete(), when hedge.enabled, fire the primary request immediately and start a timer. After delay_ms, fire the second request to the next candidate. Race them via anyio.move_on_after + task group. First to return wins; the other is cancelled. Cancellation goes through the same GeneratorExit path as #106 — it must NOT count as a circuit breaker failure.

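Default off (opt-in per step). Cost doubles whenever the primary has not responded within delay_ms and the second request is fired, since both requests may be billed even though one is cancelled (rare but real).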
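A minimal sketch of the staggered race described above, assuming each candidate provider can be wrapped as a zero-argument coroutine factory. It uses stdlib asyncio (asyncio.wait with FIRST_COMPLETED) instead of the anyio task group the issue proposes, so it runs standalone; hedged_call and its parameters are hypothetical names, not existing AgentLoom APIs.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def hedged_call(
    candidates: Sequence[Callable[[], Awaitable[T]]],
    delay_s: float,
    max_parallel: int = 2,
) -> T:
    """Race up to max_parallel candidates, staggering their starts by delay_s.

    candidates[0] starts immediately; each further candidate starts once
    delay_s passes without a result, or as soon as an earlier attempt fails.
    The first success wins and every other attempt is cancelled.
    """
    if not candidates or max_parallel < 1:
        raise ValueError("need at least one candidate and max_parallel >= 1")
    limit = min(max_parallel, len(candidates))
    pending: set[asyncio.Task[T]] = set()
    launched = 0
    last_exc: BaseException | None = None
    try:
        while True:
            if launched < limit:
                pending.add(asyncio.ensure_future(candidates[launched]()))
                launched += 1
            # Only time out (and fire the next hedge) if another candidate is left.
            timeout = delay_s if launched < limit else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            # Read every exception so failed tasks are marked as retrieved.
            results = [(task, task.exception()) for task in done]
            for task, exc in results:
                if exc is None:
                    return task.result()
                last_exc = exc
            if not pending and launched >= limit:
                assert last_exc is not None
                raise last_exc  # every candidate failed
    finally:
        # Cancel the losers and let them unwind. A real gateway must not
        # record this CancelledError as a circuit-breaker failure.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
```

A hard error from the primary starts the next candidate immediately rather than waiting out delay_s, so hedging never makes plain fallback slower than today.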

2. Request coalescing:

config:
  coalescing:
    enabled: true
    window_ms: 100         # requests within this window with identical key are coalesced

Implementation: in gateway.complete(), hash (messages, model, temperature, max_tokens, kwargs). If an in-flight request with the same hash exists and started within window_ms, await its result instead of issuing a new request. Both callers receive the same ProviderResponse. Cost is attributed evenly across coalesced callers (or, simpler: to the first caller; document the choice).

Default off — opt-in via config. Useful only when AgentLoom processes are long-lived (servers, batch runs).
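One way to build the in-flight map, sketched with asyncio and assuming the gateway can wrap each upstream call as a coroutine: the first caller for a key starts a shared task, later callers within window_ms await that same task, and the entry is dropped as soon as the call finishes, so this is coalescing rather than caching. RequestCoalescer and request_key are hypothetical names, and cost attribution is left out.

```python
from __future__ import annotations

import asyncio
import hashlib
import json
import time
from typing import Any, Awaitable, Callable


def request_key(messages: Any, model: str, temperature: float, max_tokens: int, **kwargs: Any) -> str:
    """Stable hash over every input that affects the upstream response."""
    payload = json.dumps(
        {"messages": messages, "model": model, "temperature": temperature,
         "max_tokens": max_tokens, "kwargs": kwargs},
        sort_keys=True, default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RequestCoalescer:
    """Share one in-flight upstream call among identical concurrent requests."""

    def __init__(self, window_ms: float = 100.0) -> None:
        self.window_s = window_ms / 1000.0
        # key -> (start time, shared task)
        self._inflight: dict[str, tuple[float, asyncio.Future[Any]]] = {}

    async def run(self, key: str, call: Callable[[], Awaitable[Any]]) -> Any:
        now = time.monotonic()
        entry = self._inflight.get(key)
        if entry is not None and now - entry[0] <= self.window_s:
            shared = entry[1]  # join the request that is already in flight
        else:
            shared = asyncio.ensure_future(call())  # become the leader
            self._inflight[key] = (now, shared)
            shared.add_done_callback(lambda fut, k=key: self._forget(k, fut))
        # shield(): one caller being cancelled must not cancel the shared call.
        # Both results and exceptions fan out to every waiter.
        return await asyncio.shield(shared)

    def _forget(self, key: str, fut: asyncio.Future[Any]) -> None:
        # Drop the entry only if a newer call has not replaced it.
        entry = self._inflight.get(key)
        if entry is not None and entry[1] is fut:
            del self._inflight[key]
```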

3. Proactive health checks:

class ProviderEntry:
    health_check_interval_s: float = 60.0
    health_check_endpoint: str = "/health"   # or a cheap "ping" model call

A background task per provider periodically probes — sends a minimal request and records latency. On failure, increments a "passive failure" counter that, if it crosses a threshold, opens the circuit even without real workflow traffic. On success in OPEN state, transitions to HALF_OPEN faster than the time-based recovery would allow.

Stops probing when the process is idle (no active workflows for N minutes), so probes don't burn quota while nothing is running.
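A sketch of the per-provider probe loop, assuming a probe coroutine (for example the cheap model call mentioned above) and a callable that reports when workflow activity last happened. HealthChecker, CircuitState, failure_threshold and idle_after_s are hypothetical names; a real implementation would drive the existing circuit breaker instead of keeping its own state field.

```python
from __future__ import annotations

import asyncio
import time
from enum import Enum
from typing import Awaitable, Callable


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class HealthChecker:
    """Periodically probe one provider and feed the result into circuit state."""

    def __init__(
        self,
        probe: Callable[[], Awaitable[object]],
        last_activity: Callable[[], float],
        interval_s: float = 60.0,
        failure_threshold: int = 3,
        idle_after_s: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.probe = probe
        self.last_activity = last_activity  # timestamp of the last workflow activity
        self.interval_s = interval_s
        self.failure_threshold = failure_threshold
        self.idle_after_s = idle_after_s
        self.clock = clock
        self.state = CircuitState.CLOSED
        self.passive_failures = 0
        self.last_latency_s: float | None = None

    async def probe_once(self) -> bool:
        start = self.clock()
        try:
            await self.probe()
        except Exception:
            self.passive_failures += 1
            if self.passive_failures >= self.failure_threshold:
                self.state = CircuitState.OPEN  # opened without any real traffic
            return False
        self.last_latency_s = self.clock() - start
        self.passive_failures = 0
        if self.state is CircuitState.OPEN:
            self.state = CircuitState.HALF_OPEN  # skip the time-based recovery wait
        return True

    def is_idle(self) -> bool:
        return self.clock() - self.last_activity() > self.idle_after_s

    async def run(self) -> None:
        # Probe until no workflow has been active for idle_after_s.
        while not self.is_idle():
            await self.probe_once()
            await asyncio.sleep(self.interval_s)
```

Once run() returns because the process went idle, something (such as the next workflow start) would have to restart it; that wiring is outside this sketch.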

4. Admission control:

config:
  admission:
    max_queue_depth: 100     # per provider
    on_overflow: reject      # or "shed_oldest" / "block"

Implementation: the rate limiter tracks pending acquires. Once the count reaches max_queue_depth, the next acquire() either:

  • reject: raises AdmissionRejectedError immediately. Caller decides what to do (typically: fail fast).
  • shed_oldest: cancels the oldest waiting request (it gets AdmissionRejectedError); admits the new one.
  • block: existing behavior (wait indefinitely).

Default is block, for backward compatibility. Production deployments should configure reject so they fail fast under overload instead of building up a multi-minute backlog.
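A sketch of the three overflow policies, assuming a simple slot-based limiter stands in for the real limiter in rate_limiter.py (token-bucket refill is omitted). AdmissionRejectedError is the name proposed in this issue; AdmissionLimiter, slots and release() are hypothetical. Freed slots are handed directly to the oldest waiter, so queued requests stay in FIFO order.

```python
from __future__ import annotations

import asyncio
from collections import deque


class AdmissionRejectedError(Exception):
    """Raised when a request is refused at admission time."""


class AdmissionLimiter:
    """Slot-based stand-in for the rate limiter, with a bounded wait queue."""

    POLICIES = {"reject", "shed_oldest", "block"}

    def __init__(self, slots: int, max_queue_depth: int, on_overflow: str = "block") -> None:
        if on_overflow not in self.POLICIES:
            raise ValueError(f"unknown on_overflow policy: {on_overflow!r}")
        self._free = slots
        self.max_queue_depth = max_queue_depth
        self.on_overflow = on_overflow
        self._waiters: deque[asyncio.Future[None]] = deque()

    @property
    def pending(self) -> int:
        return len(self._waiters)

    async def acquire(self) -> None:
        if self._free > 0 and not self._waiters:
            self._free -= 1  # fast path: slot free and nobody queued ahead
            return
        if self.pending >= self.max_queue_depth:
            if self.on_overflow == "reject":
                raise AdmissionRejectedError("admission queue full")
            if self.on_overflow == "shed_oldest":
                while self._waiters:
                    oldest = self._waiters.popleft()
                    if not oldest.done():  # skip waiters already cancelled
                        oldest.set_exception(AdmissionRejectedError("shed by a newer request"))
                        break
            # "block": no bound, queue exactly as before
        fut: asyncio.Future[None] = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        try:
            await fut  # resolved by release(), or failed by shed_oldest
        except BaseException:
            if fut in self._waiters:
                self._waiters.remove(fut)  # cancelled while still queued
            elif fut.done() and not fut.cancelled() and fut.exception() is None:
                self.release()  # a slot was handed to us but we are leaving: pass it on
            raise

    def release(self) -> None:
        while self._waiters:
            fut = self._waiters.popleft()
            if not fut.done():
                fut.set_result(None)  # hand the slot straight to the oldest waiter
                return
        self._free += 1
```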

Scope

  • src/agentloom/providers/gateway.py — hedging logic, coalescing logic, admission control wiring.
  • src/agentloom/resilience/rate_limiter.py — admission control inside acquire().
  • src/agentloom/resilience/health_checker.py — new module with the periodic probe loop.
  • src/agentloom/core/models.py: HedgeConfig, CoalescingConfig, AdmissionConfig on WorkflowConfig / StepDefinition.
  • src/agentloom/exceptions.py: AdmissionRejectedError.
  • src/agentloom/observability/metrics.py — counters: agentloom_hedge_wins_total{primary_provider, winner_provider}, agentloom_coalesced_requests_total, agentloom_admission_rejections_total{provider, reason}, agentloom_health_checks_total{provider, status}.
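The three config objects listed for models.py could look roughly like this. Plain frozen dataclasses keep the sketch self-contained; the actual module may use a different model base class. The class names come from this issue and the defaults from the YAML examples above, while the validation rules are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

OVERFLOW_POLICIES = ("reject", "shed_oldest", "block")


@dataclass(frozen=True)
class HedgeConfig:
    enabled: bool = False  # opt-in per step
    delay_ms: int = 500  # wait before firing the next candidate
    max_parallel: int = 2  # maximum number of providers in the race

    def __post_init__(self) -> None:
        if self.delay_ms < 0:
            raise ValueError("delay_ms must be >= 0")
        if self.max_parallel < 1:
            raise ValueError("max_parallel must be >= 1")


@dataclass(frozen=True)
class CoalescingConfig:
    enabled: bool = False  # opt-in via workflow config
    window_ms: int = 100

    def __post_init__(self) -> None:
        if self.window_ms < 0:
            raise ValueError("window_ms must be >= 0")


@dataclass(frozen=True)
class AdmissionConfig:
    max_queue_depth: int = 100  # per provider
    on_overflow: str = "block"  # "block" keeps today's behaviour

    def __post_init__(self) -> None:
        if self.max_queue_depth < 0:
            raise ValueError("max_queue_depth must be >= 0")
        if self.on_overflow not in OVERFLOW_POLICIES:
            raise ValueError(f"on_overflow must be one of {OVERFLOW_POLICIES}")
```

Parsed YAML maps straight onto these, e.g. AdmissionConfig(**{"max_queue_depth": 100, "on_overflow": "reject"}).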

Regression tests

For each feature:

  • test_hedge_returns_first_response
  • test_hedge_cancels_loser_without_circuit_failure
  • test_hedge_only_one_request_when_first_succeeds_before_delay
  • test_coalesce_combines_identical_requests_within_window
  • test_coalesce_does_not_combine_requests_with_different_kwargs
  • test_coalesce_propagates_failure_to_all_waiters
  • test_health_check_runs_periodically_when_workflows_active
  • test_health_check_stops_when_idle
  • test_health_check_failure_opens_circuit
  • test_admission_reject_raises_immediately_when_queue_full
  • test_admission_shed_oldest_cancels_oldest_waiter
  • test_admission_block_existing_behavior_unchanged

Notes

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), providers (Provider gateway and adapters), resilience (Circuit breaker, retry, rate limiter)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
