
add MockProvider fault modes for deterministic failure replay #123

@cchinchilla-dev

Description

MockProvider (src/agentloom/providers/mock.py) replays pre-recorded successful responses. There is no way to make it return a failure (timeout, 5xx, rate-limit response, malformed JSON, partial stream). #62 (chaos/fault injection testing mode) addresses fault injection at the gateway level — random/probabilistic failure injection for stress tests. That is complementary but does not solve the same problem.

The PhD's Simulator (per agenttest-planteamiento.md) includes a fault injection mode that simulates an agent failing — timeout, invalid response, crash — to measure cascading failures and recovery. For this to work in a reproducible test harness, faults must be deterministic and replayable, not probabilistic. A failed scenario must produce the same failure on every run, just like a successful scenario produces the same response on every run.

Today the only way to test failure handling deterministically is to mock at the httpx layer in each test. That approach sits outside AgentLoom, bypasses the gateway, and is brittle.

Proposal

Extend MockProvider's response file format to support fault declarations, and add gateway-level integration so faults flow through the same code paths as real failures (circuit breaker, retry, fallback chain).

1. Extended response file format:

{
  "step_classify": {
    "content": "question",
    "model": "gpt-4o-mini",
    "usage": {"prompt_tokens": 10, "completion_tokens": 1, "total_tokens": 11},
    "cost_usd": 0.0001
  },
  "step_answer": {
    "fault": {
      "type": "timeout",
      "after_ms": 5000
    }
  },
  "step_summarize": {
    "fault": {
      "type": "http_error",
      "status_code": 429,
      "headers": {"Retry-After": "30"},
      "body": "Rate limit exceeded"
    }
  },
  "step_explain": {
    "fault": {
      "type": "http_error",
      "status_code": 500,
      "body": "Internal server error"
    }
  },
  "step_translate": {
    "fault": {
      "type": "malformed_response",
      "raw": "not valid json"
    }
  },
  "step_stream_long": {
    "fault": {
      "type": "stream_truncate",
      "after_chunks": 3,
      "raise": "ConnectionError"
    }
  }
}
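As a rough illustration of how such a file could be read, the sketch below separates fault entries from success entries. The names FaultSpec, parse_entry and load_responses are hypothetical and not part of AgentLoom; the only thing taken from the proposal is the "fault" key with its "type" field.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass(frozen=True)
class FaultSpec:
    """Hypothetical in-memory form of one "fault" object from the response file."""
    type: str
    params: Dict[str, Any] = field(default_factory=dict)


def parse_entry(entry: Dict[str, Any]) -> Union[FaultSpec, Dict[str, Any]]:
    """Return a FaultSpec for fault entries, or the entry unchanged for successes."""
    fault = entry.get("fault")
    if fault is None:
        return entry
    if "type" not in fault:
        raise ValueError("fault declaration is missing 'type'")
    # Everything except "type" is treated as a fault-specific parameter.
    params = {key: value for key, value in fault.items() if key != "type"}
    return FaultSpec(type=fault["type"], params=params)


def load_responses(text: str) -> Dict[str, Union[FaultSpec, Dict[str, Any]]]:
    """Parse a whole response file, keyed by step_id."""
    return {step_id: parse_entry(entry) for step_id, entry in json.loads(text).items()}
```

Keeping success entries untouched means existing response files would load exactly as before under this assumption.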

2. Fault types:

| Type | Effect |
| --- | --- |
| timeout | await anyio.sleep(after_ms / 1000), then raise TimeoutError (anyio.sleep takes seconds, so after_ms must be converted) |
| http_error | Raise ProviderError(status_code=N), mimicking a provider HTTP error; raise RateLimitError when the status is 429 |
| malformed_response | Return invalid bytes that fail to parse; exercises adapter error paths |
| stream_truncate | Yield N chunks, then raise the specified exception mid-stream; exercises gateway stream cancellation logic (#106) |
| connection_reset | Raise httpx.ConnectError immediately |
| partial_response | Return content with usage.completion_tokens=0 and finish_reason="length"; exercises usage parsing |
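A minimal sketch of the dispatch for three of these fault types follows. It makes several assumptions: ProviderError and RateLimitError are local stand-ins (the real AgentLoom classes may have different constructors), asyncio.sleep is swapped in for anyio.sleep, and the builtin ConnectionResetError replaces httpx.ConnectError so the example stays stdlib-only. fire_fault is a hypothetical name.

```python
import asyncio
from typing import Any, Dict


class ProviderError(Exception):
    """Stand-in for AgentLoom's provider error; the real class may differ."""

    def __init__(self, status_code: int, body: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {body}")
        self.status_code = status_code
        self.body = body


class RateLimitError(ProviderError):
    """Stand-in for the 429-specific error."""


async def fire_fault(fault: Dict[str, Any]) -> None:
    """Raise the exception that a declared fault maps to (subset of the table)."""
    kind = fault["type"]
    if kind == "timeout":
        # after_ms is in milliseconds; sleep functions take seconds.
        await asyncio.sleep(fault["after_ms"] / 1000)
        raise TimeoutError(f"mock timeout after {fault['after_ms']} ms")
    if kind == "http_error":
        status = fault["status_code"]
        error_cls = RateLimitError if status == 429 else ProviderError
        raise error_cls(status, fault.get("body", ""))
    if kind == "connection_reset":
        # The proposal names httpx.ConnectError; a builtin is used here instead.
        raise ConnectionResetError("mock connection reset")
    raise ValueError(f"unknown fault type: {kind!r}")
```

Rejecting unknown fault types loudly, rather than falling back to a success response, keeps a typo in the response file from silently turning a failure scenario into a passing one.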

3. Determinism:

Each step keyed by step_id (same as success replay). The same fault fires every time the workflow runs that step. No probability — that's #62's domain. This one is "scenario X always times out at iteration 2 of the agent loop."

4. Composition with success replay:

A workflow can mix faults and successes in the same run by keying the response file by step_id:

{
  "first_attempt": {"fault": {"type": "http_error", "status_code": 500}},
  "second_attempt": {"content": "success after retry"}
}

Combined with the workflow's retry policy, this exercises the full retry-after-failure path deterministically.
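The sketch below walks through that retry-after-failure path with the two-entry file above. How the retry policy maps attempts to step ids is not specified here, so the sketch simply assumes the caller tries the listed step ids in order; MockHTTPError, replay and run_with_retry are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

RESPONSES: Dict[str, Dict[str, Any]] = {
    "first_attempt": {"fault": {"type": "http_error", "status_code": 500}},
    "second_attempt": {"content": "success after retry"},
}


class MockHTTPError(Exception):
    """Hypothetical stand-in for the error raised on http_error faults."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def replay(step_id: str) -> str:
    """Replay one step: raise on a declared fault, else return the recorded content."""
    entry = RESPONSES[step_id]
    fault = entry.get("fault")
    if fault is not None:
        raise MockHTTPError(fault["status_code"])
    return entry["content"]


def run_with_retry(step_ids: List[str]) -> Tuple[str, List[int]]:
    """Try each step id in order; return (content, status codes of failed attempts)."""
    failures: List[int] = []
    for step_id in step_ids:
        try:
            return replay(step_id), failures
        except MockHTTPError as exc:
            failures.append(exc.status_code)
    raise RuntimeError(f"all attempts failed: {failures}")
```

Because nothing in the lookup is random, calling run_with_retry(["first_attempt", "second_attempt"]) repeatedly yields the same failure followed by the same success.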

5. Stream fault support:

For stream_truncate, MockProvider's stream() method (today missing — see #107) yields N chunks from the recorded content then raises. This validates that the gateway's _wrapped_iter correctly distinguishes consumer cancellation from provider failure (#106's fix).
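The truncation behaviour can be sketched with a plain async generator. This is only an illustration under assumptions: truncated_stream, consume and the RAISABLE allow-list are hypothetical, and the real stream() signature is not defined in this issue.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Optional, Tuple, Type

# Hypothetical allow-list mapping the fault's "raise" string to an exception class,
# so a response file cannot name arbitrary objects.
RAISABLE: Dict[str, Type[Exception]] = {
    "ConnectionError": ConnectionError,
    "TimeoutError": TimeoutError,
}


async def truncated_stream(
    chunks: List[str], after_chunks: int, exc: Exception
) -> AsyncIterator[str]:
    """Yield the first `after_chunks` chunks, then raise `exc` mid-stream."""
    for chunk in chunks[:after_chunks]:
        yield chunk
    raise exc


async def consume(
    stream: AsyncIterator[str],
) -> Tuple[List[str], Optional[Exception]]:
    """Collect chunks until the stream ends or fails; return (chunks, error)."""
    received: List[str] = []
    try:
        async for chunk in stream:
            received.append(chunk)
    except ConnectionError as err:
        # A provider-side failure surfaces as an exception raised by the iterator,
        # which is distinct from the consumer simply stopping iteration.
        return received, err
    return received, None
```

With after_chunks set to 3, a consumer would see exactly three chunks before the error, which is the observable difference a gateway test could assert on.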

6. Observability:

When a fault fires, MockProvider increments a counter:

agentloom_mock_fault_total{workflow, step_id, fault_type}

And sets a span attribute mock.fault_type so test runs are visible in traces.
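A small sketch of the bookkeeping, assuming nothing about the metrics or tracing backend AgentLoom actually uses: a collections.Counter stands in for the counter, and a plain dict stands in for span attributes. record_fault is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Tuple

# Stand-in for agentloom_mock_fault_total; keys are (workflow, step_id, fault_type).
mock_fault_total: "Counter[Tuple[str, str, str]]" = Counter()


def record_fault(workflow: str, step_id: str, fault_type: str) -> Dict[str, str]:
    """Increment the fault counter and return the span attributes to attach."""
    mock_fault_total[(workflow, step_id, fault_type)] += 1
    return {"mock.fault_type": fault_type}
```

Labelling by step_id keeps the count attributable to one scenario step, so a test can assert that a given fault fired exactly once.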

Scope

  • src/agentloom/providers/mock.py — extended response file format, fault dispatch.
  • src/agentloom/providers/mock.py::stream — implement streaming with fault support (depends on / coordinates with #107, "fix record/replay: concurrent write race, streaming capture, hash key coverage").
  • src/agentloom/observability/metrics.py — new mock fault counter.
  • tests/providers/test_mock.py — comprehensive fault coverage.
  • examples/fault_replay.yaml — example workflow + response file showing each fault type.
  • docs/ — fault scenarios chapter in record/replay docs.

Regression tests

  • test_mock_fault_timeout_raises_timeout_error
  • test_mock_fault_http_error_429_raises_rate_limit_error
  • test_mock_fault_http_error_500_raises_provider_error
  • test_mock_fault_malformed_response_fails_adapter_parsing
  • test_mock_fault_stream_truncate_raises_mid_stream
  • test_mock_fault_connection_reset_immediate
  • test_mock_fault_partial_response_finish_reason_length
  • test_mock_fault_in_workflow_triggers_retry_policy
  • test_mock_fault_propagates_to_circuit_breaker
  • test_mock_fault_metric_recorded

Notes


Labels: enhancement (New feature or request), providers (Provider gateway and adapters)
