The Problem
When a `did:web` DID document cannot be fetched (DNS failure, HTTP timeout, malformed response), the resolver returns an error and the verification fails. This is correct. However, there is no circuit breaker: every subsequent request for the same DID hits the network again.
If an issuer's `did:web` endpoint goes down, every bundle from that issuer fails, and each failed attempt can block for the full HTTP timeout waiting for a response that will not come. At high request rates, this means a large fraction of the thread pool is stuck waiting on a dead endpoint.
This is distinct from the global lock issue (#10). Even after fixing the lock, without a circuit breaker, a dead `did:web` endpoint causes repeated expensive failures for every request with that issuer.
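To put rough numbers on the thread pool exhaustion described above, Little's law gives the steady-state number of blocked workers as arrival rate times time spent waiting. The sketch below is illustrative only: the function name, request rates, timeout, and pool size are hypothetical and not taken from the service.

```python
def blocked_workers(requests_per_second, timeout_seconds, pool_size):
    """Estimate workers stuck on a dead endpoint (Little's law: L = rate * wait).

    All inputs are hypothetical; the result is capped at the pool size because
    requests beyond that queue or are rejected rather than occupying a worker.
    """
    in_flight = requests_per_second * timeout_seconds
    return min(in_flight, pool_size)


# Hypothetical: 5 req/s for the dead issuer, 10 s timeout, 200 workers.
print(blocked_workers(5, 10, 200))
# Hypothetical: 20 req/s for the dead issuer saturates the same pool.
print(blocked_workers(20, 10, 200))
```

Under these assumed figures, a modest share of traffic aimed at one dead issuer is enough to occupy the whole pool, which is why failing fast matters more than shortening the timeout.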
What Must Change
- Track failure counts per DID in the resolver. After N consecutive failures (configurable, default 5) within a window, open the circuit for that DID.
- While the circuit is open, return an error immediately without attempting network I/O.
- After a configurable cooldown period (default 60 seconds), allow one probe request through. If it succeeds, close the circuit. If it fails, extend the cooldown.
- Expose the circuit state as a metric or in the `/readyz` response so operators can see which DIDs are tripped.
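The requirements above could be sketched roughly as follows. This is a minimal illustration, not the resolver's actual code: `DidCircuitBreaker`, `CircuitOpenError`, `open_circuits`, and the injected `fetch` callable are all hypothetical names. It makes two simplifying assumptions: it counts consecutive failures without a time window, and it interprets "extend the cooldown" as doubling it up to an assumed cap (`max_cooldown`).

```python
import threading
import time
from dataclasses import dataclass


class CircuitOpenError(Exception):
    """Raised instead of doing network I/O while a DID's circuit is open."""


@dataclass
class _State:
    failures: int = 0       # consecutive failures since the last success
    cooldown: float = 0.0   # 0 means closed; > 0 means open for this many seconds
    opened_at: float = 0.0  # clock reading when the circuit was (re)opened
    probing: bool = False   # True while the single probe request is in flight


class DidCircuitBreaker:
    """Hypothetical per-DID circuit breaker wrapping a fetch(did) callable."""

    def __init__(self, fetch, threshold=5, cooldown=60.0, max_cooldown=900.0,
                 clock=time.monotonic):
        self._fetch = fetch
        self._threshold = threshold
        self._base = cooldown
        self._max = max_cooldown  # assumed cap; not specified in the finding
        self._clock = clock
        self._lock = threading.Lock()
        self._states = {}

    def resolve(self, did):
        probe = False
        with self._lock:
            s = self._states.setdefault(did, _State())
            if s.cooldown > 0:
                cooling = self._clock() - s.opened_at < s.cooldown
                if cooling or s.probing:
                    raise CircuitOpenError(did)  # fail fast, no network I/O
                s.probing = probe = True  # cooldown elapsed: admit exactly one probe
        try:
            doc = self._fetch(did)  # network call happens outside the lock
        except Exception:
            with self._lock:
                self._on_failure(s, probe)
            raise
        with self._lock:
            s.failures, s.cooldown, s.probing = 0, 0.0, False  # close the circuit
        return doc

    def _on_failure(self, s, probe):
        s.failures += 1
        if probe:
            s.probing = False
            s.cooldown = min(s.cooldown * 2, self._max)  # extend the cooldown
            s.opened_at = self._clock()
        elif s.cooldown == 0 and s.failures >= self._threshold:
            s.cooldown = self._base  # open the circuit
            s.opened_at = self._clock()

    def open_circuits(self):
        """Snapshot of tripped DIDs and their cooldowns, for metrics or /readyz."""
        with self._lock:
            return {d: s.cooldown for d, s in self._states.items() if s.cooldown > 0}
```

The lock is held only around state updates, never around the fetch, so a slow endpoint does not serialize requests for other DIDs. A production version would probably also need to bound or expire entries in the per-DID map, which this sketch does not do.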
A circuit breaker is not optional for a service that makes outbound HTTP calls in the critical verification path. Without one, a single unreachable issuer degrades the entire service for all traffic.
Severity
MEDIUM. Operational resilience issue rather than a direct security bypass. Becomes HIGH in deployments where `did:web` issuers are on external infrastructure not under your control.