Skip to content

feat: remote health circuit breaker and S3 error diagnostics#22

Merged
jleni merged 5 commits intomainfrom
feat/v4-planner-contract
Apr 6, 2026
Merged

feat: remote health circuit breaker and S3 error diagnostics#22
jleni merged 5 commits intomainfrom
feat/v4-planner-contract

Conversation

@jleni
Copy link
Copy Markdown
Member

@jleni jleni commented Apr 6, 2026

Summary

  • Daemon RemoteHealth circuit breaker: Suppresses repetitive S3 HEAD probe warnings after 3 consecutive failures, entering a 45s degradation window where probes are skipped and logged at debug level. Recovers automatically on next success.
  • Improved S3 error messages: Captures HTTP status code from the raw SDK response before into_service_error() strips it, so Ceph/MinIO errors now show HTTP 403 instead of unhandled error.
  • Warming grace period: Remote checks now wait up to 750ms for manifest prefetch to complete before falling through to HEAD probes.
  • Makefile → Justfile migration: Build tasks moved to Justfile; CI workflows updated accordingly.
  • kache-service planner updates: Updated planner integration in service crate.

Test plan

  • Verify cargo test head_object passes (unit tests for HEAD error classification)
  • Verify cargo test remote_health passes (circuit breaker unit tests)
  • Verify cargo test warming passes (warming barrier tests)
  • Manual: run daemon against unreachable S3 → confirm only 1 warn + circuit breaker activation instead of per-crate spam
  • Manual: confirm error message includes HTTP status code (e.g. HTTP 403)

@jleni jleni force-pushed the feat/v4-planner-contract branch from ebc6cf0 to f4f342f Compare April 6, 2026 17:02
…iagnostics

- Replace Makefile with Justfile for build task management
- Add RemoteHealth circuit breaker to suppress repetitive HEAD probe
  warnings after 3 consecutive failures (45s degradation window)
- Capture HTTP status from S3 SdkError before into_service_error()
  so Ceph/MinIO errors show "HTTP 403" instead of "unhandled error"
- Add warming grace period for remote checks during daemon startup
- Migrate CI workflows to use Justfile targets
- Update kache-service planner integration
@jleni jleni force-pushed the feat/v4-planner-contract branch from f4f342f to 22972b8 Compare April 6, 2026 17:04
jleni added 2 commits April 6, 2026 19:26
Cargo captures RUSTC_WRAPPER stderr and stores it as compiler diagnostics,
replaying stale warnings on every subsequent up-to-date build. Default the
wrapper's stderr tracing to "off"; users can still opt in via KACHE_LOG.
@jleni jleni merged commit 44e660d into main Apr 6, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant