Conversation

@martintmk
Member

@martintmk martintmk commented Jan 16, 2026

Introduces seatbelt, a production-ready resilience middleware crate providing:

  • Retry: Automatic retries with configurable backoff (constant, linear, exponential) and jitter
  • Timeout: Cancellation of long-running operations with per-request overrides
  • Circuit Breaker: Prevents cascading failures by blocking requests when failure rates exceed thresholds

Key features:

  • Runtime-agnostic design (works with any async runtime via tick::Clock)
  • Type-state pattern enforces correct configuration at compile time
  • OpenTelemetry metrics and structured logging support
  • Composable layers compatible with tower/layered middleware stacks
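For readers unfamiliar with the backoff strategies listed under Retry, here is a minimal sketch of how constant, linear, and exponential delays with jitter could be computed. Everything in it is an assumption for illustration: the `Backoff` enum, the `delay` function, and the per-mille jitter parameter are hypothetical and are not seatbelt's actual types or algorithm. Jitter is passed in by the caller so the sketch stays free of a random-number dependency.

```rust
use std::time::Duration;

/// Hypothetical backoff strategies; the names mirror the description above,
/// not the crate's real API.
#[derive(Clone, Copy, Debug)]
enum Backoff {
    Constant,
    Linear,
    Exponential,
}

/// Delay before retry number `attempt` (0-based), capped at `max`.
/// `jitter_permille` is a caller-supplied random value in 0..=1000 that scales
/// the capped delay ("full jitter"); 1000 means no reduction.
fn delay(kind: Backoff, base: Duration, attempt: u32, max: Duration, jitter_permille: u32) -> Duration {
    let raw = match kind {
        Backoff::Constant => base,
        // base, 2 * base, 3 * base, ...
        Backoff::Linear => base.saturating_mul(attempt.saturating_add(1)),
        // base, 2 * base, 4 * base, ...
        Backoff::Exponential => base.saturating_mul(2u32.saturating_pow(attempt)),
    };
    let capped = raw.min(max);
    capped.saturating_mul(jitter_permille.min(1000)) / 1000
}

fn main() {
    let base = Duration::from_millis(100);
    let max = Duration::from_secs(2);
    for attempt in 0..6 {
        // A real caller would draw the jitter value from a random source.
        let d = delay(Backoff::Exponential, base, attempt, max, 1000);
        println!("attempt {attempt}: {d:?}");
    }
}
```

The cap keeps exponential growth bounded, and scaling by a random factor spreads out retries from many clients that failed at the same moment.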

Example:

let clock = Clock::new_tokio();
let context = ResilienceContext::new(&clock).name("my_service");

let stack = (
    // Retry: recover from transient failures
    Retry::layer("retry", &context)
        .clone_input()
        .max_retry_attempts(3)
        .backoff(Backoff::Exponential)
        .recovery_with(|output: &Result<String, String>, _| match output {
            Ok(_) => RecoveryInfo::never(),
            Err(_) => RecoveryInfo::retry(),
        }),
    // Circuit breaker: stop calling failing services
    Circuit::layer("circuit", &context)
        .recovery_with(|output: &Result<String, String>, _| match output {
            Ok(_) => RecoveryInfo::never(),
            Err(_) => RecoveryInfo::retry(),
        })
        .rejected_input_error(|_, _| "circuit open".to_string()),
    // Timeout: bound operation duration
    Timeout::layer("timeout", &context)
        .timeout(Duration::from_secs(5))
        .timeout_output(|_| Err("timeout".to_string())),
    Execute::new(call_external_service),
);

let service = stack.build();
let result = service.execute("request".to_string()).await;

@codecov

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (7d6332c) to head (b0a20e7).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##             main     #203     +/-   ##
=========================================
  Coverage   100.0%   100.0%             
=========================================
  Files         105      135     +30     
  Lines        6756     8150   +1394     
=========================================
+ Hits         6756     8150   +1394     

@github-actions

github-actions bot commented Jan 16, 2026

⚠️ Breaking Changes Detected

error: failed to retrieve local crate data from git revision

Caused by:
    0: failed to retrieve manifest file from git revision source
    1: possibly due to errors: [
         failed to parse /home/runner/work/oxidizer/oxidizer/target/semver-checks/git-origin_main/2a2b8bad34ba5eda499d10ddb1319145e513035e/Cargo.toml: no `package` table,
         failed when reading /home/runner/work/oxidizer/oxidizer/target/semver-checks/git-origin_main/2a2b8bad34ba5eda499d10ddb1319145e513035e/scripts/crate-template/Cargo.toml: TOML parse error at line 9, column 26
         |
       9 | keywords = ["oxidizer", {{CRATE_KEYWORDS}}]
         |                          ^
       missing key for inline table element, expected key
       : TOML parse error at line 9, column 26
         |
       9 | keywords = ["oxidizer", {{CRATE_KEYWORDS}}]
         |                          ^
       missing key for inline table element, expected key
       ,
       ]
    2: package `seatbelt` not found in /home/runner/work/oxidizer/oxidizer/target/semver-checks/git-origin_main/2a2b8bad34ba5eda499d10ddb1319145e513035e

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: cargo_semver_checks::rustdoc_gen::RustdocFromProjectRoot::get_crate_source
   2: cargo_semver_checks::rustdoc_gen::StatefulRustdocGenerator<cargo_semver_checks::rustdoc_gen::CoupledState>::prepare_generator
   3: cargo_semver_checks::Check::check_release::{{closure}}
   4: cargo_semver_checks::Check::check_release
   5: cargo_semver_checks::exit_on_error
   6: cargo_semver_checks::main
   7: std::sys::backtrace::__rust_begin_short_backtrace
   8: main

If the breaking changes are intentional then everything is fine - this message is merely informative.

Remember to apply a version number bump with the correct severity when publishing a version with breaking changes (1.x.x -> 2.x.x or 0.1.x -> 0.2.x).

Member

@geeknoid geeknoid left a comment

What are your thoughts on the fallback and hedging patterns? Do you expect to add those later on?

@martintmk
Member Author

What are your thoughts on the fallback and hedging patterns? Do you expect to add those later on?

So far we have focused only on the middlewares that are part of the standard resilience pipeline in .NET:
https://learn.microsoft.com/en-us/dotnet/core/resilience/http-resilience?tabs=dotnet-cli#standard-resilience-handler-defaults

We will add fallback and hedging once they are needed. The current design will work well for both; hedging in particular becomes much simpler and more powerful with the design we have now.

Hedging requires cloning the input. In .NET, because Polly worked with callbacks, we had to add some workarounds to get to the original input. Here it's much more natural, as the input is immediately accessible to every middleware layer.
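To make the input-cloning point concrete, here is a rough, runtime-free sketch of hedging using plain threads and a channel. It is not seatbelt's or Polly's API: the `hedged` function, its signature, and the fake service in `main` are all hypothetical, and a real middleware would be async rather than thread-based.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a clone of `input`. If no result arrives within `hedge_after`,
/// launches a second attempt with the original input and returns whichever
/// result arrives first. The `u32` passed to `op` is the attempt index.
fn hedged<I, O, F>(input: I, hedge_after: Duration, op: F) -> O
where
    I: Clone + Send + 'static,
    O: Send + 'static,
    F: Fn(I, u32) -> O + Clone + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    // The primary attempt gets a clone, so the original stays available for the hedge.
    let (tx1, op1, input1) = (tx.clone(), op.clone(), input.clone());
    thread::spawn(move || {
        // The receiver may already be gone if the other attempt won; ignore that.
        let _ = tx1.send(op1(input1, 0));
    });

    match rx.recv_timeout(hedge_after) {
        Ok(out) => out,
        Err(_) => {
            // The primary attempt is slow: start the hedged attempt with the original input.
            thread::spawn(move || {
                let _ = tx.send(op(input, 1));
            });
            rx.recv().expect("at least one attempt should send a result")
        }
    }
}

fn main() {
    let out = hedged("request".to_string(), Duration::from_millis(50), |req, attempt| {
        // Hypothetical service: the first attempt is slow, the hedge is fast.
        if attempt == 0 {
            thread::sleep(Duration::from_millis(500));
        }
        format!("{req} served by attempt {attempt}")
    });
    println!("{out}");
}
```

The only requirement on the input is `Clone`, which is why a design where every layer sees the input directly makes hedging straightforward.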

//!
//! # Configuration
//!
//! The [`TimeoutLayer`] uses a type state pattern to enforce that all required properties are
Member

@geeknoid geeknoid Jan 21, 2026

So what's the benefit of using this pattern, rather than just requiring the parameters on the layer function in the first place?

Timeout::layer("timeout", &context, Duration::from_secs(30), |_| io::Error::new(io::ErrorKind::TimedOut, "operation timed out"));

That's a lot less typing...

Member Author

If you look at TimeoutLayer, you will notice that there are multiple ways to configure the required parameters, depending on the combination of generics. For example, if the output is result-based, one can call TimeoutLayer::timeout_error, or TimeoutLayer::timeout_output if full flexibility is necessary.

Requiring all required parameters when constructing the layer removes that flexibility or forces us to expose many layer constructors. The same applies to the circuit breaker and retries. So rather than deal with many constructors, I embraced the builder approach for everything.
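As an illustration of the trade-off being discussed, here is a stripped-down sketch of a type-state builder where a required value can be supplied through more than one setter, depending on the output type. The names (`timeout_layer`, `TimeoutBuilder`, `Set`, `NotSet`) are hypothetical stand-ins, the real TimeoutLayer has different generics and signatures, and this only shows the general shape of the pattern.

```rust
use std::marker::PhantomData;
use std::time::Duration;

// Marker types tracking which required values have been provided (hypothetical names).
struct NotSet;
struct Set;

/// Hypothetical builder: `D` tracks the duration, `H` tracks the timeout handler.
struct TimeoutBuilder<Out, D, H> {
    duration: Option<Duration>,
    on_timeout: Option<Box<dyn Fn(Duration) -> Out>>,
    _state: PhantomData<(D, H)>,
}

struct Timeout<Out> {
    duration: Duration,
    on_timeout: Box<dyn Fn(Duration) -> Out>,
}

fn timeout_layer<Out>() -> TimeoutBuilder<Out, NotSet, NotSet> {
    TimeoutBuilder { duration: None, on_timeout: None, _state: PhantomData }
}

impl<Out, D, H> TimeoutBuilder<Out, D, H> {
    fn timeout(self, d: Duration) -> TimeoutBuilder<Out, Set, H> {
        TimeoutBuilder { duration: Some(d), on_timeout: self.on_timeout, _state: PhantomData }
    }

    /// Fully flexible setter: works for any output type.
    fn timeout_output(self, f: impl Fn(Duration) -> Out + 'static) -> TimeoutBuilder<Out, D, Set> {
        TimeoutBuilder { duration: self.duration, on_timeout: Some(Box::new(f)), _state: PhantomData }
    }
}

// Convenience setter that only exists when the output is a `Result`.
impl<T: 'static, E: 'static, D, H> TimeoutBuilder<Result<T, E>, D, H> {
    fn timeout_error(self, f: impl Fn(Duration) -> E + 'static) -> TimeoutBuilder<Result<T, E>, D, Set> {
        self.timeout_output(move |d| Err(f(d)))
    }
}

// `build` exists only once both required values are set.
impl<Out> TimeoutBuilder<Out, Set, Set> {
    fn build(self) -> Timeout<Out> {
        Timeout {
            // The `Set` markers guarantee both setters were called.
            duration: self.duration.expect("duration is set"),
            on_timeout: self.on_timeout.expect("handler is set"),
        }
    }
}

fn main() {
    let t = timeout_layer::<Result<String, String>>()
        .timeout(Duration::from_secs(5))
        .timeout_error(|d| format!("timed out after {d:?}"))
        .build();
    println!("{:?}", (t.on_timeout)(t.duration));
    // Calling `.build()` before both setters is a compile error, because
    // `build` is only defined for `TimeoutBuilder<_, Set, Set>`.
}
```

A single constructor would have to pick one of `timeout_output` or `timeout_error`, or be duplicated per variant, which is the constructor explosion mentioned above.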

Member Author

Same thing with RetryLayer: depending on the scenario, one can choose from these required methods:

  • RetryLayer::clone_input_with
  • RetryLayer::clone_input
  • RetryLayer::recovery_with
  • RetryLayer::recovery

Supporting all these combinations in a constructor would be tricky and kinda verbose.

Member

I definitely see the value of the type state approach in general. My comment is specifically about required parameters. If they're truly required values, then using the type state pattern for those just seems like an extra burden, more keystrokes. And I think it's less intuitive when learning to use an API.

It looks as though I can create a timeout thingy easily. Oh, but I get a (in this case fairly cryptic) compile time error telling me something is wrong. I need to figure out that I must call these other two functions in order to supply parameters.

When it's just there in the constructor as an explicit parameter, it's unambiguous.

Member Author

We talked with Ralf about the type-state pattern and he is generally fine with it. We agreed to simplify the names of the type states.

Regarding the required parameters: a constructor that sets them, abandoning the type-state pattern, would work if there were only a single way to set them, but that is not the case here. These parameters can be set using multiple methods, and there is no simple way to cover all of them in a constructor.

Collaborator

@ralfbiedert ralfbiedert left a comment

Notes from discussion:

  • PipelineContext -> ResilienceContext (or so)
  • Hide Set, NotSet
  • Rename + document typestate variables
  • Remove re-exports of layered
  • .build() -> (in)to_service (in layered)
