Adding Time Models to Shuttle by dylanjwolff · Pull Request #217 · awslabs/shuttle

dylanjwolff · 2025-10-01T19:18:29Z

This PR creates a new API in Shuttle to allow for specifying a model for wall-clock time and associated functions / primitives.
In spirit, this API is similar to that of the Scheduler API in that it allows users to swap out different time models (or create their own time models), depending on their application and use case.

As initial examples, this PR contains two time models:

The ConstantSteppedTimeModel, which advances a global time by a configurable constant amount (starting from 0) at each scheduling step.
The FrozenTimeModel, which is based on a ConstantSteppedTimeModel whose step size is zero. This model additionally lets users manually expire Sleeps and Timeouts without advancing the global clock.

Both of these models will automatically "fast-forward" time to wake the next sleeping tasks if all tasks are sleeping.
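As a rough illustration of the stepping and fast-forward behavior described above, here is a minimal sketch. `SteppedClock` and its methods are hypothetical names, not Shuttle's actual types, and time is represented as a `Duration` elapsed since the start of the execution.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Duration;

/// Hypothetical stand-in for a constant-stepped time model.
pub struct SteppedClock {
    step: Duration,
    now: Duration, // time elapsed since the start of the execution
    // `Reverse` turns the max-heap into a min-heap ordered by (deadline, sleeper id).
    sleepers: BinaryHeap<Reverse<(Duration, u64)>>,
}

impl SteppedClock {
    pub fn new(step: Duration) -> Self {
        Self { step, now: Duration::ZERO, sleepers: BinaryHeap::new() }
    }

    pub fn now(&self) -> Duration {
        self.now
    }

    pub fn sleep_until(&mut self, deadline: Duration, id: u64) {
        self.sleepers.push(Reverse((deadline, id)));
    }

    /// Called at every scheduling step: advance by the constant step and
    /// return the ids of sleepers whose deadline has passed.
    pub fn step(&mut self) -> Vec<u64> {
        self.now += self.step;
        self.drain_expired()
    }

    /// Called when no task is runnable: jump straight to the earliest deadline.
    pub fn fast_forward(&mut self) -> Vec<u64> {
        if let Some(Reverse((deadline, _))) = self.sleepers.peek() {
            self.now = self.now.max(*deadline);
        }
        self.drain_expired()
    }

    fn drain_expired(&mut self) -> Vec<u64> {
        let mut woken = Vec::new();
        while let Some(Reverse((deadline, id))) = self.sleepers.peek().copied() {
            if deadline > self.now {
                break;
            }
            self.sleepers.pop();
            woken.push(id);
        }
        woken
    }
}
```

With a step of `Duration::ZERO` the same sketch behaves like a frozen clock: `step` never wakes anything, and only `fast_forward` moves time.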

Primitive Representations:

As part of this change, Shuttle now vends its own Duration, Instant, Sleep, Interval and Timeout primitives, whose behavior depends on the current TimeModel. These primitives are intended to eventually fully model the corresponding types in std::time and tokio::time. Coverage in this PR is incomplete, but enough core features are implemented that most common use cases should be covered. Full feature parity can be achieved over time as the TimeModels become more mature.

The Shuttle Instant primitive is an enum. For this PR, there is only one representation of Durations, using a concrete std::time::Duration. However, in the future we can add representations to support other Time Models which might use logical scalar or vector clocks, for example.

The Shuttle Duration primitive is just a std::time::Duration by default. The reason for this is that std::time::Duration is very commonly used in library APIs. This can make switching to a different Duration type extremely painful, as all dependencies must also be made compatible. By leaving Duration unchanged, we can keep the barrier to adoption of time modeling low. With the advanced-time-models feature flag, users can opt in to time models that may not be able to represent time intervals as a single value (for example, elapsed time between vector-clock timestamps).

Alternative Designs:

Because std::time::Duration and other primitives are concrete types, the corresponding Shuttle primitives also must be concrete. Instead of enums with a fixed number of variants, we could opt for a struct which contains a type-erased Box<dyn Duration>. This is more flexible, as users can even bring their own primitive representations without needing to make changes to Shuttle itself to add an enum variant. However, it would require type-casting in all operations involving multiple primitives (for example, comparing two durations or arithmetic operations between instants and/or durations). This approach is also further complicated by many Duration methods being const, which would preclude a fully dynamic Duration implementation.

As long as there are only a small number of representations for these primitives that all Time Models share, then an enum allows them to share logic without these complications. However, if later we find that there are many different primitive representations then we may need to reconsider the fully dynamic approach instead.
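To make the enum approach concrete, here is a minimal sketch of an enum-backed instant with a single variant. The variant name `Simple` and the choice to store elapsed time as a `Duration` are assumptions for illustration, not necessarily what the PR does.

```rust
use std::ops::{Add, Sub};
use std::time::Duration;

/// Hypothetical enum-backed instant; the variant name is illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Instant {
    /// Time elapsed since the start of the execution.
    Simple(Duration),
    // A future variant (for example a logical clock) would be added here,
    // and every `match` below would then need an arm for it.
}

impl Add<Duration> for Instant {
    type Output = Instant;

    fn add(self, rhs: Duration) -> Instant {
        match self {
            Instant::Simple(t) => Instant::Simple(t + rhs),
        }
    }
}

impl Sub for Instant {
    type Output = Duration;

    /// Saturates to zero if `rhs` is later than `self`.
    fn sub(self, rhs: Instant) -> Duration {
        match (self, rhs) {
            (Instant::Simple(a), Instant::Simple(b)) => a.saturating_sub(b),
        }
    }
}
```

Because every operation matches on the variants directly, no downcasting is needed, which is the trade-off against the `Box<dyn ...>` design discussed above.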

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sarsko · 2025-10-07T15:06:07Z

Needs conflict resolution in order to run benchmarks

shuttle/src/runtime/thread/continuation.rs

shuttle/src/runtime/execution.rs

sarsko · 2025-10-13T17:41:16Z

shuttle/src/runtime/execution.rs

+        while ExecutionState::num_runnable() == 0
+            && ExecutionState::with(|s| Rc::clone(&s.time_model))
+                .borrow_mut()
+                .wake_next()


Ditto on the with comment from above

Also why do we need a loop? We can just do one call no?

We use a loop to protect against stale wakers in the time model. It's possible that we don't need this with more diligent tracking, but at least for the current TimeModel implementations there are cases where we have old wakers that don't end up actually making any futures runnable when woken.

I think it is more complexity than it's worth to say that the TimeModel must always wake a non-stale waker with wake_next
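The stale-waker argument can be sketched with a toy model. Everything here (`Timers`, `State`, `wake_until_runnable`) is hypothetical and much simpler than Shuttle's execution state; the point is only that a single `wake_next` call can succeed without making anything runnable.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Runnable,
    Sleeping,
    Finished,
}

/// Hypothetical timer queue: each entry is a task index. An entry is stale
/// if that task is no longer sleeping by the time the entry is popped.
pub struct Timers {
    pub queue: VecDeque<usize>,
}

impl Timers {
    /// Wakes the next entry. Returns false only when the queue is empty, so a
    /// `true` result does not guarantee that any task became runnable.
    pub fn wake_next(&mut self, tasks: &mut [State]) -> bool {
        match self.queue.pop_front() {
            Some(task) => {
                if tasks[task] == State::Sleeping {
                    tasks[task] = State::Runnable;
                }
                true
            }
            None => false,
        }
    }
}

pub fn num_runnable(tasks: &[State]) -> usize {
    tasks.iter().filter(|s| **s == State::Runnable).count()
}

/// Same shape as the loop in the diff: keep waking until something is
/// runnable or the timer queue is exhausted.
pub fn wake_until_runnable(timers: &mut Timers, tasks: &mut [State]) -> usize {
    while num_runnable(tasks) == 0 && timers.wake_next(tasks) {}
    num_runnable(tasks)
}
```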

sarsko · 2025-10-13T17:43:56Z

shuttle/src/runtime/execution.rs

    runnable_tasks: Vec<*const Task>,
+
+    // Counter for unique timing resource ids (Sleeps, Timeouts and Intervals)
+    pub(crate) timer_id_counter: u64,


What's this for? It's never used no?

it's used in shuttle/src/sync/time/mod.rs to differentiate sleepers with the same deadlines (see uses of increment_timer_counter())

sarsko · 2025-10-13T17:46:19Z

shuttle/src/runtime/execution.rs

+            && ExecutionState::with(|s| Rc::clone(&s.time_model))
+                .borrow_mut()
+                .wake_next()
+        {}


Ditto on the loop comment

sarsko · 2025-10-13T17:47:08Z

shuttle/src/runtime/execution.rs


+    pub(crate) fn num_runnable() -> usize {
+        Self::with(|state| state.tasks.iter().filter(|t| t.runnable()).count())
+    }


We should improve the tracking of runnable so that this becomes O(1)

agreed, but I don't know if we want to block this PR on that change. if so, I should probably open another PR to do that separately.

this would be strongly related to building the set of runnable tasks incrementally, which I have a prototype of on this branch:

https://github.com/dylanjwolff/shuttle/tree/incremental-persistent-vec

but I haven't PR'ed it because I actually think the right thing to do is to change the Shuttle scheduler API so that the individual schedulers manage their own runnable task sets in whatever data structure is best for that algorithm (for example, priority queue for RP).

Oh yeah no don't block the PR on this.

And yeah, let's chat about potential scheduler API changes on Monday
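For reference, one way the O(1) runnable count from this thread could look is a counter updated on every state transition. This is a sketch with a hypothetical `Tasks` table, not the prototype on the linked branch.

```rust
/// Hypothetical task table that keeps a running count of runnable tasks.
#[derive(Default)]
pub struct Tasks {
    runnable: Vec<bool>,
    num_runnable: usize,
}

impl Tasks {
    /// Adds a new task in the runnable state and returns its index.
    pub fn spawn(&mut self) -> usize {
        self.runnable.push(true);
        self.num_runnable += 1;
        self.runnable.len() - 1
    }

    /// All state changes go through here so the counter cannot drift.
    pub fn set_runnable(&mut self, task: usize, runnable: bool) {
        if self.runnable[task] != runnable {
            self.runnable[task] = runnable;
            if runnable {
                self.num_runnable += 1;
            } else {
                self.num_runnable -= 1;
            }
        }
    }

    /// O(1), unlike filtering the whole task list on every call.
    pub fn num_runnable(&self) -> usize {
        self.num_runnable
    }
}
```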

dylanjwolff · 2025-10-31T23:20:35Z

Might be worth adding this example as a test case:

https://rfd.shared.oxide.computer/rfd/0609

shuttle/src/time/constant_stepped.rs

sarsko · 2026-01-13T00:19:33Z

shuttle/tests/basic/pct.rs

    thread::spawn(move || {
        for _ in 0..TEST_LENGTH {
-            thread::sleep(Duration::from_millis(1));
+            thread::sleep(Duration::ZERO);


I didn't pay attention to this first time around, but why are we doing Duration::ZERO? The test should work just fine with a sleep of 1ms, no?

If I recall correctly, I read the original intent of these sleeps to be to insert a yield point without calling yield_now, since PCT takes that yield_now call as a hint that it might be in a busy loop and reduces the current thread's priority.

Since these tests are for the PCT algorithm/implementation and not meant to actually exercise timing behaviors, I think the timeouts should be Duration::ZERO if kept as a sleep, or (probably better) changed to an explicit yield_now_not_a_scheduling_hint API.

Ah getcha, thanks. We could make switch pub. But also: surely these tests will be equivalent with what was before when running under the time model which implements time as modeled before this PR? Like we could just create a time model which makes these tests work with Duration::from_millis(1) ?

Like we could just create a time model which makes these tests work with Duration::from_millis(1) ?

Yeah, this is also fine. My thought was that changing the duration to zero makes it more explicit that this test doesn't actually want any timing related behaviors to influence its outcome.

We could make switch pub.

For this specific test, I think it would probably be most clear if we did call switch directly. But the tradeoff for making it fully pub is that there are now two ways to insert a context switch for users of Shuttle, and the difference between the two is relatively subtle.

Decided not to make switch pub, as we generally don't want people inserting scheduling points at whim (though they of course can by sleeping for Duration::ZERO)

shuttle/src/time/mod.rs

sarsko · 2026-01-13T00:21:47Z

Converted to draft just to note that it is not mergeable because time models should live elsewhere. I'll reopen once it's ready for review

shuttle/src/time/mod.rs

sarsko · 2026-01-29T01:47:23Z

Made the code reviewable (and started integrating it into our testing).
I have a few comments:

I am generally pro merging and publishing this.
I am tempted to refactor this into a "Futures-first" shape (i.e. one that could be plug-and-played with bach or tokio or any other async runtime).
The refactoring is not urgent, but I'd prefer that model over every tool building their own way of doing time modeling.
The same Futures-first approach is how I'd like to do network modeling as well

sarsko · 2026-01-29T08:35:46Z

shuttle/src/time/constant_stepped.rs

+
+    #[allow(clippy::useless_conversion)]
+    fn advance(&mut self, dur: Duration) {
+        self.current_time_elapsed += dur.into();


This is wrong no? We also need to potentially wake the next task (aka add an unblock_expired)

This is generally a sharp edge when implementing time models. I'm thinking we'll want to protect against this by doing something like the following.

    trait TimeModelRunnable: TimeModel {
        fn advance(&mut self, dur: Duration) {
            TimeModel::advance(self, dur);
            self.unblock_expired();
        }
        // etc for all other functions
    }
    impl<T> TimeModelRunnable for T where T: TimeModel {}
    trait TimeModel {
        fn unblock_expired(&mut self) {}
        fn advance(&mut self, dur: Duration) {}
        // etc more functions
    }

It's been a while, but I think you are right that this is missing an unblock_expired. I can't think of a use case for advance without waking, and even if there is one, that should be a more explicit call.

+1 for adding the wrapper -- seems like a good idea. Probably should be done for step as well, maybe even wake_next.
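A compilable version of the wrapper idea might look like the sketch below. To avoid ambiguity between two traits that both define `advance`, the runtime-facing method is renamed `advance_and_wake` here; that name and the `Recording` model are assumptions made for the example.

```rust
use std::time::Duration;

/// What a time model author implements.
pub trait TimeModel {
    fn advance(&mut self, dur: Duration);
    fn unblock_expired(&mut self);
}

/// What the runtime calls. The blanket impl below means a model author
/// cannot forget the `unblock_expired` call after moving the clock.
pub trait TimeModelRunnable: TimeModel {
    fn advance_and_wake(&mut self, dur: Duration) {
        TimeModel::advance(self, dur);
        self.unblock_expired();
    }
}

impl<T: TimeModel> TimeModelRunnable for T {}

/// Toy model that records how often the runtime asked it to unblock tasks.
#[derive(Default)]
pub struct Recording {
    pub now: Duration,
    pub unblock_calls: u32,
}

impl TimeModel for Recording {
    fn advance(&mut self, dur: Duration) {
        self.now += dur;
    }

    fn unblock_expired(&mut self) {
        self.unblock_calls += 1;
    }
}
```

The same layering would apply to `step` and possibly `wake_next`, as suggested in the thread.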

How come we .into() here?

sarsko · 2026-01-29T08:51:53Z

shuttle/src/time/mod.rs

+    type Output = ();
+
+    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
+        let is_expired = get_time_model()


I don't see the reason for doing the call to register_sleep then a second call to register_sleep with the waker after — just pass the waker the first time.

Also this way of implementing register_sleep is dangerous in that we are requiring the time model to correctly manage the waker. I can't really see when a custom "waker manager" would be needed meaning we could just manage the waker for the time model, following the same layered trait methodology as above.

Actually I'm not sure I see how register_sleep is a function of the time model at all — why don't we just let the time model manage time (via current_time) and have the TimeModelRunnable implement the following:

    fn register_sleep(&mut self, deadline: Instant, sleep_id: u64, waker: Waker) -> bool {
        if self.current_time() >= deadline {
            true
        } else {
            self.register_waker(waker);
            false
        }
    }
    fn register_waker(&mut self, waker: Waker) {
        todo!()
    }

As for why we need a custom register_sleep, I think this gets into the more "advanced time models" territory, where you might want a representation of time that isn't easily represented as a single Instant, such as a vector time where each task with a particular tag has its own view of the time (emulating a distributed system).

But maybe this can be put behind the "advanced time models" feature flag to hide some of the complexity for the basic case?
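For the basic case, the split proposed in this thread (the model only reports the time, a shared layer owns the wakers) could be sketched as follows. `SleepRegistry`, `FixedTime` and `NoopWake` are hypothetical, and a `Duration` since start stands in for Shuttle's `Instant`.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::task::{Wake, Waker};
use std::time::Duration;

/// The model only answers "what time is it?".
pub trait TimeModel {
    fn current_time(&self) -> Duration;
}

/// Hypothetical runtime-side layer that manages wakers for any model.
pub struct SleepRegistry<M> {
    pub model: M,
    pending: HashMap<u64, (Duration, Waker)>,
}

impl<M: TimeModel> SleepRegistry<M> {
    pub fn new(model: M) -> Self {
        Self { model, pending: HashMap::new() }
    }

    /// Returns true if the deadline has passed; otherwise stores the waker
    /// (replacing any earlier waker registered under the same sleep id).
    pub fn register_sleep(&mut self, deadline: Duration, sleep_id: u64, waker: &Waker) -> bool {
        if self.model.current_time() >= deadline {
            self.pending.remove(&sleep_id);
            true
        } else {
            self.pending.insert(sleep_id, (deadline, waker.clone()));
            false
        }
    }

    pub fn pending(&self) -> usize {
        self.pending.len()
    }
}

/// Toy model whose clock is set by hand.
pub struct FixedTime(pub Duration);

impl TimeModel for FixedTime {
    fn current_time(&self) -> Duration {
        self.0
    }
}

/// Waker that does nothing, used only to exercise the registry.
pub struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}
```

A model with a richer notion of time (for example per-task vector clocks) would not fit this single `current_time` signature, which is the case the feature-flag suggestion above is meant to cover.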

sarsko · 2026-01-29T08:58:45Z

shuttle/src/time/mod.rs

+    thread::switch();
+}
+
+/// Returns a future which sleeps until the duration has elapsed


I think all of this async stuff should be moved into a pub mod async and have their async_ prefix stripped

Sike, async is of course a registered keyword and I think r#async is ugly

sarsko · 2026-01-29T22:06:33Z

shuttle/src/time/mod.rs

+#[derive(Debug)]
+pub struct Timeout<F>
+where
+    F: Future,


This constraint needs to be removed (not there in Tokio: https://docs.rs/tokio/latest/tokio/time/struct.Timeout.html)
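A sketch of what dropping the struct-level bound looks like: the type parameter is unconstrained on the struct, and the `Future` bound appears only on the impl that polls. The deadline handling is deliberately elided here, and the `Unpin` bound is an assumption that keeps the example free of `pin_project`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Hypothetical wrapper with no `where F: Future` on the struct itself.
#[derive(Debug)]
pub struct Timeout<T> {
    value: T,
}

impl<T> Timeout<T> {
    pub fn new(value: T) -> Self {
        Self { value }
    }

    pub fn into_inner(self) -> T {
        self.value
    }
}

// The bound appears only where polling actually needs it.
impl<F: Future + Unpin> Future for Timeout<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // A real implementation would also check the deadline here.
        Pin::new(&mut self.value).poll(cx)
    }
}
```

Keeping bounds off the struct means code that merely stores or names a `Timeout<T>` does not have to repeat the `Future` bound.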

sarsko · 2026-01-29T22:30:37Z

shuttle/src/time/mod.rs

+/// Timeout a future
+#[pin_project]
+#[derive(Debug)]
+pub struct Interval {


todo: add fn disable

sarsko · 2026-02-05T01:45:52Z

shuttle/src/time/constant_stepped.rs

+
+impl TimeModel for ConstantSteppedTimeModel {
+    fn pause(&mut self) {
+        warn!("Pausing stepped model has no effect")


Surely pausing ConstantSteppedTimeModel means that it becomes the FrozenTimeModel ?

Yeah, it probably should. Maybe there's an argument that you shouldn't be manually pausing/unpausing the non-frozen models because it's error prone?
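If pausing were to make the stepped model behave like the frozen one, a minimal sketch could be a flag that turns `step` into a no-op. `PausableSteppedClock` is a hypothetical type for illustration only.

```rust
use std::time::Duration;

/// Hypothetical stepped clock where pausing freezes time.
pub struct PausableSteppedClock {
    step: Duration,
    now: Duration,
    paused: bool,
}

impl PausableSteppedClock {
    pub fn new(step: Duration) -> Self {
        Self { step, now: Duration::ZERO, paused: false }
    }

    pub fn pause(&mut self) {
        self.paused = true;
    }

    pub fn resume(&mut self) {
        self.paused = false;
    }

    /// Advances by the constant step unless the clock is paused.
    pub fn step(&mut self) {
        if !self.paused {
            self.now += self.step;
        }
    }

    pub fn now(&self) -> Duration {
        self.now
    }
}
```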

sarsko · 2026-02-05T01:46:31Z

shuttle/src/time/constant_stepped.rs

+    }
+
+    fn resume(&mut self) {
+        warn!("Resuming stepped model has no effect")


Similar comment as for pause

sarsko · 2026-02-05T02:21:48Z

shuttle/src/time/mod.rs

+
+/// Expire all current timeouts/sleeps requested by tasks whose tags match the
+/// given predicate. May not be implemented by all TimeModels.
+pub fn trigger_timeouts<F>(trigger: F)


A few comments:

I don't see why this only is implemented for the FrozenTimeModel

With a bit of refactoring of the traits this can be implemented for all time models

I am undecided on whether we want this at all. I understand we have it for legacy reasons, but I'm wondering whether we want to support that use case at all. See below.

This way of doing it is based on tracking stuff in Labels and using that to time things out via the passed trigger. This is 1: Error prone on the "maintaining stuff in the Labels side", 2: Error prone on the trigger_timeouts side, 3: Generally hard to maintain and keep up to date, 4: Not actually correct with regards to how time works and 5: Dangerous because it creates a scheme where a task is permanently expired (ie. it will have a weird form of "Midas' touch" and timeout every single timing event it ever touches).

If we add an API to get the current task count (this can be gotten in a roundabout way currently, and TaskId is also forgeable via From<usize>), and a way to get the Instant where a task will be woken, the trigger_timeouts mechanism can be implemented in "user space" with a loop similar to the one in FrozenTimeModel::trigger_timeouts, but with time::advance() as the driver of timeouts. This solves 4 and 5 above, and alleviates 1 to 3 somewhat, though generally I find trigger_timeouts and friends to not be something which fills me with joy and happiness.

I don't see why this only is implemented for the FrozenTimeModel

I think for the same reason you mentioned; it's there for legacy reasons and I wasn't sure we wanted it at all. Generally I think this kind of manually trying to trigger timing behaviors is a bit of an anti-pattern for automated testing -- the whole point is to catch scenarios that you didn't think of, not ones you knew about enough already to hard-code.

Doing it in user-space seems reasonable to me (though realistically, I'm not sure who would actually go to the trouble).

sarsko · 2026-02-05T02:43:22Z

I wanna add a tokio::Runtime-like construct to Shuttle(-tokio), which contains the tasks on that Runtime and their clock. The way we do time modeling would have to be updated to accommodate that. The motivation for adding Runtime is 1: I think it is an ergonomic and intuitive way of separating hosts, 2: A lot of code that uses Tokio is written that way already, 3: It's the way Turmoil is implemented, so we could (with a couple more additions) get Turmoil support cheaply.

sarsko · 2026-02-06T01:19:56Z

shuttle/src/time/constant_stepped.rs

+    }
+
+    /// Manually wake a task without affecting the global clock
+    pub fn wake_frozen(&mut self, sleep_id: u64) {


Wait why does this exist?

It's been a while, but I think it's because the frozen time model is just a thin wrapper around the constant stepped model (step size zero). So this is to allow the frozen time model to trigger timeouts without advancing the clock of the inner constant model. For sure it shouldn't be pub, but more generally I don't know whether we do/don't want to advance the clock on triggering timeouts for the frozen model. Per the other convo, it's not even clear whether we even want to support "trigger_timeouts" manually anyways

sarsko · 2026-02-06T01:20:29Z

shuttle/src/time/constant_stepped.rs

+    distribution: ConstantTimeDistribution,
+    current_step_size: std::time::Duration,
+    current_time_elapsed: std::time::Duration,
+    waiters: BinaryHeap<Reverse<(std::time::Duration, TaskId, u64)>>,


Not a super fan of the untypedness of this

(ie what are the fields, in particular the u64)

I think it's just the sleep ID

sarsko · 2026-02-06T01:21:58Z

shuttle/src/time/constant_stepped.rs

+    current_step_size: std::time::Duration,
+    current_time_elapsed: std::time::Duration,
+    waiters: BinaryHeap<Reverse<(std::time::Duration, TaskId, u64)>>,
+    wakers: HashMap<u64, Waker>,


Why have we separated the waiters and the wakers?

I don't remember, but I think it is because of type restrictions for what you can put on a binary heap in Rust (PartialOrd).
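`BinaryHeap` requires its element type to implement `Ord`, and `Waker` does not, which is consistent with the explanation above. The sketch below shows one way to keep the two collections while also giving the tuple named fields; `Waiter`, `WaitQueue` and `NoopWake` are hypothetical, and `usize` stands in for Shuttle's `TaskId`.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};
use std::sync::Arc;
use std::task::{Wake, Waker};
use std::time::Duration;

/// Hypothetical named replacement for the `(Duration, TaskId, u64)` tuple.
/// Field order matters: derived `Ord` compares `deadline` first, then the ids.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Waiter {
    pub deadline: Duration,
    pub task_id: usize, // stand-in for Shuttle's TaskId
    pub sleep_id: u64,  // unique per timer, breaks ties between equal deadlines
}

#[derive(Default)]
pub struct WaitQueue {
    waiters: BinaryHeap<Reverse<Waiter>>,
    // `Waker` has no `Ord` impl, so it lives beside the heap, keyed by sleep id.
    wakers: HashMap<u64, Waker>,
}

impl WaitQueue {
    pub fn register(&mut self, waiter: Waiter, waker: Waker) {
        self.wakers.insert(waiter.sleep_id, waker);
        self.waiters.push(Reverse(waiter));
    }

    /// Pops the earliest waiter, wakes it if a waker is still registered,
    /// and returns the popped entry.
    pub fn wake_earliest(&mut self) -> Option<Waiter> {
        let Reverse(waiter) = self.waiters.pop()?;
        if let Some(waker) = self.wakers.remove(&waiter.sleep_id) {
            waker.wake();
        }
        Some(waiter)
    }
}

/// Waker that does nothing, used only to exercise the queue.
pub struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}
```

An alternative would be a wrapper type that holds the `Waker` and implements `Ord` by comparing only the deadline and ids, at the cost of a hand-written ordering.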

sarsko · 2026-02-06T01:26:57Z

shuttle/src/time/mod.rs

+/// Puts the current thread to sleep
+/// Behavior of this function depends on the TimeModel provided to Shuttle
+pub fn sleep(dur: Duration) {
+    if dur == Duration::ZERO {


I'm not sure I agree with the backdoor here

what is even the use case of a zero duration sleep?

note that without the backdoor, it is impossible for the current thread to be scheduled again right away after sleeping for zero with the current constant time model implementation (since the clock needs to advance one step to wake it)

sarsko · 2026-02-06T01:42:00Z

shuttle/src/time/constant_stepped.rs

+
+/// A constant distribution; each sample returns the same time
+#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
+pub struct ConstantTimeDistribution {


Wait huh why does this exist?

This was a bit speculative, but you might want to sample the time-step size from different distributions (e.g. normal, exponential, constant).
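The speculative design described here could be expressed as a small trait that the stepped model samples from at each step. The trait name `TimeDistribution`, the `CyclingDistribution` type and the `elapsed_after` helper are assumptions for this sketch; a real normal or exponential distribution would need a random number source, which is left out.

```rust
use std::time::Duration;

/// Hypothetical source of step sizes for a stepped time model.
pub trait TimeDistribution {
    fn sample(&mut self) -> Duration;
}

/// A constant distribution; each sample returns the same time.
#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
pub struct ConstantTimeDistribution(pub Duration);

impl TimeDistribution for ConstantTimeDistribution {
    fn sample(&mut self) -> Duration {
        self.0
    }
}

/// Deterministic stand-in for a non-constant distribution: cycles through a list.
pub struct CyclingDistribution {
    steps: Vec<Duration>,
    next: usize,
}

impl CyclingDistribution {
    /// Panics if `steps` is empty.
    pub fn new(steps: Vec<Duration>) -> Self {
        assert!(!steps.is_empty(), "need at least one step size");
        Self { steps, next: 0 }
    }
}

impl TimeDistribution for CyclingDistribution {
    fn sample(&mut self) -> Duration {
        let step = self.steps[self.next];
        self.next = (self.next + 1) % self.steps.len();
        step
    }
}

/// Total time elapsed after `steps` scheduling steps under a distribution.
pub fn elapsed_after<D: TimeDistribution>(dist: &mut D, steps: usize) -> Duration {
    (0..steps).map(|_| dist.sample()).sum()
}
```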

dylanjwolff force-pushed the time-models-pr branch from b74d4a5 to 95b637b Compare October 3, 2025 23:03

dylanjwolff mentioned this pull request Oct 7, 2025

Adding global config for Shuttle #218

Closed

dylanjwolff force-pushed the time-models-pr branch from d6d5ee7 to 2e1ffeb Compare October 15, 2025 21:45

dylanjwolff force-pushed the time-models-pr branch from 2e1ffeb to 9135038 Compare October 24, 2025 21:06

jorajeev mentioned this pull request Dec 8, 2025

Missing tokio constructs not supported by shuttle-tokio #241

Open

sarsko force-pushed the time-models-pr branch from 1d7e86c to ac99d6e Compare December 13, 2025 10:44

sarsko mentioned this pull request Jan 12, 2026

Add futurelock tests #245

Open

sarsko force-pushed the time-models-pr branch from ac99d6e to 665ff08 Compare January 13, 2026 00:01

sarsko marked this pull request as draft January 13, 2026 00:20

Adding TimeModels to Shuttle

f072dc5

sarsko force-pushed the time-models-pr branch from 665ff08 to 7bd34cb Compare January 28, 2026 20:22

Move TimeModels out of sync

0a42ef9

sarsko force-pushed the time-models-pr branch from 7bd34cb to 0a42ef9 Compare January 28, 2026 20:28

sarsko marked this pull request as ready for review January 28, 2026 20:32

Make Sleep must_use

a06a0ea

Refactor async_sleep

a89ccbd

sarsko force-pushed the time-models-pr branch from a0807e0 to a89ccbd Compare February 3, 2026 23:08

Remove trigger_timeouts related functionality

5790385

sarsko added 3 commits February 5, 2026 18:38

Remove distribution sample in ConstantSteppedTimeModel

89317af

Refactor register_sleep

dde6347

Remove trigger timeouts tests

1868371
