async: thread-safe schedule() #218

pschyska · 2025-10-02T21:31:58Z

Proposed changes

As mentioned in #110, my work on making Scheduler.schedule() thread-safe.

This would enable schedule() to be called from other threads, e.g. async-compat or other "sidecar-runtime" setups. It also makes sure epoll is interrupted when there are IO completion notifications coming in from outside of the event loop, leading to prompt continuation.

While this doesn't provide a native hyper/client as @bavshin-f5 wanted, it makes the default tokio implementation work via Compat. This would be a viable stopgap solution for us. I've added some examples, including hyper and reqwest. In the future, one could implement a "sidecar-runtime" approach as in async-compat natively that would use a separate epoll loop in a thread, or inject additional fds from the Rust side to nginx's epoll instance (if possible).

Some notes:

- Requires ngx_thread_tid to be present.
- Not compatible with no_std right now: it uses OnceLock (might be replaceable by something from spin) and crossbeam-channel, and probably more. I've added std as a dependency for async to reflect that (this would be a breaking change, but async Rust probably implies std anyway).

Checklist

Before creating a PR, run through this checklist and mark each as complete.

I have written my commit messages in the Conventional Commits format.
I have read the CONTRIBUTING doc
I have added tests (when possible) that prove my fix is effective or that my feature works (don't think it's possible)
I have checked that all unit tests pass after adding my changes
I have updated necessary documentation
I have rebased my branch onto main
I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

bavshin-f5 · 2025-11-06T04:44:31Z

ngx_notify is "thread-safe" under a very narrow set of conditions. One of those is that nobody outside of the nginx internal code is allowed to call it.
Check ngx_epoll_module.c:769 and consider what would happen if multiple modules started invoking ngx_notify() with different handler methods.
I don't want to allow mixing internal and external async runtimes or encourage use of threads. Both seem to be fragile and dangerous.
I don't even believe you need to mix both runtimes: if you intend to use tokio, just run all the asynchronous code in the tokio task.
This change would break or make significantly slower any IO implementation that properly integrates with the nginx event loop (such as hyper client in nginx-acme).

pschyska · 2025-11-07T11:35:04Z

ngx_notify is "thread-safe" under a very narrow set of conditions. One of those is that nobody outside of the nginx internal code is allowed to call it.
Check ngx_epoll_module.c:769 and consider what would happen if multiple modules started invoking ngx_notify() with different handler methods.

I see it now.

I don't want to allow mixing internal and external async runtimes or encourage use of threads. Both seem to be fragile and dangerous.

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

I can also imagine situations where you'd want to run non-IO compute in a thread pool so that it doesn't block nginx; in our case, for example, crypto. You'd want to be able to notify the request handler's async task of completion by writing to a channel or a similar mechanism. This, in turn, would call the waker from that thread (AFAIK), which calls schedule for the task from that thread, but the woken task would be scheduled to run on the main thread via ngx_notify.
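To make the "waker is called from the worker thread" point concrete, here is a minimal std-only sketch. The names offload, Offloaded and RecordingWaker are hypothetical and are not part of ngx-rust or this PR; a plain thread stands in for a thread pool.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, ThreadId};

/// State shared between the awaiting task and the worker thread.
struct Shared<T> {
    value: Option<T>,
    waker: Option<Waker>,
}

/// Future that resolves once the worker thread has stored a value.
pub struct Offloaded<T>(Arc<Mutex<Shared<T>>>);

/// Run `job` on a fresh thread (hypothetical helper, stand-in for a thread pool).
pub fn offload<T, F>(job: F) -> (Offloaded<T>, thread::JoinHandle<()>)
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { value: None, waker: None }));
    let remote = Arc::clone(&shared);
    let handle = thread::spawn(move || {
        let result = job();
        // Publish the result, then take the waker while holding the lock.
        let waker = {
            let mut s = remote.lock().unwrap();
            s.value = Some(result);
            s.waker.take()
        };
        // This call happens on the worker thread, not on the polling thread.
        if let Some(w) = waker {
            w.wake();
        }
    });
    (Offloaded(shared), handle)
}

impl<T> Future for Offloaded<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut s = self.0.lock().unwrap();
        match s.value.take() {
            Some(v) => Poll::Ready(v),
            None => {
                // Remember who to wake once the worker is done.
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Waker that records the thread on which wake() was invoked.
pub struct RecordingWaker(pub Mutex<Vec<ThreadId>>);

impl Wake for RecordingWaker {
    fn wake(self: Arc<Self>) {
        self.0.lock().unwrap().push(thread::current().id());
    }
}
```

Polling an Offloaded future with a RecordingWaker shows the recorded thread id differs from the polling thread's id, which is why whatever wake() ends up calling has to tolerate being invoked off the event thread.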

I don't even believe you need to mix both runtimes: if you intend to use tokio, just run all the asynchronous code in the tokio task.

We need to work with the request heavily (mutate headers_in and headers_out, read client bodies, produce response bodies) in response to I/O (external requests, database queries, custom crypto/tunneling), which can only be done safely on the main thread. If all our code runs in a completely separate engine, that becomes extremely hard. In addition, we need a way to interrupt nginx's epoll in reaction to I/O events, not all of which are bound to a request (OpenID shared signals, e.g.).
async-compat seemed like a good compromise to me: use the tokio "runtime" (I/O setup, ...), but with the ngx-rust scheduler/executor.

This change would break or make significantly slower any IO implementation that properly integrates with the nginx event loop (such as hyper client in nginx-acme).

I don't think it would do that. If the waker is invoked from the main thread, schedule in my branch would simply .run() the runnable, and everything stays on the main thread. ngx_notify would not be called (except once during the lifetime of a worker process because it's not known which tid is main). I have to admit I didn't test with nginx-acme yet though.
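The fast path described above can be sketched roughly as follows. This is a std-only illustration, not the PR's code: Scheduler, init_on_main and run_queued are hypothetical names, std::thread::ThreadId stands in for ngx_thread_tid, and boxed closures stand in for async-task Runnables.

```rust
use std::collections::VecDeque;
use std::sync::{Mutex, OnceLock};
use std::thread::{self, ThreadId};

/// Stand-in for a runnable task.
type Job = Box<dyn FnOnce() + Send>;

/// Sketch of a scheduler that remembers the event-loop thread.
pub struct Scheduler {
    main: OnceLock<ThreadId>,
    queue: Mutex<VecDeque<Job>>,
}

impl Scheduler {
    pub fn new() -> Self {
        Scheduler {
            main: OnceLock::new(),
            queue: Mutex::new(VecDeque::new()),
        }
    }

    /// Record the calling thread as the event-loop thread (first call wins).
    pub fn init_on_main(&self) {
        let _ = self.main.set(thread::current().id());
    }

    /// Returns true if the job ran inline, false if it was queued.
    pub fn schedule(&self, job: Job) -> bool {
        if self.main.get() == Some(&thread::current().id()) {
            // Already on the event-loop thread: run right away, no notification.
            job();
            true
        } else {
            // Foreign thread: hand the job over to the event-loop thread.
            self.queue.lock().unwrap().push_back(job);
            // A real implementation would now interrupt the event loop.
            false
        }
    }

    /// Drain queued jobs; meant to be called on the event-loop thread.
    pub fn run_queued(&self) -> usize {
        let mut ran = 0;
        loop {
            // The lock guard is dropped before the job runs.
            let job = self.queue.lock().unwrap().pop_front();
            match job {
                Some(j) => {
                    j();
                    ran += 1;
                }
                None => return ran,
            }
        }
    }
}
```

With this shape, a waker invoked on the event thread never touches the queue or the notification mechanism, so I/O that is already integrated with the event loop takes the same path as before, apart from the thread-id comparison.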

To recap, I'd still like the following:

- A way to interrupt epoll
- A way to move tasks to the main thread
- Safe to call schedule from other threads

Given ngx_epoll_module.c:769, ngx_notify from other threads is indeed inherently unsafe.

However, what if we do this:

- ngx_post_event a custom event, its handler being notify_handler
- write(notify_fd, &inc, sizeof(uint64_t)) to interrupt epoll. The event loop would then find our custom event promptly.
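The "post, then interrupt the blocking wait" pattern in these two steps can be illustrated with a std-only analogy. This is not nginx code: thread::park() stands in for epoll_wait(), unpark() stands in for the write to notify_fd, and Poster and run_loop are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread::{self, Thread};

/// Handle used by foreign threads to post events to the loop thread.
pub struct Poster {
    pub queue: Arc<Mutex<VecDeque<u32>>>,
    pub loop_thread: Thread,
}

impl Poster {
    pub fn post(&self, ev: u32) {
        // Step 1: make the event visible to the loop.
        self.queue.lock().unwrap().push_back(ev);
        // Step 2: interrupt the blocking wait so the event is found promptly.
        self.loop_thread.unpark();
    }
}

/// Runs on the loop thread until `expected` events have been handled.
pub fn run_loop(queue: &Mutex<VecDeque<u32>>, expected: usize) -> Vec<u32> {
    let mut handled = Vec::new();
    while handled.len() < expected {
        let batch: Vec<u32> = queue.lock().unwrap().drain(..).collect();
        if batch.is_empty() {
            // Stand-in for epoll_wait(): block until someone interrupts us.
            // An unpark() that raced ahead makes this return immediately.
            thread::park();
        } else {
            handled.extend(batch);
        }
    }
    handled
}
```

The ordering matters: the event is queued before the wait is interrupted, so the loop cannot wake up, find nothing, and go back to sleep with the event still pending.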

Would this work for you?

schedule() can now be called from any thread, but will move tasks to the event loop thread. pthread_kill(main_thread, SIGIO) is used to ensure a prompt response if needed. This enables receiving I/O notifications from "sidecar runtimes" like async-compat, for instance. The async example has been rewritten to use async_::spawn, demonstrating usage of reqwest and hyper clients wrapped in Compat to provide a tokio runtime environment while using the async_ Scheduler as executor.

pschyska · 2025-11-07T17:04:37Z

@bavshin-f5 I've rewritten the code to not rely on ngx_notify. Instead, I'm using ngx_post_event, followed by pthread_kill(main_thread, SIGIO) as I had a hard time getting the notify_fd from within ngx-rust. Does that address your concern?
schedule still can be called from other threads, e.g. from a waker, and moves the tasks to the main thread. The SIGIO ensures prompt reaction.

bavshin-f5 · 2025-11-07T18:39:23Z

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

Ah. I got why you assume that this is safe. I don't believe it is, and I expect that some of your code is quietly being scheduled on a tokio executor in another thread. async-compat is not the kind of magic that can override tokio scheduling, it merely allows creating and polling certain tokio types outside of the runtime-owned thread.
I also suspect that tokio is not quite prepared for deallocation of seemingly exclusively owned objects from a thread outside of the runtime.

The only approach I would consider safe is where nothing owned by a request or a cycle pool is allowed to move to another runtime, either accidentally or intentionally. Many things we do are lacking such protection because we assume single-threaded environment.

bavshin-f5 · 2025-11-07T18:50:28Z

src/async_/spawn.rs

+                event.log = ngx_cycle_log().as_ptr();
+
+                unsafe {
+                    ngx_post_event(&mut *event, ptr::addr_of_mut!(ngx_posted_events));


Posting to ngx_posted_events can easily lead to an infinite loop. If the current task is already running from a posted event handler, no IO could happen before the next wakeup.

If the current task is running on the event thread, there is actually no need to post the event, as the handler is still currently reading from the channel here, necessarily. Therefore we can just skip it like the SIGIO

I tried to remove the ngx_post_event when on event thread, but the example started deadlocking. I'm not sure why, but it seems we can't skip it. (but we should still skip the SIGIO). Can you elaborate on the deadlock that you suspect can happen now? Doesn't ngx_post_event just add it to the queue? When we are on the event thread, we know nginx is currently spinning, so the posted event should just get picked up the next turn.
Why is "no IO could happen before the next wakeup" relevant here?
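The concern about re-posting to the queue that is currently being drained can be shown with a toy model. This is a simplified sketch of an event loop with an "active" and a "next" posted queue, under the assumption that the active queue is drained until empty and the next queue is moved over once per iteration; simulate and Repost are hypothetical names, and the cap only exists to keep the demo finite.

```rust
use std::collections::VecDeque;

/// Which queue a handler re-posts itself to.
#[derive(Clone, Copy)]
pub enum Repost {
    SameQueue,
    NextQueue,
}

/// Returns how many times the handler ran in each loop iteration.
pub fn simulate(mode: Repost, iterations: usize, cap: usize) -> Vec<usize> {
    let mut posted: VecDeque<&'static str> = VecDeque::new();
    let mut posted_next: VecDeque<&'static str> = VecDeque::from(["wake"]);
    let mut runs_per_iteration = Vec::new();

    for _ in 0..iterations {
        // Start of an iteration: move "next" events into the active queue.
        posted.append(&mut posted_next);
        // (A real loop would poll for I/O here, once per iteration.)
        let mut runs = 0;
        // Drain the active queue until it is empty.
        while let Some(ev) = posted.pop_front() {
            runs += 1;
            if runs == cap {
                // Safety valve for the demo; a real loop has none.
                posted.clear();
                break;
            }
            // The handler wakes itself again, as a busy task might.
            match mode {
                Repost::SameQueue => posted.push_back(ev),
                Repost::NextQueue => posted_next.push_back(ev),
            }
        }
        runs_per_iteration.push(runs);
    }
    runs_per_iteration
}
```

With Repost::SameQueue the drain never reaches the end of the queue on its own, so the loop never gets back to polling for I/O; with Repost::NextQueue the handler runs once per iteration and I/O is polled in between.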

bavshin-f5 · 2025-11-07T19:06:48Z

src/async_/spawn.rs

+/// Initialize async by storing MAIN_THREAD
+pub fn initialize_async() {
+    MAIN_THREAD
+        .set(unsafe { pthread_self() })


You call this from the master process, but POSIX does not specify if thread ID remains the same after fork().
It's better to initialize this in spawn, because spawn is the entry point of the async runtime and it is supposed to be called from a worker process.

Raw pthread use is also non-portable, we have that one platform without pthread.h that we pretend to support. ngx_thread_tid presence depends on the nginx build options, and Rust's std::thread::ThreadId is very expensive to obtain.

Hmm, I'm running initialize_async in init_process. The dev guide reads:

The master process creates one or more worker processes and the init_process handler is called in each of them.

(emphasis mine)

I read this as: "called once per worker", and I think I'm seeing that happening right now. Am I mistaken?

Fair point on pthreads, what would you recommend?

I'd have used ngx_thread_tid (potentially requiring the corresponding build options for the "async" feature), and had it working for just the "on event thread" detection, but then I don't have anything to pass to pthread_kill...

Would a normal kill(getpid(), SIGIO) be ok, too? It did seem to work fine when I tested it a few weeks back while working on the initial version, and it would actually enable me to remove the required init from init_process again.

On init during spawn: I considered it, but I wasn't sure I can rely on it happening on the event thread. Couldn't the user have set up an ngx_thread_task and called the first spawn in its handler, shooting themselves in the foot?

pschyska · 2025-11-07T21:32:55Z

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

Ah. I got why you assume that this is safe. I don't believe it is, and I expect that some of your code is quietly being scheduled on a tokio executor in another thread. async-compat is not the kind of magic that can override tokio scheduling, it merely allows creating and polling certain tokio types outside of the runtime-owned thread. I also suspect that tokio is not quite prepared for deallocation of seemingly exclusively owned objects from a thread outside of the runtime.
The only approach I would consider safe is where nothing owned by a request or a cycle pool is allowed to move to another runtime, either accidentally or intentionally. Many things we do are lacking such protection because we assume single-threaded environment.

I don't claim to fully understand it, but they state:

"Otherwise, a new single-threaded runtime will be created on demand. That does not mean the future is polled by the tokio runtime."

The tokio runtime could spawn its own tasks into that runtime, sure, e.g. some kind of helper task. But I don't see how my task could end up there. If my task's Runnable.schedule() arranges for it to be scheduled on the event thread, which is precisely what my PR does, it will run just there.

I'm not an expert, but I think what happens is this:

async_::spawn(my_task)
event handler starts running it (part1) until await:
- reqwest.get(...).await
  - reqwest.get(...) is polled -> Pending, waker is set to my_task(part2).schedule()
    - tokio runtime thread things happen, ..., eventually waker is called (from that thread! which is why I want schedule() to work from other threads)
    - my_task(part2).schedule() is our Scheduler.schedule(), will post an event and push the Runnable to the queue
    - my_task(part2) runs on the main thread

This is what I see right now, using the code from the PR. This is also what I'd expect to happen with a "sidecar"-tokio-runtime that I started myself (no async-compat).
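The sequence above can be reproduced in miniature with std only. This is a sketch, not the PR's executor: block_on_local, Notify and External are hypothetical names, a helper thread stands in for the tokio runtime thread, and an mpsc channel stands in for the posted event plus queue.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, ThreadId};

/// Waker whose "schedule" only sends a message to the loop thread.
struct Notify {
    tx: Mutex<Sender<ThreadId>>,
}

impl Wake for Notify {
    fn wake(self: Arc<Self>) {
        // May run on any thread; report which one for the demo.
        let _ = self.tx.lock().unwrap().send(thread::current().id());
    }
}

/// Future completed by a helper thread (stand-in for a sidecar runtime).
struct External {
    done: Arc<AtomicBool>,
    started: bool,
}

impl Future for External {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.done.load(Ordering::Acquire) {
            return Poll::Ready(42);
        }
        if !self.started {
            self.started = true;
            let (done, waker) = (Arc::clone(&self.done), cx.waker().clone());
            thread::spawn(move || {
                done.store(true, Ordering::Release);
                waker.wake(); // called from the helper thread
            });
        }
        Poll::Pending
    }
}

/// Drive `task` on the calling thread only.
/// Returns (output, threads that polled, threads that called wake).
pub fn block_on_local<F: Future>(task: F) -> (F::Output, Vec<ThreadId>, Vec<ThreadId>) {
    let (tx, rx) = channel();
    let waker = Waker::from(Arc::new(Notify { tx: Mutex::new(tx) }));
    let mut cx = Context::from_waker(&waker);
    // No Send bound: the task itself never leaves this thread.
    let mut task = Box::pin(task);
    let (mut polled_on, mut woken_from) = (Vec::new(), Vec::new());
    loop {
        polled_on.push(thread::current().id());
        if let Poll::Ready(out) = task.as_mut().poll(&mut cx) {
            return (out, polled_on, woken_from);
        }
        // Block until some thread calls wake() (stand-in for the event loop).
        woken_from.push(rx.recv().unwrap());
    }
}

pub async fn my_task() -> u32 {
    // "part 1" runs up to this await.
    let v = External { done: Arc::new(AtomicBool::new(false)), started: false }.await;
    // "part 2" resumes on whichever thread calls poll() next.
    v + 1
}
```

Running block_on_local(my_task()) yields wake calls from the helper thread while every poll is recorded on the calling thread: only the notification crosses threads, not the task.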


pschyska · 2025-11-07T22:36:54Z

I just pushed an experiment with a sidecar tokio runtime and added tid debug logging here: https://github.com/pschyska/ngx-rust/blob/a5ff1bb0cc3e6d5bb15f46e24348a1d2fa694f18/examples/async.rs#L115
What I see is this:

2025/11/07 23:27:02 [debug] 494044#494044: async: spawning new task
!!! schedule tid=494044
!!! run eager tid=494044
!!! async entry, tid=494044
!!! external task entry, tid=494047
!!! schedule tid=494047
!!! run handler tid=494044
!!! async resume, tid=494044, result=42
!!! schedule tid=494046
!!! run handler tid=494044
!!! after await tid=494044

This supports my theory: my task is never moved to the tokio runtime. The runtime does call schedule from its own threads, though: when using tokio::spawn, from the thread of the runtime (494047); when awaiting tokio::time::sleep directly, presumably from the sleep thread. However, code in my task always runs in the event thread.

I've also pushed a change to main to switch to kill and ngx_thread_tid. It also works fine.

pschyska · 2025-11-08T20:58:56Z

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

Ah. I got why you assume that this is safe. I don't believe it is, and I expect that some of your code is quietly being scheduled on a tokio executor in another thread. async-compat is not the kind of magic that can override tokio scheduling, it merely allows creating and polling certain tokio types outside of the runtime-owned thread. I also suspect that tokio is not quite prepared for deallocation of seemingly exclusively owned objects from a thread outside of the runtime.

The only approach I would consider safe is where nothing owned by a request or a cycle pool is allowed to move to another runtime, either accidentally or intentionally. Many things we do are lacking such protection because we assume single-threaded environment.

I just had another idea that helped me visualize this:

If !Send futures could move between executors at will, they could end up in an executor that requires Send (and/or Sync).

E.g.: if the "part-2" future of my task, after awaiting a future from a tokio runtime, somehow magically ran in a threaded tokio executor, it would have to be Send. But if I used e.g. async_task::spawn_local, it could be just 'static. The compiler would not compile that code. (Of course, crucial parts of an executor are unsafe, but this would still make this behaviour wildly illegal in Rust.)
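The compile-time side of this argument can be sketched with two stand-in spawn functions. These are hypothetical functions for illustration only, not tokio's or async-task's API; they only mirror the trait bounds such spawn functions typically carry.

```rust
use std::future::Future;
use std::rc::Rc;

/// Stand-in for a multi-threaded executor's spawn: requires Send.
pub fn spawn_threaded<F>(fut: F) -> &'static str
where
    F: Future + Send + 'static,
{
    drop(fut);
    "accepted by Send-requiring executor"
}

/// Stand-in for a single-threaded executor's spawn: only 'static.
pub fn spawn_local<F>(fut: F) -> &'static str
where
    F: Future + 'static,
{
    drop(fut);
    "accepted by local executor"
}

/// Holds an Rc across an await point, so the resulting future is !Send.
pub async fn local_task() -> usize {
    let shared = Rc::new(5);
    std::future::ready(()).await;
    *shared
}

/// Captures nothing thread-bound, so the resulting future is Send.
pub async fn sendable_task() -> usize {
    5
}

// Uncommenting the next function fails to compile, because the future
// returned by local_task() cannot be sent between threads safely:
//
// pub fn rejected() -> &'static str {
//     spawn_threaded(local_task())
// }
```

spawn_local(local_task()) and spawn_threaded(sendable_task()) both compile; only the combination of a !Send future with a Send-requiring spawn is rejected. An executor built on unsafe code can bypass this check, which is why the safety contract of spawn_unchecked still has to be reviewed by hand.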

I don't know of any method of making a task move executors. If I wanted to connect futures of different executors beyond their output for some reason (e.g. to be able to cancel the other task), I would use a remote_handle. But AFAIK this doesn't change the Context (which ties back to schedule() and the task); it establishes a oneshot between the tasks.

We could use spawn_local instead of spawn_unchecked (which would store Rust's thread id and check that it is the same on .run()), but this is unnecessary overhead in this case, it simply can't happen. The example code I wrote which leads to waking from other threads all the time still runs fine with spawn_unchecked.

Another angle on this - the spawn_unchecked docs state:

Safety

If Fut is not 'static, borrowed non-metadata variables must outlive its Runnable. and: If schedule is not 'static, borrowed variables must outlive all instances of the Runnable's Waker.
✅ doesn't apply: we require 'static for the Future and Scheduler is 'static (current and my PR)
If Fut is not [Send], its [Runnable] must be used and dropped on the original thread
✅ run() is only called on the event thread (current and my PR), which is what "used and dropped" implies, I believe, according to the language used in the introduction.
If schedule is not Send and Sync, all instances of the Runnable's Waker must be used and dropped on the original thread.
currently: ❗schedule is claimed to be Send + Sync, but it is not. It must not be called from another thread (and, by extension, neither must Wakers, which call Runnable.schedule()). The fact that I'm even able to do it (e.g. accidentally by using async-compat, or manually by polling myself and calling Wakers, etc.) indicates an issue. Currently, though, the Runnable will be .run() on an arbitrary thread. As there is no way to communicate that requirement in the type system, a runtime check would have been required (e.g. spawn_local).
PR: ✅ (IMHO :-)) schedule is Send + Sync. The event is only mutated to update the log to the current ngx_cycle_log, and that is guarded by the RwLock. If not for that, the event could actually be 'static (we only use it to communicate the static callback address) and there would be no need for the "unsafe impl"s.

I think I have now fully convinced myself, let me know if this helps to convince you as well 🙂


pschyska · 2025-11-17T11:21:18Z

@bavshin-f5 Did you have a chance to take a look?
Cheers


pschyska force-pushed the main branch 2 times, most recently from e327e07 to e1c9191 Compare October 6, 2025 13:06

pschyska changed the title ~~RFC: thread-safe spawn with ngx_notify~~ thread-safe spawn with ngx_notify Oct 6, 2025

pschyska force-pushed the main branch 5 times, most recently from 53f60c3 to 4a95650 Compare October 8, 2025 13:25

pschyska force-pushed the main branch from 4a95650 to 7a736f8 Compare November 7, 2025 16:56

pschyska force-pushed the main branch from 7a736f8 to 35f97ff Compare November 7, 2025 16:59

bavshin-f5 reviewed Nov 7, 2025

View reviewed changes

pschyska changed the title ~~thread-safe spawn with ngx_notify~~ async: thread-safe schedule() Nov 7, 2025

Don't use pthread API, kill(pid, SIGIO) instead and use ngx_thread_ti…

6a0dbf4

…d for oet

ngx_posted_events → ngx_posted_next_events (stops getting stuck occas…

1e89804

…ionally, e.g. in detached tasks)

pschyska force-pushed the main branch from 9884c8f to 168e2e1 Compare November 19, 2025 12:23

Lock-free design by creating fresh events; no need for unsafe impl S{…

f7fdbeb

…end,ync}

pschyska force-pushed the main branch from 168e2e1 to f7fdbeb Compare November 19, 2025 12:25
