
@pschyska commented Oct 2, 2025

Proposed changes

As mentioned in #110, this is my POC for async support via nginx-notify.
This would allow schedule() to be called from other threads, e.g. with async-compat or other "sidecar runtime" setups. It also ensures that epoll is interrupted when I/O completion notifications arrive from outside the event loop, so tasks continue promptly.
While this doesn't provide a native hyper/client as @bavshin-f5 wanted, it makes the default tokio implementation work via Compat. This would be a viable stopgap solution for us. I've added some examples, including ones for hyper and reqwest.
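To illustrate the cross-thread wakeup idea, here is a minimal, std-only model. It is a sketch under stated assumptions, not the PR's code: `thread::park()` stands in for the event loop blocking in `epoll_wait`, `Thread::unpark()` stands in for the ngx_notify call that interrupts it, and the drain loop stands in for notify_handler. `run_event_loop` and the boxed-closure task type are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical task type: a boxed closure instead of an async-task Runnable.
type Task = Box<dyn FnOnce() -> u32 + Send>;

/// Model only: park() ~ epoll_wait, unpark() ~ ngx_notify.
/// Returns the results of all tasks, sorted.
fn run_event_loop(n_workers: u32) -> Vec<u32> {
    let (tx, rx) = mpsc::channel::<Task>();
    let event_thread = thread::current();

    let workers: Vec<_> = (0..n_workers)
        .map(|i| {
            let tx = tx.clone();
            let event_thread = event_thread.clone();
            thread::spawn(move || {
                // schedule() from a foreign thread: enqueue the task first...
                tx.send(Box::new(move || i * 10)).unwrap();
                // ...then interrupt the blocked event loop.
                event_thread.unpark();
            })
        })
        .collect();

    let mut results = Vec::new();
    while results.len() < n_workers as usize {
        // Returns once unpark() has been called (or spuriously; then we
        // simply find the queue empty and park again).
        thread::park();
        // notify_handler stand-in: run everything queued so far.
        results.extend(rx.try_iter().map(|task| task()));
    }
    for w in workers {
        w.join().unwrap();
    }
    results.sort_unstable();
    results
}

fn main() {
    println!("{:?}", run_event_loop(4));
}
```

Enqueueing before waking matters: `unpark()` synchronizes with the `park()` it releases, so the drain after `park()` sees every task whose wakeup it consumed.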

There are a bunch of minor issues that I'd need some guidance on:

  • I'm not sure whether the MAIN_PID approach is the best way to determine on_event_thread(), but I haven't found a better one yet
  • it's not compatible with no_std right now, because of OnceLock (which might be replaceable by something from spin) and crossbeam-channel
  • I didn't feel comfortable exploring "batching", i.e. making the channel larger or unbounded, in particular deciding (from another thread) whether nginx_notify needs to be called. notify_handler could toggle an atomic flag on entry and exit; if schedule() sees it set, it could just send to the channel without calling ngx_notify. But leaving it as is might not be a big issue: yes, the driver threads might block briefly, but what matters is not blocking the main thread. If the event loop is lagging behind external events, buffering wouldn't make it run any faster anyway.
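For the batching question in the last bullet, one possible variant is sketched below. It is not the PR's code, and it deliberately swaps the enter/exit flag for a "notify pending" flag that notify_handler clears *before* draining. The reason: if the flag is only cleared on exit, a task sent after the handler's final drain but before the clear would skip ngx_notify and sit in the queue until some unrelated wakeup. `Wakeup`, `notify_pending` and `notify_calls` are hypothetical names, `notify()` is a stand-in for ngx_notify, and std `mpsc` stands in for crossbeam-channel.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical wakeup coalescer: only the first schedule() since the
/// last drain pays for a notify.
struct Wakeup {
    notify_pending: AtomicBool,
    notify_calls: AtomicUsize, // counts stand-in ngx_notify() calls
}

impl Wakeup {
    fn new() -> Self {
        Wakeup {
            notify_pending: AtomicBool::new(false),
            notify_calls: AtomicUsize::new(0),
        }
    }

    /// Stand-in for ngx_notify(notify_handler).
    fn notify(&self) {
        self.notify_calls.fetch_add(1, Ordering::Relaxed);
    }

    /// Callable from any thread: enqueue first, then notify only if no
    /// notification is already pending.
    fn schedule(&self, tx: &Sender<u32>, task: u32) {
        tx.send(task).expect("event loop has shut down");
        if !self.notify_pending.swap(true, Ordering::AcqRel) {
            self.notify();
        }
    }

    /// Runs on the event loop. Clearing the flag before draining means any
    /// task whose schedule() saw the flag set is visible to this drain, and
    /// any later one triggers a fresh notify.
    fn notify_handler(&self, rx: &Receiver<u32>) -> Vec<u32> {
        self.notify_pending.swap(false, Ordering::AcqRel);
        rx.try_iter().collect()
    }
}

fn main() {
    let wakeup = Wakeup::new();
    let (tx, rx) = mpsc::channel();
    for task in 1..=3 {
        wakeup.schedule(&tx, task); // three sends, one notify
    }
    println!("drained {:?}", wakeup.notify_handler(&rx));
    wakeup.schedule(&tx, 4); // flag was cleared, so this notifies again
    println!("notify calls: {}", wakeup.notify_calls.load(Ordering::Relaxed));
}
```

The cost is at most one redundant (empty) handler run when a send races with the clear, which seems preferable to a stranded task.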

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have written my commit messages in the Conventional Commits format.
  • I have read the CONTRIBUTING doc
  • I have added tests (when possible) that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

A deadlock was theoretically possible when an external thread had filled the channel while a task on the event loop tried to schedule, potentially blocking the main thread.

Additionally, when multiple requests schedule in parallel before MAIN_TID is initialized, they could deadlock: they have to go via nginx_notify because, although they come from the main thread, MAIN_TID hasn't been captured yet. Making the channel unbounded prevents that. Once MAIN_TID is initialized, schedule() on the main thread always calls .run() instead of going through the channel.
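Both failure modes come down to the event-loop thread, the only consumer of the channel, doing a blocking send into a full channel. A minimal sketch of that, assuming std `mpsc` as a stand-in for crossbeam-channel, with `sync_channel(1)` playing the role of a bounded channel of capacity 1; the helper names are hypothetical.

```rust
use std::sync::mpsc::{self, TrySendError};
use std::thread;

/// Bounded case: an external thread fills the only slot, then the
/// event-loop thread (the sole consumer) tries to schedule.
/// Returns true if that send would have blocked.
fn bounded_send_would_block() -> bool {
    let (tx, _rx) = mpsc::sync_channel::<u32>(1);
    let external = tx.clone();
    thread::spawn(move || external.send(1).unwrap()).join().unwrap();
    // A blocking tx.send(2) here would never return: the only thread that
    // could drain the channel is the one stuck sending. try_send reports it.
    matches!(tx.try_send(2), Err(TrySendError::Full(2)))
}

/// Unbounded case: sends from the event-loop thread always complete
/// immediately, however many tasks are already queued.
fn unbounded_sends(n: u32) -> usize {
    let (tx, rx) = mpsc::channel::<u32>();
    for i in 0..n {
        tx.send(i).unwrap(); // never blocks
    }
    rx.try_iter().count()
}

fn main() {
    println!("bounded send would block: {}", bounded_send_would_block());
    println!("unbounded queued: {}", unbounded_sends(1000));
}
```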
@pschyska (author) commented Oct 2, 2025

There was a theoretical deadlock in the re-entrancy optimization (scheduled_while_running), which routed re-entrant schedules through the queue. That meant a potentially blocking queue send, which could block the main thread if some external thread pushed a task in parallel. I think always calling .run() when on the main thread, and send() otherwise, is clearer anyway.
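A minimal sketch of the "run on the main thread, send otherwise" rule, including the pre-initialization case where even the main thread falls back to the queue. Assumptions: `Scheduler` and `Dispatch` are hypothetical, the `main_tid` field plays the role of MAIN_TID (a struct field rather than a static, so the demo is repeatable), tasks are boxed closures rather than async-task Runnables, and std `mpsc` stands in for crossbeam-channel. A real version would also call ngx_notify after queuing.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::OnceLock;
use std::thread::{self, ThreadId};

type Task = Box<dyn FnOnce() + Send>;

#[derive(Debug, PartialEq)]
enum Dispatch {
    RanInline, // executed immediately on the event-loop thread
    Queued,    // sent to the channel
}

struct Scheduler {
    main_tid: OnceLock<ThreadId>, // stand-in for MAIN_TID
}

impl Scheduler {
    fn on_event_thread(&self) -> bool {
        // False until main_tid is captured, even on the main thread.
        self.main_tid.get() == Some(&thread::current().id())
    }

    fn schedule(&self, task: Task, tx: &Sender<Task>) -> Dispatch {
        if self.on_event_thread() {
            task(); // stand-in for Runnable::run(); no detour via the queue
            Dispatch::RanInline
        } else {
            tx.send(task).expect("event loop has shut down"); // unbounded
            Dispatch::Queued
        }
    }
}

fn demo() -> (Dispatch, Dispatch, Dispatch, usize) {
    let (tx, rx) = mpsc::channel::<Task>();
    let sched = &Scheduler { main_tid: OnceLock::new() };

    // MAIN_TID not captured yet: the main thread still goes via the queue.
    let before_init = sched.schedule(Box::new(|| {}), &tx);
    // Capture the current (event-loop) thread's ID.
    let _ = sched.main_tid.set(thread::current().id());
    let on_main = sched.schedule(Box::new(|| {}), &tx);
    // Any other thread always goes via the queue.
    let foreign = thread::scope(|s| {
        let tx2 = tx.clone();
        s.spawn(move || sched.schedule(Box::new(|| {}), &tx2))
            .join()
            .unwrap()
    });
    (before_init, on_main, foreign, rx.try_iter().count())
}

fn main() {
    println!("{:?}", demo());
}
```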

Plus there was an actual deadlock on a freshly started server that hadn't filled in its MAIN_TID yet and received many requests in parallel (via drill). Before MAIN_TID is initialized, schedules have to go via the queue, but since these are the initial tasks, they are sent from the main thread. A blocking send there would block the main thread and deadlock nginx. Therefore, the channel is now unbounded. If we can solve the MAIN_TID issue differently, the channel can be bounded(1), which is probably a bit more efficient.
