Description
I observed some very interesting behavior while using this crate in a few services of mine: running multiple tasks on the tokio runtime along with a nitox client will cause the other tasks to never be executed, or to receive very minimal execution, while nitox is running.
Of course, first I suspected that it was my code, but I reviewed it thoroughly and did not find anything obvious which would block the runtime. After reviewing the networking code in this crate, it looks like parking_lot::RwLock is used as a fairly fundamental building block for the nats client and the connection multiplexer. This is what is causing the blocking.
The parking_lot crate does provide smaller and more efficient locking mechanisms than the std::sync primitives, but they still block the calling thread while waiting to acquire read/write access, and the documentation states this explicitly. Take write for example: parking_lot's RwLock type def is built on lock_api::RwLock under the hood, and its docs state:
Locks this RwLock with exclusive write access, blocking the current thread until it can be acquired. [...]
The blocking locks are used primarily in the client, net/connection & net/connection_inner.
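For illustration, here is a minimal, hypothetical sketch (not code from nitox) of the failure mode: a blocking parking_lot write lock taken inside a future parks the executor thread until the lock is acquired, and while that thread is parked no other task scheduled on it can be polled.

```rust
// Hypothetical sketch, not nitox code: a blocking parking_lot lock taken
// from inside a future parks the executor thread until it is acquired.
use parking_lot::RwLock;
use std::sync::Arc;

fn update_state(state: &Arc<RwLock<Vec<u8>>>) {
    // If another thread holds the lock, `write()` blocks (parks) the
    // current thread. Called from a future's poll(), that means the
    // event loop thread stops polling every other task until the lock
    // is released.
    let mut guard = state.write();
    guard.push(1);
}
```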
what do we do about this?
First, I think that this is not a good thing. We definitely do not want nitox to block the event loop, because then it essentially defeats the purpose of using an event loop. Pretty sure we can all agree with that.
As a path forward, I would propose that we use a non-blocking, message-passing based approach, a pattern which I have used quite successfully in the past. Details below.
message passing instead of locks
- the public nats client type will continue to present roughly the same interface.
- when the nats client is instantiated/connected, it will spawn a new private task onto the runtime which owns the connection. Just for reference here, we can call that spawned task the daemon (a rough sketch of this pattern follows the list).
- when the daemon is first spawned, it will be given a futures mpsc receiver. This mpsc channel will communicate oneshot channels of Result<NatsClient, NatsError> or the like.
- the public nats client will create a oneshot channel per request, send its sender half over the mpsc channel to request a clone of the nats connection which the daemon is managing, and await the reply on the receiver half.
- because the daemon owns the connection, the memory does not need to be shared and therefore no locks are needed. This applies to the multiplexer as well. The daemon's communication channel (or channels) will be able to receive commands to update the multiplexer for new subscriptions &c.
- we can also set up the public nats client in such a way that when it is dropped, it will issue a command to the daemon which will cause it to shut down and clean up its resources. This will resolve #22 (Clients never disconnect, even after manual drop) & #6 (How to Disconnect Client, which is not actually resolved).
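To make the proposal concrete, here is a rough, hypothetical sketch of the daemon pattern. It is written against modern tokio::sync channel types purely for brevity (nitox itself currently targets futures 0.1), and every name in it (Command, daemon, NatsClientHandle, Connection) is illustrative rather than an actual nitox API.

```rust
// Hypothetical sketch of the proposed daemon pattern; uses tokio::sync
// for brevity. None of these names are actual nitox APIs.
use tokio::sync::{mpsc, oneshot};

// Stand-in for the connection + multiplexer state the daemon would own.
#[derive(Clone, Debug)]
struct Connection {
    addr: String,
}

// Commands the public client handle can send to the daemon.
enum Command {
    // Request a clone of the managed connection; the daemon replies on
    // the provided oneshot sender.
    GetConnection(oneshot::Sender<Result<Connection, String>>),
    // Ask the daemon to shut down and clean up its resources.
    Shutdown,
}

// The daemon task exclusively owns the connection, so no locks are needed.
async fn daemon(mut rx: mpsc::Receiver<Command>, conn: Connection) {
    while let Some(cmd) = rx.recv().await {
        match cmd {
            Command::GetConnection(reply) => {
                // Ignore the error if the requester stopped waiting.
                let _ = reply.send(Ok(conn.clone()));
            }
            Command::Shutdown => break,
        }
    }
    // `conn` is dropped here, closing the underlying resources.
}

// Public handle presenting roughly the same interface as today's client.
struct NatsClientHandle {
    tx: mpsc::Sender<Command>,
}

impl NatsClientHandle {
    fn connect(addr: &str) -> Self {
        let (tx, rx) = mpsc::channel(16);
        let conn = Connection { addr: addr.to_owned() };
        // Spawn the privately owned daemon task onto the runtime.
        tokio::spawn(daemon(rx, conn));
        Self { tx }
    }

    // Ask the daemon for a clone of the connection it manages.
    async fn connection(&self) -> Result<Connection, String> {
        let (reply_tx, reply_rx) = oneshot::channel();
        self.tx
            .send(Command::GetConnection(reply_tx))
            .await
            .map_err(|_| "daemon has shut down".to_string())?;
        reply_rx
            .await
            .map_err(|_| "daemon dropped the request".to_string())?
    }
}

impl Drop for NatsClientHandle {
    fn drop(&mut self) {
        // Best-effort: tell the daemon to shut down and free its resources.
        let _ = self.tx.try_send(Command::Shutdown);
    }
}
```

Because only the daemon task ever touches the connection and multiplexer state, all coordination happens over channels and no RwLock is needed anywhere on the hot path, and the Drop impl gives us the disconnect-on-drop behavior mentioned above for #22 / #6.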