Skip to content

Conversation

@rockneurotiko
Copy link

If the channel disconnects (phx_close or phx_error), the channel process does not close, and the socket process don't remove it from the state, so the socket thinks that it's subscribed but it's not and it won't rejoin.

I saw this behaviour doing some tests, when the connection to the server was closed, the socket disconnects, and when reconnecting it cleans it's internal channel states, but the old channels are not closed, leading to a leak in the channel processes.

With this, when receiving a phx_close or phx_error the channel process just stops itself, because it won't process again that subscription. Also, because the socket is monitoring this process, it will receive the :DOWN message and remove it from it's state.

I changed two minor things too:

  • When leaving the channel, I moved the stop to the handle_call, in this case everytime that you receive a leave message, even if they don't use the public method, it will stop the process.
  • On the tests, I removed the capture_log when starting the endpoint (And all the code needed for it to work), I tried with both capture_log and without it and it didn't changed anything, but maybe I'm missing something 🤔

@rockneurotiko
Copy link
Author

As we found out on #26, the phoenix.js library do automatic rejoins on the channels, and probably is the best idea to emulate how it works, it's a nice feature and would make the library much easier to work.

I changed this PR to do the rejoin with backoff, and for that I had to change how the join works. It doesn't make sense to have rejoins but the first join is synchronous, because you can try to join at first that doesn't work, but eventually it would work. As a reference, the js library works like this too.

I documented all the changes on the CHANGELOG.md, but basically the changes are:

  • join asynchronous (changed the return value)
  • join accept options for the backoff timeouts (this could accept more options in the future)
  • Moved the responsability to send the join/leave message to Channel instead of Socket, so we can manage the rejoin. The ability to have only one topic process still exists, because the socket keeps track of the channels, and it's the first thing added/checked when initializing the channel.
  • The channels do automatic rejoins

@mobileoverlord @paulstatezny let me know what do you think about this changes, maybe I went too far with the breaking changes, but as I said, I think they are needed for the rejoin to make sense on the library.

There are future work to do of course, this is a list of some things that came to my mind and I can work on if you find it useful on the library:

  • Save pushes in a queue when the channel is not joined, and send them once joined
  • Send socket & channel status in a new message struct (StatusMessage ?).
  • When doing a push_async and join, return the message ref, so the clients can match on the messages received.
  • The Socket notifies all the channels about it's connection/disconnection, so the channels have more information about it and can do work faster (like rejoins)
  • Handle timeout when push and there are no response on the expected timeout. Right now the call crashes, but it would be nice to return a timeout

@paulstatezny
Copy link
Contributor

Thanks @rockneurotiko. I need to review the changes later when I have time, but the core of my perspective is:

It's preferable that this client work identically to the JavaScript client, as much as reasonable.

I like that this PR seems to be moving in that direction.

@mobileoverlord
Copy link
Owner

Thanks for your work on this. I think that the channel should be responsible for its own retry and back off, and so I agree with the spirit of your PR. I’ll need a bit of time to review it. I’ll test it out over the coming days and follow up.

@rockneurotiko
Copy link
Author

Ey @mobileoverlord do you think you will be able to review it anytime soon? (Taking in account the current global situation I know it's not the best moment)

In our system, we create/close many websocket connections, and with the current release the Channel processes just keep there without being closed, and it creates a real problem after a while with many dangling processes. With this at least that's not happening, and you have channel reconnects, which is a plus. We've been using it for over 3 months on production now, but I would love to go back using the official release instead of my fork.

@mobileoverlord
Copy link
Owner

This PR is way too much to review. There are parts of it that I like (like the addition of backoff for socket connections) and there are parts that I feel try to make the library work in a way that mimics the javascript one. Any way you could break this up?

@rockneurotiko
Copy link
Author

Sure, sorry for adding that many things, I can try to break it up.

I can do the following PRs:

  • Fix for leaking channel processes, I think the most important one
  • Channel join asynchronous instead of synchronous
  • Backoff retries for channel or phx_error. This needs the previous step, I can do them in the same PR or in two PRs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants