-
Notifications
You must be signed in to change notification settings - Fork 8
Fix: Resend preflight when local agent joins to prevent message blocking #441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds transport support to regenerate and resend preflight messages to connected peers when local agent info changes, exposes a helper to generate per-peer preflight, updates DefaultTransport to hold a handler reference, integrates resend on local agent join, and adds tests simulating slow preflight race conditions. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
The following will be added to the changelog [0.4.0-dev.3] - 2026-01-21Bug Fixes
Testing
|
Deploying kitsune2 with
|
| Latest commit: |
2722783
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://fdc945e2.kitsune2.pages.dev |
| Branch Preview URL: | https://fix-blocking-due-to-prefligh.kitsune2.pages.dev |
|
✔️ 2906ee8...2722783 - Conventional commits check succeeded. |
ThetaSinner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't read the code but looking at the title, that logic doesn't sound right. The intention of the pre-flight is to decide whether or not to establish a connection. Putting peer information into the pre-flight was a Holochain optimization and not supposed to be part of the K2 logic. We certainly can't rely on every host to do that and it's not really reasonable to re-check whether the connection should have been established.
When operating with a single space:
- The first local agent won't be discoverable until they have published their peer info, so it's impossible that they aren't joined to the space.
- Before a local agent has joined, it's possible to fetch peer info from the bootstrap server and start contacting other peers - I think that was part of this investigation, whether K2 will do that in any cases.
With multiple spaces:
- The same logic applies for the first space.
- For subsequent spaces, there's no preflight on join where a connection is already in place. There is a design for a "hello" module service which is part of the "access" module implementation. That's the solution to that problem but currently not implemented.
|
@ThetaSinner, in #417 you wrote:
The 2nd test in here shows that this is possible. I've analyized the Holochain code for this and it can totally happen. There are several ways to go about it. Changing the default like I did in #417 would be one. I've also looked into fixing this race in Holochain, but I'm not sure that would be possible without architectural changes. (moving the peer-store into K2?).
I'm not sure I get what your saying here. In short: because of race in Holochain, pre-flight could be sent without any agent info since adding the local agent didn't finish yet. Problem when at the same time the default for connections without agents is blocked - and stays blocked until another pre-flight comes in. What this does is re-sending the pre-flight, after it got successfully added locally. We could also close and reinit the connection, but that seems more like a problem. Or was the simple default change in #417 the right way of fixing this afterall? |
|
I was confused by the race condition description in the PR, but I think I tracked down a case where it would arise:
I this case I think Bob does already have an access decision for Alice, but isn't checking it. Instead he is checking for an access decision for a blank peer url. I think a more direct solution would be to not send empty preflights, and instead close the connection. And maybe also on the receiving end as well to validate the size of preflight messages and close the connection if invalid. |
This adds two new tests with two agents initiating a connection, reproducing these scenarios that potentially lead to blocked messages:
First test passed before changes, invalidating that hypothesis.
Second test failed initially, but passes with the changes in this PR.
Problem
Messages were being incorrectly blocked at the beginning of sessions between agents. This was traced to a race condition where a connection could be established and preflight exchanged before a local agent joins the space.
Race-Condition in Holochain around space joining
If a connection is established between T1 and T4, the preflight will have an empty agent list.
Consequence in Kitsune
When a connection is established before
local_agent_join()is called:This matches the production issue described in PR #417.
The race in Holochain could be improved, though I regard this fix here in Kitsune2 as a more robust solution to the problem.
Investigation Findings
Transport Behavior
Both
tx5andirohtransports exhibit this behavior because:What Doesn't Cause the Issue
MemPeerStorelistener is synchronous, so access decisions are computed beforeinsert()returnsConfirmed with added test in this changeset.
Solution
Added a new method
resend_preflight_to_connected_peers()to theTransporttrait that is called when a local agent joins. This ensures remote peers receive updated agent information even if the connection was established before the agent joined.Notes
local_agent_join()may still be blocked (unavoidable since there's no agent info yet)local_agent_join()will not be blocked because preflight is resenttx5andirohtransportsRelated Issues
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.