-
Notifications
You must be signed in to change notification settings - Fork 243
Fix a race condition in metaclient leader election #3058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4a93ea7 to
f756e5e
Compare
junkaixue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please give the a comprehensive description on PR for what to fix.
...t/src/main/java/org/apache/helix/metaclient/recipes/leaderelection/LeaderElectionClient.java
Outdated
Show resolved
Hide resolved
f756e5e to
72802ec
Compare
72802ec to
bd39805
Compare
...t/src/main/java/org/apache/helix/metaclient/recipes/leaderelection/LeaderElectionClient.java
Outdated
Show resolved
Hide resolved
47d4f57 to
eb4a28f
Compare
| private void subscribeAndTryCreateLeaderEntry(String leaderPath) { | ||
| _metaClient.subscribeDataChange(leaderPath + LEADER_ENTRY_KEY, _reElectListener, false); | ||
| _leaderGroups.add(leaderPath + LEADER_ENTRY_KEY); | ||
| registerAllListeners(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @xyuanlu thanks for this!
I have a question here.
what happens if listener re-subscription succeeds but participant node creation fails? What i mean is, currently, the flow would re-subscribe all listeners successfully, then if it fail to create the participant node (due to network issues, parent path missing, whatever), but still proceed to attempt leader node creation and touch the leader node. could this result in a participant becoming leader without being a valid member of the participant pool, leading to an invalid leader state that other participants don't recognize as legitimate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In handleConnectStateChanged, where we recreate participant node, re-subscribe listeners and touch leader node. I think if recreate participant node failed in line 425 and if it is not nodeAlreadyExist error, it would through exception out and exit before register listeners and touch leader node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, Thanks!
|
This change is ready to be merged. Approved by @junkaixue |
Issues
idle node in Meta client leader election #3059
Description
Fix a race condition in metaclient leader election.
When the client gets an expired event, it need to re-register all user registered listener and its own re-election listeners before rectreate the leader node.
(Write a concise description including what, why, how)
Tests
(List the names of added unit/integration tests)
(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)
Changes that Break Backward Compatibility (Optional)
(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
Documentation (Optional)
(Link the GitHub wiki you added)
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)