[fix] [broker] Part-1: Replicator cannot be created successfully due to an orphan replicator in the previous topic owner #21946
Merged: poorbarcode merged 22 commits into apache:master on Apr 23, 2024
Technoboy- reviewed on Jan 26, 2024, leaving two comments (both resolved, now outdated) on pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java
Contributor: @poorbarcode Does this PR fix the issue mentioned in #21203?
Contributor (Author): Yes, the current PR also fixes the issue that #21203 tries to fix.
(branch updated from 3eb5393 to 498ebec)
codelipenghui (Contributor) left a comment:
Is it possible to add a test to cover this case?
And it looks like we can simplify the fix by adding a new method terminate() to the replicator so that we don't need to mix the closeProducer and closeReplicator logic.
(branch updated from 05de423 to 257f163)
Contributor (Author): Rebase master
(branch updated from 257f163 to 3bb81fa)
(branch updated from a42bd91 to 5793ca1)
Technoboy- pushed a commit to Technoboy-/pulsar that referenced this pull request on Apr 24, 2024: "… an orphan replicator in the previous topic owner (apache#21946)"
Contributor (Author): Because there are too many conflicts and there are no new releases for …
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request on May 13, 2024: "… an orphan replicator in the previous topic owner (apache#21946) (cherry picked from commit 4924052) (cherry picked from commit 670aff0)"
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request on May 16, 2024: "… an orphan replicator in the previous topic owner (apache#21946) (cherry picked from commit 4924052) (cherry picked from commit 670aff0)"
hanmz pushed a commit to hanmz/pulsar that referenced this pull request on Feb 12, 2025: "… an orphan replicator in the previous topic owner (apache#21946)"
Motivation
There is a race condition that leaves an orphan replicator on the original owner of a topic, which prevents the new owner of the topic from starting a replicator because of this error:
org.apache.pulsar.broker.service.BrokerServiceException$NamingException: Producer with name 'pulsar.repl.{local_cluster}-->{remote_cluster}' is already connected to topic
Scenario 1
Scenario 2: … replication_clusters.
The current PR focuses on Scenario 1.
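Why a single orphan blocks the new owner: the replicator's producer name is derived only from the two cluster names, so the old owner's leftover producer and the new owner's producer collide on the remote topic. The sketch below is a simplified, hypothetical model (the `RemoteTopic` class, its methods, and the cluster names `us-west`/`us-east` are illustrative assumptions, not Pulsar's API); `IllegalStateException` stands in for `NamingException`, and the name pattern is taken from the error message above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DuplicateProducerNameDemo {

    /** Hypothetical stand-in for a topic on the remote cluster; not Pulsar's real class. */
    static final class RemoteTopic {
        private final Set<String> connectedProducers = ConcurrentHashMap.newKeySet();

        /** Rejects a second producer with the same name, like the NamingException in the error above. */
        void addProducer(String producerName) {
            if (!connectedProducers.add(producerName)) {
                throw new IllegalStateException("Producer with name '" + producerName
                        + "' is already connected to topic");
            }
        }

        void removeProducer(String producerName) {
            connectedProducers.remove(producerName);
        }
    }

    /** Name pattern from the error message: pulsar.repl.{local_cluster}-->{remote_cluster}. */
    static String replicatorProducerName(String localCluster, String remoteCluster) {
        return "pulsar.repl." + localCluster + "-->" + remoteCluster;
    }

    public static void main(String[] args) {
        RemoteTopic remote = new RemoteTopic();
        String name = replicatorProducerName("us-west", "us-east");

        // The old owner's orphan replicator registered its producer and never removed it.
        remote.addProducer(name);

        // The new owner derives exactly the same name, so its replicator cannot start.
        try {
            remote.addProducer(name);
            System.out.println("started");
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Because the name is deterministic, the new owner keeps failing until the orphan producer on the old broker is closed.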
Steps of Scenario 1
Columns: thread start replicator | unload bundle
1. pulsar.repl
2. closing
3. replicator.disconnect
4. replicator.stat --> Stopped
5. replicator.stat --> Starting
6. replicator.stat --> Started
7. readMoreEntries: since there are no entries to read, this request just stays pending
8. pulsar.repl
9. Producer with name 'pulsar.repl.{local_cluster}-->{remote_cluster}' is already connected to topic
Modifications
- Split Replicator.State.Stopped into Producer_Stopped and Closed.
- Add a new method terminate to close the Replicator.
- disconnect is only used to close the internal producer.
A case that hit this issue
Picture-1: An orphan producer was left in the old broker; it is not associated with any topic/replicator.

Picture-2: After the topic is transferred to the new broker, it cannot start a new Replicator successfully.

Since the scenario is too complex, I could not add a test, but I reproduced Scenario 1 locally.
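The interleaving in "Steps of Scenario 1" and the state split under "Modifications" can be illustrated with a deterministic, single-threaded sketch. This is a hypothetical model, not Pulsar's `AbstractReplicator`: the class `SketchReplicator`, the boolean `producerConnected` (standing in for the internal `pulsar.repl...` producer), and the exact transitions are assumptions; only the state names `Producer_Stopped`, `Starting`, `Started`, `Closed` and the `disconnect`/`terminate` split come from this PR's description. Before the fix, unload only disconnected the producer, leaving a state from which a late start could still succeed; with `terminate`, the replicator ends in `Closed` and any late start is rejected.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReplicatorStateSketch {

    // Simplified model of the states named in this PR's Modifications; not Pulsar's actual enum.
    enum State { Producer_Stopped, Starting, Started, Closed }

    static final class SketchReplicator {
        final AtomicReference<State> state = new AtomicReference<>(State.Producer_Stopped);
        volatile boolean producerConnected; // hypothetical stand-in for the internal producer

        /** Start path: only a replicator whose producer is stopped may (re)start it. */
        boolean startProducer() {
            if (!state.compareAndSet(State.Producer_Stopped, State.Starting)) {
                return false; // already starting/started, or terminated for good
            }
            producerConnected = true; // the asynchronous producer creation, done inline here
            if (!state.compareAndSet(State.Starting, State.Started)) {
                // terminate() ran while the producer was being created: undo the work.
                producerConnected = false;
                return false;
            }
            return true;
        }

        /** disconnect: only closes the internal producer; the replicator may start again later. */
        void disconnect() {
            if (state.compareAndSet(State.Started, State.Producer_Stopped)) {
                producerConnected = false;
            }
        }

        /** terminate: closes the replicator permanently; no later start can succeed. */
        void terminate() {
            state.set(State.Closed);
            producerConnected = false;
        }
    }

    public static void main(String[] args) {
        // Pre-fix style unload: only disconnect(), so a late start still succeeds.
        SketchReplicator before = new SketchReplicator();
        before.disconnect();                        // unload bundle thread
        boolean started = before.startProducer();   // start replicator thread runs afterwards
        System.out.println("disconnect then start: started=" + started
                + ", orphan producer=" + before.producerConnected);

        // Unload with terminate(): the replicator is Closed, so the late start is rejected.
        SketchReplicator after = new SketchReplicator();
        after.terminate();
        boolean startedAfterTerminate = after.startProducer();
        System.out.println("terminate then start: started=" + startedAfterTerminate
                + ", orphan producer=" + after.producerConnected);
    }
}
```

The design point the sketch tries to capture is that "producer stopped, may restart" and "replicator closed forever" are distinct states, so a compare-and-set on the start path can tell them apart.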


#21948 fixes the following issues:
- topic.unfenceTopicToResume after topic.close failed.
Documentation
doc / doc-required / doc-not-needed / doc-complete
Matching PR in forked repository
PR in forked repository: x