MM2: fail-fast on truncation + auto-recover on topic reset (MirrorSourceTask) #20515
Summary
This PR enhances MirrorMaker 2 (MM2) with fault-tolerance capabilities to address critical data loss scenarios in cross-cluster replication setups.
Problem Statement
Vanilla MM2 has two critical gaps:
Solution
Added fault-tolerance enhancements to MirrorSourceTask:
Fail-Fast Truncation Detection
- OffsetOutOfRangeException during consumer polling
- ConnectException to fail-fast and alert operators immediately
- mirrorsource.fail.on.truncation=true (default)
Graceful Topic Reset Handling
- AdminClient to track topic IDs and detect delete/recreate events
- UnknownTopicOrPartitionException with retry logic
- mirrorsource.auto.recover.on.reset=true (default)
Technical Details
- connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceTask.java
- mm2.fault.tolerance for easy filtering
Testing
Impact
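For reviewers who want a concrete picture of the topic-reset detection described under Solution, the core idea (compare the topic ID seen now with the one seen earlier) can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class TopicResetDetector and its observe method are hypothetical names, and java.util.UUID stands in for Kafka's own topic ID type so the snippet compiles with the JDK alone. In a real task the ID would presumably come from an AdminClient topic description (TopicDescription.topicId()).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical helper (not part of this PR): remembers the last topic ID
 * seen for each topic name and reports when that ID changes.
 * java.util.UUID is an assumption standing in for Kafka's topic ID type.
 */
class TopicResetDetector {
    // Last topic ID observed per topic name.
    private final Map<String, UUID> lastSeenIds = new HashMap<>();

    /**
     * Records the current ID for a topic.
     *
     * @return true if a different ID was previously recorded for this name,
     *         which suggests the topic was deleted and recreated
     */
    boolean observe(String topic, UUID currentId) {
        UUID previous = lastSeenIds.put(topic, currentId);
        return previous != null && !previous.equals(currentId);
    }
}
```

A caller would invoke observe on each metadata refresh and, when it returns true, trigger whatever recovery path mirrorsource.auto.recover.on.reset enables. The first observation of a topic never counts as a reset, since there is nothing to compare against.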