Skip to content

Bookie ledger replication error lead to increased disk usage #1908

@codelipenghui

Description

@codelipenghui

BUG REPORT

Problem happens after reboot bookie(Bare metal), i can't remember a little bit about the detailed steps at the time.

Found that disk usage continued growth:
image

Then check the bookie log, found some ledger replicate error log:

00:02:15.246 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4139-E4139, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.246 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4139
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.246 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.259 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4140-E4140, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.259 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4140
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.259 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.262 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4141-E4141, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.262 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4141
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.262 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.269 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4142-E4142, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.269 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4142
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.269 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.275 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4143-E4143, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.275 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4143
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.276 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L426399 E4144-E4144, Sent to [172.30.92.15:3181, 172.30.92.16:3181], Heard from [] : bitset = {}. First unread entry is -1
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 4144
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
	at org.apache.bookkeeper.client.BKException.create(BKException.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:305) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -13
00:02:15.280 [BookKeeperClientWorker-OrderedExecutor-15-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error replicating ledger fragments for ledger: 426399
org.apache.bookkeeper.client.BKException$BKLedgerRecoveryException: Error while recovering ledger
	at org.apache.bookkeeper.client.BKException.create(BKException.java:78) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$SingleFragmentCallback.processResult(LedgerFragmentReplicator.java:357) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$1.processResult(LedgerFragmentReplicator.java:184) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.doCallback(BookkeeperInternalCallbacks.java:244) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.callback(BookkeeperInternalCallbacks.java:236) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.tick(BookkeeperInternalCallbacks.java:218) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.proto.BookkeeperInternalCallbacks$MultiCallback.processResult(BookkeeperInternalCallbacks.java:254) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:306) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:797) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
00:02:15.280 [ReplicationWorker] WARN  org.apache.bookkeeper.replication.ReplicationWorker - BKLedgerRecoveryException while replicating the fragment
org.apache.bookkeeper.client.BKException$BKLedgerRecoveryException: Error while recovering ledger
	at org.apache.bookkeeper.client.BKException.create(BKException.java:78) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1058) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1036) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:254) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:207) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:168) [org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

This won't always happen, I also restarted bookie before.
I don't know why lost these four entries[E4139 - E4144] in bookie and ledger.
And this is dangerous that ledger replicate so many times.

After we disable autorecovery and find the topic use this ledger, then delete the topic,
entry log was successfully cleaned up

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions