[Improvement] trim redundant read when scanning entry log. #4161
thetumbled wants to merge 13 commits into apache:master from
Could you help review this PR? Thanks!
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/EntryLogScanner.java
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
eolivelli left a comment
Very good. I left one final comment about the "switch" statement.
            return;
        }
        scanner.process(ledgerId, offset, data);
        break;
Please add the "default" case.
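One way the requested default case could look is sketched below. It assumes the read length is dispatched through hypothetical constants (`READ_NOTHING`, `READ_LEDGER_ENTRY_ID`, `READ_ALL`) that are not taken from the PR; the point is only that an unrecognized value fails loudly in the `default` branch instead of being silently ignored.

```java
public class ScanSwitchSketch {
    // Hypothetical read modes, for illustration only; not the PR's actual constants.
    static final int READ_NOTHING = 0;
    static final int READ_LEDGER_ENTRY_ID = 16; // 8-byte ledgerId + 8-byte entryId
    static final int READ_ALL = Integer.MAX_VALUE;

    /** Returns how many bytes of an entry's data to read for the given mode. */
    static int bytesToRead(int mode, int entrySize) {
        switch (mode) {
            case READ_NOTHING:
                return 0;
            case READ_LEDGER_ENTRY_ID:
                return Math.min(READ_LEDGER_ENTRY_ID, entrySize);
            case READ_ALL:
                return entrySize;
            default:
                // Reject unknown modes rather than skipping the entry unnoticed.
                throw new IllegalArgumentException("Unknown read mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesToRead(READ_LEDGER_ENTRY_ID, 4 * 1024 * 1024)); // 16
    }
}
```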
hangc0276 left a comment
I'm unsure of the extent of the improvements that can be achieved with this change, given the following concerns:
- All the changes in this PR are specific to uncommon cases, occurring only when the entry log file's index is missing.
- The readFromLogChannel function utilizes BufferedLogChannel, which is a RandomAccessFile channel. When reading data from the file channel, the data is pre-fetched into the OS PageCache, with a default pre-fetch size of 4KB. Even though we only retrieve the entryId, the actual data read from the disk is a minimum of 4KB.
There is a potential risk associated with this PR:
- We solely read the entryID without validating the entry data. If the file is corrupted, the scan operation will be unable to detect it and will incorrectly populate the index with the wrong ledger size. This introduces a high level of risk.
We have encountered cases where the ledger map in the entry log is missing, possibly because the bookie crashed before flushing the entry log. As long as the corrupted entry log file exists, the bookie will scan the entry log by
It is common for a single entry to reach 4MB (we set the max batch size of the Pulsar client to 4MB), so this enhancement can reduce disk reads by about 99.9%.
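A quick check of the 99.9% figure, assuming (as in the reviewer's comment above) that skipping an entry's payload still costs roughly one 4KB read per entry, compared with reading a full 4MB entry. Both sizes are the ones quoted in this thread, not measurements.

```java
import java.util.Locale;

public class ReadReduction {
    /** Fraction of disk reads avoided when only readBytes of each entryBytes-sized entry is read. */
    static double savedFraction(long readBytes, long entryBytes) {
        return 1.0 - (double) readBytes / entryBytes;
    }

    public static void main(String[] args) {
        long readAhead = 4L * 1024;        // ~4KB read per entry, per the reviewer's estimate
        long entrySize = 4L * 1024 * 1024; // 4MB entry, matching the Pulsar max batch size above
        double saved = savedFraction(readAhead, entrySize);
        // Locale.ROOT keeps the decimal separator a '.' regardless of the default locale.
        System.out.println(String.format(Locale.ROOT, "saved: %.2f%%", saved * 100)); // saved: 99.90%
    }
}
```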
The fault-detection logic is unchanged: in the old logic we just read
Descriptions of the changes in this PR:
Remove redundant disk reads when scanning the entry log file, which can significantly reduce disk pressure.
Motivation
We use `org.apache.bookkeeper.bookie.storage.EntryLogScanner` to process entry data when scanning the entry log file; its `process` method handles each entry. There are many implementations of `EntryLogScanner` for various purposes, but some of them do not need to access all of the data in the entry log file.
For example:
- `extractEntryLogMetadataByScanning` only needs the `entrySize` field, so it does not need to read any of the entry's data.
- `org.apache.bookkeeper.bookie.InterleavedStorageRegenerateIndexOp#initiate` only needs the `entryId` field, so it only needs to read the first 16 bytes (`ledgerId` + `entryId`).

So, in some cases, there is no need to read the entry data when scanning the entry log.
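The 16-byte figure in the second example follows from the entry layout: an entry's data starts with its `ledgerId` and `entryId`, each an 8-byte long. A minimal sketch of extracting both ids from just that prefix, assuming big-endian encoding (the `java.nio.ByteBuffer` default) and that the 4-byte `entrySize` header has already been consumed; `EntryHeaderSketch` and `readIds` are hypothetical names.

```java
import java.nio.ByteBuffer;

public class EntryHeaderSketch {
    // 8-byte ledgerId followed by 8-byte entryId at the start of each entry's data.
    static final int LEDGER_ENTRY_ID_LENGTH = 16;

    /** Returns {ledgerId, entryId} read from the first 16 bytes of an entry's data. */
    static long[] readIds(ByteBuffer data) {
        if (data.remaining() < LEDGER_ENTRY_ID_LENGTH) {
            throw new IllegalArgumentException(
                    "need at least 16 bytes, got " + data.remaining());
        }
        int start = data.position();
        return new long[] {data.getLong(start), data.getLong(start + 8)};
    }

    public static void main(String[] args) {
        // A fake entry: ledgerId=42, entryId=7, then 1KB of payload that is never read.
        ByteBuffer entry = ByteBuffer.allocate(LEDGER_ENTRY_ID_LENGTH + 1024);
        entry.putLong(0, 42L).putLong(8, 7L);

        // Expose only the 16-byte prefix, as an id-only scanner would request.
        ByteBuffer prefix = entry.duplicate();
        prefix.limit(LEDGER_ENTRY_ID_LENGTH);

        long[] ids = readIds(prefix);
        System.out.println("ledgerId=" + ids[0] + " entryId=" + ids[1]); // ledgerId=42 entryId=7
    }
}
```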
Changes
Add a method `getLengthToRead` in `EntryLogScanner` to indicate how much data needs to be read, and read only that amount of data.
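A minimal, dependency-free sketch of how a scan loop could honor such a hint. It assumes each record is a 4-byte big-endian `entrySize` followed by the entry data (`ledgerId`, `entryId`, payload), and models `getLengthToRead` as a default interface method returning `Integer.MAX_VALUE`, so existing scanners keep reading everything. `LengthAwareScanner`, `scan`, and `buildLog` are hypothetical names; the `process` signature (which passes `entrySize` separately because the data may be truncated) is an assumption; and the in-memory `ByteBuffer` stands in for the channel reads in `DefaultEntryLogger`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class LengthAwareScan {
    /** Hypothetical scanner contract extended with a read-length hint. */
    interface LengthAwareScanner {
        boolean accept(long ledgerId);

        /** data holds at most getLengthToRead() bytes; entrySize is the full size. */
        void process(long ledgerId, int offset, int entrySize, ByteBuffer data);

        /** Bytes of entry data this scanner needs; by default, the whole entry. */
        default int getLengthToRead() {
            return Integer.MAX_VALUE;
        }
    }

    /** Scans records laid out as [int entrySize][entrySize bytes of data]. */
    static void scan(ByteBuffer log, LengthAwareScanner scanner) {
        int pos = 0;
        while (pos + 4 + 8 <= log.limit()) {
            int entrySize = log.getInt(pos);        // 4-byte size header
            int dataStart = pos + 4;
            if (entrySize < 16 || dataStart + entrySize > log.limit()) {
                break;                              // truncated or malformed record: stop
            }
            long ledgerId = log.getLong(dataStart); // ledgerId is always needed for accept()
            if (scanner.accept(ledgerId)) {
                int toRead = Math.min(scanner.getLengthToRead(), entrySize);
                ByteBuffer view = log.duplicate();
                view.position(dataStart);
                view.limit(dataStart + toRead);     // expose only the requested prefix
                scanner.process(ledgerId, dataStart, entrySize, view.slice());
            }
            pos = dataStart + entrySize;            // always skip past the full entry
        }
    }

    /** Builds an in-memory log of entries for one ledger with the given payload sizes. */
    static ByteBuffer buildLog(long ledgerId, int... payloadSizes) {
        int total = 0;
        for (int p : payloadSizes) {
            total += 4 + 16 + p;
        }
        ByteBuffer log = ByteBuffer.allocate(total);
        long entryId = 0;
        for (int p : payloadSizes) {
            log.putInt(16 + p).putLong(ledgerId).putLong(entryId++);
            log.position(log.position() + p);       // payload left as zeros
        }
        log.flip();
        return log;
    }

    public static void main(String[] args) {
        List<Long> entryIds = new ArrayList<>();
        List<Integer> bytesSeen = new ArrayList<>();
        LengthAwareScanner idOnly = new LengthAwareScanner() {
            public boolean accept(long ledgerId) { return true; }
            public void process(long ledgerId, int offset, int entrySize, ByteBuffer data) {
                entryIds.add(data.getLong(8));      // entryId follows ledgerId
                bytesSeen.add(data.remaining());
            }
            @Override public int getLengthToRead() { return 16; }
        };
        scan(buildLog(1L, 1000, 5000), idOnly);
        System.out.println(entryIds + " " + bytesSeen); // [0, 1] [16, 16]
    }
}
```

Defaulting to a full read keeps every existing scanner's behavior unchanged; only scanners that override the method read less. The loop always advances by the full `entrySize`, so skipping the payload does not change which records are visited.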