Skip to content

#2974: Add hash-based journal selection for improved ledger distribution#4709

Open
jprieto-temporal wants to merge 3 commits intoapache:masterfrom
jprieto-temporal:jprieto/wxyxqoplswvq
Open

#2974: Add hash-based journal selection for improved ledger distribution#4709
jprieto-temporal wants to merge 3 commits intoapache:masterfrom
jprieto-temporal:jprieto/wxyxqoplswvq

Conversation

@jprieto-temporal
Copy link

Main Issue: #2974

Motivation

When multiple journals are configured, BookKeeper selects a journal using ledgerId % numJournals. This works well when
ledger IDs are random or sequential, but produces poor distribution when ledger IDs follow strided patterns (e.g., when
multiple clients allocate IDs from disjoint ranges). Uneven journal utilization can cause hotspots and reduce write
throughput.

Changes

  • Add journalHashBasedSelection configuration option (default: false)
  • When enabled, use Fibonacci hashing on ledger IDs before journal selection
  • The hash function multiplies by the golden ratio constant and folds high bits into low bits to break predictable
    patterns
  • Add integration test verifying read/write correctness with hash-based selection

return journals.get(MathUtils.signSafeMod(ledgerId, journals.size()));
long index = ledgerId;
if (journalHashBasedSelection) {
index = ledgerId * FIBONACCI_HASH_CONSTANT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about putting this block into a static method so that we can add unit tests ?

The function gets the index and the number of journals and computes the journal index

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. I added the tests and factored out the static method.

@dlg99
Copy link
Contributor

dlg99 commented Feb 3, 2026

there are other places where ledgerid is used for routing (e.g. ordered thread executor).
Related change: #3023
Maybe it is worth refactoring the selection? Also the PR has a test you can try to reuse

@jprieto-temporal
Copy link
Author

@dlg99 not opposed to factoring it out and using the same hashing that is used in this PR since it provides better distribution in the general case, but I think that could be done in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants