fix: coalesce ledger WAL flushes and extend to non-authority nodes #1218
Open
skylar-simoncelli wants to merge 3 commits into main
Three improvements to the ledger parity-db WAL flush task:
1. Add flush coalescing via `AtomicBool`: if a flush is already in progress when the next block notification arrives, skip rather than spawning another `spawn_blocking` task. This prevents unbounded task queue buildup when flush duration exceeds the 6-second block interval.
2. Extend WAL flushing to non-authority nodes (RPCs, bootnodes, bridges). These nodes never author blocks, so `BlockOrigin::Own` never matches, leaving their WAL to grow until the 64 MB threshold causes a synchronous stall. Non-authority nodes now flush every 50 imported blocks.
3. Move the flush task outside the `if role.is_authority()` block so it runs for all node types.
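For reviewers who want a concrete picture of the coalescing in item 1, here is a minimal sketch. It is not the code in this PR: it uses a plain `std::thread` as a stand-in for `spawn_blocking`, and the names `FlushGuard` and `try_spawn_flush` are hypothetical. Whether the actual change uses `compare_exchange` or a drop guard is an assumption on my part; the sketch only illustrates the "skip if a flush is already running" idea.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical helper: clears the in-progress flag when dropped,
/// so the flag is released even if the flush closure panics.
struct FlushGuard(Arc<AtomicBool>);

impl Drop for FlushGuard {
    fn drop(&mut self) {
        self.0.store(false, Ordering::Release);
    }
}

/// Starts `flush` on a background thread unless a flush is already running.
/// Returns `None` when the notification was coalesced (skipped).
/// `std::thread::spawn` stands in for the node's blocking-task spawner.
fn try_spawn_flush<F>(flush_in_progress: &Arc<AtomicBool>, flush: F) -> Option<JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    // Atomically flip false -> true. If the flag is already true,
    // another flush owns it and this notification is skipped.
    if flush_in_progress
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        return None;
    }

    let guard = FlushGuard(Arc::clone(flush_in_progress));
    Some(thread::spawn(move || {
        // Moving the guard into the closure ties the flag's lifetime
        // to the flush: it is cleared when the closure returns or unwinds.
        let _guard = guard;
        flush();
    }))
}
```

Because the flag is set on the calling side before the task is spawned, a second notification that arrives while the first flush is still running always sees the flag and is skipped, which is what bounds the number of in-flight flushes to one.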
Overview
Changes
1. Flush coalescing (prevents task queue buildup)
The current code in `feat/ledger_enact_parity_db_logs` spawns a new `spawn_blocking` task for every `BlockOrigin::Own` notification. If a flush takes longer than the 6-second block interval, tasks accumulate without bound.
Added an `AtomicBool` flag (`flush_in_progress`): if a flush is already running when the next notification arrives, we skip rather than spawn another task. This guarantees at most one flush is running at any time.
2. Non-authority node coverage
The current code only flushes on `BlockOrigin::Own`, which means non-authority nodes (RPCs, bootnodes, bridges, semi-trusted RPCs) never trigger a flush. Their ledger WAL grows until the 64 MB threshold causes a synchronous stall.
Non-authority nodes now flush every 50 imported blocks: frequent enough to prevent WAL buildup, infrequent enough to avoid excessive I/O.
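The trigger rule for the two node classes can be sketched roughly as follows. This is an illustration, not the PR's code: `Origin` is a simplified stand-in for `BlockOrigin`, and `FlushPolicy`, `on_block_imported`, and `NON_AUTHORITY_FLUSH_INTERVAL` are hypothetical names. It also assumes that authority nodes keep flushing only on their own blocks, which the description implies but does not state outright.

```rust
/// Simplified stand-in for the node's block origin type (assumption:
/// only the "own vs. anything else" distinction matters here).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Origin {
    Own,
    Network,
}

/// Non-authority nodes flush once per this many imported blocks
/// (the value 50 comes from the PR description; the name is hypothetical).
const NON_AUTHORITY_FLUSH_INTERVAL: u64 = 50;

/// Decides, per block-import notification, whether to request a WAL flush.
struct FlushPolicy {
    is_authority: bool,
    imported_since_flush: u64,
}

impl FlushPolicy {
    fn new(is_authority: bool) -> Self {
        Self { is_authority, imported_since_flush: 0 }
    }

    /// Returns true when this notification should trigger a flush.
    fn on_block_imported(&mut self, origin: Origin) -> bool {
        if self.is_authority {
            // Assumed: authority nodes keep the existing trigger,
            // flushing only for blocks they authored themselves.
            return origin == Origin::Own;
        }

        // Non-authority nodes never see `Origin::Own`, so they count
        // every imported block and flush once the interval is reached.
        self.imported_since_flush += 1;
        if self.imported_since_flush >= NON_AUTHORITY_FLUSH_INTERVAL {
            self.imported_since_flush = 0;
            true
        } else {
            false
        }
    }
}
```

Under this sketch, an RPC node importing 100 blocks requests two flushes, on the 50th and the 100th block, regardless of origin.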
3. Task moved outside authority gate
The flush task is no longer inside `if role.is_authority()`, so it runs on all node types.
Relationship to other PRs
`feat/ledger_enact_parity_db_logs` (ledger WAL flush on block import)
Together these two fixes address the full parity-db WAL problem:
`feat/ledger_enact_parity_db_logs` + this PR → prevents runtime WAL stalls on both authority and non-authority nodes
Context
Discovered during investigation of chain-state truncation after unclean shutdown (#1140). While testing on guardnet, we measured every node having ~9,000-10,000 blocks of metadata sitting only in the WAL at any given time, confirming the WAL accumulation problem.
📌 Submission Checklist
🧪 Testing Evidence
Logic-only change to an unreleased feature. The coalescing and interval flush are safe additive behaviors on top of the existing flush mechanism.
🔱 Fork Strategy