refactor(sync-service): convert start_parachain to reactive state machine #3227
Conversation
The main sync loop now starts immediately after fetching the parachain head from the relay chain (Phase 1). Runtime bootstrap (Phase 2) runs concurrently as a future inside the event loop rather than blocking before it. This means foreground messages (`SubscribeAll`, `IsNearHeadOfChainHeuristic`, etc.), network events, and peer management all stay responsive while the runtime is being downloaded and compiled. When bootstrap completes, AllSync is rebuilt with Aura consensus using the current finalized header (not the stale one from bootstrap start), and all tracked peers are re-added.

Closes #3222
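For readers who haven't touched this part of the code, here is a minimal, self-contained sketch of the pattern; it is not smoldot's actual implementation, and `WakeUpReason`, `Bootstrapped`, and the plain-string message type are illustrative stand-ins. The point is that the bootstrap becomes just one more future the loop selects over, so foreground traffic keeps being served while it is in flight.

```rust
// Minimal sketch of the reactive pattern, not smoldot's actual code.
use futures::{channel::mpsc, future::{self, Either}, StreamExt};

enum WakeUpReason {
    ForegroundMessage(String), // e.g. SubscribeAll, IsNearHeadOfChainHeuristic
    BootstrapComplete(Bootstrapped),
}

struct Bootstrapped {
    slot_duration_millis: u64,
}

async fn run(mut foreground: mpsc::Receiver<String>) {
    // Phase 2 is just another future the loop polls, instead of a blocking step.
    let mut bootstrap = Box::pin(async {
        // download `:code`, compile the runtime, read the Aura parameters…
        Bootstrapped { slot_duration_millis: 6000 }
    });

    loop {
        let reason = match future::select(foreground.next(), &mut bootstrap).await {
            Either::Left((Some(msg), _)) => WakeUpReason::ForegroundMessage(msg),
            Either::Left((None, _)) => return, // all frontends gone
            Either::Right((done, _)) => WakeUpReason::BootstrapComplete(done),
        };

        match reason {
            // Foreground requests are served throughout the bootstrap window.
            WakeUpReason::ForegroundMessage(msg) => println!("handled {msg}"),
            // Runtime is ready: switch to Aura based on the *current* finalized
            // header and carry on with the peers tracked so far.
            WakeUpReason::BootstrapComplete(done) => {
                println!("bootstrap done, slot = {}ms", done.slot_duration_millis);
                break;
            }
        }
    }
}
```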
smolbench verification

All parachains initialize correctly with the reactive state machine. Tested cold start, warm start (DB write), and warm start (DB read).
The storage read timeouts (bulletin_storage_read, asset_hub_dotns_resolve) are unrelated: they require syncing enough blocks to serve storage queries and time out at 72s on cold start. This is expected baseline behavior. @bkchr, would appreciate your review on this one.
1. Add `AllSync::set_finalized_consensus()` (threaded through NonFinalizedTree → AllForksSync → AllSync) to update consensus in place. No more AllSync rebuild/peer drain/request abort dance.
2. Guard against stale Aura authorities: if finalization advanced past the bootstrap block, discard the result and re-bootstrap from the current finalized header.
3. Remove the redundant network subscription from `bootstrap_parachain_consensus`. The main loop passes a peer via a oneshot channel instead.
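To make (1) and (2) concrete, here is a rough sketch of the proposed flow, using placeholder types; `SyncState`, `Consensus`, and `Bootstrapped` are stand-ins, not smoldot's real `AllSync` API.

```rust
// Illustrative only; the real change threads set_finalized_consensus() through
// NonFinalizedTree → AllForksSync → AllSync.
struct Bootstrapped {
    /// Finalized block number the bootstrap read its authorities from.
    finalized_number_at_start: u64,
    authorities: Vec<[u8; 32]>,
    slot_duration_millis: u64,
}

enum Consensus {
    Unknown,
    Aura { authorities: Vec<[u8; 32]>, slot_duration_millis: u64 },
}

struct SyncState {
    finalized_number: u64,
    consensus: Consensus,
}

impl SyncState {
    /// Swap the consensus engine in place: no rebuild of the sync state,
    /// no peer drain, no aborting of in-flight requests.
    fn set_finalized_consensus(&mut self, consensus: Consensus) {
        self.consensus = consensus;
    }
}

/// Returns `Err(())` when the result is stale and a re-bootstrap from the
/// current finalized header is needed.
fn apply_bootstrap(sync: &mut SyncState, result: Bootstrapped) -> Result<(), ()> {
    // Guard (2): finalization moved past the block the authorities were read
    // from, so the Aura authority set may have changed in between.
    if sync.finalized_number > result.finalized_number_at_start {
        return Err(());
    }
    sync.set_finalized_consensus(Consensus::Aura {
        authorities: result.authorities,
        slot_duration_millis: result.slot_duration_millis,
    });
    Ok(())
}
```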
Extract the bootstrap future construction (oneshot channel + peer feeding + `bootstrap_parachain_consensus` call) into `Task::start_bootstrap()`. Three identical copies collapse to one. Remove the re-bootstrap-on-drift guard: Aura authority sets change at session boundaries (hours apart), not every few blocks. If the finalized block advances a few blocks during bootstrap, the authorities are still valid; if they're wrong, `verify_header` catches it immediately.
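A sketch of what that extraction might look like, assuming a oneshot channel for the peer hand-off; `Task`, `PeerId`, and this simplified `bootstrap_parachain_consensus` signature are placeholders rather than the real definitions.

```rust
use futures::channel::oneshot;
use std::{future::Future, pin::Pin};

struct PeerId(String);
struct Bootstrapped;

// Other parameters (chain spec, network service handle, …) elided.
async fn bootstrap_parachain_consensus(first_peer: PeerId) -> Bootstrapped {
    let _ = first_peer; // download the runtime from this peer, compile it, etc.
    Bootstrapped
}

struct Task {
    /// Filled in by the main loop as soon as a usable peer shows up.
    bootstrap_peer_tx: Option<oneshot::Sender<PeerId>>,
    bootstrap_future: Option<Pin<Box<dyn Future<Output = Bootstrapped> + Send>>>,
}

impl Task {
    /// The single place that builds the bootstrap future; the three previous
    /// call sites all go through here.
    fn start_bootstrap(&mut self) {
        let (peer_tx, peer_rx) = oneshot::channel::<PeerId>();
        self.bootstrap_peer_tx = Some(peer_tx);
        self.bootstrap_future = Some(Box::pin(async move {
            // Wait for the main loop to hand over a peer, then bootstrap.
            let peer = peer_rx.await.expect("main loop dropped the peer sender");
            bootstrap_parachain_consensus(peer).await
        }));
    }
}
```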
Closing: the reactive state machine doesn't improve cold or warm start times. Bootstrap wall clock is identical (same `:code` download, WASM compile, and Aura proofs either way). The responsiveness benefit during the 5-15s bootstrap window is marginal, since dApps can't serve useful data without the runtime anyway. The real wins are in #3225 (25-57% cold start via try_join3), #3214 (35-60% warm start via cached runtime + network verification), and the rest of the stack (#3213, #3210, #3200). May revisit if foreground latency during bootstrap becomes a concrete user complaint.
Summary
Converts `start_parachain` from a blocking linear function (Phase 1 → Phase 2 → Phase 3) to a reactive state machine where Phase 2 (runtime bootstrap) runs concurrently inside the main event loop.

Changes
- Removed the blocking bootstrap phase from `start_parachain`; replaced with a `bootstrap_future` polled as an event source in the main loop
- Added `BootstrapComplete` and `BootstrapRetryReady` variants to `WakeUpReason`, with full handlers for success, failure with retry, and AllSync rebuild with peer migration
- Changed `bootstrap_parachain_consensus` to accept SCALE-encoded header bytes instead of `&ValidChainInformation`, decoupling it from the AllSync state
- Changed `BootstrappedParachain` to carry just the Aura parameters (authorities + `slot_duration`) rather than a full `chain_info`, since the finalized header comes from the current AllSync at rebuild time (see the sketch after this list)
- Warn-level header verification logs during bootstrap (expected, since consensus is `Unknown`)
- Added `bootstrap_future`, `bootstrap_retry_sleep`, and `block_number_bytes` fields to `Task`
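A rough sketch of the slimmed-down result type and the new signature described in the list above; field and parameter names are illustrative, not copied from the diff.

```rust
/// Just enough to configure Aura at rebuild time; no full chain_info.
struct BootstrappedParachain {
    /// Aura authority public keys read from the freshly compiled runtime.
    aura_authorities: Vec<[u8; 32]>,
    /// Aura slot duration, in milliseconds.
    slot_duration_millis: u64,
}

/// Takes SCALE-encoded header bytes rather than `&ValidChainInformation`,
/// so the bootstrap no longer depends on the AllSync state. (Other
/// parameters elided.)
async fn bootstrap_parachain_consensus(
    finalized_scale_encoded_header: Vec<u8>,
) -> BootstrappedParachain {
    let _ = finalized_scale_encoded_header;
    BootstrappedParachain {
        aura_authorities: Vec::new(),
        slot_duration_millis: 6000,
    }
}
```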
Testing

`cargo fmt --check`, `cargo clippy`, and `cargo test -p smoldot-light` all pass.

Closes #3222