
refactor(sync-service): convert start_parachain to reactive state machine#3227

Closed
replghost wants to merge 3 commits into main from worktree-refactor-start-parachain-reactive

Conversation

@replghost
Contributor

Summary

  • Converts start_parachain from a blocking linear function (Phase 1 → Phase 2 → Phase 3) to a reactive state machine where Phase 2 (runtime bootstrap) runs concurrently inside the main event loop (see the sketch after this list)
  • The main sync loop now starts immediately after Phase 1 (relay chain head fetch), so foreground messages, network events, and peer management are responsive while the runtime is still being downloaded and compiled
  • When bootstrap completes, AllSync is rebuilt with Aura consensus using the current finalized header (which may have advanced via paraheads) and all tracked peers are migrated
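
Roughly, the shape this gives the task is the following. This is a minimal sketch with hypothetical names and simplified types, not smoldot's actual API; the point is only that the bootstrap is one more future polled alongside the other wake-up sources, so it never blocks the loop:

```rust
use futures::{channel::mpsc, future, FutureExt as _, StreamExt as _};

// Hypothetical stand-ins; the real WakeUpReason has many more variants.
enum WakeUpReason {
    Foreground(String),
    BootstrapComplete(Result<u64, String>), // e.g. Aura slot duration on success
}

async fn event_loop(mut foreground: mpsc::Receiver<String>) {
    // Stand-in for the runtime download + compile; `None` once it has finished.
    let mut bootstrap_future: Option<future::BoxFuture<'static, Result<u64, String>>> =
        Some(async { Ok::<u64, String>(6_000) }.boxed());

    loop {
        let wake_up = {
            // The bootstrap is polled like any other event source; while it is
            // pending, foreground messages keep being answered.
            let mut on_bootstrap = async {
                match bootstrap_future.as_mut() {
                    Some(fut) => fut.await,
                    None => future::pending().await,
                }
            }
            .boxed_local()
            .fuse();

            futures::select! {
                msg = foreground.next() => match msg {
                    Some(m) => WakeUpReason::Foreground(m),
                    None => return, // every frontend is gone; shut down
                },
                result = on_bootstrap => WakeUpReason::BootstrapComplete(result),
            }
        };

        match wake_up {
            WakeUpReason::Foreground(m) => println!("served {m} during bootstrap"),
            WakeUpReason::BootstrapComplete(result) => {
                // Here the real code rebuilds the sync state machine with Aura
                // consensus on success, or arms a retry timer on failure.
                bootstrap_future = None;
                println!("bootstrap finished: {result:?}");
            }
        }
    }
}
```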

Changes

  • Removed the blocking Phase 2 retry loop from start_parachain; replaced with a bootstrap_future polled as an event source in the main loop
  • Added BootstrapComplete and BootstrapRetryReady variants to WakeUpReason, with full handlers for success, failure with retry, and AllSync rebuild with peer migration (see the retry sketch after this list)
  • Changed bootstrap_parachain_consensus to accept SCALE-encoded header bytes instead of &ValidChainInformation, decoupling it from the AllSync state
  • Changed BootstrappedParachain to carry just Aura parameters (authorities + slot_duration) rather than a full chain_info, since the finalized header comes from the current AllSync at rebuild time
  • Suppressed Warn-level header verification logs during bootstrap (expected since consensus is Unknown)
  • Added bootstrap_future, bootstrap_retry_sleep, and block_number_bytes fields to Task
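
A minimal sketch of the retry bookkeeping these fields and variants enable. The types are hypothetical throughout (AuraParams, String errors, futures_timer standing in for the platform's own sleep primitive), and only the success/failure handlers are shown:

```rust
use futures::{future::BoxFuture, FutureExt as _};
use std::time::Duration;

// Hypothetical Aura parameters carried by a successful bootstrap.
struct AuraParams {
    slot_duration: u64,
    // authorities: Vec<...>,
}

// Only the fields relevant to the retry flow are shown.
struct Task {
    bootstrap_future: Option<BoxFuture<'static, Result<AuraParams, String>>>,
    bootstrap_retry_sleep: Option<BoxFuture<'static, ()>>,
}

impl Task {
    fn start_bootstrap(&self) -> BoxFuture<'static, Result<AuraParams, String>> {
        // Placeholder for: download `:code`, compile the runtime, read the Aura
        // authorities and slot duration.
        async { Ok::<_, String>(AuraParams { slot_duration: 6_000 }) }.boxed()
    }

    // Handler for `WakeUpReason::BootstrapComplete`.
    fn on_bootstrap_complete(&mut self, result: Result<AuraParams, String>) {
        self.bootstrap_future = None;
        match result {
            Ok(_params) => {
                // Success: rebuild the syncing state machine with Aura consensus from
                // the *current* finalized header and migrate the tracked peers
                // (elided; depends on smoldot's AllSync API).
            }
            Err(_err) => {
                // Failure: arm a delay. When it fires, the event loop reports
                // `BootstrapRetryReady` and the bootstrap starts over.
                self.bootstrap_retry_sleep =
                    Some(futures_timer::Delay::new(Duration::from_secs(3)).boxed());
            }
        }
    }

    // Handler for `WakeUpReason::BootstrapRetryReady`.
    fn on_bootstrap_retry_ready(&mut self) {
        self.bootstrap_retry_sleep = None;
        self.bootstrap_future = Some(self.start_bootstrap());
    }
}
```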

Testing

  • cargo fmt --check, cargo clippy, cargo test -p smoldot-light all pass
  • Needs end-to-end verification with smolbench against a live parachain

Closes #3222

…hine

The main sync loop now starts immediately after fetching the parachain
head from the relay chain (Phase 1). Runtime bootstrap (Phase 2) runs
concurrently as a future inside the event loop rather than blocking
before it.

This means foreground messages (SubscribeAll, IsNearHeadOfChainHeuristic,
etc.), network events, and peer management are all responsive while the
runtime is being downloaded and compiled.

When bootstrap completes, AllSync is rebuilt with Aura consensus using
the current finalized header (not the stale one from bootstrap start),
and all tracked peers are re-added.

Closes #3222
@replghost
Contributor Author

smolbench verification

All parachains initialize correctly with the reactive state machine. Tested cold start, warm start (DB write), and warm start (DB read):

Milestone               Cold    Warm (run 1)  Warm (run 2)
Relay initialized       4.3s    5.8s          4.2s
Bulletin initialized    6.8s    8.6s          8.1s
Asset Hub initialized   7.8s    10.1s         9.2s
People initialized      13.2s   9.0s          9.0s
All runtimes resolved   yes     yes           yes

The storage read timeouts (bulletin_storage_read, asset_hub_dotns_resolve) are unrelated — they require syncing enough blocks to serve storage queries and time out at 72s on cold start. This is expected baseline behavior.

@bkchr would appreciate your review on this one.

…d, peer passing

1. Add AllSync::set_finalized_consensus() (through NonFinalizedTree →
   AllForksSync → AllSync) to update consensus in-place. No more
   AllSync rebuild/peer drain/request abort dance.

2. Guard against stale Aura authorities: if finalization advanced past
   the bootstrap block, discard the result and re-bootstrap from the
   current finalized header.

3. Remove redundant network subscription from bootstrap_parachain_consensus.
   The main loop passes a peer via oneshot channel instead (sketched below).
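
A minimal sketch of that hand-off, using the names from the commit but with simplified, assumed signatures (PeerId as a plain string, String errors) rather than smoldot's real types:

```rust
use futures::channel::oneshot;

type PeerId = String; // stand-in for the real peer identifier type

// The bootstrap no longer subscribes to the network itself; it waits for the main
// loop to hand it a peer over a oneshot channel before doing any work.
async fn bootstrap_parachain_consensus(
    peer_rx: oneshot::Receiver<PeerId>,
) -> Result<(), String> {
    let peer = peer_rx.await.map_err(|_| "sync task shut down".to_owned())?;
    // ... issue the `:code` download and runtime calls against `peer` ...
    let _ = peer;
    Ok(())
}

fn start_bootstrap() -> (
    oneshot::Sender<PeerId>,
    impl std::future::Future<Output = Result<(), String>>,
) {
    // The main loop keeps the sender and fills it as soon as a suitable peer is
    // known; the bootstrap future blocks only on the receiver.
    let (peer_tx, peer_rx) = oneshot::channel();
    (peer_tx, bootstrap_parachain_consensus(peer_rx))
}
```
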
Extract the bootstrap future construction (oneshot channel + peer
feeding + bootstrap_parachain_consensus call) into Task::start_bootstrap().
Three identical copies collapse to one.

Remove the re-bootstrap-on-drift guard. Aura authority sets change at
session boundaries (hours apart), not every few blocks. If the finalized
block advances a few blocks during bootstrap the authorities are still
valid. If they're wrong, verify_header catches it immediately.
@replghost
Contributor Author

Closing — the reactive state machine doesn't improve cold or warm start times. Bootstrap wall clock is identical (same :code download, WASM compile, Aura proofs either way). The responsiveness benefit during the 5-15s bootstrap window is marginal since dApps can't serve useful data without the runtime anyway.

The real wins are in #3225 (25-57% cold start via try_join3), #3214 (35-60% warm start via cached runtime + network verification), and the rest of the stack (#3213, #3210, #3200).

May revisit if foreground latency during bootstrap becomes a concrete user complaint.

@replghost replghost closed this Apr 23, 2026
