
refactor(sync-service): convert start_parachain to reactive state machine#3227

Closed
replghost wants to merge 3 commits into main from worktree-refactor-start-parachain-reactive

Conversation

@replghost
Contributor

Summary

  • Converts start_parachain from a blocking linear function (Phase 1 → Phase 2 → Phase 3) to a reactive state machine where Phase 2 (runtime bootstrap) runs concurrently inside the main event loop (see the sketch after this list)
  • The main sync loop now starts immediately after Phase 1 (relay chain head fetch), so foreground messages, network events, and peer management are responsive while the runtime is still being downloaded and compiled
  • When bootstrap completes, AllSync is rebuilt with Aura consensus using the current finalized header (which may have advanced via paraheads) and all tracked peers are migrated
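
Roughly, the shape this gives the task is the following. This is a minimal sketch with hypothetical names and simplified types, not smoldot's actual API; the point is only that the bootstrap is one more future polled alongside the other wake-up sources, so it never blocks the loop:

```rust
use futures::{channel::mpsc, future, FutureExt as _, StreamExt as _};

// Hypothetical stand-ins; the real WakeUpReason has many more variants.
enum WakeUpReason {
    Foreground(String),
    BootstrapComplete(Result<u64, String>), // e.g. Aura slot duration on success
}

async fn event_loop(mut foreground: mpsc::Receiver<String>) {
    // Stand-in for the runtime download + compile; `None` once it has finished.
    let mut bootstrap_future: Option<future::BoxFuture<'static, Result<u64, String>>> =
        Some(async { Ok::<u64, String>(6_000) }.boxed());

    loop {
        let wake_up = {
            // The bootstrap is polled like any other event source; while it is
            // pending, foreground messages keep being answered.
            let mut on_bootstrap = async {
                match bootstrap_future.as_mut() {
                    Some(fut) => fut.await,
                    None => future::pending().await,
                }
            }
            .boxed_local()
            .fuse();

            futures::select! {
                msg = foreground.next() => match msg {
                    Some(m) => WakeUpReason::Foreground(m),
                    None => return, // every frontend is gone; shut down
                },
                result = on_bootstrap => WakeUpReason::BootstrapComplete(result),
            }
        };

        match wake_up {
            WakeUpReason::Foreground(m) => println!("served {m} during bootstrap"),
            WakeUpReason::BootstrapComplete(result) => {
                // Here the real code rebuilds the sync state machine with Aura
                // consensus on success, or arms a retry timer on failure.
                bootstrap_future = None;
                println!("bootstrap finished: {result:?}");
            }
        }
    }
}
```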

Changes

  • Removed the blocking Phase 2 retry loop from start_parachain; replaced with a bootstrap_future polled as an event source in the main loop
  • Added BootstrapComplete and BootstrapRetryReady variants to WakeUpReason, with full handlers for success, failure with retry, and AllSync rebuild with peer migration (see the retry sketch after this list)
  • Changed bootstrap_parachain_consensus to accept SCALE-encoded header bytes instead of &ValidChainInformation, decoupling it from the AllSync state
  • Changed BootstrappedParachain to carry just Aura parameters (authorities + slot_duration) rather than a full chain_info, since the finalized header comes from the current AllSync at rebuild time
  • Suppressed Warn-level header verification logs during bootstrap (expected since consensus is Unknown)
  • Added bootstrap_future, bootstrap_retry_sleep, and block_number_bytes fields to Task
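
A minimal sketch of the retry bookkeeping these fields and variants enable. The types are hypothetical throughout (AuraParams, String errors, futures_timer standing in for the platform's own sleep primitive), and only the success/failure handlers are shown:

```rust
use futures::{future::BoxFuture, FutureExt as _};
use std::time::Duration;

// Hypothetical Aura parameters carried by a successful bootstrap.
struct AuraParams {
    slot_duration: u64,
    // authorities: Vec<...>,
}

// Only the fields relevant to the retry flow are shown.
struct Task {
    bootstrap_future: Option<BoxFuture<'static, Result<AuraParams, String>>>,
    bootstrap_retry_sleep: Option<BoxFuture<'static, ()>>,
}

impl Task {
    fn start_bootstrap(&self) -> BoxFuture<'static, Result<AuraParams, String>> {
        // Placeholder for: download `:code`, compile the runtime, read the Aura
        // authorities and slot duration.
        async { Ok::<_, String>(AuraParams { slot_duration: 6_000 }) }.boxed()
    }

    // Handler for `WakeUpReason::BootstrapComplete`.
    fn on_bootstrap_complete(&mut self, result: Result<AuraParams, String>) {
        self.bootstrap_future = None;
        match result {
            Ok(_params) => {
                // Success: rebuild the syncing state machine with Aura consensus from
                // the *current* finalized header and migrate the tracked peers
                // (elided; depends on smoldot's AllSync API).
            }
            Err(_err) => {
                // Failure: arm a delay. When it fires, the event loop reports
                // `BootstrapRetryReady` and the bootstrap starts over.
                self.bootstrap_retry_sleep =
                    Some(futures_timer::Delay::new(Duration::from_secs(3)).boxed());
            }
        }
    }

    // Handler for `WakeUpReason::BootstrapRetryReady`.
    fn on_bootstrap_retry_ready(&mut self) {
        self.bootstrap_retry_sleep = None;
        self.bootstrap_future = Some(self.start_bootstrap());
    }
}
```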

Testing

  • cargo fmt --check, cargo clippy, cargo test -p smoldot-light all pass
  • Needs end-to-end verification with smolbench against a live parachain

Closes #3222

…hine

The main sync loop now starts immediately after fetching the parachain
head from the relay chain (Phase 1). Runtime bootstrap (Phase 2) runs
concurrently as a future inside the event loop rather than blocking
before it.

This means foreground messages (SubscribeAll, IsNearHeadOfChainHeuristic,
etc.), network events, and peer management are all responsive while the
runtime is being downloaded and compiled.

When bootstrap completes, AllSync is rebuilt with Aura consensus using
the current finalized header (not the stale one from bootstrap start),
and all tracked peers are re-added.

Closes #3222
@replghost
Contributor Author

smolbench verification

All parachains initialize correctly with the reactive state machine. Tested cold start, warm start (DB write), and warm start (DB read):

Milestone               Cold    Warm (run 1)  Warm (run 2)
Relay initialized       4.3s    5.8s          4.2s
Bulletin initialized    6.8s    8.6s          8.1s
Asset Hub initialized   7.8s    10.1s         9.2s
People initialized      13.2s   9.0s          9.0s
All runtimes resolved   yes     yes           yes

The storage read timeouts (bulletin_storage_read, asset_hub_dotns_resolve) are unrelated — they require syncing enough blocks to serve storage queries and time out at 72s on cold start. This is expected baseline behavior.

@bkchr would appreciate your review on this one.

…d, peer passing

1. Add AllSync::set_finalized_consensus() (through NonFinalizedTree →
   AllForksSync → AllSync) to update consensus in-place. No more
   AllSync rebuild/peer drain/request abort dance.

2. Guard against stale Aura authorities: if finalization advanced past
   the bootstrap block, discard the result and re-bootstrap from the
   current finalized header.

3. Remove redundant network subscription from bootstrap_parachain_consensus.
   The main loop passes a peer via oneshot channel instead (sketched below).
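
A minimal sketch of that hand-off, using the names from the commit but with simplified, assumed signatures (PeerId as a plain string, String errors) rather than smoldot's real types:

```rust
use futures::channel::oneshot;

type PeerId = String; // stand-in for the real peer identifier type

// The bootstrap no longer subscribes to the network itself; it waits for the main
// loop to hand it a peer over a oneshot channel before doing any work.
async fn bootstrap_parachain_consensus(
    peer_rx: oneshot::Receiver<PeerId>,
) -> Result<(), String> {
    let peer = peer_rx.await.map_err(|_| "sync task shut down".to_owned())?;
    // ... issue the `:code` download and runtime calls against `peer` ...
    let _ = peer;
    Ok(())
}

fn start_bootstrap() -> (
    oneshot::Sender<PeerId>,
    impl std::future::Future<Output = Result<(), String>>,
) {
    // The main loop keeps the sender and fills it as soon as a suitable peer is
    // known; the bootstrap future blocks only on the receiver.
    let (peer_tx, peer_rx) = oneshot::channel();
    (peer_tx, bootstrap_parachain_consensus(peer_rx))
}
```
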
Extract the bootstrap future construction (oneshot channel + peer
feeding + bootstrap_parachain_consensus call) into Task::start_bootstrap().
Three identical copies collapse to one.

Remove the re-bootstrap-on-drift guard. Aura authority sets change at
session boundaries (hours apart), not every few blocks. If the finalized
block advances a few blocks during bootstrap the authorities are still
valid. If they're wrong, verify_header catches it immediately.
@replghost
Contributor Author

Closing — the reactive state machine doesn't improve cold or warm start times. Bootstrap wall clock is identical (same :code download, WASM compile, Aura proofs either way). The responsiveness benefit during the 5-15s bootstrap window is marginal since dApps can't serve useful data without the runtime anyway.

The real wins are in #3225 (25-57% cold start via try_join3), #3214 (35-60% warm start via cached runtime + network verification), and the rest of the stack (#3213, #3210, #3200).

May revisit if foreground latency during bootstrap becomes a concrete user complaint.

@replghost replghost closed this Apr 23, 2026
