perf(sync-service): skip parachain runtime download on warm start (#3214)
On warm restart from `databaseContent`, the relay chain may already be synced. `fetch_parachain_head_from_relay()` was waiting for a new `Notification::Finalized` event from `subscribe_all()`, which might not arrive for seconds (or indefinitely if the relay sync stalls).

The fix: try the already-finalized block from `subscribe_all` immediately, before waiting for new notifications. This is the block that is already available in `subscription.finalized_block_scale_encoded_header`.

Before: parachain warm restart never initialized (>5 min timeout).
After: parachain warm restart initializes in ~3 s.

The runtime hint verification in `bootstrap_parachain_consensus` already handles reusing the cached runtime from `databaseContent`: it verifies the merkle value and skips the ~2 MiB download when it matches.

Fixes #3204.
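The fix above amounts to "use the finalized block the subscription already carries before blocking on new events". A minimal Python sketch of that control flow (the real implementation is Rust in smoldot; the `Subscription` type and event tuples here are illustrative stand-ins, only the field name `finalized_block_scale_encoded_header` mirrors the actual identifier):

```python
import asyncio

class Subscription:
    """Illustrative stand-in for what subscribe_all() returns."""
    def __init__(self, finalized_header, events):
        # Header of the block already finalized at subscription time.
        self.finalized_block_scale_encoded_header = finalized_header
        self.events = events  # queue of ("finalized", header) notifications

async def latest_finalized_header(sub):
    # Fix: first try the block that was already finalized when we
    # subscribed...
    if sub.finalized_block_scale_encoded_header is not None:
        return sub.finalized_block_scale_encoded_header
    # ...and only then wait for a *new* Finalized event, which on a warm
    # start may not arrive for seconds (or at all if relay sync stalls).
    while True:
        kind, header = await sub.events.get()
        if kind == "finalized":
            return header

async def demo():
    sub = Subscription(b"\x01already-finalized", asyncio.Queue())
    return await latest_finalized_header(sub)

print(asyncio.run(demo()))  # returns the cached header without waiting
```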
When `databaseContent` includes the runtime code (`runtimeCode` in the JSON), compile it locally instead of downloading ~2 MiB from a P2P peer. The warm path still fetches two lightweight Aura call proofs (a few KB each) to verify that the cached runtime works against the current block. If compilation or verification fails, it falls back to the full P2P download.

Changes:
- `database.rs`: persist `code_storage_value` (was intentionally discarded); decode it back as `runtime_code` in `DatabaseContent`
- `sync_service.rs`: add `saved_runtime_code` to `ConfigParachain`
- `parachain.rs`: add `try_warm_start_from_cached_code()` that compiles cached code and verifies via `AuraApi` call proofs; extract `cold_bootstrap_loop()`
- `lib.rs`: thread `saved_runtime_code` from the database through to `ConfigParachain`

Tested on Paseo, Polkadot, and Kusama Asset Hubs:
- Paseo: warm para 1.1 s vs cold 2.1 s (no `:code` download)
- Polkadot: warm para 4.1 s vs cold 2.2 s (call proof latency)
- Kusama: warm para 5.5 s vs cold 5.9 s (call proof latency)
- All three: `runtimeCode` saved to the DB (2.0–2.5 MB), no download on warm start

Builds on #3210 (correctness fix for the warm hang).
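The `database.rs` change above stores the runtime as base64 in the DB JSON and decodes it back on load. A hedged Python sketch of that round trip (smoldot's actual decoder is Rust; the field name `runtimeCode` is from this PR, the dict shape is illustrative):

```python
import base64
import json

def decode_database(db_json: str) -> dict:
    """Decode the databaseContent JSON, recovering the cached runtime."""
    content = json.loads(db_json)
    encoded = content.get("runtimeCode")
    runtime_code = None
    if encoded is not None:
        # Invalid base64 raises here; the real decoder surfaces this as a
        # database-decode error and the node falls back to a cold start.
        runtime_code = base64.b64decode(encoded, validate=True)
    return {"runtime_code": runtime_code}

db = json.dumps({"runtimeCode": base64.b64encode(b"\x00asm...").decode()})
print(decode_database(db)["runtime_code"])   # → b'\x00asm...'
print(decode_database("{}")["runtime_code"]) # → None (cold start)
```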
…t fallback

Database tests:
- `decode_database_without_runtime_code`: no `runtime_code` field → `None`
- `decode_database_with_runtime_code_only`: `runtimeCode` without merkle hint
- `decode_database_with_full_hint_populates_both`: all three fields present
- `decode_database_invalid_base64_runtime_code_returns_error`: bad input
- `encode_shrink_drops_runtime_code_when_too_large`: size cap drops the code

Warm-start fallback tests:
- `invalid_cached_runtime_fails_compilation`: garbage bytes → `Err`
- `empty_cached_runtime_fails_compilation`: empty bytes → `Err`
- `wasm_without_memory_fails_gracefully`: truncated WASM → `Err` (not a panic)
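The shrink behaviour exercised by `encode_shrink_drops_runtime_code_when_too_large` can be sketched as follows (a Python illustration, not smoldot's Rust; the field set and `max_size` handling are assumptions based on the PR description):

```python
import base64
import json

def encode_database(content: dict, max_size: int) -> str:
    """Encode the DB JSON; drop runtimeCode if the result is too large."""
    encoded = json.dumps(content)
    if len(encoded) > max_size and "runtimeCode" in content:
        # Size cap exceeded: drop the bulky cached runtime and re-encode,
        # so size-limited callers lose only the warm path, not the DB.
        smaller = {k: v for k, v in content.items() if k != "runtimeCode"}
        encoded = json.dumps(smaller)
    return encoded

content = {
    "chain": "paseo-ah",  # hypothetical field for illustration
    "runtimeCode": base64.b64encode(b"\x00" * 64).decode(),
}
big = encode_database(content, max_size=10_000)   # fits, code kept
small = encode_database(content, max_size=80)     # too big, code dropped
print("runtimeCode" in big, "runtimeCode" in small)  # → True False
```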
The warm path was trusting saved Aura params and heap pages from the database without any network verification. This could silently use a stale runtime if it was upgraded between sessions, and compile with the wrong heap pages if the chain uses a custom `:heappages`.

Fix:
- The warm path now fetches `:heappages` plus both Aura call proofs from the network (a few KB), verifying that the cached code works against current state. Only the ~2 MiB `:code` download is skipped.
- Extract shared helpers (`wait_for_peer`, `fetch_call_proof`, `run_aura_calls`, `build_bootstrapped_parachain`) used by both the cold and warm paths, eliminating the code duplication.
- Remove the `SavedParachainState` struct; just pass `Option<Vec<u8>>` for the cached runtime code. Aura params are always verified from the network.
- Remove `aura_slot_duration`/`aura_authorities` from `DatabaseContent` and the Aura JSON parsing in `decode_database`.
- Fix the double decode in `decode_database`: the base64 is decoded once and shared between `runtime_code_hint` and `runtime_code`.
- Remove tests that only exercised `HostVmPrototype::new` (the WASM compiler), not the warm-start logic.
Fetch `:code` alongside `:heappages` in the warm-start storage proof and verify the cached runtime bytes (or their `blake2_256` hash, for state v1 value-stripped proofs) against the on-chain trie node. On mismatch the warm path fails and falls back to cold bootstrap, preventing a stale or substituted runtime from passing the Aura-output sanity check unnoticed.
…head fetch"

This reverts commit a4c0549.
Debug-level log of which match arm verified the cached `:code` and of the proof byte size, to diagnose whether peers strip the value.
Request the non-existent strict descendant `:code\0` so the absence proof traverses through `:code`'s leaf without loading its 2 MiB value. For state v1 chains the leaf encoding then carries only `Hashed(blake2_256(value))`, which we already verify against the cached bytes via blake2. Falls back to byte-equality check if the peer bundles the value anyway.
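The verification described above can be sketched in a few lines of Python (smoldot's implementation is Rust; `CODE_ANCHOR_PROBE_KEY` is the constant named in this PR, `blake2_256` is BLAKE2b with a 32-byte digest as used by Substrate, and the `leaf_value` tuple shape is an illustrative assumption about what the proof decoder yields):

```python
import hashlib

# Non-existent strict descendant of :code; proving its absence forces the
# peer to include :code's leaf without loading the 2 MiB value.
CODE_ANCHOR_PROBE_KEY = b":code\x00"

def blake2_256(data: bytes) -> bytes:
    """Substrate-style blake2_256: BLAKE2b with a 32-byte digest."""
    return hashlib.blake2b(data, digest_size=32).digest()

def cached_code_matches(cached: bytes, leaf_value) -> bool:
    """Compare cached runtime bytes against the :code leaf from the proof.

    leaf_value is ("hashed", digest) for state v1 value-stripped proofs,
    or ("inline", value) if the peer bundled the full value anyway.
    """
    kind, payload = leaf_value
    if kind == "hashed":
        return blake2_256(cached) == payload
    return cached == payload  # byte-equality fallback

code = b"\x00asm\x01\x00\x00\x00"
print(cached_code_matches(code, ("hashed", blake2_256(code))))  # → True
print(cached_code_matches(code, ("inline", code)))              # → True
```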
Drop the "DB decode result" Warn log left over from benching, downgrade the parachain warm-start availability log to Debug, and tighten the warm_bootstrap comments. Hoist the `:code\0` probe key into a named constant `CODE_ANCHOR_PROBE_KEY` and simplify the storage-proof error messages to drop the escape-noisy key list.
Problem
On every warm restart, smoldot downloads the full parachain runtime (~2 MiB) again, even though the same bytes were saved to the DB last session. If the peer it picks is slow, the download can pass the 16 s timeout, fall back to cold, and the user waits tens of seconds.
Fix
Save `runtimeCode` in `databaseContent`. Split the parachain bootstrap into two paths:
- warm (`warm_bootstrap`): the DB has cached runtime bytes. Check them against the chain, compile locally, and fetch only the Aura call proofs. On any failure, return `Err` and let the caller try cold.
- cold (`cold_bootstrap_loop`, the original logic moved into a retry loop): no cached bytes; download `:code` + `:heappages` and retry on failure.

`start_parachain` runs warm if cached bytes are present, otherwise cold. If warm fails, cold takes over.

How we check the cached bytes without downloading the runtime
We can't ask the peer for `:code` directly — substrate's prover always bundles the 2 MiB value with that proof, so we'd lose the saving.

Instead we ask for `:code\0`, a key that doesn't exist on chain. To prove it doesn't exist, the peer has to walk the trie down to `:code`'s leaf. The leaf's value isn't read, so it isn't bundled. We get the leaf's `blake2_256` hash in ~883 bytes, hash our cached bytes, and compare.

Benchmark — Paseo AH, parachain bootstrap step (10 warm restarts)
This is the parachain bootstrap step only, which is what the PR changes. End-to-end startup time (relay warp sync + paraheads + para bootstrap) swings ±10 s between runs because of relay-side network conditions, not anything in this PR.
Database size
Adds ~2.0–2.5 MB (base64) to the DB JSON. The existing shrink logic drops the field if the DB exceeds `max_size`, so callers with size limits just fall back to cold, like today.

Stale cache
If the chain runs a runtime upgrade between sessions, the cached bytes won't match. The hash check fails, warm returns `Err`, and cold takes over. So whether warm fires in practice depends on how recent the user's DB is.
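The warm-or-cold fallback shape can be sketched as follows (a Python illustration of the control flow only; the function bodies are stand-ins, and only the names `warm_bootstrap`, `cold_bootstrap_loop`, `start_parachain`, and `saved_runtime_code` come from the PR):

```python
def warm_bootstrap(cached_code: bytes) -> str:
    """Stand-in: verify + compile cached bytes, or fail."""
    if not cached_code:
        # e.g. hash mismatch after a runtime upgrade, or compile failure
        raise ValueError("cached runtime failed verification")
    return "warm"

def cold_bootstrap_loop() -> str:
    """Stand-in for the original logic: download :code + :heappages,
    retrying on failure."""
    return "cold"

def start_parachain(saved_runtime_code):
    if saved_runtime_code is not None:
        try:
            return warm_bootstrap(saved_runtime_code)
        except Exception:
            pass  # any warm failure falls through to the cold path
    return cold_bootstrap_loop()

# warm when bytes verify, cold when absent, cold again when verification fails
print(start_parachain(b"\x00asm"), start_parachain(None), start_parachain(b""))
# → warm cold cold
```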