
perf(sync-service): skip parachain runtime download on warm start #3214

Open

replghost wants to merge 12 commits into main from perf/parachain-warm-skip-download

Conversation


replghost (Contributor) commented Apr 22, 2026

Problem

On every warm restart, smoldot downloads the full parachain runtime (~2 MiB) again, even though the same bytes were saved to the DB last session. If the peer it picks is slow, the download can exceed the 16 s timeout, fall back to cold start, and the user waits tens of seconds.

Fix

Save runtimeCode in databaseContent. Split the parachain bootstrap into two paths:

  • Warm path (warm_bootstrap): DB has cached runtime bytes. Check them against the chain, compile locally, fetch only the Aura call proofs. On any failure, return Err and let the caller try cold.
  • Cold path (cold_bootstrap_loop, the original logic moved into a retry loop): no cached bytes, download :code + :heappages and retry on failure.

start_parachain runs warm if cached bytes are present, otherwise cold. If warm fails, cold takes over.
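In outline, the dispatch looks like this (a synchronous sketch with stub types; the real functions are async and take smoldot's full config, but `start_parachain`, `warm_bootstrap`, `cold_bootstrap_loop`, and `saved_runtime_code` are the names this PR uses):

```rust
// Stub types for illustration only.
struct Config {
    saved_runtime_code: Option<Vec<u8>>,
}
struct Bootstrapped;

// Warm path: verify the cached bytes against the chain, compile
// locally, fetch only the Aura call proofs. Any failure is an Err.
fn warm_bootstrap(_cfg: &Config, _cached_code: Vec<u8>) -> Result<Bootstrapped, ()> {
    Err(())
}

// Cold path: the original logic, downloading :code + :heappages and
// retrying on failure.
fn cold_bootstrap_loop(_cfg: &Config) -> Bootstrapped {
    Bootstrapped
}

fn start_parachain(mut cfg: Config) -> Bootstrapped {
    if let Some(cached_code) = cfg.saved_runtime_code.take() {
        if let Ok(bootstrapped) = warm_bootstrap(&cfg, cached_code) {
            return bootstrapped;
        }
        // Warm failed (stale cache, bad peer, ...): fall through to cold.
    }
    cold_bootstrap_loop(&cfg)
}
```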

How we check the cached bytes without downloading the runtime

We can't ask the peer for :code directly: Substrate's prover always bundles the 2 MiB value with that proof, so we'd lose the saving.

Instead we ask for :code\0, a key that doesn't exist on chain. To prove it doesn't exist, the peer has to walk the trie down to :code's leaf. The leaf's value isn't read, so it isn't bundled. We get the leaf's blake2_256 hash in ~883 bytes, hash our cached bytes, and compare.
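A minimal sketch of the comparison step, using the `blake2` crate (smoldot's own trie code does the proof decoding, which is omitted here; `leaf_value_hash` stands in for the 32-byte hash recovered from the absence proof). `CODE_ANCHOR_PROBE_KEY` is the constant this PR introduces; everything else is illustrative:

```rust
use blake2::{digest::consts::U32, Blake2b, Digest};

/// `:code` followed by a 0x00 byte: a strict descendant of `:code`
/// that no runtime ever writes, so its absence proof must walk
/// through `:code`'s leaf.
const CODE_ANCHOR_PROBE_KEY: &[u8] = b":code\0";

/// Compare blake2_256 of the cached runtime bytes against the leaf
/// hash exposed by the absence proof.
fn cached_code_matches(cached_code: &[u8], leaf_value_hash: &[u8; 32]) -> bool {
    let hashed: [u8; 32] = Blake2b::<U32>::digest(cached_code).into();
    &hashed == leaf_value_hash
}
```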

Benchmark — Paseo AH, parachain bootstrap step (10 warm restarts)

| metric | before PR | after PR | Δ |
| --- | --- | --- | --- |
| median | 1.43 s | 0.38 s | −74% |
| max | 2.62 s | 0.56 s | −79% |
| mean | 1.42 s | 0.38 s | −73% |
| range | 0.64–2.62 s | 0.27–0.56 s | tighter |

This is the parachain bootstrap step only, which is what the PR changes. End-to-end startup time (relay warp sync + paraheads + para bootstrap) swings ±10 s between runs because of relay-side network conditions, not anything in this PR.

Database size

Adds ~2.0–2.5 MB (base64) to the DB JSON. The existing shrink logic drops the field if the DB exceeds max_size, so callers with size limits just fall back to cold like today.
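For illustration, the shrink step amounts to something like the following (a sketch using `serde_json`; database.rs has its own encoder, and only the `runtimeCode` field name and the size cap come from this PR):

```rust
// Hypothetical sketch, not the actual serializer in database.rs.
fn encode_with_size_cap(db: &mut serde_json::Value, max_size: usize) -> String {
    let encoded = db.to_string();
    if encoded.len() <= max_size {
        return encoded;
    }
    // Over the cap: drop the cached runtime and re-encode. A later
    // warm start then finds no cached bytes and falls back to cold.
    if let Some(obj) = db.as_object_mut() {
        obj.remove("runtimeCode");
    }
    db.to_string()
}
```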

Stale cache

If the chain runs a runtime upgrade between sessions, the cached bytes won't match. The hash check fails, warm returns Err, and cold takes over. So whether warm fires in practice depends on how recent the user's DB is.

On warm restart from databaseContent, the relay chain may already be
synced. fetch_parachain_head_from_relay() was waiting for a NEW
Notification::Finalized event from subscribe_all(), which might not
arrive for seconds (or indefinitely if the relay sync stalls).

The fix: try the already-finalized block from subscribe_all immediately
before waiting for new notifications. This is the block that's already
available in subscription.finalized_block_scale_encoded_header.

Before: parachain warm restart NEVER initialized (>5min timeout)
After:  parachain warm restart initializes in ~3s
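A minimal sketch of the ordering change (types reduced to plain values; the real fetch_parachain_head_from_relay() is async and works on smoldot's subscribe_all() output):

```rust
// `already_finalized` stands in for
// subscription.finalized_block_scale_encoded_header; `notifications`
// stands in for the stream of Notification::Finalized events.
fn first_finalized_header(
    already_finalized: Option<Vec<u8>>,
    notifications: impl IntoIterator<Item = Vec<u8>>,
) -> Option<Vec<u8>> {
    // 1. Use the finalized block that subscribe_all() already reported.
    if let Some(header) = already_finalized {
        return Some(header);
    }
    // 2. Only then wait for new notifications; this is where the old
    //    code could block for seconds, or forever if relay sync stalls.
    notifications.into_iter().next()
}
```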

The runtime hint verification in bootstrap_parachain_consensus already
handles reusing the cached runtime from databaseContent — it verifies
the merkle value and skips the ~2MB download when it matches.

Fixes #3204.

When databaseContent includes the runtime code (runtimeCode in the JSON),
compile it locally instead of downloading ~2 MiB from a P2P peer.

The warm path still fetches two lightweight Aura call proofs (~few KB each)
to verify the cached runtime works against the current block. If compilation
or verification fails, falls back to the full P2P download.

Changes:
- database.rs: persist code_storage_value (was intentionally discarded);
  decode it back as runtime_code in DatabaseContent
- sync_service.rs: add saved_runtime_code to ConfigParachain
- parachain.rs: add try_warm_start_from_cached_code() that compiles cached
  code and verifies via AuraApi call proofs; extract cold_bootstrap_loop()
- lib.rs: thread saved_runtime_code from database through to ConfigParachain

Tested on Paseo, Polkadot, Kusama Asset Hubs:
- Paseo: warm para 1.1s vs cold 2.1s (no :code download)
- Polkadot: warm para 4.1s vs cold 2.2s (call proof latency)
- Kusama: warm para 5.5s vs cold 5.9s (call proof latency)
- All three: runtimeCode saved to DB (2.0-2.5 MB), no download on warm

Builds on #3210 (correctness fix for the warm hang).

…t fallback

Database tests:
- decode_database_without_runtime_code: no runtime_code field → None
- decode_database_with_runtime_code_only: runtimeCode without merkle hint
- decode_database_with_full_hint_populates_both: all three fields present
- decode_database_invalid_base64_runtime_code_returns_error: bad input
- encode_shrink_drops_runtime_code_when_too_large: size cap drops code

Warm-start fallback tests:
- invalid_cached_runtime_fails_compilation: garbage bytes → Err
- empty_cached_runtime_fails_compilation: empty bytes → Err
- wasm_without_memory_fails_gracefully: truncated WASM → Err (not panic)
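For shape, the compilation-failure tests reduce to something like this (the entry point name is from the changelist above, but the signature here is a simplified stand-in for the real one in parachain.rs):

```rust
// Simplified stand-in: the real function also takes network/config
// handles and runs the Aura call-proof checks after compiling.
fn try_warm_start_from_cached_code(code: &[u8]) -> Result<(), &'static str> {
    // WASM modules start with the magic bytes "\0asm".
    if code.len() < 8 || &code[..4] != b"\0asm" {
        return Err("compilation failed: not a wasm module");
    }
    Ok(())
}

#[test]
fn empty_cached_runtime_fails_compilation() {
    assert!(try_warm_start_from_cached_code(&[]).is_err());
}

#[test]
fn invalid_cached_runtime_fails_compilation() {
    assert!(try_warm_start_from_cached_code(&[0xde, 0xad, 0xbe, 0xef]).is_err());
}
```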
The warm path was trusting saved Aura params and heap pages from the
database without any network verification. This could silently use a
stale runtime if it was upgraded between sessions, and compile with
wrong heap pages if the chain uses custom :heappages.

Fix:
- Warm path now fetches :heappages + both Aura call proofs from the
  network (~few KB), verifying the cached code works against current
  state. Only the ~2 MiB :code download is skipped.
- Extract shared helpers (wait_for_peer, fetch_call_proof, run_aura_calls,
  build_bootstrapped_parachain) used by both cold and warm paths,
  eliminating the code duplication.
- Remove SavedParachainState struct — just pass Option<Vec<u8>> for
  the cached runtime code. Aura params are always verified from network.
- Remove aura_slot_duration/aura_authorities from DatabaseContent and
  the Aura JSON parsing in decode_database.
- Fix double-decode in decode_database: base64 is decoded once, shared
  between runtime_code_hint and runtime_code.
- Remove tests that only tested HostVmPrototype::new (the WASM compiler),
  not the warm-start logic.

Fetch :code alongside :heappages in the warm-start storage proof and
verify cached runtime bytes (or their blake2_256 hash, for state v1
value-stripped proofs) against the on-chain trie node. On mismatch
the warm path fails and falls back to cold bootstrap, preventing a
stale or substituted runtime from passing the Aura-output sanity
check unnoticed.

Add a debug-level log of which match arm verified the cached :code, and of the proof byte size, to diagnose whether peers strip the value.

Request the non-existent strict descendant `:code\0` so the absence
proof traverses through `:code`'s leaf without loading its 2 MiB
value. For state v1 chains the leaf encoding then carries only
`Hashed(blake2_256(value))`, which we already verify against the
cached bytes via blake2. Falls back to byte-equality check if the
peer bundles the value anyway.
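
Put together with the blake2 check from the PR description, the verification reduces to two match arms (the enum is illustrative; smoldot's decoded trie-node types differ):

```rust
use blake2::{digest::consts::U32, Blake2b, Digest};

// Illustrative: what the `:code` leaf exposes in the absence proof.
enum CodeLeafValue<'a> {
    /// State v1: the proof carries only blake2_256(value).
    Hashed([u8; 32]),
    /// The peer bundled the full :code value anyway.
    Inline(&'a [u8]),
}

fn verify_cached_code(cached: &[u8], leaf: CodeLeafValue<'_>) -> bool {
    match leaf {
        CodeLeafValue::Hashed(expected) => {
            let got: [u8; 32] = Blake2b::<U32>::digest(cached).into();
            got == expected
        }
        // Byte-equality fallback when the value is present.
        CodeLeafValue::Inline(value) => value == cached,
    }
}
```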
Drop the "DB decode result" Warn log left over from benching, downgrade
the parachain warm-start availability log to Debug, and tighten the
warm_bootstrap comments. Hoist the `:code\0` probe key into a named
constant `CODE_ANCHOR_PROBE_KEY` and simplify the storage-proof error
messages to drop the escape-noisy key list.

lrubasze requested a review from skunert on May 5, 2026