11 changes: 8 additions & 3 deletions lib/src/chain/async_tree.rs
@@ -505,8 +505,13 @@ where
/// Panics if the [`AsyncOpId`] is invalid.
///
pub fn async_op_failure(&mut self, async_op_id: AsyncOpId, now: &TNow) {
-        let new_timeout = now.clone() + self.retry_after_failed;
+        let retry_after = now.clone() + self.retry_after_failed;
+        self.async_op_failure_retry_at(async_op_id, &retry_after);
+    }
+
+    /// Similar to [`AsyncTree::async_op_failure`], but retries at the given time
+    /// instead of `now + retry_after_failed`.

Copilot AI commented on Apr 22, 2026:

async_op_failure_retry_at is a new public API but its docs omit the same # Panic contract as async_op_failure (it will also panic if AsyncOpId is invalid due to the internal unwrap()). Please document the panic conditions (and any expectations such as whether retry_after may be in the past) to keep the API contract consistent.

Suggested change:
     /// instead of `now + retry_after_failed`.
+    ///
+    /// `retry_after` may be in the past, in which case the operation can become immediately
+    /// necessary again.
+    ///
+    /// # Panic
+    ///
+    /// Panics if the [`AsyncOpId`] is invalid.

+    pub fn async_op_failure_retry_at(&mut self, async_op_id: AsyncOpId, retry_after: &TNow) {
// Update the blocks that were performing this operation.
// The blocks are iterated from child to parent, so that we can check, for each node,
// whether its parent has the same asynchronous operation id.
@@ -523,11 +528,11 @@ where
                AsyncOpState::InProgress {
                    async_op_id: id,
                    timeout: Some(ref timeout),
-               } if id == async_op_id => Some(cmp::min(timeout.clone(), new_timeout.clone())),
+               } if id == async_op_id => Some(cmp::min(timeout.clone(), retry_after.clone())),
                AsyncOpState::InProgress {
                    async_op_id: id,
                    timeout: None,
-               } if id == async_op_id => Some(new_timeout.clone()),
+               } if id == async_op_id => Some(retry_after.clone()),
                _ => continue,
            };

37 changes: 32 additions & 5 deletions light-base/src/runtime_service.rs
@@ -831,6 +831,7 @@ async fn run_background<TPlat: PlatformRef>(
blocks_stream: None,
runtime_downloads: stream::FuturesUnordered::new(),
progress_runtime_call_requests: stream::FuturesUnordered::new(),
+            no_peers_retry_count: 0,
}
};

@@ -2766,6 +2767,7 @@
};

// Insert the runtime into the tree.
+        background.no_peers_retry_count = 0;
match &mut background.tree {
Tree::FinalizedBlockRuntimeKnown { tree, .. } => {
tree.async_op_finished(async_op_id, runtime);
@@ -2810,12 +2812,26 @@
);
}

-            match &mut background.tree {
-                Tree::FinalizedBlockRuntimeKnown { tree, .. } => {
-                    tree.async_op_failure(async_op_id, &background.platform.now());
-                }
-                Tree::FinalizedBlockRuntimeUnknown { tree, .. } => {
-                    tree.async_op_failure(async_op_id, &background.platform.now());
-                }
-            }
+            if error.is_no_peers() && background.no_peers_retry_count < 3 {
+                let delay_ms = 200u64 << background.no_peers_retry_count;
+                background.no_peers_retry_count += 1;
+                let retry_at = background.platform.now() + Duration::from_millis(delay_ms);
+                match &mut background.tree {
+                    Tree::FinalizedBlockRuntimeKnown { tree, .. } => {
+                        tree.async_op_failure_retry_at(async_op_id, &retry_at);
+                    }
+                    Tree::FinalizedBlockRuntimeUnknown { tree, .. } => {
+                        tree.async_op_failure_retry_at(async_op_id, &retry_at);
+                    }
+                }
+            } else {
+                match &mut background.tree {
+                    Tree::FinalizedBlockRuntimeKnown { tree, .. } => {
+                        tree.async_op_failure(async_op_id, &background.platform.now());
+                    }
+                    Tree::FinalizedBlockRuntimeUnknown { tree, .. } => {
+                        tree.async_op_failure(async_op_id, &background.platform.now());
+                    }
+                }
+            }
}
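
The retry branch above derives its delays with a left shift. As a quick standalone illustration of that arithmetic (a sketch, not the PR's code; `no_peers_backoff_ms` is a hypothetical helper named here for clarity):

```rust
// Sketch of the backoff arithmetic used in the PR: `200u64 << n`
// doubles the delay on each consecutive "no peers" failure.
fn no_peers_backoff_ms(retry_count: u32) -> u64 {
    200u64 << retry_count
}

fn main() {
    // Retry counts 0, 1, 2 are allowed (the counter is checked against `< 3`),
    // producing the 200ms / 400ms / 800ms schedule before the normal
    // `retry_after_failed` cooldown takes over.
    let schedule: Vec<u64> = (0..3).map(no_peers_backoff_ms).collect();
    assert_eq!(schedule, vec![200, 400, 800]);
}
```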
@@ -2832,6 +2848,13 @@ enum RuntimeDownloadError {
}

impl RuntimeDownloadError {
+    fn is_no_peers(&self) -> bool {
+        match self {
+            RuntimeDownloadError::StorageQuery(err) => err.is_no_peers(),
+            RuntimeDownloadError::InvalidHeader(_) => false,
+        }
+    }
+
/// Returns `true` if this is caused by networking issues, as opposed to a consensus-related
/// issue.
fn is_network_problem(&self) -> bool {
@@ -2893,6 +2916,10 @@ struct Background<TPlat: PlatformRef> {
/// Stream of notifications coming from the sync service. `None` if not subscribed yet.
blocks_stream: Option<Pin<Box<dyn Stream<Item = sync_service::Notification> + Send>>>,

+    /// Number of consecutive runtime download failures due to no peers being available.
+    /// Used for exponential backoff (200ms, 400ms, 800ms) before falling back to the
+    /// normal cooldown.
+    no_peers_retry_count: u32,

/// List of runtimes currently being downloaded from the network.
/// For each item, the download id, storage value of `:code`, storage value of `:heappages`,
/// and Merkle value and closest ancestor of `:code`.
5 changes: 5 additions & 0 deletions light-base/src/sync_service.rs
@@ -1040,6 +1040,11 @@ pub struct StorageQueryError {
}

impl StorageQueryError {
+    /// Returns `true` if no peers were available to query.
+    pub fn is_no_peers(&self) -> bool {

Contributor commented:

Do we have a "no peers" path for parahead fetch?


Contributor (author) replied:

No, and it doesn't need one. fetch_parachain_head_from_relay (after #3210) waits on relay chain subscribe_all notifications, not directly on peers. The relay chain runtime service handles peer connectivity — the parachain fetch blocks on relay chain events, so there's no peer-level retry to optimize there.

+        self.errors.is_empty()
+    }
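
The check relies on `errors` recording one entry per attempted peer, so an empty list means no peer could be queried at all. A minimal standalone sketch of that invariant (simplified types, not the actual `StorageQueryError` definition):

```rust
// Simplified stand-in for StorageQueryError: each failed per-peer attempt
// appends one entry to `errors`. If no peer was available, nothing was
// attempted and the list stays empty.
struct QueryError {
    errors: Vec<String>,
}

impl QueryError {
    // Mirrors the PR's check: an empty error list means "no peers".
    fn is_no_peers(&self) -> bool {
        self.errors.is_empty()
    }
}

fn main() {
    let no_peers = QueryError { errors: vec![] };
    let peer_failures = QueryError { errors: vec!["timeout".to_string()] };
    assert!(no_peers.is_no_peers());
    assert!(!peer_failures.is_no_peers());
}
```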

/// Returns `true` if this is caused by networking issues, as opposed to a consensus-related
/// issue.
pub fn is_network_problem(&self) -> bool {