Snapshot validation issues #1936

Closed
cursor[bot] wants to merge 1 commit into feat/sync-011-snapshot-sync from cursor/snapshot-validation-issues-dbf4

@cursor cursor bot commented Feb 11, 2026

Fixes for Snapshot Sync DoS, Safety, and Validation Issues

Description

This PR addresses several critical issues related to snapshot synchronization, focusing on improving DoS protection, ensuring protocol safety, and enhancing validation logic. The changes aim to make snapshot transfers more robust against malicious input and prevent unintended state corruption.

Specifically, this PR fixes the following issues:

  • DoS limits are set far too high (bug_id: a0ed86bb-59cd-4b1b-9501-b1e0ae2cd2e7): Reduced MAX_ENTITIES_PER_PAGE, MAX_SNAPSHOT_PAGES, and MAX_DAG_HEADS to more conservative values to strengthen resource exhaustion protections.
  • Snapshot safety check is never enforced (bug_id: 69ffd70c-b9b0-44ed-825e-ef23306f2d74): Integrated check_snapshot_safety() into the SyncManager::sync_snapshot path, with an exception for crash recovery, to prevent accidental state overwrites on initialized nodes.
  • Verification errors are collapsed into boundary error (bug_id: aa7163d5-4be9-4f81-aa7d-cb6bb3ff04ca): Introduced specific SnapshotError variants for EntityCountMismatch and MissingPages and updated SnapshotVerifyResult::to_error() to map to these distinct errors, improving error specificity for better recovery decisions.
  • Page counters accept impossible states (bug_id: ref1_a93be70d-ce41-4539-8087-1bba1c6014ca): Added a check in SnapshotPage::is_valid() to ensure sent_count does not exceed page_count.
  • Entity page validation misses ordering invariants (bug_id: ref1_b8ec3886-0ff6-4fa6-a48d-20f261d49c49): Added checks in SnapshotEntityPage::is_valid() to validate page_number < total_pages and is_last coherence with page_number.
  • Compressed payload size is never validated (bug_id: ref2_a4dbd62a-886c-4513-9a9f-d7fed81b6857): Introduced MAX_COMPRESSED_PAYLOAD_SIZE and added a check in SnapshotPage::is_valid() to bound the payload length, preventing excessive allocation during decompression.
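
As a rough illustration of the error-mapping fix, the sketch below shows how a verification result could map to distinct errors instead of collapsing into a boundary error. Only the names `SnapshotError`, `EntityCountMismatch`, `MissingPages`, `InvalidBoundary`, `SnapshotVerifyResult`, and `to_error()` come from this PR; the fields, the `Valid` and `RootHashMismatch` variants, and the `Option` return type are assumptions made for the example and may differ from the real code.

```rust
// Hypothetical shapes: all fields and the Valid / RootHashMismatch
// variants are assumptions, not the actual definitions in the crate.
#[derive(Debug, PartialEq, Eq)]
pub enum SnapshotError {
    InvalidBoundary,
    EntityCountMismatch { expected: u64, actual: u64 },
    MissingPages { expected: u64, received: u64 },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SnapshotVerifyResult {
    Valid,
    RootHashMismatch,
    EntityCountMismatch { expected: u64, actual: u64 },
    MissingPages { expected: u64, received: u64 },
}

impl SnapshotVerifyResult {
    /// Returns `None` for a valid snapshot, otherwise an error that keeps
    /// the failure kind so callers can choose a recovery strategy.
    pub fn to_error(&self) -> Option<SnapshotError> {
        match *self {
            Self::Valid => None,
            // Assumed: only genuine boundary problems map to InvalidBoundary.
            Self::RootHashMismatch => Some(SnapshotError::InvalidBoundary),
            Self::EntityCountMismatch { expected, actual } => {
                Some(SnapshotError::EntityCountMismatch { expected, actual })
            }
            Self::MissingPages { expected, received } => {
                Some(SnapshotError::MissingPages { expected, received })
            }
        }
    }
}
```

With a mapping of this shape, a caller could, for example, re-request pages on `MissingPages` while treating `EntityCountMismatch` as a reason to discard the transfer.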

Test plan

The changes were verified by:

  • Running cargo check on the affected crates (crates/node/primitives, crates/node).
  • Running cargo test for crates/node/primitives, including new unit tests added to cover the updated validation logic for SnapshotPage::is_valid() and SnapshotEntityPage::is_valid(), as well as the new error handling paths in SnapshotVerifyResult::to_error().
  • Running cargo fmt --check and cargo clippy to check formatting and lints.

No changes to the user interface were made. Existing end-to-end tests should continue to pass.

Documentation update

No public or internal documentation updates are required as these changes are internal protocol and validation improvements.


…issues

- Reduce DoS limits to reasonable values:
  - MAX_ENTITIES_PER_PAGE: 10,000 -> 1,000
  - MAX_SNAPSHOT_PAGES: 1,000,000 -> 10,000 (~2.5GB at 256KB/page)
  - MAX_DAG_HEADS: 1,000 -> 100
  - Add MAX_COMPRESSED_PAYLOAD_SIZE (8MB) for payload validation

- Add SnapshotPage::is_valid() checks:
  - Validate sent_count <= page_count (impossible state prevention)
  - Validate compressed payload size against MAX_COMPRESSED_PAYLOAD_SIZE

- Add SnapshotEntityPage::is_valid() ordering invariants:
  - Check page_number < total_pages
  - Check is_last coherence with page_number

- Fix SnapshotVerifyResult::to_error() to preserve error types:
  - Add SnapshotError::EntityCountMismatch variant
  - Add SnapshotError::MissingPages variant
  - Map verification results to specific errors instead of InvalidBoundary

- Wire check_snapshot_safety() into runtime:
  - Call in request_snapshot_sync() before applying snapshot
  - Allow crash recovery (sync-in-progress marker) to bypass check

- Add comprehensive tests for new validation logic
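
The two `SnapshotPage::is_valid()` checks listed above could look roughly like the following. This is a minimal sketch, not the code from the commit: the `payload` field name, the integer types, and the `is_valid_with_limit` helper are assumptions, and the 8MB limit is assumed to mean 8 * 1024 * 1024 bytes.

```rust
/// Assumed byte value for the 8MB limit named in the commit message.
pub const MAX_COMPRESSED_PAYLOAD_SIZE: usize = 8 * 1024 * 1024;

/// Hypothetical page shape; only page_count and sent_count are named
/// in the PR, the payload field is an assumption.
pub struct SnapshotPage {
    pub payload: Vec<u8>,
    pub page_count: u64,
    pub sent_count: u64,
}

impl SnapshotPage {
    /// Validates the page against the production payload limit.
    pub fn is_valid(&self) -> bool {
        self.is_valid_with_limit(MAX_COMPRESSED_PAYLOAD_SIZE)
    }

    /// Hypothetical helper that takes the limit as a parameter, so the
    /// size check can be exercised without allocating a large buffer.
    pub fn is_valid_with_limit(&self, max_payload: usize) -> bool {
        // A sender cannot have sent more pages than exist.
        if self.sent_count > self.page_count {
            return false;
        }
        // Reject oversized compressed payloads before any decompression.
        if self.payload.len() > max_payload {
            return false;
        }
        true
    }
}
```

Passing the limit as a parameter is one possible way to test the oversized-payload path with a few bytes instead of a buffer larger than 8MB.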

cursor bot commented Feb 11, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@github-actions

Your PR title does not adhere to the Conventional Commits convention:

<type>(<scope>): <subject>

Common errors to avoid:

  1. The title must be in lower case.
  2. Allowed type values are: build, ci, docs, feat, fix, perf, refactor, test.

@meroreviewer meroreviewer bot left a comment

🤖 AI Code Reviewer

Reviewed by 1 agent | Quality score: 33% | Review time: 164.0s

💡 2 suggestions, 📝 2 nitpicks. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-fa3c24f6

) -> Result<SnapshotSyncResult> {
info!(%context_id, %peer_id, "Starting snapshot sync");

// Check Invariant I5: Snapshot sync should only be used for fresh nodes

💡 Consider logging crash recovery detection

When crash recovery is detected (sync-in-progress marker present), emitting a log or metric would aid operational visibility for debugging sync issues.

Suggested fix:

Add `debug!(%context_id, "Skipping safety check due to crash recovery");` when `is_crash_recovery` is true.
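
One way to picture this suggestion is the sketch below, which separates the crash-recovery bypass from the normal safety check and records a message whenever the check is skipped. It is an illustration under assumptions: the `SafetyDecision` enum, the `decide_snapshot_safety` function, the `has_state` parameter, and the `Vec<String>` log sink are all hypothetical stand-ins (the real code would call `debug!` from its logging setup), and `check_snapshot_safety` here is a simplified placeholder for the function named in the PR.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SafetyDecision {
    Proceed,
    ProceedCrashRecovery,
    Reject,
}

/// Simplified placeholder: assumed to fail when the node already has state.
fn check_snapshot_safety(has_state: bool) -> Result<(), &'static str> {
    if has_state {
        Err("snapshot sync requested on an initialized node")
    } else {
        Ok(())
    }
}

/// Hypothetical wrapper; `log` stands in for the real logging macro.
pub fn decide_snapshot_safety(
    context_id: &str,
    has_state: bool,
    is_crash_recovery: bool,
    log: &mut Vec<String>,
) -> SafetyDecision {
    if is_crash_recovery {
        // Make the bypass visible to operators instead of skipping silently.
        log.push(format!(
            "context_id={}: skipping safety check due to crash recovery",
            context_id
        ));
        return SafetyDecision::ProceedCrashRecovery;
    }
    match check_snapshot_safety(has_state) {
        Ok(()) => SafetyDecision::Proceed,
        Err(reason) => {
            log.push(format!("context_id={}: {}", context_id, reason));
            SafetyDecision::Reject
        }
    }
}
```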

@@ -360,6 +369,16 @@ impl SnapshotEntityPage {
return false;

💡 Consider validating total_pages > 0 when page exists

When total_pages == 0, both new checks are skipped; a page with zero total pages is semantically invalid and may warrant explicit rejection.

Suggested fix:

Add `if self.total_pages == 0 { return false; }` or document why zero is acceptable.
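
A minimal sketch of the ordering invariants with the zero check folded in might look like this. The struct is reduced to the three fields named in the PR, with assumed integer types; it also assumes `page_number` is zero-indexed and that "coherence" means `is_last` is set exactly on the final page, neither of which is confirmed by the diff shown here.

```rust
/// Hypothetical reduction of the entity page to the fields under discussion.
pub struct SnapshotEntityPage {
    pub page_number: u64,
    pub total_pages: u64,
    pub is_last: bool,
}

impl SnapshotEntityPage {
    pub fn is_valid(&self) -> bool {
        // A page that exists cannot belong to a zero-page transfer.
        if self.total_pages == 0 {
            return false;
        }
        // Assumed zero-indexed: valid page numbers are 0..total_pages.
        if self.page_number >= self.total_pages {
            return false;
        }
        // Assumed coherence rule: is_last is true only on the final page.
        let expected_last = self.page_number == self.total_pages - 1;
        self.is_last == expected_last
    }
}
```

Rejecting `total_pages == 0` first also keeps the `total_pages - 1` subtraction from underflowing.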

@@ -1138,6 +1206,26 @@ mod tests {
sent_count: 10,

📝 Nit: Duplicate test case for sent_count > page_count

The invalid_sent test duplicates the existing too_many test above (both use page_count=5, sent_count=10); consider removing the redundant case or testing a different invalid combination.

Suggested fix:

Remove `invalid_sent` test or use different values to test a distinct edge case.

page_count: 5,
sent_count: 10,
};
assert!(!invalid_sent.is_valid());

📝 Nit: Large allocation in test may slow CI

Allocating 8MB+ in oversized_payload test is correct but could slow test runs; consider a smaller constant or documenting the necessity.

@xilosada
Member

Changes incorporated into #1933

@xilosada xilosada closed this Feb 11, 2026