Snapshot validation issues#1936
Snapshot validation issues#1936cursor[bot] wants to merge 1 commit intofeat/sync-011-snapshot-syncfrom
Conversation
…issues - Reduce DoS limits to reasonable values: - MAX_ENTITIES_PER_PAGE: 10,000 -> 1,000 - MAX_SNAPSHOT_PAGES: 1,000,000 -> 10,000 (~2.5GB at 256KB/page) - MAX_DAG_HEADS: 1,000 -> 100 - Add MAX_COMPRESSED_PAYLOAD_SIZE (8MB) for payload validation - Add SnapshotPage::is_valid() checks: - Validate sent_count <= page_count (impossible state prevention) - Validate compressed payload size against MAX_COMPRESSED_PAYLOAD_SIZE - Add SnapshotEntityPage::is_valid() ordering invariants: - Check page_number < total_pages - Check is_last coherence with page_number - Fix SnapshotVerifyResult::to_error() to preserve error types: - Add SnapshotError::EntityCountMismatch variant - Add SnapshotError::MissingPages variant - Map verification results to specific errors instead of InvalidBoundary - Wire check_snapshot_safety() into runtime: - Call in request_snapshot_sync() before applying snapshot - Allow crash recovery (sync-in-progress marker) to bypass check - Add comprehensive tests for new validation logic
|
Cursor Agent can help with this pull request. Just |
|
Your PR title does not adhere to the Conventional Commits convention: Common errors to avoid:
|
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 1 agents | Quality score: 33% | Review time: 164.0s
💡 2 suggestions, 📝 2 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-fa3c24f6
| ) -> Result<SnapshotSyncResult> { | ||
| info!(%context_id, %peer_id, "Starting snapshot sync"); | ||
|
|
||
| // Check Invariant I5: Snapshot sync should only be used for fresh nodes |
There was a problem hiding this comment.
💡 Consider logging crash recovery detection
When crash recovery is detected (sync-in-progress marker present), emitting a log or metric would aid operational visibility for debugging sync issues.
Suggested fix:
Add `debug!(%context_id, "Skipping safety check due to crash recovery");` when `is_crash_recovery` is true.
| @@ -360,6 +369,16 @@ impl SnapshotEntityPage { | |||
| return false; | |||
There was a problem hiding this comment.
💡 Consider validating total_pages > 0 when page exists
When total_pages == 0, both new checks are skipped; a page with zero total pages is semantically invalid and may warrant explicit rejection.
Suggested fix:
Add `if self.total_pages == 0 { return false; }` or document why zero is acceptable.
| @@ -1138,6 +1206,26 @@ mod tests { | |||
| sent_count: 10, | |||
There was a problem hiding this comment.
📝 Nit: Duplicate test case for sent_count > page_count
The invalid_sent test duplicates the existing too_many test above (both use page_count=5, sent_count=10); consider removing the redundant case or testing a different invalid combination.
Suggested fix:
Remove `invalid_sent` test or use different values to test a distinct edge case.
|
Changes incorporated into #1933 |
Fixes for Snapshot Sync DoS, Safety, and Validation Issues
Description
This PR addresses several critical issues related to snapshot synchronization, focusing on improving DoS protection, ensuring protocol safety, and enhancing validation logic. The changes aim to make snapshot transfers more robust against malicious input and prevent unintended state corruption.
Specifically, this PR fixes the following issues:
MAX_ENTITIES_PER_PAGE,MAX_SNAPSHOT_PAGES, andMAX_DAG_HEADSto more conservative values to strengthen resource exhaustion protections.check_snapshot_safety()into theSyncManager::sync_snapshotpath, with an exception for crash recovery, to prevent accidental state overwrites on initialized nodes.SnapshotErrorvariants forEntityCountMismatchandMissingPagesand updatedSnapshotVerifyResult::to_error()to map to these distinct errors, improving error specificity for better recovery decisions.SnapshotPage::is_valid()to ensuresent_countdoes not exceedpage_count.SnapshotEntityPage::is_valid()to validatepage_number < total_pagesandis_lastcoherence withpage_number.MAX_COMPRESSED_PAYLOAD_SIZEand added a check inSnapshotPage::is_valid()to bound thepayloadlength, preventing excessive allocation during decompression.Test plan
The changes were verified by:
cargo checkon the affected crates (crates/node/primitives,crates/node).cargo testforcrates/node/primitives, including new unit tests added to cover the updated validation logic forSnapshotPage::is_valid()andSnapshotEntityPage::is_valid(), as well as the new error handling paths inSnapshotVerifyResult::to_error().cargo fmt --checkandcargo clippyto ensure code style and best practices.No changes to the user interface were made. Existing end-to-end tests should continue to pass.
Documentation update
No public or internal documentation updates are required as these changes are internal protocol and validation improvements.