feat(sync): implement SubtreePrefetch sync strategy (#1776) #2034
…/MessagePayload Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…otocol
- Phase 1: Top-level hash comparison to identify divergent subtrees
- Phase 2: Fetch entire subtrees in single request
- Direct storage merge for received entities
- Fallback to snapshot sync on failure
- Wire into SyncManager dispatch and responder
- Update protocol tests for new default capabilities
…ulation
- SyncState::SubtreePrefetch { peer, pending_roots }
- SyncMessage::SubtreePrefetchRequest/Response variants
- SubtreeTransfer struct for simulation entity transfer
- estimated_size for new message variants
…ests
- Protocol selection: verify select_protocol() picks SubtreePrefetch for deep trees with low divergence
- Scenario preconditions: verify force_subtree_prefetch() setup
- Heuristic function: verify should_use_subtree_prefetch() conditions
- SyncState variant: verify SubtreePrefetch state behavior
- Message construction: verify new SyncMessage variants
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 194.6s
🔴 1 critical, 🟡 5 warnings, 💡 5 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-c8326405
"SubtreePrefetch sync completed"
);

Ok(SyncProtocol::SubtreePrefetch {
🔴 Phase 1 optimization is defeated - all subtrees marked divergent
The code adds prefixes from remote-only, local-only, AND the intersection, meaning ALL subtrees are always fetched regardless of divergence; this makes the two-phase protocol pointless overhead.
Suggested fix:
Only include prefixes in symmetric difference (remote XOR local), not the intersection; comparing actual hashes would be needed for true divergence detection of shared prefixes.
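The suggested symmetric-difference computation can be sketched as follows (a minimal sketch; `divergent_prefixes`, the bare `u8` prefix type, and the sorting step are illustrative assumptions, not the PR's actual API):

```rust
use std::collections::HashSet;

/// Return only the prefixes present on exactly one side (remote XOR local).
/// Shared prefixes would additionally need a hash comparison to detect true
/// divergence; that step is intentionally omitted here.
fn divergent_prefixes(remote: &HashSet<u8>, local: &HashSet<u8>) -> Vec<u8> {
    let mut out: Vec<u8> = remote.symmetric_difference(local).copied().collect();
    out.sort_unstable(); // deterministic ordering for the Phase 2 request
    out
}
```

With `remote = {1, 2, 3}` and `local = {2, 3, 4}`, this yields `[1, 4]` rather than the full union that the current code produces.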
%context_id,
total_applied,
subtrees_received = prefetch_response.subtree_count(),
not_found = prefetch_response.not_found.len(),
🟡 identify_divergent_subtrees always returns all prefixes
The function adds prefixes only in remote, only in local, AND the intersection—meaning every prefix is returned as divergent, defeating the optimization purpose of SubtreePrefetch.
Suggested fix:
Only include prefixes in the symmetric difference (remote XOR local), or implement actual hash comparison for shared prefixes to identify true divergence.
crates/node/src/sync/subtree_sync.rs
/// Handle a SubtreePrefetchRequest (Phase 2 of SubtreePrefetch).
///
/// Walks storage for entities matching the requested subtree roots by key prefix,
/// and sends back a SubtreePrefetchResponse.
🟡 Missing input validation on subtree_roots vector size
The handle_subtree_prefetch_request accepts an unbounded subtree_roots vector from remote peers without size validation, which could enable resource exhaustion attacks.
Suggested fix:
Add validation similar to other snapshot messages: `if subtree_roots.len() > MAX_SUBTREE_ROOTS { return Err(...) }` with a reasonable constant limit.
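A minimal sketch of the suggested bound check (the `MAX_SUBTREE_ROOTS` value of 256 follows the reviewer's example; the constant name and the `String` error type are placeholders for whatever the sync module actually uses):

```rust
/// Hypothetical upper bound on roots accepted from a remote peer.
const MAX_SUBTREE_ROOTS: usize = 256;

/// Reject oversized requests before any storage work is done.
fn validate_subtree_roots(roots: &[[u8; 32]]) -> Result<(), String> {
    if roots.len() > MAX_SUBTREE_ROOTS {
        return Err(format!(
            "too many subtree roots: {} (max {})",
            roots.len(),
            MAX_SUBTREE_ROOTS
        ));
    }
    Ok(())
}
```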
subtree_roots: divergent_roots.clone(),
max_depth: None,
},
next_nonce: super::helpers::generate_nonce(),
🟡 No cryptographic verification of received entity data
Entity data received from peers in Phase 2 is applied via handle.put() after only calling is_valid(), which checks structural validity but not cryptographic integrity; a malicious responder could inject arbitrary data.
Suggested fix:
Verify the computed root hash of received entities matches the expected subtree root hash before applying to storage.
crates/node/src/sync/subtree_sync.rs
pub(crate) async fn handle_tree_node_request(
    &self,
    context_id: ContextId,
    _node_id: [u8; 32],
🟡 Full datastore scan in handle_tree_node_request
Iterates ALL entries in the entire datastore to find prefixes for a single context_id; this is O(total_entries) instead of O(context_entries).
Suggested fix:
Use a range/prefix query on context_id if the storage layer supports it, or maintain a separate index of prefixes per context.
subtrees: serialized.into(),
not_found_count: not_found.len() as u32,
},
next_nonce: super::helpers::generate_nonce(),
🟡 Full datastore scan in collect_subtree_entities
Same O(total_entries) full scan pattern repeated; called once per subtree root in Phase 2, potentially multiplying the cost.
Suggested fix:
Use range queries scoped to context_id and prefix, or refactor to collect all needed entities in a single pass.
crates/node/src/sync/subtree_sync.rs
/// Collect all entities within a subtree identified by a root prefix.
///
/// Iterates storage, filters by context_id and prefix match, and builds
💡 Hardcoded metadata loses entity provenance information
Entity metadata is hardcoded with version=0 and parent_hash=[0u8;32], which discards the original CRDT versioning that may be needed for proper conflict resolution and audit trails.
Suggested fix:
Retrieve and preserve the actual metadata from storage if it exists, or document why placeholder values are acceptable for this sync path.
}

// =========================================================================
// SubtreePrefetch Protocol Selection Tests
💡 Tests cover protocol selection but not actual sync logic
The new tests verify protocol selection heuristics and message construction, but don't test the core two-phase sync behavior in subtree_sync.rs (identify_divergent_subtrees, collect_subtree_entities, entity application).
Suggested fix:
Add integration tests that exercise the full request_subtree_prefetch flow with mock storage to verify correct divergence detection and entity merging.
depth: usize,
) -> eyre::Result<SubtreeData> {
    let handle = self.context_client.datastore_handle();
    let mut iter = handle.iter::<ContextStateKey>()?;
💡 Root hash computed via simple XOR lacks meaningful integrity
XORing all entity keys produces a weak hash that doesn't detect reordering or provide real integrity verification for the subtree.
Suggested fix:
Consider using a proper cryptographic hash over sorted keys and values, or document that this hash is only for debugging/logging purposes.
.await;
}
};
💡 Redundant datastore iteration in identify_divergent_subtrees
This method re-scans the entire datastore to build local_prefixes, duplicating work already done when the responder built its TreeNodeResponse.
Suggested fix:
Consider caching local prefixes after initial computation, or restructuring the protocol to avoid redundant scans.
E2E Rust Apps Failed: One or more E2E workflows (e2e-kv-store, xcall-example) failed after retries. Please check the workflow logs for more details.
E2E Blockchain Proposals Failed: The following proposal workflow(s) failed:
Please check the workflow logs for more details.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 91% | Review time: 282.4s
🔴 2 critical, 🟡 5 warnings, 💡 4 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-bd737647
return self
    .fallback_to_snapshot_sync(context_id, our_identity, peer_id, stream)
    .await;
}
🔴 Missing root hash verification before applying remote state (Invariant I7)
Entities from SubtreePrefetchResponse are written to storage without verifying the resulting root hash matches the expected value, violating I7 which requires verification before ANY writes.
Suggested fix:
Compute expected root hash from the received entities, apply writes to a transaction, verify computed root matches advertised root, then commit—or abort and fall back to snapshot.
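The stage-verify-commit pattern the fix describes can be sketched against a mock store (assumptions: a `BTreeMap` stands in for the datastore, and `xor_hash` stands in for whatever root-hash function the protocol settles on; 4-byte keys keep the example short):

```rust
use std::collections::BTreeMap;

/// Placeholder hash over staged entries; the real protocol would use its
/// agreed root-hash computation here.
fn xor_hash(entries: &BTreeMap<[u8; 4], Vec<u8>>) -> [u8; 4] {
    let mut h = [0u8; 4];
    for (k, v) in entries {
        for (i, b) in k.iter().enumerate() { h[i] ^= b; }
        for (i, b) in v.iter().enumerate() { h[i % 4] ^= b; }
    }
    h
}

/// Stage remote entities in a buffer, verify the recomputed root against the
/// advertised one, and only then commit. On mismatch, nothing is written and
/// the caller can fall back to snapshot sync.
fn apply_verified(
    store: &mut BTreeMap<[u8; 4], Vec<u8>>,
    staged: BTreeMap<[u8; 4], Vec<u8>>,
    expected_root: [u8; 4],
) -> Result<usize, &'static str> {
    if xor_hash(&staged) != expected_root {
        return Err("root hash mismatch; aborting without writes");
    }
    let n = staged.len();
    store.extend(staged); // commit only after verification
    Ok(n)
}
```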
}
};

let subtrees_data = match phase2_response {
🔴 Direct put() overwrites may violate CRDT merge requirement (Invariant I5)
Using handle.put() directly writes remote data, but initialized nodes MUST use CRDT merge per I5 to prevent silent data loss; verify put() actually performs merge semantics or use explicit CRDT merge API.
Suggested fix:
Use explicit CRDT merge operation instead of put(), or verify that the datastore handle's put() for ContextState performs proper CRDT merge internally.
.fallback_to_snapshot_sync(context_id, our_identity, peer_id, stream)
.await;
}
🟡 identify_divergent_subtrees always marks ALL prefixes as divergent
The function adds (remote \ local) + (local \ remote) + (local ∩ remote), which equals the union of all prefixes; this defeats the optimization since SubtreePrefetch will always fetch all subtrees.
Suggested fix:
Either implement actual hash comparison for the intersection case, or document that this is intentional and remove the intersection loop if those subtrees should not be fetched.
/// and sends back a SubtreePrefetchResponse.
pub(crate) async fn handle_subtree_prefetch_request(
    &self,
    context_id: ContextId,
🟡 No limit on subtree_roots vector size enables DoS
The subtree_roots parameter from remote peers has no maximum size validation, allowing attackers to request arbitrarily many subtrees causing resource exhaustion.
Suggested fix:
Add a MAX_SUBTREE_ROOTS constant (e.g., 256) and reject requests exceeding this limit before processing.
/// and sends back a SubtreePrefetchResponse.
pub(crate) async fn handle_subtree_prefetch_request(
    &self,
    context_id: ContextId,
🟡 O(m×n) complexity: full storage iteration per subtree root
handle_subtree_prefetch_request calls collect_subtree_entities for each subtree root, and each call iterates ALL storage entries; with m roots and n entries, this is O(m×n).
Suggested fix:
Iterate storage once, bucketing entities by their prefix byte, then extract requested subtrees from the buckets in O(n + m).
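The single-pass bucketing the fix describes might look like this (hypothetical types: `[u8; 32]` keys and `Vec<u8>` values stand in for the real entity representation, and `bucket_by_prefix`/`extract_subtrees` are illustrative names):

```rust
use std::collections::HashMap;

/// One O(n) pass over storage, grouping entities by their first key byte.
fn bucket_by_prefix(
    entities: impl Iterator<Item = ([u8; 32], Vec<u8>)>,
) -> HashMap<u8, Vec<([u8; 32], Vec<u8>)>> {
    let mut buckets: HashMap<u8, Vec<_>> = HashMap::new();
    for (key, value) in entities {
        buckets.entry(key[0]).or_default().push((key, value));
    }
    buckets
}

/// Answer each requested root by bucket lookup: O(m) instead of m full scans.
fn extract_subtrees<'a>(
    buckets: &'a HashMap<u8, Vec<([u8; 32], Vec<u8>)>>,
    roots: &[[u8; 32]],
) -> (Vec<&'a [([u8; 32], Vec<u8>)]>, Vec<[u8; 32]>) {
    let mut found = Vec::new();
    let mut not_found = Vec::new();
    for root in roots {
        match buckets.get(&root[0]) {
            Some(bucket) => found.push(bucket.as_slice()),
            None => not_found.push(*root),
        }
    }
    (found, not_found)
}
```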
..
} => subtrees,
_ => {
    warn!(%context_id, "SubtreePrefetch Phase 2: unexpected response, falling back to snapshot");
🟡 All shared prefixes marked as divergent defeats Phase 1 optimization
The loop at lines 338-345 adds ALL prefixes present in both local and remote to the divergent list, meaning Phase 1 provides no filtering benefit—every subtree gets fetched regardless of actual divergence.
Suggested fix:
Either implement actual hash comparison for shared prefixes, or document that Phase 1 only filters prefixes unique to one side and rename the function to clarify semantics.
context_id: ContextId,
subtree_roots: Vec<[u8; 32]>,
max_depth: Option<usize>,
stream: &mut Stream,
🟡 No upper bound validation on max_depth parameter
While a default is provided, a malicious peer can supply an arbitrarily large max_depth value; the only clamping in key_shares_subtree_prefix (to 32) doesn't prevent excessive iteration.
Suggested fix:
Validate max_depth against a reasonable upper bound (e.g., 32) and reject or clamp requests exceeding it at the handler entry point.
subtree_root: &[u8; 32],
depth: usize,
runtime_env: &calimero_storage::env::RuntimeEnv,
) -> eyre::Result<SubtreeData> {
🟡 Inconsistent hash computation between Phase 1 and SubtreeData
The root_hash in SubtreeData XORs only entity keys, while send_prefix_discovery and identify_divergent_subtrees XOR both keys and values; this inconsistency could cause confusion or bugs if the hashes are later compared.
Suggested fix:
Either align the hash computation (XOR keys+values consistently) or add a clear doc comment explaining why `SubtreeData.root_hash` is computed differently and where it's used.
/// Handle a SubtreePrefetchRequest.
///
/// Two modes depending on `subtree_roots`:
/// - **Empty** (discovery / Phase 1): enumerate unique first-byte prefixes of
🟡 Unbounded subtree_roots vector enables DoS
The subtree_roots parameter from remote peers has no size limit; a malicious peer could send thousands of roots causing the responder to iterate storage repeatedly and exhaust resources.
Suggested fix:
Add a maximum limit (e.g., 256 roots) and reject requests exceeding it, similar to MAX_TREE_REQUEST_DEPTH validation in other protocols.
);

let mut subtrees = Vec::new();
let mut not_found = Vec::new();
🟡 O(M×N) complexity in subtree collection
send_subtree_fetch calls collect_subtree_entities once per subtree root, and each call performs a full storage scan; with M roots and N keys this is O(M×N) when a single scan bucketing by prefix would be O(N).
Suggested fix:
Refactor to iterate storage once, collecting entities into a HashMap<u8, Vec<TreeLeafData>> keyed by first byte, then extract requested subtrees.
let store = self.context_client.datastore_handle().into_inner();
let runtime_env = create_runtime_env(&store, context_id, our_identity);

for subtree in &prefetch_response.subtrees {
🟡 XOR hash is not collision-resistant for divergence detection
The xor_into_hash function is trivially manipulable - an attacker can craft data that produces any target hash, potentially tricking nodes into skipping synchronization of tampered subtrees.
Suggested fix:
Use a cryptographic hash (SHA-256 or BLAKE3) for the per-prefix aggregation to prevent manipulation of divergence detection.
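The weakness is easy to demonstrate: XOR aggregation cannot distinguish the set {a, b} from the single-element set {a ^ b}, so two different subtrees can hash identically (a minimal sketch with 4-byte keys for brevity; `xor_accumulate` is an illustrative stand-in for the PR's helper):

```rust
/// XOR all keys together, as the reviewed code does per prefix.
fn xor_accumulate(keys: &[[u8; 4]]) -> [u8; 4] {
    let mut acc = [0u8; 4];
    for k in keys {
        for i in 0..4 {
            acc[i] ^= k[i];
        }
    }
    acc
}
```

Because `{a, b}` and `{a ^ b}` collide, an attacker who controls keys can force any target aggregate, which is why the fix recommends a cryptographic hash per entry before aggregation.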
/// state keys and return one empty `SubtreeData` per prefix so the initiator
/// can identify divergent subtrees. Then read the follow-up Phase 2 request
/// from the same stream and handle it.
/// - **Non-empty** (fetch / Phase 2): walk storage for entities matching the
💡 Responder waits indefinitely for Phase 2 after discovery
After sending the Phase 1 discovery response, handle_subtree_prefetch_request calls recv(stream, None) with no timeout; if the initiator finds no divergence and closes the stream, the responder blocks until stream close is detected.
Suggested fix:
Consider adding a reasonable timeout to the Phase 2 recv call to avoid holding resources if the initiator never sends Phase 2.
Ok(SubtreeData::new(*subtree_root, root_hash, entities, depth))
}
💡 Repeated fallback pattern reduces readability
The fallback_to_snapshot_sync call is repeated 9 times with nearly identical error handling; extracting a helper closure or macro would reduce boilerplate.
Suggested fix:
Consider a local closure like `let fallback = || self.fallback_to_snapshot_sync(context_id, our_identity, peer_id);` and chain with `.map_err` for logging before falling back.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 228.1s
🔴 1 critical, 🟡 6 warnings, 💡 3 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-c9676a95
warn!(%context_id, error = %e, "Failed to identify divergent subtrees, falling back to snapshot");
return self
    .fallback_to_snapshot_sync(context_id, our_identity, peer_id)
    .await;
🔴 Remote peer can cause arbitrary data deletion via crafted response
A malicious peer can manipulate Phase 1 hashes to mark subtrees as divergent, then return empty/partial entity sets in Phase 2, causing the initiator to delete legitimate local data as 'stale'.
Suggested fix:
Implement cryptographic verification of remote state claims before deletion, or require consensus from multiple peers before removing local keys. Consider making stale-key cleanup opt-in or requiring CRDT tombstones for deletions.
impl SyncManager {
    /// Handle a SubtreePrefetchRequest.
    ///
    /// Two modes depending on `subtree_roots`:
🟡 No size limit on subtree_roots allows DoS via resource exhaustion
The subtree_roots vector from untrusted peers has no upper bound validation; a malicious peer could request thousands of roots causing memory/CPU exhaustion.
Suggested fix:
Add a constant like `MAX_SUBTREE_ROOTS` and reject requests exceeding this limit with an early return.
// Resolve identity for RuntimeEnv (needed to read CRDT metadata from Index)
let identities = self
    .context_client
    .get_context_members(&context_id, Some(true));
🟡 Silent return without response leaves initiator hanging
When no owned identity is found, the handler returns Ok(()) without sending any response, causing the initiator to timeout waiting.
Suggested fix:
Send an error response or return an error so the stream closes cleanly and the initiator can fall back immediately.
/// Send subtree fetch response (Phase 2 responder).
///
/// Collects entities for each requested subtree root and sends back a
🟡 O(n*m) storage scans in send_subtree_fetch
Each call to collect_subtree_entities iterates the entire storage; with m subtree roots and n entities, this is O(n*m) instead of O(n) with a single-pass bucketing approach.
Suggested fix:
Refactor to iterate storage once, bucketing entities by prefix into a HashMap<u8, Vec<TreeLeafData>>, then extract requested subtrees.
context_id: ContextId,
our_identity: calimero_primitives::identity::PublicKey,
peer_id: libp2p::PeerId,
stream: &mut Stream,
🟡 Deserialized response from peer not validated for size before processing
The SubtreePrefetchResponse is deserialized from an arbitrary-length byte slice with only an is_valid() check; massive payloads could exhaust memory during deserialization.
Suggested fix:
Enforce a maximum byte size on `subtrees_data` before calling `borsh::from_slice`, similar to existing limits like `max_batch_size`.
let mut sqx = Sequencer::default();
let msg = StreamMessage::Message {
    sequence_id: sqx.next(),
🟡 No upper bound validation on max_depth parameter from peer
When max_depth is explicitly provided by a remote peer, there is no upper bound check; a malicious peer could request excessive depth causing expensive storage iteration.
Suggested fix:
Clamp `max_depth` to a maximum constant (e.g., `MAX_TREE_REQUEST_DEPTH` already exists in wire.rs) regardless of peer-provided value.
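A sketch of clamping at the handler entry point (the limit of 16 mirrors the `MAX_TREE_REQUEST_DEPTH` constant that wire.rs is said to define; the default depth of 1 and the function name are illustrative assumptions):

```rust
/// Mirrors the MAX_TREE_REQUEST_DEPTH bound described for wire.rs.
const MAX_TREE_REQUEST_DEPTH: usize = 16;
/// Hypothetical default when the peer omits max_depth.
const DEFAULT_SUBTREE_DEPTH: usize = 1;

/// Clamp a peer-supplied depth so a malicious value cannot drive
/// excessive storage iteration deeper in the handler.
fn effective_depth(requested: Option<usize>) -> usize {
    requested
        .unwrap_or(DEFAULT_SUBTREE_DEPTH)
        .min(MAX_TREE_REQUEST_DEPTH)
}
```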
// Apply received entities with CRDT merge (Invariant I5: No Silent Data Loss).
// Uses apply_leaf_with_crdt_merge like HashComparison and LevelWise protocols,
// ensuring proper merge semantics based on crdt_type and HLC timestamps.
🟡 XOR hash has collision risk for divergence detection
XOR hashing can produce false negatives where different key-value sets yield identical hashes (e.g., swapping pairs), potentially causing the protocol to miss divergent data.
Suggested fix:
Consider using a cryptographic accumulator or sorted hash chain (e.g., XOR of SHA256(key||value) for each entry) to reduce collision probability.
/// Send subtree fetch response (Phase 2 responder).
///
/// Collects entities for each requested subtree root and sends back a
💡 No entity count limit in subtree fetch response
The subtree fetch collects unbounded entities per prefix; a malicious or large prefix could cause excessive memory allocation and potential OOM.
Suggested fix:
Add an entity count limit per subtree and set `is_truncated` when exceeded, or enforce the existing max_batch_size from SyncCapabilities.
subtree_count = subtree_roots.len(),
depth,
"SubtreePrefetch fetch: collecting subtree data"
);
💡 Error during entity collection is silently treated as 'not found'
When collect_subtree_entities fails with an error, it's logged but the root is added to not_found, masking real failures from the caller.
Suggested fix:
Consider distinguishing between 'subtree has no entities' and 'failed to collect entities' to avoid hiding errors that might warrant snapshot fallback.
"SubtreePrefetch Phase 2: requesting divergent subtrees"
);

// Phase 2: Request divergent subtrees
💡 TOCTOU race between stale key detection and deletion
Keys collected for deletion in the read phase could become stale or new keys could appear before the delete phase executes, potentially causing incorrect deletions if concurrent writes occur.
Suggested fix:
Document that context must be quiesced during sync, or use a transactional delete pattern.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 96% | Review time: 291.8s
🔴 1 critical, 🟡 5 warnings, 💡 3 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-68585378
}
};

let subtrees_data = match phase2_response {
🔴 Trust boundary violation: deleting local data based on untrusted remote response
A malicious peer can cause data loss by returning empty subtrees or claiming keys in not_found, triggering deletion of legitimate local keys without cryptographic verification.
Suggested fix:
Require cryptographic proof (e.g., signed root hash or Merkle proof) before deleting local keys, or only allow deletions when multiple trusted peers agree.
.fallback_to_snapshot_sync(context_id, our_identity, peer_id)
.await;
}
};
🟡 Stale key deletion happens before entity application - partial failure leaves inconsistent state
Keys are deleted in a separate loop before entities are applied; if entity application fails midway, the deleted keys are already gone with no rollback.
Suggested fix:
Consider applying all changes in a single transaction or applying entities first (idempotent via CRDT merge), then deleting stale keys.
/// hashes to filter out shared prefixes whose data already matches,
/// avoiding unnecessary Phase 2 fetches.
async fn send_prefix_discovery(
    &self,
🟡 No upper bound validation on max_depth from untrusted peer request
The max_depth parameter from incoming requests is used without validation against MAX_TREE_REQUEST_DEPTH, potentially allowing resource exhaustion attacks.
Suggested fix:
Validate max_depth against the constant MAX_TREE_REQUEST_DEPTH (16) defined in wire.rs before processing.
"Failed to collect subtree entities"
);
not_found.push(*root);
}
🟡 max_depth parameter ignored during entity collection
The depth parameter in collect_subtree_entities is only stored in metadata but never used to limit the number of entities collected, potentially causing unbounded memory allocation for large subtrees.
Suggested fix:
Add an early-exit or truncation when entity count exceeds a threshold derived from `depth`, and set `is_truncated` accordingly.
};

if let Err(e) = super::stream::send(stream, &phase2_msg, None).await {
    warn!(%context_id, error = %e, "SubtreePrefetch Phase 2 send failed, falling back to snapshot");
🟡 Duplicate full storage scan during sync
The identify_divergent_subtrees method performs a full storage scan, and then the cleanup phase at lines 475-495 performs another full scan; these could be combined into a single pass to halve I/O.
Suggested fix:
Compute both `local_prefix_hashes` and collect candidate stale keys in a single storage iteration, returning both results.
total_applied += 1;
}
}
🟡 Weak XOR-based hash allows divergence detection bypass
XOR is not collision-resistant; an attacker could craft payloads with identical prefix hashes but different data, causing sync to skip divergent subtrees or triggering unnecessary syncs.
Suggested fix:
Use a cryptographic hash function (e.g., BLAKE3 or SHA256) for divergence detection instead of XOR.
let entity_id = Id::new(state_key);
match Index::<MainStorage>::get_index(entity_id) {
    Ok(Some(idx)) => {
        let crdt_type = idx
💡 Long function with repetitive fallback pattern
The request_subtree_prefetch function (~200 lines) repeats the same warn! + fallback_to_snapshot_sync pattern 9 times; extracting a helper or using a macro would reduce duplication.
Suggested fix:
Extract a helper like `async fn try_or_fallback<T>(&self, result: Result<T>, context_id, ...) -> ...` that handles logging and fallback uniformly.
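One way to collapse the repeated pattern is a small macro. This sketch omits the async logging and fallback call and only shows the unwrap-or-return shape; `try_or_fallback!` and `demo` are hypothetical names, not code from the PR:

```rust
/// Unwrap a Result or return the caller-supplied fallback value.
/// In the real code, the Err arm would log via warn! and await
/// fallback_to_snapshot_sync before returning.
macro_rules! try_or_fallback {
    ($result:expr, $on_err:expr) => {
        match $result {
            Ok(v) => v,
            Err(_e) => return $on_err,
        }
    };
}

/// Illustrative call site: each of the nine fallback paths shrinks to one line.
fn demo(input: Result<u32, &'static str>) -> &'static str {
    let value = try_or_fallback!(input, "fell back to snapshot");
    if value > 0 { "applied" } else { "empty" }
}
```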
};

super::stream::send(stream, &msg, None).await?;
Ok(())
💡 N+1 index lookups per entity
Each entity in collect_subtree_entities triggers a separate Index::get_index call inside the loop, which may cause N+1 query overhead if the index is not cached.
Suggested fix:
Consider batching index lookups or caching index data if the underlying implementation doesn't already do so.
context_id: ContextId,
subtree_root: &[u8; 32],
depth: usize,
runtime_env: &calimero_storage::env::RuntimeEnv,
💡 Inconsistent hash computation - should use xor_into_hash helper
The root_hash is computed with an inline loop while the same XOR pattern is abstracted in xor_into_hash used elsewhere; using the helper improves DRY consistency.
Suggested fix:
Replace the inline loop with: `for entity in &entities { xor_into_hash(&mut root_hash, &entity.key); }`
Cursor Bugbot has reviewed your changes and found 2 potential issues.
warn!(%context_id, error = %e, "SubtreePrefetch Phase 1 send failed, falling back to snapshot");
return self
    .fallback_to_snapshot_sync(context_id, our_identity, peer_id)
    .await;
Snapshot fallback always fails for initialized nodes
High Severity
Every fallback path in request_subtree_prefetch calls fallback_to_snapshot_sync, which internally calls request_snapshot_sync with force=false. This enforces Invariant I5, rejecting snapshot sync for initialized nodes via check_snapshot_safety. However, SubtreePrefetch is only selected by select_protocol Rule 4, which runs after the Rule 2 fresh-node check — meaning the local node always has state. So every fallback attempt will fail with SnapshotOnInitializedNode, causing a hard sync error instead of graceful recovery.
Additional Locations (1)
root[0] = prefix;
divergent.push(root);
}
}
Local-only prefixes cause unintended data deletion
High Severity
identify_divergent_subtrees includes local-only prefixes (data present locally but absent remotely) as divergent roots. Phase 2 then requests these from the remote, which returns them in not_found. The cleanup logic in is_key_stale deletes all local keys under not_found_prefixes, permanently destroying data the initiator had but the responder simply hadn't received yet. This contradicts the stated CRDT merge semantics and Invariant I5 ("No Silent Data Loss").


Summary
Implements the SubtreePrefetch sync strategy (Issue #1776) — a two-phase protocol optimized for deep Merkle trees with clustered, low-divergence changes.
- Phase 1: uses TreeNodeRequest to enumerate first-byte prefixes of state keys, then compares local vs remote children to identify divergent subtrees
- Phase 2: sends SubtreePrefetchRequest for all divergent subtree roots, receives complete entity data, and applies via handle.put() (CRDT merge)
Files changed (9 files, +773 / -11)
- crates/node/src/sync/subtree_sync.rs
- crates/node/src/sync/manager.rs
- crates/node/primitives/src/sync/snapshot.rs
- crates/node/primitives/src/sync/handshake.rs
- crates/node/primitives/src/sync/protocol.rs
- crates/node/tests/sync_sim.rs
- crates/node/tests/sync_sim/actions.rs
- crates/node/tests/sync_sim/node/state.rs
Test plan
- cargo test -p calimero-node --lib -- sync — 26 passed
- cargo test -p calimero-node-primitives -- sync — 244 passed
- cargo test -p calimero-node --test sync_sim — 187 passed
- cargo fmt --all -- --check — clean
Closes #1776
Note
Medium Risk
Introduces a new network sync path that applies CRDT-merged state and performs prefix-based key deletions; bugs could cause missed updates or unintended data removal during sync.
Overview
Implements the SubtreePrefetch sync strategy end-to-end, including making it a default advertised capability and adding new wire request/response variants for subtree prefetch data transfer. SyncManager now dispatches SyncProtocol::SubtreePrefetch to a new two-phase implementation (subtree_sync.rs) that (1) discovers divergent key-prefix subtrees via lightweight per-prefix hashes, then (2) bulk-fetches those subtrees, applies entities via CRDT merge, and deletes stale local keys under authoritative remote prefixes; any failure falls back to snapshot sync.
Simulation and protocol-selection tests are updated and expanded to cover SubtreePrefetch support, fallback behavior when the remote lacks the capability, new message sizing/types, and a new SyncState::SubtreePrefetch variant.
Written by Cursor Bugbot for commit 093f067. This will update automatically on new commits.