Skip to content

Needs_push local leaf data#1915

Draft
cursor[bot] wants to merge 10 commits intomasterfrom
cursor/needs-push-local-leaf-data-c148
Draft

Needs_push local leaf data#1915
cursor[bot] wants to merge 10 commits intomasterfrom
cursor/needs-push-local-leaf-data-c148

Conversation

@cursor
Copy link
Contributor

@cursor cursor bot commented Feb 10, 2026

Fix: needs_push misses local leaf data in structural divergence

Description

This PR fixes a bug in the TreeCompareResult::needs_push method where it incorrectly returned false when a local node was a leaf with data, but the remote node at the same position was an internal node with children. This scenario, representing structural divergence, meant local leaf data was not being pushed, violating the method's contract.

The fix modifies the Different case in needs_push to correctly identify when the local node is a leaf (i.e., has no local_only_children and no common_children). In such cases, the local leaf data needs to be pushed for proper bidirectional CRDT sync, even if the remote node has children. The inline documentation for needs_push has been updated to reflect this behavior.

This addresses bug ref1_0ca2a992-7860-4c86-8856-6838e5d20455.

Test plan

The existing unit test with_remote_only in crates/node/primitives/src/sync.rs was updated to assert true for needs_push in the Different case where local_only_children and common_children are empty, but remote_only_children is non-empty. This test now correctly verifies the fix for structural divergence.
All cargo test, cargo fmt --check, and cargo clippy checks were run and passed.

Documentation update

The inline documentation for the needs_push method in crates/node/primitives/src/sync.rs has been updated to clarify its behavior regarding structural divergence. No other documentation updates are immediately required.


xilosada and others added 10 commits February 10, 2026 14:44
Implements issue #1774 - HashComparison Sync Strategy (P0 Primary)
from the Sync Protocol series.

Adds types for recursive Merkle tree comparison (CIP §4 STATE-BASED):
- TreeNodeRequest for traversal (node_id, max_depth)
- TreeNodeResponse with nodes and not_found flag
- TreeNode (internal with children OR leaf with data)
- TreeLeafData (key, value, metadata)
- LeafMetadata (crdt_type, hlc, version, collection_id, parent_id)
- CrdtType enum (LwwRegister, GCounter, PnCounter, etc.)
- TreeCompareResult and compare_tree_nodes() function

Key design decisions:
- Metadata MUST include crdt_type for proper CRDT merge
- TreeNode can be internal (has children) or leaf (has data)
- compare_tree_nodes() identifies differing subtrees

Protocol flow:
1. Start at root
2. Request children, compare hashes
3. Recurse on differing subtrees
4. Fetch leaf data, apply via CRDT merge

Includes 13 new unit tests (79 total).

CIP Reference: §4 - State Machine (STATE-BASED branch)
Fixes issues identified in PR #1870 review:

1. **Critical**: compare_tree_nodes now properly handles bidirectional sync
   - Changed signature to (Option<&TreeNode>, Option<&TreeNode>)
   - Returns RemoteMissing when local has data remote doesn't
   - Leaf nodes with different hashes return Different (not empty)

2. **Warnings**:
   - TreeCompareResult::Different now includes local_only_children
     for bidirectional sync support
   - O(n*m) → O(n+m) using HashSet for child comparison

3. **Suggestions**:
   - Added MAX_TREE_DEPTH = 64 constant
   - TreeNodeRequest::with_depth clamps to MAX_TREE_DEPTH

4. **Tests**: Added 4 new tests for edge cases:
   - test_compare_tree_nodes_leaf_content_differs
   - test_compare_tree_nodes_remote_missing
   - test_compare_tree_nodes_local_only_children
   - test_tree_node_request_max_depth_validation

All 46 sync tests pass.
…th docs

Address AI reviewer comments:
- Add BorshSerialize/BorshDeserialize to TreeCompareResult for wire protocol consistency
- Document max_depth security implications and MAX_TREE_DEPTH enforcement
Address AI reviewer comments (review-50843406):
- Add depth() accessor that always clamps max_depth for safe deserialization
- Add MAX_NODES_PER_RESPONSE constant with is_valid() for response validation
- Add MAX_CHILDREN_PER_NODE constant with is_valid() for node validation
- Add needs_push() method for bidirectional sync scenarios
- Clarify TreeNode id vs hash semantic distinction in documentation
- Add comprehensive tests for all new validation methods
Add TODO comment noting CrdtType is duplicated between sync and storage
modules, with a recommendation to consolidate into calimero-primitives.

Document LeafMetadata as a wire-protocol-optimized subset of storage Metadata,
explaining the design decision to keep it separate for smaller payloads.
…for leaf nodes

- Add common_children field to TreeCompareResult::Different to track
  children present on both sides that need recursive comparison
- Rename differing_children to remote_only_children for clarity
- Fix needs_push() to return true for differing leaf nodes (when all
  child vecs are empty but hashes differ)
- Update tests to cover the new behavior

Fixes:
- Merkle comparison now properly reports shared children for recursive
  traversal when parent hashes differ
- Bidirectional CRDT sync now works for differing leaf nodes
Cherry-pick Bugbot PR #1914:
- Add common_children field to TreeCompareResult::Different for proper Merkle traversal
- Rename differing_children → remote_only_children for clarity
- Fix needs_push() to return true for differing leaf nodes

Additional improvements:
- TreeNodeResponse::is_valid() now validates all nested TreeNodes
- TreeNode::is_valid() now checks structural invariant (children XOR leaf_data)
- Add MAX_LEAF_VALUE_SIZE (1MB) constant with TreeLeafData::is_valid()
- Add over-limit validation tests for all bounds
- Move HashSet import to module level (StdExternalCrate pattern)
- Change leaves() to return iterator instead of Vec (zero allocation)

Tests: 52 sync tests passing
- Make max_depth field private to prevent bypass of depth() clamping
- TreeNode::is_valid() now rejects empty nodes (no children AND no leaf_data)
- Improve needs_push() documentation explaining leaf detection invariant
- Simplify CrdtType TODO comment to just reference tracking issue #1912
- Update tests for stricter validation rules

All 52 sync tests passing
…ergence

The needs_push method now correctly returns true when local is a leaf
node and remote is an internal node (structural divergence). Previously,
the leaf detection heuristic only returned true when all child vecs were
empty, but this missed the case where local is a leaf (no local_only or
common children) while remote has children (remote_only is non-empty).

The fix changes the final check from:
  remote_only_children.is_empty() && common_children.is_empty()
to:
  common_children.is_empty()

Since we already checked local_only_children.is_empty() above, if
common_children is also empty, local has no children at all (is a leaf)
and its data needs to be pushed for CRDT merge.
@cursor
Copy link
Contributor Author

cursor bot commented Feb 10, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

@github-actions
Copy link

Your PR title does not adhere to the Conventional Commits convention:

<type>(<scope>): <subject>

Common errors to avoid:

  1. The title must be in lower case.
  2. Allowed type values are: build, ci, docs, feat, fix, perf, refactor, test.

Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 1 agents | Quality score: 80% | Review time: 159.4s


✅ No Issues Found

All agents reviewed the code and found no issues. LGTM! 🎉


🤖 Generated by AI Code Reviewer | Review ID: review-9ce77f19

Base automatically changed from sync-protocol/007-hash-comparison to master February 10, 2026 19:31
@github-actions
Copy link

This pull request has been automatically marked as stale. If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated.

@github-actions github-actions bot added the Stale label Feb 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants