Skip to content

Synchronization logic issues#1971

Draft
cursor[bot] wants to merge 4 commits intomasterfrom
cursor/synchronization-logic-issues-cbc8
Draft

Synchronization logic issues#1971
cursor[bot] wants to merge 4 commits intomasterfrom
cursor/synchronization-logic-issues-cbc8

Conversation

@cursor
Copy link
Contributor

@cursor cursor bot commented Feb 12, 2026

Fixes for Hash Comparison and CRDT Merge Logic

Description

This PR addresses three high-severity logic bugs found in the hash comparison and CRDT merge protocol, improving correctness and efficiency.

  1. Child nodes incorrectly compared against parent's local version (bug_id: 68164f61-f021-47f2-a59c-b8b4ad0a75c5):

    • Motivation: When a max_depth=Some(1) request returns the parent node and its children, the original logic incorrectly compared child nodes against the parent's local version. This led to debug_assert failures in debug builds and silent incorrect comparisons in release, breaking the subtree-skipping optimization.
    • Fix: The loop processing remote nodes now explicitly checks the index. Only the requested node (index 0) is compared against its local counterpart. Child internal nodes (index > 0) are skipped for comparison at this stage, as they will be correctly processed when their node_id is popped from the to_compare stack with their respective local versions. Leaf nodes continue to be merged via CRDT regardless of their position in the response.
  2. Missing RuntimeEnv causes incorrect LWW timestamp comparison (bug_id: 2ed1efc4-22e5-41b2-98b9-24a95e5fe207):

    • Motivation: The Index::<MainStorage>::get_index call within merge_entity_values was not wrapped in with_runtime_env, causing it to fall back to MockedStorage. This resulted in existing_ts always defaulting to 0, leading to incoming values always overwriting existing local values, violating CRDT merge invariant I5.
    • Fix: The runtime_env is now passed to apply_leaf_with_crdt_merge and subsequently to merge_entity_values. The get_index call in merge_entity_values is now correctly wrapped with with_runtime_env to ensure it accesses the actual storage.
  3. CRDT merge uses wrong storage key format (bug_id: ref1_5b9ac4a8-9873-4837-b10a-fa3ecdfafd84):

    • Motivation: apply_leaf_with_crdt_merge was constructing ContextStateKey using raw entity ID bytes (leaf.key). However, the actual entity data is stored under a SHA256-hashed key (e.g., SHA256(discriminant || entity_id)). This mismatch caused the merge to read from a nonexistent location, skip the merge, and write the incoming value to the wrong location, effectively preventing any CRDT merge or update to the correct entity data.
    • Fix: apply_leaf_with_crdt_merge now constructs the ContextStateKey by first creating a calimero_storage::Key::Entry from the entity ID and then calling .to_bytes() on it. This ensures the key is correctly SHA256-hashed, matching the format used for storing entity data.

Test plan

The changes were verified by:

  • cargo check
  • cargo build
  • cargo clippy
    All checks passed successfully.
    Unit tests for the hash_comparison module were attempted, but environment-specific linking issues prevented their execution. The fixes address specific logic errors identified by static analysis. Existing end-to-end tests for synchronization should now behave more correctly.

Documentation update

No public or internal documentation updates are required for these internal logic fixes.


xilosada and others added 4 commits February 12, 2026 13:15
…tocol testing

Replaces the flat `DigestCache` HashMap in `SimNode` with `SimStorage`,
which uses the real `calimero-storage::Index<MainStorage>` implementation
backed by `InMemoryDB`. This enables accurate simulation of sync protocols
that depend on tree structure (e.g., HashComparison).

Key changes:
- Add `SimStorage` with in-memory Merkle tree using `Store + InMemoryDB`
- Add `RuntimeEnv` bridge to connect storage Key operations
- Update `SimNode` to use hybrid storage: real tree + metadata cache
- Add `insert_entity_hierarchical()` for creating proper tree depth
- Make `Index::get_index()` and `get_children_of()` public for traversal
- Add tree structure verification tests for protocol selection
- Fix: prevent self-referencing cycle in hierarchical insertion

The entity counting now correctly excludes intermediate tree nodes,
and `iter_entities()` returns only "real" entities (with metadata).

Tree depth now affects protocol selection:
- SubtreePrefetch scenarios have max_depth > 3
- LevelWise scenarios have max_depth <= 2

Spec reference: Simulation Framework Spec §5, §7, §11

Co-authored-by: cursor[bot] <cursor@calimero.network>
- Delegate apply_storage_op Insert/Update to insert_entity_with_metadata
  to avoid duplicating dual-write logic (cursor bugbot feedback)
- Extract magic number 24 to MAX_HIERARCHICAL_DEPTH constant with docs
- Add comprehensive documentation for max_depth() semantics explaining
  the difference between storage-level (root-inclusive) and protocol-level
  (root-exclusive) depth values
Implements the HashComparison sync protocol (CIP §4) with proper
integration into SyncManager and comprehensive test coverage.

Key changes:
- Add HashComparison protocol implementation with iterative DFS traversal
- Integrate with SyncManager for initiator and responder roles
- Add wire protocol types (TreeNodeRequest, TreeNodeResponse) in new wire.rs
- Implement CRDT merge at leaves with proper timestamp extraction (I5)
- Add force_protocol mechanism in SimNode for testing
- Add 10 HashComparison simulation tests
- Add compliance tests for I4 (convergence) and I5 (CRDT merge)

Invariants verified:
- I5: CRDT merge only, no overwrite for initialized nodes
- I4: Strategy equivalence verified via compliance tests

Test coverage: 230 tests passing
Fix Bug 1: Child nodes incorrectly compared against parent's local version
- Only compare the first node (index 0) in the response with local_node
- For children (index > 0), skip comparison and let them be processed
  when popped from to_compare stack with their correct local versions
- This prevents the debug_assert in compare_tree_nodes from firing

Fix Bug 2: Missing RuntimeEnv causes incorrect LWW timestamp comparison
- Pass runtime_env through apply_leaf_with_crdt_merge to merge_entity_values
- Wrap Index::get_index call in with_runtime_env to access actual storage
- This ensures existing timestamps are correctly retrieved for LWW merge

Fix Bug 3: CRDT merge uses wrong storage key format
- Use Key::Entry(entity_id).to_bytes() to apply SHA256 hashing
- This matches the key format used by create_storage_callbacks
- Without this fix, reads/writes target wrong storage locations
@cursor
Copy link
Contributor Author

cursor bot commented Feb 12, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 93% | Review time: 257.4s

🟡 2 warnings, 💡 3 suggestions. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-dbcfba6f

@@ -573,11 +589,14 @@ impl SyncManager {

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Silent fallback to timestamp 0 on get_index failure masks errors

If with_runtime_env or get_index fails due to an unexpected error (not just missing entry), the code silently uses timestamp 0, which could cause incoming values to incorrectly win LWW resolution and violate invariant I5.

Suggested fix:

Log a warning when get_index returns an error (distinct from None) so storage failures during sync are observable, even if the fallback behavior is retained.

context_id: ContextId,
leaf: &TreeLeafData,
runtime_env: &RuntimeEnv,
) -> Result<()> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Critical key-hashing fix lacks unit test coverage

The storage key format fix is critical for correctness but the PR notes unit tests couldn't run; consider adding an integration test that verifies the hashed key matches the expected storage location.

Suggested fix:

Add a test that stores an entity via normal path and verifies `apply_leaf_with_crdt_merge` can read it back using the same key derivation.

} else if idx == 0 {
// Internal node at index 0: this is the requested node, compare with local
// For children (idx > 0), skip comparison here - they will be processed
// when they are popped from the to_compare stack with their correct local versions.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Children data in response is discarded and re-requested

Internal nodes at idx > 0 are skipped entirely even though their data is already in the response; they will be re-fetched when popped from to_compare stack, causing redundant network round-trips for max_depth=1 requests.

Suggested fix:

Consider caching child node data from the response in a local map keyed by node_id, then checking this cache before sending TreeNodeRequest to avoid duplicate fetches.

self.apply_leaf_with_crdt_merge(context_id, leaf_data)
self.apply_leaf_with_crdt_merge(context_id, leaf_data, &runtime_env)
.await
.wrap_err("failed to merge leaf entity")?;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Implicit fall-through for internal child nodes reduces clarity

Internal nodes at idx > 0 silently fall through the if-else chain; an explicit else branch with a trace log would make the intentional no-op explicit and aid future debugging.

Suggested fix:

Add `else { trace!(%context_id, node_id = ?remote_node.node_id, "Skipping child internal node, will process when popped from stack"); }` after line 390.

@@ -573,11 +589,14 @@ impl SyncManager {

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Clone of runtime_env inside loop iteration may be expensive if called frequently

The runtime_env.clone() occurs for every LWW merge; if RuntimeEnv contains Rc-wrapped callbacks this is cheap, but this should be verified or documented.

Suggested fix:

Confirm RuntimeEnv::clone is cheap (Rc clones) or move the closure outside the match arm if performance matters.

Base automatically changed from feat/hash-comparison-protocol to master February 12, 2026 14:57
@github-actions
Copy link

This pull request has been automatically marked as stale. If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated.

@github-actions github-actions bot added the Stale label Feb 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants