Skip to content

feat(sync-sim): add SimStorage with real Merkle tree for protocol testing#1961

Closed
xilosada wants to merge 2 commits intomasterfrom
feat/sim-node-storage-infra
Closed

feat(sync-sim): add SimStorage with real Merkle tree for protocol testing#1961
xilosada wants to merge 2 commits intomasterfrom
feat/sim-node-storage-infra

Conversation

@xilosada
Copy link
Member

@xilosada xilosada commented Feb 12, 2026

Summary

Adds SimStorage infrastructure to the simulation framework, replacing the flat DigestCache HashMap with a real Merkle tree implementation. This enables accurate simulation of sync protocols that depend on tree structure (e.g., HashComparison).

Key changes:

  • Add SimStorage with in-memory Merkle tree using Store + InMemoryDB
  • Add RuntimeEnv bridge to connect storage Key operations
  • Update SimNode to use hybrid storage: real tree + metadata cache
  • Add insert_entity_hierarchical() for creating proper tree depth
  • Make Index::get_index() and get_children_of() public for traversal
  • Add tree structure verification tests for protocol selection

Entity semantics improvements:

  • Entity counting now correctly excludes intermediate tree nodes
  • iter_entities() returns only "real" entities (with metadata)
  • has_entity() checks metadata cache, not storage (excludes intermediate nodes)

Protocol selection now uses real tree depth:

  • SubtreePrefetch scenarios have max_depth > 3
  • LevelWise scenarios have max_depth <= 2

Test Plan

  • All 197 sync_sim tests pass
  • Tree structure verification tests added:
    • test_subtree_prefetch_has_deep_tree
    • test_levelwise_has_shallow_tree
    • test_deep_tree_entities_produce_depth
    • test_shallow_tree_entities_produce_low_depth
    • test_tree_structure_affects_handshake
  • Hierarchical insertion tests verify intermediate nodes aren't counted
  • Protocol negotiation compliance tests still pass

Spec Reference

  • Simulation Framework Spec §5: In-memory Storage
  • Simulation Framework Spec §7: State Digest and Hashing
  • Simulation Framework Spec §11: HashComparison Protocol

Note

Medium Risk
Touches core sync buffering behavior during snapshot sync and changes overflow semantics from rejection to eviction, which can affect consistency under load. Simulation/storage refactors are large but mostly test-scoped; main runtime changes add new logging and session-cancel paths.

Overview
Strengthens snapshot-sync delta buffering (Invariant I6) by switching DeltaBuffer to a FIFO VecDeque with configurable capacity (default 10,000), oldest-first eviction instead of erroring on overflow, and new drops/capacity metrics surfaced to callers.

Updates node sync-session handling to use the new buffer contract: adds rate-limited overflow warnings, logs drop totals when a session ends, supports explicit cancel_sync_session() on snapshot failures, and threads the new behavior through snapshot fallback in SyncManager.

Expands the sync simulation framework to better reflect production behavior by introducing SimStorage (real calimero-storage Merkle tree backed by InMemoryDB), updating SimNode to use a storage+metadata hybrid (including hierarchical insertion for realistic depth), adding explicit simulated delta-buffering events/scenarios for I6, and exposing Index::get_index()/get_children_of() plus accessors on EntityIndex for tree traversal in tests.

Written by Cursor Bugbot for commit 4db3188. This will update automatically on new commits. Configure here.

Add comprehensive simulation tests verifying delta buffering behavior
during state-based sync (Invariant I6: deltas arriving during sync
MUST be preserved and applied after sync completes).

Changes:
- Enhanced DeltaBuffer with FIFO eviction (oldest-first drop policy)
- Added drops counter and configurable capacity
- Integrated cancel_sync_session for proper error handling
- Added 7 simulation test scenarios covering:
  - Deltas buffered during active sync
  - Buffered deltas replayed on completion
  - Immediate application when idle
  - Buffer cleared on crash
  - Multiple deltas preserved in FIFO order
  - Convergence blocked while buffer non-empty
  - Complex snapshot sync with concurrent writes

All 176 sync_sim tests pass.
…tocol testing

Replaces the flat `DigestCache` HashMap in `SimNode` with `SimStorage`,
which uses the real `calimero-storage::Index<MainStorage>` implementation
backed by `InMemoryDB`. This enables accurate simulation of sync protocols
that depend on tree structure (e.g., HashComparison).

Key changes:
- Add `SimStorage` with in-memory Merkle tree using `Store + InMemoryDB`
- Add `RuntimeEnv` bridge to connect storage Key operations
- Update `SimNode` to use hybrid storage: real tree + metadata cache
- Add `insert_entity_hierarchical()` for creating proper tree depth
- Make `Index::get_index()` and `get_children_of()` public for traversal
- Add tree structure verification tests for protocol selection

The entity counting now correctly excludes intermediate tree nodes,
and `iter_entities()` returns only "real" entities (with metadata).

Tree depth now affects protocol selection:
- SubtreePrefetch scenarios have max_depth > 3
- LevelWise scenarios have max_depth <= 2

Spec reference: Simulation Framework Spec §5, §7, §11
@xilosada xilosada changed the base branch from master to test/delta-buffering-sim February 12, 2026 03:21
@xilosada xilosada changed the base branch from test/delta-buffering-sim to master February 12, 2026 03:22
Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 100% | Review time: 293.9s

🟡 1 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-f89aaaf0


impl std::error::Error for DeltaBufferFull {}

impl DeltaBuffer {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 VecDeque initial capacity inconsistent with default buffer capacity

The with_capacity(capacity.min(1000)) limits initial allocation to 1000 even though DEFAULT_BUFFER_CAPACITY is 10,000, causing unnecessary reallocations.

Suggested fix:

Either remove the `.min(1000)` or use a sensible fraction of capacity like `capacity.min(DEFAULT_BUFFER_CAPACITY / 10)`.

Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issues.

let storage_id = Self::entity_id_to_storage_id(id);
let storage_metadata = Self::entity_metadata_to_storage_metadata(&metadata);
self.storage
.add_entity_with_parent(storage_id, parent_id, &data, storage_metadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hierarchical insertion can create self-referencing tree cycle

Medium Severity

insert_entity_hierarchical can create a self-referencing cycle when the entity key's bytes after position depth are all zeros. In that case, the deepest intermediate node ID (constructed as key[0..depth] + [0; 32-depth]) equals the entity's own storage_id. The final add_entity_with_parent(storage_id, parent_id, ...) then adds the entity as a child of itself. This creates a cycle that causes infinite recursion (stack overflow) in compute_depth_recursive and count_entities_recursive. For instance, EntityId::from_u64(0) with any depth >= 1 triggers this.

Fix in Cursor Fix in Web

for op in operations {
n.apply_storage_op(op);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replayed buffered writes missing from work metrics

Low Severity

In the SyncComplete handler, replayed buffered operations call n.apply_storage_op(op) without recording self.metrics.work.record_write(). The immediate-apply path in the GossipDelta handler correctly calls record_write() for each operation. This inconsistency causes work metrics to under-count actual storage writes performed during buffered delta replay.

Additional Locations (1)

Fix in Cursor Fix in Web

@cursor
Copy link
Contributor

cursor bot commented Feb 12, 2026

Bugbot Autofix prepared fixes for 2 of the 2 bugs found in the latest run.

  • ✅ Fixed: Hierarchical insertion can create self-referencing tree cycle
    • Added check to skip intermediate nodes that would equal the entity's storage_id, preventing self-referencing cycles when key bytes after depth are all zeros.
  • ✅ Fixed: Replayed buffered writes missing from work metrics
    • Added record_write() call for each replayed operation in SyncComplete handler, making metrics consistent with the GossipDelta handler.

View PR

Or push these changes by commenting:

@cursor push 5b5f755833
Preview (5b5f755833)
diff --git a/crates/node/tests/sync_sim/node/state.rs b/crates/node/tests/sync_sim/node/state.rs
--- a/crates/node/tests/sync_sim/node/state.rs
+++ b/crates/node/tests/sync_sim/node/state.rs
@@ -499,6 +499,9 @@
         // Build parent chain based on key prefix
         let mut parent_id = self.storage.root_id();
 
+        // Compute storage_id early to check for self-referencing cycles
+        let storage_id = Self::entity_id_to_storage_id(id);
+
         for d in 1..=depth {
             // Create intermediate node ID from key prefix
             let mut intermediate_key = [0u8; 32];
@@ -506,6 +509,12 @@
 
             let intermediate_id = Id::new(intermediate_key);
 
+            // Skip if this intermediate ID equals the entity's storage ID
+            // This prevents self-referencing cycles when key[depth..] are all zeros
+            if intermediate_id == storage_id {
+                continue;
+            }
+
             // Create intermediate node if it doesn't exist
             if !self.storage.has_entity(intermediate_id) {
                 self.storage.add_entity_with_parent(
@@ -520,7 +529,6 @@
         }
 
         // Add the actual entity under the deepest intermediate node
-        let storage_id = Self::entity_id_to_storage_id(id);
         let storage_metadata = Self::entity_metadata_to_storage_metadata(&metadata);
         self.storage
             .add_entity_with_parent(storage_id, parent_id, &data, storage_metadata);

diff --git a/crates/node/tests/sync_sim/sim_runtime.rs b/crates/node/tests/sync_sim/sim_runtime.rs
--- a/crates/node/tests/sync_sim/sim_runtime.rs
+++ b/crates/node/tests/sync_sim/sim_runtime.rs
@@ -622,6 +622,7 @@
                         for (_delta_id, operations) in buffered_ops {
                             for op in operations {
                                 n.apply_storage_op(op);
+                                self.metrics.work.record_write();
                             }
                         }

@xilosada
Copy link
Member Author

Superseded by #1968 (rebased on master)

@xilosada xilosada closed this Feb 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant