Skip to content

feat(sync): SyncProtocolExecutor trait for unified protocol testing#1972

Merged
xilosada merged 9 commits intomasterfrom
feat/sim-transport-abstraction
Feb 13, 2026
Merged

feat(sync): SyncProtocolExecutor trait for unified protocol testing#1972
xilosada merged 9 commits intomasterfrom
feat/sim-transport-abstraction

Conversation

@xilosada
Copy link
Member

@xilosada xilosada commented Feb 12, 2026

Summary

Adds a trait-based architecture for sync protocols enabling the same protocol code to run in production and simulation.

Phase 1 (previous commits)

  • Add SyncTransport trait abstracting network operations (send/recv/close)
  • Add StreamTransport for production Stream wrapper
  • Add SimStream for in-memory channel-based transport

Phase 2 (this PR)

  • Add SyncProtocolExecutor trait as common interface for all sync protocols
  • Add create_runtime_env shared helper to bridge calimero-storage with calimero-store
  • Extract HashComparisonProtocol to standalone module implementing the trait
  • Update simulation tests to call production protocol directly (not a reimplementation)
  • Fix wire protocol to use u64 for sequence_id (portability)

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    SyncProtocolExecutor trait                    │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │ HashComparison  │  │    Snapshot     │  │   BloomFilter   │ │
│  │    Protocol     │  │    Protocol     │  │    Protocol     │ │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘ │
│           └────────────────────┼────────────────────┘          │
│                    ┌───────────┴───────────┐                   │
│                    │   SyncTransport       │                   │
│                    │ (Stream or SimStream) │                   │
│                    └───────────────────────┘                   │
└─────────────────────────────────────────────────────────────────┘

Same code, different backends:

  • Production: StreamTransport + Store<RocksDB>
  • Simulation: SimStream + Store<InMemoryDB>

Changes

File Change
protocol_trait.rs New SyncProtocolExecutor trait
storage_bridge.rs Shared create_runtime_env() helper
hash_comparison_protocol.rs Standalone HashComparison implementation
hash_comparison.rs Cleaned to responder-only (~875 → ~285 lines)
wire.rs sequence_id changed from usize to u64
protocol.rs (test) Uses production protocol directly

Test plan

  • All 241 sync simulation tests pass
  • All 293 node-primitives tests pass
  • cargo fmt clean
  • cargo clippy no errors in changed modules

Note

High Risk
Includes a breaking on-the-wire format change (sequence_id type), requiring coordinated upgrades; also refactors core sync protocol execution paths and storage/env bridging, which can affect correctness and interoperability.

Overview
Introduces a trait-based sync protocol interface: adds SyncTransport (with shared EncryptionState), SyncProtocolExecutor, and a shared create_runtime_env() bridge so protocols can run against either production streams/RocksDB or simulation channels/InMemoryDB.

Extracts HashComparison into a standalone HashComparisonProtocol implementing the trait, updates SyncManager to execute it via StreamTransport, and trims hash_comparison.rs to responder-only stream handling. Simulation is updated to run the production HashComparison protocol end-to-end via new SimStream, plus new/expanded multi-node and gossip buffering tests.

Makes a breaking wire change by switching StreamMessage::Message.sequence_id (and related tracking/state code) from usize to u64 for cross-platform portability.

Written by Cursor Bugbot for commit a7eacd0. This will update automatically on new commits. Configure here.

Add infrastructure to run sync protocols through in-memory channels,
enabling testing of actual message flow and state convergence.

Key changes:
- Add SyncTransport trait abstracting network operations (send/recv/close)
- Add StreamTransport for production Stream wrapper
- Add SimStream for in-memory channel-based transport
- Add protocol.rs with execute_hash_comparison_sync for simulation
- Add SimNode::new_in_context for shared context testing
- Fix entity_count to use storage leaf_count (source of truth)

The simulation now uses the exact same storage code path as production
(Index<MainStorage>, Interface<MainStorage>, RuntimeEnv callbacks),
with only the Database implementation differing (InMemoryDB vs RocksDB).

Phase 1 of sim-transport-abstraction plan.
Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 96% | Review time: 262.5s

🔴 2 critical, 🟡 2 warnings, 💡 2 suggestions, 📝 2 nitpicks. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-fe149d58

/// # Errors
///
/// Returns error if timeout expires or receive fails.
async fn recv_timeout(&mut self, timeout: Duration) -> Result<Option<StreamMessage<'static>>>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Nonce reuse vulnerability in SyncTransport design

The set_encryption method stores a single nonce that is reused for all subsequent encrypt/decrypt calls; reusing a nonce with the same key completely breaks AEAD encryption (e.g., AES-GCM) security.

Suggested fix:

Either increment/rotate the nonce after each encryption operation, or require callers to pass a fresh nonce per message (matching the existing `send`/`recv` function signatures).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged, out of scope: This is pre-existing infrastructure for future encrypted sync. The HashComparison protocol currently doesn't use encryption (nonce is just for message framing). The nonce reuse concern will be addressed when encryption is actually integrated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Deferred: Encryption is not currently used in the sync protocol (infrastructure for future encrypted sync). When encryption is enabled, nonce rotation will be implemented. The wire protocol already includes next_nonce fields to support proper rotation at the application layer.


#[async_trait]
impl SyncTransport for StreamTransport {
async fn send(&mut self, message: &StreamMessage<'_>) -> eyre::Result<()> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 StreamTransport reuses encryption nonce across messages

The send implementation encrypts every message using the same stored nonce from self.encryption, which leads to nonce reuse when multiple messages are sent—catastrophic for AEAD ciphers.

Suggested fix:

Implement nonce rotation: either increment the nonce after each encryption or derive it from a message counter/sequence ID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged, out of scope: Same as above - this is pre-existing encryption infrastructure. The HashComparison protocol currently doesn't use encryption. The nonce handling will be properly implemented when encrypted sync is integrated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Deferred: Same as above - encryption is not currently used. The EncryptionState is infrastructure for future encrypted sync. Nonce rotation will be addressed when encryption is enabled.

id,
data: data.to_vec(),
ancestors: vec![],
metadata: Metadata::default(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Silent error suppression in update_entity_data may violate I5

The let _ = Interface::<MainStorage>::apply_action(action); calls discard errors, which could lead to silent data loss violating invariant I5 (no silent data loss).

Suggested fix:

Propagate errors or at minimum log them with a warning; consider returning Result from update_entity_data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged, low priority: This is test helper code. The let _ = pattern is intentional to allow tests to pass even if entity doesn't exist (for testing sync to empty nodes). Production code uses proper error propagation via Interface::apply_action.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Test code: This is simulation test infrastructure, not production code. Using .expect() would be appropriate here, but silent failures in test setup are acceptable since test assertions will catch any issues. The production code path uses proper error handling via Interface::apply_action.


// Run both sides
let (init_result, resp_result) = tokio::join!(initiator_fut, responder_fut);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 All entities buffered in memory before applying

entities_to_merge collects all leaf entities before applying them, which could cause memory pressure on large syncs with many entities.

Suggested fix:

Apply entities incrementally as they're received rather than buffering all in memory, or use a bounded buffer with periodic flushing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded: This code was removed. The simulation now calls the production HashComparisonProtocol::run_initiator which applies entities incrementally via Interface::apply_action.

is_root: bool,
) -> Option<TreeNode> {
let id = if is_root {
storage.root_id()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Nit: Comment says BFS but implementation is DFS

Using queue.pop() gives LIFO (stack/DFS) behavior, not FIFO (queue/BFS) as the comment states; doesn't affect correctness but misleads readers.

Suggested fix:

Either change comment to 'DFS' or use `VecDeque` with `pop_front()` for actual BFS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded: The old simulation run_initiator function was removed. The simulation now calls the production HashComparisonProtocol::run_initiator directly.

// 1. All zeros (empty root marker)
// 2. The root ID (direct request)
// 3. The root HASH (most common case from handshake)
let root_hash = storage.root_hash();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 CRDT merge comment doesn't match implementation

Comment claims 'last-write-wins based on timestamp' but the code unconditionally writes without any timestamp comparison, making the match arm for Some(_) misleading.

Suggested fix:

Either implement actual timestamp comparison or update the comment to accurately describe the always-write behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded: The old apply_leaf_to_storage was removed. The simulation now uses the production apply_leaf_with_crdt_merge which correctly implements CRDT merge via Interface::apply_action.

use futures_util::SinkExt as _;
self.stream.close().await?;
Ok(())
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Nit: Redundant import of SinkExt

SinkExt is already imported at the top of the file (line 15); the inner use futures_util::SinkExt as _; is redundant.

Suggested fix:

Remove the redundant import inside the `close` method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: The redundant SinkExt import inside the close method was removed in commit 6a7d56a. The top-level import is sufficient.

None
};

Some(TreeNode {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Nit: Unnecessary read of existing data

storage.get_entity_data(id) is called but the result is always ignored since should_write is always true.

Suggested fix:

Remove the dead read or add a TODO comment if merge logic is planned for later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded: The old simulation protocol code was removed. The simulation now uses the production HashComparisonProtocol directly, which handles entity comparison correctly.

@cursor

This comment has been minimized.

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
@xilosada xilosada changed the title feat(sync-sim): add transport abstraction for protocol testing feat(sync): SyncProtocolExecutor trait for unified protocol testing Feb 12, 2026
Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 75% | Review time: 357.1s

🟡 3 warnings, 💡 4 suggestions, 📝 1 nitpicks. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-c0c3ad44


/// Set encryption parameters.
pub fn set(&mut self, encryption: Option<(SharedKey, Nonce)>) {
self.key_nonce = encryption;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nonce reuse vulnerability in encryption helper

EncryptionState uses the same nonce for all encrypt operations without incrementing; AES-GCM nonce reuse with the same key completely breaks confidentiality and authenticity guarantees.

Suggested fix:

Add nonce increment after each encrypt operation, or document that callers must update the nonce between messages via set_encryption().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged, out of scope: Same as above - the EncryptionState is pre-existing infrastructure for encrypted sync. Proper nonce incrementing will be implemented when encryption is actually used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Deferred: Same as comment 2799530476 - encryption infrastructure for future use. Nonce rotation will be implemented when encryption is enabled.

///
/// This updates an existing entity's data or creates it if it doesn't exist.
/// For new entities, creates them as direct children of root.
pub fn update_entity_data(&self, id: Id, data: &[u8]) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Silent error handling in update_entity_data may violate I5

Multiple let _ = Interface::<MainStorage>::apply_action(action) calls silently discard the Result, so storage write failures go undetected—this could violate invariant I5 (no silent data loss).

Suggested fix:

Propagate or log the Result from apply_action, e.g., `Interface::<MainStorage>::apply_action(action)?;` or at minimum log on error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged, low priority: Same as above - this is test helper code with intentional permissive error handling. Not a concern for production code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Test code: Same as comment 2799531036 - this is test infrastructure. Silent failures are acceptable here since test assertions will catch any issues.

/// Encryption state.
encryption: EncryptionState,
/// Default timeout for receive operations.
recv_timeout: Duration,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 YAGNI: buffer field declared but never populated

The buffer: VecDeque<Vec<u8>> is initialized and checked in recv_raw_timeout, but nothing ever pushes to it, making the peek/reorder testing capability mentioned in the comment unreachable.

Suggested fix:

Remove the buffer field and related checks until the reorder testing feature is actually needed, or implement the buffering functionality now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged: The buffer field is infrastructure for potential future use (e.g., message buffering during encryption key rotation). Removing it is low priority since it has no runtime cost when empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Test infrastructure: The buffer is scaffolding for future fault injection testing (message reordering, delay simulation). Leaving it in place as it's harmless and will be useful for testing I6 (delta buffering) edge cases.

@@ -302,14 +327,32 @@ impl SimNode {
/// - Intermediate nodes created by `insert_entity_hierarchical` don't have metadata
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Semantic change to entity_count may affect existing tests

Switching from entity_metadata.len() to storage.leaf_count() changes what entity_count means; ensure all 241 tests pass with this new semantics and document the change.

Suggested fix:

Add a doc comment explaining that entity_count now returns storage leaf count (source of truth) rather than metadata cache size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged: This is an intentional change. The new entity_count() uses the proper Merkle tree iterator, which correctly counts entities in the tree structure. The previous implementation counted raw storage entries which could be inconsistent with the tree. All tests have been updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: Doc comment updated in commit 78cd972 to accurately reflect that entity_count() uses storage.leaf_count() as the source of truth. All 247 tests pass.

identity,
config,
)
.await
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Hardcoded zero nonce in protocol messages

Using next_nonce: [0; NONCE_LEN] for all messages is acceptable for unencrypted simulation but could mask nonce-handling bugs if encryption is later enabled.

Suggested fix:

Consider using unique/incrementing nonces even in simulation to catch nonce-handling issues early, or add a comment clarifying this is intentionally test-only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded: The old simulation protocol code was removed. The simulation now uses the production HashComparisonProtocol directly, which uses proper nonce generation via generate_nonce().

initiator_context,
identity,
config,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Entities collected in memory before applying

All transferred entities accumulate in entities_to_merge (O(n) memory) before being applied after the loop; for very large syncs this could cause high memory usage.

Suggested fix:

Consider applying entities incrementally in batches rather than collecting all first, or document this as acceptable for simulation scale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded: The old simulation protocol code was removed. The simulation now uses the production HashComparisonProtocol directly, which applies entities incrementally via Interface::apply_action.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded: The old simulation run_initiator with entities_to_merge buffer was removed. The simulation now calls production HashComparisonProtocol::run_initiator which applies entities incrementally via Interface::apply_action.


/// Set encryption parameters.
pub fn set(&mut self, encryption: Option<(SharedKey, Nonce)>) {
self.key_nonce = encryption;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Nit: get() clones key and nonce on every call

Returning self.key_nonce.clone() allocates; consider returning a reference or Option<&(SharedKey, Nonce)> if callers don't need ownership.

Suggested fix:

Change to `pub fn get(&self) -> Option<&(SharedKey, Nonce)>` if API allows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged, acceptable: The clone is intentional for API safety - returning an owned value prevents callers from holding references into the internal state. For small fixed-size arrays (key+nonce), the clone cost is negligible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Minor optimization: Valid point, but the clone is cheap (32-byte key + 12-byte nonce) and only called during encryption setup, not per-message. Can optimize if profiling shows it matters.

Introduces a trait-based architecture for sync protocols that enables:
- Same protocol code to run in production and simulation
- Shared storage bridge (create_runtime_env) for both backends
- Standalone HashComparisonProtocol implementation

Changes:
- Add SyncProtocolExecutor trait in node-primitives
- Add create_runtime_env shared helper in storage_bridge.rs
- Extract HashComparisonProtocol to standalone module
- Clean up hash_comparison.rs (responder only, ~875 → ~285 lines)
- Fix wire protocol to use u64 for sequence_id (portability)
- Update simulation tests to use production protocol directly

This removes ~730 lines of duplicated code and ensures the simulation
tests exercise the exact same protocol logic as production.
@xilosada xilosada force-pushed the feat/sim-transport-abstraction branch from 0bebce1 to 6a7d56a Compare February 12, 2026 17:51
@xilosada
Copy link
Member Author

PR Comment Review

Several comments have been addressed or superseded by the latest changes:

Superseded (old simulation protocol code removed)

The following comments were on the old reimplemented protocol code in protocol.rs which has been completely replaced with calls to the production HashComparisonProtocol:

  • "All entities buffered in memory" - The production protocol applies entities incrementally via Interface::apply_action
  • "BFS/DFS comment mismatch" - Old run_initiator/run_responder removed
  • "CRDT merge comment doesn't match" - Old apply_leaf_to_storage removed, now uses production apply_leaf_with_crdt_merge
  • "Unnecessary read of existing data" - Old protocol code removed
  • "Tree traversal DFS/BFS" - Old build_tree_node_response removed

Fixed

  • "Redundant SinkExt import" - Fixed in 6a7d56a

Pre-existing / Out of Scope

  • Nonce reuse in encryption (comments 1-2) - The EncryptionState is infrastructure for future encrypted sync. The HashComparison protocol currently doesn't use encryption (nonce is just for message framing). This should be addressed when encryption is actually integrated.
  • Silent error in update_entity_data - This is test helper code in storage.rs. The production code uses proper error propagation.

Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 90% | Review time: 313.9s

🟡 3 warnings, 💡 4 suggestions. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-960edfa9

///
/// # Errors
///
/// Returns error if encryption fails.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 EncryptionState does not enforce nonce rotation between messages

The encrypt/decrypt methods reuse the same nonce stored in key_nonce until explicitly updated via set(). For stream ciphers like AES-GCM, nonce reuse with the same key completely breaks confidentiality and authenticity. While the wire protocol includes next_nonce fields suggesting proper rotation is expected at a higher level, this helper provides no safeguard against misuse.

Suggested fix:

Consider either: (1) incrementing/consuming the nonce after each encrypt call, (2) accepting nonce as a parameter to encrypt/decrypt rather than storing it, or (3) adding documentation clearly stating the caller MUST update the nonce between operations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Deferred: Same as other encryption comments - infrastructure for future encrypted sync. The wire protocol's next_nonce fields support application-level rotation when encryption is enabled.

Message {
/// Sequence number for ordering.
sequence_id: usize,
sequence_id: u64,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Breaking wire format change from usize to u64

Changing sequence_id from usize to u64 is a wire protocol breaking change; nodes with mismatched versions may fail to deserialize messages during rolling deployments.

Suggested fix:

Document this as a breaking change and consider protocol versioning or ensure all nodes upgrade atomically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0d73005: Added doc comment documenting this as a breaking wire format change that requires coordinated upgrades.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented: Added explicit doc comment to sequence_id field in commit 0d73005 explaining this is a breaking wire format change requiring coordinated node upgrades.

/// * `transport` - Transport for sending/receiving messages
/// * `nonce` - Initial nonce (unused; each response generates its own nonce)
pub async fn handle_tree_node_request<T: SyncTransport>(
&self,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Responder loop has no upper bound on total requests

The handle_tree_node_request loop processes requests indefinitely until the stream closes. While each response is bounded by MAX_NODES_PER_RESPONSE, a malicious peer could send unlimited requests to exhaust CPU/IO resources.

Suggested fix:

Consider adding a maximum request count or a deadline after which the responder terminates, similar to how the removed initiator code had `MAX_PENDING_NODES` for DoS protection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Deferred: Valid DoS concern. The responder is bounded per-response by MAX_NODES_PER_RESPONSE and the stream will timeout/close naturally. Adding request limits is future hardening work.

mod tests {
use super::*;

#[test]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Unit tests only cover passthrough mode, not actual encryption

The test test_encryption_state_passthrough only verifies behavior when no encryption is configured. There are no tests verifying that encryption/decryption actually work correctly or that nonce misuse is detectable.

Suggested fix:

Add tests that configure encryption with a real key/nonce and verify encrypt followed by decrypt returns the original data, and that different nonces produce different ciphertexts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - Future work: Valid point. Encryption unit tests will be added when encryption is actually integrated into the sync protocol. Currently the infrastructure is in place but unused.

Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 96% | Review time: 303.4s

🔴 1 critical, 🟡 3 warnings, 💡 4 suggestions. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-cf09d9ba

/// Encrypt data if encryption is configured.
///
/// # Errors
///
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Nonce reuse vulnerability in EncryptionState

The encrypt/decrypt methods reuse the same nonce for every call until set_encryption() is called again; reusing a nonce with the same key in AES-GCM completely breaks encryption security, allowing plaintext recovery.

Suggested fix:

Either increment/rotate the nonce after each encrypt/decrypt operation within EncryptionState, or clearly document that callers MUST call set_encryption with a fresh nonce before each operation and add debug assertions to detect reuse.

let write: Rc<dyn Fn(Key, &[u8]) -> bool> = {
let handle_cell: Rc<RefCell<_>> = Rc::new(RefCell::new(store.handle()));
let ctx_id = context_id;
Rc::new(move |key: Key, value: &[u8]| {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Silent write failures may cause data loss

The write callback returns false on error without logging, unlike the read callback which logs errors; this inconsistency makes debugging storage failures difficult and could mask data corruption.

Suggested fix:

Add warn! logging for write failures similar to the read callback's error handling.

.put(&state_key, &state_value)
.is_ok()
})
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Silent remove failures may cause data inconsistency

The remove callback returns false on error without logging, which could mask failed deletions and lead to stale data persisting unnoticed.

Suggested fix:

Add warn! logging for remove failures to match read callback's error handling pattern.

},
next_nonce: super::helpers::generate_nonce(),
};
transport.send(&msg).await?;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Unnecessary async function with no await points

The build_tree_node_response method is marked async but contains no .await calls, making the async unnecessary.

Suggested fix:

Remove the `async` keyword and return `Result<TreeNodeResponse>` directly, or wrap the result in `async {}` block at call sites if needed.

"Handling subsequent TreeNodeRequest"
);

let clamped_depth = max_depth.map(|d| d.min(MAX_REQUEST_DEPTH));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 N database lookups for N children in build_tree_node_response

Each child node requires a separate with_runtime_env + DB lookup call; for nodes with many children this creates N sequential database operations that could potentially be batched.

Suggested fix:

Consider adding a batch lookup API to retrieve multiple child nodes in a single database call.

/// This is the responder side of HashComparison sync.
/// Handles the first request (already parsed) and then loops to handle
/// subsequent requests until the stream closes.
///
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 YAGNI violation: unused parameter reserved for future use

The _nonce parameter is documented as 'Reserved for future encrypted sync' but is unused; this adds unnecessary API surface.

Suggested fix:

Remove the `_nonce` parameter and add it when actually needed for encryption.

use calimero_storage::index::Index;
use calimero_storage::store::MainStorage;
use calimero_store::db::InMemoryDB;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Test only verifies no panic, not actual functionality

The test test_create_runtime_env_with_inmemory only checks that calling Index::get_index doesn't panic; it would be more valuable to verify that write/read round-trips work correctly.

Suggested fix:

Add a test that writes via the RuntimeEnv callbacks and verifies the data can be read back.

Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 85% | Review time: 264.1s

💡 7 suggestions, 📝 2 nitpicks. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-4b2b56e3

match handle.get(&state_key) {
Ok(Some(state)) => Some(state.value.into_boxed().into_vec()),
Ok(None) => None,
Err(e) => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Storage read errors are silently masked

When storage read fails, the error is logged but None is returned, potentially masking storage corruption or integrity issues from higher-level code.

Suggested fix:

Consider propagating the error or providing a mechanism for callers to detect masked failures (e.g., a metric counter).

/// `calimero-store` ContextStateKey-based storage.
#[expect(
clippy::type_complexity,
reason = "Matches RuntimeEnv callback signatures"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Three separate store handles created for read/write/remove

Each callback creates its own store.handle(), meaning read, write, and remove use different handles; if handles maintain transaction-level caches, read-after-write within one with_runtime_env block may not see uncommitted writes.

Suggested fix:

Consider sharing a single `RefCell<Handle>` across all three callbacks to ensure consistent view within a session.

use super::*;
use calimero_storage::env::with_runtime_env;
use calimero_storage::index::Index;
use calimero_storage::store::MainStorage;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Test only verifies non-panic, not callback behavior

The test test_create_runtime_env_with_inmemory only checks that create_runtime_env doesn't panic; it doesn't verify that write/read callbacks actually persist and retrieve data correctly.

Suggested fix:

Add a test that writes data via the callbacks and reads it back to verify round-trip correctness.

///
/// Returns error if encryption fails.
pub fn encrypt(&self, data: Vec<u8>) -> Result<Vec<u8>> {
match &self.key_nonce {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Generic error messages lack context

The encrypt and decrypt methods return only 'encryption failed' / 'decryption failed' without context about what operation or data size was involved.

Suggested fix:

Consider including the data length or a hint in the error message, e.g., `eyre::eyre!("encryption failed for {} bytes", data.len())`.

/// This enables the same protocol code to run in both production and simulation.
///
/// Note: Uses `?Send` because `RuntimeEnv` (used for storage access) contains `Rc`
/// which is not `Send`. Callers must not spawn these futures across threads.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Nit: Associated type bounds may be overly restrictive

Both Config: Send and Stats: Send + Default require Send, but the trait itself is ?Send - this asymmetry may cause confusion or limit flexibility for non-Send configs.

Suggested fix:

Consider whether `Send` bounds are truly necessary given the `?Send` trait bound, or document why they're required despite the trait being `?Send`.

let response = self.recv(stream, None).await?;
// Expect Init messages with TreeNodeRequest
let StreamMessage::Init { payload, .. } = request else {
debug!(%context_id, "Received non-Init message, ending responder");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Nit: Async function without await

The build_tree_node_response method is marked async but contains no .await points; all operations (with_runtime_env, get_local_tree_node_from_index) are synchronous.

Suggested fix:

Remove `async` from the function signature and return `Result<TreeNodeResponse>` directly, unless async is required for trait consistency.

/// This is the responder side of HashComparison sync.
/// Handles the first request (already parsed) and then loops to handle
/// subsequent requests until the stream closes.
///
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 YAGNI: Unused nonce parameter reserved for future use

The _nonce parameter is explicitly unused and documented as 'Reserved for future encrypted sync', which violates YAGNI - add it when actually needed.

Suggested fix:

Remove the `_nonce` parameter and add it back when encryption support is implemented; update callers accordingly.

/// through the normal storage path which handles CRDT semantics.
/// This is the responder side of HashComparison sync.
/// Handles the first request (already parsed) and then loops to handle
/// subsequent requests until the stream closes.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Unused nonce parameter suggests incomplete encryption integration

The _nonce parameter is documented as 'reserved for future encrypted sync' but the transport abstraction already supports encryption; this could lead to confusion about whether messages are actually encrypted.

Suggested fix:

Either use the nonce to configure transport encryption at the start of the session, or remove the parameter and document that encryption is handled at a different layer.

/// Get current encryption parameters.
#[must_use]
pub fn get(&self) -> Option<(SharedKey, Nonce)> {
self.key_nonce.clone()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Key material cloning in get() method

The get() method clones cryptographic key material (SharedKey), which could leave copies in memory that aren't zeroed on drop.

Suggested fix:

Consider returning a reference instead of cloning, or ensure SharedKey implements zeroize-on-drop semantics.

Copy link

@meroreviewer meroreviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 100% | Review time: 366.0s

🟡 3 warnings, 💡 3 suggestions. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-5172edc8

/// # Errors
///
/// Returns error if encryption fails.
pub fn encrypt(&self, data: Vec<u8>) -> Result<Vec<u8>> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Encryption uses same nonce for all messages in session

The encrypt() method reuses the same nonce stored in EncryptionState for every call; reusing a nonce with the same key produces deterministic ciphertext and can leak plaintext relationships.

Suggested fix:

Increment or regenerate the nonce after each encryption operation, or document that callers must call `set_encryption()` with a fresh nonce before each message.

/// initialized nodes. This is enforced by storing the merged values
/// through the normal storage path which handles CRDT semantics.
/// This is the responder side of HashComparison sync.
/// Handles the first request (already parsed) and then loops to handle
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Arbitrary identity selection for RuntimeEnv

Using random selection (choose_stream) to pick an owned identity is fragile; if any owned identity works for storage access, deterministically pick the first one to avoid non-deterministic behavior.

Suggested fix:

Replace `choose_stream(identities, &mut rand::thread_rng())` with taking the first identity from the stream, or document why random selection is intentional.

.await
.transpose()?
{
Some((identity, _)) => identity,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Error condition conflates 'no identity' with 'node not found'

When no owned identity exists, the responder sends not_found: true, which the initiator may misinterpret as the requested tree node being absent rather than an authorization failure.

Suggested fix:

Consider returning a distinct error payload (e.g., `SnapshotError::Unauthorized`) so the initiator can distinguish between missing data and missing permissions.

}
})
};

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Silent failure on storage write operations

The write callback returns is_ok() which silently discards the specific error; while not a direct vulnerability, silent write failures during sync could lead to data inconsistency or incomplete state that violates invariant I5.

Suggested fix:

Consider logging write failures or propagating errors rather than returning a boolean.

/// Statistics about the sync session, or error if sync failed.
pub(crate) async fn hash_comparison_sync(
/// * `first_node_id` - Node ID from the first request (already parsed)
/// * `first_max_depth` - Max depth from the first request
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Unused nonce parameter bypasses expected encryption setup

The _nonce parameter is documented as 'reserved for future encrypted sync' but is never used, meaning the transport's encryption state is never configured from this value; if callers expect encryption to be applied based on this nonce, it won't be.

Suggested fix:

Either remove the parameter if encryption is handled elsewhere, or implement `transport.set_encryption()` using the provided nonce if encryption is expected.

//! └─────────────────────────────────────────────────────────────────┘
//! ```
//!
//! # Example
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Doc example references non-existent type

The example uses HashComparisonProtocol but no struct implementing SyncProtocolExecutor is introduced in this PR, making the example misleading.

Suggested fix:

Either add a placeholder implementation or update the example to show the trait definition pattern without referencing unimplemented types.

Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issues.

full_hash,
children_ids,
)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated get_local_tree_node function across two files

Medium Severity

The new standalone get_local_tree_node in hash_comparison_protocol.rs is a near-exact copy of SyncManager::get_local_tree_node_from_index in hash_comparison.rs. Both perform the same Index lookup, hash retrieval, children extraction, and leaf/internal node classification. The only difference is that the latter takes &self (which it doesn't use). A bug fix applied to one copy risks being missed in the other, leading to divergent behavior between production and simulation paths.

Additional Locations (1)

Fix in Cursor Fix in Web


info!(%context_id, requests_handled, "HashComparison responder complete");
Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Production and simulation responders use different code paths

Medium Severity

The PR's goal is "same code, different backends," but production uses SyncManager::handle_tree_node_request while simulation calls HashComparisonProtocol::run_responder — two independent implementations. They differ in how is_root_request is determined: production queries context.root_hash from context_client per request, while the standalone responder caches Index::get_hashes_for once at startup. Simulation tests therefore don't exercise the production responder code, undermining the unified-testing architecture.

Additional Locations (1)

Fix in Cursor Fix in Web

const MAX_PENDING_NODES: usize = 10_000;

/// Maximum depth allowed in TreeNodeRequest.
pub const MAX_REQUEST_DEPTH: u8 = 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated MAX_REQUEST_DEPTH constant in two files

Low Severity

MAX_REQUEST_DEPTH is defined as pub const in both hash_comparison.rs and hash_comparison_protocol.rs. If the DoS protection threshold is changed in one file but not the other, the production responder and simulation responder would enforce different depth limits, causing subtle divergence.

Additional Locations (1)

Fix in Cursor Fix in Web

@xilosada xilosada merged commit 8572e37 into master Feb 13, 2026
22 checks passed
@xilosada xilosada deleted the feat/sim-transport-abstraction branch February 13, 2026 09:52
@cursor
Copy link
Contributor

cursor bot commented Feb 13, 2026

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: Duplicated get_local_tree_node function across two files
    • Made get_local_tree_node pub(crate) in hash_comparison_protocol.rs and removed the duplicate get_local_tree_node_from_index method from hash_comparison.rs, having it import and use the shared function instead.
  • ✅ Fixed: Production and simulation responders use different code paths
    • Changed the production responder's is_root_request determination to use Index-based hash lookup (Index::get_hashes_for) instead of context.root_hash, aligning with the standalone protocol and ensuring consistent behavior.
  • ✅ Fixed: Duplicated MAX_REQUEST_DEPTH constant in two files
    • Removed the duplicate MAX_REQUEST_DEPTH constant from hash_comparison.rs and now import it from hash_comparison_protocol.rs to have a single source of truth.

Create PR

Or push these changes by commenting:

@cursor push 037d81233b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant