Subtree prefetch thresholds #1934

Draft
cursor[bot] wants to merge 3 commits into master from cursor/subtree-prefetch-thresholds-9ada


cursor bot commented Feb 11, 2026

Node Primitives Sync Protocol: Refactor select_protocol thresholds

Description

This PR refactors the `select_protocol` function in `protocol.rs` to use the canonical `DEEP_TREE_THRESHOLD` and `MAX_DIVERGENCE_RATIO` constants from `subtree.rs` instead of hardcoded magic numbers.

This fixes bug aae0e2bf-d69e-4a39-8987-278f96e4d8f3, where `select_protocol` duplicated the threshold values and so created two sources of truth. Using the shared constants keeps protocol selection consistent with the heuristic and prevents silent divergence if the thresholds are updated in the future.
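A minimal sketch of the idea, not the actual code in `protocol.rs`: the constant names and values come from the commit messages in this PR, while the `Protocol` enum, the `select_protocol` signature, the `f64` type, and the inline `subtree` module are hypothetical stand-ins. The real function presumably weighs more inputs and more protocols.

```rust
/// Stand-in for subtree.rs. The values are the ones quoted in the commit message.
pub mod subtree {
    pub const DEEP_TREE_THRESHOLD: usize = 3;
    pub const MAX_DIVERGENCE_RATIO: f64 = 0.20;
}

// Import the canonical constants instead of repeating `3` and `0.2` locally.
use subtree::{DEEP_TREE_THRESHOLD, MAX_DIVERGENCE_RATIO};

/// Hypothetical, reduced protocol set used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    HashComparison,
    SubtreePrefetch,
}

/// Hypothetical signature: picks SubtreePrefetch for deep trees with low divergence.
pub fn select_protocol(tree_depth: usize, divergence: f64) -> Protocol {
    if tree_depth > DEEP_TREE_THRESHOLD && divergence < MAX_DIVERGENCE_RATIO {
        Protocol::SubtreePrefetch
    } else {
        Protocol::HashComparison
    }
}
```

With a single definition, changing `DEEP_TREE_THRESHOLD` in one place changes both the heuristic and the selection logic, which is the property the refactor is after.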

Test plan

The existing unit tests for the primitives crate were run using `cargo test -p primitives` and all passed, verifying that the change did not alter the expected behavior of `select_protocol`. No new test cases were added as this is a refactor of existing logic.

Documentation update

No public or internal documentation updates are required: this is an internal refactor that does not change external behavior or APIs.


xilosada and others added 3 commits February 11, 2026 12:34
Add SubtreePrefetch sync protocol types for deep trees with clustered changes.

This protocol is optimized for scenarios where:
- Tree depth > 3 levels
- Divergence < 20%
- Changes are clustered in subtrees

Trade-off: O(1) round trips per subtree vs HashComparison's O(depth),
but may over-fetch data compared to HashComparison's minimal transfer.

Types added:
- SubtreePrefetchRequest: Request subtrees by root hash with depth limit
- SubtreePrefetchResponse: Contains fetched subtrees and not-found roots
- SubtreeData: Single subtree with entities for CRDT merge
- should_use_subtree_prefetch(): Heuristic for protocol selection

Security:
- MAX_SUBTREE_DEPTH (64): Prevents deep traversal attacks
- MAX_SUBTREES_PER_REQUEST (100): Limits request size
- MAX_ENTITIES_PER_SUBTREE (10,000): Bounds per-subtree data
- MAX_TOTAL_ENTITIES (100,000): Caps total response size
- is_valid() methods on all types for post-deserialization validation
- Saturating arithmetic in total_entity_count() to prevent overflow
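The limits above could be enforced roughly as follows. This is a sketch under assumptions: the constant names and values, the type names, `is_valid()`, and `total_entity_count()` are from the commit message, but the field names, the entity representation (`Vec<u8>`), and which limit each `is_valid()` checks are guesses made for illustration.

```rust
pub const MAX_SUBTREES_PER_REQUEST: usize = 100;
pub const MAX_ENTITIES_PER_SUBTREE: usize = 10_000;
pub const MAX_TOTAL_ENTITIES: usize = 100_000;

/// A single fetched subtree. Field names are hypothetical.
pub struct SubtreeData {
    pub root_hash: [u8; 32],
    pub entities: Vec<Vec<u8>>,
}

impl SubtreeData {
    /// Post-deserialization check: bound the per-subtree payload.
    pub fn is_valid(&self) -> bool {
        self.entities.len() <= MAX_ENTITIES_PER_SUBTREE
    }
}

/// Response carrying fetched subtrees and the roots that were not found.
pub struct SubtreePrefetchResponse {
    pub subtrees: Vec<SubtreeData>,
    pub not_found: Vec<[u8; 32]>,
}

impl SubtreePrefetchResponse {
    /// Sums entity counts with saturating addition so the total cannot wrap around.
    pub fn total_entity_count(&self) -> usize {
        self.subtrees
            .iter()
            .fold(0usize, |acc, s| acc.saturating_add(s.entities.len()))
    }

    /// Assumed combination of checks: subtree count, each subtree, then the total.
    pub fn is_valid(&self) -> bool {
        self.subtrees.len() <= MAX_SUBTREES_PER_REQUEST
            && self.subtrees.iter().all(SubtreeData::is_valid)
            && self.total_entity_count() <= MAX_TOTAL_ENTITIES
    }
}
```

Note that a response can fail the total cap even when every subtree passes its own check, for example 11 subtrees of 10,000 entities each.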

Includes 39 unit tests covering edge cases and exploit prevention.

Fixes issues raised in PR review:

1. **depth() now always returns bounded value (High Severity)**
   - Changed return type from `Option<usize>` to `usize`
   - Returns `MAX_SUBTREE_DEPTH` when `max_depth` is `None`
   - Consumers always get a safe, bounded depth value

2. **is_valid() now validates max_depth (Medium Severity)**
   - Added check: `max_depth <= MAX_SUBTREE_DEPTH` when `Some`
   - Catches invalid values from untrusted deserialization

3. **max_depth field is now private (Medium Severity)**
   - Matches encapsulation pattern from hash_comparison module
   - Added `is_unlimited()` accessor to check if unlimited was requested
   - Prevents bypassing `depth()` accessor

4. **Extracted heuristic magic numbers to constants (Nitpick)**
   - `DEEP_TREE_THRESHOLD = 3`
   - `MAX_DIVERGENCE_RATIO = 0.20`
   - `MAX_CLUSTERED_SUBTREES = 5`
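The first three review fixes can be pictured with a reduced request type. This is only a sketch: `depth()`, `is_valid()`, `is_unlimited()`, `max_depth`, and `MAX_SUBTREE_DEPTH` are named in the commit message, but the constructor is hypothetical, the root hashes are omitted, and clamping an oversized `Some` value inside `depth()` is an assumption about how "always bounded" is achieved.

```rust
pub const MAX_SUBTREE_DEPTH: usize = 64;

/// Reduced request type; the real one also carries the subtree root hashes.
pub struct SubtreePrefetchRequest {
    // Private, so callers must go through `depth()` rather than read the raw value.
    max_depth: Option<usize>,
}

impl SubtreePrefetchRequest {
    /// Hypothetical constructor that stores the requested depth unchanged.
    pub fn new(max_depth: Option<usize>) -> Self {
        Self { max_depth }
    }

    /// Always returns a bounded depth: the cap when unlimited, and
    /// (by assumption) the cap when the stored value exceeds it.
    pub fn depth(&self) -> usize {
        self.max_depth
            .map_or(MAX_SUBTREE_DEPTH, |d| d.min(MAX_SUBTREE_DEPTH))
    }

    /// True when the sender asked for unlimited depth.
    pub fn is_unlimited(&self) -> bool {
        self.max_depth.is_none()
    }

    /// Rejects a depth above the cap, as could arrive from untrusted deserialization.
    pub fn is_valid(&self) -> bool {
        self.max_depth.map_or(true, |d| d <= MAX_SUBTREE_DEPTH)
    }
}
```

Keeping both the bounded accessor and the validation check means a consumer that forgets to call `is_valid()` still never traverses deeper than `MAX_SUBTREE_DEPTH`.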

Tests updated:
- Added test_subtree_request_max_depth_validation
- Added test_heuristic_constants_are_sensible
- Updated existing tests to use new API (41 tests total)

Replace hardcoded magic numbers (3 and 0.2) in select_protocol() with
DEEP_TREE_THRESHOLD and MAX_DIVERGENCE_RATIO constants from subtree.rs.

This ensures protocol selection logic stays in sync with the canonical
threshold definitions and avoids having two sources of truth for the
same values.

cursor bot commented Feb 11, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

Base automatically changed from feat/sync-009-subtree-prefetch to master February 11, 2026 12:46
@github-actions

Your PR title does not adhere to the Conventional Commits convention:

<type>(<scope>): <subject>

Common errors to avoid:

  1. The title must be in lower case.
  2. Allowed type values are: build, ci, docs, feat, fix, perf, refactor, test.

@github-actions

This pull request has been automatically marked as stale. If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated.

github-actions bot added the Stale label Feb 18, 2026