Skip to content

fix: ensure GET with subscribe=true creates network subscriptions for cached contracts #1739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 56 additions & 0 deletions CLAUDE.local.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Claude Code Development Guidelines

This document contains important guidelines for Claude Code when working on the Freenet project.

## TODO-MUST-FIX Pattern

To prevent committing broken or incomplete code, we use a special comment pattern that blocks commits:

### Usage

When you need to temporarily disable a test or leave code in an incomplete state, use the `TODO-MUST-FIX` comment:

```rust
// TODO-MUST-FIX: This test hangs during node startup and needs investigation
// The test attempts to start a gateway and peer but appears to deadlock
// during the node initialization phase. This needs to be fixed before release.
#[tokio::test]
#[ignore = "Test hangs - see TODO-MUST-FIX above"]
async fn test_gateway_reconnection() -> TestResult {
// ...
}
```

### How it Works

The git pre-commit hook (`.git/hooks/pre-commit`) will:
1. Scan all staged files for `TODO-MUST-FIX` comments
2. If found, block the commit and display the files and line numbers
3. Force you to resolve the issue before committing

### When to Use

Use `TODO-MUST-FIX` when:
- Disabling a failing test that needs investigation
- Leaving a critical bug unfixed temporarily
- Implementing a temporary workaround that must be properly fixed
- Any code that MUST be addressed before the next release

### Benefits

- Prevents forgetting about disabled tests
- Ensures critical issues aren't accidentally merged
- Creates a clear signal for what needs immediate attention
- Maintains code quality by preventing the accumulation of technical debt

## Other Guidelines

### Dead Code
- Don't just ignore unused variables by prepending `_`, understand why they are unused
- Either use or remove dead code so it doesn't cause confusion
- Don't ignore dead code warnings

### Test Management
- NEVER fix a unit or integration test by just disabling it
- If a test must be temporarily disabled, use `TODO-MUST-FIX` to ensure it's not forgotten
- Always investigate why a test is failing before disabling it
9 changes: 9 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@ members = [
"apps/freenet-ping/app",
"apps/freenet-ping/types",
"apps/freenet-ping/contracts/ping",
"tests/test-contract-integration"
"tests/test-contract-integration",
"tests/test-contract-update-nochange"
]

[workspace.dependencies]
Expand Down
47 changes: 47 additions & 0 deletions GATEWAY_FIX_PROPOSAL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Gateway Connection Fix Proposal

## Problem Statement

Peers connecting through gateways fail because:
1. Initial transport connection is registered in ConnectionManager
2. When peer sends StartJoinReq, gateway rejects it as "already connected"
3. This prevents proper ring membership

## Current (Flawed) Approach

The ConnectionState enum spreads throughout the codebase, creating a leaky abstraction that violates separation of concerns.

## Proposed Solution

### Core Principle
**Transport connections ≠ Ring membership**

### Implementation

1. **HandshakeHandler Changes**
- Maintain internal set of pending connections
- Don't register peers in ConnectionManager until accepted
- Clean separation between transport and logical layers

2. **ConnectionManager Changes**
- Remove all ConnectionState logic
- Only track actual ring members
- Simplify should_accept() logic

3. **Flow**
```
Peer → Gateway (transport connection)
Peer → StartJoinReq → Gateway
Gateway checks should_accept() [peer not in ConnectionManager yet]
If accepted → Add to ConnectionManager as ring member
If rejected → Close transport connection
```

### Benefits
- No leaky abstractions
- Clear separation of concerns
- Simpler, more maintainable code
- Addresses Nacho's architectural concerns

### Key Insight
The bug is that we're conflating transport connectivity with ring membership. The fix is to keep them separate until the peer is actually accepted.
68 changes: 68 additions & 0 deletions PRE_COMMIT_HOOK_GUIDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# Pre-Commit Hook with Claude Test Detection

## Overview

The pre-commit hook now includes Claude-powered detection of disabled tests to prevent accidentally committing code with disabled tests.

## How It Works

The hook performs the following checks:
1. **Rust formatting** - Ensures code follows standard formatting
2. **Clippy linting** - Catches common Rust mistakes and anti-patterns
3. **TODO-MUST-FIX detection** - Blocks commits with TODO-MUST-FIX comments
4. **Disabled test detection** - Uses Claude to analyze the diff and detect disabled tests

## Disabled Test Detection

Claude will detect various patterns indicating disabled tests:
- Rust: `#[ignore]`, `#[ignore = "reason"]`
- JavaScript/TypeScript: `it.skip()`, `describe.skip()`, `test.skip()`
- Python: `@pytest.mark.skip`, `@unittest.skip`
- Commented out test functions
- Any other language-specific test disabling patterns

When disabled tests are detected, the hook will:
1. Block the commit
2. Show exactly where the disabled tests were found
3. Provide guidance on how to proceed

## Example Output

```bash
Checking for disabled tests with Claude...
✗ Claude detected disabled tests in the commit
Disabled tests found at:
test_example.rs:6-9 - test marked with #[ignore] attribute
Tests should not be disabled in commits. If a test must be temporarily disabled:
1. Add a TODO-MUST-FIX comment explaining why
2. Fix the underlying issue before committing
3. Or exclude the test changes from this commit
```

## Handling Disabled Tests

If you need to temporarily disable a test:

1. **Add TODO-MUST-FIX comment**: This will also block the commit, forcing you to address it
```rust
// TODO-MUST-FIX: This test hangs during startup
#[ignore = "See TODO-MUST-FIX above"]
fn test_broken() { }
```

2. **Fix the test**: The preferred solution is to fix the underlying issue

3. **Exclude from commit**: Use `git add -p` to selectively stage changes, excluding the disabled test

## Requirements

- Claude CLI must be installed at `~/.claude/local/claude`
- The hook will gracefully skip Claude checks if the CLI is not available

## Troubleshooting

- If Claude checks are failing unexpectedly, check that the Claude CLI is working:
```bash
~/.claude/local/claude --help
```
- The hook won't fail the commit if Claude itself has an error (only if disabled tests are found)
98 changes: 98 additions & 0 deletions TESTING_RECOMMENDATIONS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
# Testing Recommendations for UPDATE Race Condition Fix (Issue #1733)

## Current Testing Status
- ✅ Root cause identified and fixed
- ✅ Logging added for debugging
- ⚠️ Manual verification only
- ❌ No automated regression test
- ❌ No performance benchmarks

## Required Tests for Production

### 1. Integration Test
Create a full integration test that:
- Sets up a test network with gateway and peer nodes
- Sends UPDATE requests with < 24ms delays
- Verifies 100% of responses are received
- Fails without the fix, passes with it

### 2. Unit Tests
Add unit tests for:
- `EventListenerState::tx_to_client` management
- Transaction cleanup timing
- Concurrent client handling

### 3. Performance Tests
- Benchmark UPDATE throughput before/after fix
- Ensure no regression in response times
- Test memory usage (no leaks from state tracking)

### 4. Stress Tests
- 100+ concurrent clients
- 1000+ rapid UPDATE requests
- Network latency simulation
- Resource exhaustion scenarios

### 5. Edge Cases
- Client disconnection during UPDATE
- Network partition during processing
- Transaction timeout scenarios
- Duplicate transaction IDs (if possible)

## Test Implementation Plan

1. **Phase 1**: Create reproducer test
```rust
#[test]
fn reproduce_race_condition() {
// This should FAIL without the fix
// Send 2 UPDATEs within 20ms
// Assert both get responses
}
```

2. **Phase 2**: Add to CI
- Run on every PR
- Include timing-sensitive tests
- Monitor for flakiness

3. **Phase 3**: Production monitoring
- Add metrics for UPDATE response rates
- Alert on anomalies
- Track p95/p99 response times

## Manual Testing Protocol

Until automated tests are ready:

1. Start test network
2. Run this script:
```bash
# Send rapid updates
for i in {1..10}; do
curl -X POST $FREENET_API/update &
sleep 0.02 # 20ms delay
done
wait
# Check all 10 responses received
```

3. Check logs for:
- `[UPDATE_RACE_FIX]` entries
- No timeout errors
- All clients received responses

## Risk Assessment

Without proper testing:
- 🔴 **High Risk**: Race condition could still occur under different timing
- 🟡 **Medium Risk**: Performance regression in high-load scenarios
- 🟡 **Medium Risk**: Memory leaks from improper state cleanup

## Recommendation

Before merging to production:
1. Implement at least the basic integration test
2. Run manual testing protocol
3. Monitor in staging environment for 24 hours
4. Add production metrics for UPDATE success rate
49 changes: 37 additions & 12 deletions crates/core/src/client_events/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -434,26 +434,28 @@ async fn process_open_request(
?data,
"Starting update op",
);
let new_state = match op_manager
let update_response = op_manager
.notify_contract_handler(ContractHandlerEvent::UpdateQuery {
key,
data,
related_contracts: related_contracts.clone(),
})
.await
{
Ok(ContractHandlerEvent::UpdateResponse {
.await?;

let new_state = match update_response {
ContractHandlerEvent::UpdateResponse {
new_value: Ok(new_val),
}) => Ok(new_val),
Ok(ContractHandlerEvent::UpdateResponse {
} => Ok(new_val),
ContractHandlerEvent::UpdateResponse {
new_value: Err(err),
}) => Err(OpError::from(err)),
Ok(ContractHandlerEvent::UpdateNoChange { key }) => {
tracing::debug!(%key, "update with no change, do not start op");
return Ok(None);
} => Err(OpError::from(err)),
ContractHandlerEvent::UpdateNoChange { key } => {
// This should not happen anymore since we now return UpdateResponse
// from the contract handler even for NoChange cases
tracing::warn!(%key, "Unexpected UpdateNoChange event - this should have been converted to UpdateResponse");
return Err(OpError::UnexpectedOpState.into());
}
Err(err) => Err(err.into()),
Ok(_) => Err(OpError::UnexpectedOpState),
_ => return Err(OpError::UnexpectedOpState.into()),
}
.inspect_err(|err| tracing::error!(%key, "update query failed: {}", err))?;

Expand Down Expand Up @@ -527,6 +529,8 @@ async fn process_open_request(
if subscribe {
if let Some(subscription_listener) = subscription_listener {
tracing::debug!(%client_id, %key, "Subscribing to locally found contract");

// First, register the local listener
let register_listener = op_manager
.notify_contract_handler(
ContractHandlerEvent::RegisterSubscriberListener {
Expand Down Expand Up @@ -557,6 +561,27 @@ async fn process_open_request(
);
}
}

// Also create a network subscription to ensure we receive updates
// This matches the behavior of PUT with subscribe=true
tracing::debug!(%client_id, %key, "Creating network subscription for locally cached contract");
if let Err(err) = crate::node::subscribe(
op_manager.clone(),
key,
Some(client_id),
)
.await
{
tracing::error!(
%client_id, %key,
"Failed to create network subscription for local GET: {}", err
);
} else {
tracing::debug!(
%client_id, %key,
"Network subscription created successfully for local GET"
);
}
} else {
tracing::warn!(%client_id, %key, "GET with subscribe=true but no subscription_listener");
}
Expand Down
Loading
Loading