freenet · sanity · Aug 2, 2025 · Aug 2, 2025 · Aug 2, 2025 · Aug 2, 2025
diff --git a/CLAUDE.local.md b/CLAUDE.local.md
@@ -0,0 +1,56 @@
+# Claude Code Development Guidelines
+
+This document contains important guidelines for Claude Code when working on the Freenet project.
+
+## TODO-MUST-FIX Pattern
+
+To prevent committing broken or incomplete code, we use a special comment pattern that blocks commits:
+
+### Usage
+
+When you need to temporarily disable a test or leave code in an incomplete state, use the `TODO-MUST-FIX` comment:
+
+```rust
+// TODO-MUST-FIX: This test hangs during node startup and needs investigation
+// The test attempts to start a gateway and peer but appears to deadlock
+// during the node initialization phase. This needs to be fixed before release.
+#[tokio::test]
+#[ignore = "Test hangs - see TODO-MUST-FIX above"]
+async fn test_gateway_reconnection() -> TestResult {
+    // ...
+}
+```
+
+### How it Works
+
+The git pre-commit hook (`.git/hooks/pre-commit`) will:
+1. Scan all staged files for `TODO-MUST-FIX` comments
+2. If found, block the commit and display the files and line numbers
+3. Force you to resolve the issue before committing
+
+### When to Use
+
+Use `TODO-MUST-FIX` when:
+- Disabling a failing test that needs investigation
+- Leaving a critical bug unfixed temporarily
+- Implementing a temporary workaround that must be properly fixed
+- Any code that MUST be addressed before the next release
+
+### Benefits
+
+- Prevents forgetting about disabled tests
+- Ensures critical issues aren't accidentally merged
+- Creates a clear signal for what needs immediate attention
+- Maintains code quality by preventing the accumulation of technical debt
+
+## Other Guidelines
+
+### Dead Code
+- Don't just ignore unused variables by prepending `_`, understand why they are unused
+- Either use or remove dead code so it doesn't cause confusion
+- Don't ignore dead code warnings
+
+### Test Management
+- NEVER fix a unit or integration test by just disabling it
+- If a test must be temporarily disabled, use `TODO-MUST-FIX` to ensure it's not forgotten
+- Always investigate why a test is failing before disabling it
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -5,7 +5,8 @@ members = [
 	"apps/freenet-ping/app",
 	"apps/freenet-ping/types",
 	"apps/freenet-ping/contracts/ping",
-	"tests/test-contract-integration"
+	"tests/test-contract-integration",
+	"tests/test-contract-update-nochange"
 ]
 
 [workspace.dependencies]

diff --git a/GATEWAY_FIX_PROPOSAL.md b/GATEWAY_FIX_PROPOSAL.md
@@ -0,0 +1,47 @@
+# Gateway Connection Fix Proposal
+
+## Problem Statement
+
+Peers connecting through gateways fail because:
+1. Initial transport connection is registered in ConnectionManager
+2. When peer sends StartJoinReq, gateway rejects it as "already connected"
+3. This prevents proper ring membership
+
+## Current (Flawed) Approach
+
+The ConnectionState enum spreads throughout the codebase, creating a leaky abstraction that violates separation of concerns.
+
+## Proposed Solution
+
+### Core Principle
+**Transport connections ≠ Ring membership**
+
+### Implementation
+
+1. **HandshakeHandler Changes**
+   - Maintain internal set of pending connections
+   - Don't register peers in ConnectionManager until accepted
+   - Clean separation between transport and logical layers
+
+2. **ConnectionManager Changes**
+   - Remove all ConnectionState logic
+   - Only track actual ring members
+   - Simplify should_accept() logic
+
+3. **Flow**
+   ```
+   Peer → Gateway (transport connection)
+   Peer → StartJoinReq → Gateway
+   Gateway checks should_accept() [peer not in ConnectionManager yet]
+   If accepted → Add to ConnectionManager as ring member
+   If rejected → Close transport connection
+   ```
+
+### Benefits
+- No leaky abstractions
+- Clear separation of concerns
+- Simpler, more maintainable code
+- Addresses Nacho's architectural concerns
+
+### Key Insight
+The bug is that we're conflating transport connectivity with ring membership. The fix is to keep them separate until the peer is actually accepted.
diff --git a/PRE_COMMIT_HOOK_GUIDE.md b/PRE_COMMIT_HOOK_GUIDE.md
@@ -0,0 +1,68 @@
+# Pre-Commit Hook with Claude Test Detection
+
+## Overview
+
+The pre-commit hook now includes Claude-powered detection of disabled tests to prevent accidentally committing code with disabled tests.
+
+## How It Works
+
+The hook performs the following checks:
+1. **Rust formatting** - Ensures code follows standard formatting
+2. **Clippy linting** - Catches common Rust mistakes and anti-patterns
+3. **TODO-MUST-FIX detection** - Blocks commits with TODO-MUST-FIX comments
+4. **Disabled test detection** - Uses Claude to analyze the diff and detect disabled tests
+
+## Disabled Test Detection
+
+Claude will detect various patterns indicating disabled tests:
+- Rust: `#[ignore]`, `#[ignore = "reason"]`
+- JavaScript/TypeScript: `it.skip()`, `describe.skip()`, `test.skip()`
+- Python: `@pytest.mark.skip`, `@unittest.skip`
+- Commented out test functions
+- Any other language-specific test disabling patterns
+
+When disabled tests are detected, the hook will:
+1. Block the commit
+2. Show exactly where the disabled tests were found
+3. Provide guidance on how to proceed
+
+## Example Output
+
+```bash
+Checking for disabled tests with Claude...
+✗ Claude detected disabled tests in the commit
+Disabled tests found at:
+test_example.rs:6-9 - test marked with #[ignore] attribute
+Tests should not be disabled in commits. If a test must be temporarily disabled:
+1. Add a TODO-MUST-FIX comment explaining why
+2. Fix the underlying issue before committing
+3. Or exclude the test changes from this commit
+```
+
+## Handling Disabled Tests
+
+If you need to temporarily disable a test:
+
+1. **Add TODO-MUST-FIX comment**: This will also block the commit, forcing you to address it
+   ```rust
+   // TODO-MUST-FIX: This test hangs during startup
+   #[ignore = "See TODO-MUST-FIX above"]
+   fn test_broken() { }
+   ```
+
+2. **Fix the test**: The preferred solution is to fix the underlying issue
+
+3. **Exclude from commit**: Use `git add -p` to selectively stage changes, excluding the disabled test
+
+## Requirements
+
+- Claude CLI must be installed at `~/.claude/local/claude`
+- The hook will gracefully skip Claude checks if the CLI is not available
+
+## Troubleshooting
+
+- If Claude checks are failing unexpectedly, check that the Claude CLI is working:
+  ```bash
+  ~/.claude/local/claude --help
+  ```
+- The hook won't fail the commit if Claude itself has an error (only if disabled tests are found)
diff --git a/TESTING_RECOMMENDATIONS.md b/TESTING_RECOMMENDATIONS.md
@@ -0,0 +1,98 @@
+# Testing Recommendations for UPDATE Race Condition Fix (Issue #1733)
+
+## Current Testing Status
+- ✅ Root cause identified and fixed
+- ✅ Logging added for debugging
+- ⚠️  Manual verification only
+- ❌ No automated regression test
+- ❌ No performance benchmarks
+
+## Required Tests for Production
+
+### 1. Integration Test
+Create a full integration test that:
+- Sets up a test network with gateway and peer nodes
+- Sends UPDATE requests with < 24ms delays
+- Verifies 100% of responses are received
+- Fails without the fix, passes with it
+
+### 2. Unit Tests
+Add unit tests for:
+- `EventListenerState::tx_to_client` management
+- Transaction cleanup timing
+- Concurrent client handling
+
+### 3. Performance Tests
+- Benchmark UPDATE throughput before/after fix
+- Ensure no regression in response times
+- Test memory usage (no leaks from state tracking)
+
+### 4. Stress Tests
+- 100+ concurrent clients
+- 1000+ rapid UPDATE requests
+- Network latency simulation
+- Resource exhaustion scenarios
+
+### 5. Edge Cases
+- Client disconnection during UPDATE
+- Network partition during processing
+- Transaction timeout scenarios
+- Duplicate transaction IDs (if possible)
+
+## Test Implementation Plan
+
+1. **Phase 1**: Create reproducer test
+   ```rust
+   #[test]
+   fn reproduce_race_condition() {
+       // This should FAIL without the fix
+       // Send 2 UPDATEs within 20ms
+       // Assert both get responses
+   }
+   ```
+
+2. **Phase 2**: Add to CI
+   - Run on every PR
+   - Include timing-sensitive tests
+   - Monitor for flakiness
+
+3. **Phase 3**: Production monitoring
+   - Add metrics for UPDATE response rates
+   - Alert on anomalies
+   - Track p95/p99 response times
+
+## Manual Testing Protocol
+
+Until automated tests are ready:
+
+1. Start test network
+2. Run this script:
+   ```bash
+   # Send rapid updates
+   for i in {1..10}; do
+     curl -X POST $FREENET_API/update &
+     sleep 0.02  # 20ms delay
+   done
+   wait
+   # Check all 10 responses received
+   ```
+
+3. Check logs for:
+   - `[UPDATE_RACE_FIX]` entries
+   - No timeout errors
+   - All clients received responses
+
+## Risk Assessment
+
+Without proper testing:
+- 🔴 **High Risk**: Race condition could still occur under different timing
+- 🟡 **Medium Risk**: Performance regression in high-load scenarios
+- 🟡 **Medium Risk**: Memory leaks from improper state cleanup
+
+## Recommendation
+
+Before merging to production:
+1. Implement at least the basic integration test
+2. Run manual testing protocol
+3. Monitor in staging environment for 24 hours
+4. Add production metrics for UPDATE success rate
diff --git a/crates/core/src/client_events/mod.rs b/crates/core/src/client_events/mod.rs
@@ -434,26 +434,28 @@ async fn process_open_request(
                             ?data,
                             "Starting update op",
                         );
-                        let new_state = match op_manager
+                        let update_response = op_manager
                             .notify_contract_handler(ContractHandlerEvent::UpdateQuery {
                                 key,
                                 data,
                                 related_contracts: related_contracts.clone(),
                             })
-                            .await
-                        {
-                            Ok(ContractHandlerEvent::UpdateResponse {
+                            .await?;
+
+                        let new_state = match update_response {
+                            ContractHandlerEvent::UpdateResponse {
                                 new_value: Ok(new_val),
-                            }) => Ok(new_val),
-                            Ok(ContractHandlerEvent::UpdateResponse {
+                            } => Ok(new_val),
+                            ContractHandlerEvent::UpdateResponse {
                                 new_value: Err(err),
-                            }) => Err(OpError::from(err)),
-                            Ok(ContractHandlerEvent::UpdateNoChange { key }) => {
-                                tracing::debug!(%key, "update with no change, do not start op");
-                                return Ok(None);
+                            } => Err(OpError::from(err)),
+                            ContractHandlerEvent::UpdateNoChange { key } => {
+                                // This should not happen anymore since we now return UpdateResponse
+                                // from the contract handler even for NoChange cases
+                                tracing::warn!(%key, "Unexpected UpdateNoChange event - this should have been converted to UpdateResponse");
+                                return Err(OpError::UnexpectedOpState.into());
                             }
-                            Err(err) => Err(err.into()),
-                            Ok(_) => Err(OpError::UnexpectedOpState),
+                            _ => return Err(OpError::UnexpectedOpState.into()),
                         }
                         .inspect_err(|err| tracing::error!(%key, "update query failed: {}", err))?;
 
@@ -527,6 +529,8 @@ async fn process_open_request(
                                 if subscribe {
                                     if let Some(subscription_listener) = subscription_listener {
                                         tracing::debug!(%client_id, %key, "Subscribing to locally found contract");
+
+                                        // First, register the local listener
                                         let register_listener = op_manager
                                             .notify_contract_handler(
                                                 ContractHandlerEvent::RegisterSubscriberListener {
@@ -557,6 +561,27 @@ async fn process_open_request(
                                                 );
                                             }
                                         }
+
+                                        // Also create a network subscription to ensure we receive updates
+                                        // This matches the behavior of PUT with subscribe=true
+                                        tracing::debug!(%client_id, %key, "Creating network subscription for locally cached contract");
+                                        if let Err(err) = crate::node::subscribe(
+                                            op_manager.clone(),
+                                            key,
+                                            Some(client_id),
+                                        )
+                                        .await
+                                        {
+                                            tracing::error!(
+                                                %client_id, %key,
+                                                "Failed to create network subscription for local GET: {}", err
+                                            );
+                                        } else {
+                                            tracing::debug!(
+                                                %client_id, %key,
+                                                "Network subscription created successfully for local GET"
+                                            );
+                                        }
                                     } else {
                                         tracing::warn!(%client_id, %key, "GET with subscribe=true but no subscription_listener");
                                     }