-
Notifications
You must be signed in to change notification settings - Fork 135
chore(pegboard-gateway): add new message id format & add deprecated tunnel ack #3492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11-19-chore_pegboard-gateway_remove_tunnelack_
Are you sure you want to change the base?
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
c020fe1 to
e47241f
Compare
Code ReviewThis PR introduces a significant architectural improvement to the tunnel messaging system by implementing a more efficient message ID format and removing the need for explicit tunnel acknowledgments. Overall, the changes are well-structured and maintain backward compatibility. Here are my findings: ✅ Strengths
|
Code ReviewI've reviewed this PR which adds a new message ID format and deprecates the old tunnel ack mechanism. Here's my feedback: Positive Aspects✅ Well-structured message ID system: The new ✅ Backwards compatibility: The PR thoughtfully handles protocol versioning with ✅ Clean architecture: The new ✅ Improved state management: Removing the ✅ Request ID collision handling: Good defensive programming in Issues & Concerns🔴 Critical: Request ID Not Released After CompletionLocation: engine/packages/guard-core/src/proxy_service.rs:1889 The // At line 1889, request_id is generated
let request_id = state.generate_unique_request_id().await?;
// ... but I don't see state.release_request_id(request_id) anywhere in the taskRecommendation: Add a 🟡 Medium: Missing Request ID Release in HTTP PathLocation: engine/packages/guard-core/src/proxy_service.rs:1086 Similar to above, the HTTP request path generates a unique request ID but doesn't appear to release it when the request completes. The request ID should be released after the response is sent. Recommendation: Wrap the request handling in a scope guard or defer pattern to ensure 🟡 Medium: Serialization Size Not Verified at Compile TimeLocation: engine/packages/pegboard/src/tunnel/id.rs:34-39 The code uses a runtime assertion to verify the serialized size is exactly 12 bytes: ensure!(
written == 12,
"message id serialization produced wrong size: expected 12 bytes, got {}",
written
);While this is good defensive programming, if the BARE schema changes, this will only be caught at runtime. Recommendation: Consider adding a compile-time test or const assertion to verify the size. At minimum, add a unit test that exercises this code path. 🟡 Medium: Wrapping Arithmetic Without Overflow DocumentationLocation: engine/packages/pegboard-gateway/src/shared_state.rs:136 The message index uses wrapping arithmetic: req.message_index = req.message_index.wrapping_add(1);This is correct for a u16 counter, but there's no documentation about what happens when the index wraps from 65535 to 0, especially for WebSocket message acknowledgments. Recommendation: Add a comment explaining the wrapping behavior and confirming that the 🟢 Minor: Inconsistent Error Message FormattingLocation: engine/packages/pegboard/src/tunnel/id.rs:80-81 vs TypeScript implementation The Rust error messages use lowercase ("invalid request id length") while some TypeScript errors might use different casing. While minor, consistency helps with debugging. Recommendation: Standardize error message formats across Rust and TypeScript implementations. 🟢 Minor: Missing Documentation on MessageId FormatLocation: engine/sdks/schemas/runner-protocol/v3.bare:217 The BARE schema defines Recommendation: Add a comment in the BARE schema: Performance Considerations✅ The new format is actually more efficient - the old approach with UUID-based IDs used 16 bytes per message ID, while the new format uses 12 bytes. Over millions of messages, this saves bandwidth. ✅ Removing the
Security Considerations✅ Using random u32 for both gateway_id and request_id provides sufficient entropy for non-cryptographic uniqueness.
Test Coverage❌ Missing tests: I found no test files for the new tunnel ID functionality. Given this is a protocol-level change, unit tests are strongly recommended. Recommended tests:
SummaryThis is a solid refactoring that simplifies the protocol and improves efficiency. The main concern is the request ID memory leak which must be fixed before merging. The other issues are lower priority but should still be addressed. Required before merge:
Recommended before merge:
Nice to have:
Great work on the backwards compatibility handling! The deprecated tunnel ack approach is clean and will make rolling this out much safer. Review generated with Claude Code |
| route_cache: RouteCache, | ||
| rate_limiters: Cache<(Id, std::net::IpAddr), Arc<Mutex<RateLimiter>>>, | ||
| in_flight_counters: Cache<(Id, std::net::IpAddr), Arc<Mutex<InFlightCounter>>>, | ||
| inflight_requests: Arc<Mutex<HashSet<RequestId>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use scc HashSet
| @@ -0,0 +1,86 @@ | |||
| use anyhow::{Context, Result, ensure}; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in runner protocol crate
| @@ -0,0 +1 @@ | |||
| pub mod id; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in runner protocol crate
4adef6f to
759df3e
Compare
e47241f to
6521b98
Compare
PR Review: New Message ID Format & Deprecated Tunnel AckI have completed a thorough review of this PR. Overall, this is a solid refactoring that improves the message ID system and maintains backwards compatibility. Here are my findings: Strengths
Critical Issues1. Memory Leak: Request IDs Never Released The release_request_id() method is defined in proxy_service.rs:687-689 but never called. This means the inflight_requests HashSet will grow indefinitely, memory usage will increase over time, and eventually could lead to performance degradation. Fix needed: Add cleanup calls after custom serve request completes (both success and error paths) and after WebSocket task completion. 2. Unsafe .expect() Calls in Database Code In engine/packages/pegboard/src/keys/actor.rs there are .expect() calls that will panic if the database contains invalid data. Per CLAUDE.md error handling guidelines, these should return proper Result<> with context instead of panicking in production code paths. Moderate Issues3. Missing Test Coverage - No tests added for the new message ID format, collision detection logic, or backwards compatibility handling. 4. Message Index Overflow - proxy_service.rs uses wrapping_add(1) for message_index (u16). After 65,535 messages it wraps to 0. Consider adding a comment or warning log. 5. Import Organization - Some files have imports that should be moved to the top of the file per CLAUDE.md style. Performance & Security
SummaryThis is a strong refactoring with good architectural decisions, but has two critical issues that must be fixed before merge:
Once these are addressed, this PR will be ready to merge. The new message ID format is well-designed and the backwards compatibility approach is solid. |
PR Review: New Message ID Format & Deprecated Tunnel AckThis is a well-structured refactoring that modernizes the tunnel message ID system. The changes introduce a more efficient message identification scheme while maintaining backward compatibility. Here's my detailed review: ✅ Strengths1. Improved Message ID Design
2. Strong Backward Compatibility
3. Good Type Safety
4. Hibernation Request Improvements
|
6521b98 to
d6a2e79
Compare
759df3e to
4adef6f
Compare
PR Review: Add New Message ID Format & Deprecated Tunnel AckOverviewThis PR introduces a significant refactor to the message ID system in the Pegboard gateway, moving from UUID-based IDs to a more compact and structured format. The changes touch critical paths in the proxy service and gateway communication. Code Quality & Best Practices ✅Strengths:
Areas for Improvement: 1. Inconsistent Whitespace (engine/packages/guard-core/src/proxy_service.rs:641)// Generate unique request ID
let request_id = Some(self.generate_unique_request_id().await?);Mixed tabs and spaces. The leading whitespace appears to be spaces instead of tabs. Per CLAUDE.md, this project uses hard tabs for Rust formatting. 2. Missing Import Consolidation (engine/packages/guard-core/src/proxy_service.rs:16)The import for use pegboard::tunnel::id::{RequestId, generate_request_id};Consider grouping this with other pegboard imports if any exist nearby. Potential Bugs & Issues
|
Pull Request ReviewSummaryThis PR introduces a new message ID format for the tunnel protocol and adds backward compatibility for deprecated tunnel acknowledgments. The changes migrate from a 16-byte UUID-based message ID to a more efficient 12-byte structured format. Code Quality - Strengths
CRITICAL BUG - Double In-Flight AcquisitionLocation: engine/packages/guard-core/src/proxy_service.rs:1097-1114 The CustomServe path acquires in-flight limits TWICE:
Impact: CustomServe HTTP requests count twice against the in-flight limit, reducing effective capacity by 50%. Fix: Remove the second acquire_in_flight call in the CustomServe branch. Other Issues1. Request ID Collision Detection (proxy_service.rs:465-495)
2. Message Index Wrap-Around (pegboard-gateway/src/shared_state.rs:136-151)
3. Lock Contention (proxy_service.rs:381, 465)
4. Test Coverage
RecommendationsHigh Priority:
Medium Priority: Low Priority: ConclusionSolid refactoring with good protocol versioning. However, the double in-flight acquisition bug should be fixed before merging. The code follows Rivet conventions well. Great work on backward compatibility! |
d6a2e79 to
adeb05d
Compare
4adef6f to
759df3e
Compare
PR Review: Tunnel Message ID Format & Deprecated Tunnel AckI've reviewed this PR which introduces a new message ID format and adds deprecated tunnel acknowledgments. Overall, the implementation is solid with good architectural decisions. Here are my findings: ✅ Strengths
|
PR Review: Message ID Format & Deprecated Tunnel AckSummaryThis PR introduces a new compact message ID format and deprecates the TunnelAck mechanism. The changes are well-structured with good backward compatibility support through protocol versioning. ✅ Strengths
🔍 Issues Found1. Whitespace Formatting Issue (guard-core/src/proxy_service.rs:641)// Generate unique request ID
let request_id = Some(self.generate_unique_request_id().await?);Issue: Mixed tabs and spaces - line uses spaces instead of hard tabs 2. Request ID Lifecycle Bug (guard-core/src/proxy_service.rs:1094-1173)let request_id = match self.state.acquire_in_flight(client_ip, &actor_id, &req_headers).await? {
Some(id) => id,
None => { return Err(errors::RateLimit { /* ... */ }.build()); }
};
// ... later in the loop
self.state.release_in_flight(client_ip, &actor_id, request_id).await;Issue:
This causes:
Fix: Remove the duplicate 3. Collision Handling Concerns (guard-core/src/proxy_service.rs:667-694)const MAX_TRIES: u32 = 100;
for attempt in 0..MAX_TRIES {
let request_id = generate_request_id();
if !requests.contains(&request_id) {
requests.insert(request_id);
return Ok(request_id);
}
tracing::warn!(?request_id, attempt, "request id collision");
}Issue: With 4 bytes (4 billion possibilities), collisions should be extremely rare. If hitting 100 attempts, there is likely a deeper issue (not cleaning up IDs, memory leak, etc.) Suggestion:
4. Message Index Type Mismatch (pegboard-gateway/src/shared_state.rs:39)message_index: tunnel_id::MessageIndex, // u16But in v3.bare: And in shared_state.rs:316: fn wrapping_gt(msg.message_index, ack_index)Issue: The code uses wrapping arithmetic for u16 indices, which is correct, but there is inconsistency in how indices are used vs. how they are documented. Suggestion: Add a comment explaining the wrapping behavior for message indices (e.g., "Indices wrap at u16::MAX") 5. Missing Error Context (pegboard/src/tunnel/id.rs:31)serde_bare::to_writer(&mut cursor, &parts).context("failed to serialize message id parts")?;Suggestion: Add the actual values to the error context for debugging: .with_context(|| format!("failed to serialize message id parts: gateway_id={:?}, request_id={:?}, message_index={}",
gateway_id, request_id, message_index))?;
|
adeb05d to
d6a2e79
Compare
759df3e to
4adef6f
Compare
PR Review: Add new message ID format & deprecated tunnel ackI've reviewed this PR and have the following feedback: Code Quality & Architecture ✅The refactoring to use a structured message ID format is well-designed: Strengths:
Potential Issues
|
Code Review - PR 3492SummaryThis PR introduces a new message ID format for the pegboard-gateway tunnel protocol and adds support for deprecated tunnel acknowledgments. The changes involve significant refactoring of the message ID system from UUID-based identifiers to a structured format with gateway ID, request ID, and message index components. Strengths
Critical Issues1. Potential Race Condition in Request ID Management (engine/packages/guard-core/src/proxy_service.rs:656-692) The code acquires an in-flight slot and generates a request ID, but releases it in a spawned task. If the request fails before the spawned task runs, the request ID leaks from the inflight_requests HashSet. Recommendation: Use RAII pattern or ensure release happens in all code paths. Consider using a guard struct that releases on drop. 2. Missing Tests No tests found for the new tunnel::id module. Required coverage:
3. Commented-out Code (shared_state.rs:472-474) Malformed commented code should be removed or properly formatted. 4. Duplicate In-Flight Acquisition (proxy_service.rs:809, 1097-1113) In handle_http_request, acquire_in_flight is called twice - at top level and within CustomServe handler. This could cause double-counting or double-release. Needs verification. Other Concerns5. Memory Leak Potential If the release task panics, request ID remains in HashSet forever, leading to memory exhaustion. 6. Message Index Overflow (shared_state.rs:150) No documentation about wraparound behavior when message index wraps from u16::MAX to 0. 7. Hard-coded Magic Numbers Consider documenting or making configurable: HWS_MESSAGE_ACK_TIMEOUT (30s), HWS_MAX_PENDING_MSGS_SIZE_PER_REQ (1MB), MAX_TRIES (100). Security Considerations
Performance Considerations
Style & Best Practices
RecommendationsMust Fix:
Should Fix:
Overall: Solid refactoring improving message ID structure, but resource cleanup and test coverage issues need addressing before merge. Request changes. |
d6a2e79 to
9d2a169
Compare
4adef6f to
bdf8c0e
Compare
Code Review: PR #3492 - Add new message ID format & deprecated tunnel ackOverviewThis PR refactors the tunnel message ID system from UUID-based identifiers to a compact, structured format. The new format embeds gateway ID, request ID, and message index into a 12-byte message ID, improving efficiency and eliminating redundant fields. Strengths1. Architecture & Design
2. Code Quality
3. Performance Improvements
|
Issues & RecommendationsCRITICAL: Double In-Flight Acquisition BugLocation: engine/packages/guard-core/src/proxy_service.rs:809 and :1100 There is a double acquire bug for CustomServe routes. The code acquires in-flight slot twice:
Impact:
Fix: The acquisition at line 809 should be removed, OR the second acquisition for CustomServe should be skipped. Acquisition should happen at the routing destination only. MEDIUM: Request ID Collision HandlingLocation: engine/packages/guard-core/src/proxy_service.rs:665-692 The collision retry logic holds a Mutex lock for up to 100 iterations. Issues:
Recommendations:
|
Security Considerations
Test CoverageIssue: No new tests detected in the PR. Recommendations:
Performance AnalysisPositive:
Concerns:
|
Summary VerdictOverall Assessment: Good refactoring with excellent backwards compatibility, but contains a CRITICAL double-acquisition bug that must be fixed before merge. Priority Fixes:
Code Style: Mostly follows conventions - uses hard tabs, proper structured logging with tracing, good error handling with anyhow::Context. Recommendation: Request changes to fix the double-acquisition bug, then approve after verification. Great work on the protocol migration and backwards compatibility! The structured message ID format is a solid improvement. |
Code Review: Message ID Format & Deprecated Tunnel AckI've completed a thorough review of this PR. Overall, the implementation is solid with good backward compatibility handling. Here are my findings: ✅ Strengths
🔍 Issues & ConcernsHigh Priority1. Resource Leak in In-Flight Request Management ( The
This causes:
Fix: Remove the duplicate 2. Missing In 3. Potential Integer Overflow in Message Index ( req.message_index = req.message_index.wrapping_add(1);While
Recommendation: Add a comment explaining the wrap-around behavior is intentional and safe due to the acknowledgment window size. Medium Priority4. Mutex Contention on let mut requests = self.inflight_requests.lock().await;The collision detection holds a global mutex for the entire retry loop. Under high concurrency, this could become a bottleneck. Consider using a lock-free data structure or reducing the critical section. 5. Hardcoded Retry Limit ( const MAX_TRIES: u32 = 100;This should probably be configurable or at least documented why 100 is the right number. With 4 billion possible IDs (2^32), the birthday paradox suggests collisions become probable around sqrt(4B) ≈ 65k concurrent requests, which is achievable in production. 6. Missing Validation in TypeScript Implementation
ensure!(message_index <= u16::MAX, "message index out of range");Low Priority7. Inconsistent Logging Some places use 8. Test Coverage I don't see test files for the new tunnel ID collision handling logic. Consider adding:
📊 Performance Considerations
🔐 Security ReviewNo security vulnerabilities identified. The random ID generation uses cryptographically secure RNG which is appropriate. 📝 Code Quality
Recommendations
Verdict✅ Approve with changes requested - The core implementation is solid, but the double acquisition bug needs to be fixed before merging as it could cause resource leaks in production. |

No description provided.