
Conversation

@NathanFlurry
Member

No description provided.


vercel bot commented Nov 20, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Error Error Nov 20, 2025 8:52pm
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 20, 2025 8:52pm
rivet-inspector Ignored Ignored Preview Nov 20, 2025 8:52pm
rivet-site Ignored Ignored Preview Nov 20, 2025 8:52pm

vercel bot temporarily deployed to Preview – rivetkit-serverless November 20, 2025 07:59 (Inactive)
Member Author

NathanFlurry commented Nov 20, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.


claude bot commented Nov 20, 2025

Code Review

I've reviewed this PR and have the following feedback:

Overview

This PR extracts the keepalive task into a separate module and ensures it runs both during active WebSocket connections and during hibernation. The goal is to prevent hibernating requests from being garbage collected.

Positive Changes

  • Good refactoring: Extracting the keepalive logic into a dedicated module improves code organization and maintainability
  • Proper task lifecycle management: Using watch channels for graceful shutdown is the correct pattern
  • Consistent error handling: The new task follows the same pattern as other lifecycle tasks

Issues and Concerns

1. Documentation typo (Minor)

Line 13 in keepalive_task.rs - Missing space in comment. Should be "/// Only ran for hibernating requests." instead of "///Only ran for hibernating requests."

2. Potential unnecessary work during active connections (Performance)

Lines 480-507 in lib.rs - The keepalive task now runs during active WebSocket connections when can_hibernate is true. Previously, it only ran during actual hibernation (handle_websocket_hibernation).

Question: Is it necessary to ping the hibernating_request UDB entry while the connection is still active? This adds database writes during normal operation. The original implementation only ran the keepalive task during actual hibernation (lines 603-616), which seems more efficient.

Recommendation: Consider whether the keepalive task should only run during actual hibernation, not during active connections. If it must run during active connections, please document why this change is necessary in the PR description or code comments.

3. Task spawning pattern (Code Quality)

Lines 485-491 in lib.rs - The keepalive task is spawned inside a tokio::join! future, which is different from how the other three tasks are spawned (lines 398-418). For consistency, consider spawning it outside the join block conditionally and then awaiting it inside, similar to how tunnel_to_ws, ws_to_tunnel, and ping tasks are handled.
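
As a rough illustration of that suggestion (a sketch only, not the PR's actual code: the variable names and the keepalive_task::task signature are taken from the surrounding discussion), the task could be spawned up front only when hibernation is enabled and then awaited inside the join:

// Sketch only: spawn conditionally like the other tasks, await inside the join.
let keepalive = can_hibernate.then(|| {
	tokio::spawn(keepalive_task::task(
		self.ctx.clone(),
		self.actor_id,
		self.shared_state.gateway_id(),
		request_id,
		keepalive_abort_rx,
	))
});

// Inside the tokio::join! block:
async {
	match keepalive {
		// `?` surfaces a JoinError (e.g. a panic) as a task error
		Some(handle) => handle.await?,
		None => Ok(LifecycleResult::Aborted),
	}
}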

4. Result matching could be more robust (Correctness)

Lines 511-523 in lib.rs - The result matching logic prefers the first non-aborted result, but doesn't explicitly handle cases where multiple tasks return non-aborted results. Consider being more explicit about the precedence for different LifecycleResult variants (ServerClose vs ClientClose vs Aborted).
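
One possible shape for this (a sketch only, assuming all four results have type Result<LifecycleResult>) is to scan the results in position order and keep the first one that is either an error or a non-Aborted success:

let results = [tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res];

// Sketch: take the first error or non-Aborted success, falling back to Aborted.
let lifecycle_res = results
	.into_iter()
	.find_map(|res| match res {
		Err(err) => Some(Err(err)),
		Ok(LifecycleResult::Aborted) => None,
		Ok(other) => Some(Ok(other)),
	})
	.unwrap_or(Ok(LifecycleResult::Aborted));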

5. Clone variables optimization (Minor Performance)

Lines 424-428 in lib.rs - Several clones are performed that may not be necessary since most of these types are cheap to clone (Ids are Copy types, ctx is an Arc), but consider capturing them directly in the async block if possible.

Testing Recommendations

  • Add integration tests to verify that hibernating requests don't get garbage collected
  • Test that the keepalive interval works correctly (at threshold/2)
  • Verify that jitter prevents stampeding herds in high-load scenarios

Security/Safety

No security concerns identified. The changes maintain proper error handling and resource cleanup.

Verdict

The core fix looks sound, but there are some architectural questions (particularly issue 2 about running during active connections) that should be addressed before merging. The code quality could be improved with more consistent patterns and better documentation of the design decisions.

@NathanFlurry marked this pull request as ready for review November 20, 2025 18:24
@MasterPtato force-pushed the 11-19-fix_rivetkit_fix_graceful_runner_shutdown branch from 4f192f3 to a11d203 on November 20, 2025 19:38
@MasterPtato force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 5552b74 to 815c918 on November 20, 2025 19:38

claude bot commented Nov 20, 2025

Code Review - PR #3498

Summary

This PR refactors the keepalive task for hibernating requests by extracting it into a separate module and ensuring it runs both during active WebSocket connections and during hibernation. Overall, the changes look good and follow the repository's patterns well.


Positive Observations

Good refactoring: Extracting the keepalive logic into keepalive_task.rs improves code organization and reusability

Consistent patterns: The new task follows the same pattern as ping_task.rs with proper abort handling via watch channels

Proper lifecycle management: The keepalive task is correctly spawned and cleaned up in both handle_websocket and handle_websocket_hibernation

Jitter implementation: Good use of jitter (0-128ms) to prevent thundering herd problems

Import organization: Follows the repository convention of keeping imports at the top of the file


Issues & Suggestions

1. Typo in documentation (Minor)

Location: keepalive_task.rs:10

/// Periodically pings writes keepalive in UDB. This is used to restore hibernating request IDs on

Should be:

/// Periodically writes keepalive pings in UDB. This is used to restore hibernating request IDs on

2. Formatting issue in documentation (Minor)

Location: keepalive_task.rs:13

There's a missing space:

///Only ran for hibernating requests.

Should be:

/// Only ran for hibernating requests.

3. Inconsistent lifecycle result handling (Moderate)

Location: lib.rs:511-522

The lifecycle result matching logic shows asymmetry. When multiple tasks complete successfully but with different results, the code only checks the first two positions:

(Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
(Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
// Unlikely case
(res, _, _, _) => res,

Potential issue: If tunnel_to_ws_res and ws_to_tunnel_res are both Aborted, but ping_res or keepalive_res contains a meaningful result (like ServerClose or ClientClose), that result will be lost.

Suggestion: Consider a more comprehensive pattern that checks all positions for non-Aborted results:

match (tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res) {
    // Prefer error
    (Err(err), _, _, _) => Err(err),
    (_, Err(err), _, _) => Err(err),
    (_, _, Err(err), _) => Err(err),
    (_, _, _, Err(err)) => Err(err),
    // Find first non-aborted result
    (Ok(res), _, _, _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
    (_, Ok(res), _, _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
    (_, _, Ok(res), _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
    (_, _, _, Ok(res)) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
    // All aborted (unlikely)
    (res, _, _, _) => res,
}

4. Error handling observation (Low)

Location: keepalive_task.rs:21-26

The .try_into()? conversion could fail if hibernating_request_eligible_threshold / 2 exceeds u64::MAX, though this is extremely unlikely in practice. The current error handling is appropriate, but you might consider adding a more descriptive error context:

let mut ping_interval = tokio::time::interval(Duration::from_millis(
    (ctx.config()
        .pegboard()
        .hibernating_request_eligible_threshold()
        / 2)
    .try_into()
    .context("hibernating_request_eligible_threshold too large for interval")?,
));

However, this is optional since the threshold would need to be absurdly large to cause issues.


5. Missing context on abort handling (Low)

Location: keepalive_task.rs:37-39

The abort handling via keepalive_abort_rx.changed() is correct, but it might be worth adding a debug log when the task is aborted, similar to how other tasks in lib.rs log their completion states. This would help with debugging:

_ = keepalive_abort_rx.changed() => {
    tracing::debug!("keepalive task aborted");
    return Ok(LifecycleResult::Aborted);
}

Performance Considerations

✅ The keepalive interval calculation (threshold / 2) is appropriate to ensure pings happen well before the threshold expires

✅ Using MissedTickBehavior::Skip prevents backlog buildup if the task falls behind

✅ Jitter range (0-128ms) is reasonable for preventing synchronization
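
A minimal, self-contained sketch of the timing mechanics noted above (the threshold value and loop body are placeholders; the real task reads the threshold from config and writes the keepalive row):

use std::time::Duration;

use rand::Rng;
use tokio::time::MissedTickBehavior;

// Sketch of the keepalive timing: tick at half the hibernation threshold,
// skip missed ticks, and add 0-128ms of jitter before each write.
async fn keepalive_loop_sketch() {
	let threshold_ms: u64 = 90_000; // placeholder for hibernating_request_eligible_threshold()

	let mut ping_interval = tokio::time::interval(Duration::from_millis(threshold_ms / 2));
	ping_interval.set_missed_tick_behavior(MissedTickBehavior::Skip);

	loop {
		ping_interval.tick().await;

		// Spread writes from many connections apart to avoid a thundering herd
		let jitter = rand::thread_rng().gen_range(0..128u64);
		tokio::time::sleep(Duration::from_millis(jitter)).await;

		// ... upsert the hibernating request row here ...
	}
}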


Security Considerations

✅ No security concerns identified
✅ Proper use of watch channels for cancellation
✅ No credentials or sensitive data handling issues


Test Coverage

⚠️ Missing tests: This package doesn't appear to have any tests. While not a blocker for this PR, consider adding integration tests for:

  • Keepalive task behavior during normal operation
  • Keepalive task abort handling
  • Proper cleanup when websockets close during hibernation

Verdict

Recommendation: Approve with minor fixes

The changes are well-structured and solve the stated problem effectively. The only critical item is the documentation typo, which is trivial to fix. The lifecycle result handling observation is worth considering but not blocking since the current logic works for the common case.

Great work on maintaining consistency with existing patterns! 🎯


claude bot commented Nov 20, 2025

PR Review: Keepalive Task Refactoring

Summary

This PR extracts keepalive logic into a dedicated task module and ensures hibernating requests are pinged both during active WebSocket connections and during hibernation. The refactoring improves code organization and fixes a potential bug where keepalive pings weren't being sent during active connections.


Code Quality & Best Practices ✅

Strengths:

  • Good separation of concerns: Extracting keepalive_task.rs follows the existing pattern of separate task modules (ping_task.rs, tunnel_to_ws_task.rs, etc.)
  • Consistent error handling: Uses anyhow::Result appropriately
  • Proper abort handling: Uses watch::Receiver for graceful task cancellation, consistent with other tasks
  • Code reuse: DRY principle applied - same keepalive logic used in both contexts

Minor Issues:

  1. Comment formatting (keepalive_task.rs:13)

    ///Only ran for hibernating requests.
    

    Missing space after ///. Should be:

    /// Only ran for hibernating requests.
    
  2. Import ordering (lib.rs:10-16)
    The imports are reordered in the diff, which appears to be automatic formatting. Per CLAUDE.md ("DO NOT RUN CARGO FMT AUTOMATICALLY"), confirm the reordering was intentional rather than the result of running cargo fmt.


Potential Bugs 🐛

Medium Priority:

  1. Result handling in keepalive task abort (lib.rs:615-616)

    let _ = keepalive_abort_tx.send(());
    let _ = keepalive_handle.await;

    The await result is ignored. If the keepalive task panicked, we should probably log it:

    let _ = keepalive_abort_tx.send(());
    if let Err(e) = keepalive_handle.await {
        tracing::warn!(?e, "keepalive task failed during cleanup");
    }

    Same issue exists in lines 440-441, 457-458, 473-474, and others where abort signals are sent.

  2. Lifecycle result priority logic (lib.rs:511-523)
    The logic for determining the single result from all tasks seems incomplete:

    // Prefer non aborted result if all succeed
    (Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
    (Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
    // Unlikely case
    (res, _, _, _) => res,

    This only checks the first two tasks. What if:

    • (Aborted, Aborted, ServerClose, Aborted) → Would return Aborted instead of ServerClose
    • (Aborted, Aborted, Aborted, ClientClose) → Would return Aborted instead of ClientClose

    Consider a more comprehensive approach:

    let mut lifecycle_res = match (tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res) {
        // Prefer error
        (Err(err), _, _, _) => Err(err),
        (_, Err(err), _, _) => Err(err),
        (_, _, Err(err), _) => Err(err),
        (_, _, _, Err(err)) => Err(err),
        // Find first non-aborted result
        (Ok(res), _, _, _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
        (_, Ok(res), _, _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
        (_, _, Ok(res), _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
        (_, _, _, Ok(res)) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
        // All aborted
        (res, _, _, _) => res,
    };

Performance Considerations ⚡

Good:

  • Jitter implementation (keepalive_task.rs:42-44): Prevents thundering herd with 0-128ms jitter
  • Missed tick behavior: MissedTickBehavior::Skip prevents tick accumulation if system is slow
  • Efficient abort signaling: Uses watch::channel which is lightweight

Potential Improvement:

  • The keepalive task spawns unconditionally in handle_websocket (lib.rs:485) even when can_hibernate is false, then immediately returns Aborted. Consider spawning conditionally:
    let keepalive = if can_hibernate {
        Some(tokio::spawn(keepalive_task::task(...)))
    } else {
        None
    };
    Then handle None case in the join. This avoids spawning unnecessary tasks.

Security Concerns 🔒

No major security issues identified.

Minor observations:

  • The jitter range (0-128ms) is small and predictable, but this is acceptable for load distribution
  • No input validation needed as all parameters are internal IDs

Test Coverage 🧪

Critical Gap:

  • No tests found for the pegboard-gateway package
  • This PR adds significant functionality (keepalive during active connections) but has no automated test coverage

Recommendations:

  1. Add integration tests for:
    • Keepalive task sending periodic updates during active WebSocket connections
    • Keepalive task sending periodic updates during hibernation
    • Proper cleanup when connection closes
    • Proper cleanup when tasks are aborted
  2. Add unit tests for:
    • Lifecycle result priority logic (especially the edge cases mentioned above)
    • Jitter behavior

Architecture & Design 🏗️

Excellent:

  • Follows existing patterns in the codebase (separate task modules with abort channels)
  • The dual-use of keepalive (during connection and hibernation) makes sense architecturally
  • Clean separation between the two contexts where keepalive is needed

Adherence to CLAUDE.md Standards ✅

  • ✅ Uses workspace dependencies correctly
  • ✅ Follows structured logging patterns with tracing
  • ✅ Error handling follows the custom error system
  • ✅ Uses hard tabs (consistent with rustfmt.toml)
  • ⚠️ Comment formatting needs minor fix

Summary & Recommendations

Approve with minor changes recommended:

  1. Fix comment spacing in keepalive_task.rs:13
  2. Improve error logging when awaiting task handles (don't ignore panics)
  3. Fix lifecycle result priority logic to handle all 4 tasks correctly
  4. Consider conditional spawning of keepalive task when can_hibernate is false
  5. Add test coverage for this functionality

The core logic is sound and this is a good refactoring. The issues identified are relatively minor but should be addressed to improve robustness.

Priority order:

  1. Fix lifecycle result priority logic (Medium - could cause incorrect behavior)
  2. Add error logging for task cleanup (Low - helps debugging)
  3. Fix comment formatting (Low - code quality)
  4. Add tests (Medium - prevents regressions)
  5. Conditional task spawning (Low - micro-optimization)

@NathanFlurry force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 815c918 to 986289f on November 20, 2025 20:51
@NathanFlurry force-pushed the 11-19-fix_rivetkit_fix_graceful_runner_shutdown branch from a11d203 to 0ed702c on November 20, 2025 20:51

claude bot commented Nov 20, 2025

Code Review

Overview

This PR refactors the keepalive logic for hibernating WebSocket requests by extracting it into a dedicated task module and running it alongside other lifecycle tasks. The change improves architecture and ensures hibernating requests are properly maintained both during active connections and during hibernation.

Positive Aspects ✅

  1. Good separation of concerns: Extracting keepalive logic into keepalive_task.rs follows the same pattern as ping_task.rs and ws_to_tunnel_task.rs, improving code organization.

  2. Consistent task lifecycle management: The new task integrates well with the existing abort channel pattern using watch::Receiver<()>.

  3. Proper jitter implementation: The rand::thread_rng().gen_range(0..128) prevents thundering herd issues when multiple requests ping simultaneously.

  4. Correct interval calculation: Using hibernating_request_eligible_threshold / 2 ensures pings happen well before the eligibility threshold expires.

Issues and Suggestions 🔍

1. Typo in Comment (Minor)

Location: engine/packages/pegboard-gateway/src/keepalive_task.rs:13

///Only ran for hibernating requests.

Issue: Missing space after ///.

Fix: Should be /// Only ran for hibernating requests.


2. Import Organization (Style)

Location: engine/packages/pegboard-gateway/src/lib.rs:10-17

The diff shows imports were reordered:

use rivet_guard_core::{
	WebSocketHandle,
	custom_serve::{CustomServeTrait, HibernationResult},
	errors::{ServiceUnavailable, WebSocketServiceUnavailable},
	proxy_service::{ResponseBody, is_ws_hibernate},
	request_context::RequestContext,
	websocket_handle::WebSocketReceiver,
};

Issue: While this appears to be an auto-formatting change, the CLAUDE.md explicitly states:

DO NOT RUN CARGO FMT AUTOMATICALLY

Recommendation: Verify if these import reorderings were intentional or accidental. If accidental, they should be reverted to avoid conflicts with the team's formatting workflow.


3. Potential Integer Overflow (Low Risk)

Location: engine/packages/pegboard-gateway/src/keepalive_task.rs:21-26

let mut ping_interval = tokio::time::interval(Duration::from_millis(
	(ctx.config()
		.pegboard()
		.hibernating_request_eligible_threshold()
		/ 2)
	.try_into()?,
));

Issue: The code divides i64 by 2 then converts to u64 with try_into()?. While hibernating_request_eligible_threshold() returns a positive value (defaults to 90,000ms), there's no guarantee at the type level.

Consideration:

  • If hibernating_request_eligible_threshold is ever negative, try_into() will fail
  • This is likely acceptable since negative thresholds don't make sense
  • The default value is 90,000ms (90 seconds), so the ping interval would be 45 seconds

Recommendation: Current implementation is acceptable, but consider adding a debug assertion or comment explaining the assumption.
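
If an explicit check is preferred over a comment, a sketch of what that could look like (using the config accessor quoted above; the surrounding function is assumed to return an anyhow::Result):

// Sketch only: make the positivity assumption explicit before computing the interval.
let threshold = ctx.config().pegboard().hibernating_request_eligible_threshold();
debug_assert!(threshold > 0, "hibernating_request_eligible_threshold must be positive");
let interval_ms: u64 = (threshold / 2).try_into()?;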


4. Task Lifecycle Abort Pattern Inconsistency (Minor)

Location: engine/packages/pegboard-gateway/src/lib.rs:601-616

In handle_websocket_hibernation, the new keepalive task uses:

let _ = keepalive_abort_tx.send(());
let _ = keepalive_handle.await;

Compare to the old implementation in the same function:

keepalive_handle.abort();

Observation: The new pattern is cleaner and more consistent with other tasks in handle_websocket. However, note that let _ = keepalive_handle.await; ignores the join result.

Recommendation: Consider whether you want to log if the keepalive task panicked:

if let Err(e) = keepalive_handle.await {
    if e.is_panic() {
        tracing::error!(error = ?e, "keepalive task panicked");
    }
}

5. Keepalive Task Only Runs for can_hibernate Requests

Location: engine/packages/pegboard-gateway/src/lib.rs:480-482

if !can_hibernate {
	return Ok(LifecycleResult::Aborted);
}

Question: Is this the intended behavior? The PR title mentions "ping hibernating requests both during open hws connections and during hibernation", but the task immediately aborts if can_hibernate is false.

Clarification Needed:

  • Should the keepalive task run for all requests during the active WebSocket connection phase?
  • Or is it correct that only hibernatable requests need keepalive pings?

Based on the code, it appears correct that only hibernatable requests need keepalive tracking in UDB, but this should be confirmed.


6. Error Handling in Infinite Loop (Design Question)

Location: engine/packages/pegboard-gateway/src/keepalive_task.rs:43-51

ctx.op(pegboard::ops::actor::hibernating_request::upsert::Input {
	actor_id,
	gateway_id,
	request_id,
})
.await?;

Question: If the UDB upsert operation fails, the task will exit with an error. This will cascade to abort all other tasks.

Consideration: Is this the desired behavior? Alternatives could include:

  • Retry logic with exponential backoff
  • Logging the error and continuing the loop
  • Metric emission for failed keepalive attempts

Recommendation: The current "fail fast" approach is reasonable, but consider if transient database errors should be more resilient.
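
If more resilience is desired, a sketch of the retry-with-backoff alternative mentioned above (a generic helper, not the PR's code; the closure would wrap the existing ctx.op(...) call):

// Sketch only: retry a fallible async operation a few times with exponential
// backoff before giving up, so a transient UDB error does not abort the task.
async fn with_retry<F, Fut>(mut op: F) -> anyhow::Result<()>
where
	F: FnMut() -> Fut,
	Fut: std::future::Future<Output = anyhow::Result<()>>,
{
	let mut backoff = std::time::Duration::from_millis(250);
	for attempt in 0..4 {
		match op().await {
			Ok(()) => return Ok(()),
			Err(err) if attempt < 3 => {
				tracing::warn!(?err, "keepalive upsert failed, retrying");
				tokio::time::sleep(backoff).await;
				backoff *= 2;
			}
			Err(err) => return Err(err),
		}
	}
	unreachable!("loop always returns")
}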


Security Considerations 🔒

No security concerns identified. The changes don't introduce new attack vectors or data exposure risks.


Performance Considerations ⚡

  1. Additional task overhead: The keepalive task adds one more concurrent task during WebSocket connections, but the overhead is minimal.

  2. Jitter range: The 0-128ms jitter is reasonable but fairly small. For large-scale deployments, consider if a larger range would better distribute database load.

  3. Database writes: Each hibernating request now writes to UDB every ~45 seconds. Ensure this scales appropriately with your expected number of concurrent hibernating connections.


Testing Recommendations 🧪

Consider adding tests for:

  1. Keepalive task properly aborts when signaled
  2. Keepalive pings occur at the expected interval
  3. Task cleanup when other tasks fail
  4. Behavior when hibernating_request_eligible_threshold has edge case values

Adherence to Project Conventions ✓

  • ✅ Uses anyhow::Result correctly
  • ✅ Structured logging with tracing (lowercase messages)
  • ✅ Follows existing task pattern architecture
  • ✅ Uses workspace dependencies appropriately
  • ⚠️ Potential auto-formatting changes to imports (see Issue 2 above)

Summary

This is a solid refactoring that improves code organization and maintainability. The main concerns are:

  1. A minor typo in the comment
  2. Potential unintended import reformatting
  3. Questions about error handling strategy for database failures

The core logic appears sound and follows established patterns in the codebase. Once the minor issues are addressed, this should be good to merge.


Overall Assessment: ✅ Approve with minor suggested improvements


claude bot commented Nov 20, 2025

Code Review

I've reviewed this PR and overall the changes look good! The refactoring properly extracts the keepalive logic into a dedicated module and correctly ensures it runs both during active WebSocket connections and hibernation.

Strengths

  1. Good separation of concerns: Extracting the keepalive logic improves code organization
  2. Proper task lifecycle management: Correctly integrated with the abort signal pattern
  3. Correct hibernation check: Only runs when can_hibernate is true
  4. Consistent error handling: Uses the same LifecycleResult pattern as other tasks

Issues Found

1. Typo in documentation (keepalive_task.rs:10)

  • "pings writes keepalive" should be "writes keepalive pings"

2. Missing space in documentation (keepalive_task.rs:13)

  • Should be: /// Only ran for hibernating requests.

3. Comment reference outdated (keepalive_task.rs:30-31)

  • The comment says "we've already called this above" but there's no code above in this new module that calls the upsert operation. This was copied from the original inline implementation and should be updated.

Recommendations

  1. Fix the typos in the documentation comments
  2. Update the misleading comment to accurately reflect the code flow
  3. Consider adding error context to .try_into()? on line 26 using .context() for easier debugging

Performance & Security

  • The jitter (0-128ms) helps prevent thundering herd issues ✅
  • The ping interval at half the threshold ensures timely updates ✅
  • MissedTickBehavior::Skip is appropriate ✅
  • No security concerns identified ✅

Overall Assessment: Solid refactoring with minor documentation fixes needed. Core logic is correct and properly integrated.

Great work! 🎉

@MasterPtato force-pushed the 11-19-fix_rivetkit_fix_graceful_runner_shutdown branch from 0ed702c to a11d203 on November 20, 2025 22:26
@MasterPtato force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 986289f to 815c918 on November 20, 2025 22:26

claude bot commented Nov 20, 2025

Code Review

This PR extracts keepalive logic into a separate task and ensures it runs both during active WebSocket connections and during hibernation. Overall, the change is well-structured and improves code organization.

✅ Strengths

  1. Good separation of concerns: Extracting keepalive logic into keepalive_task.rs improves modularity and mirrors the pattern used for ping_task.rs.

  2. Proper task lifecycle management: The use of watch channels for aborting tasks is consistent with existing patterns.

  3. Correct concurrent task handling: The tokio::join! properly handles all four tasks with proper abort coordination.

  4. Hibernation guard: The keepalive task correctly checks can_hibernate and immediately returns Aborted if not enabled.

🔍 Code Quality Observations

1. Minor typo in documentation (keepalive_task.rs:10-13)

Issues: pings writes keepalive should be writes keepalive pings, missing space before Only, grammar: Only ran should be Only runs.

2. Code duplication

Both handle_websocket and handle_websocket_hibernation spawn the keepalive task with nearly identical code (lib.rs:424-428 vs 604-611). Consider extracting to a helper method.
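
As a sketch of that extraction (the method name, parameter types, and return type are placeholders; the body mirrors the spawn call discussed elsewhere in this thread):

// Sketch only: one shared helper instead of two near-identical spawn sites.
fn spawn_keepalive(
	&self,
	request_id: RequestId,
	abort_rx: watch::Receiver<()>,
) -> JoinHandle<Result<LifecycleResult>> {
	tokio::spawn(keepalive_task::task(
		self.ctx.clone(),
		self.actor_id,
		self.shared_state.gateway_id(),
		request_id,
		abort_rx,
	))
}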

3. Result handling pattern (lib.rs:511-523)

The unlikely case comment suggests the logic might not handle all edge cases explicitly. Consider documenting what scenarios this covers.

4. Comment mismatch (keepalive_task.rs:30-32)

The comment says "we've already called this above", but there is no such call above in the extracted function. Consider simplifying it to: "Discard the first tick since it fires immediately."

5. Import ordering (lib.rs:10-17)

The imports have inconsistent ordering within the braces. Consider grouping alphabetically for consistency.

🛡️ Security & Performance

No security concerns identified. The code properly validates inputs, uses safe interval-based timing with jitter to prevent thundering herd, and handles errors appropriately.

🧪 Testing

No tests found for the pegboard-gateway package. Consider adding tests for: keepalive task abort behavior, keepalive task continuing to write during both connection phases, edge cases around hibernation transitions.

📋 Minor Suggestions

  1. The try_into()? at keepalive_task.rs:26 could fail if threshold is too large. Consider adding context.

  2. Consider adding a trace log in the keepalive task when the upsert succeeds to help with debugging hibernation issues.
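
For example (a sketch; field names follow the structured-logging style used elsewhere in the crate):

// Sketch only: confirm each successful keepalive write at trace level.
tracing::trace!(?actor_id, ?request_id, "upserted hibernating request keepalive");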

Summary

This is a solid refactoring that improves code organization and fixes the keepalive behavior for hibernating requests. The main issues are minor documentation and style improvements. The logic appears sound and properly handles the concurrent task lifecycle.

Recommendation: Approve with minor suggestions

@NathanFlurry force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 815c918 to 986289f on November 20, 2025 22:32
@MasterPtato force-pushed the 11-19-fix_rivetkit_fix_graceful_runner_shutdown branch from 0ed702c to a11d203 on November 20, 2025 22:38
@MasterPtato force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 986289f to 815c918 on November 20, 2025 22:38

claude bot commented Nov 20, 2025

Pull Request Review

Summary

This PR refactors the keepalive ping mechanism for hibernating WebSocket requests by extracting it into a dedicated task module and ensuring it runs both during active WebSocket connections (when hibernation is possible) and during actual hibernation periods.

Code Quality ✅

Strengths:

  • Good separation of concerns: The new keepalive_task.rs module cleanly separates the keepalive logic from the main lib, matching the existing pattern used for ping_task.rs and other task modules
  • Consistent error handling: Properly uses Result<LifecycleResult> return type and integrates well with the abort channel pattern
  • Proper cleanup: Uses watch channels for graceful shutdown instead of abort(), which is cleaner and more reliable
  • Jitter implementation: Smart addition of jitter (0-128ms) to prevent thundering herd problems when many connections ping simultaneously

Minor Issues:

  1. Documentation typo (line 13 in keepalive_task.rs):

    ///Only ran for hibernating requests.
    

    Should be:

    /// Only ran for hibernating requests.
    

    Missing space after ///

  2. Inconsistent clone naming (lines 425-428 in lib.rs):
    The clones use _clone suffix which is verbose. Consider either:

    • Using the variables directly in the spawn (preferred)
    • Or using more descriptive names if needed elsewhere

    Current:

    let ctx_clone = self.ctx.clone();
    let actor_id_clone = self.actor_id;
    let gateway_id_clone = self.shared_state.gateway_id();
    let request_id_clone = request_id;

Potential Bugs 🔍

  1. Missing variable in abort chain (lines 422, 501-502 in lib.rs):
    On line 422, you create ping_abort_tx2 but never create keepalive_abort_tx2. In the keepalive task (lines 499-502), you're using ws_to_tunnel_abort_tx2 and tunnel_to_ws_abort_tx2, but you should probably also have keepalive_abort_tx2 for consistency. Currently you're using variables from outer scope which works but breaks the pattern.

    Consider:

    let keepalive_abort_tx2 = keepalive_abort_tx.clone();

    And use it in the keepalive task's abort chain.

  2. Early return bypasses keepalive spawn (lines 481-483 in lib.rs):
    The keepalive task immediately returns Ok(LifecycleResult::Aborted) if can_hibernate is false, which is correct. However, this means the task never actually spawns the inner keepalive task. While functionally correct, this could be more explicit:

    async {
        if !can_hibernate {
            // Don't spawn keepalive for non-hibernating connections
            return Ok(LifecycleResult::Aborted);
        }
        // ... rest of code
    }

    Adding a comment would make the intent clearer.

Performance Considerations ⚡

  1. Interval calculation (lines 21-26 in keepalive_task.rs):
    Good: The interval is set to half the hibernating request threshold, which is appropriate for keepalive

    Potential issue: If hibernating_request_eligible_threshold is an odd number, integer division could cause issues. The .try_into()? will catch overflow but not precision loss. Consider documenting expected ranges or adding validation.

  2. Jitter range (line 43 in keepalive_task.rs):
    The 0-128ms jitter is reasonable but hardcoded. Consider:

    • Is 128ms appropriate for all scales?
    • Should it be configurable or proportional to the ping interval?
    • Document why 128ms was chosen
  3. Task spawning overhead: The refactor adds a new concurrent task that runs alongside other tasks. This is fine, but ensure that spawning 4 tasks per WebSocket connection scales appropriately with expected load.

Security Concerns 🔒

No significant security issues identified. The code properly:

  • Uses timeout mechanisms
  • Handles errors appropriately
  • Doesn't expose sensitive data
  • Uses proper channel-based communication

Test Coverage ⚠️

Major gap: No tests found for the pegboard-gateway package.

Recommendations:

  1. Add unit tests for keepalive_task::task:

    • Test that it pings at the correct interval
    • Test that it aborts properly when signaled
    • Test error handling for UDB operations
    • Test jitter behavior
  2. Add integration tests for the full lifecycle:

    • Test keepalive during active connection with can_hibernate=true
    • Test keepalive during actual hibernation
    • Test that keepalive properly aborts when connection closes
    • Test interaction between all 4 concurrent tasks
  3. Test edge cases:

    • What happens if hibernating_request_eligible_threshold is very small (e.g., 1)?
    • Verify no race conditions in task abortion

Additional Observations

  1. Pattern matching logic (lines 511-523 in lib.rs):
    The error prioritization logic is correct but has grown complex with 4 tasks. The current pattern:

    (Err(err), _, _, _) => Err(err),
    (_, Err(err), _, _) => Err(err),
    (_, _, Err(err), _) => Err(err),
    (_, _, _, Err(err)) => Err(err),

    This works but is verbose. No change needed now, but if you add more tasks, consider refactoring to iterate over results.

  2. Code style compliance: ✅ Follows the repository conventions:

    • Uses hard tabs (as specified in rustfmt.toml)
    • Proper structured logging with tracing
    • Follows existing patterns from neighboring files
    • Proper error handling with anyhow

Verdict

Approval recommendation: ✅ Approve with minor suggestions

The PR successfully addresses the issue of pinging hibernating requests during both connection phases. The refactoring is well-structured and follows existing patterns. The identified issues are minor and the code is functionally correct.

Before merging:

  • Fix the documentation typo
  • Consider adding test coverage (can be a follow-up PR)
  • Consider the variable clone naming suggestion

Nice to have:

  • Add comments explaining the early return in keepalive task
  • Document the jitter value choice
  • Add validation for ping interval edge cases

Great work on maintaining consistency with the existing codebase architecture!


claude bot commented Nov 20, 2025

Pull Request Review

Summary

This PR refactors the keepalive logic for hibernating WebSocket requests by extracting it into a dedicated task module. The keepalive task now runs during both active WebSocket connections and hibernation periods, ensuring requests maintain their "alive" status consistently.


✅ Positive Observations

1. Good Code Organization

  • Extracting the keepalive logic into a separate module (keepalive_task.rs) improves maintainability and follows the existing pattern used by ping_task.rs
  • Clear separation of concerns makes the code easier to understand

2. Consistent Lifecycle Management

  • The keepalive task properly integrates into the existing abort channel pattern
  • Graceful shutdown is handled correctly via watch channels
  • The task aborts cleanly when other tasks complete

3. Proper Hibernation Handling

  • Keepalive only runs for hibernating requests (can_hibernate check on line 481)
  • Correctly maintains the hibernating request entry in both scenarios (active connection and hibernation)

🔍 Issues & Suggestions

1. Documentation Typo (Minor)

Location: keepalive_task.rs:10-13

/// Periodically pings writes keepalive in UDB. This is used to restore hibernating request IDs on
/// next actor start.
///
///Only ran for hibernating requests.

Issues:

  • Line 10: "pings writes" should be "writes" or "pings"
  • Line 13: Missing space after ///

Suggestion:

/// Periodically writes keepalive pings to UDB. This is used to restore hibernating request IDs on
/// next actor start.
///
/// Only ran for hibernating requests.

2. Potential JoinHandle Panic Unwrapping (Medium)

Location: lib.rs:493 and lib.rs:616

let res = keepalive.await?;

Issue:
The JoinHandle::await returns Result<T, JoinError> which is being unwrapped with ?. If the task panics, this will propagate a JoinError up through the task result matching logic. While this eventually gets handled, the error type mismatch could be confusing.

Current behavior:

  • If keepalive panics: Err(JoinError) → matched as (_, _, _, Err(err)) on line 517
  • This works but loses context about which task failed

Suggestion:
Consider explicitly handling the JoinError case or adding a comment explaining this behavior:

let res = keepalive.await.unwrap_or_else(|e| {
    tracing::error!(error = ?e, "keepalive task panicked");
    Err(anyhow::anyhow!("keepalive task panicked: {:?}", e))
});

Or add a comment:

// JoinError from panic will be treated as task error
let res = keepalive.await?;

3. Incomplete Result Matching Logic (Medium)

Location: lib.rs:510-523

The result matching logic prioritizes errors but doesn't handle all non-aborted result combinations:

// Prefer non aborted result if all succeed
(Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
(Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
// Unlikely case
(res, _, _, _) => res,

Issues:

  • Only checks first two tasks for non-aborted results
  • What if ping_res or keepalive_res contains the meaningful result?
  • The fallback (res, _, _, _) => res always returns tunnel_to_ws_res, ignoring the other three

Example problematic scenario:

tunnel_to_ws_res = Ok(LifecycleResult::Aborted)
ws_to_tunnel_res = Ok(LifecycleResult::Aborted)
ping_res = Ok(LifecycleResult::Aborted)
keepalive_res = Ok(LifecycleResult::ClientClose(...))

Result: Returns Aborted instead of ClientClose

Suggestion:
Use a more comprehensive matching strategy:

let mut lifecycle_res = match (tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res) {
    // Prefer error from any task
    (Err(err), _, _, _) | (_, Err(err), _, _) | (_, _, Err(err), _) | (_, _, _, Err(err)) => Err(err),
    
    // Find first non-aborted result
    (Ok(res @ (LifecycleResult::ServerClose(_) | LifecycleResult::ClientClose(_))), _, _, _) => Ok(res),
    (_, Ok(res @ (LifecycleResult::ServerClose(_) | LifecycleResult::ClientClose(_))), _, _) => Ok(res),
    (_, _, Ok(res @ (LifecycleResult::ServerClose(_) | LifecycleResult::ClientClose(_))), _) => Ok(res),
    (_, _, _, Ok(res @ (LifecycleResult::ServerClose(_) | LifecycleResult::ClientClose(_)))) => Ok(res),
    
    // All aborted
    _ => Ok(LifecycleResult::Aborted),
};

Or use an iterator approach:

let results = [tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res];

let mut lifecycle_res = results
    .into_iter()
    .find_map(|r| match r {
        Err(e) => Some(Err(e)),
        Ok(LifecycleResult::Aborted) => None,
        Ok(res) => Some(Ok(res)),
    })
    .unwrap_or(Ok(LifecycleResult::Aborted));

4. Variable Naming Inconsistency (Minor)

Location: lib.rs:424-428

let ctx_clone = self.ctx.clone();
let actor_id_clone = self.actor_id;
let gateway_id_clone = self.shared_state.gateway_id();
let request_id_clone = request_id;

Issue:
These variables use _clone suffix but some are already clones (ctx) while others are Copy types (UUIDs). The naming suggests they're all clones when some are just copies.

Suggestion:
Since these are only used once to pass to the keepalive spawn, consider passing them directly:

let keepalive = tokio::spawn(keepalive_task::task(
    self.ctx.clone(),
    self.actor_id,
    self.shared_state.gateway_id(),
    request_id,
    keepalive_abort_rx,
));

Or use more descriptive names if kept:

let keepalive_ctx = self.ctx.clone();
let keepalive_actor_id = self.actor_id;
// etc.

5. Missing Error Context (Minor)

Location: keepalive_task.rs:46-51

ctx.op(pegboard::ops::actor::hibernating_request::upsert::Input {
    actor_id,
    gateway_id,
    request_id,
})
.await?;

Suggestion:
Add context to the operation error for better debugging:

use anyhow::Context;

ctx.op(pegboard::ops::actor::hibernating_request::upsert::Input {
    actor_id,
    gateway_id,
    request_id,
})
.await
.context("failed to upsert hibernating request keepalive")?;

6. Race Condition Comment Mismatch (Minor)

Location: keepalive_task.rs:30-32

// Discard the first tick since it fires immediately and we've already called this
// above
ping_interval.tick().await;

Issue:
The comment says "we've already called this above" but there's no call above in the keepalive task. This comment appears to be copied from the original inline implementation where it made sense.

In lib.rs:579-587, there IS a call before the task starts:

self.ctx
    .op(pegboard::ops::actor::hibernating_request::upsert::Input {
        actor_id: self.actor_id,
        gateway_id: self.shared_state.gateway_id(),
        request_id,
    })
    .await?;

Suggestion:
Update the comment to reflect the actual situation:

// Discard the first tick since it fires immediately and the initial upsert
// has already been called before starting this task
ping_interval.tick().await;

🔒 Security Considerations

No security issues identified. The changes maintain the existing security posture.


🎯 Performance Considerations

Positive:

  • Jitter (0-128ms) prevents thundering herd problems ✅
  • MissedTickBehavior::Skip prevents backlog accumulation ✅
  • Interval is appropriately half of the threshold (45s default) ✅

Note:
The keepalive task now runs during active connections (when can_hibernate is true), adding periodic UDB writes that weren't happening before. This is the intended behavior per the PR title but worth noting for monitoring database load.


🧪 Test Coverage

Missing:
No test coverage found for the keepalive task. Consider adding tests for:

  1. Keepalive interval behavior - Verify it pings at the correct interval
  2. Graceful abort - Test that the abort channel correctly stops the task
  3. Error handling - Test behavior when UDB operations fail
  4. Integration test - Verify keepalive maintains request eligibility across hibernation cycles

Example test structure:

#[tokio::test]
async fn test_keepalive_task_aborts_cleanly() {
    let (abort_tx, abort_rx) = watch::channel(());
    let handle = tokio::spawn(keepalive_task::task(
        /* ... */
        abort_rx,
    ));
    
    abort_tx.send(()).unwrap();
    let result = handle.await.unwrap();
    
    assert!(matches!(result, Ok(LifecycleResult::Aborted)));
}

📝 Summary

Category        Status
Code Quality    ✅ Good - Clean refactoring
Potential Bugs  ⚠️ Medium - Result matching logic needs improvement
Performance     ✅ Good - Proper jitter and interval handling
Security        ✅ Good - No issues identified
Test Coverage   ❌ Missing - No tests for new module

Recommendation: Address the result matching logic issue (#3) before merging. Other issues are minor and can be addressed at discretion.



Great work on improving the architecture! The refactoring makes the code more maintainable and the keepalive logic is now properly integrated into both connection modes. 🚀


claude bot commented Nov 20, 2025

Code Review

Thanks for this PR! Overall, the changes look good and the refactoring improves code organization. Here's my detailed feedback:

✅ Strengths

  1. Good code organization: Extracting the keepalive logic into a separate module (keepalive_task.rs) improves readability and maintainability.

  2. Consistent abort handling: The keepalive task now uses the same watch channel pattern as other tasks (ping, tunnel_to_ws, ws_to_tunnel), which is consistent and clean.

  3. Proper cleanup: The change from keepalive_handle.abort() to using the watch channel for graceful shutdown (line 615-616 in lib.rs) is better practice and ensures proper task termination.
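
For readers unfamiliar with the pattern, a minimal standalone sketch of the watch-channel shutdown these tasks use (generic, not the PR's actual task body):

use std::time::Duration;

use tokio::sync::watch;

// Sketch only: a task that does periodic work until the abort signal fires.
async fn task_with_abort(mut abort_rx: watch::Receiver<()>) {
	loop {
		tokio::select! {
			_ = abort_rx.changed() => {
				tracing::debug!("task aborted");
				return;
			}
			_ = tokio::time::sleep(Duration::from_secs(45)) => {
				// periodic work goes here
			}
		}
	}
}

// Caller side: create with `let (abort_tx, abort_rx) = watch::channel(());`,
// send `let _ = abort_tx.send(());` to stop the task, then await its handle.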

🔍 Code Quality Observations

  1. Comment typo (keepalive_task.rs:10)

    • Line 10 has a typo: "Periodically pings writes keepalive" should be "Periodically writes keepalive" or "Periodically pings by writing keepalive"
    • Missing space in line 13: "///Only ran" should be "/// Only ran"
  2. Import ordering (lib.rs:10-17)

    • The imports are slightly inconsistent. According to Rust conventions, imports from the same crate should be grouped together. The reordering from:
      use rivet_guard_core::{
          WebSocketHandle,
          custom_serve::{CustomServeTrait, HibernationResult},
          errors::{ServiceUnavailable, WebSocketServiceUnavailable},
          proxy_service::{ResponseBody, is_ws_hibernate},
          request_context::RequestContext,
          websocket_handle::WebSocketReceiver,
      };
    The current order is fine, but ensure it's consistent with the project's style (hard to tell from this snippet).
  3. Potential integer conversion issue (keepalive_task.rs:21-26)

    • The try_into()? on line 26 could theoretically fail if hibernating_request_eligible_threshold exceeds u64::MAX. Given the default is 90,000ms (90 seconds), this is unlikely but worth documenting or asserting at config validation time.

🐛 Potential Issues

  1. Task result handling inconsistency (lib.rs:511-523)

    • The result matching logic appears incomplete. Looking at lines 519-520:
      (Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
      (Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
    • This only handles cases where the first two tasks have results. What about cases where:
      • ping_res is the non-aborted result?
      • keepalive_res is the non-aborted result?
    • Consider a more comprehensive pattern match or add a comment explaining why these cases don't need handling.
  2. Race condition consideration

    • When can_hibernate is false, the keepalive task immediately returns LifecycleResult::Aborted (lib.rs:481-483). This is correct, but ensure that during the window between checking can_hibernate and spawning the task, state changes are handled properly.

🔒 Security Considerations

No security concerns identified. The code properly handles abort signals and doesn't introduce any new attack surfaces.

⚡ Performance Considerations

  1. Jitter implementation (keepalive_task.rs:42-44)

    • The jitter range (0-128ms) is reasonable for preventing thundering herd. Good practice!
    • However, this adds up to 128ms latency on each ping. For the default 90-second threshold, this is negligible, but worth documenting the rationale.
  2. Task spawning overhead

    • The keepalive task is now spawned even when can_hibernate is false, then immediately returns. Consider:
      // Option 1: Conditional spawn (more efficient)
      let keepalive_res = if can_hibernate {
          let keepalive = tokio::spawn(keepalive_task::task(...));
          keepalive.await?
      } else {
          Ok(LifecycleResult::Aborted)
      };
    • This would avoid unnecessary task creation, though the current approach is cleaner for the tokio::join! pattern.

📝 Test Coverage

I don't see any tests modified or added for this functionality. Consider adding tests for:

  • Keepalive task abort behavior
  • Keepalive task ping interval timing
  • Integration test for the 4-task coordination in handle_websocket
  • Edge case: what happens if hibernating_request_eligible_threshold is very small (e.g., < 256ms)?

🎯 Recommendations

High Priority:

  1. Fix the comment typos in keepalive_task.rs
  2. Review the result handling logic at lib.rs:511-523 to ensure all cases are covered

Medium Priority:
3. Add unit tests for the keepalive task
4. Document why the task spawn overhead (when can_hibernate=false) is acceptable

Low Priority:
5. Consider adding a minimum threshold validation for hibernating_request_eligible_threshold / 2 to ensure it's > 256ms (or document why it's not needed)
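
A sketch of such a guard (the 256ms floor comes from the suggestion above; the config accessor is the one used in the PR; the surrounding function is assumed to return an anyhow::Result):

// Sketch only: fail fast if the derived keepalive interval is too small.
let threshold_ms = ctx.config().pegboard().hibernating_request_eligible_threshold();
anyhow::ensure!(
	threshold_ms / 2 > 256,
	"hibernating_request_eligible_threshold ({threshold_ms}ms) is too small for the keepalive interval"
);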

📊 Overall Assessment

This is a solid refactoring that improves code organization and consistency. The main concern is ensuring the result handling logic is complete and well-tested. Once the comment typos are fixed and the result matching is verified, this should be good to merge!

Recommendation: Approve with minor changes


claude bot commented Nov 20, 2025

Code Review

Summary

This PR refactors the keepalive task for hibernating requests by extracting it into a separate module and running it both during active WebSocket connections and during hibernation. This ensures that hibernating request IDs are properly maintained in the database.

Positive Changes

Good Separation of Concerns: Extracting the keepalive logic into keepalive_task.rs improves code organization and reusability.

Correct Conditional Spawn: The keepalive task is now conditionally spawned based on can_hibernate (line 481-483 in lib.rs), preventing unnecessary work for non-hibernating connections.

Proper Task Coordination: The abort signal pattern is consistently implemented across all four concurrent tasks (tunnel_to_ws, ws_to_tunnel, ping, keepalive).

Graceful Cleanup: The code properly awaits the keepalive task completion (line 616) instead of using .abort(), which is a better practice.

Issues & Recommendations

1. Typo in Documentation Comment (keepalive_task.rs:10-13)

/// Periodically pings writes keepalive in UDB. This is used to restore hibernating request IDs on
/// next actor start.
///
///Only ran for hibernating requests.

Issues:

  • Line 10: "pings writes" should be "pings" or "writes"
  • Line 13: Missing space after /// (should be /// Only ran)
  • Line 13: Grammar - "Only ran" should be "Only runs" (present tense) or "Only run" (imperative)

Suggested fix:

/// Periodically writes keepalive in UDB. This is used to restore hibernating request IDs on
/// next actor start.
///
/// Only runs for hibernating requests.

2. Result Preference Logic May Drop Non-Aborted Results (lib.rs:511-523)

The result preference logic only checks the first two results for non-aborted values:

(Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
(Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
// Unlikely case
(res, _, _, _) => res,

Issue: If tunnel_to_ws and ws_to_tunnel both abort, but ping or keepalive returns a non-aborted result (like ServerClose or ClientClose), the fallback (res, _, _, _) will return the first result (Aborted) instead of the meaningful third or fourth result.

Suggested fix:

// Prefer error
(Err(err), _, _, _) => Err(err),
(_, Err(err), _, _) => Err(err),
(_, _, Err(err), _) => Err(err),
(_, _, _, Err(err)) => Err(err),
// Prefer non-aborted result if all succeed
(Ok(res @ LifecycleResult::ServerClose(_)), _, _, _)
| (_, Ok(res @ LifecycleResult::ServerClose(_)), _, _)
| (_, _, Ok(res @ LifecycleResult::ServerClose(_)), _)
| (_, _, _, Ok(res @ LifecycleResult::ServerClose(_))) => Ok(res),
(Ok(res @ LifecycleResult::ClientClose(_)), _, _, _)
| (_, Ok(res @ LifecycleResult::ClientClose(_)), _, _)
| (_, _, Ok(res @ LifecycleResult::ClientClose(_)), _)
| (_, _, _, Ok(res @ LifecycleResult::ClientClose(_))) => Ok(res),
// All aborted (or unlikely mix)
(res, _, _, _) => res,

Alternatively, iterate through all results to find the first non-aborted one.

3. Minor: Import Ordering (lib.rs:10-17)

The imports are slightly inconsistent with alphabetical/logical ordering:

use rivet_guard_core::{
	WebSocketHandle,
	custom_serve::{CustomServeTrait, HibernationResult},
	errors::{ServiceUnavailable, WebSocketServiceUnavailable},
	proxy_service::{ResponseBody, is_ws_hibernate},
	request_context::RequestContext,
	websocket_handle::WebSocketReceiver,
};

The WebSocketHandle import is before nested imports. According to CLAUDE.md, imports should be at the top of the file (not inline), but there's no specific ordering requirement. Consider grouping by alphabetical order for consistency.

4. Missing Error Context (keepalive_task.rs:21-26)

The .try_into()? on line 26 could fail with a non-descriptive error:

let mut ping_interval = tokio::time::interval(Duration::from_millis(
	(ctx.config()
		.pegboard()
		.hibernating_request_eligible_threshold()
		/ 2)
	.try_into()?,
));

Recommendation: Add context to the error:

.try_into()
.context("hibernating_request_eligible_threshold overflow when converting to u64")?,

Performance & Security

Performance: The jitter mechanism (line 43) is good for preventing thundering herd problems.

Security: No security concerns identified. The keepalive task only updates timestamps in the database.

Test Coverage

⚠️ Test Coverage: This PR doesn't include tests. Consider adding:

  1. Unit test for keepalive_task to verify it sends periodic updates
  2. Integration test to verify keepalive runs during both active connections and hibernation
  3. Test to verify keepalive is NOT spawned when can_hibernate is false

Conclusion

Overall, this is a solid refactoring that improves code organization. The main issues are:

  1. Documentation typo (minor but should be fixed)
  2. Result preference logic may miss meaningful non-aborted results (should be fixed)
  3. Missing error context (nice to have)
  4. Missing tests (recommended)

Recommendation: Request changes for items 1 and 2, consider adding item 3, and encourage adding tests in a follow-up PR if not in this one.

@NathanFlurry force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 815c918 to 986289f on November 20, 2025 23:32

claude bot commented Nov 20, 2025

PR Review: Ping Hibernating Requests During Open Connections

This PR refactors the keepalive task for hibernating WebSocket requests. Previously, the keepalive task only ran during actual hibernation. Now it runs continuously during open WebSocket connections (when hibernation is enabled).

Code Quality - Strengths

  1. Good separation of concerns - Extracting keepalive logic into a dedicated module follows the existing pattern
  2. Consistent error handling using anyhow::Result and LifecycleResult enum
  3. Proper task coordination with abort signaling
  4. Follows project conventions (hard tabs, structured logging, workspace dependencies)
  5. Import cleanup - Removed unused JoinHandle import

Issues Found

Minor: Documentation Typos

  • keepalive_task.rs:10 - "Periodically pings writes keepalive" should be "Periodically writes keepalive pings"
  • keepalive_task.rs:13 - Missing space: "///Only" should be "/// Only"

Logic: Result Matching (lib.rs:511-522)

The match expression only handles cases where the first two tasks produce results. If ping_res or keepalive_res contain non-aborted results while both tunnel tasks are aborted, those results would be ignored. Likely not a practical issue given abort signaling.

Performance

  • Jitter (0-128ms) prevents thundering herd - good
  • Interval (45s default) provides good margin before 90s threshold
  • Task spawning overhead is minimal with efficient early return

Security

No issues identified - no new external inputs, no auth changes, uses existing DB operations.

Test Coverage

Concern: No tests added. Recommend adding tests for:

  1. Keepalive pings during active connections
  2. Proper abort on connection close
  3. Timing verification

Verdict

Overall: Solid refactoring with good code organization

Approve with minor suggestions

Action items (non-blocking):

  1. Fix documentation typos
  2. Consider result matching logic
  3. Add test coverage

@MasterPtato force-pushed the 11-19-fix_rivetkit_fix_graceful_runner_shutdown branch from 0ed702c to a11d203 on November 21, 2025 00:48
@MasterPtato force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 986289f to 815c918 on November 21, 2025 00:48

claude bot commented Nov 21, 2025

Code Review

Overview

This PR refactors the keepalive task for hibernating WebSocket requests by extracting it into a separate module and running it during both active WebSocket connections and hibernation periods. The change ensures that hibernating requests are consistently pinged to maintain their eligibility status.

✅ Strengths

  1. Good Code Organization: Extracting the keepalive logic into a dedicated module (keepalive_task.rs) improves code maintainability and reusability.

  2. Proper Cleanup: The PR correctly handles task abortion using watch channels and waits for task completion before proceeding, avoiding resource leaks.

  3. Consistent Patterns: The new task follows the same patterns as ping_task.rs with abort handling via watch::Receiver.

  4. Jitter Implementation: The random jitter (0-128ms) prevents stampeding herd problems when multiple requests ping simultaneously.

🔍 Issues & Concerns

1. Documentation Typo (Minor)

/// Periodically pings writes keepalive in UDB. This is used to restore hibernating request IDs on
///
///Only ran for hibernating requests.
  • Line 10: "pings writes" should be "writes" or "pings"
  • Line 13: Missing space after ///
  • Line 13: "Only ran" should be "Only run"

Suggestion:

/// Periodically writes keepalive data to UDB. This is used to restore hibernating request IDs on
/// next actor start.
///
/// Only run for hibernating requests.

2. Result Precedence Logic May Be Incomplete (Medium)

// Prefer non aborted result if all succeed
(Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
(Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
// Unlikely case
(res, _, _, _) => res,

This logic only handles cases where one of the first two tasks returns a non-aborted result. It doesn't handle:

  • Ping task returning non-aborted while others are aborted: (Ok(Aborted), Ok(Aborted), Ok(res), _)
  • Keepalive task returning non-aborted while others are aborted: (Ok(Aborted), Ok(Aborted), Ok(Aborted), Ok(res))

Impact: If the ping or keepalive task completes first with a non-aborted result, it would fall through to the "unlikely case" and return the first task's aborted result, potentially losing important lifecycle information.

Suggestion: Add comprehensive pattern matching:

// Prefer non aborted result if all succeed
(Ok(res @ LifecycleResult::ServerClose(_)), _, _, _)
| (_, Ok(res @ LifecycleResult::ServerClose(_)), _, _)
| (_, _, Ok(res @ LifecycleResult::ServerClose(_)), _)
| (_, _, _, Ok(res @ LifecycleResult::ServerClose(_))) => Ok(res),

(Ok(res @ LifecycleResult::ClientClose(_)), _, _, _)
| (_, Ok(res @ LifecycleResult::ClientClose(_)), _, _)
| (_, _, Ok(res @ LifecycleResult::ClientClose(_)), _)
| (_, _, _, Ok(res @ LifecycleResult::ClientClose(_))) => Ok(res),

// All aborted
_ => Ok(LifecycleResult::Aborted),

3. Potential Race Condition (Low)

In handle_websocket_hibernation, the keepalive task is spawned and then aborted immediately after handle_websocket_hibernation_inner completes. If the inner function returns quickly, the keepalive task might not have completed its first upsert operation at lines 580-587 before being aborted at line 615.

While this is unlikely to cause issues (the initial upsert at lines 580-587 happens before spawning the task), it's worth verifying that the timing is correct.

4. Error Handling in TryInto (Low)

(ctx.config()
    .pegboard()
    .hibernating_request_eligible_threshold()
    / 2)
.try_into()?,

The error from try_into could be more descriptive. If hibernating_request_eligible_threshold / 2 doesn't fit into the target type, the error message won't indicate what went wrong.

Suggestion: Add context:

.try_into()
.context("hibernating_request_eligible_threshold / 2 overflow")?

5. Import Organization (Style)

The imports in lib.rs were reformatted (lines 10-25), mixing alphabetical ordering with grouped imports. This appears to follow the codebase style, but double-check that it matches the project's import conventions.

🧪 Test Coverage

Concern: No tests were added or modified in this PR. Given the complexity of the lifecycle management and the new keepalive task integration, consider adding tests for:

  1. Keepalive task properly updates UDB at the expected interval
  2. Keepalive task respects abort signals
  3. Result precedence logic correctly prioritizes non-aborted results
  4. Keepalive task only runs when can_hibernate is true
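
If tests are added, a self-contained sketch of the abort/interval check could look like the following, using tokio's paused clock (requires the test-util feature); fake_keepalive is a hypothetical stand-in shaped like the real task, not the gateway's API:

use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};
use std::time::Duration;

use tokio::sync::watch;

// Hypothetical stand-in shaped like the keepalive task: tick on an interval,
// record a "keepalive write", and exit when the abort channel fires.
async fn fake_keepalive(
    interval_ms: u64,
    writes: Arc<AtomicUsize>,
    mut abort_rx: watch::Receiver<()>,
) {
    let mut interval = tokio::time::interval(Duration::from_millis(interval_ms));
    interval.tick().await; // discard the immediate first tick
    loop {
        tokio::select! {
            _ = interval.tick() => {
                writes.fetch_add(1, Ordering::SeqCst);
            }
            _ = abort_rx.changed() => break,
        }
    }
}

#[tokio::test(start_paused = true)]
async fn keepalive_ticks_then_stops_on_abort() {
    let writes = Arc::new(AtomicUsize::new(0));
    let (abort_tx, abort_rx) = watch::channel(());
    let handle = tokio::spawn(fake_keepalive(45_000, writes.clone(), abort_rx));

    // With the clock paused, sleeping auto-advances time past two ticks.
    tokio::time::sleep(Duration::from_millis(100_000)).await;
    assert!(writes.load(Ordering::SeqCst) >= 2);

    // Abort and confirm the task exits instead of ticking forever.
    abort_tx.send(()).expect("task still listening");
    handle.await.expect("task exits cleanly");
}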

📊 Performance Considerations

The keepalive task runs continuously during active WebSocket connections when can_hibernate is true, adding periodic database operations. Ensure that:

  • The UDB upsert operation is efficient
  • The interval (hibernating_request_eligible_threshold / 2) is appropriately tuned
  • The jitter range (0-128ms) is sufficient for your scale

🔒 Security

No security concerns identified. The keepalive mechanism appears to be a legitimate internal operation.

📝 Recommendations

  1. High Priority: Fix the result precedence logic to handle all task completion scenarios
  2. Medium Priority: Correct documentation typos
  3. Medium Priority: Add test coverage for the new keepalive task
  4. Low Priority: Improve error messages for try_into
  5. Low Priority: Verify timing in handle_websocket_hibernation to ensure no race conditions

Overall, this is a solid refactoring that improves code organization. The main concern is the result precedence logic which should be addressed before merging.

@claude

claude bot commented Nov 21, 2025

PR Review: Hibernating Request Keepalive Enhancement

Summary

This PR improves the keepalive mechanism for hibernating WebSocket requests by ensuring they are pinged both during active WebSocket connections and during hibernation periods. The implementation extracts the keepalive logic into a dedicated module and runs it as a parallel task.


Positive Aspects

Good architectural separation - Extracting keepalive logic into keepalive_task.rs follows the same pattern as ping_task.rs and ws_to_tunnel_task.rs, improving code organization and maintainability.

Proper task lifecycle management - The keepalive task is correctly integrated with abort channels and proper cleanup on task completion.

Consistent error handling - Uses the same LifecycleResult pattern as other tasks in the module.

Jitter implementation - The random jitter (0-128ms) prevents thundering herd problems when multiple requests update simultaneously.


Issues & Recommendations

🔴 Critical: Missing Space in Comment (keepalive_task.rs:13)

///Only ran for hibernating requests.

Should be:

/// Only ran for hibernating requests.

Missing space after /// violates Rust documentation standards.


🟡 Medium: Potential Integer Overflow (keepalive_task.rs:21-26)

let mut ping_interval = tokio::time::interval(Duration::from_millis(
    (ctx.config()
        .pegboard()
        .hibernating_request_eligible_threshold()
        / 2)
    .try_into()?,
));

Issue: The division by 2 happens on i64, and the result is then converted to u64 for Duration::from_millis. With the default threshold of 90,000ms this gives a 45,000ms interval, which is reasonable; however, if someone configures a negative value, try_into() will return an error at runtime.

Recommendation: Add validation or use saturating_div(2) and handle edge cases more explicitly:

let threshold_ms = ctx.config()
    .pegboard()
    .hibernating_request_eligible_threshold()
    .max(1000) // Ensure minimum reasonable threshold
    / 2;
let ping_interval_ms: u64 = threshold_ms.try_into()
    .context("hibernating_request_eligible_threshold must be positive")?;

🟡 Medium: Task Result Priority Logic (lib.rs:511-523)

let mut lifecycle_res = match (tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res) {
    // Prefer error
    (Err(err), _, _, _) => Err(err),
    (_, Err(err), _, _) => Err(err),
    (_, _, Err(err), _) => Err(err),
    (_, _, _, Err(err)) => Err(err),
    // Prefer non aborted result if all succeed
    (Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
    (Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
    // Unlikely case
    (res, _, _, _) => res,
};

Issue: The priority logic only checks the first two tasks for non-aborted results. If ping_res or keepalive_res completes with a non-aborted result while the first two are aborted, it won't be selected.

Recommendation: Make the priority logic exhaustive:

let mut lifecycle_res = match (tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res) {
    // Prefer error
    (Err(err), _, _, _) => Err(err),
    (_, Err(err), _, _) => Err(err),
    (_, _, Err(err), _) => Err(err),
    (_, _, _, Err(err)) => Err(err),
    // Prefer non-aborted result if all succeed
    (Ok(res @ LifecycleResult::ServerClose(_)), _, _, _) 
    | (_, Ok(res @ LifecycleResult::ServerClose(_)), _, _)
    | (_, _, Ok(res @ LifecycleResult::ServerClose(_)), _)
    | (_, _, _, Ok(res @ LifecycleResult::ServerClose(_))) => Ok(res),
    (Ok(res @ LifecycleResult::ClientClose(_)), _, _, _)
    | (_, Ok(res @ LifecycleResult::ClientClose(_)), _, _)
    | (_, _, Ok(res @ LifecycleResult::ClientClose(_)), _)
    | (_, _, _, Ok(res @ LifecycleResult::ClientClose(_))) => Ok(res),
    // All aborted
    (Ok(LifecycleResult::Aborted), Ok(LifecycleResult::Aborted), _, _) => Ok(LifecycleResult::Aborted),
    (res, _, _, _) => res,
};

Or use a helper function for clearer logic.
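
A sketch of such a helper, using a simplified stand-in for LifecycleResult (the real variants carry payloads) and assuming anyhow, which the crate already uses:

// Simplified stand-in for the gateway's LifecycleResult; the real variants
// carry payloads (e.g. close frames), which are omitted here.
#[derive(Debug, Clone, PartialEq)]
enum LifecycleResult {
    ServerClose,
    ClientClose,
    Aborted,
}

/// Collapse the four task results into one: any error wins, then ServerClose,
/// then ClientClose, and only if every task was aborted do we report Aborted.
fn prefer_meaningful(
    results: [anyhow::Result<LifecycleResult>; 4],
) -> anyhow::Result<LifecycleResult> {
    let mut oks = Vec::with_capacity(results.len());
    for res in results {
        oks.push(res?);
    }
    if oks.contains(&LifecycleResult::ServerClose) {
        Ok(LifecycleResult::ServerClose)
    } else if oks.contains(&LifecycleResult::ClientClose) {
        Ok(LifecycleResult::ClientClose)
    } else {
        Ok(LifecycleResult::Aborted)
    }
}

fn main() -> anyhow::Result<()> {
    // A non-aborted result from the keepalive slot is no longer lost.
    let res = prefer_meaningful([
        Ok(LifecycleResult::Aborted),
        Ok(LifecycleResult::Aborted),
        Ok(LifecycleResult::Aborted),
        Ok(LifecycleResult::ClientClose),
    ])?;
    assert_eq!(res, LifecycleResult::ClientClose);
    Ok(())
}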


🟢 Minor: Redundant Variable Cloning (lib.rs:424-428)

let ctx_clone = self.ctx.clone();
let actor_id_clone = self.actor_id;
let gateway_id_clone = self.shared_state.gateway_id();
let request_id_clone = request_id;

Issue: These clones are only used once in the tokio::join! below. The naming with the _clone suffix is slightly verbose.

Recommendation: Either inline these into the spawn call or use more descriptive names if they serve a documentation purpose. This is purely stylistic.


🟢 Minor: Inconsistent Abort Order (lib.rs:439-441, 455-457, 471-473, 499-501)

The order of abort signals varies across the different task completion handlers. While this doesn't affect correctness (all are independent), consistency would improve readability.

Recommendation: Use consistent ordering, e.g., always ping → ws_to_tunnel → tunnel_to_ws → keepalive.


🔵 Question: Performance - Unnecessary Task Spawn in Non-Hibernating Case

In lib.rs:480-507, when can_hibernate is false, the code still wraps the immediate return in a tokio::spawn:

async {
    if !can_hibernate {
        return Ok(LifecycleResult::Aborted);
    }
    // ... spawn keepalive task
}

Question: Could this be optimized to conditionally spawn the task outside the tokio::join! to avoid the overhead of spawning an async block that immediately returns?

let keepalive_res = if can_hibernate {
    let keepalive = tokio::spawn(keepalive_task::task(...));
    // Handle in join
} else {
    // Skip entirely or create a completed future
};

However, this would require restructuring the tokio::join! pattern. The current approach is simpler and the overhead is minimal, so this is a minor optimization opportunity.
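
One way to explore that optimization while keeping the join structure is to await an optional handle; below is a minimal sketch under those assumptions (join_optional and the trimmed LifecycleResult are illustrative, not the gateway's actual API):

use tokio::task::JoinHandle;

// Simplified result type; the real LifecycleResult has more variants.
#[derive(Debug)]
enum LifecycleResult {
    Aborted,
}

// Hypothetical helper: await the handle if a task was actually spawned,
// otherwise report Aborted without ever having spawned anything.
async fn join_optional(
    handle: Option<JoinHandle<anyhow::Result<LifecycleResult>>>,
) -> anyhow::Result<LifecycleResult> {
    match handle {
        Some(handle) => handle.await?,
        None => Ok(LifecycleResult::Aborted),
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let can_hibernate = false;

    // Only spawn the keepalive-style task when hibernation is possible.
    let keepalive: Option<JoinHandle<anyhow::Result<LifecycleResult>>> = can_hibernate
        .then(|| tokio::spawn(async { anyhow::Ok(LifecycleResult::Aborted) }));

    // In the real code this await would sit inside the existing tokio::join!
    // alongside the tunnel, ping, and websocket tasks.
    let keepalive_res = join_optional(keepalive).await;
    println!("{keepalive_res:?}");
    Ok(())
}

This skips the spawn entirely for non-hibernating connections while the surrounding join stays unchanged.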


Test Coverage

⚠️ No tests included - This PR doesn't add tests for the new keepalive task behavior. Consider adding:

  1. Unit tests for keepalive_task::task to verify interval timing and abort handling
  2. Integration tests to ensure keepalive pings occur during both active connections and hibernation
  3. Tests verifying that the keepalive task properly cleans up on abort

Security & Performance

No security concerns - The changes don't introduce new attack vectors.

Performance impact minimal - The keepalive task runs at 45-second intervals (default), which is very low overhead.


Conclusion

Overall Assessment: Approve with minor fixes recommended

The PR successfully addresses the issue of keeping hibernating requests alive during both active connections and hibernation. The code follows existing patterns well and integrates cleanly with the existing task lifecycle management.

Required fixes:

  • Fix the documentation comment spacing in keepalive_task.rs:13

Recommended improvements:

  • Add validation for the threshold configuration
  • Improve the result priority logic for better coverage
  • Add test coverage

Style improvements (optional):

  • Consistent abort signal ordering
  • Consider optimization for non-hibernating case

Great work on improving the hibernation reliability! 🚀

@NathanFlurry NathanFlurry force-pushed the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch from 815c918 to 986289f Compare November 21, 2025 01:27
@NathanFlurry NathanFlurry force-pushed the 11-19-fix_rivetkit_fix_graceful_runner_shutdown branch from a11d203 to 0ed702c Compare November 21, 2025 01:27
@claude

claude bot commented Nov 21, 2025

Code Review - PR #3498

Summary

This PR refactors the keepalive task for hibernating WebSocket requests by extracting it into a separate module and ensuring it runs both during active WebSocket connections (when can_hibernate is true) and during actual hibernation periods.

Positive Aspects

  1. Good separation of concerns: Extracting the keepalive logic into keepalive_task.rs improves code organization and reusability.

  2. Consistent cleanup pattern: The use of watch channels for graceful task shutdown is well-implemented and follows the existing pattern used by other tasks (ping, tunnel_to_ws, ws_to_tunnel).

  3. Proper jitter implementation: The random jitter (0-128ms) prevents stampeding herd issues when multiple requests update their keepalive timestamps.

  4. Correct interval calculation: Using hibernating_request_eligible_threshold / 2 ensures keepalive pings happen frequently enough to prevent requests from being considered disconnected.
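
For instance, with the default 90,000ms threshold the task writes roughly every 45,000ms, so the most recent keepalive is never more than about half the threshold old (plus the 0-128ms jitter and write latency), leaving a comfortable margin before the request would be treated as disconnected.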

Issues & Concerns

1. Typo in Documentation (Minor)

Location: keepalive_task.rs:10

///Only ran for hibernating requests.

Should be:

/// Only ran for hibernating requests.

Missing space after ///.

2. Potential Integer Overflow (Low severity)

Location: keepalive_task.rs:21-26

let mut ping_interval = tokio::time::interval(Duration::from_millis(
    (ctx.config()
        .pegboard()
        .hibernating_request_eligible_threshold()
        / 2)
    .try_into()?,
));

While the try_into() will catch overflow at runtime, the division by 2 is performed on i64 before conversion to u64 for Duration::from_millis. The default threshold is 90,000ms, so overflow is unlikely, but consider using checked arithmetic for robustness:

let interval_ms = ctx.config()
    .pegboard()
    .hibernating_request_eligible_threshold()
    .checked_div(2)
    .context("invalid hibernating_request_eligible_threshold")?;
let interval_ms: u64 = interval_ms.try_into()?;
let mut ping_interval = tokio::time::interval(Duration::from_millis(interval_ms));

3. Task Ordering Logic Could Be Clearer (Minor)

Location: lib.rs:511-523

The pattern matching logic for determining the final lifecycle_res is somewhat difficult to reason about:

let mut lifecycle_res = match (tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res) {
    (Err(err), _, _, _) => Err(err),
    (_, Err(err), _, _) => Err(err),
    (_, _, Err(err), _) => Err(err),
    (_, _, _, Err(err)) => Err(err),
    (Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
    (Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
    (res, _, _, _) => res,
};

This prioritizes errors from tasks in order (tunnel_to_ws > ws_to_tunnel > ping > keepalive), but the last two patterns only handle the first two tasks' results when deciding between Aborted and non-Aborted. The ping and keepalive results are not considered in the "prefer non-aborted" logic. This might be intentional, but it's worth verifying if this is the desired behavior.

4. Missing Error Context (Minor)

Location: keepalive_task.rs:46-51

When the database operation fails, it propagates up without context:

ctx.op(pegboard::ops::actor::hibernating_request::upsert::Input {
    actor_id,
    gateway_id,
    request_id,
})
.await?;

Consider adding context for debugging:

ctx.op(pegboard::ops::actor::hibernating_request::upsert::Input {
    actor_id,
    gateway_id,
    request_id,
})
.await
.context("failed to upsert hibernating request keepalive")?;

Performance Considerations

  1. Database load: The keepalive task now runs during active WebSocket connections (when can_hibernate is true), not just during hibernation. This means more frequent database writes (every ~45 seconds with the default 90s threshold). Ensure this doesn't cause excessive database load in scenarios with many concurrent hibernation-capable WebSocket connections.

  2. Task overhead: Adding a fourth concurrent task (keepalive) to the WebSocket lifecycle adds some overhead. However, since it only runs when can_hibernate is true and the logic is simple, this should be minimal.
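
As a rough estimate for the database-load point above (assuming the default 90s threshold and therefore a ~45s interval): N concurrent hibernation-capable connections produce on the order of N / 45 upserts per second, roughly 220/s at 10,000 connections and 2,200/s at 100,000.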

Test Coverage

No tests were added or modified in this PR. Consider adding tests for:

  • Keepalive task starting/stopping correctly based on can_hibernate flag
  • Keepalive pings occurring at the expected interval
  • Proper cleanup when tasks are aborted
  • Behavior when database operations fail

Security Concerns

No security issues identified. The changes maintain the existing security boundaries.

Recommendations

  1. Fix the documentation typo in keepalive_task.rs:10
  2. Consider using checked arithmetic for the interval calculation
  3. Add error context to the database operation in the keepalive loop
  4. Verify the task result prioritization logic matches intended behavior
  5. Add test coverage for the new keepalive task behavior

Conclusion

The PR successfully addresses the goal of ensuring hibernating requests are pinged during both active connections and hibernation. The code quality is good overall with just minor improvements suggested. The refactoring improves maintainability by extracting the keepalive logic into its own module.

Status: Approve with minor suggestions for improvement.

@claude

claude bot commented Nov 21, 2025

Code Review

Overview

This PR extracts the keepalive logic for hibernating requests into a dedicated task module and ensures it runs both during active WebSocket connections and during hibernation. This is a good refactoring that improves code organization and fixes a potential issue with keepalive timing.


Positive Aspects

Good Refactoring: Extracting the keepalive logic into keepalive_task.rs follows the existing pattern used by ping_task.rs and improves code maintainability.

Proper Cancellation: The task correctly uses watch::Receiver for graceful shutdown and returns LifecycleResult::Aborted when cancelled.

Consistent Pattern: The implementation follows the same abort/cleanup pattern as the other lifecycle tasks (tunnel_to_ws, ws_to_tunnel, ping).

Anti-Stampede: The jitter mechanism (rand::thread_rng().gen_range(0..128)) helps prevent thundering herd issues.


Issues Found

🐛 Critical: Typo in Documentation (Line 13)

///Only ran for hibernating requests.

Missing space after ///. Should be:

/// Only ran for hibernating requests.

🐛 Critical: Comment Inaccuracy (Line 30-31)

// Discard the first tick since it fires immediately and we've already called this
// above

This comment refers to "already called this above" but that's incorrect in this context. In the hibernation path (handle_websocket_hibernation at lib.rs:571), the upsert operation IS called before spawning this task (lines 582-587), so the comment makes sense there. However, in the WebSocket connection path (handle_websocket at lib.rs:396+), this task only runs if can_hibernate is true (lines 474-475), and there's no prior upsert call in that path.

Recommendation: Update the comment to be more accurate:

// Discard the first tick since it fires immediately

⚠️ Potential Issue: Inconsistent Keepalive Behavior

The keepalive task now runs in two different contexts:

  1. During active WebSocket connection (lib.rs:474-507): Only if can_hibernate is true
  2. During hibernation (lib.rs:603-616): Always runs

In the active connection case, the task is spawned alongside the other lifecycle tasks but immediately returns Aborted if can_hibernate is false (lines 474-476). This works but adds an unnecessary task spawn for non-hibernating connections.

Recommendation: Consider using an Option<JoinHandle> pattern to avoid spawning the task when not needed:

let keepalive = if can_hibernate {
    Some(tokio::spawn(keepalive_task::task(...)))
} else {
    None
};

⚠️ Minor: Potential Integer Conversion Issues

At keepalive_task.rs:21-27, the conversion from i64 to u64 via try_into()? could theoretically fail if hibernating_request_eligible_threshold is negative (though the default is 90,000ms). The error handling is correct, but this seems unlikely to occur in practice given the configuration.


Code Quality

Good:

  • Follows Rust idioms and error handling patterns
  • Proper use of structured logging with tracing
  • Clean separation of concerns
  • Follows existing code style in the repository

Import Organization: The imports in lib.rs have been auto-sorted (lines 10-17), which is fine. CLAUDE.md's guideline about adding imports at the top of the file instead of inline within functions is not violated here - the module-level imports are correct.


Security Considerations

✅ No security concerns identified. The keepalive mechanism correctly updates distributed state and doesn't introduce any new attack vectors.


Performance Considerations

Jitter Implementation: The 0-128ms jitter is a good practice for distributed systems.

⚠️ Minor Overhead: Spawning the keepalive task even when can_hibernate is false adds minimal overhead (one task spawn that immediately returns), but it's worth considering the optimization mentioned above.


Test Coverage

Missing Tests: There are no test files for the pegboard-gateway package. While this PR maintains existing behavior, integration tests would be valuable for:

  • Verifying keepalive task runs during active connections (when can_hibernate is true)
  • Verifying keepalive task runs during hibernation
  • Verifying proper cleanup when tasks are aborted
  • Testing the jitter mechanism doesn't cause timing issues

Note: This is not a blocker for this PR, but worth considering for future work.


Recommendations Summary

Must Fix:

  1. Fix the typo in the doc comment (line 13)

Should Consider:
2. Update the comment at line 30-31 to be more accurate
3. Consider using Option<JoinHandle> to avoid spawning unnecessary tasks

Nice to Have:
4. Add integration tests for keepalive behavior


Verdict

This is a solid refactoring that improves code organization. The critical issues are minor (typo and comment accuracy). The code correctly implements the desired behavior of running keepalive both during active hibernation-capable connections and during actual hibernation.

Recommendation: Approve with minor fixes requested.

@graphite-app
Contributor

graphite-app bot commented Nov 21, 2025

Merge activity

  • Nov 21, 2:40 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 21, 2:42 AM UTC: CI is running for this pull request on a draft pull request (#3515) due to your merge queue CI optimization settings.
  • Nov 21, 2:43 AM UTC: Merged by the Graphite merge queue via draft PR: #3515.

graphite-app bot pushed a commit that referenced this pull request Nov 21, 2025
@graphite-app graphite-app bot closed this Nov 21, 2025
@graphite-app graphite-app bot deleted the 11-19-fix_pegboard-gateway_ping_hibernating_requests_both_during_open_hws_connections_and_during_hibernation branch November 21, 2025 02:43