Conversation

@MasterPtato
Contributor

No description provided.

@vercel

vercel bot commented Nov 27, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 27, 2025 0:58am
rivet-inspector Ignored Ignored Preview Nov 27, 2025 0:58am
rivet-site Ignored Ignored Preview Nov 27, 2025 0:58am
rivetkit-serverless Skipped Skipped Nov 27, 2025 0:58am

Contributor Author

MasterPtato commented Nov 27, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new

pkg-pr-new bot commented Nov 27, 2025

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3550

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3550

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3550

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3550

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3550

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3550

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3550

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3550

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3550

commit: f5a5102

@claude

claude bot commented Nov 27, 2025

Code Review

Thank you for this PR! This adds important shutdown observability and graceful termination handling. Here is my feedback:

Strengths

  1. Good observability improvements - The periodic shutdown progress messages will help operators understand what is happening during deployments
  2. Consistent pattern - You applied the same monitoring pattern across worker.rs and server.rs
  3. Proper WebSocket signaling - The new GoingAway error with structured error handling follows the existing pattern nicely
  4. Atomic operations - Correct use of Ordering::Acquire and Ordering::Release for reading/writing the hyper_shutdown flag

Issues and Suggestions

1. Missing SIGTERM handling in tunnel_to_ws_task.rs

The ws_to_tunnel_task.rs now handles SIGTERM and sends the GoingAway error, but tunnel_to_ws_task.rs does not have the same logic. This creates an asymmetry in how the two directions handle graceful shutdown.

Suggestion: Add the same SIGTERM handling to tunnel_to_ws_task.rs for consistency. The runner should evict gracefully from both directions.
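A minimal sketch of what symmetric handling could look like, assuming both directions watch one shared termination flag. It uses std threads and channels rather than the async tasks in the PR, and a plain `AtomicBool` stands in for the real SIGTERM handler. `ExitReason`, `forward_until_shutdown`, and `shutdown_both_directions` are hypothetical names, not the PR's API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Why a forwarding loop stopped (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum ExitReason {
    /// The termination flag was observed; the caller would send GoingAway.
    GoingAway,
    /// The sending side of the channel was dropped.
    PeerClosed,
}

/// One direction of the proxy: forwards messages until the flag is set.
pub fn forward_until_shutdown(inbox: Receiver<String>, term: Arc<AtomicBool>) -> ExitReason {
    loop {
        // Check the flag on every iteration so neither direction lingers.
        if term.load(Ordering::Acquire) {
            return ExitReason::GoingAway;
        }
        match inbox.recv_timeout(Duration::from_millis(10)) {
            Ok(_message) => {} // a real task would forward the message here
            Err(RecvTimeoutError::Timeout) => {} // nothing to forward, re-check the flag
            Err(RecvTimeoutError::Disconnected) => return ExitReason::PeerClosed,
        }
    }
}

/// Runs both directions against the same flag and reports how each one exited.
pub fn shutdown_both_directions() -> (ExitReason, ExitReason) {
    let term = Arc::new(AtomicBool::new(false));
    let (ws_tx, ws_rx) = mpsc::channel::<String>();
    let (tunnel_tx, tunnel_rx) = mpsc::channel::<String>();

    let ws_to_tunnel = {
        let term = Arc::clone(&term);
        thread::spawn(move || forward_until_shutdown(ws_rx, term))
    };
    let tunnel_to_ws = {
        let term = Arc::clone(&term);
        thread::spawn(move || forward_until_shutdown(tunnel_rx, term))
    };

    ws_tx.send("ping".to_string()).expect("receiver is alive");
    // Stand-in for the signal handler firing on SIGTERM.
    term.store(true, Ordering::Release);

    let reasons = (ws_to_tunnel.join().unwrap(), tunnel_to_ws.join().unwrap());
    // The senders are kept alive until both loops exit, so neither sees a disconnect.
    drop((ws_tx, tunnel_tx));
    reasons
}
```

In this sketch both loops exit with `ExitReason::GoingAway`, which is the symmetry the suggestion asks for.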

2. Missing newline in JSON error file

The file engine/artifacts/errors/ws.going_away.json is missing a trailing newline. This goes against the common convention that text files end with a newline, and it shows up as "\ No newline at end of file" in diffs.

Suggestion: Add a newline at the end of the JSON file.

3. Memory ordering is correct

You are using Ordering::Release for the store and Ordering::Acquire for the load on hyper_shutdown. This is correct for ensuring visibility of the write to readers. Good job!
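The Release/Acquire pairing can be illustrated with a small std-only sketch. `ShutdownState`, its `drained_connections` field, and `run_demo` are invented for this example; only the `hyper_shutdown` flag name comes from the PR.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the server's shutdown state.
pub struct ShutdownState {
    /// Data written before the flag is published.
    drained_connections: AtomicUsize,
    hyper_shutdown: AtomicBool,
}

impl ShutdownState {
    pub fn new() -> Self {
        Self {
            drained_connections: AtomicUsize::new(0),
            hyper_shutdown: AtomicBool::new(false),
        }
    }

    /// Writer side: record the data, then publish the flag with Release.
    pub fn mark_hyper_shutdown(&self, drained: usize) {
        self.drained_connections.store(drained, Ordering::Relaxed);
        self.hyper_shutdown.store(true, Ordering::Release);
    }

    /// Reader side: an Acquire load that sees `true` also sees every write
    /// the writer made before its Release store.
    pub fn observe(&self) -> Option<usize> {
        if self.hyper_shutdown.load(Ordering::Acquire) {
            Some(self.drained_connections.load(Ordering::Relaxed))
        } else {
            None
        }
    }
}

/// Spawns a writer thread and spins until the reader observes the flag.
pub fn run_demo() -> usize {
    let state = Arc::new(ShutdownState::new());
    let writer = {
        let state = Arc::clone(&state);
        thread::spawn(move || state.mark_hyper_shutdown(3))
    };
    let seen = loop {
        if let Some(count) = state.observe() {
            break count;
        }
        thread::yield_now();
    };
    writer.join().unwrap();
    seen
}
```

Once the reader sees the flag it reads the value the writer stored before publishing. With `Relaxed` on the flag itself, that visibility would not be guaranteed.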

4. Consider extracting the shutdown interval constant

Both gasoline/worker.rs and guard-core/server.rs define SHUTDOWN_PROGRESS_INTERVAL as Duration::from_secs(7).

Suggestion: If you expect this to be consistent across services, consider extracting it to a shared configuration or runtime module.
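One possible shape for the extraction, as a sketch only. The `shutdown` module, `progress_interval_from`, and the two accessor functions are hypothetical; the 7-second value is the one quoted above. The optional override parser is an assumption about how a configurable interval might be read, not something the PR contains.

```rust
use std::time::Duration;

/// Hypothetical shared module that both services would import.
pub mod shutdown {
    use std::time::Duration;

    /// Single source of truth for the progress logging interval.
    pub const SHUTDOWN_PROGRESS_INTERVAL: Duration = Duration::from_secs(7);

    /// Parses an optional override given in whole seconds.
    /// Falls back to the default when the value is missing, unparsable, or zero.
    pub fn progress_interval_from(raw: Option<&str>) -> Duration {
        raw.and_then(|s| s.trim().parse::<u64>().ok())
            .filter(|secs| *secs > 0)
            .map(Duration::from_secs)
            .unwrap_or(SHUTDOWN_PROGRESS_INTERVAL)
    }
}

/// What the gasoline worker would use instead of its local constant.
pub fn worker_progress_interval() -> Duration {
    shutdown::SHUTDOWN_PROGRESS_INTERVAL
}

/// What the guard server would use instead of its local constant.
pub fn guard_progress_interval() -> Duration {
    shutdown::SHUTDOWN_PROGRESS_INTERVAL
}
```

The override rejects zero because a zero-length period is usually invalid for interval timers.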

Test Coverage

I did not see any tests added for the new shutdown progress logging, the GoingAway error path in websocket handling, or the remaining_tasks method.

Suggestion: Consider adding integration tests that verify SIGTERM triggers graceful shutdown with progress messages and WebSocket connections receive the proper GoingAway close code.

Minor Observations

  1. Log message consistency: You changed "requests" to "tasks" in the guard shutdown logs, which is more accurate given that you are tracking task groups. Nice improvement!

  2. Interval tick pattern: The pattern of calling progress_interval.tick().await immediately after creating the interval is correct - it consumes the first immediate tick so the first real tick happens after the interval duration.

Priority Fixes

  1. High: Add SIGTERM handling to tunnel_to_ws_task.rs for consistency
  2. Low: Add trailing newline to ws.going_away.json
  3. Optional: Extract SHUTDOWN_PROGRESS_INTERVAL to shared constant

Overall, this is a solid improvement to shutdown observability! The main concern is the missing SIGTERM handling in the tunnel-to-ws direction.


Review generated with Claude Code

@MasterPtato MasterPtato force-pushed the 11-26-fix_runtime_evict_runners_on_sigterm_add_periodic_shutdown_progress_messages branch from 5d5591a to ef06c0e Compare November 27, 2025 00:52
@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless November 27, 2025 00:52 Inactive
@claude

claude bot commented Nov 27, 2025

PR Review: fix(runtime): evict runners on SIGTERM, add periodic shutdown progress messages

Summary

This PR improves graceful shutdown behavior across the runtime by:

  1. Sending "going away" errors to websocket clients on SIGTERM
  2. Adding periodic progress logging during shutdown for both Guard and Gasoline workers
  3. Exposing task count metrics for observability

Code Quality & Best Practices

Positive:

  • Good use of structured logging with tracing (e.g., %remaining_tasks, hyper_shutdown)
  • Consistent shutdown progress interval (7 seconds) across components
  • Proper atomic operations for thread-safe state tracking
  • Follows existing error handling patterns with the custom error system

Concerns:

  1. Memory Ordering Rationale (guard-core/src/server.rs:272,302)

    • Line 272 uses Ordering::Release for store()
    • Line 302 uses Ordering::Acquire for load()
    • This pairing is correct: it establishes the happens-before relationship the flag needs. It is subtle, though, so consider adding a comment explaining the choice, or using Ordering::SeqCst for both if simplicity matters more.
  2. Error Message Consistency (pegboard-runner/src/errors.rs:16,18)

    • The error message is duplicated in the error artifact JSON and the enum
    • If these get out of sync, it could be confusing
    • Consider generating one from the other or adding a test to verify consistency
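A consistency check along these lines might look like the sketch below. The JSON layout, the message text, and the `WsError` enum are all invented for illustration and do not reflect the real contents of ws.going_away.json or errors.rs. The field extractor is a deliberately naive std-only helper; a real test would more likely use a JSON parser.

```rust
/// Hypothetical error enum; the real one lives in pegboard-runner/src/errors.rs.
pub enum WsError {
    GoingAway,
}

impl WsError {
    /// Message as it would be declared on the Rust side (invented text).
    pub fn message(&self) -> &'static str {
        match self {
            WsError::GoingAway => "server is shutting down",
        }
    }
}

/// Invented stand-in for the artifact file's contents.
pub const GOING_AWAY_JSON: &str =
    r#"{ "code": "going_away", "group": "ws", "message": "server is shutting down" }"#;

/// Naive lookup of a string field in a flat JSON object.
/// Assumes no escaped quotes and that the key text does not also appear as a value.
pub fn extract_string_field<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\"", key);
    let after_key = &json[json.find(&needle)? + needle.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let body = after_colon.strip_prefix('"')?;
    let end = body.find('"')?;
    Some(&body[..end])
}

/// True when the artifact and the enum agree on the message.
pub fn messages_in_sync() -> bool {
    extract_string_field(GOING_AWAY_JSON, "message") == Some(WsError::GoingAway.message())
}
```

Calling `messages_in_sync()` from a unit test would then fail as soon as either side is edited without the other.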

Potential Bugs

  1. Interval Tick Consumption (guard-core/src/server.rs:284)

    progress_interval.tick().await;
    • This consumes the interval's first tick, which completes immediately, before the loop starts, so the first progress message is logged only after a full interval ✓
  2. Task Count Race Condition (minor)

    • remaining_tasks() in task_group.rs:59 uses Ordering::Acquire
    • The count could be stale by the time it's logged
    • This is acceptable for observability purposes, but worth noting this is a snapshot
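The two points above can be sketched together with std threads in place of the async runtime used in the PR. `TaskGroup` here is a simplified, hypothetical stand-in for the type in task_group.rs, and `wait_with_progress` is an invented helper. Scheduling the first report one full interval after the start plays the role of consuming the first tick.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Simplified task group that only counts running tasks.
pub struct TaskGroup {
    running: AtomicUsize,
}

impl TaskGroup {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { running: AtomicUsize::new(0) })
    }

    /// Runs `work` on a thread and decrements the counter when it returns.
    /// A panic in `work` would leave the count too high; a real version needs a guard.
    pub fn spawn<F: FnOnce() + Send + 'static>(self: &Arc<Self>, work: F) {
        self.running.fetch_add(1, Ordering::AcqRel);
        let group = Arc::clone(self);
        thread::spawn(move || {
            work();
            group.running.fetch_sub(1, Ordering::AcqRel);
        });
    }

    /// Snapshot of the running count; it may be stale as soon as it is returned.
    pub fn remaining_tasks(&self) -> usize {
        self.running.load(Ordering::Acquire)
    }
}

/// Waits for the group to drain, reporting progress every `progress_interval`.
/// Returns whether it drained before `deadline`, plus the counts that were reported.
pub fn wait_with_progress(
    group: &TaskGroup,
    progress_interval: Duration,
    deadline: Duration,
) -> (bool, Vec<usize>) {
    let start = Instant::now();
    // First report is due after one full interval, not immediately.
    let mut next_report = start + progress_interval;
    let mut reports = Vec::new();
    loop {
        let remaining = group.remaining_tasks();
        if remaining == 0 {
            return (true, reports);
        }
        let now = Instant::now();
        if now.duration_since(start) >= deadline {
            return (false, reports);
        }
        if now >= next_report {
            println!("shutdown in progress, remaining_tasks={}", remaining);
            reports.push(remaining);
            next_report += progress_interval;
        }
        thread::sleep(Duration::from_millis(5));
    }
}
```

Because each report is only a snapshot, it is used for logging alone. The decision to return is based on a fresh read at the top of each loop iteration.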

Performance Considerations

  1. Progress Interval Impact

    • 7-second intervals are reasonable and won't cause performance issues
    • Consider making this configurable if different environments need different intervals
  2. Atomic Operations

    • AtomicBool and AtomicUsize operations are efficient
    • No performance concerns

Security Concerns

None identified. The changes are focused on shutdown behavior and observability.

Test Coverage

Missing:

  1. No tests verify the new "going away" error is sent on SIGTERM
  2. No tests verify progress messages are logged during shutdown
  3. Consider adding integration tests that:
    • Trigger SIGTERM during active connections
    • Verify websocket clients receive the GoingAway error
    • Verify progress logs are emitted at expected intervals

Suggestion:
Add a test in ws_to_tunnel_task.rs that simulates SIGTERM and verifies the error response.

Additional Notes

  1. Error Artifact (engine/artifacts/errors/ws.going_away.json)

    • Missing newline at end of file (line 5)
    • This is a minor style issue but worth fixing for consistency
  2. Import Organization (guard-core/src/server.rs:3-6)

    • Imports are properly organized at the top of the file ✓
    • Follows CLAUDE.md guidelines
  3. Shutdown Duration Configuration

    • Both workers use configuration for shutdown duration
    • Good practice for operational flexibility ✓

Recommendations

High Priority:

  • Add tests for SIGTERM handling and error propagation
  • Add newline at end of ws.going_away.json

Medium Priority:

  • Consider adding a comment explaining the memory ordering choice in server.rs
  • Add validation test to ensure error messages stay in sync between JSON and Rust enum

Low Priority:

  • Consider making SHUTDOWN_PROGRESS_INTERVAL configurable if needed

Overall Assessment

This is a solid improvement to shutdown observability and correctness. The code quality is good and follows the project's conventions. The main gap is test coverage for the new behavior. The implementation is sound but would benefit from integration tests to ensure the shutdown flow works as expected under various scenarios.

Recommendation: Approve with suggestions for follow-up test additions.

@MasterPtato MasterPtato force-pushed the 11-26-fix_runtime_evict_runners_on_sigterm_add_periodic_shutdown_progress_messages branch from ef06c0e to f5a5102 Compare November 27, 2025 00:58
@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless November 27, 2025 00:58 Inactive

2 participants