
Conversation

@NathanFlurry
Member

No description provided.

@vercel

vercel bot commented Nov 6, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project              Deployment  Preview  Comments  Updated (UTC)
rivetkit-serverless  Ready       Preview  Comment   Nov 13, 2025 2:48am

3 Skipped Deployments

Project          Deployment  Preview  Updated (UTC)
rivet-cloud      Ignored              Nov 13, 2025 2:48am
rivet-inspector  Ignored     Preview  Nov 13, 2025 2:48am
rivet-site       Ignored     Preview  Nov 13, 2025 2:48am

Member Author

NathanFlurry commented Nov 6, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry force-pushed the 11-06-fix_rivetkit_properly_handle_msgindex_for_hibernatable_websocket_reconnection branch from cf4dd44 to 8f43e55 on November 6, 2025 at 20:40
@claude

claude bot commented Nov 6, 2025

Code Review for PR #3401

Summary

This PR fixes hibernatable WebSocket reconnection by properly handling msgIndex tracking in the engine driver. The main changes involve:

  1. Renaming: hibernatableWebSocket → hibernatableWebSockets (plural, more idiomatic)
  2. Centralized msgIndex tracking: Moved msgIndex updates from ActorInstance.handleRawWebSocket() to EngineActorDriver
  3. Simplified actor instance code: Removed ~100 lines of WebSocket event handling logic

Positive Aspects

1. Improved Separation of Concerns
The msgIndex tracking now lives in the engine driver where it belongs. The driver is responsible for managing hibernation state, so tracking message indices there makes architectural sense.

2. Proper msgIndex Initialization

// src/drivers/engine/actor-driver.ts:191-195
const lastMsgIndex =
    existingWs.msgIndex >= 0n
        ? Number(existingWs.msgIndex)
        : undefined;

Good handling of the sentinel value -1n to represent "no messages processed yet", converting it to undefined for the hibernation config.

3. Message Sequence Gap Detection

// src/drivers/engine/actor-driver.ts:652-668
if (event.rivetMessageIndex > currentEntry.messageIndex + 1) {
    logger().warn({
        msg: "detected gap in websocket message sequence",
        // ...
    });
}

Excellent defensive programming - this will help detect message loss or ordering issues early.

4. Comprehensive Logging
Good use of structured logging throughout, following the project's conventions:

  • Lowercase messages
  • Using ?field syntax for structured data
  • Including relevant context (actorId, requestId, msgIndex)

Issues & Concerns

1. CRITICAL: Potential Race Condition with msgIndex Updates

The msgIndex is now updated directly in the message event handler (actor-driver.ts:633) without any synchronization:

hibernatableWs.msgIndex = BigInt(event.rivetMessageIndex);

However, the actor's persist data is accessed from multiple places:

  • Message handlers (updating msgIndex)
  • Reconnection logic (reading msgIndex)
  • Persistence serialization (reading all state)

Concern: If a persistence operation reads the hibernatableWebSockets array while a message handler is updating msgIndex, there could be race conditions. TypeScript/JavaScript is single-threaded, but async operations could interleave.

Recommendation: Consider if the msgIndex updates should be queued and applied during the next persistence cycle, or if explicit locking/coordination is needed.
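One way to realize the "queued and applied during the next persistence cycle" option is a small buffer that records the highest index per socket and applies all updates in one pass just before serialization, so a snapshot can never observe a half-applied state. This is a minimal sketch with illustrative names (MsgIndexBuffer, applyTo are not the actual driver API):

```typescript
// Hypothetical sketch: buffer msgIndex updates and apply them atomically
// at persistence time, instead of mutating persisted state per message.

interface HibernatableWs {
	requestId: string;
	msgIndex: bigint;
}

class MsgIndexBuffer {
	#pending = new Map<string, bigint>();

	// Called from the message handler; cheap, no persisted state touched.
	record(requestId: string, msgIndex: bigint): void {
		const prev = this.#pending.get(requestId);
		// Keep only the highest index seen; earlier updates are subsumed.
		if (prev === undefined || msgIndex > prev) {
			this.#pending.set(requestId, msgIndex);
		}
	}

	// Called once per persistence cycle, just before serialization.
	applyTo(sockets: HibernatableWs[]): void {
		for (const ws of sockets) {
			const pending = this.#pending.get(ws.requestId);
			if (pending !== undefined && pending > ws.msgIndex) {
				ws.msgIndex = pending;
			}
		}
		this.#pending.clear();
	}
}
```

Because applyTo runs synchronously, no await point can interleave between reading and writing the array.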


2. Missing Event Listener Cleanup in handleRawWebSocket()

The simplified handleRawWebSocket() method at instance.ts:1675-1684 adds event listeners but has incomplete cleanup:

const onSocketClosed = (_event: any) => {
    this.#activeRawWebSockets.delete(websocket);
    this.#resetSleepTimer();
};

try {
    websocket.addEventListener("close", onSocketClosed);
    websocket.addEventListener("error", onSocketClosed);
} catch {}

Issue: The event listeners are never explicitly removed. While they'll be garbage collected when the websocket is closed, the old code explicitly removed them.

Recommendation: Add cleanup in the finally block or ensure the listeners are removed on close:

const onSocketClosed = (_event: any) => {
    try {
        websocket.removeEventListener("close", onSocketClosed);
        websocket.removeEventListener("error", onSocketClosed);
    } catch {}
    this.#activeRawWebSockets.delete(websocket);
    this.#resetSleepTimer();
};

3. Inconsistent Type Handling

In fromBare() at instance.ts:2228, there's an any type annotation:

hibernatableWebSockets: bareData.hibernatableWebSockets.map(
    (ws: any) => ({
        requestId: ws.requestId,
        // ...
    }),
),

Issue: Using any defeats TypeScript's type checking. The BARE schema should have proper types.

Recommendation: Import and use the proper type from the BARE schema definitions.


4. Potential Memory Leak in Ack Queue

The #hibernatableWebSocketAckQueue map (actor-driver.ts:82-86) accumulates entries but only clears them on flush. If a WebSocket disconnects before a flush, the entry remains.

Current behavior:

  • Entries added on message (line 679)
  • Cleared on periodic flush (line 327)
  • Flushed on close (line 699)

Issue: If a websocket closes between flush intervals, and the close event doesn't trigger for some reason, the entry could leak.

Recommendation: Explicitly remove from ack queue in the WebSocket close handler, before flushing.
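A sketch of that recommendation, with assumed shapes (AckEntry and the flushAck callback are illustrative, not the actual driver types): the close handler flushes this socket's pending ack and then deletes the entry, so it cannot linger until the next periodic flush or leak entirely.

```typescript
// Illustrative sketch of draining a per-socket ack entry in the close
// handler, before the periodic flush interval would reach it.

type AckEntry = { actorId: string; msgIndex: bigint };

class AckQueue {
	#entries = new Map<string, AckEntry>();

	enqueue(requestId: string, entry: AckEntry): void {
		this.#entries.set(requestId, entry);
	}

	// On close: flush this socket's pending ack, then drop the entry.
	// Returns false when there was nothing queued for this socket.
	drain(requestId: string, flushAck: (e: AckEntry) => void): boolean {
		const entry = this.#entries.get(requestId);
		if (!entry) return false;
		flushAck(entry);
		this.#entries.delete(requestId);
		return true;
	}

	get size(): number {
		return this.#entries.size;
	}
}
```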


5. Missing msgIndex Update on WebSocket Close

When a hibernatable WebSocket closes, the code removes it from the tracking array but doesn't verify the msgIndex is up-to-date.

Consider: Should the close handler flush pending acks before removing from hibernatableWebSockets array? The current order (line 699) flushes acks, but by that point, if the actor has been destroyed, the msgIndex might not be saved.


6. Lack of Test Coverage

No test files were modified/added. Given the critical nature of msgIndex tracking for message ordering and the complexity of hibernation:

Recommendation: Add tests covering:

  • WebSocket reconnection with msgIndex restoration
  • Message sequence gap detection
  • Multiple reconnections
  • Edge case: msgIndex = -1n (no messages)
  • Race conditions between message handling and persistence

Minor Issues

7. Method Rename May Break External Code

The rename from handleWebSocket() to handleRawWebSocket() (instance.ts:1657) is a breaking API change.

Impact: Any external code calling actor.handleWebSocket() will break.
Mitigation: If this is internal-only API, it's fine. Otherwise, consider deprecation path.


8. Unclear Comment About "-1" in Logging

// actor-driver.ts:199
lastMsgIndex: lastMsgIndex ?? -1,

This logs -1 when lastMsgIndex is undefined, but -1n is also the persisted sentinel value for msgIndex, so the two cases become indistinguishable in logs.

Recommendation: Log "undefined" or "none" for clarity.


Performance Considerations

Positive:

  • Debounced ack flushing (1s interval) reduces network overhead
  • Simplified event handling reduces memory footprint

Concern:

  • Every message triggers a linear search through hibernatableWebSockets array (actor-driver.ts:626). For actors with many hibernatable WebSockets, this could be O(n) per message.
  • Recommendation: Consider using a Map keyed by requestId for O(1) lookups.
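A minimal sketch of that Map-based lookup, assuming requestId is an ArrayBuffer as in the driver code; the hex-key helper and WsIndex class are illustrative names, not the actual API:

```typescript
// Sketch: O(1) lookup keyed by a hex encoding of the requestId buffer,
// replacing the linear find() performed per message.

function requestIdKey(buf: ArrayBuffer): string {
	return Array.from(new Uint8Array(buf))
		.map((b) => b.toString(16).padStart(2, "0"))
		.join("");
}

interface TrackedWs {
	requestId: ArrayBuffer;
	msgIndex: bigint;
}

class WsIndex {
	#byId = new Map<string, TrackedWs>();

	add(ws: TrackedWs): void {
		this.#byId.set(requestIdKey(ws.requestId), ws);
	}

	get(requestId: ArrayBuffer): TrackedWs | undefined {
		return this.#byId.get(requestIdKey(requestId));
	}

	remove(requestId: ArrayBuffer): void {
		this.#byId.delete(requestIdKey(requestId));
	}
}
```

Keying by a string also sidesteps repeated arrayBuffersEqual comparisons, since two distinct buffers with equal contents produce the same key.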

Security Considerations

No obvious security issues. The changes are mostly refactoring and don't introduce new attack surfaces. Proper input validation (invariant checks) remains in place.


Summary of Recommendations

High Priority:

  1. ✅ Review and document thread-safety guarantees for msgIndex updates
  2. ✅ Add proper event listener cleanup in handleRawWebSocket
  3. ✅ Add test coverage for hibernatable WebSocket reconnection scenarios

Medium Priority:
4. ✅ Fix memory leak potential in ack queue on abnormal WebSocket close
5. ✅ Replace any type with proper BARE schema type
6. ✅ Optimize hibernatable WebSocket lookup with Map data structure

Low Priority:
7. ✅ Clarify logging for undefined msgIndex values
8. ✅ Verify API compatibility for handleWebSocket → handleRawWebSocket rename


Overall, this is a solid refactoring that improves code organization and fixes the msgIndex tracking issue. The main concerns are around potential race conditions and missing cleanup logic. With the recommended fixes, this should be good to merge.

@pkg-pr-new

pkg-pr-new bot commented Nov 6, 2025


@rivetkit/actor

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/actor@3401

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3401

@rivetkit/core

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/core@3401

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3401

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3401

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3401

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3401

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3401

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3401

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3401

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3401

commit: 9f1d38e

@claude

claude bot commented Nov 6, 2025

Code Review: Fix hibernatable WebSocket msgIndex handling

Summary

This PR fixes a critical issue with message index tracking for hibernatable WebSocket reconnections. The changes move msgIndex management from the actor instance layer to the engine driver layer, ensuring proper message sequencing during WebSocket hibernation/restoration.


✅ Positive Aspects

  1. Correct Architecture: Moving msgIndex tracking to the engine driver (actor-driver.ts:619-695) is the right approach since the driver has direct access to message events and can update state atomically.

  2. Proper Naming: Renaming hibernatableWebSocket → hibernatableWebSockets (plural) is more semantically correct for an array.

  3. Method Rename: Renaming handleWebSocket → handleRawWebSocket (instance.ts:1657) improves clarity about the method's purpose.

  4. Good Logging: Added comprehensive debug/warn logging for tracking message indices, gaps, and state transitions.

  5. Edge Case Handling: The code properly handles the -1n sentinel value for new WebSockets (lines actor-driver.ts:191-195).


🐛 Potential Issues

1. Race Condition in Message Handler (Medium Priority)

Location: actor-driver.ts:622-628

The message event handler looks up the actor and hibernatable WebSocket on every message:

const actorHandler = this.#actors.get(actorId);
if (actorHandler?.actor) {
    const hibernatableWs = actorHandler.actor[PERSIST_SYMBOL]
        .hibernatableWebSockets.find((ws: any) => ...);

Issue: If the actor is unloaded/hibernated between messages, this lookup could fail even for valid hibernatable WebSockets. The warning at line 687-694 suggests this scenario is considered, but it might be expected behavior rather than an error condition.

Suggestion: Consider documenting when this warning is expected vs. when it indicates a real problem, or adjust the log level.

2. Missing WebSocket Cleanup Logic (High Priority)

Location: instance.ts:1675

The old code (removed in this PR) had logic to remove hibernatable WebSockets from the persist array when they closed:

// Old code (removed):
const wsIndex = this.#persist.hibernatableWebSocket.findIndex(...);
const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

Issue: The new code at instance.ts:1675-1678 only removes from #activeRawWebSockets but doesn't clean up the persisted hibernatable WebSocket entry. This could lead to:

  • Memory leaks with stale WebSocket entries accumulating
  • Incorrect hibernation state on reconnection
  • Confusion about which WebSockets are actually active

Suggestion: Add cleanup logic in the close handler or implement a TTL-based cleanup mechanism. The lastSeenTimestamp field suggests TTL cleanup was intended.
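A sketch of the TTL variant, using the lastSeenTimestamp field mentioned above; the 24-hour TTL and the field shapes are assumptions for illustration, not values taken from the PR:

```typescript
// Sketch: periodic sweep of stale hibernatable websocket entries based
// on lastSeenTimestamp, bounding growth of the persisted array.

interface PersistedWs {
	requestId: string;
	lastSeenTimestamp: bigint; // ms since epoch
}

// Assumed TTL; real value would likely be configurable.
const STALE_WS_TTL_MS = 24n * 60n * 60n * 1000n;

// Returns the surviving entries; run periodically or on actor wake.
function sweepStaleWebSockets(
	sockets: PersistedWs[],
	nowMs: bigint,
	ttlMs: bigint = STALE_WS_TTL_MS,
): PersistedWs[] {
	return sockets.filter((ws) => nowMs - ws.lastSeenTimestamp <= ttlMs);
}
```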

3. Type Safety Issue (Low Priority)

Location: actor-driver.ts:626

.hibernatableWebSockets.find((ws: any) => ...);

Using any type defeats TypeScript's type checking. Should use proper type from PersistedHibernatableWebSocket.


⚡ Performance Considerations

  1. Array Linear Search: Lines actor-driver.ts:625-628 perform a linear search through hibernatableWebSockets array on every message. For actors with many hibernatable WebSockets, this could become a bottleneck. Consider using a Map<string, PersistedHibernatableWebSocket> keyed by requestId.

  2. Ack Queue Batching: The 1-second flush interval (actor-driver.ts:314-317) is reasonable but consider making it configurable or adaptive based on message volume.

  3. BigInt Conversions: Frequent BigInt() conversions (e.g., actor-driver.ts:633-634) could be optimized by storing timestamps as numbers if millisecond precision suffices.


🔒 Security Concerns

No significant security issues identified. The code properly:

  • Uses invariants to validate message metadata (actor-driver.ts:616-617)
  • Maintains proper buffer comparisons for request IDs
  • Doesn't expose sensitive data in logs

Minor suggestion: Consider sanitizing or truncating requestId in logs to avoid potential information leakage.


🧪 Test Coverage

CRITICAL ISSUE: No test coverage found for hibernatable WebSocket functionality.

Required Tests:

  1. Test msgIndex persistence across hibernation/reconnection cycles
  2. Test message sequence gap detection (the new logic at lines 651-668)
  3. Test WebSocket cleanup on close
  4. Test concurrent message handling with multiple hibernatable WebSockets
  5. Test edge case with msgIndex = -1 (no messages yet)
  6. Test the ack queue flush mechanism

Recommendation: Add tests in rivetkit-typescript/packages/rivetkit/tests/ following the pattern of existing driver tests. Consider using the rivetkit-driver-test-writer agent mentioned in the codebase.


📝 Additional Observations

  1. Schema Migration: The BARE schema change (v2.bare) correctly updates the field name. Verify the migration at versioned.ts:28 handles existing persisted data correctly.

  2. Unused Event Listeners: The old code had onSocketOpened and onSocketMessage handlers that tracked state. Verify that moving this logic to the driver doesn't break any dependent functionality.

  3. Documentation: The persisted.ts comment at line 22 still references the old field name in the comment: "See PersistedActor.hibernatableWebSocket" - should be updated to hibernatableWebSockets.


📋 Recommendations

Before Merge:

  1. ✅ Fix the WebSocket cleanup issue (high priority)
  2. ✅ Add test coverage for hibernatable WebSocket scenarios
  3. ✅ Update the outdated comment in persisted.ts:22

Nice to Have:
4. Replace any type with proper type annotation
5. Consider optimizing the linear search to a Map-based lookup
6. Document the expected vs. error scenarios for the warning at line 687


Verdict

The core fix is solid and addresses the msgIndex tracking issue correctly. However, the missing cleanup logic is a significant concern that should be addressed before merge. The lack of test coverage for this critical functionality is also concerning given the complexity of hibernatable WebSocket state management.

Recommendation: Request changes to add cleanup logic and basic test coverage before merging.

@NathanFlurry force-pushed the 11-06-fix_rivetkit_properly_handle_msgindex_for_hibernatable_websocket_reconnection branch from 8f43e55 to c7a52ec on November 6, 2025 at 20:58
@claude

claude bot commented Nov 6, 2025

PR Review: Fix hibernatable WebSocket msgIndex handling

Summary

This PR fixes a critical bug in the handling of msgIndex for hibernatable WebSocket reconnection. The changes properly track message indices in the engine driver and fix a naming inconsistency (hibernatableWebSocket → hibernatableWebSockets).


Positive Observations ✅

1. Correct Architecture Changes

  • Moving msgIndex tracking from ActorInstance to EngineActorDriver is the right architectural decision. The driver is the proper place to handle Rivet-specific metadata like rivetRequestId and rivetMessageIndex.
  • The engine driver now has direct access to these fields and can update the persisted state immediately.

2. Naming Consistency

  • Fixed inconsistent naming: hibernatableWebSocket → hibernatableWebSockets (plural)
  • Applied consistently across all files including schema, persisted data, and TypeScript code

3. Good Diagnostics

The new logging and sequence checking is excellent:

if (event.rivetMessageIndex !== previousIndex + 1) {
    logger().warn({
        msg: "websocket message index out of sequence",
        sequenceType: event.rivetMessageIndex < previousIndex ? "regressed" 
                     : event.rivetMessageIndex === previousIndex ? "duplicate" 
                     : "gap/skipped",
        gap: event.rivetMessageIndex > previousIndex 
             ? event.rivetMessageIndex - previousIndex - 1 
             : 0,
    });
}

4. Proper Initial State Handling

The conversion of -1n to undefined for lastMsgIndex is correct:

const lastMsgIndex = existingWs.msgIndex >= 0n 
    ? Number(existingWs.msgIndex) 
    : undefined;

Issues & Concerns 🔴

1. Critical: Race Condition Risk

Location: actor-driver.ts:630-643

The msgIndex update happens in the message event handler, but there's no guarantee it will be persisted before actor hibernation:

hibernatableWs.msgIndex = BigInt(event.rivetMessageIndex);
hibernatableWs.lastSeenTimestamp = BigInt(Date.now());

Problem: These updates directly modify the persisted state, but ActorInstance uses throttled persistence (#savePersistThrottled()). If the actor hibernates before the throttled save occurs, the msgIndex won't be persisted.

Scenario:

  1. Message arrives with index 5
  2. Engine driver updates hibernatableWs.msgIndex = 5n
  3. Actor hibernates before throttled save completes
  4. On reconnection, actor restores with old msgIndex (e.g., 3)
  5. Messages 4-5 get replayed, causing duplicates

Recommendation:

  • Either call actor.saveState({ immediate: true }) after updating msgIndex
  • Or implement a msgIndex-specific persistence mechanism that bypasses throttling
  • Or document that this is acceptable and the application layer should handle duplicate messages
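The first option can be sketched as a small gate that bypasses the throttle on the hibernate path; the dirty flag and the stand-in for an immediate save are illustrative, not the real saveState API:

```typescript
// Sketch: flush-before-hibernate barrier. Message handling marks state
// dirty (normally persisted later by a throttled save); the hibernate
// path forces the pending msgIndex out before the snapshot is taken.

class PersistGate {
	#dirty = false;
	#saved: bigint | null = null;
	#msgIndex = -1n;

	onMessage(index: bigint): void {
		this.#msgIndex = index;
		this.#dirty = true; // throttled save would pick this up eventually
	}

	// Called on the hibernate path; bypasses the throttle when dirty.
	beforeHibernate(): void {
		if (this.#dirty) {
			this.#saved = this.#msgIndex; // stands in for an immediate save
			this.#dirty = false;
		}
	}

	get savedMsgIndex(): bigint | null {
		return this.#saved;
	}
}
```

With this barrier, the replay scenario above (hibernating between index 5 arriving and the throttled save firing) cannot leave a stale msgIndex behind.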

2. Medium: WebSocket Close Cleanup Missing

Location: actor-driver.ts:699-703 and instance.ts:1675-1684

The old code in ActorInstance.handleWebSocket (now handleRawWebSocket) removed hibernatable WebSockets from the array on close:

// OLD CODE (removed):
const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal)
);
const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

The new code doesn't remove closed hibernatable WebSockets anywhere.

Problem: The hibernatableWebSockets array will grow indefinitely with stale entries. Even though they update lastSeenTimestamp, there's no cleanup mechanism.

Recommendation:

  • Add cleanup logic in the engine driver's close handler
  • Or implement a periodic cleanup of stale hibernatable WebSockets based on lastSeenTimestamp
  • Or document if this is intentional for reconnection scenarios

3. Medium: Missing Test Coverage

The existing WebSocket tests (raw-websocket.ts) don't cover:

  • Hibernation scenarios
  • Message index tracking and recovery
  • Reconnection with proper msgIndex restoration
  • Out-of-sequence message detection

Recommendation: Add tests for hibernatable WebSocket behavior, especially:

test("should properly restore msgIndex on reconnection", async (c) => {
    // 1. Connect WebSocket
    // 2. Send messages 0-4
    // 3. Force hibernation
    // 4. Reconnect 
    // 5. Verify messages 5+ arrive without duplicates of 0-4
});

4. Low: Inconsistent Error Handling

Location: actor-driver.ts:686-696

The warning when actor is not found should probably prevent the ack from being queued, but it still allows execution to continue:

} else {
    logger().warn({
        msg: "received websocket message but actor not found for hibernatable tracking",
        // ...
    });
}
// Code continues, but no ack is queued - is this intentional?

Recommendation: Either:

  • Return early after the warning
  • Or queue the ack anyway and document why
  • Or throw an error if this represents an invalid state

Minor Suggestions 💡

1. Type Safety

Line 626: (ws: any) could be properly typed:

const hibernatableWs = actorHandler.actor[PERSIST_SYMBOL]
    .hibernatableWebSockets.find((ws: PersistedHibernatableWebSocket) =>
        arrayBuffersEqual(ws.requestId, requestIdBuf)
    );

2. Magic Number

Line 291: -1n as initial msgIndex should be documented or use a constant:

const INITIAL_MSG_INDEX = -1n; // Indicates no messages processed yet

3. Logging Consistency

Following CLAUDE.md conventions, log messages should be lowercase unless mentioning code symbols. Most are correct, but verify all instances.


Performance Considerations

1. Ack Queue Flushing

The 1-second flush interval is reasonable given the 30-second gateway timeout. Consider making this configurable if needed for different deployment scenarios.

2. Linear Search

The .find() operations on hibernatableWebSockets arrays are O(n). For actors with many hibernatable connections, consider using a Map indexed by requestId.


Security Considerations

No security issues identified. The changes maintain proper isolation and don't introduce new attack vectors.


Verdict

Approve with reservations. The core architecture change is sound, but the race condition risk (#1) and missing cleanup logic (#2) need to be addressed or explicitly documented as acceptable trade-offs.

Required Changes:

  1. ⚠️ Address the persistence race condition - either force immediate saves or document that applications must handle duplicate messages
  2. ⚠️ Add cleanup logic for closed hibernatable WebSockets - prevent memory leaks from stale entries

Recommended Changes:

  1. Add test coverage for hibernation scenarios
  2. Fix the inconsistent error handling in the message handler
  3. Improve type safety and add constants for magic numbers

Great work on identifying and fixing this issue! The diagnostic logging additions will be invaluable for debugging production issues. Just need to close the loop on persistence timing and cleanup.

@NathanFlurry force-pushed the 11-06-fix_rivetkit_properly_handle_msgindex_for_hibernatable_websocket_reconnection branch from c7a52ec to 7255d47 on November 6, 2025 at 21:06
@claude

claude bot commented Nov 6, 2025

PR Review: Hibernatable WebSocket msgIndex Handling

Summary

This PR fixes critical issues with message index (msgIndex) handling for hibernatable WebSocket reconnections. The changes move message tracking logic from the actor instance level to the engine driver level, ensuring proper state persistence and restoration.

Code Quality ✅

Strengths:

  • Clear separation of concerns: hibernatable WebSocket tracking is now properly handled at the driver level
  • Consistent naming: hibernatableWebSocket → hibernatableWebSockets (plural) across the codebase
  • Method rename: handleWebSocket → handleRawWebSocket improves clarity
  • Comprehensive logging with structured fields (following project conventions)
  • Proper cleanup in both close and error event handlers

Naming Convention:

  • Good adherence to project conventions with lowercase log messages (e.g., "updated hibernatable websocket msgIndex")
  • Structured logging with proper field names

Architecture Improvements 🎯

Better Design:
The refactoring significantly improves the architecture:

  1. Removed ~100 lines of complex event handling from ActorInstance.handleWebSocket (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1654-1773)
  2. Centralized hibernatable WebSocket tracking in the engine driver where it logically belongs
  3. Simplified actor instance code by removing hibernation-specific concerns

Potential Issues ⚠️

1. Race Condition in Message Handler (actor-driver.ts:622-696)

The code accesses actorHandler.actor[PERSIST_SYMBOL].hibernatableWebSockets without synchronization. If the actor is being destroyed or modified concurrently, this could cause issues.

Recommendation: Consider adding a check or lock mechanism before accessing the actor's persist data.

2. Duplicate Cleanup Logic (actor-driver.ts:699-776)

The cleanup code for hibernatable WebSockets is duplicated in both close and error handlers. This is ~70 lines of nearly identical code.

Recommendation: Extract into a helper method like:

#cleanupHibernatableWebSocket(actorId: string, requestIdBuf: ArrayBuffer, requestId: string, event?: any)

This would reduce duplication and make the code more maintainable.
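A sketch of what that shared helper might look like; the tracking-map shapes, log object, and naming are assumptions for illustration rather than the driver's actual internals:

```typescript
// Sketch: shared cleanup delegated to by both close and error handlers,
// so the two paths cannot drift apart.

interface CleanupLog {
	msg: string;
	requestId: string;
	error?: string;
}

function cleanupHibernatableWebSocket(
	sockets: Map<string, { msgIndex: bigint }>,
	ackQueue: Map<string, bigint>,
	requestId: string,
	error?: Error,
): CleanupLog {
	// Drop any pending ack first, then the tracking entry itself.
	ackQueue.delete(requestId);
	sockets.delete(requestId);
	return {
		msg: error
			? "cleaned up hibernatable websocket after error"
			: "cleaned up hibernatable websocket after close",
		requestId,
		...(error ? { error: error.message } : {}),
	};
}
```

The close handler would call this without an error argument and the error handler with one, keeping the log context distinct while sharing the cleanup steps.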

3. Missing Validation (actor-driver.ts:191-195)

The code converts msgIndex from BigInt to number without range checking. JavaScript numbers are 53-bit integers, while BigInt can be arbitrarily large.

Recommendation: Add bounds checking to ensure msgIndex doesn't exceed Number.MAX_SAFE_INTEGER
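A minimal checked conversion along those lines; the helper name and the choice to throw on overflow are assumptions:

```typescript
// Sketch: checked BigInt → number conversion for msgIndex, guarding the
// 53-bit safe-integer boundary.

function msgIndexToNumber(index: bigint): number | undefined {
	// -1n is the sentinel for "no messages yet"; map it to undefined.
	if (index < 0n) return undefined;
	if (index > BigInt(Number.MAX_SAFE_INTEGER)) {
		throw new RangeError(`msgIndex ${index} exceeds Number.MAX_SAFE_INTEGER`);
	}
	return Number(index);
}
```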

4. Type Safety (actor-driver.ts:626, 708, 745)

Multiple uses of any type: find((ws: any) => weakens type safety.

Recommendation: Define proper types or use existing types from the persist schema.

Performance Considerations 💡

Positive:

  • Ack queue debouncing (#flushHibernatableWebSocketAcks) with 1-second interval is good for reducing network overhead
  • Using Map for ack queue provides O(1) lookups

Potential Issue:
The message sequence validation (actor-driver.ts:650-673) could generate excessive warnings under normal network conditions (packet reordering, retransmissions).

Recommendation: Consider making the sequence validation less strict or rate-limiting the warnings.
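The rate-limiting option could be a simple token bucket in front of the warn call, so a burst of reordered packets produces a bounded number of log lines; the parameters here are illustrative:

```typescript
// Sketch: token-bucket limiter gating sequence-gap warnings.

class WarnLimiter {
	#tokens: number;
	#lastRefill: number;

	constructor(
		private maxTokens = 5, // burst allowance
		private refillPerSec = 1, // steady-state warnings per second
		now = Date.now(),
	) {
		this.#tokens = maxTokens;
		this.#lastRefill = now;
	}

	// Returns true if a warning may be emitted now.
	tryWarn(now = Date.now()): boolean {
		const elapsedSec = (now - this.#lastRefill) / 1000;
		this.#tokens = Math.min(
			this.maxTokens,
			this.#tokens + elapsedSec * this.refillPerSec,
		);
		this.#lastRefill = now;
		if (this.#tokens >= 1) {
			this.#tokens -= 1;
			return true;
		}
		return false;
	}
}
```

The driver would wrap the existing logger().warn in `if (limiter.tryWarn()) { ... }`, optionally logging a suppressed-count summary when the bucket refills.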

Security Concerns 🔒

No major security issues found, but consider:

  1. Memory leak potential: If WebSocket cleanup fails, entries could accumulate in hibernatableWebSockets array

    • Current code has warnings for missing actors, which is good
    • Consider adding periodic cleanup or size limits
  2. msgIndex manipulation: The code trusts event.rivetRequestId and event.rivetMessageIndex from events

    • Ensure these are validated/sanitized at a lower layer

Test Coverage ❌

Critical Gap: No tests found for hibernatable WebSocket functionality or msgIndex handling.

Recommendations:

  1. Add integration tests for hibernatable WebSocket reconnection scenarios
  2. Test msgIndex persistence across actor sleep/wake cycles
  3. Test edge cases:
    • Reconnection with out-of-sequence messages
    • Multiple concurrent hibernatable WebSockets
    • Actor destruction during active WebSocket
    • msgIndex at boundary values (-1, 0, MAX_SAFE_INTEGER)

Schema Changes ✅

The BARE schema change (v2.bare) is correct:

  • hibernatableWebSocket → hibernatableWebSockets (plural)
  • Migration logic properly handles this in versioned.ts

Summary

Overall Assessment: This is a solid refactoring that improves architecture and fixes real bugs. The main concerns are:

  • Code duplication in cleanup handlers (should be refactored)
  • Missing test coverage (critical for reliability)
  • Minor type safety and validation issues

Recommendation: Approve with suggestions to address the cleanup duplication and add test coverage in follow-up work.


Generated with Claude Code for PR #3401

@NathanFlurry force-pushed the 11-06-fix_rivetkit_properly_handle_msgindex_for_hibernatable_websocket_reconnection branch from 7255d47 to 22a431f on November 6, 2025 at 21:26
@claude

claude bot commented Nov 6, 2025

Code Review

Overview

This PR fixes the handling of msgIndex for hibernatable WebSocket reconnections. The changes centralize WebSocket state management in the engine driver and fix a critical bug where message indices weren't being properly tracked for restoration.


Key Changes

✅ Positive Changes

  1. Field Name Consistency - Renamed hibernatableWebSocket to hibernatableWebSockets (plural) throughout the codebase for better naming convention

    • Updated in schema, TypeScript interfaces, and all references
  2. Centralized State Management - Moved hibernatable WebSocket state tracking from ActorInstance to EngineActorDriver

    • Removes ~100 lines of duplicated/problematic logic from instance.ts:1654-1790
    • Consolidates tracking in the driver where it belongs (lines 619-696 in actor-driver.ts)
  3. Proper msgIndex Handling - Fixed critical bug where msgIndex wasn't being updated

    • Now correctly updates hibernatableWs.msgIndex on every message (line 633)
    • Handles -1n as sentinel value for "no messages processed yet" (lines 189-197)
    • Converts BigInt to number properly when passing to hibernation config
  4. Enhanced Observability - Added comprehensive debug/warn logging

    • Logs msgIndex updates with old/new values (lines 636-642)
    • Warns on out-of-sequence messages with detailed diagnostics (lines 650-673)
    • Tracks gaps, regressions, and duplicates in message sequences
  5. Code Deduplication - New #cleanupHibernatableWebSocket helper (lines 728-787)

    • Eliminates duplication between close and error handlers
    • Consistent cleanup logic with proper error context

Issues & Concerns

🔴 Critical Issues

  1. Type Safety Violation (line 626)
].hibernatableWebSockets.find((ws: any) =>

Using any defeats TypeScript's type checking. This should use the proper type PersistedHibernatableWebSocket.

Recommendation:

].hibernatableWebSockets.find((ws: PersistedHibernatableWebSocket) =>
  2. Potential Race Condition - msgIndex updates without synchronization (line 633)
    The hibernatableWs.msgIndex is updated directly in the message handler. If messages arrive rapidly or out of order, this could lead to race conditions.

Current:

hibernatableWs.msgIndex = BigInt(event.rivetMessageIndex);

Concern: No locking mechanism ensures atomic updates. Consider if concurrent message handling could cause issues.

⚠️ Medium Priority Issues

  1. Missing Null Checks - Lines 687-696 warn but don't handle missing actor gracefully
} else {
    // Warn if we receive a message for a hibernatable websocket but can't find the actor
    logger().warn({...});
}

This logs a warning but continues execution. Should this early-return or throw an error instead?

  2. Inconsistent Error Handling - #cleanupHibernatableWebSocket method
  • Called for both close and error events
  • Error events include error context, but close events don't include close code/reason
  • Consider logging close code/reason for debugging
  3. Method Rename Without Clear Migration - handleWebSocket → handleRawWebSocket
    While the rename is good for clarity, there's no deprecation notice or runtime warning for any external callers that might exist.

🟡 Minor Issues / Suggestions

  1. Logging Consistency
  • Mix of logger().debug() and logger().warn() calls
  • Some messages are lowercase (good per CLAUDE.md), but inconsistently applied
  • Example line 636: "updated hibernatable websocket msgIndex in engine driver" is verbose

Suggestion: Shorten to "updated hibernatable ws msgIndex" for consistency

  2. Magic Number - Line 287: msgIndex: -1n
msgIndex: -1n,

Consider extracting to a named constant:

const HIBERNATABLE_WS_NO_MESSAGES = -1n;
  3. BigInt String Conversion - Line 639: oldMsgIndex.toString()
    While correct, consider whether logging numeric values might be better for analysis tools. Alternatively, document why string conversion is necessary.

Performance Considerations

  1. Linear Search on Every Message (line 626)
hibernatableWebSockets.find((ws: any) =>
    arrayBuffersEqual(ws.requestId, requestIdBuf)
)

For actors with many hibernatable WebSockets, this O(n) search on every message could become a bottleneck.

Recommendation: Use a Map keyed by requestId (as string) for O(1) lookup:

#hibernatableWebSocketsByRequestId: Map<string, PersistedHibernatableWebSocket>
  2. ArrayBuffer Comparison Overhead - arrayBuffersEqual called repeatedly
    Consider caching string representations of requestIds if they're compared frequently.
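Both performance points can be addressed at once: derive a hex-string key from the requestId a single time, then use it for O(1) Map lookups instead of per-message ArrayBuffer comparisons. This is a sketch; requestIdToKey and the helpers around it are illustrative names, not codebase APIs:

```typescript
// Sketch: O(1) lookup keyed by a hex encoding of the requestId.
// requestIdToKey is a hypothetical helper, not part of the codebase.
function requestIdToKey(requestId: ArrayBuffer): string {
    return Array.from(new Uint8Array(requestId))
        .map((b) => b.toString(16).padStart(2, "0"))
        .join("");
}

const wsByRequestId = new Map<string, { msgIndex: bigint }>();

function trackWs(requestId: ArrayBuffer, ws: { msgIndex: bigint }): void {
    wsByRequestId.set(requestIdToKey(requestId), ws);
}

function findWs(requestId: ArrayBuffer): { msgIndex: bigint } | undefined {
    return wsByRequestId.get(requestIdToKey(requestId));
}
```

The key derivation is O(length of requestId) but runs once per event, versus one arrayBuffersEqual call per tracked socket per event with the linear scan.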

Security Concerns

  1. No Validation on rivetMessageIndex - Lines 616-617
invariant(event.rivetRequestId, "missing rivetRequestId");
invariant(event.rivetMessageIndex, "missing rivetMessageIndex");

The code validates presence but not validity:

  • Could rivetMessageIndex be negative?
  • Could it be excessively large (DoS via memory)?
  • Should there be a max gap threshold that triggers connection termination?

Recommendation: Add range validation:

invariant(
    typeof event.rivetMessageIndex === 'number' && 
    event.rivetMessageIndex >= 0 && 
    event.rivetMessageIndex < Number.MAX_SAFE_INTEGER,
    'invalid rivetMessageIndex'
);

Test Coverage

  1. No New Tests - This PR modifies critical hibernation logic but adds no tests

Missing Test Cases:

  • ✗ Hibernatable WebSocket with msgIndex restoration
  • ✗ Message sequence gap detection
  • ✗ Out-of-order message handling
  • ✗ Reconnection with proper lastMsgIndex
  • ✗ Cleanup on close vs error events
  • ✗ Multiple hibernatable WebSockets for same actor

Recommendation: Add tests in tests/driver-engine.test.ts covering:

  1. Basic hibernation flow with message index tracking
  2. Edge cases: -1n initial state, gaps, duplicates
  3. Cleanup scenarios: normal close, error, actor not found

Code Quality

Positive

  • ✅ Follows CLAUDE.md structured logging conventions
  • ✅ Consistent naming (kebab-case in logs)
  • ✅ Good error context in warnings
  • ✅ Proper use of BigInt for large integers

Needs Improvement

  • ❌ Type safety (uses any)
  • ❌ Missing JSDoc on new helper method
  • ❌ No comments explaining the -1n sentinel value

Recommendations Summary

Must Fix (Before Merge):

  1. Remove any type, use proper PersistedHibernatableWebSocket type
  2. Add input validation for rivetMessageIndex
  3. Add test coverage for hibernatable WebSocket msgIndex tracking

Should Fix (High Priority):
4. Consider race condition implications for concurrent msgIndex updates
5. Extract magic number -1n to named constant
6. Add JSDoc to #cleanupHibernatableWebSocket method

Nice to Have (Lower Priority):
7. Optimize linear search to Map-based lookup if needed
8. Add deprecation notice for handleWebSocket rename
9. Include close code/reason in cleanup logging


Conclusion

This is a solid fix that addresses the core issue and improves code organization. The centralization of WebSocket state management is the right architectural decision. However, the lack of test coverage for such critical functionality is concerning.

Overall Assessment: Approve with recommendations - The core fix is correct, but address the type safety and validation issues before merging, and strongly consider adding test coverage.


Review generated by Claude Code following CLAUDE.md conventions

@claude

claude bot commented Nov 6, 2025

Code Review: Hibernatable WebSocket msgIndex Handling

Summary

This PR fixes the handling of msgIndex for hibernatable WebSocket reconnections. The changes include renaming hibernatableWebSocket to hibernatableWebSockets (plural) and moving msgIndex tracking logic from ActorInstance to EngineActorDriver.


Positive Aspects

  1. Centralized Logic: Moving msgIndex tracking to the driver (actor-driver.ts:631-683) centralizes hibernatable WebSocket state management in one place, reducing duplication and complexity in ActorInstance.

  2. Improved Error Handling: The new code includes robust sequence validation (actor-driver.ts:648-666) that warns about:

    • Regressed indices
    • Duplicate messages
    • Gaps in message sequences
  3. Better Code Organization: The #cleanupHibernatableWebSocket helper method (actor-driver.ts:730-778) eliminates duplication between close and error handlers.

  4. Consistent Naming: The rename from hibernatableWebSocket (singular) to hibernatableWebSockets (plural) is more accurate for an array.

  5. Proper Type Handling: Converting msgIndex from bigint to number when needed (actor-driver.ts:191-196) with explicit handling of the -1 sentinel value.
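The sequence validation described in point 2 (regressed, duplicate, gap) can be sketched as a small pure classifier; the function name and shape are illustrative, not the driver's actual code:

```typescript
type SequenceKind = "ok" | "regressed" | "duplicate" | "gap";

// Classifies an incoming message index against the last recorded one.
// A lastIndex of -1n means no messages have been processed yet.
function classifySequence(lastIndex: bigint, incoming: bigint): SequenceKind {
    if (incoming < lastIndex) return "regressed";
    if (incoming === lastIndex && lastIndex >= 0n) return "duplicate";
    if (incoming > lastIndex + 1n) return "gap";
    return "ok";
}
```

Factoring the checks into a pure function like this would also make the edge cases directly unit-testable.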


Issues & Concerns

1. Critical: Potential msgIndex Race Condition

Location: actor-driver.ts:191-201

const lastMsgIndex = existingWs.msgIndex >= 0n 
    ? Number(existingWs.msgIndex) 
    : undefined;
hibernationConfig = {
    enabled: true,
    lastMsgIndex,
};

Issue: The msgIndex is read during WebSocket open, but could be updated by concurrent message handlers before the WebSocket is fully established. This could lead to messages being replayed or skipped.

Recommendation: Consider adding a lock or using atomic operations to ensure msgIndex is consistent during WebSocket restoration.
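If serialization turns out to be necessary, a generic promise-chain mutex is one option. This is a sketch of the pattern, not the driver's actual concurrency primitive:

```typescript
// Sketch: serializes async critical sections by chaining promises.
class Mutex {
    private tail: Promise<void> = Promise.resolve();

    // Runs fn after all previously queued sections have finished.
    runExclusive<T>(fn: () => Promise<T> | T): Promise<T> {
        const result = this.tail.then(fn);
        // Keep the chain alive even if fn rejects.
        this.tail = result.then(
            () => undefined,
            () => undefined,
        );
        return result;
    }
}
```

Wrapping both the WebSocket-open read of msgIndex and the message-handler write in runExclusive would guarantee they never interleave, at the cost of one promise hop per event.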


2. Bug: Duplicate lastSeenTimestamp Updates

Location: actor-driver.ts:276-280 and actor-driver.ts:634

The code updates lastSeenTimestamp in two places:

  • During WebSocket open (line 277)
  • During message handling (line 634)

Issue: For an existing WebSocket that reconnects, both code paths execute, causing redundant updates.

Recommendation: The open handler should only update timestamp for new WebSockets, not existing ones that already have their timestamp updated by message handlers.


3. Potential Memory Leak: Incomplete Cleanup

Location: actor-driver.ts:743-777

Issue: If an actor is destroyed or crashes while WebSockets are active, the cleanup logic only runs on socket close/error events. If the actor terminates abnormally, hibernatable WebSocket entries may remain in memory and the ack queue.

Recommendation: Add cleanup in actor shutdown/destroy path to ensure all hibernatable WebSocket state is cleared.


4. Missing Test Coverage

Issue: No test files were modified, and grep shows no existing tests for hibernatable WebSocket functionality.

Recommendation: Add tests covering:

  • WebSocket reconnection with msgIndex restoration
  • Message sequence validation (gaps, duplicates, regression)
  • Cleanup on close/error
  • Concurrent message handling during reconnection
  • Edge case: msgIndex = -1 (no messages processed yet)

5. Inconsistent Error Handling

Location: actor-driver.ts:688-696 vs actor-driver.ts:764-773

When the actor is not found:

  • Message handler logs a warning (line 688)
  • Cleanup handler logs a warning (line 764)

Issue: These warnings provide debugging info but don't handle the actual problem. The ack queue could accumulate entries for non-existent actors.

Recommendation: Consider removing stale ack queue entries when actor not found, not just logging.


6. Type Safety: Unsafe toString() on msgIndex

Location: actor-driver.ts:755

removedMsgIndex: removed[0]?.msgIndex?.toString() ?? "unknown",

Issue: Using optional chaining with toString() on bigint is correct, but the fallback "unknown" is a string in a logging context that might be used for metrics/debugging. Consider using -1 or null for consistency.

Minor: This is a minor issue but affects debugging clarity.


7. Schema Version Migration

Location: versioned.ts:31

The migration from v1 to v2 sets hibernatableWebSockets: [], but:

Question: What happens to existing v1 data that might have had hibernatable connections tracked elsewhere? Is there a migration path or is this a breaking change?

Recommendation: Add a comment explaining the migration strategy or confirm this is intentional.


Performance Considerations

  1. Debouncing Strategy: The 1-second flush interval (actor-driver.ts:316) is reasonable, but consider:

    • What's the expected message rate? High-throughput WebSockets might benefit from a shorter interval
    • Consider adaptive flushing based on queue size
  2. Linear Search: Finding hibernatable WebSockets uses findIndex (actor-driver.ts:184). For actors with many hibernatable WebSockets, this could be O(n).

    • Recommendation: Consider using a Map keyed by requestId for O(1) lookups

Security Concerns

No immediate security issues identified. The code properly validates requestId presence and uses ArrayBuffer equality checks.


Style & Convention Issues

  1. Logging Format (per CLAUDE.md):

    • Line 637: "updated hibernatable websocket msgIndex in engine driver" should be lowercase: "updated hibernatable websocket msgindex in engine driver"
    • Multiple log messages properly use structured logging with ?x format ✓
  2. Method Naming:

    • handleWebSockethandleRawWebSocket is more descriptive ✓
    • #flushWsAcks#flushHibernatableWebSocketAcks is more explicit ✓

Recommendations Summary

Must Fix:

  1. Add test coverage for hibernatable WebSocket reconnection and msgIndex handling
  2. Address potential race condition in msgIndex reading during WebSocket open
  3. Fix duplicate lastSeenTimestamp updates
  4. Add cleanup for abnormal actor termination

Should Fix:

  1. Improve error handling when actor not found (clean up ack queue)
  2. Consider performance optimization for WebSocket lookup (use Map)

Nice to Have:

  1. Document schema migration strategy
  2. Consider adaptive ack flush interval for high-throughput scenarios
  3. Improve type consistency in logging (msgIndex fallback value)

Verdict

The PR makes good architectural improvements by centralizing msgIndex tracking, but needs test coverage and has some edge case handling issues that should be addressed before merging.

Rating: Needs work ⚠️

Generated by Claude Code

@claude

claude bot commented Nov 8, 2025

Code Review

Summary

This PR fixes the handling of msgIndex for hibernatable WebSocket reconnection. The changes focus on:

  1. Renaming hibernatableWebSocket to hibernatableWebSockets (pluralization)
  2. Moving hibernatable WebSocket tracking logic from ActorInstance to EngineActorDriver
  3. Improving message index tracking and acknowledgment handling

Positive Observations

1. Proper Separation of Concerns

  • Moving hibernatable WebSocket lifecycle management from ActorInstance.handleRawWebSocket (instance.ts:1657-1699) to EngineActorDriver is architecturally sound
  • The driver is the appropriate layer for managing engine-specific WebSocket state

2. Better Error Handling

  • Added comprehensive logging with proper context (actorId, requestId, message indices)
  • The new #cleanupHibernatableWebSocket helper (actor-driver.ts:731-778) eliminates code duplication between close and error handlers
  • Out-of-sequence message detection with detailed warnings (actor-driver.ts:648-672)

3. Correct msgIndex Initialization

  • Properly handling -1n as "no messages processed yet" when converting to lastMsgIndex (actor-driver.ts:192-194)
  • Converting to undefined rather than passing -1 to the engine is the right approach

4. Consistent Naming

  • Pluralization of hibernatableWebSocket → hibernatableWebSockets across all files improves consistency
  • Renaming handleWebSocket → handleRawWebSocket clarifies intent

Issues & Recommendations

🔴 Critical: Message Index Update Timing Issue

In actor-driver.ts:631-633, the msgIndex is updated on the message event:

hibernatableWs.msgIndex = BigInt(event.rivetMessageIndex);

However, this happens before the actor processes the message. If the actor crashes or hibernates immediately after receiving the message but before processing it, the saved msgIndex will point to an unprocessed message. On reconnection, that message would be skipped.

Recommendation: Only update msgIndex after successful message processing, or use an acknowledgment mechanism to ensure messages are processed before updating the index.

🟡 Moderate: Race Condition in Actor Lookup

In actor-driver.ts:622-624:

const actorHandler = this.#actors.get(actorId);
if (actorHandler?.actor) {

The actor could be removed from #actors between checking and using it. While the current code handles this with warnings (lines 686-695), consider:

  • Using a lock/mutex for actor lifecycle operations
  • Or documenting the expected behavior when an actor is being destroyed

🟡 Moderate: Potential Memory Leak in Ack Queue

The #hibernatableWebSocketAckQueue is only cleared in:

  1. #flushHibernatableWebSocketAcks() - every 1 second
  2. #cleanupHibernatableWebSocket() - on close/error
  3. shutdown() - on driver shutdown

Issue: If a WebSocket is tracked as hibernatable but never receives messages, it won't be added to the ack queue. However, if it does receive messages but then the cleanup fails (e.g., actor already destroyed), the entry could persist in the ack queue.

Recommendation: Add defensive cleanup to ensure ack queue entries are removed when hibernatable WebSocket entries are removed from hibernatableWebSockets array.
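That defensive pass might look like the following sketch, assuming both the ack queue and the set of live connections can be keyed by a string form of the requestId (all names here are hypothetical):

```typescript
// Sketch: drop any ack-queue entries whose requestId no longer has a
// matching hibernatable websocket entry. All names are illustrative.
interface AckEntry {
    requestId: string;
    msgIndex: bigint;
}

function pruneStaleAcks(
    ackQueue: Map<string, AckEntry>,
    liveRequestIds: Set<string>,
): number {
    let removed = 0;
    for (const key of ackQueue.keys()) {
        if (!liveRequestIds.has(key)) {
            ackQueue.delete(key);
            removed++;
        }
    }
    return removed;
}
```

Running this inside the periodic flush would bound how long an orphaned entry can survive.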

🟡 Moderate: Inconsistent Error Handling

In actor-driver.ts:686-695 and actor-driver.ts:764-773, the code warns when an actor isn't found but continues execution. Consider:

  • Should these be errors that prevent further processing?
  • What's the expected behavior when tracking state for a non-existent actor?

🟢 Minor: Code Clarity

Line 192-194 in actor-driver.ts:

const lastMsgIndex =
    existingWs.msgIndex >= 0n
        ? Number(existingWs.msgIndex)
        : undefined;

The comment says "treating -1 as undefined" but the condition checks >= 0n. This means -1n → undefined (correct), but it's clearer to explicitly check for -1n:

const lastMsgIndex = existingWs.msgIndex === -1n 
    ? undefined 
    : Number(existingWs.msgIndex);

🟢 Minor: Log Message Clarity

Line 749 in actor-driver.ts could be more descriptive. Consider separate messages for close vs error events for better log filtering.

🟢 Minor: Removed Code Verification

The removed code in instance.ts (lines 1675-1766) tracked socket lifecycle events. Verify that all functionality was preserved:

  • onSocketOpened handler updated lastSeenTimestamp - confirmed this still happens in the driver at line 633 ✅
  • All cleanup logic appears to be preserved ✅

Testing Concerns

❗ No test coverage visible - This is a complex change to hibernation logic. Consider adding tests for:

  1. Message index tracking across hibernation/wake cycles
  2. Out-of-sequence message handling
  3. Cleanup on various error conditions
  4. Race conditions with actor lifecycle

Performance Considerations

Ack Queue Batching - The 1-second flush interval (line 316) is well-reasoned given the 30s gateway timeout
No Blocking Operations - All operations are async where appropriate

Security Considerations

✅ No security issues identified - proper validation of rivetRequestId and rivetMessageIndex with invariant

Summary

This is a solid refactoring that improves code organization and observability. The main concern is the message index update timing which could cause message loss during failures.

Recommendations:

  1. Before merging: Address the critical msgIndex timing issue - ensure messages are processed before updating the index
  2. Nice to have: Add test coverage for hibernation scenarios
  3. Nice to have: Review and document the race condition behavior with actor lifecycle

Overall Assessment: Approve with changes requested - the msgIndex timing issue should be addressed before merging.

@NathanFlurry NathanFlurry force-pushed the 11-06-fix_rivetkit_properly_handle_msgindex_for_hibernatable_websocket_reconnection branch from ce9544b to 9f1d38e Compare November 13, 2025 02:47
@NathanFlurry NathanFlurry force-pushed the 11-08-chore_rivetkit_rename_onstart_onstop_-_onwake_onsleep branch from 1386f17 to 8a57285 Compare November 13, 2025 02:47
@claude

claude bot commented Nov 13, 2025

PR Review: Fix hibernatable WebSocket msgIndex handling

Summary

This PR fixes the message index (msgIndex) handling for hibernatable WebSocket reconnections. The changes properly track and restore message indices when WebSockets hibernate and reconnect, which is critical for ensuring message delivery guarantees in the actor system.

Positive Aspects

1. Improved Architecture

  • The new createRawWebSocketSocket driver properly separates concerns for raw WebSocket connections
  • Good use of dependency injection with the closePromise parameter for graceful shutdown coordination
  • Clear documentation explaining why raw WebSockets don't need sendMessage

2. Better Message Index Tracking

  • Lines 633-687 in actor-driver.ts: The message handler now properly updates msgIndex on each message
  • Lines 189-202: Correctly converts msgIndex from bigint to number, treating -1 as "no messages yet"
  • The fix ensures that when a WebSocket hibernates and reconnects, it can resume from the correct message index

3. Enhanced Debugging

  • Excellent out-of-sequence detection (lines 654-676) with detailed logging for:
    • Regressed indices
    • Duplicate indices
    • Gaps/skipped indices
  • Comprehensive logging throughout helps diagnose hibernation issues

4. Code Quality Improvements

  • Lines 736-780: The #cleanupHibernatableWebSocket helper eliminates duplication between close and error handlers
  • Consistent error handling with proper cleanup in all paths
  • Fixed schema naming from hibernatableWebSocket to hibernatableWebSockets (plural)

Issues & Concerns

1. Critical: Race Condition in Connection Creation 🔴

Location: router-endpoints.ts:435-464

const conn = await actor.createConn(
    createRawWebSocketSocket(...),
    {},
    newRequest,
);
createdConn = conn;
actor.handleWebSocket(adapter, { request: newRequest });

Problem: The WebSocket onMessage and onClose handlers can fire before createdConn is set, especially under high load or if createConn is slow. This could lead to:

  • Messages arriving before the connection is tracked
  • Close events firing before cleanup is registered

Recommendation: Restructure to ensure atomic setup:

createdConn = await actor.createConn(...);
// Now that createdConn is set, it's safe to call handleWebSocket
actor.handleWebSocket(adapter, { request: newRequest });

2. Potential Memory Leak 🟡

Location: router-endpoints.ts:390-391

let createdConn: AnyConn | undefined;

Problem: If an exception occurs after createConn succeeds but before the WebSocket handlers are set up, createdConn is never cleaned up. The connection remains tracked but orphaned.

Recommendation: Add cleanup in the catch block:

} catch (error) {
    if (createdConn) {
        actor.connDisconnected(createdConn, false);
    }
    actor.rLog.error({...});
    ws.close(1011, "Failed to create connection");
}

3. Type Safety: Unsafe Type Casting 🟡

Location: raw-websocket.ts:42

terminate: () => {
    (websocket as any).terminate?.();
},

Problem: Using any bypasses type safety. The terminate method may not exist on all WebSocket implementations.

Recommendation: Use proper type guard:

terminate: () => {
    if ('terminate' in websocket && typeof websocket.terminate === 'function') {
        websocket.terminate();
    }
},

4. Incomplete Cleanup in Error Path 🟡

Location: actor-driver.ts:718-729

The error handler calls #cleanupHibernatableWebSocket but doesn't flush acks like the close handler does. This could leave stale acks in the queue.

Recommendation: Add ack flushing to error handler:

websocket.addEventListener("error", (event) => {
    this.#flushHibernatableWebSocketAcks(); // Add this
    this.#cleanupHibernatableWebSocket(...);
    wsHandlerPromise.then((x) => x.onError?.(event, wsContext));
});

5. Logging: Missing BigInt Conversion 🟢

Location: actor-driver.ts:642

oldMsgIndex: oldMsgIndex.toString(),

If oldMsgIndex is a bigint, this will work, but the type is inconsistent. The code later converts to number (line 636) so logging before conversion could be confusing.

Recommendation: Be consistent with types throughout the flow.

Performance Considerations

1. Ack Queue Flushing Strategy ℹ️

The 1-second interval (line 313) is reasonable for debouncing, but under very high message rates, the queue could grow large between flushes. Consider:

  • Implementing a max queue size with early flush
  • Making the interval configurable for different deployment scenarios
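The first bullet can be sketched as a size-triggered early flush layered on top of the existing timer. The threshold value and callback shape below are assumptions, not values from the codebase:

```typescript
// Sketch: flush early once the pending-ack count crosses a threshold,
// in addition to the periodic timer. MAX_PENDING_ACKS is an assumption.
const MAX_PENDING_ACKS = 1024;

function enqueueAck(
    queue: Map<string, bigint>,
    requestId: string,
    msgIndex: bigint,
    flush: (q: Map<string, bigint>) => void,
): void {
    queue.set(requestId, msgIndex);
    if (queue.size >= MAX_PENDING_ACKS) {
        flush(queue);
        queue.clear();
    }
}
```

This keeps the queue bounded under burst load while leaving the common low-rate path to the 1-second timer.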

2. Array Search Performance ℹ️

Lines 182-184, 627-631: Using find() on hibernatableConns array is O(n). For actors with many hibernatable connections, consider using a Map indexed by requestId.

Security Considerations

No major security issues found

  • Proper close codes used (1000 for normal, 1011 for errors)
  • No sensitive data logged
  • Good input validation with invariant checks

Test Coverage

⚠️ No test changes included

This is a critical bug fix for message ordering in hibernatable WebSockets. Please add tests covering:

  1. Message index restoration after hibernation
  2. Out-of-sequence message detection
  3. Race conditions in connection setup
  4. Cleanup on error paths
  5. Multiple hibernatable WebSockets per actor

Additional Recommendations

Code Style

  1. Naming: createRawWebSocketSocket has "Socket" twice - consider createRawWebSocketDriver for clarity
  2. Error Messages: Line 463 could include the error type: Failed to create connection: ${error.constructor.name}

Documentation

Consider adding inline comments explaining:

  • Why -1 is used as sentinel value for "no messages"
  • The relationship between msgIndex tracking in multiple locations
  • The WebSocket lifecycle state machine

Verdict

Conditional Approval - The PR fixes an important bug and improves the codebase, but the race condition in router-endpoints.ts should be addressed before merging.

Priority Fixes (Blocking):

  1. Fix race condition in onOpen handler (Issue #1)
  2. Add connection cleanup in error catch block (Issue #2)

Recommended Fixes (Non-blocking):

  1. Add type-safe terminate() implementation
  2. Add ack flushing to error handler
  3. Add test coverage

🤖 Generated with Claude Code

@claude

claude bot commented Nov 13, 2025

Code Review: fix(rivetkit): properly handle msgIndex for hibernatable websocket reconnection

This PR properly handles msgIndex for hibernatable WebSocket reconnections. The changes look solid overall with good improvements to state management and logging.


Strengths

1. Improved Architecture

  • New createRawWebSocketSocket driver provides proper separation of concerns
  • Connection lifecycle managed through createConn flow enables deduplication for hibernatable connections
  • Cleanup code well-factored into cleanupHibernatableWebSocket

2. Better State Tracking

  • msgIndex properly updated in driver message handler (actor-driver.ts:636)
  • Added lastSeenTimestamp updates alongside msgIndex
  • Proper handling of msgIndex = -1 for no messages processed yet

3. Enhanced Observability

  • Excellent logging with structured fields
  • Out-of-sequence message detection (lines 654-676) with detailed warnings
  • Debug logs track msgIndex state transitions

4. Schema Consistency

  • Fixed typo: hibernatableWebSocket → hibernatableWebSockets (plural)

Issues & Concerns

1. Race Condition in onOpen (router-endpoints.ts:395-464)
actor.handleWebSocket is called without await, so async errors won't be caught by try-catch.
Recommendation: Either await the call or add separate error handling.

2. Type Safety Issues

  • Line 417: (ws as any).__adapter bypasses type safety
  • Line 42: (websocket as any).terminate?.()
    Recommendation: Use WeakMap for adapter association or define proper interface.

3. Incomplete Error Cleanup (router-endpoints.ts:458-464)
Close promise not resolved in catch block could cause deadlocks.
Recommendation: Add closePromiseResolvers.resolve() in catch block.
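A minimal sketch of the idea, using an explicit guard object rather than the actual closePromiseResolvers (which this review only implies has a Promise.withResolvers-like shape):

```typescript
// Sketch: always settle the close promise when setup fails, so awaiters
// cannot deadlock. All names are illustrative.
function makeCloseGuard() {
    let resolveClosed!: () => void;
    let settled = false;
    const closed = new Promise<void>((r) => {
        resolveClosed = () => {
            settled = true;
            r();
        };
    });
    return {
        closed,
        resolve: () => resolveClosed(),
        isSettled: () => settled,
    };
}

// Run setup; on failure, settle the close promise before rethrowing.
function runSetup(guard: ReturnType<typeof makeCloseGuard>, setup: () => void): void {
    try {
        setup();
    } catch (err) {
        guard.resolve();
        throw err;
    }
}
```

On the success path the promise stays pending and is settled later by the normal close handler, which is the behavior the review expects.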

4. BigInt to Number Conversion (actor-driver.ts:189-191)
Schema uses BigInt but runtime uses Number - potential precision loss.
Recommendation: Use BigInt consistently or document safe range.

5. Missing Boundary Check
No upper boundary check on message indices.
Recommendation: Add sanity check or high-water mark warning.
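One possible shape for that check; the bounds below are assumptions, not values from the codebase:

```typescript
// Sketch: reject indices outside a sane range and flag suspiciously
// large jumps. MAX_INDEX_GAP is an assumed high-water mark.
const MAX_INDEX_GAP = 10_000n;

function checkIndexBounds(
    lastIndex: bigint,
    incoming: bigint,
): "ok" | "invalid" | "large-gap" {
    if (incoming < 0n || incoming > BigInt(Number.MAX_SAFE_INTEGER)) {
        return "invalid";
    }
    if (lastIndex >= 0n && incoming - lastIndex > MAX_INDEX_GAP) {
        return "large-gap";
    }
    return "ok";
}
```

"invalid" would warrant closing the connection; "large-gap" could start as a warning metric until real-world gap sizes are known.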

6. Schema Migration
Question: Verify migration path for hibernatableWebSocket → hibernatableWebSockets field rename won't break existing actors.

7. Removed Sleep Prevention (instance/mod.ts:671-686)
Question: Is removing activeRawWebSockets sleep prevention intentional? Should be documented if so.

8. Error Handler Cleanup
Question: Should connDisconnected be called in onError handler similar to onClose?


Recommendations

  1. Add await or error handling for actor.handleWebSocket
  2. Resolve close promise in error path
  3. Use WeakMap instead of any casts
  4. Add test coverage for msgIndex restoration, out-of-sequence messages, error cleanup
  5. Document sleep behavior change
  6. Add JSDoc to createRawWebSocketSocket
  7. Verify schema migration path

Minor Nits

  • Line 372: typo Remvoe → Remove (pre-existing)
  • Extract isHibernatable check to helper
  • Simplify sequenceType ternary with helper function

Security

No security concerns identified.

Performance

Improvements from debouncing WebSocket acks and proper hibernatable connection tracking.

Overall

Well-thought-out fix for hibernatable WebSocket state management. Main concerns are error handling and type safety.

Recommendation: Request changes for error handling, then approve after fixes.

@graphite-app
Contributor

graphite-app bot commented Nov 13, 2025

Merge activity

  • Nov 13, 8:47 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 13, 8:48 PM UTC: CI is running for this pull request on a draft pull request (#3463) due to your merge queue CI optimization settings.
  • Nov 13, 8:50 PM UTC: Merged by the Graphite merge queue via draft PR: #3463.

graphite-app bot pushed a commit that referenced this pull request Nov 13, 2025
@graphite-app graphite-app bot closed this Nov 13, 2025
@graphite-app graphite-app bot deleted the 11-06-fix_rivetkit_properly_handle_msgindex_for_hibernatable_websocket_reconnection branch November 13, 2025 20:50