Conversation

@MasterPtato
Contributor

No description provided.

@vercel

vercel bot commented Nov 22, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Error Error Nov 26, 2025 2:17am
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 26, 2025 2:17am
rivet-inspector Ignored Ignored Preview Nov 26, 2025 2:17am
rivet-site Ignored Ignored Preview Nov 26, 2025 2:17am

Contributor Author

MasterPtato commented Nov 22, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 22, 2025

PR Review: fix: make all uses of protocol handle v4 <-> v3

Summary

This PR introduces protocol version handling to support both mk1 (v1-v3) and mk2 (v4) protocol versions across the Pegboard system. The changes ensure backward compatibility when communicating between gateways, runners, and clients using different protocol versions.


Code Quality & Best Practices ✅

Strengths:

  • Clear separation between mk1 and mk2 protocol handling with explicit version checks using protocol::is_mk2()
  • Well-organized versioned data structures in versioned.rs with comprehensive conversion functions between all protocol versions
  • Consistent use of PROTOCOL_MK1_VERSION and PROTOCOL_MK2_VERSION constants instead of magic numbers
  • Good use of Rust's type system to enforce correct protocol handling at compile time

Minor Suggestions:

  1. In engine/packages/pegboard/src/workflows/actor/mod.rs:323-338, there are multiple TODO comments for mk2:

    if protocol::is_mk2(runner_protocol_version) {
        // TODO: Send message to tunnel
    } else {
        // mk1 path elided in this excerpt
    }

    These TODOs appear at lines 323, 358, 505, 523, and 651. Should these be implemented before merging, or tracked as follow-up work?

  2. The Event type in actor_event_demuxer.rs now uses protocol::mk2::Event exclusively. Verify this is intentional and that mk1 events are properly converted before reaching this code path.


Potential Bugs or Issues ⚠️

  1. Protocol Version Default Handling (engine/packages/pegboard/src/workflows/actor/mod.rs:779-781):

    pub struct Allocate {
        #[serde(default)]
        pub runner_protocol_version: Option<u16>,
    }

    When runner_protocol_version is None, ensure the code handles this gracefully. In the lifecycle loop, there's a guard:

    let (Some(runner_id), Some(runner_workflow_id), Some(runner_protocol_version)) = ...

    But earlier code paths may not have this protection.

  2. Error Silencing (tunnel_to_ws_task.rs:84-88):

    Err(err) => {
        tracing::error!(?err, "failed to parse tunnel message");
        return Ok(());
    }

    Silently returning Ok(()) on parse errors could hide protocol compatibility issues. Consider adding a metric or more context about which protocol version was expected vs received.

  3. Gateway-to-Runner Protocol Mismatch (shared_state.rs:185-203):
    The gateway reads runner_protocol_version from UDB and uses it to serialize messages. If there's a race condition where the runner reconnects with a different protocol version, messages could be serialized incorrectly. Consider if staleness of this value could cause issues.
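The parse-error suggestion in item 2 could look roughly like the sketch below. It is an illustration only, not the code in tunnel_to_ws_task.rs: `ParseFailure`, `PARSE_FAILURES`, and `parse_tunnel_message` are hypothetical names, the static counter stands in for a real metric, and the two-byte version prefix is an assumed framing chosen so the example is self-contained.

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter standing in for a real metrics backend.
pub static PARSE_FAILURES: AtomicU64 = AtomicU64::new(0);

/// Hypothetical error that records which version was expected and which arrived.
#[derive(Debug, PartialEq)]
pub struct ParseFailure {
    pub expected_version: u16,
    pub received_version: Option<u16>,
}

impl fmt::Display for ParseFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.received_version {
            Some(v) => write!(
                f,
                "failed to parse tunnel message: expected protocol v{}, received v{}",
                self.expected_version, v
            ),
            None => write!(
                f,
                "failed to parse tunnel message: expected protocol v{}, message too short",
                self.expected_version
            ),
        }
    }
}

impl std::error::Error for ParseFailure {}

/// Toy parser. Assumes (for illustration only) that a message starts with a
/// little-endian u16 version tag followed by the payload.
pub fn parse_tunnel_message(buf: &[u8], expected_version: u16) -> Result<&[u8], ParseFailure> {
    let received_version = match buf {
        [lo, hi, ..] => Some(u16::from_le_bytes([*lo, *hi])),
        _ => None,
    };
    if received_version == Some(expected_version) {
        Ok(&buf[2..])
    } else {
        // Count the failure and surface both versions instead of returning Ok(()).
        PARSE_FAILURES.fetch_add(1, Ordering::Relaxed);
        Err(ParseFailure { expected_version, received_version })
    }
}
```

A caller that still wants to close the task quietly can log the returned error with its version fields before returning, so the mismatch remains visible in logs and in the counter.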


Performance Considerations 📊

  1. Database Read Per Request (lib.rs:156-172):

    let (mut stopped_sub, runner_protocol_version) = tokio::try_join!(
        self.ctx.subscribe::<...>(...),
        udb.run(|tx| async move {
            tx.read(&pegboard::keys::runner::ProtocolVersionKey::new(runner_id), ...)
        })
    )?;

    This reads the protocol version from UDB for every HTTP/WebSocket request. Consider caching this value in SharedState or InFlightRequest after the first lookup for a given runner.

  2. Version Conversion Overhead: The conversion functions in versioned.rs (e.g., to_client_tunnel_message_v4_to_v3) allocate new structs. For high-throughput message passing, this adds allocator pressure (Rust has no garbage collector, so the cost is the extra allocations and copies themselves). This is likely acceptable but worth monitoring.
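The caching idea in item 1 can be sketched as a small read-through cache. This is a hedged illustration, not the gateway's actual SharedState: `ProtocolVersionCache`, `get_or_load`, and `invalidate` are hypothetical names, runner ids are simplified to `u64`, and the `load` closure is a synchronous stand-in for the asynchronous UDB read.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical per-gateway cache of runner protocol versions.
#[derive(Default)]
pub struct ProtocolVersionCache {
    versions: Mutex<HashMap<u64, u16>>,
}

impl ProtocolVersionCache {
    /// Returns the cached version for `runner_id`, or calls `load`
    /// (a stand-in for the database read) and stores its result.
    pub fn get_or_load<E>(
        &self,
        runner_id: u64,
        load: impl FnOnce() -> Result<u16, E>,
    ) -> Result<u16, E> {
        let cached = self.versions.lock().unwrap().get(&runner_id).copied();
        if let Some(version) = cached {
            return Ok(version);
        }
        // The lock is not held while loading, so a slow read for one runner
        // does not block lookups for others.
        let version = load()?;
        self.versions.lock().unwrap().insert(runner_id, version);
        Ok(version)
    }

    /// Drops the cached entry, for example when a runner disconnects, so a
    /// reconnect with a different protocol version is picked up.
    pub fn invalidate(&self, runner_id: u64) {
        self.versions.lock().unwrap().remove(&runner_id);
    }
}
```

Invalidation on disconnect matters here because item 3 under Potential Bugs notes that a runner may reconnect with a different protocol version; a cache without it would make that staleness worse rather than better.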


Security Concerns 🔒

No significant security concerns identified. The protocol versioning doesn't introduce new attack surfaces, and the conversion functions properly handle all message types without data loss.


Test Coverage 📝

The PR doesn't appear to include new tests for the protocol conversion logic. Consider adding:

  • Unit tests for versioned.rs conversion functions (especially edge cases like DeprecatedTunnelAck conversion)
  • Integration tests verifying mk1 clients can communicate with mk2 runners and vice versa
  • Tests for protocol version boundary conditions
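As a rough shape for the unit tests suggested above, the sketch below uses simplified stand-in enums rather than the real generated types in versioned.rs. `V3Kind`, `V4Kind`, `v3_to_v4`, and `v4_to_v3` are hypothetical names; only the DeprecatedTunnelAck to ResponseAbort fallback mirrors behavior quoted elsewhere in this review thread.

```rust
/// Simplified stand-in for the v3 tunnel message kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum V3Kind {
    ResponseStart { status: u16 },
    ResponseAbort,
    DeprecatedTunnelAck,
}

/// Simplified stand-in for the v4 tunnel message kinds (no ack variant).
#[derive(Debug, Clone, PartialEq)]
pub enum V4Kind {
    ResponseStart { status: u16 },
    ResponseAbort,
}

/// Upgrade path. The ack variant falls back to an abort, as in the PR.
pub fn v3_to_v4(kind: V3Kind) -> V4Kind {
    match kind {
        V3Kind::ResponseStart { status } => V4Kind::ResponseStart { status },
        V3Kind::ResponseAbort | V3Kind::DeprecatedTunnelAck => V4Kind::ResponseAbort,
    }
}

/// Downgrade path. Every v4 variant has a v3 counterpart in this sketch.
pub fn v4_to_v3(kind: V4Kind) -> V3Kind {
    match kind {
        V4Kind::ResponseStart { status } => V3Kind::ResponseStart { status },
        V4Kind::ResponseAbort => V3Kind::ResponseAbort,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn v4_survives_a_round_trip_through_v3() {
        for kind in [V4Kind::ResponseStart { status: 200 }, V4Kind::ResponseAbort] {
            assert_eq!(v3_to_v4(v4_to_v3(kind.clone())), kind);
        }
    }

    #[test]
    fn deprecated_ack_falls_back_to_abort() {
        assert_eq!(v3_to_v4(V3Kind::DeprecatedTunnelAck), V4Kind::ResponseAbort);
    }
}
```

Pinning the fallback in a test makes any later change to that behavior (for instance, returning an error instead) an explicit, reviewed decision.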

Additional Notes

  1. TypeScript Changes (engine/sdks/typescript/runner-protocol/src/index.ts):
    The TypeScript SDK also needs updates to handle the new protocol. Verify the runner_protocol_version field is properly exposed and the checkpoint structure changes are reflected.

  2. Schema Changes (engine/sdks/schemas/runner-protocol/v4.bare):
    The v4 schema changes look correct. The addition of checkpoint field to CommandWrapper and actor_id removal from CommandStartActor is a clean restructuring.

  3. Naming Consistency: The PR uses both "mk1/mk2" and "v1-v4" naming. Consider documenting the relationship (mk1 = v1-v3, mk2 = v4) in a code comment for future maintainers.
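The documentation suggested in item 3 could sit directly on the constants and helper. The implementation of protocol::is_mk2() is not visible in this thread, so the constant values and the `>=` comparison below are assumptions derived from the stated mapping (mk1 = v1-v3, mk2 = v4), not the PR's actual code.

```rust
/// Highest protocol version in the mk1 generation (mk1 covers v1 through v3).
/// Assumed value for illustration.
pub const PROTOCOL_MK1_VERSION: u16 = 3;

/// First protocol version in the mk2 generation (mk2 starts at v4).
/// Assumed value for illustration.
pub const PROTOCOL_MK2_VERSION: u16 = 4;

/// Returns true when `version` belongs to the mk2 generation.
///
/// Naming: "mk1"/"mk2" identify protocol generations, while "v1".."v4"
/// identify individual wire versions within them.
pub fn is_mk2(version: u16) -> bool {
    version >= PROTOCOL_MK2_VERSION
}
```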


Verdict

This is a solid implementation of protocol version handling with good backward compatibility design. The main concerns are:

  1. TODO comments - Are these blocking for this PR?
  2. Missing tests - Would be good to add protocol conversion tests
  3. Performance - Consider caching protocol version lookups

🤖 Generated with Claude Code

@claude

claude bot commented Nov 25, 2025

Pull Request Review: fix: make all uses of protocol handle v4 <-> v3

Summary

This PR implements comprehensive protocol version compatibility handling between v4 (mk2) and v3 (mk1) versions across the pegboard system. The changes ensure that the system can communicate with runners using different protocol versions.

Code Quality & Best Practices ✅

Strengths:

  • Comprehensive version handling: The PR properly separates mk1 (v1-v3) and mk2 (v4) protocol versions with dedicated conversion functions
  • Consistent pattern usage: Uses protocol::mk2 module aliasing throughout for clarity
  • Database reads for protocol version: Properly reads runner_protocol_version from the database before sending messages
  • Proper error handling: Conversion functions return Result types and handle errors appropriately
  • Good separation of concerns: Keeps versioning logic in the versioned.rs module

Observations:

  • The PR adds universaldb dependency to pegboard-gateway/Cargo.toml appropriately (line 34 of diff)
  • Import changes consistently use rivet_runner_protocol::mk2 as rp pattern in actor-kv package
  • Protocol version checks use the helper function protocol::is_mk2() consistently

Potential Issues & Concerns ⚠️

  1. Missing ping implementation for mk1 protocol (engine/packages/pegboard-runner/src/ping_task.rs:29-39)

    • The code only sends pings for mk2 protocol:
    if protocol::is_mk2(conn.protocol_version) {
        let ping_msg = ...
        conn.ws_handle.send(Message::Binary(ping_msg_serialized.into())).await?;
    }
    • Issue: No ping is sent for mk1 protocol, which could lead to connection timeout issues
    • Recommendation: Add an else branch to handle mk1 ping messages or document why pings aren't needed for mk1
  2. Event dispatch type mismatch (engine/packages/pegboard-runner/src/actor_event_demuxer.rs:33-36)

    • The ingest method signature changed to use protocol::mk2::Event:
    pub fn ingest(&mut self, actor_id: Id, event: protocol::mk2::Event)
    • Issue: This assumes all events are mk2 format. If mk1 events are ingested, they need conversion first
    • Recommendation: Verify that all call sites convert mk1 events to mk2 before calling ingest
  3. Incomplete TODO comments (multiple locations)

    • Lines 325, 359, 505, 523, 651 in engine/packages/pegboard/src/workflows/actor/mod.rs contain:
    if protocol::is_mk2(runner_protocol_version) {
        // TODO: Send message to tunnel
    } else {
        ctx.signal(crate::workflows::runner::Command { ... }).send().await?;
    }
    • Issue: Critical functionality for mk2 protocol is not implemented (stop commands, etc.)
    • Impact: mk2 runners won't receive stop/sleep commands properly
    • Recommendation: Either implement these TODOs or file follow-up issues and document the limitation
  4. Conversion function coverage (engine/sdks/rust/runner-protocol/src/versioned.rs:1982)

    • The v3→v4 conversion for DeprecatedTunnelAck falls back to ToServerResponseAbort:
    v3::ToServerTunnelMessageKind::DeprecatedTunnelAck => {
        // v4 removed DeprecatedTunnelAck, this should not occur in practice
        // but if it does, we'll convert it to a response abort as a safe fallback
        v4::ToServerTunnelMessageKind::ToServerResponseAbort
    }
    • Observation: The comment says "should not occur" but uses a fallback anyway
    • Recommendation: Consider logging a warning if this case is hit to help detect unexpected behavior
  5. Protocol version storage (engine/packages/pegboard/src/workflows/runner2.rs:285-289)

    • Protocol version is stored in the database but I don't see where it's initially written for mk1 runners
    • Recommendation: Verify that mk1 runners have their protocol version properly initialized in the database

Performance Considerations ⚡

  1. Double database reads in gateway (engine/packages/pegboard-gateway/src/lib.rs:157-171, 301-315)

    • The code reads runner_protocol_version from the database twice in the same request flow (HTTP and WebSocket paths)
    • Recommendation: Consider caching the protocol version in the gateway's shared state after the first read
  2. Message serialization overhead

    • Each message now checks the protocol version and serializes accordingly, which adds one extra branch per message on the send path
    • Assessment: This is acceptable given the need for version compatibility, but worth monitoring

Security Concerns 🔒

No major security issues identified. The changes maintain the existing security model and don't introduce new attack vectors.

Test Coverage 🧪

Issue: The PR doesn't appear to include any new tests for the v4↔v3 conversion logic.

Recommendations:

  • Add unit tests for the conversion functions in versioned.rs
  • Add integration tests that verify mk1 and mk2 runners can coexist
  • Test edge cases like:
    • mk1 runner receiving messages intended for mk2
    • Upgrading a runner from mk1 to mk2 mid-session
    • WebSocket hibernation with mixed protocol versions

Architecture & Design 🏗️

Strengths:

  • Clean separation between mk1 and mk2 protocols using separate enum variants
  • Centralized versioning logic in the versioned module
  • Database-driven protocol version detection avoids hardcoding

Concerns:

  • The mixing of protocol (mk1) and protocol::mk2 types throughout the codebase could become confusing
  • Consider whether a unified interface with internal version handling would be cleaner long-term

Action Items Summary

High Priority:

  1. ✅ Implement or document the missing mk1 ping handling
  2. ✅ Complete the TODO items for mk2 stop/sleep commands
  3. ✅ Add tests for protocol version conversion

Medium Priority:

  4. ✅ Verify protocol version initialization for mk1 runners
  5. ✅ Consider caching protocol version in gateway shared state
  6. ✅ Verify event type conversion before ingest() calls

Low Priority:

  7. Add logging for unexpected conversion cases
  8. Document protocol version upgrade path

Conclusion

This is a solid foundation for protocol versioning with good separation of concerns. However, the incomplete TODO items for mk2 protocol handling are blocking issues that should be addressed before merging. The missing ping implementation for mk1 should also be clarified.

Recommendation: Request changes to address high-priority items before merging.

@claude

claude bot commented Nov 25, 2025

Pull Request Review

Summary

This PR implements bidirectional protocol version compatibility between v4 (mk2) and v3 (mk1) of the runner protocol. The changes add protocol version tracking per runner and update all communication paths to handle version conversion appropriately.

✅ Strengths

  1. Comprehensive version handling: The PR thoroughly addresses protocol version conversion across the entire communication stack (gateway, runner, serverless)
  2. Proper database tracking: Added ProtocolVersionKey to persist runner protocol versions in UniversalDB (engine/packages/pegboard/src/keys/runner.rs:187-228)
  3. Clean separation: Introduced separate mk2 module namespace and versioned enums (ToClientMk2, ToServerMk2, ToRunnerMk2) for v4 protocol
  4. Backward compatibility: Maintains support for both protocol versions simultaneously

🔍 Code Quality

Good Practices

  • Proper error handling with Result types throughout
  • Consistent use of tracing for logging
  • Follows repository conventions

Areas for Improvement

1. Error Handling in Ping Task (engine/packages/pegboard-runner/src/ping_task.rs:26-39)

The ping logic only sends pings for mk2 protocol versions but silently does nothing for mk1. Consider adding a log statement or comment explaining why mk1 does not need pings, or implement mk1 ping support if it should be symmetric.
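One way to make the mk1 case explicit rather than silent is to have the branch return a decision that the caller can log. The sketch below is hypothetical: `PingAction` and `ping_action` are invented names, the payload encoding is a placeholder rather than the real ToClientMk2 ping, and whether mk1 runners need a server-sent ping at all is not established in this thread.

```rust
/// Assumed first mk2 version, for illustration.
pub const PROTOCOL_MK2_VERSION: u16 = 4;

/// Assumed helper; the real protocol::is_mk2() is not shown in the diff.
pub fn is_mk2(version: u16) -> bool {
    version >= PROTOCOL_MK2_VERSION
}

/// What the ping task should do for a given connection.
#[derive(Debug, PartialEq)]
pub enum PingAction {
    /// Send this serialized ping frame.
    Send(Vec<u8>),
    /// Deliberately send nothing; the reason is available for logging.
    Skip(&'static str),
}

/// Decides the ping behavior from the connection's protocol version.
/// The payload here is just the timestamp bytes, a placeholder encoding.
pub fn ping_action(protocol_version: u16, now_ms: u64) -> PingAction {
    if is_mk2(protocol_version) {
        PingAction::Send(now_ms.to_le_bytes().to_vec())
    } else {
        PingAction::Skip("mk1 runner: no ping is sent by this task")
    }
}
```

With this shape, the mk1 branch is visible in code review and the skip reason can be emitted once per connection at debug level.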

2. Missing Error Context (engine/packages/pegboard-gateway/src/shared_state.rs:166-186)

The database read for protocol version lacks context on failure. Consider adding .with_context() to provide better error messages.

3. Potential Race Condition (engine/packages/pegboard-gateway/src/lib.rs:154-171)

The protocol version is read from the database for each request, but there is no guarantee it exists yet if the runner just connected. What happens if the runner has not written its protocol version yet? Should there be a default/fallback?
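The open question about a missing value can be made explicit by resolving the stored `Option<u16>` through a named policy. This is a sketch under assumptions: `MissingVersionPolicy`, `ProtocolVersionMissing`, and `resolve_protocol_version` are hypothetical, runner ids are simplified to `u64`, and treating a missing version as mk1 is only one possible default, not something the PR is known to do.

```rust
/// Assumed mk1 version constant, for illustration.
pub const PROTOCOL_MK1_VERSION: u16 = 3;

/// How to treat a runner whose protocol version has not been written yet.
#[derive(Debug, PartialEq)]
pub enum MissingVersionPolicy {
    /// Fall back to the mk1 version (e.g. for runners that predate the key).
    AssumeMk1,
    /// Fail the request so the caller can retry or report the problem.
    Reject,
}

/// Error returned when no version is stored and the policy is `Reject`.
#[derive(Debug, PartialEq)]
pub struct ProtocolVersionMissing {
    pub runner_id: u64,
}

/// Turns the optional stored value into a concrete version or an error.
pub fn resolve_protocol_version(
    runner_id: u64,
    stored: Option<u16>,
    policy: MissingVersionPolicy,
) -> Result<u16, ProtocolVersionMissing> {
    match (stored, policy) {
        (Some(version), _) => Ok(version),
        (None, MissingVersionPolicy::AssumeMk1) => Ok(PROTOCOL_MK1_VERSION),
        (None, MissingVersionPolicy::Reject) => Err(ProtocolVersionMissing { runner_id }),
    }
}
```

Either policy is defensible; the point is that the choice is written down in one place instead of being an implicit property of the read.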

🐛 Potential Bugs

1. Deprecated TunnelAck Conversion (engine/sdks/rust/runner-protocol/src/versioned.rs:1979-1982)

The conversion from v3 DeprecatedTunnelAck to v4 maps to ToServerResponseAbort. Converting an acknowledgment to an abort seems semantically incorrect. If this should not occur in practice, consider returning an error instead of silently aborting requests.
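Returning an error for the removed variant could be expressed with `TryFrom`. The types below are simplified stand-ins, not the generated v3/v4 enums, and `UnsupportedInV4` is a hypothetical error type introduced for this sketch.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Simplified stand-in for the v3 message kinds.
#[derive(Debug, PartialEq)]
pub enum V3Kind {
    ResponseAbort,
    DeprecatedTunnelAck,
}

/// Simplified stand-in for the v4 message kinds.
#[derive(Debug, PartialEq)]
pub enum V4Kind {
    ResponseAbort,
}

/// Hypothetical error naming the v3 variant that has no v4 equivalent.
#[derive(Debug, PartialEq)]
pub struct UnsupportedInV4(pub &'static str);

impl fmt::Display for UnsupportedInV4 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "v3 message kind `{}` has no v4 equivalent", self.0)
    }
}

impl std::error::Error for UnsupportedInV4 {}

impl TryFrom<V3Kind> for V4Kind {
    type Error = UnsupportedInV4;

    /// Converts v3 to v4, refusing the removed ack variant instead of
    /// rewriting it into an abort.
    fn try_from(kind: V3Kind) -> Result<Self, Self::Error> {
        match kind {
            V3Kind::ResponseAbort => Ok(V4Kind::ResponseAbort),
            V3Kind::DeprecatedTunnelAck => Err(UnsupportedInV4("DeprecatedTunnelAck")),
        }
    }
}
```

The caller then decides whether to drop the message with a warning or fail the connection, rather than the conversion layer aborting an in-flight request on its behalf.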

2. Inconsistent Protocol Version Checks

The code uses protocol::is_mk2() in many places but the implementation is not visible in the diff. Ensure this function correctly identifies all v4 versions and handles edge cases.

⚡ Performance Considerations

1. Database Reads Per Request (engine/packages/pegboard-gateway/src/lib.rs:164-171, lib.rs:304-315)

The protocol version is read from the database for every HTTP and WebSocket request. This could be a performance bottleneck for high-traffic actors.

Recommendation: Consider caching the protocol version in memory (e.g., in SharedState) since it should not change after runner initialization.

2. Message Conversion Overhead

Every message going through the gateway undergoes version checking and potential conversion. While necessary, consider whether hot paths could benefit from version-specific fast paths.

🔒 Security Concerns

Low Risk: The changes do not introduce obvious security vulnerabilities. Protocol version handling appears safe.

Note: Ensure that the protocol version stored in the database cannot be manipulated by untrusted actors to cause version confusion attacks.

🧪 Test Coverage

Missing Test Evidence: The PR does not include visible test file changes.

Recommendations:

  1. Add integration tests for v3 ↔ v4 protocol conversion
  2. Test runner version upgrade scenarios (v3 runner replaced by v4 runner)
  3. Test hibernation with version mismatches
  4. Add unit tests for all conversion functions in versioned.rs

📋 Minor Issues

  1. Unused import: use vbare::OwnedVersionedData; appears unused in ping_task.rs
  2. Inconsistent logging: Some protocol operations log structured fields while others do not
  3. TypeScript changes: Consider whether TypeScript SDK users are aware of the protocol version changes

🎯 Recommendations

High Priority

  1. Clarify behavior when protocol version is not yet in database
  2. Consider caching protocol version to reduce database load
  3. Add comprehensive tests for version conversion paths

Medium Priority

  1. Add explicit error for DeprecatedTunnelAck conversion instead of silent abort
  2. Document why mk1 pings are skipped (or implement them)
  3. Add structured logging for protocol version decisions

Low Priority

  1. Remove unused imports
  2. Add inline comments explaining mk1/mk2 distinction for future maintainers

✨ Overall Assessment

This is a well-structured PR that comprehensively handles a complex protocol versioning requirement. The main concerns are around:

  • Performance: Database reads per request could be optimized
  • Test coverage: No visible tests for the critical conversion logic
  • Error handling: Some edge cases lack explicit handling

The code follows repository conventions and demonstrates good understanding of the codebase architecture. With the recommended improvements, this will be a solid foundation for multi-version protocol support.

Status: Approve with recommendations for follow-up improvements ✅

@MasterPtato MasterPtato force-pushed the 11-21-fix_make_all_uses_of_protocol_handle_v4_-_v3 branch from 7ddbf54 to 72a5b39 Compare November 26, 2025 02:16
@MasterPtato MasterPtato force-pushed the 11-18-fix_pb_simplify_runner_wf branch from 6f07705 to 9da2f02 Compare November 26, 2025 02:16
@claude

claude bot commented Nov 26, 2025

Pull Request Review: Protocol v4 ↔ v3 Compatibility

Overview

This PR implements comprehensive backward compatibility between protocol v4 (mk2) and v3 (mk1), ensuring that all uses of the runner protocol handle version transitions correctly. The changes span 33 files with 1,075 additions and 566 deletions.

✅ Strengths

  1. Comprehensive Version Handling: The PR properly separates mk1 (v1-v3) and mk2 (v4+) protocol versions with dedicated type hierarchies (ToClientMk2, ToRunnerMk2, etc.)

  2. Conversion Infrastructure: Extensive conversion functions between v3 and v4 message types are implemented systematically in versioned.rs

  3. Runtime Protocol Detection: The code reads the runner protocol version from the database before serializing messages (the value is not cached; see the performance notes below):

    // engine/packages/pegboard-gateway/src/lib.rs:164
    runner_protocol_version = udb.run(|tx| async move {
        tx.read(&pegboard::keys::runner::ProtocolVersionKey::new(runner_id), Serializable).await
    })
  4. Consistent Use of is_mk2() Helper: Protocol version checks use a consistent helper function throughout the codebase

⚠️ Issues & Concerns

Critical Issues

  1. Missing mk1 Ping Handling (engine/packages/pegboard-runner/src/ping_task.rs:29-38)

    if protocol::is_mk2(conn.protocol_version) {
        let ping_msg = versioned::ToClientMk2::wrap_latest(...);
        // Send ping
    }
    // ❌ No else branch - mk1 clients never receive pings!

    Impact: Mk1 runners will not receive ping messages, potentially causing connection health checks to fail.

  2. Incomplete TODO Comments - Multiple locations have TODO comments for mk2 message handling:

    • Lines 325, 358, 505, 523, 651 in engine/packages/pegboard/src/workflows/actor/mod.rs
    • All say "TODO: Send message to tunnel" for stop commands

    Impact: Stop commands are not being sent to mk2 runners, which could cause actors to not terminate properly.

  3. Error Handling in Conversion Functions

    • convert_to_server_tunnel_message_kind_v3_to_v2 (line 1622) can fail but error handling is inconsistent
    • convert_to_server_tunnel_message_kind_v2_to_v1 (line 1198) bails on ToServerWebSocketMessageAck which is a v2+ feature

    Risk: Runtime failures when protocol downgrades encounter unsupported message types

  4. Potential Race Condition (engine/packages/pegboard-gateway/src/shared_state.rs:119)

    Entry::Vacant(entry) => {
        entry.insert_entry(InFlightRequest {
            receiver_subject,
            protocol_version,  // ✅ Added
            msg_tx,
            // ...
        });
    }

    The protocol version is fetched and used but there's a window between reading and using it where the runner could upgrade.

Medium Priority Issues

  1. Inconsistent Import Patterns - Some files use rivet_runner_protocol::mk2 as rp while others check is_mk2() at runtime

    • Files like actor-kv/src/entry.rs always use mk2, but the gateway dynamically switches
    • Recommendation: Document which components are mk2-only vs. version-agnostic
  2. Missing Context in Conversions (engine/sdks/rust/runner-protocol/src/versioned.rs:1979)

    v3::ToServerTunnelMessageKind::DeprecatedTunnelAck => {
        // v4 removed DeprecatedTunnelAck, this should not occur in practice
        // but if it does, we'll convert it to a response abort as a safe fallback
        v4::ToServerTunnelMessageKind::ToServerResponseAbort
    }

    Silently converting TunnelAck to ResponseAbort could mask protocol issues.

  3. Database Read Performance - Protocol version is fetched from the database on every request start (lines 164-172 in pegboard-gateway/src/lib.rs)

    • Optimization: Consider caching the runner's protocol version in memory with TTL
  4. TypeScript/BARE Schema Changes - The BARE schema adds runner_protocol_version field but there's minimal validation

    type ToServerlessServerInit {
        runner_id: string
        runner_protocol_version: u16  # New field
    }
    

    Risk: Older clients sending init messages won't include this field

Minor Issues

  1. Code Duplication: The ping/pong message creation is duplicated with slight variations between mk1 and mk2 branches

  2. Logging Inconsistency: Some protocol version mismatches are logged as warnings, others as debug

  3. Default Protocol Version (engine/sdks/rust/runner-protocol/src/versioned.rs:999)

    v4::ToServerlessServer::ToServerlessServerInit(v4::ToServerlessServerInit {
        runner_id: init.runner_id,
        runner_protocol_version: PROTOCOL_MK1_VERSION,  // Hardcoded default
    })

    When converting v3→v4, defaults to mk1 version - this assumption should be documented
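The assumption in item 3 can be documented where the conversion lives. The structs below are simplified stand-ins for the generated v3/v4 init messages, and the constant's value is assumed; only the behavior (a v3 init carries no version, so the upgrade fills in the mk1 version) comes from the snippet quoted above.

```rust
/// Assumed mk1 version constant, for illustration.
pub const PROTOCOL_MK1_VERSION: u16 = 3;

/// Simplified stand-in for the v3 init message (no version field).
#[derive(Debug, PartialEq)]
pub struct V3Init {
    pub runner_id: String,
}

/// Simplified stand-in for the v4 init message.
#[derive(Debug, PartialEq)]
pub struct V4Init {
    pub runner_id: String,
    pub runner_protocol_version: u16,
}

impl From<V3Init> for V4Init {
    /// A v3 init message cannot state its protocol version, and only mk1
    /// senders emit v3, so the upgraded message records the mk1 version.
    fn from(init: V3Init) -> Self {
        V4Init {
            runner_id: init.runner_id,
            runner_protocol_version: PROTOCOL_MK1_VERSION,
        }
    }
}
```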

🔒 Security Considerations

  • No security issues identified - The conversion logic appears safe and doesn't introduce injection vectors
  • Protocol version is read from trusted database source
  • Message serialization uses well-tested serde_bare library

🚀 Performance Considerations

  1. Database Reads: Every request triggers a database read for protocol version - consider caching
  2. Concurrency: The tokio::try_join! calls run the subscription and the database read concurrently, which keeps the added read off the critical path
  3. Message Conversion Overhead: Runtime protocol conversion adds CPU overhead, but appears necessary for compatibility

📝 Code Quality

  1. Follows CLAUDE.md conventions: ✅ Uses structured logging, proper error handling patterns
  2. Import style: ✅ Imports at top of file (not inline)
  3. Error handling: ⚠️ Some Result types don't use anyhow consistently (see conversion functions)

🧪 Test Coverage

CONCERN: No test files are modified in this PR. Given the complexity of protocol version handling:

  • Need tests for v3→v4 and v4→v3 message conversion
  • Need tests for mixed-version runner scenarios
  • Need tests for error cases (unsupported message types in downgrade scenarios)

📋 Recommendations

  1. HIGH PRIORITY: Complete the TODO comments for mk2 stop command handling
  2. HIGH PRIORITY: Fix the missing else branch in ping_task.rs
  3. MEDIUM: Add integration tests for protocol version compatibility
  4. MEDIUM: Cache protocol versions in memory to reduce database load
  5. LOW: Add metrics for protocol version distribution (protocol_version label on relevant metrics)

Summary

The PR makes good progress on protocol compatibility, but has several incomplete areas (TODOs) and a critical bug where mk1 runners don't receive pings. The conversion infrastructure is well-designed, but needs completion and testing before merge.

Recommendation: Request changes to address the missing ping handling and incomplete stop command TODOs before merging.

@claude

claude bot commented Nov 26, 2025

PR Review: Protocol Version Handling (v4 <-> v3)

This PR implements bidirectional protocol version conversion between v4 (mk2) and v3 (mk1) for the runner protocol. Overall, the implementation is comprehensive and well-structured. Here are my findings:

✅ Strengths

  1. Comprehensive conversion layer: The versioned.rs file now properly handles v4 <-> v3 conversions with dedicated conversion functions for tunnel message types
  2. Protocol version tracking: Added ProtocolVersionKey to store runner protocol version in the database (pegboard/src/keys/runner.rs:187-228)
  3. Gateway integration: pegboard-gateway now reads runner protocol version and uses it to serialize messages appropriately
  4. Consistent mk2 terminology: Uses protocol::mk2 imports throughout for better code organization

🐛 Potential Issues

1. Missing ping message handling for mk1 runners (pegboard-runner/src/ping_task.rs:26-38)

The ping task only sends ping messages for mk2 runners:

if protocol::is_mk2(conn.protocol_version) {
    let ping_msg = versioned::ToClientMk2::wrap_latest(...);
    // send ping
}

Issue: mk1 (v3) runners won't receive ping messages, which could cause them to timeout. Either add an else branch for mk1 pings or document why they're not needed.

2. Potential race condition in protocol version reads (pegboard-gateway/src/lib.rs:159-171 & 301-314)

The code reads runner protocol version from the database in parallel with subscribing to actor stopped events:

let (mut stopped_sub, runner_protocol_version) = tokio::try_join!(
    self.ctx.subscribe(...),
    udb.run(|tx| async move { 
        tx.read(&ProtocolVersionKey::new(runner_id), Serializable).await
    })
)?;

Issue: If the runner hasn't fully initialized yet, the protocol version might not be set. Consider adding error handling or a default fallback.

3. Inconsistent error handling in version conversion (versioned.rs:1979-1982)

When converting DeprecatedTunnelAck from v3 to v4:

v3::ToServerTunnelMessageKind::DeprecatedTunnelAck => {
    // v4 removed DeprecatedTunnelAck, this should not occur in practice
    // but if it does, we'll convert it to a response abort as a safe fallback
    v4::ToServerTunnelMessageKind::ToServerResponseAbort
}

Issue: Silently converting to ResponseAbort could hide protocol issues. Consider logging a warning when this occurs or returning an error.

4. Missing protocol version validation

The code doesn't validate that the protocol version read from the database is actually a supported mk2 version (v4). Consider adding validation:

if !protocol::is_mk2(runner_protocol_version) {
    // handle unsupported version
}

🔒 Security Considerations

  1. Protocol downgrade attacks: No apparent protections against protocol downgrade. A malicious runner could advertise v4 support but actually use v3. Consider adding version verification.
  2. Version mismatch detection: Add logging or metrics when protocol version mismatches occur to detect anomalies.

🎯 Code Quality

  1. Type aliasing: Good use of use rivet_runner_protocol::mk2 as rp in actor-kv package
  2. Structured logging: Properly uses tracing with structured fields (e.g., tracing::info!(?x, "foo"))
  3. Import organization: Imports are properly organized at file level

⚠️ Minor Issues

  1. Redundant mk2 check in tunnel_to_ws_task.rs: The code has removed the init_rx wait logic but the comment structure suggests this was intentional for mk2. Consider documenting why this changed.

  2. ToServerlessServerInit conversion (versioned.rs:999-1000): When converting v3 to v4, it hardcodes PROTOCOL_MK1_VERSION:

runner_protocol_version: PROTOCOL_MK1_VERSION,

This is correct for v3->v4 conversion but worth documenting.

📋 Recommendations

  1. Add integration tests for v3<->v4 conversion edge cases
  2. Add metrics for protocol version distribution across runners
  3. Document the migration strategy from v3 to v4
  4. Consider adding a compatibility matrix in documentation
  5. Add tests verifying hibernation works correctly across protocol versions

✨ Overall Assessment

This is a solid implementation of protocol version handling. The main concerns are around:

  • Missing ping handling for mk1 runners (potentially breaking)
  • Race conditions when reading protocol version
  • Error handling for edge cases

Once these issues are addressed, this should be ready to merge.

Recommended next steps:

  1. Fix mk1 ping message handling
  2. Add error handling for missing protocol version
  3. Add logging for protocol version mismatches
  4. Test with mixed v3/v4 runners


2 participants