Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Nov 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivet-site Ready Ready Preview Comment Nov 18, 2025 2:33am
rivetkit-serverless Ready Ready Preview Comment Nov 18, 2025 2:33am
2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 18, 2025 2:33am
rivet-inspector Ignored Ignored Preview Nov 18, 2025 2:33am

Copy link
Contributor Author

MasterPtato commented Nov 18, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Nov 18, 2025

PR Review: Fix batch listen, fix history for graceful signal send in workflows

Summary

This PR introduces two main improvements to the gasoline workflow engine:

  1. Graceful signal sending - adds graceful_not_found() method to signal builders that returns Ok(None) instead of erroring when the target workflow is not found
  2. Batch listen fix - corrects signal batching to properly sort by timestamp and handle indexed input chunks

Code Quality & Best Practices

Strengths:

  • Consistent API design across both common/signal.rs and workflow/signal.rs builders
  • Good documentation with doc comments explaining behavior (lines 107-108, 118-119 in both files)
  • Follows existing error handling patterns using Result<Option<Id>>
  • Proper tracing/logging throughout with structured logging as per CLAUDE.md guidelines
  • Loop optimization improvements maintain good performance characteristics

Suggestions:

  1. Type annotation consistency (gasoline/src/db/kv/mod.rs:1945):

    let mut signals = futures_util::stream::iter(owned_filter.clone())

    The mut is added here, but signals were already being collected into a Vec. This is correct now for the sort operation on line 1980, but consider adding a comment explaining why this needs to be mutable.

  2. Error message clarity (api-peer/src/actors/delete.rs:74-77):
    The warning message could be more specific about which operation failed:

    tracing::warn!(
        actor_id=?path.actor_id,
        "actor workflow not found when sending destroy signal, likely already stopped"
    );

Potential Bugs & Issues

Critical:

  1. Database index insertion logic (gasoline/src/db/kv/debug.rs:931-939 and gasoline/src/db/kv/mod.rs:1505-1512):

    if let Some(input_chunks) = current_event.indexed_input_chunks.get_mut(key.index) {
        input_chunks.push(entry);
    } else {
        current_event.indexed_input_chunks.insert(key.index, vec![entry]);
    }

    This is a good fix! The previous code had a logic error where it would only push to existing indices but never create new ones. This could have caused data loss in signal batching. The fix properly handles both cases.

  2. Timestamp consistency (gasoline/src/db/kv/mod.rs:1980-2012):
    Good catch fixing the timestamp! The code now:

    • Sorts signals by key.ts (line 1980)
    • Uses a single now timestamp for all acks (line 1982, 2011)
    • Applies limit AFTER sorting but BEFORE processing (line 1995)

    This ensures consistent ordering and prevents race conditions.

Minor:

  1. Event type mismatch fix (gasoline/src/db/kv/keys/history.rs:1645):
    EventType::Signal -> EventType::Signals
    Good catch! This was incorrectly using the singular form for batch operations.

Performance Considerations

Improvements:

  1. Loop event upsertion parallelization (gasoline/src/ctx/workflow.rs:872-978):
    The new approach defers loop event upserts to run in parallel with the next iteration. This is a clever optimization that should reduce latency in tight loops. The 4-way join on lines 910-940 is well-documented with comments explaining the parallel execution strategy.

  2. Signal batching optimization (gasoline/src/db/kv/mod.rs:1995):
    Applying .take(limit) after sorting but before processing is more efficient than the previous approach of sorting after collection.

Concerns:

  1. The buffer_unordered(1024) on line 2063 could cause memory pressure if processing 1024 signals simultaneously. Consider if this limit should be configurable or tied to the input limit parameter.

Security Concerns

No security issues identified. The changes:

  • Properly handle missing workflows without exposing sensitive information
  • Maintain transactional integrity with conflict ranges
  • Don't introduce new attack surfaces

Test Coverage

Missing test scenarios:

  1. Test that graceful_not_found() properly creates and replays removed events in workflow history
  2. Test signal batching with timestamps in non-chronological order to verify sorting
  3. Test the indexed input chunks fix with multiple signals in a batch
  4. Test loop event upsertion parallelization with rapid iterations

Recommended test additions:

// Test graceful signal send with replay
#[test]
async fn test_graceful_signal_not_found_replay() {
    // First run: target doesn't exist, should create removed event
    // Second run: should replay removed event correctly
}

// Test batch signal ordering
#[test]
async fn test_batch_signal_timestamp_ordering() {
    // Create signals with out-of-order timestamps
    // Verify they're received in timestamp order
}

Additional Notes

  1. Display text changes (engine/src/util/wf/mod.rs): The changes from "signal receive" to "signal recv" for consistency are good, but ensure this doesn't break any external tooling that parses these strings.

  2. Logging level change (universaldb/src/driver/rocksdb/transaction_conflict_tracker.rs:67): Changing from info! to debug! for conflict logging is appropriate as this could be noisy in production.

  3. TypeScript comment (runner-protocol/src/index.ts:56-58): The TODO comment about renaming createTs to update_ts should be addressed in a future PR to maintain naming consistency per CLAUDE.md conventions.

Verdict

LGTM with minor suggestions

This is a solid improvement to the workflow engine. The graceful signal sending feature is well-implemented with proper history tracking for replay consistency. The batch listen fixes address real bugs in signal ordering and indexed chunk handling. The loop optimization should provide measurable performance improvements.

The code follows project conventions, has good documentation, and maintains backward compatibility. Recommended to merge after addressing the test coverage gaps.


Recommendation: Approve with suggestion to add tests before merge.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Nov 18, 2025

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3479

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3479

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3479

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3479

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3479

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3479

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3479

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3479

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3479

commit: 321b2c3

@MasterPtato MasterPtato force-pushed the 11-17-feat_gas_add_batch_listen branch from 25ece3d to 6ff29f6 Compare November 18, 2025 23:50
@MasterPtato MasterPtato force-pushed the 11-17-fix_gas_fix_batch_listen_fix_history_for_graceful_signal_send_in_workflows branch from 0ca2fa2 to 321b2c3 Compare November 18, 2025 23:50
@claude
Copy link

claude bot commented Nov 18, 2025

Critical Issues - Must Fix Before Merge

  1. Missing Variable Assignments - In several files, the code calls .await? but then tries to use res which was never assigned. This will cause compilation errors.

Affected locations:

  • engine/packages/api-peer/src/actors/delete.rs:67-73
  • engine/packages/guard/src/routing/pegboard_gateway.rs:195-207
  • engine/packages/pegboard-serverless/src/lib.rs:483-507
  • engine/packages/pegboard/src/workflows/runner.rs (4 occurrences)

Example of what needs fixing - change from this pattern:

.graceful_not_found()
.send()
.await?;
if res.is_none() {

To this pattern:

let res = ctx.signal(...)
    .graceful_not_found()
    .send()
    .await?;
if res.is_none() {
  1. Batch Listen Limit Mismatch - In db/kv/mod.rs:1995, the limit is applied AFTER the history event is inserted but BEFORE acknowledgment. This could cause signals beyond the limit to be lost or re-delivered incorrectly.

Positive Changes:

  • Better API ergonomics with graceful_not_found builder pattern
  • Proper history handling via removed events for replay determinism
  • Batch signal sorting by timestamp for correct ordering
  • Loop event upsertion properly parallelized

Recommendations:

  • Fix missing res variable assignments (critical)
  • Verify batch listen limit application (critical)
  • Add tests for graceful signal sending and replay
  • Add more context to debug logs
  • Verify EventType::Signals change backward compatibility

The core concept is solid - adding graceful degradation for missing signal targets is the right approach. However, compilation errors need fixing before merge.

@claude
Copy link

claude bot commented Nov 18, 2025

Code Review: Batch Listen & Graceful Signal Send

This PR introduces important improvements to the workflow signal system, but there are several critical issues that need to be addressed before merging.


Critical Issue: Missing runner2 Module

Severity: Critical - Breaks Compilation

Multiple files reference workflows::runner2 but the module does not exist:

  • engine/packages/pegboard/src/lib.rs:17 - Tries to register runner2::Workflow
  • engine/packages/pegboard/src/workflows/actor/mod.rs:6 - Imports runner2::AllocatePendingActorsInput
  • engine/packages/pegboard-runner/src/ping_task.rs:24 - References runner2::Input
  • engine/packages/pegboard-serverless/src/lib.rs:483+ - References runner2::Stop and runner2::Workflow

However, engine/packages/pegboard/src/workflows/mod.rs only declares pub mod actor and pub mod runner.

Fix Required: Either add pub mod runner2 and create the module file, OR revert all references back to runner if runner2 was intended for a different PR.

@claude
Copy link

claude bot commented Nov 18, 2025

Code Quality Issues

1. Potential Logic Issue in Batch Listen Sort/Limit

Severity: Medium

In engine/packages/gasoline/src/db/kv/mod.rs:1945-2067, the limit is applied twice:

  1. First, when fetching signals from each signal type (up to limit per type)
  2. Second, after sorting all signals by timestamp (taking only limit total)

If there are multiple signal types in the filter, this could fetch more signals than needed. For example, with 3 signal types and limit=10, you might fetch up to 30 signals, then sort and take 10.

Recommendation: Consider using limit: None in the initial fetch, or adjust the per-type limit to account for multiple signal types.

2. Inconsistent Event Type Naming

Severity: Low

The signal batch receive event is called different things across the codebase:

  • In db/debug.rs:125 - signal receive
  • In db/kv/keys/history.rs:1645 - EventType::Signals
  • In engine util/wf/mod.rs - signal recv

Consider standardizing terminology.

@claude
Copy link

claude bot commented Nov 18, 2025

Good Practices Observed

1. Excellent Error Handling Pattern

The graceful_not_found() API is well-designed and much cleaner than the previous pattern of checking for specific error types. This is a clean builder API that improves code readability significantly!

2. Proper History Tracking for Graceful Not Found

The implementation correctly handles replay scenarios by:

  • Checking is_removed() during replay (engine/packages/gasoline/src/builder/workflow/signal.rs:117-126)
  • Creating a removed event when the target is not found (lines 176-190)
  • Comparing removed events during replay to ensure consistency

This maintains workflow determinism properly.

3. Loop Optimization Parallelization

The loop optimization in engine/packages/gasoline/src/ctx/workflow.rs:866-979 is well-documented with detailed comments explaining the parallelization strategy, which is excellent for maintainability.

@claude
Copy link

claude bot commented Nov 18, 2025

Test Coverage

Missing Tests for New Features

Severity: Medium

The PR adds significant new functionality but I do not see corresponding tests:

  1. No test for graceful_not_found() - Should test:

    • Signal to non-existent workflow returns Ok(None)
    • Replay of gracefully-not-found signal works correctly
    • Without graceful_not_found(), signal to non-existent workflow still returns error
  2. No test for batch listen - Should test:

    • Multiple signals received in correct timestamp order
    • Limit is applied correctly
    • Multiple signal types are handled properly

Recommendation: Add integration tests in engine/packages/gasoline/tests/workflows/ following the pattern of signal_test.rs.

@claude
Copy link

claude bot commented Nov 18, 2025

Summary

Must Fix Before Merge:

  1. Resolve the missing runner2 module issue (critical compilation error)

Should Fix:
2. Add test coverage for graceful_not_found and batch listen
3. Review the batch listen limit logic (potential over-fetching)
4. Standardize signal event naming across the codebase

Nice to Have:
5. Review the log level change in universaldb transaction_conflict_tracker.rs:67 (changed from info to debug)


Overall, this is a solid improvement to the signal system with good attention to replay consistency and error handling. The main blocker is the runner2 compilation issue. Once that is resolved and tests are added, this should be ready to merge.

Great work on the graceful error handling pattern!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants