Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Dec 10, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Dec 10, 2025 11:39pm
rivet-inspector Ignored Ignored Preview Dec 10, 2025 11:39pm
rivet-site Ignored Ignored Preview Dec 10, 2025 11:39pm
rivetkit-serverless Skipped Skipped Dec 10, 2025 11:39pm

@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless December 10, 2025 23:38 Inactive
Copy link
Contributor Author

MasterPtato commented Dec 10, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 10, 2025

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3622

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3622

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3622

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3622

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3622

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3622

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3622

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3622

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3622

commit: 161ffc2

@claude
Copy link

claude bot commented Dec 10, 2025

Pull Request Review: fix(backport): fix serverless

I've reviewed the changes in this PR. Overall, the implementation addresses several critical issues with serverless functionality and workflow reliability. Here's my detailed feedback:


Strengths

  1. Improved Error Context: Adding .context() calls throughout (e.g., epoxy/src/ops/propose.rs) significantly improves debuggability.

  2. Better Activity Error Handling: Including activity names in error variants (ActivityFailure, ActivityTimeout, ActivityMaxFailuresReached) makes debugging much easier. This is a good practice.

  3. Robust CPU Monitoring: The new cpu_usage_ratio() implementation in system.rs properly handles cgroups v1/v2 and falls back gracefully to process-level metrics.

  4. Transaction Timeout Handling: Using EARLY_TXN_TIMEOUT to prevent long-running transactions from blocking is a smart optimization for scalability.

  5. Workflow Revival Feature: The new revive_workflows command is well-designed with parallelization support and dry-run mode.


Critical Issues

1. Potential Data Race in Metrics Counting (engine/packages/gasoline/src/db/kv/mod.rs:698-705)

The change from entry.1 += 1 to entry.1 += count assumes count exists in scope, but I don't see where this variable comes from in the diff. This will cause a compilation error.

Fix: Verify that the metric reading logic properly deserializes a count value, or if this should remain += 1.


2. HashMap Iteration Without Ordering (engine/packages/gasoline/src/db/kv/mod.rs:1152)

Changed from Vec to HashMap for deduplication, which loses insertion order. While the comment says "Sort for consistency across all workers", the actual sorting happens on wake_keys, not the deduplicated workflows. This could lead to non-deterministic workflow pulling order.

Fix: Consider using IndexMap to preserve insertion order, or ensure the final dedup_workflows collection is sorted before processing.


3. Missing Null Check (engine/packages/gasoline/src/db/kv/mod.rs:1165)

After the insertion block, the code continues with logic that expects wf to exist, but it won't because of the continue. The diff is truncated, but this pattern looks suspicious.

Fix: Verify that the logic after this block is correct and doesn't assume wf is still in scope.


Warnings & Suggestions

4. Silent Workflow Wake Failure (engine/packages/gasoline/src/db/kv/debug.rs:695)

Changed from ensure! (which returns an error) to warn + continue. While this is more resilient, it silently skips workflows. Consider tracking and returning the count of skipped workflows so the caller knows some wakes failed.


5. CPU Load Shedding Thresholds (engine/packages/gasoline/src/db/kv/mod.rs:1051)

Changed from cpu_usage.max(100.0) * 10.0 to cpu_usage_ratio * 1000.0. The new thresholds (50%-80%) seem aggressive. Has this been load tested?

Suggestion: Document the rationale for these specific threshold values in a comment.


6. Hardcoded Limit (engine/packages/gasoline/src/db/kv/mod.rs:1181)

Magic number (10000). Consider making this configurable via rivet_config or at least defining it as a named constant.


7. Inconsistent Error Handling in Purger (engine/packages/epoxy/src/workflows/purger.rs:25)

Changed from listen() to listen_n(1024). Why 1024? Is there potential for dropping signals if more than 1024 are buffered?

Suggestion: Add a comment explaining the buffer size choice and whether dropped signals are acceptable.


8. Missing Bounds Check (engine/packages/gasoline/src/db/kv/debug.rs:1212)

Good validation! However, consider whether 1024 is a reasonable upper bound. Document why this limit exists.


Minor Issues

  1. Unused removed() Call (engine/packages/pegboard/src/workflows/actor/destroy.rs:36)

What does removed() do? Is this a version compatibility shim? Needs a comment.


  1. Config Field Documentation (engine/packages/config/src/config/runtime.rs:8)

Clear documentation! However, the fallback behavior ("uses /sys/fs/cgroup/cpu.max, and if that is unset uses total host cpu") should also be documented in the config struct or schema.


Testing Concerns

  1. No Test Coverage for Critical Paths
  • The CPU usage calculation changes are critical for load shedding but I don't see corresponding tests
  • The workflow revival logic needs tests, especially for the parallelization and filtering logic
  • The metrics counting change needs validation

Recommendation: Add unit tests for:

  • cpu_usage_ratio() with mocked cgroup files
  • revive_workflows_inner() with various filter combinations
  • The deduplication logic in workflow pulling

Security

  1. No obvious security issues, but:
  • The workflow revival command could be resource-intensive. Consider rate-limiting or requiring elevated permissions.
  • The database range scans could be exploited for DoS if an attacker can trigger them. The timeout mitigation helps but doesn't eliminate the risk.

Performance

  1. Good Optimizations:
  • Early transaction timeouts prevent blocking
  • Load shedding based on CPU usage is smart
  • Workflow deduplication reduces unnecessary work
  1. Potential Concerns:
  • The HashMap change for deduplication trades insertion order for lookup speed. Profile to ensure this is beneficial.
  • Range scans with timeouts may leave work incomplete. Monitor metrics to ensure this doesn't cause workflow starvation.

Recommendations Summary

Must Fix:

  1. Verify the count variable exists in metrics aggregation
  2. Ensure workflow pulling order is deterministic
  3. Add tests for CPU monitoring and workflow revival

Should Fix:
4. Document load shedding thresholds
5. Make hardcoded limits configurable
6. Add comments for non-obvious code patterns

Consider:
7. Return counts of skipped workflows in wake operations
8. Add rate limiting to revival command
9. Profile HashMap vs Vec performance for workflow deduplication


Adherence to CLAUDE.md

  • Follows error handling patterns with custom error types
  • Uses structured logging (e.g., tracing::info!(?total, ...))
  • Follows timestamp naming conventions (destroy_ts, create_ts)
  • Some areas could use better inline documentation
  • Properly uses workspace dependencies

Overall Assessment: This is a solid improvement to workflow reliability and serverless functionality. The main concerns are around the metrics counting change (potential compilation error) and ensuring deterministic behavior in workflow pulling. Once those are addressed and tests are added, this should be good to merge.

@NathanFlurry NathanFlurry changed the base branch from 12-04-fix_add_serverless_backfill to graphite-base/3622 December 11, 2025 07:15
@MasterPtato MasterPtato changed the base branch from graphite-base/3622 to 12-04-fix_add_serverless_backfill December 11, 2025 20:34
@MasterPtato MasterPtato marked this pull request as ready for review December 11, 2025 21:03
@graphite-app
Copy link
Contributor

graphite-app bot commented Dec 12, 2025

Merge activity

  • Dec 12, 12:48 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Dec 12, 12:49 AM UTC: CI is running for this pull request on a draft pull request (#3634) due to your merge queue CI optimization settings.
  • Dec 12, 12:49 AM UTC: Merged by the Graphite merge queue via draft PR: #3634.

graphite-app bot pushed a commit that referenced this pull request Dec 12, 2025
@graphite-app graphite-app bot closed this Dec 12, 2025
@graphite-app graphite-app bot deleted the 12-10-fix_backport_fix_serverless branch December 12, 2025 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants