-
Notifications
You must be signed in to change notification settings - Fork 135
fix(pb): rewrite runner wf to handle batch signals #3480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11-17-fix_gas_fix_batch_listen_fix_history_for_graceful_signal_send_in_workflows
Are you sure you want to change the base?
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
More templates
@rivetkit/cloudflare-workers
@rivetkit/db
@rivetkit/framework-base
@rivetkit/next-js
@rivetkit/react
rivetkit
@rivetkit/sql-loader
@rivetkit/engine-runner
@rivetkit/engine-runner-protocol
commit: |
Code Review: Rewrite runner workflow to handle batch signalsSummaryThis PR introduces Positive AspectsArchitecture & Design
Error Handling
Code Quality
Issues & ConcernsCritical Issues1. Unused Import (line 11)use vbare::OwnedVersionedData;This import is not used anywhere in the file and should be removed. The project uses workspace dependencies, so unnecessary imports should be cleaned up. Fix: Remove line 11 2. Potential Division by Zero (lines 620, 1057)let remaining_millislots = (remaining_slots * 1000) / input.total_slots;If Recommendation: Add validation or use 3. Missing Signal Handler?The Recommendation: Consider if this is intentional or if the signal handling should be refactored for better compile-time safety Performance Considerations4. Sequential Signal Processing (lines 239-256)// NOTE: This should not be parallelized because signals should be sent in order
// Forward to actor workflows
// Process events
for event in &events {
// ... sends signal to actor workflow
}While the comment explains this must be sequential, this could become a bottleneck with many events. Each signal send is an async operation that must complete before the next begins. Consider:
5. Sequential Allocation Signals (lines 315-321)for alloc in res.allocations {
ctx.signal(alloc.signal)
.to_workflow::<crate::workflows::actor::Workflow>()
.tag("actor_id", alloc.actor_id)
.send()
.await?;
}Similar to #4, these allocations are sent sequentially but could potentially be parallelized since they're going to different actors. Recommendation: Use 6. Message-by-Message Publishing (lines 1140-1147)for message in &input.messages {
let message_serialized = versioned::ToClient::wrap_latest(message.clone())
.serialize_with_embedded_version(PROTOCOL_VERSION)?;
ctx.ups()?
.publish(&receiver_subject, &message_serialized, PublishOpts::one())
.await?;
}Each message is published individually in a loop. If the pubsub system supports batch publishing, this could be optimized. Recommendation: Check if Code Quality Issues7. Commented Out Code (lines 36, 819-825)// events: Vec<EventRow>,
// TODO: Storing events is disabled for now, otherwise state will grow indefinitelyWhile the TODO explains why events aren't stored, the commented code should either be removed or the decision should be finalized. Recommendation: Either implement a proper solution (e.g., event truncation/archival) or remove the dead code 8. Deprecated Field (line 455-456)#[derive(Debug, Serialize, Deserialize)]
struct InitOutput {
/// Deprecated.
evict_workflow_id: Option<Id>,
}If this field is deprecated, there should be a plan to remove it. Is this for backwards compatibility? Should it have a timeline for removal? Recommendation: Add context on when this can be removed or if it's needed for backward compatibility 9. Clone Overhead (line 70, 284)let input = input.clone(); // line 70
commands: commands.clone(), // line 284The input is cloned on every loop iteration. While Recommendation: Consider if these clones are necessary or if references could be used Testing & Documentation10. No Tests IncludedThis is a significant rewrite (1176 lines) that changes core behavior from single-signal to batch-signal processing, but no tests are included in the PR. Critical Recommendation: Add tests covering:
11. Missing Migration PathThe file is named Questions:
Recommendation: Document the migration strategy Minor Issues12. Inconsistent Naming (lines 427-442)
Recommendation: impl Default for LifecycleState {
fn default() -> Self { ... }
}13. Magic Number (line 16)const EVENT_ACK_BATCH_SIZE: i64 = 500;Good use of a constant, but consider if this should be configurable via the config system like other thresholds. Security Concerns14. Unbounded Batch SizeWhile the batch size is limited to 1024 signals (line 77), each signal could contain variable-sized data (e.g., Recommendation: Consider adding size-based limits in addition to count-based limits 15. Error Message Information Disclosure (lines 130-133)bail!(
"received message that should not be sent to runner workflow: {:?}",
sig.inner
)Logging the full message content in an error could potentially expose sensitive information. Recommendation: Log message type only, not full content, or ensure sensitive fields are redacted Performance MetricsThe batch processing approach should provide significant performance improvements:
Recommendation: Add metrics to track:
Summary & RecommendationsMust Fix Before Merge
Should Fix Before Merge
Consider for Future
ConclusionThis is a well-architected improvement to the runner workflow that should significantly improve performance through batch processing. The code quality is generally good with clear separation of concerns and proper error handling. However, the lack of tests for such a critical component is concerning, and the migration strategy needs clarification. With the critical issues addressed, this will be a solid improvement to the system. Great work on the batch processing implementation! The signal accumulation logic is particularly well done. |
b180eca to
4eabd07
Compare
0ca2fa2 to
321b2c3
Compare
Code ReviewI've reviewed PR #3480 which rewrites the runner workflow to handle batch signals. Here's my detailed feedback: OverviewThis PR introduces a new 🎯 Positive Changes
|
PR Review: Rewrite Runner Workflow to Handle Batch SignalsSummaryThis PR rewrites the pegboard runner workflow ( ✅ Strengths1. Performance Optimization
2. Improved State Management
3. Code Quality
|
321b2c3 to
aa9ef2a
Compare
4eabd07 to
e68ccc5
Compare
PR Review: Rewrite Runner Workflow to Handle Batch SignalsI've reviewed this PR and have the following feedback: OverviewThis PR rewrites the runner workflow (creating runner2.rs) to handle batch signal processing instead of processing signals one at a time. This is a significant architectural improvement that should reduce workflow overhead and improve performance. Strengths
Potential IssuesCritical: State Management ConcernLocation: process_init activity (line 740-801) The process_init activity reads from state.last_event_idx and state.commands. The LifecycleState tracks last_event_idx separately from the activity State. When process_init returns missed commands, it compares against State.last_command_idx which might be out of sync with what was actually processed in the loop. Recommendation: Verify state consistency between State and LifecycleState is maintained correctly. Medium: Sequential Event ProcessingLocation: Lines 239-256 Events are still processed sequentially with the comment that they should not be parallelized. However, sending 1024 events one-by-one could become a bottleneck. Consider whether strict ordering is required for all events or if some could be parallelized. Medium: Message Sending PatternLocation: Lines 324-330, 1132-1150 The implementation sends messages in a loop without batching at the protocol level. Consider whether the protocol supports sending multiple messages in a single publish operation, or if this activity could publish in parallel. Test CoverageAre there tests that verify:
Performance ConsiderationsPositive Impact:
Questions:
Migration StrategySince this creates runner2.rs alongside the existing runner.rs:
Overall, this is a solid architectural improvement that should significantly enhance performance. The main concern is ensuring state consistency is maintained correctly with the new batching approach. |
Code Review for PR #3480I've reviewed this pull request which rewrites the runner workflow to handle batch signal processing. Strengths & Improvements
Potential Issues1. Incomplete File (Critical)The file appears truncated at line 1176. Line 944 shows incomplete struct name. Please verify full file was committed. 2. Backward CompatibilitySendMessageToRunnerInput (singular) replaced by SendMessagesToRunnerInput (plural) - could break workflow replays. Need migration strategy. 3. Duplicate CodeBoth runner.rs and runner2.rs exist with nearly identical code - maintenance burden. Need clear deprecation plan. 4. Sequential Actor Signal ProcessingLines 239-256 send signals sequentially. Could become bottleneck under high event volume. Consider adding telemetry. 5. Missing ValidationLine 122 uses .max() for ack indices - silently handles out-of-order acks. Consider logging these cases. Code Quality
TestingMissing:
Action ItemsMust Fix:
Should Address: Nice to Have: OverallSolid performance improvement with good code quality. Main concerns: incomplete file, dual-workflow situation, missing tests. Once addressed, should be ready to merge. Review generated by Claude Code |
aa9ef2a to
dfd167c
Compare
e68ccc5 to
9a87fd1
Compare
Code Review - Batch Signal Processing RewriteThis PR introduces runner2.rs to handle batch signal processing, replacing the single-signal approach with processing up to 1024 signals per iteration. Strengths
Critical Issues1. Unused Import (line 11) 2. Potential Division by Zero (lines 620, 1057) 3. No Test Coverage Important Issues4. Sequential Event Processing (lines 239-256) 5. Sequential Allocation Signals (lines 315-321) 6. Missing Migration Documentation Code Quality
Security & Performance
SummaryMust Fix:
Should Fix: Well-designed improvement with clean batch processing logic. However, lack of tests is concerning and division-by-zero is a potential bug. Great work! |
Pull Request Review: Rewrite Runner Workflow to Handle Batch SignalsOverviewThis PR introduces runner2.rs as a rewrite of the runner workflow with the primary goal of handling batch signals more efficiently. The implementation changes from processing signals one-at-a-time to batched processing (up to 1024 signals per iteration). Key ImprovementsBatch Processing Efficiency (lines 76-78)
Optimized Message Sending (lines 324-329)
Consolidated Command Processing (lines 280-303)
Critical Issue: Migration Path UnclearThe old runner.rs workflow is still present but all callers now use runner2. This raises important questions:
Recommendation: Add comments explaining the migration strategy or plan to remove runner.rs in a follow-up PR. Other IssuesMedium: Activity API Change (lines 832-840)
Medium: Init Signal Deduplication (lines 105-109)
Code Quality AssessmentStrengths:
Testing Recommendations:
Performance ConsiderationsImprovements:
Potential Concerns:
Recommendation: Monitor memory usage and latency metrics after deployment. SecurityNo security issues identified:
SummaryThis is a well-implemented refactor that should significantly improve performance for high-throughput scenarios. The main concern is around migration strategy and ensuring backwards compatibility is properly handled. Action Items:
Approval Status: Approved with recommendations. The code quality is high and the batching approach is sound. Please address the migration path documentation before merging. |
dfd167c to
bcf4d5d
Compare
9a87fd1 to
0560d9b
Compare
PR Review: fix(pb): rewrite runner wf to handle batch signalsSummaryThis PR introduces a new Key Changes
Code Quality & Best PracticesPositives:
Issues:
Potential Bugs
Performance ConsiderationsPositives:
Potential Improvements:
Security ConcernsNo security issues identified. The PR maintains the same trust boundaries and doesn't introduce new attack surfaces. Test CoverageMissing:
Minor Suggestions
VerdictThe batch signal processing is a solid improvement for throughput. The main concerns are:
Overall, this looks like a positive change once the above items are addressed. |
0560d9b to
ff15327
Compare
PR Review: fix(pb): rewrite runner wf to handle batch signalsSummaryThis PR introduces ✅ Code Quality & Best Practices
|
ff15327 to
f0e981f
Compare
457234c to
ea2cfc5
Compare
Pull Request Review: Rewrite Runner Workflow to Handle Batch SignalsThis PR rewrites the runner workflow to handle batch signal processing instead of processing signals one at a time - a significant architectural improvement. Key Strengths
Critical Issues1. Missing Command Acknowledgment Activity 2. Protocol Type Change 3. Init Signal Race 4. State Growth 5. Failure Recovery Testing Recommendations
Questions
RecommendationApprove with minor revisions - Verify command ack logic, add duplicate init warning, add bounds checking, ensure test coverage. Nitpicks
|
Code Review: Runner Workflow Rewrite for Batch Signal HandlingThis PR introduces runner2.rs to handle batch signals efficiently by processing multiple signals per loop iteration instead of one-at-a-time. Positive AspectsPerformance Improvement - Using listen_n_with_timeout to fetch up to 1024 signals at once (line 77) is excellent optimization. Efficient Message Batching - Messages accumulated and sent together reduces network overhead (lines 185, 324-330). Signal Deduplication - Good handling: only first Init processed (line 106), max of AckCommands taken (line 122). Code Structure - Well-organized separation between batching, processing, and dispatch. Issues and Concerns1. Unused Imports - Lines 5, 7, 10, 11 have unused imports. OwnedVersionedData appears unused. Run cargo clippy. 2. Type Inconsistency - Old workflow uses protocol::ToRunner (line 100) but runner2 uses protocol::ToClient (line 205). Verify equivalence. 3. Events Storage - Lines 36, 819-825 show intentionally disabled event storage. Should there be bounded storage for debugging? 4. TODO in Production - Line 667 TODO about combining txn. Create tracking issue if deferred. 5. Integer Overflow - Line 865 casts usize to i64. Use checked arithmetic or document safety. Security and TestingSecurity - Batch size 1024 prevents unbounded growth. Event filtering (line 116) prevents replays. Testing - No tests for batching behavior. Need unit tests for deduplication, integration tests for ordering, edge case tests. Migration PathBoth runner.rs and runner2.rs exist. When will old be removed? Any compatibility concerns for in-flight workflows? SummaryRatings: Code Quality 4/5, Performance 5/5, Security 5/5, Test Coverage 2/5 Must Fix: Remove unused imports, verify type consistency, add test coverage Should Address: Document migration plan, create issue for TODO, use checked arithmetic Well-designed performance improvement. Main concerns are completeness not correctness. Safe to merge with must-fix items addressed. |

No description provided.