-
Notifications
You must be signed in to change notification settings - Fork 135
fix(gas): fix loop forgotten bug due to concurrency #3482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals
Are you sure you want to change the base?
fix(gas): fix loop forgotten bug due to concurrency #3482
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
More templates
@rivetkit/cloudflare-workers
@rivetkit/db
@rivetkit/framework-base
@rivetkit/next-js
@rivetkit/react
rivetkit
@rivetkit/sql-loader
@rivetkit/engine-runner
@rivetkit/engine-runner-protocol
commit: |
PR Review: fix(gas): fix loop forgotten bug due to concurrencySummaryThis PR fixes a critical concurrency bug in the gasoline workflow engine where loop event history was being incorrectly cleared, potentially losing events from concurrent iterations. Core Fix Analysis (engine/packages/gasoline/src/db/kv/mod.rs)The Problem: The Solution:
This ensures only past iterations are moved to forgotten history, preserving the current iteration's events. Code Quality: ✅ Excellent
Other Changes Review1. pegboard-runner/src/conn.rs - Workflow migration-workflow(pegboard::workflows::runner::Input {
+workflow(pegboard::workflows::runner2::Input {Concern: Questions:
2. pegboard/src/workflows/runner2.rs - Init signal handling-if init.is_none() {
- init = Some(init_sig);
- check_queue = true;
-}
+init = Some(init_sig);
+check_queue = true;Concern: Potential Issue:
Recommendation: If this change is intentional (e.g., supporting re-initialization), add a comment explaining why. Otherwise, consider keeping the guard or at least logging when init is being overwritten. 3. Cargo.toml - Commented release profile options+# strip = true
+# panic = "abort"
+# overflow-checks = false
+# debug-assertions = falseIssue: ❌ These commented-out options should not be in the PR Recommendation: Remove these lines. They appear to be experimental/debugging configurations that were accidentally included. 4. scripts/docker/build-push.sh - Commented build arg+# --build-arg CARGO_BUILD_MODE=release \Issue: ❌ Commented code in scripts Recommendation: Remove if not needed, or uncomment if intended to be active. 5. scripts/run/docker/engine-rocksdb.sh - Trailing newlineIssue: This is inconsequential but typically files should end with a newline per POSIX standards. Security Considerations✅ No security concerns identified. The fix actually improves data integrity by preventing unintended data loss. Performance Considerations✅ Improved Performance: The bounded range query is more efficient than the original unbounded subspace range, as it scans fewer keys. Test CoverageConcerns:
Recommendations:
Example test structure: #[workflow(ConcurrentLoopTestWorkflow)]
pub async fn concurrent_loop_test(ctx: &mut WorkflowCtx, input: &Input) -> Result<()> {
ctx.loope(0, move |ctx, state| {
async move {
// Trigger some events during iteration
ctx.activity(...).await?;
if *state >= 5 {
return Ok(Loop::Break(()));
}
*state += 1;
Ok(Loop::Continue)
}.boxed()
}).await?;
// Verify all iteration events are in forgotten history
Ok(())
}Summary of RecommendationsMust Fix:
Should Address:
Nice to Have:
Overall AssessmentThe core bug fix is excellent and addresses a real concurrency issue with a correct, performant solution. However, the PR includes several unrelated or questionable changes that should be addressed before merging. Recommendation: Request changes to clean up the non-essential modifications and clarify the intentional behavior changes in runner2.rs. |
4eabd07 to
e68ccc5
Compare
1a4f15a to
4294775
Compare
Code ReviewSummaryThis PR fixes a critical concurrency bug in the workflow loop history management where events from all loop iterations were being moved to forgotten history, not just events from completed iterations. The fix correctly bounds the range query to only include events up to the current iteration. ✅ Positives
🔍 Issues & ConcernsCritical: Missing Key Deletion (
|
PR Review: Fix loop forgotten bug due to concurrencySummaryThis PR addresses a critical bug in the workflow engine's loop event history management. The fix ensures that when moving loop events to forgotten history, only events from iterations 0 through (current iteration - 1) are moved, preventing concurrent loop iterations from incorrectly archiving each other's events. Code Quality & Best Practices ✅Main Fix (gasoline/src/db/kv/mod.rs:2579-2634)Good:
Observation:
Workflow Fix (pegboard/src/workflows/runner2.rs:103-108)Good:
Potential Issue: Question: Is this the intended behavior? Consider these scenarios:
Suggestion: Consider either:
Potential Bugs & Issues1. Edge Case: iteration == 0
|
e68ccc5 to
9a87fd1
Compare
4294775 to
5458e50
Compare
Code ReviewSummaryThis PR fixes a critical bug in the workflow loop event history management where events from all iterations (including future ones) were being moved to forgotten history instead of only past iterations. The fix also includes some related cleanup in the pegboard runner workflow. Critical Bug Fix ✅Location: The Problem: The Solution:
This ensures only past iterations are moved to forgotten history, which is the correct behavior. Code Quality: ⭐⭐⭐⭐⭐
Secondary Changes1. Pegboard Runner Init Logic (
|
PR Review: Fix loop forgotten bug due to concurrencyOverviewThis PR fixes a critical bug in the workflow engine's loop history management where concurrent operations could cause events from ALL iterations of a loop to be moved to forgotten history, rather than just the events from completed iterations up to the current one. Code Quality & ImplementationMain Fix (gasoline/src/db/kv/mod.rs) ✅ The core fix correctly addresses the concurrency issue by:
The comments are also improved, making the intent clearer ("up to the current iteration" instead of just "current events"). Potential Issues & Concerns1. Minor: Iteration = 0 Edge Case When iteration.saturating_sub(1) // = 0 when iteration is 0This means the range would be
Recommendation: Add a guard condition: if iteration > 0 {
// Only move events to forgotten if we have completed iterations
let loop_events_subspace_start = ...
// ... rest of the code
}2. Configuration Changes (Cargo.toml) The commented-out release profile optimizations are concerning: # strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false
Recommendation: Remove these lines or add a comment explaining their purpose. 3. Unrelated Changes Several changes appear unrelated to the loop forgotten bug:
Recommendation: Consider splitting these changes into separate commits or PRs for clearer history. If they're related to reproducing/testing the bug, please document that relationship. Performance Considerations✅ Good: The fix is more efficient than the original - it uses precise range bounds instead of iterating over all loop events. Security Concerns
Test Coverage❌ Missing: No test changes are included in this PR. Recommendations:
Additional ObservationsLogging: Consider adding structured logging around the forgotten history logic: tracing::debug!(
?from_workflow_id,
?location,
iteration,
"moving loop events to forgotten history"
);This would help diagnose similar issues in the future. SummaryStrengths:
Action Items:
Overall, this is a good fix for a critical bug, but needs test coverage before merging. |

No description provided.