# fix(gas): optimize loops #3475
## Conversation
## PR Review: fix(gas): optimize loops

### Summary

This PR optimizes loop performance in the workflow engine by:
### Code Quality & Best Practices

Strengths:
Concerns:
### Potential Bugs

Critical Issue: The optimization introduces a correctness bug. Looking at line 994:

```rust
if iteration % LOOP_ITERS_PER_COMMIT == 0 {
    self.db.upsert_workflow_loop_event(...)
}
```

The modulo check means:
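To make the cadence concrete, here is a minimal standalone sketch of the check (hypothetical helper; the constant value mirrors the PR, but this is not the engine's code):

```rust
const LOOP_ITERS_PER_COMMIT: usize = 20;

/// True when the loop body at `iteration` would trigger a db commit
/// under the modulo check quoted above.
fn should_commit(iteration: usize) -> bool {
    iteration % LOOP_ITERS_PER_COMMIT == 0
}

fn main() {
    // Only iterations 0, 20, 40, ... pass the check.
    let commits: Vec<usize> = (0..50).filter(|i| should_commit(*i)).collect();
    assert_eq!(commits, vec![0, 20, 40]);
    println!("commits at: {:?}", commits);
}
```

All intermediate state between multiples of 20 lives only in memory until the next commit.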
But the loop event was already created at iteration 0 (lines 892-904), so iteration 20 would be the first update. However, if the loop breaks at iteration 15, the final state IS committed (lines 1021-1032), so that case is handled.

The real issue: if a workflow crashes/restarts between commits (e.g., at iteration 15), it will replay from iteration 0, not iteration 15. This changes the recovery semantics.

### Performance Considerations

Positive:
Concerns:
### Security Concerns

No security issues identified. The changes are confined to internal workflow state management.

### Test Coverage

Missing tests for the optimization:
Recommended tests:

```rust
// Test that loop state is persisted every LOOP_ITERS_PER_COMMIT iterations
#[test]
async fn test_loop_commit_batching() {
    // Run a loop for 50 iterations
    // Kill and restart the workflow at iteration 35
    // Verify it resumes from iteration 20 (last commit), not iteration 0
}

// Test that final state is always committed on Loop::Break
#[test]
async fn test_loop_final_state_commit() {
    // Run loop that breaks at arbitrary iteration (e.g., 15)
    // Verify final state is persisted correctly
}
```

### Recommendations
### Questions for Author
Overall Assessment: The optimization is well-intentioned but potentially introduces a subtle correctness issue around crash recovery. I'd recommend either:
Happy to discuss further!

🤖 Generated with Claude Code
Force-pushed from bbe84e7 to e83ac9b.

Force-pushed from e5b2d7f to eda671f.
Force-pushed from eda671f to 2f81c6e.
## Code Review

Thanks for the optimization work on the workflow loop functionality! I've reviewed the changes and have the following feedback:

### ✅ Positive Aspects
### 🐛 Potential Issues

#### Critical: State Loss on Crash

Location: The new batching logic only commits state every 20 iterations:

```rust
if iteration % LOOP_ITERS_PER_COMMIT == 0 {
    self.db.upsert_workflow_loop_event(...).await?;
}
```

Problem: If a workflow crashes between commits (e.g., at iteration 15), the workflow will replay from the last checkpoint (iteration 0) instead of continuing from iteration 15. This could cause:
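To quantify the loss window described above, a small sketch (hypothetical helper, assuming commits land on multiples of the batch size, as in the check quoted above):

```rust
const LOOP_ITERS_PER_COMMIT: usize = 20;

/// Iteration a restarted workflow would resume from: the last committed
/// multiple of the batch size at or below the crash point.
fn resume_point(crashed_at: usize) -> usize {
    (crashed_at / LOOP_ITERS_PER_COMMIT) * LOOP_ITERS_PER_COMMIT
}

fn main() {
    // Crash at iteration 15: resume from 0, replaying 15 iterations.
    assert_eq!(resume_point(15), 0);
    // Crash at iteration 35: resume from 20, replaying 15 iterations.
    assert_eq!(resume_point(35), 20);
    // Worst case just before a commit: 19 iterations are replayed.
    assert_eq!(39 - resume_point(39), 19);
    println!("ok");
}
```

In the worst case, up to `LOOP_ITERS_PER_COMMIT - 1` iterations of work are re-executed on restart.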
Impact: For workflows with expensive loop iterations or non-idempotent operations, this could be severe.

Recommendation:
#### Potential Race Condition in Option::take()

Location:

```rust
if let Some(loop_event_commit_fut) = loop_event_commit_fut.take() {
```

Recommendation: Consider adding a comment explaining that this only executes on iteration 0, or restructure to make this more explicit.

### 📊 Performance Considerations

Positive:
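On the `Option::take()` point above, a minimal standalone sketch (not the engine's code) of why the guarded block can run at most once: `take()` moves the value out and leaves `None` behind.

```rust
fn main() {
    // Stand-in for the pending commit future held across loop iterations.
    let mut pending: Option<&str> = Some("initial-commit");

    let mut runs = 0;
    for _ in 0..3 {
        // take() replaces the Option with None, so only the first
        // pass through the loop observes Some and enters the block.
        if let Some(val) = pending.take() {
            runs += 1;
            println!("handled {}", val);
        }
    }
    assert_eq!(runs, 1);
    assert!(pending.is_none());
}
```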
Question:
### 🧪 Test Coverage

Current Coverage: The existing
Recommendation: Add tests for:

```rust
// Test that state is properly recovered after crash mid-batch
#[test]
async fn test_loop_state_recovery_mid_batch() {
    // Loop with 25 iterations, crash at iteration 15
    // Verify it resumes from iteration 0 (or 20, depending on intended behavior)
}

// Test that final state is always committed
#[test]
async fn test_loop_final_state_commit() {
    // Loop with 15 iterations (not a multiple of 20)
    // Verify final state is properly saved
}
```

### 🔒 Security Concerns

No security concerns identified - the changes don't introduce any new attack surfaces.

### 🎯 Suggested Improvements
```rust
/// How often to commit loop event data to db and mark previous loop history as forgotten.
///
/// WARNING: Setting this too high means workflows will replay more iterations on crash/restart.
/// Loop bodies should be idempotent to handle replays safely.
const LOOP_ITERS_PER_COMMIT: usize = 20;
```

```rust
metrics::LOOP_STATE_COMMITS.increment(1);
metrics::LOOP_ITERATIONS_SKIPPED.record((iteration % LOOP_ITERS_PER_COMMIT) as u64);
```

### Summary

This is a solid performance optimization with good parallelization strategy. The main concern is the potential for state loss on crashes, which may or may not be acceptable depending on your consistency requirements. I'd recommend:

Overall: ✅ Approve with recommendations - the optimization is valuable, but please address the state loss concern with documentation at minimum.
## Code Review

I've reviewed the loop optimization changes in this PR. Overall, this is a well-thought-out performance optimization with good attention to detail. Here are my findings:

### ✅ Strengths
### 🔍 Potential Issues

#### 1. State Loss Risk on Failure (Moderate Severity)

With the new batching strategy, if a workflow crashes between commits, up to 19 iterations of state could be lost. The workflow would resume from the last committed iteration, potentially repeating work.

Current behavior:
Consider:
#### 2. Missing Test Coverage for New Behavior

The existing loop test only runs 3 iterations, which won't exercise the batching logic (it kicks in at iteration 20).

Recommendation:
Example test:

```rust
#[tokio::test]
async fn test_loop_many_iterations() {
    // Test with 50+ iterations to verify batching behavior
    let workflow_id = test_ctx
        .workflow(LoopWorkflowInput { iterations: 50 })
        .dispatch()
        .await
        .unwrap();
    // Verify completion and state handling
}
```

#### 3. Potential Indexing Issue (Minor)

At line 994, the condition uses:

```rust
if iteration % LOOP_ITERS_PER_COMMIT == 0
```

This means commits happen at iterations 0, 20, 40, etc. However, iteration 0 is already committed in the initial setup (lines 891-904), so iteration 20 would be the first batched commit. This is likely intentional but worth confirming.

Verify: Is it expected that iteration 0 is committed twice (once in setup, once in the modulo check)?

#### 4. Database Load Consideration
This is now called every 20 iterations instead of every iteration, which is good. However, for very long-running loops (thousands of iterations), this could still create periodic latency spikes.

Consider: Monitor the performance impact of these operations in production.

### 🛡️ Security & Error Handling

Good:
### 📊 Code Quality

Excellent:
### 📝 Minor Suggestions
```rust
/// How often to commit loop event data to db and mark previous loop history as forgotten.
/// Set to 20 to balance between state persistence and database write pressure.
const LOOP_ITERS_PER_COMMIT: usize = 20;
```

### Summary

This is a solid optimization that should significantly improve performance for loop-heavy workflows. The main recommendation is to add test coverage for the >20 iteration case and document the state loss trade-off for users.

Recommendation: Approve with minor suggestions ✅

🤖 Generated with Claude Code
No description provided.