-
Notifications
You must be signed in to change notification settings - Fork 135
fix(pb): stop actors from destroying on goingaway #3501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
Code ReviewOverviewThis PR fixes an issue where actors were incorrectly being destroyed when receiving a Changes AnalysisThe Fix:
Logic Flow:
This is semantically equivalent BUT with the critical addition that Findings✅ Logic CorrectnessThe inverted boolean logic is correct:
The key improvement is that when 🤔 Semantic Clarity ConcernThe variable naming could be clearer:
This is somewhat counterintuitive. A clearer name might be let needs_rescheduling = state.going_away || \!matches\!(
variant,
StoppedVariant::Normal { code: protocol::StopCode::Ok }
);
match (input.crash_policy, needs_rescheduling) {
(CrashPolicy::Restart, true) => { /* ... */ }
(CrashPolicy::Sleep, true) => { /* ... */ }
// ...
}However, this is a minor style preference and doesn't affect correctness. ✅ State ManagementMoving
🟡 Comment PlacementThe comment "Reset retry count on successful exit" (line 575) was moved from above the match arm to inside it. This is fine, though having it above the arm was slightly more visible. Not a significant issue. ✅ Edge CasesThe logic correctly handles:
Questions for Consideration
Verdict✅ Approve - The logic is correct and fixes the described issue. The only suggestions are minor improvements to clarity:
The fix properly ensures actors don't get destroyed when their runner is going away, allowing them to be rescheduled according to the crash policy instead. |
d65d8db to
a78fb2c
Compare
Code ReviewSummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a ✅ Positive Aspects
🔍 Code AnalysisMain Logic Change (lines 660-668, 668-695)Before: let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}After: let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}Analysis:
State Flag Ordering (lines 584-586 → 711-713)Before: state.gc_timeout_ts = None;
state.going_away = false; // ❌ Cleared too early
state.stopping = false;After: state.gc_timeout_ts = None;
state.stopping = false;
// ... rescheduling logic ...
state.going_away = false; // ✅ Cleared after all logicWhy this matters: The 🎯 Semantic CorrectnessThe fix aligns with the intended actor lifecycle:
Crash Policy Semantics:
The new code correctly implements these semantics. 🧹 Code Quality Notes
|
a78fb2c to
c20b8cd
Compare
815c918 to
986289f
Compare
Code ReviewSummaryThis PR fixes a critical bug where actors were being destroyed instead of rescheduled when they stopped during a runner's graceful shutdown ("going away" state). The fix correctly distinguishes between graceful exits that should trigger destruction versus non-graceful exits that should trigger rescheduling. Changes OverviewThe main logic change is in Before:
After:
Logic AnalysisThe inverted crash policy matching is correct: match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}This means:
Positive Aspects
Potential Concerns1. State Transition Safety
|
PR Review: fix(pb): stop actors from destroying on goingawaySummaryThis PR fixes a critical bug where actors were being destroyed when they received a Code Quality & Logic Changes ✅Main Logic Fix (actor/mod.rs:660-713)
State Management Timing (actor/mod.rs:586,713)
Correctness Analysis ✅The boolean logic inversion is handled correctly: Before: let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* restart */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}After: let graceful_exit = !state.going_away && matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* restart */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}The match arms correctly flip Potential Issues & Concerns1. State Reset Ordering
|
c20b8cd to
a78fb2c
Compare
986289f to
815c918
Compare
Code ReviewSummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a Key Changes Analysis1. Core Logic Fix in
|
PR Review: fix(pb): stop actors from destroying on goingawayI've reviewed this PR and have some observations and feedback: SummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a Code Quality & Logic✅ Core Fix is SoundThe main logic change at Before: The After: The flag is cleared at the end (line 713), allowing the crash policy logic to correctly identify non-graceful exits and trigger rescheduling. The key improvement is in the let graceful_exit = !state.going_away
&& matches!(
variant,
StoppedVariant::Normal {
code: protocol::StopCode::Ok
}
);This correctly treats ✅ Crash Policy Logic InversionThe inversion of the crash policy matching (
This aligns with the expected behavior where actors should be rescheduled on crashes/forced stops, not destroyed. Potential Issues & Considerations
|
a78fb2c to
c20b8cd
Compare
Code ReviewSummaryThis PR fixes a critical bug where actors were being destroyed when they received a ✅ Positive Changes1. Correct Logic Fix (Critical) // Before: Considered any non-Ok exit as a failure
let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) { /* ... */ }
// After: Only treat as non-graceful if going_away flag is set OR exit code is not Ok
let graceful_exit = !state.going_away && matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) { /* ... */ }This ensures that actors receiving a 2. Proper State Management 3. Code Formatting 🔍 Potential Issues & Questions1. Inverted Logic in Match Arms match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* restart logic */ } // Was: (Restart, true)
(CrashPolicy::Sleep, false) => { /* sleep logic */ } // Was: (Sleep, true)
_ => { /* destroy */ }
}Question: Is this intentional? The logic now says:
This seems correct for the intended behavior (actors that exit gracefully during shutdown should be destroyed, not restarted), but it's a significant semantic change that should be verified. 2. Crash Policy Semantics Recommendation: Consider if this behavior should be documented or if crash policies should be renamed to reflect they only apply to actual crashes, not graceful shutdowns. 3. Missing Test Coverage Recommendation: Add tests covering these scenarios:
4. Comment Location 🎯 Verification Needed
📝 Minor IssuesImport Ordering 🔐 Security & Performance
📚 Recommendations
✅ Overall AssessmentThis is a good fix for a critical bug. The core logic change is sound and properly addresses the issue where actors were being destroyed incorrectly. The main concerns are around test coverage and verification that the inverted logic matches the intended semantics. Once confirmed through testing, this should be safe to merge. Approval Status: ✅ Approve with recommendation to add tests in a follow-up PR |
c20b8cd to
a78fb2c
Compare
986289f to
815c918
Compare
Code ReviewSummaryThis PR fixes actor lifecycle management by preventing actors from being destroyed when they're in a "going away" state. The key change inverts the crash policy logic to treat graceful exits differently from error cases. 🔍 Critical Logic ChangeLocation: The core change inverts how the crash policy is evaluated: Before: let failed = \!matches\!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* restart on failure */ }
(CrashPolicy::Sleep, true) => { /* sleep on failure */ }
_ => { /* destroy */ }
}After: let graceful_exit = \!state.going_away && matches\!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* restart on non-graceful exit */ }
(CrashPolicy::Sleep, false) => { /* sleep on non-graceful exit */ }
_ => { /* destroy on graceful exit */ }
}Analysis:
Potential Issue: The logic inverts the boolean (matching on let should_reschedule = state.going_away || \!matches\!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, should_reschedule) {
(CrashPolicy::Restart, true) => { /* restart */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}🐛 State Management IssueLocation: The PR moves the Before: Analysis:
📝 Formatting ChangesLocations:
Issues:
Recommendation: Verify these changes align with project conventions. If they were introduced by an IDE or manual edit, consider reverting the pure whitespace changes to keep the PR focused on the logic fix. ✅ Positive Aspects
🔒 Security Considerations✅ No security concerns identified ⚡ Performance Considerations✅ No performance impact - this is purely logical flow control 🧪 Test CoverageConcern: No test changes are included in this PR. Recommendations:
📋 Recommendations
✅ Approval StatusOverall: The core logic fix is correct and addresses a real bug. However, test coverage would strengthen confidence in this change. Suggested next steps:
|
Code ReviewSummaryThis PR fixes an important issue where actors were being incorrectly destroyed when receiving a Key Changes Analysis1. Critical Fix: Delayed
|
a78fb2c to
c20b8cd
Compare
815c918 to
986289f
Compare
Code ReviewThanks for this PR! I've reviewed the changes and have the following feedback: SummaryThis PR fixes an issue where actors were being destroyed when receiving a ✅ Code Quality & Best PracticesGood:
Observations:
🐛 Potential IssuesCritical Logic Concern: // OLD: match (input.crash_policy, failed)
// NEW: match (input.crash_policy, graceful_exit)
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}This is a significant semantic change:
The new
Question: Is this the intended behavior? Specifically:
This seems correct for the GOINGAWAY case, but I want to confirm this doesn't inadvertently change behavior for actors that exit gracefully with 🔒 Security ConsiderationsNo security concerns identified. The changes are internal state management logic. ⚡ Performance ConsiderationsNo performance impact. The changes are pure logic fixes with no algorithmic changes. 🧪 Test CoverageConcern: No test coverage added for this bug fix. Recommendation: Consider adding a test case that:
This would prevent regression and document the expected behavior. 📝 Minor Observations
❓ Questions for Author
Overall AssessmentThe fix appears to address the stated issue (actors destroying on GOINGAWAY), but the semantic inversion of the crash policy logic needs verification to ensure it doesn't introduce unintended behavior changes for other actor lifecycle scenarios. Recommendation: ✅ Approve with verification - Please confirm the crash policy logic change is fully intentional and tested. |
Code Review - PR #3501: fix(pb): stop actors from destroying on goingawaySummaryThis PR fixes a bug where actors were being destroyed when they received a ✅ Positive Aspects
🔍 Logic AnalysisBefore (buggy): let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}
After (fixed): let graceful_exit = !state.going_away && matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}
🤔 Questions & Considerations1. Edge Case:
|
c20b8cd to
a78fb2c
Compare
986289f to
815c918
Compare
Code ReviewOverviewThis PR fixes a critical bug where actors were incorrectly destroyed when receiving a ✅ Main Logic Fix (actor/mod.rs:660-713)The Problem: The Solution:
// Before
let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}
// After
let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}Analysis:
✅ Critical State Management Fix (actor/mod.rs:713)Before: Why this matters: The 🎯 Correctness VerificationThe fix correctly implements these scenarios:
📝 Code Style & FormattingAll formatting changes align with CLAUDE.md conventions:
|
a78fb2c to
c20b8cd
Compare
815c918 to
986289f
Compare
Code ReviewSummaryThis PR fixes an issue where actors were being destroyed when they received a Code Quality & Logic ✅Primary Fix (actor/mod.rs:660-713) The core logic change is sound: Before: let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}After: let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}This correctly identifies that:
State Management (actor/mod.rs:713) Moving
The original placement at line 586 was premature - it cleared the flag before using it in the rescheduling logic. Formatting Changes ✅The formatting changes in
These follow the codebase conventions and improve consistency. Potential Concerns1. State Transition Safety
2. Comment Relocation (actor/mod.rs:575) 3. Generation Checking Testing Recommendations 🧪Consider adding test coverage for:
Example test structure: #[test]
async fn test_going_away_respects_restart_policy() {
// Setup actor with CrashPolicy::Restart
// Send GoingAway signal
// Stop actor with StopCode::Ok
// Assert: actor was rescheduled, not destroyed
// Assert: going_away flag is cleared
}Performance Considerations ⚡No performance concerns. The change:
Security Considerations 🔒No security concerns. The change:
Documentation 📝Consider adding a code comment explaining the // Only treat as graceful exit if:
// 1. Actor stopped with StopCode::Ok, AND
// 2. Not marked as going_away (runner shutdown/eviction)
let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });Overall Assessment ✨Verdict: LGTM with minor suggestions The fix correctly addresses the issue where actors were being destroyed instead of rescheduled during Strengths:
Suggestions:
Generated with Claude Code |
Code ReviewSummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a Core Logic Change (packages/pegboard/src/workflows/actor/mod.rs)The Bug:
The Fix:
Correctness:
This matches the comment on line 575: "Reset retry count on successful exit" - only truly successful exits should reset the counter. Code QualityStrengths:
Formatting Changes:
Note: Per CLAUDE.md, formatting should not be done manually - these changes appear to be from Potential Issues1. Test Coverage (Critical)
Recommendation: Add test coverage for: // Test that actors with crash_policy=Restart reschedule after GoingAway+Stop
// Test that actors with crash_policy=Sleep sleep after GoingAway+Stop
// Test that truly graceful exits (no GoingAway) are destroyed2. Edge Case: Concurrent Signals 🤔
The 3. Documentation 📝
Security & Performance✅ No security concerns identified VerdictApprove with recommendations:
Great catch on a subtle state management bug! The fix is minimal and correct. |
Merge activity
|

No description provided.