Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Nov 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Ready Ready Preview Comment Nov 20, 2025 3:08am
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 20, 2025 3:08am
rivet-inspector Ignored Ignored Preview Nov 20, 2025 3:08am
rivet-site Ignored Ignored Preview Nov 20, 2025 3:08am

Copy link
Contributor Author

MasterPtato commented Nov 18, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Nov 18, 2025

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3482

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3482

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3482

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3482

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3482

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3482

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3482

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3482

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3482

commit: 1a4f15a

@claude
Copy link

claude bot commented Nov 18, 2025

PR Review: fix(gas): fix loop forgotten bug due to concurrency

Summary

This PR fixes a critical concurrency bug in the gasoline workflow engine where loop event history was being incorrectly cleared, potentially losing events from concurrent iterations.


Core Fix Analysis (engine/packages/gasoline/src/db/kv/mod.rs)

The Problem:
The original code used clear_subspace_range(&loop_events_subspace) which cleared all events under the loop location, regardless of iteration. In concurrent scenarios where multiple loop iterations could be in flight, this could inadvertently delete events from the current or future iterations.

The Solution:
The fix correctly bounds the range to only clear events from iterations 0 through iteration - 1:

  • Start: {loop location, 0, ...}
  • End: {loop location, iteration - 1, ...}

This ensures only past iterations are moved to forgotten history, preserving the current iteration's events.

Code Quality: ✅ Excellent

  • Clear variable names (loop_events_subspace_start, loop_events_subspace_end)
  • Helpful comments explaining the range bounds
  • Uses saturating_sub(1) to safely handle iteration 0 edge case
  • Matches the same pattern used for cleaning old forgotten events (lines 2638-2655)

Other Changes Review

1. pegboard-runner/src/conn.rs - Workflow migration

-workflow(pegboard::workflows::runner::Input {
+workflow(pegboard::workflows::runner2::Input {

Concern: ⚠️ This appears to be switching from runner to runner2 workflow.

Questions:

  • Is this migration intentional or accidentally included?
  • Are there backward compatibility implications?
  • Should there be a deprecation path for the old workflow?

2. pegboard/src/workflows/runner2.rs - Init signal handling

-if init.is_none() {
-    init = Some(init_sig);
-    check_queue = true;
-}
+init = Some(init_sig);
+check_queue = true;

Concern: ⚠️ This removes the guard that prevented overwriting an existing init value.

Potential Issue:

  • If ToServerInit can be received multiple times, this will now blindly overwrite previous init data
  • Could this cause race conditions or lost initialization state?

Recommendation: If this change is intentional (e.g., supporting re-initialization), add a comment explaining why. Otherwise, consider keeping the guard or at least logging when init is being overwritten.

3. Cargo.toml - Commented release profile options

+# strip = true
+# panic = "abort"
+# overflow-checks = false
+# debug-assertions = false

Issue: ❌ These commented-out options should not be in the PR

Recommendation: Remove these lines. They appear to be experimental/debugging configurations that were accidentally included.

4. scripts/docker/build-push.sh - Commented build arg

+# --build-arg CARGO_BUILD_MODE=release \

Issue: ❌ Commented code in scripts

Recommendation: Remove if not needed, or uncomment if intended to be active.

5. scripts/run/docker/engine-rocksdb.sh - Trailing newline

Issue: ⚠️ Minor - removes trailing newline

This is inconsequential but typically files should end with a newline per POSIX standards.


Security Considerations

✅ No security concerns identified. The fix actually improves data integrity by preventing unintended data loss.


Performance Considerations

Improved Performance: The bounded range query is more efficient than the original unbounded subspace range, as it scans fewer keys.


Test Coverage

⚠️ Missing Tests

Concerns:

  1. No test coverage for the concurrent loop iteration scenario
  2. Existing loop_test.rs only tests basic sequential loop functionality
  3. The bug being fixed (concurrent iteration event deletion) is not covered by any test

Recommendations:

  1. Add a test that:
    • Runs a loop with slow iterations
    • Triggers concurrent signals/events during loop execution
    • Verifies that events from all iterations are preserved correctly
  2. Add a test specifically for the forgotten history bounds to ensure iteration N-1 events are moved but iteration N events remain

Example test structure:

#[workflow(ConcurrentLoopTestWorkflow)]
pub async fn concurrent_loop_test(ctx: &mut WorkflowCtx, input: &Input) -> Result<()> {
    ctx.loope(0, move |ctx, state| {
        async move {
            // Trigger some events during iteration
            ctx.activity(...).await?;
            
            if *state >= 5 {
                return Ok(Loop::Break(()));
            }
            *state += 1;
            Ok(Loop::Continue)
        }.boxed()
    }).await?;
    
    // Verify all iteration events are in forgotten history
    Ok(())
}

Summary of Recommendations

Must Fix:

  1. ❌ Remove commented Cargo.toml release profile options
  2. ❌ Remove or uncomment the Docker build script changes

Should Address:

  1. ⚠️ Clarify the runner → runner2 migration intent
  2. ⚠️ Review the init signal handling change - document or revert
  3. ⚠️ Add test coverage for concurrent loop scenarios

Nice to Have:

  • Add trailing newline to engine-rocksdb.sh

Overall Assessment

The core bug fix is excellent and addresses a real concurrency issue with a correct, performant solution. However, the PR includes several unrelated or questionable changes that should be addressed before merging.

Recommendation: Request changes to clean up the non-essential modifications and clarify the intentional behavior changes in runner2.rs.

@claude
Copy link

claude bot commented Nov 19, 2025

Code Review

Summary

This PR fixes a critical concurrency bug in the workflow loop history management where events from all loop iterations were being moved to forgotten history, not just events from completed iterations. The fix correctly bounds the range query to only include events up to the current iteration.


✅ Positives

  1. Critical Bug Fix: The main change correctly addresses the concurrency issue by:

    • Changing from a full subspace query to a bounded range query
    • Using iteration.saturating_sub(1) as the upper bound to exclude the current iteration
    • Properly clearing only the bounded range instead of the entire subspace
  2. Clear Documentation: The comments clearly explain the intent ("Move all events under this loop up to the current iteration to the forgotten history")

  3. Safe Arithmetic: Using saturating_sub(1) prevents underflow if iteration is 0

  4. Consistent Signal Handling: The runner2 workflow fix allows re-initialization signals to be processed correctly by removing the guard condition


🔍 Issues & Concerns

Critical: Missing Key Deletion (engine/packages/gasoline/src/db/kv/mod.rs:2631)

The current code moves entries to forgotten history but doesn't delete the original keys:

tx.set(&forgotten_key, entry.value());
// Missing: tx.clear(entry.key());

Impact: This will leave duplicate data in the active history, potentially causing:

  • Incorrect workflow replay behavior
  • Wasted storage space
  • Query performance degradation

Recommendation: Add tx.clear(entry.key()); after line 2631.


Medium: Potential Off-by-One Issue (engine/packages/gasoline/src/db/kv/mod.rs:2598)

Using iteration.saturating_sub(1) as the exclusive upper bound means events from iteration N-1 will be included in the range. This may be correct, but needs verification:

  • If the current iteration is 5, the range will include iterations 0-4
  • Is this the intended behavior, or should it exclude iteration 4 (the immediately previous iteration)?

Recommendation: Verify with the workflow loop semantics whether the previous iteration's events should be moved to forgotten history or remain active.


Low: Commented Out Production Code (Cargo.toml:437-440, scripts/docker/build-push.sh:29)

Multiple production configuration options are commented out:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false
# --build-arg CARGO_BUILD_MODE=release \

Impact:

  • These look like debugging/development changes that shouldn't be committed
  • May indicate the PR includes unintended changes
  • Could confuse future developers about why these are commented

Recommendation: Remove these commented lines or document why they're included.


Low: Unnecessary File Change (scripts/run/docker/engine-rocksdb.sh:16)

The removal of a trailing newline appears unrelated to the bug fix.

Recommendation: Revert this cosmetic change to keep the PR focused.


🧪 Testing Concerns

No Test Coverage Evident: The PR doesn't appear to include:

  • Unit tests for the bounded range query logic
  • Integration tests verifying the loop iteration boundary behavior
  • Tests ensuring events from the current iteration remain in active history

Recommendation: Add tests that verify:

  1. Events from completed iterations are moved to forgotten history
  2. Events from the current iteration remain in active history
  3. The edge case where iteration = 0 is handled correctly
  4. Concurrent workflow loops don't interfere with each other

🔒 Security Considerations

No security vulnerabilities identified. The use of saturating_sub prevents integer underflow.


⚡ Performance Considerations

Positive: The bounded range query is more efficient than querying the entire subspace, especially for long-running loops with many iterations.

Note: The loop that moves entries to forgotten history processes them sequentially, which is appropriate for maintaining consistency.


📝 Style & Conventions

According to CLAUDE.md:

✅ Uses structured logging with tracing
✅ Comments are lowercase
✅ Uses anyhow Result types correctly
✅ Follows existing code patterns


🎯 Recommendations Summary

  1. MUST FIX: Add tx.clear(entry.key()); after moving entries to forgotten history
  2. SHOULD VERIFY: Confirm the off-by-one semantics for iteration boundaries
  3. SHOULD REMOVE: Commented out code in Cargo.toml and build scripts
  4. SHOULD ADD: Test coverage for the loop iteration boundary logic

Overall Assessment

The core bug fix is sound and addresses a real concurrency issue, but the missing key deletion is a critical oversight that must be addressed before merging. Once that's fixed and the commented code is cleaned up, this will be a solid improvement to the workflow history management system.

@claude
Copy link

claude bot commented Nov 19, 2025

PR Review: Fix loop forgotten bug due to concurrency

Summary

This PR addresses a critical bug in the workflow engine's loop event history management. The fix ensures that when moving loop events to forgotten history, only events from iterations 0 through (current iteration - 1) are moved, preventing concurrent loop iterations from incorrectly archiving each other's events.

Code Quality & Best Practices ✅

Main Fix (gasoline/src/db/kv/mod.rs:2579-2634)

Good:

  • The fix correctly calculates precise range boundaries using saturating_sub(1) to prevent underflow
  • Clear, descriptive comments explain the intent ("Start is {loop location, 0, ...}" and "End is {loop location, iteration - 1, ...}")
  • Properly uses .range().0 and .range().1 to extract start/end boundaries from the subspace

Observation:

  • The use of saturating_sub(1) on line 2598 is defensive programming for iteration == 0, though logically this function shouldn't be called with iteration == 0. This is good defensive coding.

Workflow Fix (pegboard/src/workflows/runner2.rs:103-108)

Good:

  • Simplifies the initialization logic by removing the unnecessary if init.is_none() guard
  • The change allows re-initialization, which makes sense for handling reconnections or state recovery

Potential Issue:
The removal of the if init.is_none() check means init can be overwritten multiple times if multiple ToServerInit signals arrive in the same batch.

Question: Is this the intended behavior? Consider these scenarios:

  1. If multiple init packets arrive (network retransmission, reconnection), should we always take the latest one?
  2. Could this mask bugs where multiple init packets indicate an unexpected state?

Suggestion: Consider either:

  • Adding a warning log when overwriting an existing init: if init.is_some() { tracing::warn!("overwriting existing init signal"); }
  • Or documenting why multiple inits are expected and safe

Potential Bugs & Issues

1. Edge Case: iteration == 0 ⚠️

When iteration == 0, we have:

  • loop_events_subspace_start: points to iteration 0
  • loop_events_subspace_end: points to iteration 0 (via saturating_sub(1))

This creates a range where start >= end. Need to verify that clear_range handles this gracefully (likely a no-op, which is correct).

2. Missing Entry Deletion ❓

On line 2631, entries are copied to forgotten_history_subspace but the original entries are only deleted via the clear_range at line 2634. Verify that if an error occurs between copying (2631) and clearing (2634), we don't leave duplicate entries or partial state.

Recommendation: Consider transaction semantics - ensure this entire operation is atomic.

Performance Considerations

Range Query Optimization ✅

The fix significantly improves performance by limiting the range query to only the relevant iterations instead of querying all events under the loop location. This prevents O(total iterations) work on each loop update.

Stream Processing ✅

The existing streaming approach (get_ranges_keyvalues with StreamingMode::WantAll) is appropriate for handling potentially large event sets.

Security Concerns

No Major Issues ✅

The changes don't introduce security vulnerabilities. The range calculation is safe from integer overflow due to saturating_sub.

Test Coverage

Missing Tests ⚠️

This PR doesn't include tests. Given the bug was described as "forgotten due to concurrency", recommend adding:

  1. Concurrency test: Multiple concurrent loop iterations to verify isolation
  2. Boundary test: Loop with iteration == 0, iteration == 1
  3. Forgotten history test: Verify events are moved correctly and old events (> 100 iterations) are pruned

Example test scenario:

// Pseudo-code
#[test]
async fn test_concurrent_loop_iterations_dont_interfere() {
    // Create workflow with loop at iteration 5
    // Concurrently:
    //   - Thread A updates loop to iteration 6
    //   - Thread B updates loop to iteration 6
    // Verify: Events from iterations 0-5 are in forgotten
    // Verify: Events from iteration 6 are in active
    // Verify: No events are lost or duplicated
}

Additional Changes Review

Cargo.toml

The commented-out release optimizations appear to be development/debugging leftovers:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

Recommendation: Remove these commented lines or explain why they're being kept for future reference.

scripts/docker/build-push.sh

Similarly, commented build arg:

# --build-arg CARGO_BUILD_MODE=release \

Recommendation: Clean up commented code or document the intent.

scripts/run/docker/engine-rocksdb.sh

Removed trailing newline - this is fine, just a formatting cleanup.

Overall Assessment

Status:Approve with minor recommendations

The core fix correctly addresses the concurrency bug by precisely limiting the range of events moved to forgotten history. The logic is sound and the implementation properly uses the database range primitives.

Required before merge:

  • Add tests for concurrent loop iterations
  • Verify behavior when iteration == 0
  • Clean up commented code in Cargo.toml and build scripts

Recommended:

  • Add logging when init is overwritten in runner2.rs
  • Document expected behavior for multiple ToServerInit signals

Great work identifying and fixing this subtle concurrency issue!

@MasterPtato MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from e68ccc5 to 9a87fd1 Compare November 20, 2025 03:07
@MasterPtato MasterPtato force-pushed the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch from 4294775 to 5458e50 Compare November 20, 2025 03:07
@claude
Copy link

claude bot commented Nov 20, 2025

Code Review

Summary

This PR fixes a critical bug in the workflow loop event history management where events from all iterations (including future ones) were being moved to forgotten history instead of only past iterations. The fix also includes some related cleanup in the pegboard runner workflow.


Critical Bug Fix ✅

Location: engine/packages/gasoline/src/db/kv/mod.rs:2579-2634

The Problem:
The original code used EventHistorySubspaceKey::entire() which creates a range that includes all iterations of a loop (0 to infinity). This meant that when moving events to forgotten history, it would incorrectly include events from the current and future iterations due to concurrent workflow execution.

The Solution:
The fix correctly bounds the range to only include iterations 0 through iteration - 1:

  • Start: {loop location, 0, ...} (iteration 0)
  • End: {loop location, iteration - 1, ...} (one before current)

This ensures only past iterations are moved to forgotten history, which is the correct behavior.

Code Quality: ⭐⭐⭐⭐⭐

  • Uses saturating_sub(1) to safely handle edge case when iteration = 0
  • Clear, descriptive comments explain the range boundaries
  • Proper use of .range() tuple extraction for start/end bounds

Secondary Changes

1. Pegboard Runner Init Logic (runner2.rs:103-108)

Change: Removed the if init.is_none() guard when handling ToServerInit signals.

Analysis:

  • Before: Only accepted the first init signal
  • After: Accepts and overwrites with any new init signal

Potential Concern: ⚠️
Is it expected/safe for a runner to re-initialize multiple times? If this is intentional (e.g., for reconnection scenarios), this is fine. However, if ToServerInit should only be sent once per runner lifecycle, this could mask bugs where duplicate init signals are sent.

Recommendation: Consider adding a log message when overwriting an existing init:

if init.is_some() {
    tracing::warn!("received duplicate ToServerInit signal");
}
init = Some(init_sig);
check_queue = true;

2. Workflow Reference Update (conn.rs:128)

Change: pegboard::workflows::runner::Inputpegboard::workflows::runner2::Input

Analysis: This appears to be migrating to a new runner2 workflow implementation. Straightforward refactor.

Question: Is the old runner workflow being phased out? If so, consider:

  • Adding a deprecation comment to the old workflow
  • Ensuring all callsites are updated in this or future PRs

3. Build Script Changes (scripts/docker/build-push.sh:26-30)

Changes:

  • Multi-line formatting for readability
  • Commented-out --build-arg CARGO_BUILD_MODE=release

Analysis: The commented-out release mode arg suggests this might be work-in-progress or debugging configuration. The formatting improvement is good.


Performance Considerations

Range Query Optimization:
The fix changes from a subspace range query to an explicit bounded range query. This is more efficient because:

  • Avoids scanning events from current/future iterations
  • Database can better optimize with explicit bounds
  • Reduces unnecessary data transfer and processing

Database Operations:

  • tx.clear_range() instead of tx.clear_subspace_range() - both are appropriate, but the explicit range version is more precise given the bounded range

Security Considerations

No security concerns identified. The changes are internal workflow state management operations.


Test Coverage

Recommendation: ⚠️
This bug involved concurrency issues (events from concurrent iterations being incorrectly processed). Consider adding:

  1. Unit test that verifies the range bounds:

    • Create a workflow with loop iterations 0, 1, 2, 3
    • Trigger forgotten history migration at iteration 3
    • Verify only iterations 0-2 are moved to forgotten history
    • Verify iteration 3 remains in active history
  2. Integration test for concurrent execution:

    • Simulate concurrent loop iterations
    • Verify forgotten history is correctly isolated per iteration

Additional Issues Found

Cargo.toml (Cargo.toml:436-440)

Issue: Commented-out release profile configurations:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

Recommendation:
These should either be:

  1. Removed entirely if not needed
  2. Uncommented and committed if they're intended configuration
  3. Moved to a separate .cargo/config.toml if they're developer-specific

Leaving commented-out configuration in version control creates confusion about intended settings. According to CLAUDE.md, commented code/config should be avoided.

Script Newline Issue (scripts/run/docker/engine-rocksdb.sh:15)

Issue: Removed trailing newline at EOF

Recommendation: Minor - POSIX compliance suggests files should end with a newline. This is a trivial change but consider preserving it.


Overall Assessment

Code Quality: ⭐⭐⭐⭐ (4/5)
Bug Fix Correctness: ⭐⭐⭐⭐⭐ (5/5)
Test Coverage: ⭐⭐⭐ (3/5 - needs concurrent test)

Approval Status: ✅ APPROVE (with minor suggestions)

The core bug fix is excellent and addresses a real concurrency issue correctly. The secondary changes are reasonable but could use clarification on the init signal handling behavior. Clean up the commented-out Cargo.toml changes before merging.


Action Items Before Merge

  1. Must Fix:

    • Remove or uncomment the Cargo.toml release profile settings
  2. Should Consider:

    • Add logging for duplicate ToServerInit signals
    • Add test coverage for the range boundary fix
    • Clarify the commented-out Docker build arg
  3. Nice to Have:

    • Document the migration from runner to runner2 workflow

@claude
Copy link

claude bot commented Nov 20, 2025

PR Review: Fix loop forgotten bug due to concurrency

Overview

This PR fixes a critical bug in the workflow engine's loop history management where concurrent operations could cause events from ALL iterations of a loop to be moved to forgotten history, rather than just the events from completed iterations up to the current one.

Code Quality & Implementation

Main Fix (gasoline/src/db/kv/mod.rs)

The core fix correctly addresses the concurrency issue by:

  1. Precise Range Queries: Changed from using a subspace (which would match ALL iterations) to explicit start/end ranges that only target iterations 0 through iteration - 1.

    • Start: {loop location, 0, ...}
    • End: {loop location, iteration - 1, ...}
  2. Proper Range Clear: Uses clear_range() with explicit bounds instead of clear_subspace_range(), ensuring only the intended range is cleared.

  3. Edge Case Handling: Uses saturating_sub(1) to safely handle iteration = 0 case without underflow.

The comments are also improved, making the intent clearer ("up to the current iteration" instead of just "current events").

Potential Issues & Concerns

1. Minor: Iteration = 0 Edge Case

When iteration = 0, the end range becomes:

iteration.saturating_sub(1) // = 0 when iteration is 0

This means the range would be {location, 0} ... {location, 0}. Depending on how range() works, this might:

  • Return an empty range (correct behavior)
  • Include iteration 0 (incorrect - would try to forget the current iteration's events)

Recommendation: Add a guard condition:

if iteration > 0 {
    // Only move events to forgotten if we have completed iterations
    let loop_events_subspace_start = ...
    // ... rest of the code
}

2. Configuration Changes (Cargo.toml)

The commented-out release profile optimizations are concerning:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false
  • These should either be removed entirely or have a clear comment explaining why they're included but commented
  • Leaving them suggests they might have been used for debugging this issue
  • overflow-checks = false is particularly dangerous given the .saturating_sub(1) usage in the fix

Recommendation: Remove these lines or add a comment explaining their purpose.

3. Unrelated Changes

Several changes appear unrelated to the loop forgotten bug:

  • pegboard-runner/src/conn.rs: Changed from runner::Input to runner2::Input
  • pegboard/src/workflows/runner2.rs: Removed conditional check for init.is_none(), now always replaces init
  • scripts/docker/build-push.sh: Added commented CARGO_BUILD_MODE argument
  • scripts/run/docker/engine-rocksdb.sh: Removed trailing newline

Recommendation: Consider splitting these changes into separate commits or PRs for clearer history. If they're related to reproducing/testing the bug, please document that relationship.

Performance Considerations

Good: The fix is more efficient than the original - it uses precise range bounds instead of iterating over all loop events.

Security Concerns

⚠️ Medium Priority: The iteration handling needs careful review:

  • Integer underflow is handled with saturating_sub
  • However, the behavior at iteration = 0 should be explicitly tested
  • Consider adding debug assertions or tracing to validate the range is correct

Test Coverage

Missing: No test changes are included in this PR.

Recommendations:

  1. Add a test that creates a loop with multiple iterations and verifies:

    • Only completed iterations' events are moved to forgotten history
    • Current iteration's events remain in active history
    • Concurrent updates don't cause data loss
  2. Add a test for the iteration = 0 edge case

  3. Test the "last 100 events" retention logic (lines 2636-2656)

Additional Observations

Logging: Consider adding structured logging around the forgotten history logic:

tracing::debug!(
    ?from_workflow_id,
    ?location, 
    iteration,
    "moving loop events to forgotten history"
);

This would help diagnose similar issues in the future.

Summary

Strengths:

  • Correctly identifies and fixes the core concurrency bug
  • More efficient implementation
  • Better comments

Action Items:

  1. 🔴 Required: Verify iteration = 0 behavior or add guard condition
  2. 🔴 Required: Add test coverage for this bug fix
  3. 🟡 Recommended: Remove or document commented Cargo.toml changes
  4. 🟡 Recommended: Separate unrelated changes into different commits/PRs
  5. 🟡 Recommended: Add debug logging for troubleshooting

Overall, this is a good fix for a critical bug, but needs test coverage before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants