feat: implement race condition handling for task dequeue and status updates #237

Open
ronenkapelian wants to merge 7 commits into master from fix/dequeue/bug

Conversation

@ronenkapelian
Collaborator

Question Answer
Bug fix
New feature
Breaking change
Deprecations
Documentation
Tests added
Chore

Related issues:

Further information:
Enhance the system to handle race conditions during task dequeue and status updates. Introduce appropriate error handling and response codes to manage conflicts when multiple workers attempt to modify the same task or stage simultaneously. This includes adding timeouts for transactions and updating the API documentation to reflect these changes.

@ronenkapelian ronenkapelian self-assigned this Feb 19, 2026
@github-actions
github-actions bot commented Feb 19, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
|---|---|---|---|
| 🔵 | Lines | 100% (🎯 80%) | 766 / 766 |
| 🔵 | Statements | 100% (🎯 80%) | 783 / 783 |
| 🔵 | Functions | 100% (🎯 80%) | 110 / 110 |
| 🔵 | Branches | 100% (🎯 80%) | 213 / 213 |

File Coverage

| File | Stmts | Branches | Functions | Lines | Uncovered Lines |
|---|---|---|---|---|---|
| **Changed Files** | | | | | |
| src/api/v1/tasks/controller.ts | 100% | 100% | 100% | 100% | |
| src/stages/models/manager.ts | 100% | 100% | 100% | 100% | |
| src/tasks/DAL/taskRepository.ts | 100% | 100% | 100% | 100% | |
| src/tasks/models/manager.ts | 100% | 100% | 100% | 100% | |

Generated in workflow #713 for commit 09e6719 by the Vitest Coverage Report Action

Comment on lines +46 to +51
// Note: $queryRaw returns raw database values, not Prisma-mapped values
// We need to re-fetch the task using Prisma to get properly mapped enum values
const rawTask = tasks[0]!;
const task = await tx.task.findUnique({
where: { id: rawTask.id },
});
Contributor

It's bad practice to query again, as it increases the load on the database. Prisma recommends using TypedSQL in their docs:
https://www.prisma.io/docs/orm/prisma-client/using-raw-sql/typedsql
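
For illustration, the second round trip can also be avoided by mapping the raw row in memory. This is only a minimal sketch, not the TypedSQL approach suggested above: it assumes the database stores status labels that differ from the application enum, and the `RawTaskRow` shape, the label strings, and `mapRawTask` are all hypothetical rather than taken from this PR.

```typescript
// Hypothetical application-level statuses; the real enum comes from the Prisma schema.
type TaskStatus = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED';

// Hypothetical shape of a row returned by a raw query (database-level labels).
interface RawTaskRow {
  id: string;
  status: string;
}

interface Task {
  id: string;
  status: TaskStatus;
}

// Assumed mapping from database labels to enum members; adjust to the real schema.
const DB_STATUS_TO_ENUM: Record<string, TaskStatus> = {
  Pending: 'PENDING',
  'In-Progress': 'IN_PROGRESS',
  Completed: 'COMPLETED',
};

// Converts a raw row into the application shape without a second query.
function mapRawTask(row: RawTaskRow): Task {
  const status = DB_STATUS_TO_ENUM[row.status];
  if (status === undefined) {
    // Fail loudly on labels the mapping does not know about.
    throw new Error(`Unknown task status from database: ${row.status}`);
  }
  return { id: row.id, status };
}
```

The trade-off is that the mapping table has to be kept in sync with the schema by hand, which is the kind of drift generated types are meant to prevent.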

async (newTx) => {
await this.executeUpdateStatus(jobId, status, newTx);
},
{ timeout: TX_TIMEOUT_MS }
Contributor

Change to a global timeout for transactions (pretty sure it's a thing).
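
As far as I know, Prisma documents a `transactionOptions` setting on the `PrismaClient` constructor for client-wide defaults, though that is worth checking against the version in use. Independently of that, a shared default can be applied through a small wrapper. The sketch below is an assumption-laden illustration: `TransactionRunner` is a structural stand-in for a client with `$transaction`, and `DEFAULT_TX_OPTIONS`, its values, and `runInTransaction` are hypothetical names that do not appear in this PR.

```typescript
// Options accepted by an interactive transaction (modeled on Prisma's option names).
interface TxOptions {
  timeout?: number; // maximum time the transaction may run, in ms
  maxWait?: number; // maximum time to wait for a connection, in ms
}

// Minimal structural stand-in for any client exposing $transaction.
interface TransactionRunner<Tx> {
  $transaction<R>(fn: (tx: Tx) => Promise<R>, options?: TxOptions): Promise<R>;
}

// Hypothetical project-wide defaults; the numbers are placeholders.
const DEFAULT_TX_OPTIONS: TxOptions = { timeout: 15000, maxWait: 5000 };

// Runs fn in a transaction, applying the shared defaults unless overridden per call.
function runInTransaction<Tx, R>(
  client: TransactionRunner<Tx>,
  fn: (tx: Tx) => Promise<R>,
  overrides: TxOptions = {}
): Promise<R> {
  return client.$transaction(fn, { ...DEFAULT_TX_OPTIONS, ...overrides });
}
```

With something like this, call sites would no longer pass `{ timeout: TX_TIMEOUT_MS }` individually, and a single call could still override the default when it needs a longer window.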

// This prevents errors during race conditions where multiple workers
// try to set the same status (e.g., multiple tasks setting stage to IN_PROGRESS)
/* v8 ignore next 4 -- @preserve */
if (stage.status === status) {
Contributor

I know it already exists, but maybe a name like newStatus/wantedStatus would be better?

Collaborator Author

done
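
The early return discussed in this thread makes the status update idempotent: a worker that asks for the status the stage already has gets a no-op instead of an error. A minimal in-memory sketch of that idea, using the suggested `newStatus` name; the `Stage` shape, the `ALLOWED` transition table, and `applyStatus` are hypothetical and stand in for the real stage manager logic.

```typescript
type StageStatus = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED';

interface Stage {
  id: string;
  status: StageStatus;
}

// Hypothetical transition rules; the real ones live in the stage manager.
const ALLOWED: Record<StageStatus, StageStatus[]> = {
  PENDING: ['IN_PROGRESS'],
  IN_PROGRESS: ['COMPLETED'],
  COMPLETED: [],
};

// Returns true if the stage changed, false if it already had newStatus (no-op).
function applyStatus(stage: Stage, newStatus: StageStatus): boolean {
  if (stage.status === newStatus) {
    // Another worker already set this status; treat the request as satisfied.
    return false;
  }
  if (ALLOWED[stage.status].indexOf(newStatus) === -1) {
    throw new Error(`Invalid transition: ${stage.status} -> ${newStatus}`);
  }
  stage.status = newStatus;
  return true;
}
```

Checking for equality before validating the transition is what lets several tasks set the same stage to IN_PROGRESS without the later ones failing validation.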

Comment on lines 17 to 19
* Uses SELECT FOR UPDATE SKIP LOCKED for pessimistic locking:
* - FOR UPDATE: Locks the row so other transactions wait
* - SKIP LOCKED: Skip rows that are already locked (instead of waiting)
Contributor

This goes into too much detail.

Collaborator Author

done
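
For readers unfamiliar with the locking mode named in the comment under review, its effect can be imitated in memory: a worker claims the first pending row that nobody else holds and skips locked rows instead of waiting on them. This is only a simulation to show the semantics, not how the repository queries the database; `Row`, `LockTable`, and its methods are hypothetical.

```typescript
interface Row {
  id: number;
  status: 'PENDING' | 'IN_PROGRESS';
}

// In-memory imitation of row locks held by open transactions.
class LockTable {
  private locked = new Set<number>();

  // Imitates: SELECT ... WHERE status = 'PENDING' ORDER BY id
  //           LIMIT 1 FOR UPDATE SKIP LOCKED
  selectForUpdateSkipLocked(rows: Row[]): Row | undefined {
    for (const row of rows) {
      if (row.status === 'PENDING' && !this.locked.has(row.id)) {
        this.locked.add(row.id); // lock is held until release()
        return row;
      }
    }
    return undefined; // every pending row is locked by someone else
  }

  // Imitates the lock being dropped at commit or rollback.
  release(id: number): void {
    this.locked.delete(id);
  }
}
```

Two workers calling `selectForUpdateSkipLocked` on the same rows get different tasks, and a third worker gets nothing rather than blocking, which is the property that makes this pattern suitable for queue-style dequeue.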

Comment on lines 512 to 537
* Always includes the current status to implement optimistic locking.
* This ensures updates only succeed if the task is still in the expected state.
*
* **Why this is necessary:**
* In high-concurrency scenarios with multiple workers, race conditions can occur:
*
* Scenario 1: Concurrent dequeue operations
* - Worker A and B both read Task1 as PENDING
* - Worker A updates: WHERE id=X AND status=PENDING → IN_PROGRESS (succeeds)
* - Worker B updates: WHERE id=X AND status=PENDING → IN_PROGRESS (fails - optimistic lock)
*
* Scenario 2: Dequeue during update
* - Task1 is PENDING
* - Worker A calls updateStatus(Task1, COMPLETED) - reads task as PENDING
* - Worker B calls dequeue() - reads Task1 as PENDING
* - Worker B commits: WHERE id=X AND status=PENDING → IN_PROGRESS (succeeds)
* - Worker A commits: WHERE id=X AND status=PENDING → COMPLETED (fails - status is now IN_PROGRESS)
*
* Scenario 3: Double completion
* - Task1 is IN_PROGRESS
* - Worker A and B both try to update to COMPLETED
* - Worker A updates: WHERE id=X AND status=IN_PROGRESS → COMPLETED (succeeds)
* - Worker B updates: WHERE id=X AND status=IN_PROGRESS → COMPLETED (fails - status is now COMPLETED)
*
* Without status check, these scenarios would succeed silently, causing data inconsistency.
* With status check (optimistic locking), the second update fails with TASK_STATUS_UPDATE_FAILED.
Contributor

Too much detail.

Collaborator Author

done
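
The scenarios in the doc comment under review all reduce to a conditional update: the write succeeds only if the task still has the status the caller read. A minimal in-memory sketch of that check, assuming an affected-row count is how the caller detects a lost race; `InMemoryTaskStore` and `updateStatusIf` are hypothetical names, not the repository's API.

```typescript
type TaskStatus = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED';

interface Task {
  id: string;
  status: TaskStatus;
}

class InMemoryTaskStore {
  private tasks = new Map<string, Task>();

  insert(task: Task): void {
    this.tasks.set(task.id, { ...task });
  }

  get(id: string): Task | undefined {
    return this.tasks.get(id);
  }

  // Imitates: UPDATE task SET status = next WHERE id = ? AND status = expected
  // Returns the number of affected rows (0 means the optimistic check failed).
  updateStatusIf(id: string, expected: TaskStatus, next: TaskStatus): number {
    const task = this.tasks.get(id);
    if (task === undefined || task.status !== expected) {
      return 0;
    }
    task.status = next;
    return 1;
  }
}
```

In Scenario 1, both workers call `updateStatusIf(id, 'PENDING', 'IN_PROGRESS')`; the first call returns 1 and the second returns 0, at which point the caller can surface an error such as TASK_STATUS_UPDATE_FAILED.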

ronenkapelian and others added 5 commits February 24, 2026 11:22
Co-authored-by: Ofer <12687466+CptSchnitz@users.noreply.github.com>
Co-authored-by: Ofer <12687466+CptSchnitz@users.noreply.github.com>
2 participants