Skip to content

fix: deadlock between Task.Size and redrawProgress under parallel install#564

Merged
joshfriend merged 2 commits intocashapp:masterfrom
mcharles-square:fix/task-size-deadlock
Apr 21, 2026
Merged

fix: deadlock between Task.Size and redrawProgress under parallel install#564
joshfriend merged 2 commits intocashapp:masterfrom
mcharles-square:fix/task-size-deadlock

Conversation

@mcharles-square
Copy link
Copy Markdown
Contributor

Summary

#558 parallelised hermit install, which lets many goroutines drive per-package download progress concurrently. cache/http.go calls task.Size(total) followed by task.Add(n) for each chunk received; with a single active task the cycle below could never close, but with many it closes in two hops.

Lock orderings:

Task.Size:           task.lock -> ui.lock        (via UI.swapSize)
UI.redrawProgress:   ui.lock   -> task_i.lock    (via liveOperations / op.status)

Goroutine A inside task_A.Size holds task_A.lock waiting on ui.lock. Goroutine B, having just returned from its own task.Add's critical section, enters redrawProgress holding ui.lock and iterates operations waiting on task_A.lock. Deadlock is silent because the UI itself is blocked, so no progress output is produced before the job is killed — matches the wild symptom of hermit install stalling with zero output after v0.52.0 on CI jobs that install many packages at once.

Fix

Release task.lock in Task.Size before calling UI.swapSize. The oldSize → newSize delta passed to swapSize is additive and commutative, so doing the swap outside the task's critical section loses nothing.

Test

ui/task_test.go::TestConcurrentTaskSizeAndAddNoDeadlock fans out goroutines each interleaving Size/Add on their own task (mirroring cache/http.go's download loop).

  • Pre-fix: hangs for the test's 10s self-deadline and fails with an annotated goroutine dump showing the expected ui.locktask.lock cycle.
  • Post-fix: completes in ~0s.

Passes go test -race ./ui/ and existing go test ./app/.

mcharles-square and others added 2 commits April 21, 2026 15:45
…tall

cashapp#558 parallelised hermit install, which lets many goroutines drive per-package
download progress concurrently. cache/http.go calls task.Size(total) followed
by task.Add(n) for each chunk received; with a single active task the cycle
below could never close, but with many it closes in two hops.

Lock orderings:

    Task.Size:           task.lock -> ui.lock        (via UI.swapSize)
    UI.redrawProgress:   ui.lock   -> task_i.lock    (via liveOperations / op.status)

Goroutine A inside task_A.Size holds task_A.lock waiting on ui.lock. Goroutine
B, having just returned from its own task.Add's critical section, enters
redrawProgress holding ui.lock and iterates operations waiting on task_A.lock.
Deadlock is silent because the UI itself is blocked, so no progress output is
produced before the job is killed — matches the wild symptom of 'hermit install'
stalling with zero output after v0.52.0.

Fix: release task.lock in Task.Size before calling UI.swapSize. The
oldSize → newSize delta is additive and commutative, so doing the swapSize
outside the task's critical section loses nothing.

Regression test fans out goroutines each interleaving Size/Add on their own
task (mirroring cache/http.go) and fails the Go test binary explicitly with a
goroutine dump rather than hanging for the default test timeout.
@joshfriend joshfriend force-pushed the fix/task-size-deadlock branch from 358b19b to 45fe13f Compare April 21, 2026 16:29
@joshfriend joshfriend enabled auto-merge (squash) April 21, 2026 16:31
@joshfriend joshfriend merged commit 107c8b9 into cashapp:master Apr 21, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants