[BUGFIX] Fix thpool_destroy() hang: unlock all workers at once#135
Open
nil0x42 wants to merge 1 commit intoPithikos:masterfrom
Open
[BUGFIX] Fix thpool_destroy() hang: unlock all workers at once#135nil0x42 wants to merge 1 commit intoPithikos:masterfrom
nil0x42 wants to merge 1 commit intoPithikos:masterfrom
Conversation
`thpool_destroy()` is supposed to wake every worker, wait until the pool is empty, and return quickly. In practice it can block **indefinitely** (or for dozens of seconds) as soon as the pool size out-grows its one-second “fast-exit” window. * `bsem_post_all()` sets `v = 1` and broadcasts. * The **first** worker that wakes executes `bsem_wait() → while (v==0) … ; v = 0; …` → the single ticket is consumed, all other awakened threads fall straight back into `pthread_cond_wait()`. * After that first second, the destroy code slow-polls: one `bsem_post_all()` + `sleep(1)` per loop → **one ticket per second**. With 4000 threads the destructor needs \~4000 s; even with only **16 threads** (real-world `duplicut` run) it can exceed a watchdog such as `timeout 5`, hanging about 1 ‰ of the executions.
nil0x42
added a commit
to nil0x42/duplicut
that referenced
this pull request
May 16, 2025
I also submitted a PR to upstream project, explaining the issue:
Pithikos/C-Thread-Pool#135
|
Good, this solves #134 |
ShowOffJonah
added a commit
to ShowOffJonah/Cutdi
that referenced
this pull request
Sep 19, 2025
I also submitted a PR to upstream project, explaining the issue:
Pithikos/C-Thread-Pool#135
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hi, i'm nil0x42, the creator of duplicut
I wrote this patch for duplicut's copy of thpool.c, as i had a strange timeout in some unit tests that arised only once per ~1000 runs.
unblock all workers in thpool_destroy(), fix hang with medium-to-large pools
thpool_destroy()is supposed to wake every worker, wait until the poolis empty, and return quickly.
In practice it can block for dozens of seconds as
soon as the pool size out-grows its one-second “fast-exit” window.
Root cause
bsem_post_all()setsv = 1and broadcasts.bsem_wait() → while (v==0) … ; v = 0; …→ the single ticket is consumed, all other awakened threads fall
straight back into
pthread_cond_wait().one
bsem_post_all()+sleep(1)per loop → one ticket persecond.
With 4000 threads the destructor needs ~4000 s; even with only 16
threads (real-world
duplicutrun) it can exceed a watchdog such astimeout 5, hanging about 1 ‰ of the executions.Fix (pure ANSI C / POSIX, no API change)
bsem_wait()now consumes exactly one ticket:bsem_post_all()grants “infinite” tickets so every waiter canpass:
The single-post path (
bsem_post()) is unchanged: one post still wakesone thread.
All accesses to
vremain protected by the semaphore mutex, so no newdata races or lock-order inversions are introduced.
Reproducer added
tests/thpool_destroy_hang.shtests/src/thpool_destroy_hang.cduplicut.Impact
thpool_destroy()– 4000 threadsduplicut(16 threads)The change makes pool shutdown reliable regardless of the thread count
and keeps behaviour identical in every other respect.