A student of mine, @darshand15, has been doing some testing with ParlayLib. We think we've found a race condition in the scheduler initialization.
Suppose you have 2 threads.
- At the moment of scheduler initialization, we have
num_awake_workers set to 2.
- Immediately after thread 0 finishes initializing the scheduler, thread 1 is "awake" but may not be scheduled by the OS right away.
- Suppose thread 0 now executes
pardo and therefore scheduler.spawn(&right_job); from here it executes if (first) wake_up_a_worker() but this does not do atomic_notify_one because the current number of awake threads is set to 2 (equal to num_threads).
- Now, thread 1 starts executing. It enters
worker() and then wait_for_work(), and therefore immediately goes to sleep, waiting on the wake_up_counter.
The problem is that we have now entered a situation where thread 0 has a spare job in its deque, but thread 1 is not awake to steal it.
We were able to reproduce this race condition in practice with a simple program that looks like:
void loop() {
// just loop for a while
}
int main() {
loop();
pardo(
[](){ loop(); },
[](){ loop(); }
);
}
In this example, the scheduler gets initialized (lazily) at the beginning of the pardo. When we run this program on 2 threads, we almost always see sequential performance (no speedup). Actually, I don't think we've yet managed to see any speedup on 2 threads for this example program.
A student of mine, @darshand15, has been doing some testing with ParlayLib. We think we've found a race condition in the scheduler initialization.
Suppose you have 2 threads.
num_awake_workersset to 2.pardoand thereforescheduler.spawn(&right_job); from here it executesif (first) wake_up_a_worker()but this does not doatomic_notify_onebecause the current number of awake threads is set to 2 (equal tonum_threads).worker()and thenwait_for_work(), and therefore immediately goes to sleep, waiting on thewake_up_counter.The problem is that we have now entered a situation where thread 0 has a spare job in its deque, but thread 1 is not awake to steal it.
We were able to reproduce this race condition in practice with a simple program that looks like:
In this example, the scheduler gets initialized (lazily) at the beginning of the
pardo. When we run this program on 2 threads, we almost always see sequential performance (no speedup). Actually, I don't think we've yet managed to see any speedup on 2 threads for this example program.