Skip to content

race condition with scheduler initialization #83

@shwestrick

Description

@shwestrick

A student of mine, @darshand15, has been doing some testing with ParlayLib. We think we've found a race condition in the scheduler initialization.

Suppose you have 2 threads.

  1. At the moment of scheduler initialization, we have num_awake_workers set to 2.
  2. Immediately after thread 0 finishes initializing the scheduler, thread 1 is "awake" but may not be scheduled by the OS right away.
  3. Suppose thread 0 now executes pardo and therefore scheduler.spawn(&right_job); from here it executes if (first) wake_up_a_worker() but this does not do atomic_notify_one because the current number of awake threads is set to 2 (equal to num_threads).
  4. Now, thread 1 starts executing. It enters worker() and then wait_for_work(), and therefore immediately goes to sleep, waiting on the wake_up_counter.

The problem is that we have now entered a situation where thread 0 has a spare job in its deque, but thread 1 is not awake to steal it.

We were able to reproduce this race condition in practice with a simple program that looks like:

void loop() {
  // just loop for a while
}

int main() {
  loop();
  pardo(
    [](){ loop(); },
    [](){ loop(); }
  );
}

In this example, the scheduler gets initialized (lazily) at the beginning of the pardo. When we run this program on 2 threads, we almost always see sequential performance (no speedup). Actually, I don't think we've yet managed to see any speedup on 2 threads for this example program.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions