
MPI worker uneven distribution of work / CPU usage #48

@vexingly

While reviewing the Grafana logs from UAT testing, and after several test runs to confirm, I found unusual variation in how work is distributed across MPI workers, as reflected in their CPU usage.

  1. In some runs worker-0 was given no work and showed 0% CPU usage
  2. Workers with multiple CPUs are limited by the maximum number of threads (selected by the user or the default)
  3. Some combinations of subsamples, threads and MPI workers result in unusual distributions

In theory, the subsamples are distributed evenly across the MPI workers, and each subsample uses one CPU thread. If the number of MPI workers times the maximum number of threads equals the number of subsamples, and each worker's CPU limit is greater than or equal to the number of threads, then all subsamples run in parallel. If there are not enough workers, threads or CPU cores available, some subsamples may run in series.
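To make the expected behaviour concrete, here is a minimal Python sketch of the theory described above. It is only a model of my understanding, not the actual scheduling code: the function names (`plan_subsamples`, `fully_parallel`) are hypothetical, and it assumes an even split with any remainder going to the lowest-numbered workers.

```python
from math import ceil


def plan_subsamples(subsamples, workers, max_threads, cpus_per_worker):
    """Model the theoretical distribution of subsamples to MPI workers.

    Assumptions (not confirmed against the real scheduler):
    - subsamples are split evenly, remainder goes to the lowest ranks
    - each subsample occupies exactly one CPU thread
    - a worker runs at most min(max_threads, cpus_per_worker) at once
    """
    if min(subsamples, workers, max_threads, cpus_per_worker) < 1:
        raise ValueError("all arguments must be positive integers")

    base, extra = divmod(subsamples, workers)
    slots = min(max_threads, cpus_per_worker)  # parallel capacity per worker

    plan = []
    for rank in range(workers):
        assigned = base + (1 if rank < extra else 0)
        plan.append({
            "worker": rank,
            "assigned": assigned,
            # subsamples this worker can run at the same time
            "concurrent": min(assigned, slots),
            # number of sequential batches needed to finish them all
            "rounds": ceil(assigned / slots),
        })
    return plan


def fully_parallel(plan):
    """True if no worker has to run any of its subsamples in series."""
    return all(w["rounds"] <= 1 for w in plan)


if __name__ == "__main__":
    for w in plan_subsamples(subsamples=6, workers=3, max_threads=2, cpus_per_worker=2):
        print(w)
```

Under this model, a worker only falls back to running subsamples in series when it is assigned more subsamples than `min(max_threads, cpus_per_worker)`.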

I observed some strange CPU utilization across tests that should perform similarly. For example, testing the MPI-compiled NewCaseBased model with 600m cases:

6 subsamples, 2 threads, 3 workers (2 CPUs): full CPU utilization
8 subsamples, 2 threads, 4 workers (2 CPUs): full CPU utilization
4 subsamples, 2 threads, 2 workers (2 CPUs): less than 1 CPU core used per worker
8 subsamples, 4 threads, 2 workers (4 CPUs): less than 1 CPU core used per worker
4 subsamples, 4 threads, 2 workers (4 CPUs): one worker idle, the other used only 1 core
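For comparison, the sketch below computes how many cores per worker the theory would predict to be busy in each of these runs. It is a rough illustration only: `expected_busy_cores` is a hypothetical helper, and it assumes that "(2CPU)" / "(4CPU)" is the per-worker CPU limit and that subsamples are split evenly.

```python
# (subsamples, max threads, workers, CPUs per worker, observed result)
RUNS = [
    (6, 2, 3, 2, "full CPU utilization"),
    (8, 2, 4, 2, "full CPU utilization"),
    (4, 2, 2, 2, "less than 1 CPU core used per worker"),
    (8, 4, 2, 4, "less than 1 CPU core used per worker"),
    (4, 4, 2, 4, "one worker idle, the other used only 1 core"),
]


def expected_busy_cores(subsamples, threads, workers, cpus):
    """Cores per worker that should be busy, assuming an even split
    and one CPU thread per subsample (assumption, not measured)."""
    per_worker = -(-subsamples // workers)  # ceiling division
    return min(per_worker, threads, cpus)


def expectation_table(runs):
    """Pair each run with its theoretical busy-core count per worker."""
    return [
        (subs, thr, wrk, cpus, expected_busy_cores(subs, thr, wrk, cpus), seen)
        for subs, thr, wrk, cpus, seen in runs
    ]


if __name__ == "__main__":
    for subs, thr, wrk, cpus, expected, seen in expectation_table(RUNS):
        print(f"{subs} subsamples, {thr} threads, {wrk} workers ({cpus} CPUs): "
              f"expected {expected} busy cores/worker, observed: {seen}")
```

Under these assumptions every worker in all five runs should keep at least 2 cores busy, so the last three results do not match the theory.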
