MPI-PR: quadratic scaling of the number of open file descriptors with respect to the number of processes per node #257

@edoapra

Since every shared memory allocation in MPI-PR opens a memory-mapped file for each rank, and each rank needs to know the file descriptors of all the ranks on the same node, you end up with (processes per node)^2 open file descriptors for every shared memory allocation.
Since 128-core hardware is becoming commonplace and 128*128 = 16,384 (16K), we have already seen reports of Global Arrays runs that required raising the kernel limit /proc/sys/fs/file-max to values of O(10^6)-O(10^7).
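
To make the arithmetic concrete, here is a rough sketch of the count under the model described above (one descriptor per rank pair on a node, per allocation). The function name and the figure of 100 simultaneously live allocations are hypothetical, chosen only for illustration; real runs may hold more or fewer allocations at once.

```python
def open_fds_per_node(ranks_per_node: int, live_allocations: int = 1) -> int:
    """Estimate open descriptors on one node.

    Assumed model (from the description above): for each live shared memory
    allocation, every rank holds one descriptor for every rank on its node.
    """
    return ranks_per_node * ranks_per_node * live_allocations


if __name__ == "__main__":
    # 100 live allocations is a made-up number, used only to show the trend.
    for p in (16, 64, 128):
        print(p, open_fds_per_node(p), open_fds_per_node(p, live_allocations=100))
```

With 128 ranks per node, a single allocation already accounts for 16,384 descriptors, and 100 live allocations would put one node at about 1.6 million, which is in line with the file-max values reported above.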

https://groups.google.com/g/nwchem-forum/c/Q-qvcHP9vP4
nwchemgit/nwchem#338

Can we try to address this from the GA side?
Possible solutions that come to mind (I have no idea about their feasibility):

  • Disable shared memory
  • Split each physical node into smaller "virtual nodes" (possibly aligned with NUMA or socket domains)
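
As a back-of-the-envelope check on the second idea, the sketch below estimates how the per-node descriptor count would change if shared memory were confined to equally sized virtual nodes. It assumes the quadratic model from the description applies independently within each virtual node; the function name and the split factors are hypothetical, and this says nothing about how GA would actually implement such a split.

```python
def fds_with_virtual_nodes(ranks_per_node: int, virtual_nodes: int,
                           live_allocations: int = 1) -> int:
    """Estimate open descriptors on one physical node after splitting it.

    Assumption: ranks only share descriptors within their own virtual node,
    so each virtual node contributes (ranks in that virtual node)^2.
    """
    if virtual_nodes < 1 or ranks_per_node % virtual_nodes != 0:
        raise ValueError("virtual_nodes must evenly divide ranks_per_node")
    ranks_per_domain = ranks_per_node // virtual_nodes
    return virtual_nodes * ranks_per_domain ** 2 * live_allocations


if __name__ == "__main__":
    # Hypothetical split factors, e.g. 2 sockets or 8 NUMA domains.
    for k in (1, 2, 8):
        print(k, fds_with_virtual_nodes(128, k))
```

Under this model the count drops from p^2 to p^2/k for k virtual nodes: 16,384 becomes 8,192 with two domains and 2,048 with eight, at the cost of less intra-node sharing.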
