feat(mtagro): enable multi-rank usage #4

Open
HaoZeke wants to merge 19 commits into metatomic from realDomDec

HaoZeke commented Jan 30, 2026

Look away until

Or "real domain decomposition".

Basically the LAMMPS style, where the model is loaded on every rank and computes on every rank.

WIP. Needs testing to ensure consistency.

All set. Closes #7.

@HaoZeke HaoZeke changed the base branch from metatomic to noVesin January 30, 2026 11:44
@HaoZeke HaoZeke marked this pull request as ready for review January 30, 2026 11:44
@HaoZeke HaoZeke marked this pull request as draft January 30, 2026 11:49
@HaoZeke HaoZeke mentioned this pull request Feb 2, 2026
Base automatically changed from noVesin to metatomic February 3, 2026 14:40
@HaoZeke HaoZeke force-pushed the realDomDec branch 5 times, most recently from 5c0cd25 to d3505c6 Compare February 4, 2026 17:18
@HaoZeke HaoZeke marked this pull request as ready for review February 15, 2026 07:36

HaoZeke commented Feb 15, 2026

This is now consistent across rank counts, from mpirun -n 1 gmx_mpi mdrun all the way up to -n 12, on the example from https://github.com/HaoZeke/pixi_envs/tree/main/orgs/metatensor/gromacs/mta_test

since GMX_LOG is rank 0 only, and cerr is not allowed as per GROMACS
style guides
1. The bonded interaction building (make_bondeds_zone) runs for all zones but won't find any cross-zone bondeds when !hasInterAtomicInteractions(); it is a no-op for the extra zones.
2. The exclusion building (make_exclusions_zone) correctly builds exclusion entries for all i-zone atoms, satisfying the pairlist assertion.

Added nzone_bondeds = std::max(nzone_bondeds, numIZonesForExclusions) to
ensure the exclusion building loop covers all i-zones when
intermolecularExclusionGroup is present.

Without this, 3D DD (e.g., 2x2x2 with 8 ranks) has numIZones=4 but
nzone_bondeds=1, so exclusion lists are only built for zone 0 atoms while the nbnxm assertion expects them for zones 0-3.
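
To make the zone-coverage argument concrete, here is a minimal standalone sketch of the loop-bound change. It is not the GROMACS code: zonesWithExclusions and the applyFix flag are hypothetical names used only for illustration, and the loop body merely records which zones would get exclusion lists.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not a GROMACS function): returns the zone indices
// for which the exclusion-building loop would run.
std::vector<int> zonesWithExclusions(int nzone_bondeds, int numIZonesForExclusions, bool applyFix)
{
    if (applyFix)
    {
        // The one-line change described above: widen the loop bound so
        // that every i-zone is visited.
        nzone_bondeds = std::max(nzone_bondeds, numIZonesForExclusions);
    }

    std::vector<int> zones;
    for (int izone = 0; izone < nzone_bondeds; izone++)
    {
        // Stand-in for the per-zone exclusion building.
        zones.push_back(izone);
    }
    return zones;
}
```

With the values quoted for the 2x2x2 case (nzone_bondeds=1, numIZones=4), the unfixed loop visits only zone 0, while the fixed loop visits zones 0-3, which is what the nbnxm assertion expects.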

1. localToModelIndex_ sized to numLocalPlusHalo instead of signal.x_.size() (was OOB write)
2. augmentGhostPairs rewritten to correctly identify halo MTA atoms by iterating localToModelIndex_ from numLocalAtoms_ onward, instead of incorrectly slicing the full coordinate array
3. Shift vector computed as pair.dx() - (positions_[B] - positions_[A]) in model-space, then rounded to integer cell shifts and recomputed from box vectors for consistency
4. Deduplication of pairs using std::set<tuple> to handle overlap between signal pairs and augmented halo-halo pairs
5. Timer instrumentation via MetatomicTimer RAII class around key phases
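
Items 3 and 4 above can be sketched as follows. This is a simplified illustration, not the code in this PR: it assumes an orthorhombic box (only the diagonal box lengths are used), and the Pair and ModelPair structs, the function names, and the choice of (A, B, cell shift) as the deduplication key are all hypothetical.

```cpp
#include <array>
#include <cmath>
#include <set>
#include <tuple>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical input pair: atom indices plus the displacement reported
// by the pairlist (the role played by pair.dx() in the description).
struct Pair
{
    int  a;
    int  b;
    Vec3 dx;
};

// Hypothetical output pair handed to the model.
struct ModelPair
{
    int                a;
    int                b;
    std::array<int, 3> cellShift;
    Vec3               shift;
};

// Raw shift is dx - (pos[B] - pos[A]); round it to integer cell shifts.
// Assumption: orthorhombic box with edge lengths boxDiag.
std::array<int, 3> cellShiftFor(const Pair& p, const std::vector<Vec3>& positions, const Vec3& boxDiag)
{
    std::array<int, 3> n{};
    for (int d = 0; d < 3; d++)
    {
        const double raw = p.dx[d] - (positions[p.b][d] - positions[p.a][d]);
        n[d]             = static_cast<int>(std::lround(raw / boxDiag[d]));
    }
    return n;
}

std::vector<ModelPair> buildUniquePairs(const std::vector<Pair>& pairs,
                                        const std::vector<Vec3>& positions,
                                        const Vec3&              boxDiag)
{
    // Key: (A, B, nx, ny, nz). The same pair can arrive from two sources.
    std::set<std::tuple<int, int, int, int, int>> seen;
    std::vector<ModelPair>                        out;
    for (const Pair& p : pairs)
    {
        const std::array<int, 3> n = cellShiftFor(p, positions, boxDiag);
        if (!seen.emplace(p.a, p.b, n[0], n[1], n[2]).second)
        {
            continue; // duplicate pair, already emitted
        }
        // Recompute the shift from the box so it is an exact multiple
        // of the box lengths, free of accumulated rounding noise.
        Vec3 shift{};
        for (int d = 0; d < 3; d++)
        {
            shift[d] = n[d] * boxDiag[d];
        }
        out.push_back(ModelPair{ p.a, p.b, n, shift });
    }
    return out;
}
```

Keying the set on the integer cell shift rather than on the floating-point shift vector avoids near-duplicate keys that differ only by rounding error. A triclinic box would need the full box matrix (and its inverse) in place of the per-axis division.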

Development

Successfully merging this pull request may close these issues.

bug(install): fixup RPATH
