
Explicit communication inside models running with domain decomposition #82

@Luthaf

Description


When running metatomic models inside simulation engines with domain decomposition (currently mainly LAMMPS, more to come), it would be useful to give the model access to an explicit communication API. This would enable at least two things:

  • using the communication for message passing in message-passing models (MPNN), in turn reducing the amount of work duplicated from one domain to another and allowing the model to run with a smaller interaction cutoff.
  • using the communication to enable domain decomposition for long-range models.

For the MPNN use case, MACE already does something similar in the ML-IAP interface of LAMMPS, using LAMMPS primitives to send/receive node features to/from other domains: https://github.com/ACEsuit/mace/blob/42cd495b9c8e155441c405fef842a8ba5107c2f8/mace/tools/utils.py#L152-L167.

We should do something similar, but not limited to LAMMPS, providing models with the same set of communication tools regardless of the simulation engine.

How it could work

metatomic would provide a set of custom TorchScript operations (name to be decided; communicate is used here) that the model can call. The engine would then give metatomic a set of corresponding function pointers that implement these operations using the engine's communication primitives.

model                                                               engine

communicate(...)    ==>    metatomic shim     ==>     communicate_impl(...)
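
As a rough illustration, here is a minimal Python-level sketch of that indirection. The names (communicate, register_communicate_impl) and the pure-Python registry are placeholders only; the real thing would be a custom TorchScript operator whose implementation the engine swaps in.

from typing import Callable, Optional

import torch

# function pointer provided by the engine at load time; stays None when the
# engine does not do domain decomposition
_COMMUNICATE_IMPL: Optional[Callable[[torch.Tensor, torch.Tensor], None]] = None


def register_communicate_impl(impl: Callable[[torch.Tensor, torch.Tensor], None]) -> None:
    """Called once by the engine to plug in its own communication primitive."""
    global _COMMUNICATE_IMPL
    _COMMUNICATE_IMPL = impl


def communicate(features: torch.Tensor, ghost_features: torch.Tensor) -> None:
    """Exchange per-atom data across domains.

    Sends the locally-owned rows of `features` to the domains that see them as
    ghosts, and fills `ghost_features` with the data received for our own
    ghosts. Without a registered implementation this is a no-op.
    """
    if _COMMUNICATE_IMPL is not None:
        _COMMUNICATE_IMPL(features, ghost_features)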

Taking inspiration from ML-IAP, the API could look something like this:

# features is `n_atoms x n_features`

pre_allocated = torch.zeros_like(features)
communicate(features, pre_allocated)

# we receive the information about our ghosts (i.e. atoms owned by other domains) in the `pre_allocated` array
# other domains get the information about their ghosts from `features`

If the engine does not support domain decomposition, the communicate function would do nothing.
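
To make the MPNN use case more concrete, here is a hedged sketch of how a message-passing model could interleave communicate calls between interaction layers. The function name, the layer interface, and the neighbor-list layout are made up for illustration, and it assumes the communicate signature sketched above (locally-owned atoms first, ghosts appended after them).

import torch

def mpnn_forward(
    local_features: torch.Tensor,  # n_local x n_features, atoms owned by this domain
    n_ghosts: int,                 # number of ghost atoms received from other domains
    neighbors: torch.Tensor,       # (n_pairs, 2) indices; column 1 may point into the ghosts
    layers: torch.nn.ModuleList,   # each hypothetical layer maps 2 * n_features -> n_features
) -> torch.Tensor:
    features = local_features
    for layer in layers:
        # refresh the ghost features before every message-passing step
        ghosts = features.new_zeros(n_ghosts, features.shape[1])
        communicate(features, ghosts)

        everything = torch.cat([features, ghosts], dim=0)
        messages = everything[neighbors[:, 1]]

        # accumulate messages on the locally-owned centers only
        summed = torch.zeros_like(features)
        summed.index_add_(0, neighbors[:, 0], messages)

        features = layer(torch.cat([features, summed], dim=1))
    return features

Each domain only ever computes features for the atoms it owns; everything it needs to know about its ghosts arrives through communicate, which is what would allow running with a smaller interaction cutoff.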

We should be able to make everything work during backward propagation, for the calculation of forces and other gradients. In LAMMPS, communicate would map to forward_comm in the forward pass and to reverse_comm during backward propagation (https://docs.lammps.org/Developer_comm_ops.html)
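
As a sketch only (again with made-up names), that pairing could be expressed as a custom autograd function: forward uses the forward communication, and backward uses a hypothetical reverse_communicate that sums the gradients accumulated on ghost atoms back onto the domain that owns them.

import torch

class CommunicateOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, features: torch.Tensor, ghosts: torch.Tensor) -> torch.Tensor:
        ctx.n_local = features.shape[0]
        ctx.mark_dirty(ghosts)
        # forward_comm: our local features go out, our ghost rows get filled in
        communicate(features, ghosts)
        return ghosts

    @staticmethod
    def backward(ctx, grad_ghosts: torch.Tensor):
        # reverse_comm: gradients written on ghost atoms belong to atoms owned
        # by other domains; send them back and sum them onto the owners
        grad_features = grad_ghosts.new_zeros(ctx.n_local, grad_ghosts.shape[1])
        reverse_communicate(grad_ghosts, grad_features)  # hypothetical reverse op
        return grad_features, None

The model would then call CommunicateOp.apply(features, pre_allocated) instead of communicate directly, so that the reverse communication happens automatically when forces are computed.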

Unresolved questions

  • Can this API be used for long-range models as well, or do we need more?
  • Can such an API perform direct GPU-to-GPU communication?
    • when running in LAMMPS-Kokkos
    • when running in LAMMPS without Kokkos, but still running the model on GPU
    • when running in GROMACS/OpenMM
  • Can we implement this kind of interface in other simulation engines?
