Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion doc/src/fix_qeq_reaxff.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,11 @@ Syntax

.. parsed-literal::

keyword = *dual* or *maxiter* or *nowarn*
keyword = *dual* or *maxiter* or *nowarn* or *matfree*
*dual* = process S and T matrix in parallel (only for qeq/reaxff/omp)
*maxiter* N = limit the number of iterations to *N*
*nowarn* = do not print a warning message if the maximum number of iterations was reached
*matfree* = use a matrix-free approach for applying the H matrix (only for qeq/reaxff/kk)

Examples
""""""""
Expand Down Expand Up @@ -90,6 +91,14 @@ same fixed number of QEq iterations is desired, which can be achieved
by using a very small tolerance and setting *maxiter* to the desired
number of iterations.

The optional *matfree* keyword replaces the sequence of
explicitly constructing the H matrix, then (repeatedly) applying it
with a matrix-free approach where the H matrix is effectively
regenerated each time it is applied. This trades performance for
reduced memory requirements because it avoids the overheads of
storing the matrix. This is only supported for the *qeq/reaxff/kk*
style, with both full and half qeq neighbor lists supported.

.. note::

In order to solve the self-consistent equations for electronegativity
Expand Down
14 changes: 10 additions & 4 deletions doc/src/package.rst
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ Syntax
*no_affinity* values = none
*kokkos* args = keyword value ...
zero or more keyword/value pairs may be appended
keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *neigh/transpose* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *comm/pair/forward* or *comm/fix/forward* or *comm/reverse* or *comm/pair/reverse* or *sort* or *atom/map* or *gpu/aware* or *pair/only*
keywords = *neigh* or *neigh/qeq* or *neigh/thread* or *neigh/transpose* or *newton* or *binsize* or *comm* or *comm/exchange* or *comm/forward* or *comm/pair/forward* or *comm/fix/forward* or *comm/reverse* or *comm/pair/reverse* or *comm/fix/reverse* or *sort* or *atom/map* or *gpu/aware* or *pair/only*
*neigh* value = *full* or *half*
full = full neighbor list
half = half neighbor list built in thread-safe manner
Expand All @@ -98,7 +98,7 @@ Syntax
*binsize* value = size
size = bin size for neighbor list construction (distance units)
*comm* value = *no* or *host* or *device*
use value for comm/exchange and comm/forward and comm/pair/forward and comm/fix/forward and comm/reverse
use value for comm/exchange and comm/forward and comm/pair/forward and comm/fix/forward and comm/reverse and comm/fix/reverse
*comm/exchange* value = *no* or *host* or *device*
*comm/forward* value = *no* or *host* or *device*
*comm/pair/forward* value = *no* or *device*
Expand All @@ -110,6 +110,10 @@ Syntax
*comm/pair/reverse* value = *no* or *device*
*no* = perform communication pack/unpack in non-KOKKOS mode
*device* = perform pack/unpack on device (e.g. on GPU)
*comm/fix/reverse* value = *no* or *host* or *device*
*no* = perform communication pack/unpack in non-KOKKOS mode
*host* = perform pack/unpack on host (e.g. with OpenMP threading)
*device* = perform pack/unpack on device (e.g. on GPU)
*sort* value = *no* or *device*
*no* = perform atom sorting in non-KOKKOS mode
*device* = perform atom sorting on device (e.g. on GPU)
Expand Down Expand Up @@ -545,7 +549,8 @@ rule of thumb may give too large a binsize and the default should be
overridden with a smaller value.

The *comm* and *comm/exchange* and *comm/forward* and *comm/pair/forward*
and *comm/fix/forward* and *comm/reverse* and *comm/pair/reverse*
and *comm/fix/forward* and *comm/reverse* and *comm/pair/reverse* and
*comm/fix/reverse*
keywords determine whether the host or device performs the packing and
unpacking of data when communicating per-atom data between processors.
"Exchange" communication happens only on timesteps that neighbor lists
Expand Down Expand Up @@ -573,7 +578,8 @@ keywords, if a value of *host* is used it will be automatically
be changed to *no* since these keywords don't support *host* mode. The
value of *no* will also always be used when running on the CPU, i.e. setting
the value to *device* will have no effect if the pair/fix style is
running on the CPU. For the *comm/fix/forward* or *comm/pair/reverse*
running on the CPU. For the *comm/fix/forward* or *comm/pair/reverse* or
*comm/fix/reverse*
keywords, not all styles support *device* mode and in that case will run
in *no* mode instead.

Expand Down
89 changes: 86 additions & 3 deletions src/KOKKOS/comm_kokkos.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,11 @@
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */

/* ----------------------------------------------------------------------
Contributing authors: Christian Trott (SNL), Stan Moore (SNL),
Lewis Russell (U. Strathclyde), Balint Joo (NVIDIA)
------------------------------------------------------------------------- */

#include "comm_kokkos.h"

#include "atom.h"
Expand Down Expand Up @@ -113,6 +118,7 @@ void CommKokkos::init()
reverse_pair_comm_legacy = lmp->kokkos->reverse_pair_comm_legacy;
forward_fix_comm_legacy = lmp->kokkos->forward_fix_comm_legacy;
reverse_comm_legacy = lmp->kokkos->reverse_comm_legacy;
reverse_fix_comm_legacy = lmp->kokkos->reverse_fix_comm_legacy;
exchange_comm_on_host = lmp->kokkos->exchange_comm_on_host;
forward_comm_on_host = lmp->kokkos->forward_comm_on_host;
reverse_comm_on_host = lmp->kokkos->reverse_comm_on_host;
Expand Down Expand Up @@ -352,7 +358,8 @@ void CommKokkos::reverse_comm_device()

void CommKokkos::forward_comm(Fix *fix, int size)
{
if (fix->execution_space == Host || fix->execution_space == HostKK || !fix->forward_comm_device || forward_fix_comm_legacy) {
if (fix->execution_space == Host || fix->execution_space == HostKK ||
!fix->forward_comm_device || forward_fix_comm_legacy) {
k_sendlist.sync_host();
CommBrick::forward_comm(fix, size);
} else {
Expand Down Expand Up @@ -444,10 +451,86 @@ void CommKokkos::forward_comm_device(Fix *fix, int size)

void CommKokkos::reverse_comm(Fix *fix, int size)
{
k_sendlist.sync_host();
CommBrick::reverse_comm(fix, size);
if (fix->execution_space == Host || fix->execution_space == HostKK ||
!fix->reverse_comm_device || reverse_fix_comm_legacy) {
k_sendlist.sync_host();
CommBrick::reverse_comm(fix, size);
} else {
k_sendlist.sync_device();
reverse_comm_device<LMPDeviceType>(fix,size);
}
}

/* ---------------------------------------------------------------------- */

template<class DeviceType>
void CommKokkos::reverse_comm_device(Fix *fix, int size)
{
int iswap, n, nsize;
MPI_Request request;
DAT::tdual_double_1d k_buf_tmp;

if (size) nsize = size;
else nsize = fix->comm_reverse;
KokkosBase* fixKKBase = dynamic_cast<KokkosBase*>(fix);

for (iswap = 0; iswap < nswap; iswap++) {
int n = MAX(max_buf_fix,nsize*sendnum[iswap]);
n = MAX(n,nsize*recvnum[iswap]);
if (n > max_buf_fix)
grow_buf_fix(n);
}

// exchange data with another proc
// if other proc is self, just copy

for (iswap = nswap-1; iswap >= 0; iswap--) {

n = fixKKBase->pack_reverse_comm_kokkos(recvnum[iswap],firstrecv[iswap],k_buf_send_fix);

// exchange with another proc
// if self, set recv buffer to send buffer

if (sendproc[iswap] != me) {
double* buf_send_fix;
double* buf_recv_fix;
if (lmp->kokkos->gpu_aware_flag) {
buf_send_fix = k_buf_send_fix.view<DeviceType>().data();
buf_recv_fix = k_buf_recv_fix.view<DeviceType>().data();
} else {
k_buf_send_fix.modify<DeviceType>();
k_buf_send_fix.sync<LMPHostType>();
buf_send_fix = k_buf_send_fix.view_host().data();
buf_recv_fix = k_buf_recv_fix.view_host().data();
}

if (sendnum[iswap]) {
DeviceType().fence();
MPI_Irecv(buf_recv_fix,nsize*sendnum[iswap],MPI_DOUBLE,
sendproc[iswap],0,world,&request);
}
if (recvnum[iswap]) {
DeviceType().fence();
MPI_Send(buf_send_fix,n,MPI_DOUBLE,recvproc[iswap],0,world);
}
if (sendnum[iswap]) {
MPI_Wait(&request,MPI_STATUS_IGNORE);
DeviceType().fence();
}

if (!lmp->kokkos->gpu_aware_flag) {
k_buf_recv_fix.modify<LMPHostType>();
k_buf_recv_fix.sync<DeviceType>();
}
k_buf_tmp = k_buf_recv_fix;
} else k_buf_tmp = k_buf_send_fix;

// unpack buffer
auto k_sendlist_iswap = Kokkos::subview(k_sendlist,iswap,Kokkos::ALL);
fixKKBase->unpack_reverse_comm_kokkos(sendnum[iswap], k_sendlist_iswap,k_buf_tmp);
}

}

/* ----------------------------------------------------------------------
reverse communication invoked by a Fix with variable size data
Expand Down
2 changes: 2 additions & 0 deletions src/KOKKOS/comm_kokkos.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ class CommKokkos : public CommBrick {
bool reverse_pair_comm_legacy;
bool forward_fix_comm_legacy;
bool reverse_comm_legacy;
bool reverse_fix_comm_legacy;
bool exchange_comm_on_host;
bool forward_comm_on_host;
bool reverse_comm_on_host;
Expand Down Expand Up @@ -64,6 +65,7 @@ class CommKokkos : public CommBrick {
template<class DeviceType> void forward_comm_device(Pair *pair, int size=0);
template<class DeviceType> void reverse_comm_device(Pair *pair, int size=0);
template<class DeviceType> void forward_comm_device(Fix *fix, int size=0);
template<class DeviceType> void reverse_comm_device(Fix *fix, int size=0);
template<class DeviceType> void exchange_device();
template<class DeviceType> void borders_device();

Expand Down
Loading