diff --git a/content/backmatter.tex b/content/backmatter.tex index 2a36fa819..d5098e3a2 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -184,7 +184,9 @@ \chapter{Undefined Behavior in OpenSHMEM}\label{sec:undefined} \end{longtable} - +\color{ForestGreen} +\input{content/interoperability} +\color{black} \chapter{History of OpenSHMEM}\label{sec:openshmem_history} diff --git a/content/interoperability.tex b/content/interoperability.tex new file mode 100644 index 000000000..4cec41b76 --- /dev/null +++ b/content/interoperability.tex @@ -0,0 +1,174 @@ +\chapter{Interoperability with Other Programming Models}\label{sec:interoperability} + +OpenSHMEM routines may be used in conjunction with the routines of other +communication libraries or parallel languages in the same program. This section +describes the interoperability with other programming models, including +clarification of undefined behaviors caused by mixed use of different models, +and advice to \openshmem library users and developers that may improve the portability +and performance of hybrid programs. + + +\section{\ac{MPI} Interoperability} + +\openshmem and \ac{MPI} are two commonly used parallel programming models for +distributed-memory systems. The user can choose to utilize both models in the same program +to efficiently and easily support various communication patterns. + +A vendor may implement the \openshmem and \ac{MPI} libraries in different ways. For +instance, one may implement both \openshmem and \ac{MPI} as standalone libraries, +each of which allocates and initializes fully isolated communication +resources. +Another approach +is to implement both \openshmem and \ac{MPI} interfaces within the +same software system in order to share a communication resource when possible. + +To improve interoperability and portability in \openshmem + \ac{MPI} hybrid +programming, we clarify the relevant semantics in the following subsections. + + +\subsection{Initialization} +In order to ensure that a hybrid program can be portably performed with different vendor +implementations, the \openshmem environment of the program must be initialized by +a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and be finalized by +a call to \FUNC{shmem\_finalize}; the \ac{MPI} environment of the program must be initialized +by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread} and be finalized by a +call to \FUNC{MPI\_Finalize}. + +\apiimpnotes{ +Portable implementations of OpenSHMEM and \ac{MPI} must ensure that the initialization +calls can be made in an arbitrary order within a program; the same rule also +applies to the finalization calls. A software runtime that utilizes a shared +communication resource for \openshmem and \ac{MPI} communication may maintain an +internal reference counter in order to ensure that the shared resource is +initialized only once and thus no shared resource is released until the last +finalization call is made. +} + + +\subsection{Dynamic Process Creation} +\label{subsec:interoperability:mpmd} + +\ac{MPI} defines a dynamic process model that allows creation of processes after +an \ac{MPI} application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and +connection to independent processes (e.g., through \FUNC{MPI\_Comm\_accept} +and \FUNC{MPI\_Comm\_connect}). +It provides a mechanism to establish communication +between the newly created processes and the existing \ac{MPI} application (see +\ac{MPI} standard version 3.1, Chapter 10). +Unlike \ac{MPI}, \openshmem starts all processes at once and requires all \acp{PE} to +collectively allocate and initialize resources (e.g., symmetric heap) used by +the \openshmem library before any other \openshmem routine may +be called. \openshmem does not support communication with dynamically created +or connected processes. In such a scenario, \ac{MPI} can be used to communicate +with these processes. + + +\subsection{Thread Safety} +\label{subsec:interoperability:thread} +Both \openshmem and \ac{MPI} define the interaction with user threads in a program +with routines that can be used for initializing and querying the thread +environment. A hybrid program may request different thread levels +at the initialization calls of \openshmem and \ac{MPI} environments; however, the +returned support level provided by the \openshmem or \ac{MPI} library might be different +from that returned in an non-hybrid program. For instance, the former +initialization call in a hybrid program may initialize a resource with the +requested thread level, but the supported level cannot be updated by a subsequent +initialization call if the underlying software runtime of \openshmem and \ac{MPI} +share the same internal communication resource. +The program should always check the \VAR{provided} thread level returned +at the corresponding initialization call or query the level of thread support +after initialization to portably ensure thread support in each communication +environment. + +Both \openshmem and \ac{MPI} define similar thread levels, namely, \VAR{THREAD\_SINGLE}, +\VAR{THREAD\_FUNNELED}, \VAR{THREAD\_SERIALIZED}, and \VAR{THREAD\_MULTIPLE}. +When requesting threading support in a hybrid program, however, +the following additional rules are applied if the implementations of \openshmem +and \ac{MPI} share the same internal communication resource. +It is strongly recommended to always follow these rules to ensure program +portability. + +\begin{itemize} + \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program. + Hence, a hybrid program should not request \VAR{THREAD\_SINGLE} at the initialization + call of either \openshmem or \ac{MPI} but request a different thread level at the + initialization call of the other model. + + \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to + make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} + thread level in both \openshmem and \ac{MPI} should ensure that the same main thread + is used in both communication environments. + + \item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure + that communication calls are not made concurrently by multiple threads. If a + hybrid program uses \VAR{THREAD\_SERIALIZED} in one communication environment + and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one, it + should also guarantee that the \openshmem and \ac{MPI} calls are not made concurrently + from two distinct threads. +\end{itemize} + +\subsection{Mapping Process Identification Numbers} +\label{subsec:interoperability:id} + +Similar to the \ac{PE} number in \openshmem, \ac{MPI} defines rank as the +identification number of a process in a communicator. Both the \openshmem \ac{PE} +and the \ac{MPI} rank are unique integers assigned from zero to one less than the total +number of processes. In a hybrid program, the \openshmem +\ac{PE} number in \LibHandleRef{SHMEM\_TEAM\_WORLD} +and the \ac{MPI} rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. +This feature, however, may be provided by only some of the \openshmem and \ac{MPI} +implementations (e.g., if both environments share the same underlying process +manager) and is not portably guaranteed. A portable program should always +use the standard functions in each model, namely, \FUNC{shmem\_team\_my\_pe} in \openshmem +and \FUNC{MPI\_Comm\_rank} in \ac{MPI}, to query the process identification numbers +in each communication environment and manage the mapping of identifiers in the +program when necessary. + +\subsubsection*{Examples} +\label{subsubsec:interoperability:id:example} +The following example demonstrates how to manage the mapping between \openshmem +\ac{PE} numbers and \ac{MPI} ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem +and \ac{MPI} program. + +\lstinputlisting[language={C}, tabsize=2, + basicstyle=\ttfamily\footnotesize] + {example_code/hybrid_mpi_mapping_id.c} + +The following example demonstrates an alternative approach for managing the mapping +of process identification numbers in a hybrid program. The program creates a +new MPI communicator, named \VAR{shmem\_comm}, that contains all +processes in \VAR{MPI\_COMM\_WORLD} and each process has the same \ac{MPI} rank +number as its \openshmem \ac{PE} number. + +\lstinputlisting[language={C}, tabsize=2, + basicstyle=\ttfamily\footnotesize] + {example_code/hybrid_mpi_mapping_id_shmem_comm.c} + +\subsection{RMA Programming Models} +\label{subsec:interoperability:rma} + +\openshmem and \ac{MPI} each define similar one-sided communication models; +however, a portable program should not assume interoperability between these +models. +For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations +that operate on symmetric data with the same datatype. Access to the same symmetric +object with \ac{MPI} atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may +result in an undefined result. A hybrid program should avoid situations where \ac{MPI} and +\openshmem one-sided operations perform concurrent accesses to the same memory +location; otherwise, the behavior is undefined. + +\subsection{Communication Progress} +\label{subsec:interoperability:progress} + +\openshmem promises the progression of communication both with and without +\openshmem calls and requires the software progress mechanism in the implementation +(e.g., a progress thread) when the hardware does not provide asynchronous communication +capabilities (see Section \ref{subsec:progress}). +In \ac{MPI}, however, a weak progress semantics is applied. That is, +an \ac{MPI} communication call is guaranteed only to complete in finite time. For +instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an \ac{MPI} +call that internally triggers the progress of \ac{MPI}, if the underlying hardware +does not support asynchronous communication. A hybrid program +should not assume that the \openshmem library also makes progress for \ac{MPI}. +It can explicitly manage the asynchronous communication of \ac{MPI} in +order to prevent any deadlock or performance degradation. diff --git a/example_code/hybrid_mpi_mapping_id.c b/example_code/hybrid_mpi_mapping_id.c new file mode 100644 index 000000000..e99ff5bcc --- /dev/null +++ b/example_code/hybrid_mpi_mapping_id.c @@ -0,0 +1,29 @@ +#include +#include +#include + +int main(int argc, char *argv[]) +{ + MPI_Init(&argc, &argv); + shmem_init(); + + int mype = shmem_team_my_pe(SHMEM_TEAM_WORLD); + int npes = shmem_team_n_pes(SHMEM_TEAM_WORLD); + + static int myrank; + MPI_Comm_rank(MPI_COMM_WORLD, &myrank); + + int *mpi_ranks = shmem_calloc(npes, sizeof(int)); + + shmem_int_collect(SHMEM_TEAM_WORLD, mpi_ranks, &myrank, 1); + if (mype == 0) + for (int i = 0; i < npes; i++) + printf("PE %d's MPI rank is %d\n", i, mpi_ranks[i]); + + shmem_free(mpi_ranks); + + shmem_finalize(); + MPI_Finalize(); + + return 0; +} diff --git a/example_code/hybrid_mpi_mapping_id_shmem_comm.c b/example_code/hybrid_mpi_mapping_id_shmem_comm.c new file mode 100644 index 000000000..cf2b86809 --- /dev/null +++ b/example_code/hybrid_mpi_mapping_id_shmem_comm.c @@ -0,0 +1,24 @@ +#include +#include +#include + +int main(int argc, char *argv[]) +{ + MPI_Init(&argc, &argv); + shmem_init(); + + int mype = shmem_my_pe(); + + MPI_Comm shmem_comm; + MPI_Comm_split(MPI_COMM_WORLD, 0, mype, &shmem_comm); + + int myrank; + MPI_Comm_rank(shmem_comm, &myrank); + printf("PE %d's MPI rank is %d\n", mype, myrank); + + MPI_Comm_free(&shmem_comm); + shmem_finalize(); + MPI_Finalize(); + + return 0; +}