From e1d50954b9d061f3d5e26e28da80ba47df5110a2 Mon Sep 17 00:00:00 2001
From: Min Si
Date: Tue, 29 Jan 2019 13:49:36 -0600
Subject: [PATCH 01/43] Rewrite the interoperability annex

---
 content/backmatter.tex | 200 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/content/backmatter.tex b/content/backmatter.tex
index 2a36fa819..f2ce991e3 100644
--- a/content/backmatter.tex
+++ b/content/backmatter.tex
@@ -184,7 +184,207 @@ \chapter{Undefined Behavior in OpenSHMEM}\label{sec:undefined}
 \end{longtable}
+\color{ForestGreen}
+\chapter{Interoperability with other Programming Models}\label{sec:interoperability}
+
+OpenSHMEM routines may be used in conjunction with the routines of other
+communication libraries or parallel languages in the same program. This section
+describes the interoperability with other programming models including
+clarification of undefined behaviors caused by mixed use of different models,
+advice to \openshmem library users and developers that may improve the portability
+and performance of hybrid programs, and the definition of an OpenSHMEM extension
+API that queries the interoperability features provided by an \openshmem library.
+
+
+\section{MPI Interoperability}
+
+\openshmem and MPI are two commonly used parallel programming models for distributed
+memory systems. The user can choose to utilize both models in the same program
+to efficiently and easily support various communication patterns.
+
+A vendor may implement the \openshmem and MPI libraries in different ways. For
+instance, one may implement both \openshmem and MPI as standalone libraries
+and each of them allocates and initializes fully isolated communication
+resources. Consequently, an \openshmem call does not interfere with any MPI
+communication in the same application. As the other common approach, however,
+a vendor may also implement both \openshmem and MPI interfaces within the
+same software system in order to share communication resource when possible.
+In such a case, internal interference may occur.
+
+To improve interoperability and portability in \openshmem + MPI hybrid
+programming, we clarify several aspects in the following subsections.
+
+
+\subsection{Initialization}
+To ensure that a hybrid program can be portably performed with different vendor
+implementations, the \openshmem environment of the program must be initialized by
+a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}, and be finalized by
+a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized
+by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread}, and be finalized by a
+call to \FUNC{MPI\_Finalize}.
+
+\apiimpnotes{
+Portable implementations of OpenSHMEM and MPI must ensure that the initialization
+calls can be made in an arbitrary order within a program; the same rule also
+applies to the finalization calls. A software runtime that utilizes shared
+communication resource for \openshmem and MPI communication may maintain an
+internal reference counter in order to ensure that the shared resource is
+initialized only once, and no shared resource is released until the last
+finalization call is made.
+}
+
+
+\subsection{Dynamic Process Creation and MPMD Programming}
+\label{subsec:interoperability:mpmd}
+
+MPI defines the dynamic process model that allows creation of processes after
+an MPI application has started, and provides the mechanism to establish communication
+between the newly created processes and the existing MPI application. This model
+can be useful when implementing a MPMD application by dynamically starting multiple
+groups of processes, and each of these groups may launch a different executable
+MPI program. The communication performed within a process group is identified by
+an intracommunicator, and that performed between two process groups is identified
+by an intercommunicator. The two types of communication do not interfere with
+each other.
+
+Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize
+resources used by the \openshmem library before any other \openshmem routine may
+be called. Thus, the dynamic process model is not supported in \openshmem. For
+instance, the processes newly created by a call to \FUNC{MPI\_Comm\_spawn} cannot
+join the existing \openshmem environment that was initialized by other existing
+PEs. The \FUNC{shmem\_pe\_accessible} routine can be used in this scenario to
+portably ensure that a remote PE is accessible via \openshmem communication.
+
+
+\subsection{Thread Safety}
+\label{subsec:interoperability:thread}
+Both \openshmem and MPI define the interaction with user threads in a program
+with routines that can be used for initializing and querying the thread
+environment. In a hybrid program, the user can request different thread levels
+at the initialization calls of \openshmem and MPI environments, however, the
+returned support level provided by the \openshmem library might be different
+from that returned in an \openshmem-only program. For instance, the former
+initialization call in a hybrid program may initialize resource with the user
+requested thread level but the supported level cannot be updated by the latter
+initialization call, if the underlying software runtime of \openshmem and MPI
+share the same internal communication resource.
+The program should always check the \VAR{provided} thread level returned
+at the corresponding initialization call to portably ensure thread support in each
+communication environment.
+
+
+\subsection{Mapping Process Identification Numbers}
+\label{subsec:interoperability:id}
+
+Similar to the PE identifier in \openshmem, MPI defines rank as the
+identification number of a process in a communicator. Both \openshmem PE
+and MPI rank are unique integers assigned from zero to one less than the total
+number of processes. In a hybrid program, one may observe that the \openshmem
+PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal.
+This feature, however, may be provided by only some of the \openshmem and MPI
+implementations (e.g., if both environments share the same underlying process
+manager), and is not portably guaranteed. A portable program should always
+use the standard functions in each model, i.e., \FUNC{shmem\_my\_pe} in \openshmem
+and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers
+in each communication environment and manage the mapping of identifiers in the
+program when necessary.
+
+
+\subsection{RMA Synchronization, Ordering and Atomicity}
+\label{subsec:interoperability:rma}
+
+Both \openshmem and MPI define similar RMA and atomic operations with additional
+semantics and synchronization routines to ensure the operations' ordering and
+completion. A synchronization call in \openshmem, however, does not interfere
+with the outstanding operations issued in the MPI environment. For instance,
+the \FUNC{shmem\_quiet} function only ensures completion of \openshmem RMA,
+AMO, and memory store operations. It does not force the completion
+of any MPI outstanding operations. To ensure the completion of RMA operations
+in MPI, the program should use an appropriate MPI synchronization routine in the
+MPI context (e.g., using \FUNC{MPI\_Win\_flush\_all} to ensure remote completion
+of all outstanding operations in the passive-target mode). Similarly, \openshmem
+guarantees only the atomicity of concurrent AMO operations that operate on
+symmetric data with the same datatype. Access to the same symmetric object with
+MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in undefined
+result.
+
+\apiimpnotes{
+In the implementations that share the same communication resources for \openshmem
+and MPI, the memory or network synchronization internally issued for one
+programming model may also effect the status of operations in the other model.
+Although the user program must make necessary synchronization calls for both models
+in order to ensure semantics correctness, a high performance implementation may
+internally avoid the later synchronization made by the other model when no
+subsequent operation is issued between these two synchronization calls.
+}
+
+\subsection{Communication Progress}
+\label{subsec:interoperability:progress}
+
+\openshmem promises the progression of communication both with and without
+\openshmem calls and requires the software progress mechanism in implementation
+(e.g., a progress thread) when the hardware does not provide asynchronous communication
+capabilities. In MPI, however, a weak progress semantics is applied. That is,
+an MPI communication call is only guaranteed to complete in finite time. For
+instance, an MPI Put may be completed only when the remote process makes an MPI
+call which internally triggers the progress of MPI, if the underlying hardware
+does not support asynchronous communication. A portable hybrid program
+should not assume that a call to the \openshmem library also makes progress for MPI,
+and it may have to explicitly manage the asynchronous communication in MPI in
+order to prevent any deadlock or performance degradation.
+
+\apiimpnotes{
+Implementations that provide both \openshmem and MPI interfaces should try
+to ensure progress for both models when necessary and possible, for performance
+reasons. For instance, a high-quality implementation may start making progress for
+both \openshmem and MPI whenever possible, after the user program has called
+\FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system.
+}
+
+To avoid unnecessary overhead and programming complexity in the user program,
+the \openshmem implementation may provide an extended \openshmem routine that
+allows the user program to query the progress support for the MPI environment.
+We introduce the definition and semantics of this routine in
+Section~\ref{subsec:interoperability:query}.
+
+
+\section{Interoperability Query API}
+\label{subsec:interoperability:query}
+
+Determines whether an interoperability feature is supported by the \openshmem
+library implementation.
+
+\begin{apidefinition}
+
+\begin{Csynopsis}
+int @\FuncDecl{shmemx\_query\_interoperability}@(int property);
+\end{Csynopsis}
+
+\begin{apiarguments}
+ \apiargument{IN}{property}{The interoperability property queried by the user.}
+\end{apiarguments}
+
+% compiling error ?
+% \apidescription{
+\FUNC{shmemx\_query\_interoperability} is an extended \openshmem routine that queries
+whether an interoperability property is supported by the \openshmem library. One of the
+following property can be queried in an \openshmem program after finishing the
+initialization call to \openshmem and that of the relevant programming models
+being used in the program. An OpenSHMEM library implementation may extend the
+available properties.
+\begin{itemize}
+ \item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem
+ implementation makes progress for the MPI communication used in the user program.
+\end{itemize}
+% }
+
+\apireturnvalues{
+ The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library;
+ otherwise, it is \CONST{0}.
+}
+\end{apidefinition}
+\color{black}
 \chapter{History of OpenSHMEM}\label{sec:openshmem_history}

From 34c7e86fb46cc38890c224c2a6dab969f486abb7 Mon Sep 17 00:00:00 2001
From: Min Si
Date: Mon, 1 Apr 2019 17:08:09 -0500
Subject: [PATCH 02/43] Update dynamic process creation subsection

---
 content/backmatter.tex | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/content/backmatter.tex b/content/backmatter.tex
index f2ce991e3..8bb891cdf 100644
--- a/content/backmatter.tex
+++ b/content/backmatter.tex
@@ -238,22 +238,33 @@ \subsection{Dynamic Process Creation and MPMD Programming}
 \label{subsec:interoperability:mpmd}

 MPI defines the dynamic process model that allows creation of processes after
-an MPI application has started, and provides the mechanism to establish communication
-between the newly created processes and the existing MPI application. This model
-can be useful when implementing a MPMD application by dynamically starting multiple
-groups of processes, and each of these groups may launch a different executable
-MPI program. The communication performed within a process group is identified by
-an intracommunicator, and that performed between two process groups is identified
-by an intercommunicator. The two types of communication do not interfere with
-each other.
-
+an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}),
+and provides the mechanism to establish communication
+between the newly created processes and the existing MPI application (see
+MPI standard version 3.1, Chapter 10).
 Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize
 resources used by the \openshmem library before any other \openshmem routine may
-be called. Thus, the dynamic process model is not supported in \openshmem. For
-instance, the processes newly created by a call to \FUNC{MPI\_Comm\_spawn} cannot
-join the existing \openshmem environment that was initialized by other existing
-PEs. The \FUNC{shmem\_pe\_accessible} routine can be used in this scenario to
-portably ensure that a remote PE is accessible via \openshmem communication.
+be called. Hence, attention must be paid when using \openshmem together with the
+MPI dynamic process routines. Specifically, we clarify the following three scenarios:
+
+\begin{enumerate}
+\item After MPI initialization and before any PEs start \openshmem initialization,
+it is implementation defined whether processes created by a call to MPI dynamic
+process routine are able to join the call to \FUNC{shmem\_init} or
+\FUNC{shmem\_init\_thread} and establish the same \openshmem environment together
+with other existing PEs.
+
+\item After \openshmem initialization, a process newly created by
+the MPI dynamic process routine cannot join the existing \openshmem environment
+that was initialized by other existing PEs. The \FUNC{shmem\_pe\_accessible} routine
+may be used in this scenario to portably ensure that a remote PE is accessible
+via \openshmem communication.
+
+\item After \openshmem initialization, it is implementation defined whether
+processes newly created by MPI dynamic process routine can make a call to
+\FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and establish a separate
+\openshmem environment.
+\end{enumerate}


 \subsection{Thread Safety}

From afdc69af1a0539c8228a19a3e8ce8beb83bc29b6 Mon Sep 17 00:00:00 2001
From: Min Si
Date: Mon, 1 Apr 2019 17:19:32 -0500
Subject: [PATCH 03/43] Typo fix and minor word adjustment

---
 content/backmatter.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/backmatter.tex b/content/backmatter.tex
index 8bb891cdf..3079e5a43 100644
--- a/content/backmatter.tex
+++ b/content/backmatter.tex
@@ -322,7 +322,7 @@ \subsection{RMA Synchronization, Ordering and Atomicity}
 \apiimpnotes{
 In the implementations that share the same communication resources for \openshmem
 and MPI, the memory or network synchronization internally issued for one
-programming model may also effect the status of operations in the other model.
+programming model may also affect the status of operations in the other model.
 Although the user program must make necessary synchronization calls for both models
 in order to ensure semantics correctness, a high performance implementation may
 internally avoid the later synchronization made by the other model when no
@@ -347,7 +347,7 @@ \subsection{Communication Progress}
 \apiimpnotes{
 Implementations that provide both \openshmem and MPI interfaces should try
 to ensure progress for both models when necessary and possible, for performance
-reasons. For instance, a high-quality implementation may start making progress for
+reasons. For instance, an implementation may start making progress for
 both \openshmem and MPI whenever possible, after the user program has called
 \FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system.
 }

From 43c537903590da2a8d00555e7f548eadd9d97baf Mon Sep 17 00:00:00 2001
From: Min Si
Date: Tue, 2 Apr 2019 13:53:52 -0500
Subject: [PATCH 04/43] Add more details in RMA semantics subsection

---
 content/backmatter.tex | 61 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/content/backmatter.tex b/content/backmatter.tex
index 3079e5a43..121962d07 100644
--- a/content/backmatter.tex
+++ b/content/backmatter.tex
@@ -301,23 +301,68 @@ \subsection{Mapping Process Identification Numbers}
 program when necessary.


-\subsection{RMA Synchronization, Ordering and Atomicity}
+\subsection{RMA Memory Semantics, Completion, Ordering and Atomicity}
 \label{subsec:interoperability:rma}

-Both \openshmem and MPI define similar RMA and atomic operations with additional
-semantics and synchronization routines to ensure the operations' ordering and
-completion. A synchronization call in \openshmem, however, does not interfere
-with the outstanding operations issued in the MPI environment. For instance,
+Both \openshmem and MPI define similar RMA and atomic operations for remote memory
+access, however, each model defines different semantics for memory synchronization,
+operation completion, ordering, and atomicity.
+We clarify the semantics differences and interoperability of these two models
+as below.
+
+\begin{itemize}
+
+\item Memory Semantics: MPI defines the concept of public and private copies
+for each RMA window. Any remote RMA operation can access only the
+public copy of that window, and memory load\slash store can access only the
+private copy. MPI defines two memory models for memory
+synchronization between the copies: RMA separate and RMA unified (see definition
+in MPI standard version 3.1, Section 11.4), and requires additional RMA
+synchronization call to ensure consistent view on memory in each memory model
+(see requirement of RMA synchronization in MPI standard version 3.1, Section 11.7).
+Unlike MPI, the memory model in \openshmem is implicit.
+However, additional synchronization is still required to ensure consistent view
+between remote memory access and memory load\slash store (e.g., \FUNC{shmem\_barrier}).
+
+To ensure portability, a hybrid program should always make appropriate \openshmem
+and MPI synchronization calls for remote access in each environment respectively
+in order to ensure any remote updates are visible to the target PE
+and also become visible to other remote access operations. For instance, a program
+can make a call to \FUNC{shmem\_barrier} on both local and target PEs after
+a \FUNC{shmem\_put} operation in order to ensure the remote update is visible to
+the target PE, and then make a call to \FUNC{MPI\_Win\_sync} on the target
+PE before the data can be accessed by other PEs using MPI RMA operations.
+
+\item Completion: Unlike \openshmem RMA operations, all MPI RMA communication
+operations including the atomic operations such as \FUNC{MPI\_Accumulate} are
+nonblocking. Similar to \openshmem nonblocking RMA, the program should perform
+additional MPI synchronization to ensure any local buffers involved in the outstanding
+MPI RMA operations can be safely reused (see definition of MPI RMA synchronization
+in MPI standard version 3.1, Section 11.5).
+A synchronization call in \openshmem, however, does not interfere
+with any outstanding operations issued in the MPI environment. For instance,
 the \FUNC{shmem\_quiet} function only ensures completion of \openshmem RMA,
 AMO, and memory store operations. It does not force the completion
 of any MPI outstanding operations. To ensure the completion of RMA operations
 in MPI, the program should use an appropriate MPI synchronization routine in the
 MPI context (e.g., using \FUNC{MPI\_Win\_flush\_all} to ensure remote completion
-of all outstanding operations in the passive-target mode). Similarly, \openshmem
+of all outstanding operations in the passive-target mode).
+
+\item Ordering: Unlike \openshmem ordering semantics, MPI does not ensure the
+ordering of {\PUT} and {\GET} operations, however, it guarantees ordering between
+MPI atomic operations from one process to the same (or overlapping) memory
+locations at another process via the same window. A call to \FUNC{shmem\_fence}
+forces neither ordering of any MPI operations, nor ordering between outstanding
+MPI operations
+and \openshmem operations.
+
+\item Atomicity: \openshmem
 guarantees only the atomicity of concurrent AMO operations that operate on
 symmetric data with the same datatype. Access to the same symmetric object with
-MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in undefined
-result.
+MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in
+undefined result.
+
+\end{itemize}

 \apiimpnotes{
 In the implementations that share the same communication resources for \openshmem

From c055ca476d2a2f84ab5eb5cf01693656ff02825f Mon Sep 17 00:00:00 2001
From: Min Si
Date: Mon, 20 May 2019 16:46:33 -0500
Subject: [PATCH 05/43] Made a pass by English editor

---
 content/backmatter.tex | 64 ++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/content/backmatter.tex b/content/backmatter.tex
index 121962d07..3ed7fca82 100644
--- a/content/backmatter.tex
+++ b/content/backmatter.tex
@@ -189,26 +189,26 @@ \chapter{Interoperability with other Programming Models}\label{sec:interoperabil

 OpenSHMEM routines may be used in conjunction with the routines of other
 communication libraries or parallel languages in the same program. This section
-describes the interoperability with other programming models including
+describes the interoperability with other programming models, including
 clarification of undefined behaviors caused by mixed use of different models,
 advice to \openshmem library users and developers that may improve the portability
-and performance of hybrid programs, and the definition of an OpenSHMEM extension
+and performance of hybrid programs, and definition of an OpenSHMEM extension
 API that queries the interoperability features provided by an \openshmem library.


 \section{MPI Interoperability}

-\openshmem and MPI are two commonly used parallel programming models for distributed
-memory systems. The user can choose to utilize both models in the same program
+\openshmem and MPI are two commonly used parallel programming models for
+distributed-memory systems. The user can choose to utilize both models in the same program
 to efficiently and easily support various communication patterns.

 A vendor may implement the \openshmem and MPI libraries in different ways. For
-instance, one may implement both \openshmem and MPI as standalone libraries
-and each of them allocates and initializes fully isolated communication
+instance, one may implement both \openshmem and MPI as standalone libraries,
+each of which allocates and initializes fully isolated communication
 resources. Consequently, an \openshmem call does not interfere with any MPI
 communication in the same application. As the other common approach, however,
-a vendor may also implement both \openshmem and MPI interfaces within the
-same software system in order to share communication resource when possible.
+a vendor may implement both \openshmem and MPI interfaces within the
+same software system in order to share a communication resource when possible.
 In such a case, internal interference may occur.

 To improve interoperability and portability in \openshmem + MPI hybrid
@@ -218,18 +218,18 @@ \section{MPI Interoperability}
 \subsection{Initialization}
 To ensure that a hybrid program can be portably performed with different vendor
 implementations, the \openshmem environment of the program must be initialized by
-a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}, and be finalized by
+a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and be finalized by
 a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized
-by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread}, and be finalized by a
+by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread} and be finalized by a
 call to \FUNC{MPI\_Finalize}.

 \apiimpnotes{
 Portable implementations of OpenSHMEM and MPI must ensure that the initialization
 calls can be made in an arbitrary order within a program; the same rule also
-applies to the finalization calls. A software runtime that utilizes shared
+applies to the finalization calls. A software runtime that utilizes a shared
 communication resource for \openshmem and MPI communication may maintain an
 internal reference counter in order to ensure that the shared resource is
-initialized only once, and no shared resource is released until the last
+initialized only once and thus no shared resource is released until the last
 finalization call is made.
 }

@@ -237,9 +237,11 @@ \subsection{Initialization}
 \subsection{Dynamic Process Creation and MPMD Programming}
 \label{subsec:interoperability:mpmd}

-MPI defines the dynamic process model that allows creation of processes after
-an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}),
-and provides the mechanism to establish communication
+MPI defines a dynamic process model that allows creation of processes after
+an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and
+connection to independent processes (e.g., through \FUNC{MPI\_Comm\_accept}
+and \FUNC{MPI\_Comm\_connect})
+and provides a mechanism to establish communication
 between the newly created processes and the existing MPI application (see
 MPI standard version 3.1, Chapter 10).
 Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize
@@ -272,12 +274,12 @@ \subsection{Thread Safety}
 Both \openshmem and MPI define the interaction with user threads in a program
 with routines that can be used for initializing and querying the thread
 environment. In a hybrid program, the user can request different thread levels
-at the initialization calls of \openshmem and MPI environments, however, the
+at the initialization calls of \openshmem and MPI environments; however, the
 returned support level provided by the \openshmem library might be different
 from that returned in an \openshmem-only program. For instance, the former
-initialization call in a hybrid program may initialize resource with the user
-requested thread level but the supported level cannot be updated by the latter
-initialization call, if the underlying software runtime of \openshmem and MPI
+initialization call in a hybrid program may initialize a resource with the
+user-requested thread level, but the supported level cannot be updated by the latter
+initialization call if the underlying software runtime of \openshmem and MPI
 share the same internal communication resource.
 The program should always check the \VAR{provided} thread level returned
 at the corresponding initialization call to portably ensure thread support in each
@@ -290,18 +292,18 @@ \subsection{Mapping Process Identification Numbers}
 Similar to the PE identifier in \openshmem, MPI defines rank as the
 identification number of a process in a communicator. Both \openshmem PE
 and MPI rank are unique integers assigned from zero to one less than the total
-number of processes. In a hybrid program, one may observe that the \openshmem
+number of processes. In a hybrid program, the \openshmem
 PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal.
 This feature, however, may be provided by only some of the \openshmem and MPI
 implementations (e.g., if both environments share the same underlying process
-manager), and is not portably guaranteed. A portable program should always
-use the standard functions in each model, i.e., \FUNC{shmem\_my\_pe} in \openshmem
+manager) and is not portably guaranteed. A portable program should always
+use the standard functions in each model, namely, \FUNC{shmem\_my\_pe} in \openshmem
 and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers
 in each communication environment and manage the mapping of identifiers in the
 program when necessary.


-\subsection{RMA Memory Semantics, Completion, Ordering and Atomicity}
+\subsection{RMA Memory Semantics, Completion, Ordering, and Atomicity}
 \label{subsec:interoperability:rma}

 Both \openshmem and MPI define similar RMA and atomic operations for remote memory
@@ -341,7 +343,7 @@ \subsection{RMA Memory Semantics, Completion, Ordering and Atomicity}
 in MPI standard version 3.1, Section 11.5).
 A synchronization call in \openshmem, however, does not interfere
 with any outstanding operations issued in the MPI environment. For instance,
-the \FUNC{shmem\_quiet} function only ensures completion of \openshmem RMA,
+the \FUNC{shmem\_quiet} function ensures completion only of \openshmem RMA,
 AMO, and memory store operations. It does not force the completion
 of any MPI outstanding operations. To ensure the completion of RMA operations
 in MPI, the program should use an appropriate MPI synchronization routine in the
@@ -357,9 +359,9 @@ \subsection{RMA Memory Semantics, Completion, Ordering and Atomicity}
 and \openshmem operations.

 \item Atomicity: \openshmem
-guarantees only the atomicity of concurrent AMO operations that operate on
+guarantees the atomicity only of concurrent AMO operations that operate on
 symmetric data with the same datatype. Access to the same symmetric object with
-MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in
+MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in an
 undefined result.

 \end{itemize}
@@ -369,7 +371,7 @@ \subsection{RMA Memory Semantics, Completion, Ordering and Atomicity}
 and MPI, the memory or network synchronization internally issued for one
 programming model may also affect the status of operations in the other model.
 Although the user program must make necessary synchronization calls for both models
-in order to ensure semantics correctness, a high performance implementation may
+in order to ensure semantics correctness, a high-performance implementation may
 internally avoid the later synchronization made by the other model when no
 subsequent operation is issued between these two synchronization calls.
 }
@@ -378,12 +380,12 @@ \subsection{Communication Progress}
 \label{subsec:interoperability:progress}

 \openshmem promises the progression of communication both with and without
-\openshmem calls and requires the software progress mechanism in implementation
+\openshmem calls and requires the software progress mechanism in the implementation
 (e.g., a progress thread) when the hardware does not provide asynchronous communication
 capabilities. In MPI, however, a weak progress semantics is applied. That is,
-an MPI communication call is only guaranteed to complete in finite time. For
+an MPI communication call is guaranteed only to complete in finite time. For
 instance, an MPI Put may be completed only when the remote process makes an MPI
-call which internally triggers the progress of MPI, if the underlying hardware
+call that internally triggers the progress of MPI, if the underlying hardware
 does not support asynchronous communication. A portable hybrid program
 should not assume that a call to the \openshmem library also makes progress for MPI,
 and it may have to explicitly manage the asynchronous communication in MPI in
@@ -424,7 +426,7 @@ \section{Interoperability Query API}
 % \apidescription{
 \FUNC{shmemx\_query\_interoperability} is an extended \openshmem routine that queries
 whether an interoperability property is supported by the \openshmem library. One of the
-following property can be queried in an \openshmem program after finishing the
+following properties can be queried in an \openshmem program after finishing the
 initialization call to \openshmem and that of the relevant programming models
 being used in the program. An OpenSHMEM library implementation may extend the
 available properties.
From f8ebcfca1b83b2b9129fda7073129538706070ec Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 9 Sep 2019 11:42:56 -0500 Subject: [PATCH 06/43] Fix function format --- content/backmatter.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 3ed7fca82..db3294e87 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -384,7 +384,7 @@ \subsection{Communication Progress} (e.g., a progress thread) when the hardware does not provide asynchronous communication capabilities. In MPI, however, a weak progress semantics is applied. That is, an MPI communication call is guaranteed only to complete in finite time. For -instance, an MPI Put may be completed only when the remote process makes an MPI +instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI call that internally triggers the progress of MPI, if the underlying hardware does not support asynchronous communication. A portable hybrid program should not assume that a call to the \openshmem library also makes progress for MPI, From 63eef1db3c873a0021c08ea3432012ff10f89908 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 9 Sep 2019 11:43:43 -0500 Subject: [PATCH 07/43] Change query API to shmem_ and move text into separate file --- content/backmatter.tex | 50 +++++------------------- content/shmem_query_interoperability.tex | 39 ++++++++++++++++++ 2 files changed, 49 insertions(+), 40 deletions(-) create mode 100644 content/shmem_query_interoperability.tex diff --git a/content/backmatter.tex b/content/backmatter.tex index db3294e87..446ae6a64 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -192,7 +192,7 @@ \chapter{Interoperability with other Programming Models}\label{sec:interoperabil describes the interoperability with other programming models, including clarification of undefined behaviors caused by mixed use of different models, advice to \openshmem library users and developers that may 
improve the portability -and performance of hybrid programs, and definition of an OpenSHMEM extension +and performance of hybrid programs, and definition of an OpenSHMEM API that queries the interoperability features provided by an \openshmem library. @@ -399,49 +399,19 @@ \subsection{Communication Progress} \FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system. } -To avoid unnecessary overhead and programming complexity in the user program, -the \openshmem implementation may provide an extended \openshmem routine that -allows the user program to query the progress support for the MPI environment. -We introduce the definition and semantics of this routine in -Section~\ref{subsec:interoperability:query}. +\section{Query Interoperability} -\section{Interoperability Query API} -\label{subsec:interoperability:query} - -Determines whether an interoperability feature is supported by the \openshmem -library implementation. - -\begin{apidefinition} - -\begin{Csynopsis} -int @\FuncDecl{shmemx\_query\_interoperability}@(int property); -\end{Csynopsis} - -\begin{apiarguments} - \apiargument{IN}{property}{The interoperability property queried by the user.} -\end{apiarguments} +A hybrid user program can query the interoperability feature of an \openshmem +implementation in order to avoid unnecessary overhead and programming complexity. +For instance, the user program can eliminate manual progress polling for MPI +communication if the underlying software runtime guarantees the progression of +communication also for MPI even without explicit function calls. -% compiling error ? -% \apidescription{ -\FUNC{shmemx\_query\_interoperability} is an extended \openshmem routine that queries -whether an interoperability property is supported by the \openshmem library. One of the -following properties can be queried in an \openshmem program after finishing the -initialization call to \openshmem and that of the relevant programming models -being used in the program. 
An OpenSHMEM library implementation may extend the -available properties. - -\begin{itemize} - \item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem - implementation makes progress for the MPI communication used in the user program. -\end{itemize} -% } +\subsection{\textbf{SHMEM\_QUERY\_INTEROPERABILITY}} +\label{subsec:interoperability:query} +\input{content/shmem_query_interoperability} -\apireturnvalues{ - The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; - otherwise, it is \CONST{0}. -} -\end{apidefinition} \color{black} \chapter{History of OpenSHMEM}\label{sec:openshmem_history} diff --git a/content/shmem_query_interoperability.tex b/content/shmem_query_interoperability.tex new file mode 100644 index 000000000..8af1e26ca --- /dev/null +++ b/content/shmem_query_interoperability.tex @@ -0,0 +1,39 @@ +\apisummary{ + Determines whether an interoperability feature is supported by the \openshmem + library implementation. +} +\begin{apidefinition} + +\begin{Csynopsis} +int @\FuncDecl{shmem\_query\_interoperability}@(int property); +\end{Csynopsis} + +\begin{apiarguments} + \apiargument{IN}{property}{The interoperability property queried by the user.} +\end{apiarguments} + +% compiling error ? +% \apidescription{ +\FUNC{shmem\_query\_interoperability} queries whether an interoperability property +is supported by the \openshmem library. One of the following properties can be +queried in an \openshmem program after finishing the +initialization call to \openshmem and that of the relevant programming models +being used in the program. An \openshmem library implementation may extend the +available properties. + +\begin{itemize} +\item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem +implementation makes progress for the MPI communication used in the user program. 
+\end{itemize} +% } + +\apireturnvalues{ + The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; + otherwise, it is \CONST{0}. +} +\end{apidefinition} + +\apiimpnotes{ +Implementations that do not support interoperability with other programming models +may simply return \CONST{0} for the relevant interoperability query. +} From 325957a1c2e4cbd00f4b6a303279231997a96828 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 10 Sep 2019 06:37:30 -0500 Subject: [PATCH 08/43] Add example code for pe mapping --- content/backmatter.tex | 7 ++++++ example_code/hybrid_mpi_mapping_id.c | 36 ++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+) create mode 100644 example_code/hybrid_mpi_mapping_id.c diff --git a/content/backmatter.tex b/content/backmatter.tex index 446ae6a64..8a5b833be 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -302,6 +302,13 @@ \subsection{Mapping Process Identification Numbers} in each communication environment and manage the mapping of identifiers in the program when necessary. +\subsubsection{Example} +The following example demonstrates how to manage the mapping of process +identifiers in a hybrid \openshmem and MPI program. 
+ +\lstinputlisting[language={C}, tabsize=2, + basicstyle=\ttfamily\footnotesize] + {example_code/hybrid_mpi_mapping_id.c} \subsection{RMA Memory Semantics, Completion, Ordering, and Atomicity} \label{subsec:interoperability:rma} diff --git a/example_code/hybrid_mpi_mapping_id.c b/example_code/hybrid_mpi_mapping_id.c new file mode 100644 index 000000000..9720ce94f --- /dev/null +++ b/example_code/hybrid_mpi_mapping_id.c @@ -0,0 +1,36 @@ +#include +#include +#include +#include + +int main(int argc, char *argv[]) +{ + static long pSync[SHMEM_COLLECT_SYNC_SIZE]; + for (int i = 0; i < SHMEM_COLLECT_SYNC_SIZE; i++) + pSync[i] = SHMEM_SYNC_VALUE; + + MPI_Init(&argc, &argv); + shmem_init(); + + int mype = shmem_my_pe(); + int npes = shmem_n_pes(); + + static int myrank; + MPI_Comm_rank(MPI_COMM_WORLD, &myrank); + + int *mpi_ranks = shmem_calloc(npes, sizeof(int)); + + shmem_barrier_all(); + shmem_collect32(mpi_ranks, &myrank, 1, 0, 0, npes, pSync); + + if (mype == 0) + for (int i = 0; i < npes; i++) + printf("PE %d's MPI rank is %d\n", i, mpi_ranks[i]); + + shmem_free(mpi_ranks); + + shmem_finalize(); + MPI_Finalize(); + + return 0; +} From 348d60b9da3cc44a932afd32521fe4624cd80a30 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 10 Sep 2019 06:47:44 -0500 Subject: [PATCH 09/43] Minor text adjustment --- content/backmatter.tex | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 8a5b833be..88e240da2 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -212,7 +212,7 @@ \section{MPI Interoperability} In such a case, internal interference may occur. To improve interoperability and portability in \openshmem + MPI hybrid -programming, we clarify several aspects in the following subsections. +programming, we clarify the relevant semantics in the following subsections. 
\subsection{Initialization} @@ -282,8 +282,9 @@ \subsection{Thread Safety} initialization call if the underlying software runtime of \openshmem and MPI share the same internal communication resource. The program should always check the \VAR{provided} thread level returned -at the corresponding initialization call to portably ensure thread support in each -communication environment. +at the corresponding initialization call or query the level of thread support +after initialization to portably ensure thread support in each communication +environment. \subsection{Mapping Process Identification Numbers} @@ -303,8 +304,10 @@ \subsection{Mapping Process Identification Numbers} program when necessary. \subsubsection{Example} -The following example demonstrates how to manage the mapping of process -identifiers in a hybrid \openshmem and MPI program. +\label{subsubsec:interoperability:id:example} +The following example demonstrates how to manage the mapping between \openshmem +PE identifier and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem +and MPI program. \lstinputlisting[language={C}, tabsize=2, basicstyle=\ttfamily\footnotesize] From c089a75441df75e534adc34a6e36073389d07694 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 10 Sep 2019 10:51:01 -0500 Subject: [PATCH 10/43] Simplified version of dynamic process and rma sections --- content/backmatter.tex | 97 ++++++++---------------------------------- 1 file changed, 18 insertions(+), 79 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 88e240da2..9f552d21e 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -243,30 +243,13 @@ \subsection{Dynamic Process Creation and MPMD Programming} and \FUNC{MPI\_Comm\_connect}) and provides a mechanism to establish communication between the newly created processes and the existing MPI application (see -MPI standard version 3.1, Chapter 10). 
-Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize +MPI standard version 3.1, Chapter 10). The dynamic process model can be used to +implement Multiple Program Multiple Data (MPMD) style program. +Unlike MPI, \openshmem follows the SPMD programming model. It starts +all processes at once and requires all PEs to collectively allocate and initialize resources used by the \openshmem library before any other \openshmem routine may -be called. Hence, attention must be paid when using \openshmem together with the -MPI dynamic process routines. Specifically, we clarify the following three scenarios: - -\begin{enumerate} -\item After MPI initialization and before any PEs start \openshmem initialization, -it is implementation defined whether processes created by a call to MPI dynamic -process routine are able to join the call to \FUNC{shmem\_init} or -\FUNC{shmem\_init\_thread} and establish the same \openshmem environment together -with other existing PEs. - -\item After \openshmem initialization, a process newly created by -the MPI dynamic process routine cannot join the existing \openshmem environment -that was initialized by other existing PEs. The \FUNC{shmem\_pe\_accessible} routine -may be used in this scenario to portably ensure that a remote PE is accessible -via \openshmem communication. - -\item After \openshmem initialization, it is implementation defined whether -processes newly created by MPI dynamic process routine can make a call to -\FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and establish a separate -\openshmem environment. -\end{enumerate} +be called. Hence, users should avoid using \openshmem and MPI dynamic process model +in the same program. 
\subsection{Thread Safety} @@ -313,68 +296,24 @@ \subsubsection{Example} basicstyle=\ttfamily\footnotesize] {example_code/hybrid_mpi_mapping_id.c} -\subsection{RMA Memory Semantics, Completion, Ordering, and Atomicity} +\subsection{RMA Programming Models} \label{subsec:interoperability:rma} Both \openshmem and MPI define similar RMA and atomic operations for remote memory -access, however, each model defines different semantics for memory synchronization, -operation completion, ordering, and atomicity. -We clarify the semantics differences and interoperability of these two models -as below. +access, however, each model defines different semantics and functions for memory +synchronization, operation completion, and ordering. To ensure semantics correctness +and portability, a hybrid program should always make appropriate \openshmem and MPI +synchronization calls for remote access in each environment respectively. -\begin{itemize} +\openshmem guarantees the atomicity only of concurrent \openshmem AMO operations +that operate on symmetric data with the same datatype. Access to the same symmetric +object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may +result in an undefined result. -\item Memory Semantics: MPI defines the concept of public and private copies -for each RMA window. Any remote RMA operation can access only the -public copy of that window, and memory load\slash store can access only the -private copy. MPI defines two memory models for memory -synchronization between the copies: RMA separate and RMA unified (see definition -in MPI standard version 3.1, Section 11.4), and requires additional RMA -synchronization call to ensure consistent view on memory in each memory model -(see requirement of RMA synchronization in MPI standard version 3.1, Section 11.7). -Unlike MPI, the memory model in \openshmem is implicit. 
-However, additional synchronization is still required to ensure consistent view -between remote memory access and memory load\slash store (e.g., \FUNC{shmem\_barrier}). - -To ensure portability, a hybrid program should always make appropriate \openshmem -and MPI synchronization calls for remote access in each environment respectively -in order to ensure any remote updates are visible to the target PE -and also become visible to other remote access operations. For instance, a program -can make a call to \FUNC{shmem\_barrier} on both local and target PEs after -a \FUNC{shmem\_put} operation in order to ensure the remote update is visible to -the target PE, and then make a call to \FUNC{MPI\_Win\_sync} on the target -PE before the data can be accessed by other PEs using MPI RMA operations. - -\item Completion: Unlike \openshmem RMA operations, all MPI RMA communication -operations including the atomic operations such as \FUNC{MPI\_Accumulate} are -nonblocking. Similar to \openshmem nonblocking RMA, the program should perform -additional MPI synchronization to ensure any local buffers involved in the outstanding -MPI RMA operations can be safely reused (see definition of MPI RMA synchronization -in MPI standard version 3.1, Section 11.5). -A synchronization call in \openshmem, however, does not interfere -with any outstanding operations issued in the MPI environment. For instance, -the \FUNC{shmem\_quiet} function ensures completion only of \openshmem RMA, -AMO, and memory store operations. It does not force the completion -of any MPI outstanding operations. To ensure the completion of RMA operations -in MPI, the program should use an appropriate MPI synchronization routine in the -MPI context (e.g., using \FUNC{MPI\_Win\_flush\_all} to ensure remote completion -of all outstanding operations in the passive-target mode). 
- -\item Ordering: Unlike \openshmem ordering semantics, MPI does not ensure the -ordering of {\PUT} and {\GET} operations, however, it guarantees ordering between -MPI atomic operations from one process to the same (or overlapping) memory -locations at another process via the same window. A call to \FUNC{shmem\_fence} -forces neither ordering of any MPI operations, nor ordering between outstanding -MPI operations -and \openshmem operations. - -\item Atomicity: \openshmem -guarantees the atomicity only of concurrent AMO operations that operate on -symmetric data with the same datatype. Access to the same symmetric object with -MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in an -undefined result. +Most RMA programs can be written using either \openshmem or MPI RMA. +It is recommended to choose only one of the RMA models in the same program, whenever +possible, for performance and code simplicity. -\end{itemize} \apiimpnotes{ In the implementations that share the same communication resources for \openshmem From d754e7a6c4d82a7aaa55aa8bba0fcf910bd40322 Mon Sep 17 00:00:00 2001 From: Min Si Date: Thu, 12 Sep 2019 13:25:08 -0500 Subject: [PATCH 11/43] Do not mention interference in first paragraph --- content/backmatter.tex | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 9f552d21e..97b19299a 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -205,11 +205,10 @@ \section{MPI Interoperability} A vendor may implement the \openshmem and MPI libraries in different ways. For instance, one may implement both \openshmem and MPI as standalone libraries, each of which allocates and initializes fully isolated communication -resources. Consequently, an \openshmem call does not interfere with any MPI -communication in the same application. As the other common approach, however, +resources. 
+As the other common approach, however, a vendor may implement both \openshmem and MPI interfaces within the same software system in order to share a communication resource when possible. -In such a case, internal interference may occur. To improve interoperability and portability in \openshmem + MPI hybrid programming, we clarify the relevant semantics in the following subsections. From b1791324076e9c87d31cff49290c0f9eeb285b71 Mon Sep 17 00:00:00 2001 From: Min Si Date: Thu, 12 Sep 2019 13:27:58 -0500 Subject: [PATCH 12/43] interop/mpmd: strong advice to not use dynamic process with shmem --- content/backmatter.tex | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 97b19299a..d1e7b18c7 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -242,12 +242,13 @@ \subsection{Dynamic Process Creation and MPMD Programming} and \FUNC{MPI\_Comm\_connect}) and provides a mechanism to establish communication between the newly created processes and the existing MPI application (see -MPI standard version 3.1, Chapter 10). The dynamic process model can be used to -implement Multiple Program Multiple Data (MPMD) style program. -Unlike MPI, \openshmem follows the SPMD programming model. It starts -all processes at once and requires all PEs to collectively allocate and initialize -resources used by the \openshmem library before any other \openshmem routine may -be called. Hence, users should avoid using \openshmem and MPI dynamic process model +MPI standard version 3.1, Chapter 10). +Unlike MPI, \openshmem starts all processes at once and requires all PEs to +collectively allocate and initialize resources (e.g., symmetric heap) used by +the \openshmem library before any other \openshmem routine may +be called. Communicating with a dynamically created process in the \openshmem +environment may result in undefined behavior. 
+Hence, users should not use \openshmem and MPI dynamic process model in the same program. From 56db04b27be33c31536be1c24d0284fb6048c080 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 24 Sep 2019 23:35:37 -0400 Subject: [PATCH 13/43] interop/rma: simply ask user to avoid using both RMA models --- content/backmatter.tex | 27 ++++++--------------------- 1 file changed, 6 insertions(+), 21 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index d1e7b18c7..a8e879bf1 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -300,31 +300,16 @@ \subsection{RMA Programming Models} \label{subsec:interoperability:rma} Both \openshmem and MPI define similar RMA and atomic operations for remote memory -access, however, each model defines different semantics and functions for memory -synchronization, operation completion, and ordering. To ensure semantics correctness -and portability, a hybrid program should always make appropriate \openshmem and MPI -synchronization calls for remote access in each environment respectively. - -\openshmem guarantees the atomicity only of concurrent \openshmem AMO operations +access, however, a portable program should not assume interoperability between these +two RMA models. +For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations that operate on symmetric data with the same datatype. Access to the same symmetric object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may -result in an undefined result. - -Most RMA programs can be written using either \openshmem or MPI RMA. -It is recommended to choose only one of the RMA models in the same program, whenever +result in an undefined result. Furthermore, +because most RMA programs can be written using either \openshmem or MPI RMA, +users should choose only one of the RMA models in the same program, whenever possible, for performance and code simplicity. 
- -\apiimpnotes{ -In the implementations that share the same communication resources for \openshmem -and MPI, the memory or network synchronization internally issued for one -programming model may also affect the status of operations in the other model. -Although the user program must make necessary synchronization calls for both models -in order to ensure semantics correctness, a high-performance implementation may -internally avoid the later synchronization made by the other model when no -subsequent operation is issued between these two synchronization calls. -} - \subsection{Communication Progress} \label{subsec:interoperability:progress} From ab532fbfc6416ef54a44460a3bfe93d5de838b8e Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 06:36:55 -0400 Subject: [PATCH 14/43] interop/progress: mention query api to connect paragraphs --- content/backmatter.tex | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index a8e879bf1..7c4ac8642 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -321,8 +321,12 @@ \subsection{Communication Progress} instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI call that internally triggers the progress of MPI, if the underlying hardware does not support asynchronous communication. A portable hybrid program -should not assume that a call to the \openshmem library also makes progress for MPI, -and it may have to explicitly manage the asynchronous communication in MPI in +should not assume that a call to the \openshmem library also makes progress for MPI. +A call to \FUNC{shmem\_query\_interoperability} (see definition in \ref{subsec:interoperability:query}) +can be used to check whether the implementation provides such a functionality. 
+If it is provided, then the library ensures progression of +both \openshmem and MPI communication; otherwise, it may have to explicitly +manage the asynchronous communication in MPI in order to prevent any deadlock or performance degradation. \apiimpnotes{ From c57d957d2f58811db32d217f995f1b133427ef5a Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 09:19:35 -0400 Subject: [PATCH 15/43] interop/threads: add restriction for mixed thread levels --- content/backmatter.tex | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 7c4ac8642..15e91a033 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -256,7 +256,7 @@ \subsection{Thread Safety} \label{subsec:interoperability:thread} Both \openshmem and MPI define the interaction with user threads in a program with routines that can be used for initializing and querying the thread -environment. In a hybrid program, the user can request different thread levels +environment. In a hybrid program, the user may request different thread levels at the initialization calls of \openshmem and MPI environments; however, the returned support level provided by the \openshmem library might be different from that returned in an \openshmem-only program. For instance, the former @@ -269,6 +269,28 @@ \subsection{Thread Safety} after initialization to portably ensure thread support in each communication environment. +Both \openshmem and MPI define similar thread levels, namely, \VAR{THREAD\_SINGLE}, +\VAR{THREAD\_FUNNELED}, \VAR{THREAD\_SERIALIZED}, and \VAR{THREAD\_MULTIPLE}. +When requesting threading support in a hybrid program, however, +users should follow additional rules as described below. + +\begin{itemize} + \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program. 
+ Hence, users should not request \VAR{THREAD\_SINGLE} at the initialization + call of either \openshmem or MPI but request a different thread level at the + initialization call of the other model in the same program. + + \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to + make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} + thread level in both \openshmem and MPI should ensure the same main thread + is used in both communication environments. + + \item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure + communication calls are not made concurrently by multiple threads. A hybrid + program should ensure serialized calls to both \openshmem and MPI libraries, + if the program uses \VAR{THREAD\_SERIALIZED} in one communication environment + and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one. +\end{itemize} \subsection{Mapping Process Identification Numbers} \label{subsec:interoperability:id} From 1c6b1976c1abb076384991eb2174169064d7378b Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 10:46:26 -0400 Subject: [PATCH 16/43] interop/id: use sync_all instead of barrier_all in example --- example_code/hybrid_mpi_mapping_id.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/example_code/hybrid_mpi_mapping_id.c b/example_code/hybrid_mpi_mapping_id.c index 9720ce94f..c72168d6e 100644 --- a/example_code/hybrid_mpi_mapping_id.c +++ b/example_code/hybrid_mpi_mapping_id.c @@ -20,7 +20,7 @@ int main(int argc, char *argv[]) int *mpi_ranks = shmem_calloc(npes, sizeof(int)); - shmem_barrier_all(); + shmem_sync_all(); shmem_collect32(mpi_ranks, &myrank, 1, 0, 0, npes, pSync); if (mype == 0) From 7e1508dca56531a7e8031fdad1bb286cf96ff6ea Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 11:00:20 -0400 Subject: [PATCH 17/43] interop/progress: minor text adjustment --- content/backmatter.tex | 14 +++++++------- 1 file changed, 7 insertions(+), 7 
deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 15e91a033..a3ff43f7b 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -342,13 +342,13 @@ \subsection{Communication Progress} an MPI communication call is guaranteed only to complete in finite time. For instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI call that internally triggers the progress of MPI, if the underlying hardware -does not support asynchronous communication. A portable hybrid program -should not assume that a call to the \openshmem library also makes progress for MPI. -A call to \FUNC{shmem\_query\_interoperability} (see definition in \ref{subsec:interoperability:query}) -can be used to check whether the implementation provides such a functionality. -If it is provided, then the library ensures progression of -both \openshmem and MPI communication; otherwise, it may have to explicitly -manage the asynchronous communication in MPI in +does not support asynchronous communication. A hybrid program +should not assume that the \openshmem library also makes progress for MPI. +A call to \FUNC{shmem\_query\_interoperability} with the \VAR{SHMEM\_PROGRESS\_MPI} +property (see definition in \ref{subsec:interoperability:query}) +can be used to portably check whether the implementation provides asynchronous +progression also for MPI. If it is not provided, the user program may have to +explicitly manage the asynchronous communication in MPI in order to prevent any deadlock or performance degradation. 
\apiimpnotes{ From e062450a0db2a3e76c271554a7e63b331ae7cca9 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 15:05:18 -0500 Subject: [PATCH 18/43] interop: move interoperability to a separate file --- content/backmatter.tex | 188 +---------------------------------- content/interoperability.tex | 186 ++++++++++++++++++++++++++++++++++ 2 files changed, 187 insertions(+), 187 deletions(-) create mode 100644 content/interoperability.tex diff --git a/content/backmatter.tex b/content/backmatter.tex index a3ff43f7b..d5098e3a2 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -185,193 +185,7 @@ \chapter{Undefined Behavior in OpenSHMEM}\label{sec:undefined} \color{ForestGreen} -\chapter{Interoperability with other Programming Models}\label{sec:interoperability} - -OpenSHMEM routines may be used in conjunction with the routines of other -communication libraries or parallel languages in the same program. This section -describes the interoperability with other programming models, including -clarification of undefined behaviors caused by mixed use of different models, -advice to \openshmem library users and developers that may improve the portability -and performance of hybrid programs, and definition of an OpenSHMEM -API that queries the interoperability features provided by an \openshmem library. - - -\section{MPI Interoperability} - -\openshmem and MPI are two commonly used parallel programming models for -distributed-memory systems. The user can choose to utilize both models in the same program -to efficiently and easily support various communication patterns. - -A vendor may implement the \openshmem and MPI libraries in different ways. For -instance, one may implement both \openshmem and MPI as standalone libraries, -each of which allocates and initializes fully isolated communication -resources. 
-As the other common approach, however, -a vendor may implement both \openshmem and MPI interfaces within the -same software system in order to share a communication resource when possible. - -To improve interoperability and portability in \openshmem + MPI hybrid -programming, we clarify the relevant semantics in the following subsections. - - -\subsection{Initialization} -To ensure that a hybrid program can be portably performed with different vendor -implementations, the \openshmem environment of the program must be initialized by -a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and be finalized by -a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized -by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread} and be finalized by a -call to \FUNC{MPI\_Finalize}. - -\apiimpnotes{ -Portable implementations of OpenSHMEM and MPI must ensure that the initialization -calls can be made in an arbitrary order within a program; the same rule also -applies to the finalization calls. A software runtime that utilizes a shared -communication resource for \openshmem and MPI communication may maintain an -internal reference counter in order to ensure that the shared resource is -initialized only once and thus no shared resource is released until the last -finalization call is made. -} - - -\subsection{Dynamic Process Creation and MPMD Programming} -\label{subsec:interoperability:mpmd} - -MPI defines a dynamic process model that allows creation of processes after -an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and -connection to independent processes (e.g., through \FUNC{MPI\_Comm\_accept} -and \FUNC{MPI\_Comm\_connect}) -and provides a mechanism to establish communication -between the newly created processes and the existing MPI application (see -MPI standard version 3.1, Chapter 10). 
-Unlike MPI, \openshmem starts all processes at once and requires all PEs to -collectively allocate and initialize resources (e.g., symmetric heap) used by -the \openshmem library before any other \openshmem routine may -be called. Communicating with a dynamically created process in the \openshmem -environment may result in undefined behavior. -Hence, users should not use \openshmem and MPI dynamic process model -in the same program. - - -\subsection{Thread Safety} -\label{subsec:interoperability:thread} -Both \openshmem and MPI define the interaction with user threads in a program -with routines that can be used for initializing and querying the thread -environment. In a hybrid program, the user may request different thread levels -at the initialization calls of \openshmem and MPI environments; however, the -returned support level provided by the \openshmem library might be different -from that returned in an \openshmem-only program. For instance, the former -initialization call in a hybrid program may initialize a resource with the -user-requested thread level, but the supported level cannot be updated by the latter -initialization call if the underlying software runtime of \openshmem and MPI -share the same internal communication resource. -The program should always check the \VAR{provided} thread level returned -at the corresponding initialization call or query the level of thread support -after initialization to portably ensure thread support in each communication -environment. - -Both \openshmem and MPI define similar thread levels, namely, \VAR{THREAD\_SINGLE}, -\VAR{THREAD\_FUNNELED}, \VAR{THREAD\_SERIALIZED}, and \VAR{THREAD\_MULTIPLE}. -When requesting threading support in a hybrid program, however, -users should follow additional rules as described below. - -\begin{itemize} - \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program. 
- Hence, users should not request \VAR{THREAD\_SINGLE} at the initialization - call of either \openshmem or MPI but request a different thread level at the - initialization call of the other model in the same program. - - \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to - make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} - thread level in both \openshmem and MPI should ensure the same main thread - is used in both communication environments. - - \item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure - communication calls are not made concurrently by multiple threads. A hybrid - program should ensure serialized calls to both \openshmem and MPI libraries, - if the program uses \VAR{THREAD\_SERIALIZED} in one communication environment - and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one. -\end{itemize} - -\subsection{Mapping Process Identification Numbers} -\label{subsec:interoperability:id} - -Similar to the PE identifier in \openshmem, MPI defines rank as the -identification number of a process in a communicator. Both \openshmem PE -and MPI rank are unique integers assigned from zero to one less than the total -number of processes. In a hybrid program, the \openshmem -PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. -This feature, however, may be provided by only some of the \openshmem and MPI -implementations (e.g., if both environments share the same underlying process -manager) and is not portably guaranteed. A portable program should always -use the standard functions in each model, namely, \FUNC{shmem\_my\_pe} in \openshmem -and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers -in each communication environment and manage the mapping of identifiers in the -program when necessary. 
- -\subsubsection{Example} -\label{subsubsec:interoperability:id:example} -The following example demonstrates how to manage the mapping between \openshmem -PE identifier and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem -and MPI program. - -\lstinputlisting[language={C}, tabsize=2, - basicstyle=\ttfamily\footnotesize] - {example_code/hybrid_mpi_mapping_id.c} - -\subsection{RMA Programming Models} -\label{subsec:interoperability:rma} - -Both \openshmem and MPI define similar RMA and atomic operations for remote memory -access, however, a portable program should not assume interoperability between these -two RMA models. -For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations -that operate on symmetric data with the same datatype. Access to the same symmetric -object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may -result in an undefined result. Furthermore, -because most RMA programs can be written using either \openshmem or MPI RMA, -users should choose only one of the RMA models in the same program, whenever -possible, for performance and code simplicity. - -\subsection{Communication Progress} -\label{subsec:interoperability:progress} - -\openshmem promises the progression of communication both with and without -\openshmem calls and requires the software progress mechanism in the implementation -(e.g., a progress thread) when the hardware does not provide asynchronous communication -capabilities. In MPI, however, a weak progress semantics is applied. That is, -an MPI communication call is guaranteed only to complete in finite time. For -instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI -call that internally triggers the progress of MPI, if the underlying hardware -does not support asynchronous communication. A hybrid program -should not assume that the \openshmem library also makes progress for MPI. 
-A call to \FUNC{shmem\_query\_interoperability} with the \VAR{SHMEM\_PROGRESS\_MPI} -property (see definition in \ref{subsec:interoperability:query}) -can be used to portably check whether the implementation provides asynchronous -progression also for MPI. If it is not provided, the user program may have to -explicitly manage the asynchronous communication in MPI in -order to prevent any deadlock or performance degradation. - -\apiimpnotes{ -Implementations that provide both \openshmem and MPI interfaces should try -to ensure progress for both models when necessary and possible, for performance -reasons. For instance, an implementation may start making progress for -both \openshmem and MPI whenever possible, after the user program has called -\FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system. -} - - -\section{Query Interoperability} - -A hybrid user program can query the interoperability feature of an \openshmem -implementation in order to avoid unnecessary overhead and programming complexity. -For instance, the user program can eliminate manual progress polling for MPI -communication if the underlying software runtime guarantees the progression of -communication also for MPI even without explicit function calls. - -\subsection{\textbf{SHMEM\_QUERY\_INTEROPERABILITY}} -\label{subsec:interoperability:query} -\input{content/shmem_query_interoperability} - +\input{content/interoperability} \color{black} \chapter{History of OpenSHMEM}\label{sec:openshmem_history} diff --git a/content/interoperability.tex b/content/interoperability.tex new file mode 100644 index 000000000..7257537ab --- /dev/null +++ b/content/interoperability.tex @@ -0,0 +1,186 @@ +\chapter{Interoperability with other Programming Models}\label{sec:interoperability} + +OpenSHMEM routines may be used in conjunction with the routines of other +communication libraries or parallel languages in the same program. 
This section +describes the interoperability with other programming models, including +clarification of undefined behaviors caused by mixed use of different models, +advice to \openshmem library users and developers that may improve the portability +and performance of hybrid programs, and definition of an OpenSHMEM +API that queries the interoperability features provided by an \openshmem library. + + +\section{MPI Interoperability} + +\openshmem and MPI are two commonly used parallel programming models for +distributed-memory systems. The user can choose to utilize both models in the same program +to efficiently and easily support various communication patterns. + +A vendor may implement the \openshmem and MPI libraries in different ways. For +instance, one may implement both \openshmem and MPI as standalone libraries, +each of which allocates and initializes fully isolated communication +resources. +As the other common approach, however, +a vendor may implement both \openshmem and MPI interfaces within the +same software system in order to share a communication resource when possible. + +To improve interoperability and portability in \openshmem + MPI hybrid +programming, we clarify the relevant semantics in the following subsections. + + +\subsection{Initialization} +To ensure that a hybrid program can be portably performed with different vendor +implementations, the \openshmem environment of the program must be initialized by +a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and be finalized by +a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized +by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread} and be finalized by a +call to \FUNC{MPI\_Finalize}. + +\apiimpnotes{ +Portable implementations of OpenSHMEM and MPI must ensure that the initialization +calls can be made in an arbitrary order within a program; the same rule also +applies to the finalization calls. 
A software runtime that utilizes a shared +communication resource for \openshmem and MPI communication may maintain an +internal reference counter in order to ensure that the shared resource is +initialized only once and thus no shared resource is released until the last +finalization call is made. +} + + +\subsection{Dynamic Process Creation and MPMD Programming} +\label{subsec:interoperability:mpmd} + +MPI defines a dynamic process model that allows creation of processes after +an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and +connection to independent processes (e.g., through \FUNC{MPI\_Comm\_accept} +and \FUNC{MPI\_Comm\_connect}) +and provides a mechanism to establish communication +between the newly created processes and the existing MPI application (see +MPI standard version 3.1, Chapter 10). +Unlike MPI, \openshmem starts all processes at once and requires all PEs to +collectively allocate and initialize resources (e.g., symmetric heap) used by +the \openshmem library before any other \openshmem routine may +be called. Communicating with a dynamically created process in the \openshmem +environment may result in undefined behavior. +Hence, users should not use \openshmem and MPI dynamic process model +in the same program. + + +\subsection{Thread Safety} +\label{subsec:interoperability:thread} +Both \openshmem and MPI define the interaction with user threads in a program +with routines that can be used for initializing and querying the thread +environment. In a hybrid program, the user may request different thread levels +at the initialization calls of \openshmem and MPI environments; however, the +returned support level provided by the \openshmem library might be different +from that returned in an \openshmem-only program. 
For instance, the former +initialization call in a hybrid program may initialize a resource with the +user-requested thread level, but the supported level cannot be updated by the latter +initialization call if the underlying software runtime of \openshmem and MPI +share the same internal communication resource. +The program should always check the \VAR{provided} thread level returned +at the corresponding initialization call or query the level of thread support +after initialization to portably ensure thread support in each communication +environment. + +Both \openshmem and MPI define similar thread levels, namely, \VAR{THREAD\_SINGLE}, +\VAR{THREAD\_FUNNELED}, \VAR{THREAD\_SERIALIZED}, and \VAR{THREAD\_MULTIPLE}. +When requesting threading support in a hybrid program, however, +users should follow additional rules as described below. + +\begin{itemize} + \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program. + Hence, users should not request \VAR{THREAD\_SINGLE} at the initialization + call of either \openshmem or MPI but request a different thread level at the + initialization call of the other model in the same program. + + \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to + make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} + thread level in both \openshmem and MPI should ensure the same main thread + is used in both communication environments. + + \item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure + communication calls are not made concurrently by multiple threads. A hybrid + program should ensure serialized calls to both \openshmem and MPI libraries, + if the program uses \VAR{THREAD\_SERIALIZED} in one communication environment + and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one. 
+\end{itemize} + +\subsection{Mapping Process Identification Numbers} +\label{subsec:interoperability:id} + +Similar to the PE identifier in \openshmem, MPI defines rank as the +identification number of a process in a communicator. Both \openshmem PE +and MPI rank are unique integers assigned from zero to one less than the total +number of processes. In a hybrid program, the \openshmem +PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. +This feature, however, may be provided by only some of the \openshmem and MPI +implementations (e.g., if both environments share the same underlying process +manager) and is not portably guaranteed. A portable program should always +use the standard functions in each model, namely, \FUNC{shmem\_my\_pe} in \openshmem +and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers +in each communication environment and manage the mapping of identifiers in the +program when necessary. + +\subsubsection{Example} +\label{subsubsec:interoperability:id:example} +The following example demonstrates how to manage the mapping between \openshmem +PE identifier and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem +and MPI program. + +\lstinputlisting[language={C}, tabsize=2, + basicstyle=\ttfamily\footnotesize] + {example_code/hybrid_mpi_mapping_id.c} + +\subsection{RMA Programming Models} +\label{subsec:interoperability:rma} + +Both \openshmem and MPI define similar RMA and atomic operations for remote memory +access, however, a portable program should not assume interoperability between these +two RMA models. +For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations +that operate on symmetric data with the same datatype. Access to the same symmetric +object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may +result in an undefined result. 
Furthermore, +because most RMA programs can be written using either \openshmem or MPI RMA, +users should choose only one of the RMA models in the same program, whenever +possible, for performance and code simplicity. + +\subsection{Communication Progress} +\label{subsec:interoperability:progress} + +\openshmem promises the progression of communication both with and without +\openshmem calls and requires the software progress mechanism in the implementation +(e.g., a progress thread) when the hardware does not provide asynchronous communication +capabilities. In MPI, however, a weak progress semantics is applied. That is, +an MPI communication call is guaranteed only to complete in finite time. For +instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI +call that internally triggers the progress of MPI, if the underlying hardware +does not support asynchronous communication. A hybrid program +should not assume that the \openshmem library also makes progress for MPI. +A call to \FUNC{shmem\_query\_interoperability} with the \VAR{SHMEM\_PROGRESS\_MPI} +property (see definition in \ref{subsec:interoperability:query}) +can be used to portably check whether the implementation provides asynchronous +progression also for MPI. If it is not provided, the user program may have to +explicitly manage the asynchronous communication in MPI in +order to prevent any deadlock or performance degradation. + +\apiimpnotes{ +Implementations that provide both \openshmem and MPI interfaces should try +to ensure progress for both models when necessary and possible, for performance +reasons. For instance, an implementation may start making progress for +both \openshmem and MPI whenever possible, after the user program has called +\FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system. 
+} + + +\section{Query Interoperability} + +A hybrid user program can query the interoperability feature of an \openshmem +implementation in order to avoid unnecessary overhead and programming complexity. +For instance, the user program can eliminate manual progress polling for MPI +communication if the underlying software runtime guarantees the progression of +communication also for MPI even without explicit function calls. + +\subsection{\textbf{SHMEM\_QUERY\_INTEROPERABILITY}} +\label{subsec:interoperability:query} +\input{content/shmem_query_interoperability} \ No newline at end of file From 6c2e7b6b82b55e4e4479a97db6ce329ea96c16a3 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 15:08:38 -0500 Subject: [PATCH 19/43] interop/dynamic: delete MPMD in section title --- content/interoperability.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 7257537ab..01d2ba9eb 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -46,7 +46,7 @@ \subsection{Initialization} } -\subsection{Dynamic Process Creation and MPMD Programming} +\subsection{Dynamic Process Creation} \label{subsec:interoperability:mpmd} MPI defines a dynamic process model that allows creation of processes after From 9cb51e1dddbe5a4fe1f967275160f09ec9957da5 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 17:27:12 -0500 Subject: [PATCH 20/43] interop/threads: adjust text based on f2f meeting feedback Adjust the text to address two issues: 1. It is a recommendation to users, not a requirement, because such constraints are valid only when the implementation provides both models. 2. The additional rule for THREAD_SERIALIZED is misleading.
--- content/interoperability.tex | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 01d2ba9eb..a516328d1 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -85,7 +85,10 @@ \subsection{Thread Safety} Both \openshmem and MPI define similar thread levels, namely, \VAR{THREAD\_SINGLE}, \VAR{THREAD\_FUNNELED}, \VAR{THREAD\_SERIALIZED}, and \VAR{THREAD\_MULTIPLE}. When requesting threading support in a hybrid program, however, -users should follow additional rules as described below. +the following additional rules are applied if the implementations of \openshmem +and MPI share the same internal communication resource. +Users are strongly advised to always follow these rules to ensure program +portability. \begin{itemize} \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program. @@ -99,10 +102,11 @@ \subsection{Thread Safety} is used in both communication environments. \item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure - communication calls are not made concurrently by multiple threads. A hybrid - program should ensure serialized calls to both \openshmem and MPI libraries, - if the program uses \VAR{THREAD\_SERIALIZED} in one communication environment - and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one. + communication calls are not made concurrently by multiple threads. If a + hybrid program uses \VAR{THREAD\_SERIALIZED} in one communication environment + and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one, it + should also guarantee that the \openshmem and MPI calls are not made concurrently + from two distinct threads. 
\end{itemize} \subsection{Mapping Process Identification Numbers} From 683f42373b32ab763460404aa8aee9ad975a65e1 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 17:51:02 -0500 Subject: [PATCH 21/43] interop/id: fix example --- example_code/hybrid_mpi_mapping_id.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/example_code/hybrid_mpi_mapping_id.c b/example_code/hybrid_mpi_mapping_id.c index c72168d6e..1e30b3879 100644 --- a/example_code/hybrid_mpi_mapping_id.c +++ b/example_code/hybrid_mpi_mapping_id.c @@ -1,5 +1,4 @@ #include -#include #include #include @@ -20,7 +19,6 @@ int main(int argc, char *argv[]) int *mpi_ranks = shmem_calloc(npes, sizeof(int)); - shmem_sync_all(); shmem_collect32(mpi_ranks, &myrank, 1, 0, 0, npes, pSync); if (mype == 0) From a21c85502cb022fbf98ffc806811cd49f9a9598f Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 17:53:29 -0500 Subject: [PATCH 22/43] interop/rma: adjust text based on f2f meeting feedback --- content/interoperability.tex | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index a516328d1..9ae69062c 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -138,9 +138,9 @@ \subsubsection{Example} \subsection{RMA Programming Models} \label{subsec:interoperability:rma} -Both \openshmem and MPI define similar RMA and atomic operations for remote memory -access, however, a portable program should not assume interoperability between these -two RMA models. +\openshmem and MPI each defines similar one-sided communication models, +however, a portable program should not assume interoperability between these +models. For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations that operate on symmetric data with the same datatype. 
Access to the same symmetric object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may From 09a6b841512b86c5735c210ffeadd8895180c35f Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 17:55:57 -0500 Subject: [PATCH 23/43] interop/query: delete note to implementors --- content/shmem_query_interoperability.tex | 5 ----- 1 file changed, 5 deletions(-) diff --git a/content/shmem_query_interoperability.tex b/content/shmem_query_interoperability.tex index 8af1e26ca..df5d977b6 100644 --- a/content/shmem_query_interoperability.tex +++ b/content/shmem_query_interoperability.tex @@ -32,8 +32,3 @@ otherwise, it is \CONST{0}. } \end{apidefinition} - -\apiimpnotes{ -Implementations that do not support interoperability with other programming models -may simply return \CONST{0} for the relevant interoperability query. -} From 0b801f5e3d0c7f9da0dcf5d6a4743c5b29f074b8 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 23:58:09 -0500 Subject: [PATCH 24/43] interop/progress: adjust note to implementor --- content/interoperability.tex | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 9ae69062c..a1c55a3a4 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -170,10 +170,11 @@ \subsection{Communication Progress} \apiimpnotes{ Implementations that provide both \openshmem and MPI interfaces should try -to ensure progress for both models when necessary and possible, for performance -reasons. For instance, an implementation may start making progress for -both \openshmem and MPI whenever possible, after the user program has called -\FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system. +to ensure progress for both models, when necessary and possible, for performance +reasons. 
For instance, an implementation +may utilize a software progress thread to process any software-handled +communication requests, after the user program has called +\FUNC{shmem\_init} and \FUNC{MPI\_Init} provided by the same system. } From f8eebf676660f7fe098720c4fcbd793fbce6cad0 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 23:59:07 -0500 Subject: [PATCH 25/43] interop/query: shorten overview example --- content/interoperability.tex | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index a1c55a3a4..0fcc78b0a 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -183,8 +183,8 @@ \section{Query Interoperability} A hybrid user program can query the interoperability feature of an \openshmem implementation in order to avoid unnecessary overhead and programming complexity. For instance, the user program can eliminate manual progress polling for MPI -communication if the underlying software runtime guarantees the progression of -communication also for MPI even without explicit function calls. +communication if the \openshmem implementation guarantees asynchronous +communication also for MPI. 
\subsection{\textbf{SHMEM\_QUERY\_INTEROPERABILITY}} \label{subsec:interoperability:query} From b937c0a1f44a05cdf25437e78ace5d02e4a05040 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 21 Oct 2019 23:59:51 -0500 Subject: [PATCH 26/43] interop/query: add example with MPI progress support --- content/shmem_query_interoperability.tex | 13 +++++++++ example_code/shmem_query_mpi_progress.c | 34 ++++++++++++++++++++++++ 2 files changed, 47 insertions(+) create mode 100644 example_code/shmem_query_mpi_progress.c diff --git a/content/shmem_query_interoperability.tex b/content/shmem_query_interoperability.tex index df5d977b6..a656f2497 100644 --- a/content/shmem_query_interoperability.tex +++ b/content/shmem_query_interoperability.tex @@ -31,4 +31,17 @@ The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; otherwise, it is \CONST{0}. } + +\begin{apiexamples} + +\apicexample + {The following example queries whether the \openshmem library supports asynchronous +progress for MPI. 
If it returns 1, the library guarantees the MPI nonblocking send +is processed while PE 0 is in the busy wait loop with repeated calls to +\FUNC{shmem\_int\_atomic\_fetch} so that deadlock will not occur.} + {./example_code/shmem_query_mpi_progress.c} + {} + +\end{apiexamples} + \end{apidefinition} diff --git a/example_code/shmem_query_mpi_progress.c b/example_code/shmem_query_mpi_progress.c new file mode 100644 index 000000000..063c320f7 --- /dev/null +++ b/example_code/shmem_query_mpi_progress.c @@ -0,0 +1,34 @@ +#include +#include +#include + +int main(int argc, char *argv[]) +{ + MPI_Init(&argc, &argv); + shmem_init(); + + int mype = shmem_my_pe(); + + if (!shmem_query_interoperability(SHMEM_PROGRESS_MPI)) + shmem_global_exit(EXIT_FAILURE); + + int a[100]; + static int b = 0; + if (mype == 0) { + MPI_Request req = MPI_REQUEST_NULL; + MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); + + while (shmem_int_atomic_fetch(&b, 0) != 1); + + MPI_Wait(req, MPI_STATUS_IGNORE); + } else { + MPI_Recv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + + shmem_int_atomic_set(&b, 1, 0); + } + + shmem_finalize(); + MPI_Finalize(); + + return 0; +} From c4da312bc3678df0e9452e8793e31a9d7f171b3a Mon Sep 17 00:00:00 2001 From: Gail Pieper Date: Tue, 22 Oct 2019 18:30:14 +0000 Subject: [PATCH 27/43] interop: made a pass by English editor --- content/interoperability.tex | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 0fcc78b0a..1ce88945d 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -1,4 +1,4 @@ -\chapter{Interoperability with other Programming Models}\label{sec:interoperability} +\chapter{Interoperability with Other Programming Models}\label{sec:interoperability} OpenSHMEM routines may be used in conjunction with the routines of other communication libraries or parallel languages in the same program. 
This section @@ -19,8 +19,8 @@ \section{MPI Interoperability} instance, one may implement both \openshmem and MPI as standalone libraries, each of which allocates and initializes fully isolated communication resources. -As the other common approach, however, -a vendor may implement both \openshmem and MPI interfaces within the +Another common approach +is to implement both \openshmem and MPI interfaces within the same software system in order to share a communication resource when possible. To improve interoperability and portability in \openshmem + MPI hybrid @@ -28,7 +28,7 @@ \section{MPI Interoperability} \subsection{Initialization} -To ensure that a hybrid program can be portably performed with different vendor +In order to ensure that a hybrid program can be portably performed with different vendor implementations, the \openshmem environment of the program must be initialized by a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and be finalized by a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized @@ -61,7 +61,7 @@ \subsection{Dynamic Process Creation} the \openshmem library before any other \openshmem routine may be called. Communicating with a dynamically created process in the \openshmem environment may result in undefined behavior. -Hence, users should not use \openshmem and MPI dynamic process model +Hence, users should not use \openshmem and MPI dynamic process models in the same program. @@ -98,11 +98,11 @@ \subsection{Thread Safety} \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} - thread level in both \openshmem and MPI should ensure the same main thread + thread level in both \openshmem and MPI should ensure that the same main thread is used in both communication environments. 
\item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure - communication calls are not made concurrently by multiple threads. If a + that communication calls are not made concurrently by multiple threads. If a hybrid program uses \VAR{THREAD\_SERIALIZED} in one communication environment and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one, it should also guarantee that the \openshmem and MPI calls are not made concurrently @@ -113,8 +113,8 @@ \subsection{Mapping Process Identification Numbers} \label{subsec:interoperability:id} Similar to the PE identifier in \openshmem, MPI defines rank as the -identification number of a process in a communicator. Both \openshmem PE -and MPI rank are unique integers assigned from zero to one less than the total +identification number of a process in a communicator. Both the \openshmem PE +and the MPI rank are unique integers assigned from zero to one less than the total number of processes. In a hybrid program, the \openshmem PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. This feature, however, may be provided by only some of the \openshmem and MPI @@ -125,7 +125,7 @@ \subsection{Mapping Process Identification Numbers} in each communication environment and manage the mapping of identifiers in the program when necessary. -\subsubsection{Example} +\subsubsection*{Example} \label{subsubsec:interoperability:id:example} The following example demonstrates how to manage the mapping between \openshmem PE identifier and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem @@ -138,14 +138,14 @@ \subsubsection{Example} \subsection{RMA Programming Models} \label{subsec:interoperability:rma} -\openshmem and MPI each defines similar one-sided communication models, +\openshmem and MPI each define similar one-sided communication models; however, a portable program should not assume interoperability between these models. 
For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations that operate on symmetric data with the same datatype. Access to the same symmetric object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in an undefined result. Furthermore, -because most RMA programs can be written using either \openshmem or MPI RMA, +because most RMA programs can be written by using either \openshmem or MPI RMA, users should choose only one of the RMA models in the same program, whenever possible, for performance and code simplicity. From 192710f0804e13e5ea89592725f48ad6cb8c888c Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 28 Oct 2019 11:55:37 -0500 Subject: [PATCH 28/43] interop/query: header fix and use larger data size in example --- example_code/shmem_query_mpi_progress.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/example_code/shmem_query_mpi_progress.c b/example_code/shmem_query_mpi_progress.c index 063c320f7..528d12423 100644 --- a/example_code/shmem_query_mpi_progress.c +++ b/example_code/shmem_query_mpi_progress.c @@ -1,7 +1,9 @@ -#include +#include #include #include +int a[1048576]; + int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); @@ -12,17 +14,16 @@ int main(int argc, char *argv[]) if (!shmem_query_interoperability(SHMEM_PROGRESS_MPI)) shmem_global_exit(EXIT_FAILURE); - int a[100]; static int b = 0; if (mype == 0) { MPI_Request req = MPI_REQUEST_NULL; - MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); + MPI_Isend(a, 1048576, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); while (shmem_int_atomic_fetch(&b, 0) != 1); - MPI_Wait(req, MPI_STATUS_IGNORE); + MPI_Wait(&req, MPI_STATUS_IGNORE); } else { - MPI_Recv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + MPI_Recv(a, 1048576, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); shmem_int_atomic_set(&b, 1, 0); } From ba758db709cee000a1a422e9a68d8a81ce3fe2cb Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 
28 Oct 2019 12:14:57 -0500 Subject: [PATCH 29/43] interop/dynamic: use MPI to communicate rather than disallow --- content/interoperability.tex | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 1ce88945d..b2ad96419 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -59,10 +59,9 @@ \subsection{Dynamic Process Creation} Unlike MPI, \openshmem starts all processes at once and requires all PEs to collectively allocate and initialize resources (e.g., symmetric heap) used by the \openshmem library before any other \openshmem routine may -be called. Communicating with a dynamically created process in the \openshmem -environment may result in undefined behavior. -Hence, users should not use \openshmem and MPI dynamic process models -in the same program. +be called. \openshmem does not support communication with dynamically created +or connected processes. In such a scenario, MPI must be used to communicate +with these processes. \subsection{Thread Safety} From 4e28a441bdd2e36daebd76eb7bb4e66998bd8b34 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 28 Oct 2019 17:18:25 -0500 Subject: [PATCH 30/43] interop/threads: adjust text --- content/interoperability.tex | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index b2ad96419..bd850476e 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -70,10 +70,10 @@ \subsection{Thread Safety} with routines that can be used for initializing and querying the thread environment. In a hybrid program, the user may request different thread levels at the initialization calls of \openshmem and MPI environments; however, the -returned support level provided by the \openshmem library might be different -from that returned in an \openshmem-only program. 
For instance, the former +returned support level provided by the \openshmem or MPI library might be different +from that returned in a non-hybrid program. For instance, the former initialization call in a hybrid program may initialize a resource with the -user-requested thread level, but the supported level cannot be updated by the latter +user-requested thread level, but the supported level cannot be updated by a subsequent initialization call if the underlying software runtime of \openshmem and MPI share the same internal communication resource. The program should always check the \VAR{provided} thread level returned From d69cf7ffb502df48bf5a5a2c62b97c75716a9740 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 28 Oct 2019 17:20:18 -0500 Subject: [PATCH 31/43] interop/query: fix compiling issue --- content/shmem_query_interoperability.tex | 31 +++++++++++++----------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/content/shmem_query_interoperability.tex b/content/shmem_query_interoperability.tex index a656f2497..4fed3aae6 100644 --- a/content/shmem_query_interoperability.tex +++ b/content/shmem_query_interoperability.tex @@ -12,26 +12,29 @@ \apiargument{IN}{property}{The interoperability property queried by the user.} \end{apiarguments} -% compiling error ? -% \apidescription{ -\FUNC{shmem\_query\_interoperability} queries whether an interoperability property -is supported by the \openshmem library. One of the following properties can be -queried in an \openshmem program after finishing the -initialization call to \openshmem and that of the relevant programming models -being used in the program. An \openshmem library implementation may extend the -available properties. - -\begin{itemize} -\item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem -implementation makes progress for the MPI communication used in the user program.
-\end{itemize} -% } +\apidescription{ + \FUNC{shmem\_query\_interoperability} queries whether an interoperability property + is supported by the \openshmem library. One of the following properties can be + queried in an \openshmem program after finishing the + initialization call to \openshmem and that of the relevant programming models + being used in the program. An \openshmem library implementation may extend the + available properties. + + \begin{itemize} + \item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem + implementation makes progress for the MPI communication used in the user program. + \end{itemize} +} \apireturnvalues{ The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; otherwise, it is \CONST{0}. } +\apinotes{ + None. +} + \begin{apiexamples} \apicexample From 310afa75276340fdbee6ecc2b5324576e6b9420c Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 28 Oct 2019 17:49:39 -0500 Subject: [PATCH 32/43] interop/id: replace PE "identifier" with "number" for consistency --- content/interoperability.tex | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index bd850476e..4f6f343b3 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -111,7 +111,7 @@ \subsection{Thread Safety} \subsection{Mapping Process Identification Numbers} \label{subsec:interoperability:id} -Similar to the PE identifier in \openshmem, MPI defines rank as the +Similar to the PE number in \openshmem, MPI defines rank as the identification number of a process in a communicator. Both the \openshmem PE and the MPI rank are unique integers assigned from zero to one less than the total number of processes. 
In a hybrid program, the \openshmem @@ -127,7 +127,7 @@ \subsection{Mapping Process Identification Numbers} \subsubsection*{Example} \label{subsubsec:interoperability:id:example} The following example demonstrates how to manage the mapping between \openshmem -PE identifier and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem +PE numbers and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem and MPI program. \lstinputlisting[language={C}, tabsize=2, From 7723dfa0b262034d52c77d633dde6ab5d27e67a3 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 28 Oct 2019 17:20:59 -0500 Subject: [PATCH 33/43] interop: adjust to use teams API --- content/interoperability.tex | 5 +++-- example_code/hybrid_mpi_mapping_id.c | 11 +++-------- 2 files changed, 6 insertions(+), 10 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 4f6f343b3..b54c64806 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -115,11 +115,12 @@ \subsection{Mapping Process Identification Numbers} identification number of a process in a communicator. Both the \openshmem PE and the MPI rank are unique integers assigned from zero to one less than the total number of processes. In a hybrid program, the \openshmem -PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. +PE number in \LibHandleRef{SHMEM\_TEAM\_WORLD} +and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. This feature, however, may be provided by only some of the \openshmem and MPI implementations (e.g., if both environments share the same underlying process manager) and is not portably guaranteed. 
A portable program should always -use the standard functions in each model, namely, \FUNC{shmem\_my\_pe} in \openshmem +use the standard functions in each model, namely, \FUNC{shmem\_team\_my\_pe} in \openshmem and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers in each communication environment and manage the mapping of identifiers in the program when necessary. diff --git a/example_code/hybrid_mpi_mapping_id.c b/example_code/hybrid_mpi_mapping_id.c index 1e30b3879..e99ff5bcc 100644 --- a/example_code/hybrid_mpi_mapping_id.c +++ b/example_code/hybrid_mpi_mapping_id.c @@ -4,23 +4,18 @@ int main(int argc, char *argv[]) { - static long pSync[SHMEM_COLLECT_SYNC_SIZE]; - for (int i = 0; i < SHMEM_COLLECT_SYNC_SIZE; i++) - pSync[i] = SHMEM_SYNC_VALUE; - MPI_Init(&argc, &argv); shmem_init(); - int mype = shmem_my_pe(); - int npes = shmem_n_pes(); + int mype = shmem_team_my_pe(SHMEM_TEAM_WORLD); + int npes = shmem_team_n_pes(SHMEM_TEAM_WORLD); static int myrank; MPI_Comm_rank(MPI_COMM_WORLD, &myrank); int *mpi_ranks = shmem_calloc(npes, sizeof(int)); - shmem_collect32(mpi_ranks, &myrank, 1, 0, 0, npes, pSync); - + shmem_int_collect(SHMEM_TEAM_WORLD, mpi_ranks, &myrank, 1); if (mype == 0) for (int i = 0; i < npes; i++) printf("PE %d's MPI rank is %d\n", i, mpi_ranks[i]); From fc7a90da06c16f58e8b340545becd1665c9f99f3 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 28 Oct 2019 18:11:57 -0500 Subject: [PATCH 34/43] interop/rma: only disable concurrent access to the same location --- content/interoperability.tex | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index b54c64806..d1f291851 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -144,10 +144,8 @@ \subsection{RMA Programming Models} For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations that operate on symmetric data with the same datatype. 
Access to the same symmetric object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may -result in an undefined result. Furthermore, -because most RMA programs can be written by using either \openshmem or MPI RMA, -users should choose only one of the RMA models in the same program, whenever -possible, for performance and code simplicity. +result in an undefined result. Users should avoid situations where MPI and +\openshmem operations perform concurrent accesses to the same memory location. \subsection{Communication Progress} \label{subsec:interoperability:progress} From 256947138bb0c2bd2437c63ad15af471b0cc44ca Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 29 Oct 2019 12:14:16 -0500 Subject: [PATCH 35/43] interop: minor text adjustment --- content/interoperability.tex | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index d1f291851..3ab23498a 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -19,7 +19,7 @@ \section{MPI Interoperability} instance, one may implement both \openshmem and MPI as standalone libraries, each of which allocates and initializes fully isolated communication resources. -Another common approach +Another approach is to implement both \openshmem and MPI interfaces within the same software system in order to share a communication resource when possible. @@ -52,15 +52,15 @@ \subsection{Dynamic Process Creation} MPI defines a dynamic process model that allows creation of processes after an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and connection to independent processes (e.g., through \FUNC{MPI\_Comm\_accept} -and \FUNC{MPI\_Comm\_connect}) -and provides a mechanism to establish communication +and \FUNC{MPI\_Comm\_connect}). +It provides a mechanism to establish communication between the newly created processes and the existing MPI application (see MPI standard version 3.1, Chapter 10). 
Unlike MPI, \openshmem starts all processes at once and requires all PEs to collectively allocate and initialize resources (e.g., symmetric heap) used by the \openshmem library before any other \openshmem routine may be called. \openshmem does not support communication with dynamically created -or connected processes. In such a scenario, MPI must be used to communicate +or connected processes. In such a scenario, MPI can be used to communicate with these processes. From 3367f44fdeab87d957510b445badac0ed80d373c Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 29 Oct 2019 14:36:09 -0500 Subject: [PATCH 36/43] interop/rma: clarify one-sided op and undefined behavior --- content/interoperability.tex | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 3ab23498a..b83eee462 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -145,7 +145,8 @@ \subsection{RMA Programming Models} that operate on symmetric data with the same datatype. Access to the same symmetric object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in an undefined result. Users should avoid situations where MPI and -\openshmem operations perform concurrent accesses to the same memory location. +\openshmem one-sided operations perform concurrent accesses to the same memory +location; otherwise, the behavior is undefined. \subsection{Communication Progress} \label{subsec:interoperability:progress} From 5608cd0980d62803dd73b8abeced0167e97712f5 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 29 Oct 2019 14:46:50 -0500 Subject: [PATCH 37/43] interop: avoid using "user", use program instead. 
--- content/interoperability.tex | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index b83eee462..bf6ad979d 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -68,12 +68,12 @@ \subsection{Thread Safety} \label{subsec:interoperability:thread} Both \openshmem and MPI define the interaction with user threads in a program with routines that can be used for initializing and querying the thread -environment. In a hybrid program, the user may request different thread levels +environment. A hybrid program may request different thread levels at the initialization calls of \openshmem and MPI environments; however, the returned support level provided by the \openshmem or MPI library might be different from that returned in a non-hybrid program. For instance, the former initialization call in a hybrid program may initialize a resource with the -user-requested thread level, but the supported level cannot be updated by a subsequent +requested thread level, but the supported level cannot be updated by a subsequent initialization call if the underlying software runtime of \openshmem and MPI share the same internal communication resource. The program should always check the \VAR{provided} thread level returned @@ -86,14 +86,14 @@ \subsection{Thread Safety} When requesting threading support in a hybrid program, however, the following additional rules are applied if the implementations of \openshmem and MPI share the same internal communication resource. -Users are strongly advised to always follow these rules to ensure program +It is strongly recommended to always follow these rules to ensure program portability. \begin{itemize} \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program.
- Hence, users should not request \VAR{THREAD\_SINGLE} at the initialization + Hence, a hybrid program should not request \VAR{THREAD\_SINGLE} at the initialization call of either \openshmem or MPI but request a different thread level at the - initialization call of the other model in the same program. + initialization call of the other model. \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} @@ -144,7 +144,7 @@ \subsection{RMA Programming Models} For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations that operate on symmetric data with the same datatype. Access to the same symmetric object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may -result in an undefined result. Users should avoid situations where MPI and +result in an undefined result. A hybrid program should avoid situations where MPI and \openshmem one-sided operations perform concurrent accesses to the same memory location; otherwise, the behavior is undefined. From 7849b7f0caff04d616a17986efb2942735c5a9b0 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 29 Oct 2019 15:04:49 -0500 Subject: [PATCH 38/43] interop: delete query API. It is more suitable to expose the progress support of MPI through an MPI API rather than shmem_query_interoperability. The query API will be proposed with other useful properties in a separate proposal. 
--- content/interoperability.tex | 33 ++-------------- content/shmem_query_interoperability.tex | 50 ------------------------ example_code/shmem_query_mpi_progress.c | 35 ----------------- 3 files changed, 3 insertions(+), 115 deletions(-) delete mode 100644 content/shmem_query_interoperability.tex delete mode 100644 example_code/shmem_query_mpi_progress.c diff --git a/content/interoperability.tex b/content/interoperability.tex index bf6ad979d..810e127ae 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -4,9 +4,8 @@ \chapter{Interoperability with Other Programming Models}\label{sec:interoperabil communication libraries or parallel languages in the same program. This section describes the interoperability with other programming models, including clarification of undefined behaviors caused by mixed use of different models, -advice to \openshmem library users and developers that may improve the portability -and performance of hybrid programs, and definition of an OpenSHMEM -API that queries the interoperability features provided by an \openshmem library. +and advice to \openshmem library users and developers that may improve the portability +and performance of hybrid programs. \section{MPI Interoperability} @@ -160,31 +159,5 @@ \subsection{Communication Progress} call that internally triggers the progress of MPI, if the underlying hardware does not support asynchronous communication. A hybrid program should not assume that the \openshmem library also makes progress for MPI. -A call to \FUNC{shmem\_query\_interoperability} with the \VAR{SHMEM\_PROGRESS\_MPI} -property (see definition in \ref{subsec:interoperability:query}) -can be used to portably check whether the implementation provides asynchronous -progression also for MPI. 
If it is not provided, the user program may have to -explicitly manage the asynchronous communication in MPI in +It can explicitly manage the asynchronous communication of MPI in order to prevent any deadlock or performance degradation. - -\apiimpnotes{ -Implementations that provide both \openshmem and MPI interfaces should try -to ensure progress for both models, when necessary and possible, for performance -reasons. For instance, an implementation -may utilize a software progress thread to process any software-handled -communication requests, after the user program has called -\FUNC{shmem\_init} and \FUNC{MPI\_Init} provided by the same system. -} - - -\section{Query Interoperability} - -A hybrid user program can query the interoperability feature of an \openshmem -implementation in order to avoid unnecessary overhead and programming complexity. -For instance, the user program can eliminate manual progress polling for MPI -communication if the \openshmem implementation guarantees asynchronous -communication also for MPI. - -\subsection{\textbf{SHMEM\_QUERY\_INTEROPERABILITY}} -\label{subsec:interoperability:query} -\input{content/shmem_query_interoperability} \ No newline at end of file diff --git a/content/shmem_query_interoperability.tex b/content/shmem_query_interoperability.tex deleted file mode 100644 index 4fed3aae6..000000000 --- a/content/shmem_query_interoperability.tex +++ /dev/null @@ -1,50 +0,0 @@ -\apisummary{ - Determines whether an interoperability feature is supported by the \openshmem - library implementation. -} -\begin{apidefinition} - -\begin{Csynopsis} -int @\FuncDecl{shmem\_query\_interoperability}@(int property); -\end{Csynopsis} - -\begin{apiarguments} - \apiargument{IN}{property}{The interoperability property queried by the user.} -\end{apiarguments} - -\apidescription{ - \FUNC{shmem\_query\_interoperability} queries whether an interoperability property - is supported by the \openshmem library. 
One of the following properties can be - queried in an \openshmem program after finishing the - initialization call to \openshmem and that of the relevant programming models - being used in the program. An \openshmem library implementation may extend the - available properties. - - \begin{itemize} - \item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem - implementation makes progress for the MPI communication used in the user program. - \end{itemize} -} - -\apireturnvalues{ - The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; - otherwise, it is \CONST{0}. -} - -\apinotes{ - None. -} - -\begin{apiexamples} - -\apicexample - {The following example queries whether the \openshmem library supports asynchronous -progress for MPI. If it returns 1, the library guarantees the MPI nonblocking send -is processed while PE 0 is in the busy wait loop with repeated calls to -\FUNC{shmem\_int\_atomic\_fetch} so that deadlock will not occur.} - {./example_code/shmem_query_mpi_progress.c} - {} - -\end{apiexamples} - -\end{apidefinition} diff --git a/example_code/shmem_query_mpi_progress.c b/example_code/shmem_query_mpi_progress.c deleted file mode 100644 index 528d12423..000000000 --- a/example_code/shmem_query_mpi_progress.c +++ /dev/null @@ -1,35 +0,0 @@ -#include -#include -#include - -int a[1048576]; - -int main(int argc, char *argv[]) -{ - MPI_Init(&argc, &argv); - shmem_init(); - - int mype = shmem_my_pe(); - - if (!shmem_query_interoperability(SHMEM_PROGRESS_MPI)) - shmem_global_exit(EXIT_FAILURE); - - static int b = 0; - if (mype == 0) { - MPI_Request req = MPI_REQUEST_NULL; - MPI_Isend(a, 1048576, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); - - while (shmem_int_atomic_fetch(&b, 0) != 1); - - MPI_Wait(&req, MPI_STATUS_IGNORE); - } else { - MPI_Recv(a, 1048576, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); - - shmem_int_atomic_set(&b, 1, 0); - } - - shmem_finalize(); - MPI_Finalize(); - - return 0; -} From 
e3a5aea66cb034c4891651815e75b2b2ca202446 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 29 Oct 2019 15:13:40 -0500 Subject: [PATCH 39/43] interop: use \ac{PE} --- content/interoperability.tex | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 810e127ae..ec908f9ba 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -55,7 +55,7 @@ \subsection{Dynamic Process Creation} It provides a mechanism to establish communication between the newly created processes and the existing MPI application (see MPI standard version 3.1, Chapter 10). -Unlike MPI, \openshmem starts all processes at once and requires all PEs to +Unlike MPI, \openshmem starts all processes at once and requires all \acp{PE} to collectively allocate and initialize resources (e.g., symmetric heap) used by the \openshmem library before any other \openshmem routine may be called. \openshmem does not support communication with dynamically created @@ -110,11 +110,11 @@ \subsection{Thread Safety} \subsection{Mapping Process Identification Numbers} \label{subsec:interoperability:id} -Similar to the PE number in \openshmem, MPI defines rank as the -identification number of a process in a communicator. Both the \openshmem PE +Similar to the \ac{PE} number in \openshmem, MPI defines rank as the +identification number of a process in a communicator. Both the \openshmem \ac{PE} and the MPI rank are unique integers assigned from zero to one less than the total number of processes. In a hybrid program, the \openshmem -PE number in \LibHandleRef{SHMEM\_TEAM\_WORLD} +\ac{PE} number in \LibHandleRef{SHMEM\_TEAM\_WORLD} and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. 
This feature, however, may be provided by only some of the \openshmem and MPI implementations (e.g., if both environments share the same underlying process @@ -127,7 +127,7 @@ \subsection{Mapping Process Identification Numbers} \subsubsection*{Example} \label{subsubsec:interoperability:id:example} The following example demonstrates how to manage the mapping between \openshmem -PE numbers and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem +\ac{PE} numbers and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem and MPI program. \lstinputlisting[language={C}, tabsize=2, From ec2ca5fee68c9d617824e93315043f22628435d3 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 29 Oct 2019 15:24:01 -0500 Subject: [PATCH 40/43] interop: use \ac{MPI} --- content/interoperability.tex | 80 ++++++++++++++++++------------------ 1 file changed, 40 insertions(+), 40 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index ec908f9ba..f109b6b45 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -8,21 +8,21 @@ \chapter{Interoperability with Other Programming Models}\label{sec:interoperabil and performance of hybrid programs. -\section{MPI Interoperability} +\section{\ac{MPI} Interoperability} -\openshmem and MPI are two commonly used parallel programming models for +\openshmem and \ac{MPI} are two commonly used parallel programming models for distributed-memory systems. The user can choose to utilize both models in the same program to efficiently and easily support various communication patterns. -A vendor may implement the \openshmem and MPI libraries in different ways. For -instance, one may implement both \openshmem and MPI as standalone libraries, +A vendor may implement the \openshmem and \ac{MPI} libraries in different ways. For +instance, one may implement both \openshmem and \ac{MPI} as standalone libraries, each of which allocates and initializes fully isolated communication resources. 
Another approach -is to implement both \openshmem and MPI interfaces within the +is to implement both \openshmem and \ac{MPI} interfaces within the same software system in order to share a communication resource when possible. -To improve interoperability and portability in \openshmem + MPI hybrid +To improve interoperability and portability in \openshmem + \ac{MPI} hybrid programming, we clarify the relevant semantics in the following subsections. @@ -30,15 +30,15 @@ \subsection{Initialization} In order to ensure that a hybrid program can be portably performed with different vendor implementations, the \openshmem environment of the program must be initialized by a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and be finalized by -a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized +a call to \FUNC{shmem\_finalize}; the \ac{MPI} environment of the program must be initialized by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread} and be finalized by a call to \FUNC{MPI\_Finalize}. \apiimpnotes{ -Portable implementations of OpenSHMEM and MPI must ensure that the initialization +Portable implementations of OpenSHMEM and \ac{MPI} must ensure that the initialization calls can be made in an arbitrary order within a program; the same rule also applies to the finalization calls. A software runtime that utilizes a shared -communication resource for \openshmem and MPI communication may maintain an +communication resource for \openshmem and \ac{MPI} communication may maintain an internal reference counter in order to ensure that the shared resource is initialized only once and thus no shared resource is released until the last finalization call is made. 
@@ -48,87 +48,87 @@ \subsection{Initialization} \subsection{Dynamic Process Creation} \label{subsec:interoperability:mpmd} -MPI defines a dynamic process model that allows creation of processes after -an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and +\ac{MPI} defines a dynamic process model that allows creation of processes after +an \ac{MPI} application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and connection to independent processes (e.g., through \FUNC{MPI\_Comm\_accept} and \FUNC{MPI\_Comm\_connect}). It provides a mechanism to establish communication -between the newly created processes and the existing MPI application (see -MPI standard version 3.1, Chapter 10). -Unlike MPI, \openshmem starts all processes at once and requires all \acp{PE} to +between the newly created processes and the existing \ac{MPI} application (see +\ac{MPI} standard version 3.1, Chapter 10). +Unlike \ac{MPI}, \openshmem starts all processes at once and requires all \acp{PE} to collectively allocate and initialize resources (e.g., symmetric heap) used by the \openshmem library before any other \openshmem routine may be called. \openshmem does not support communication with dynamically created -or connected processes. In such a scenario, MPI can be used to communicate +or connected processes. In such a scenario, \ac{MPI} can be used to communicate with these processes. \subsection{Thread Safety} \label{subsec:interoperability:thread} -Both \openshmem and MPI define the interaction with user threads in a program +Both \openshmem and \ac{MPI} define the interaction with user threads in a program with routines that can be used for initializing and querying the thread environment. 
A hybrid program may request different thread levels -at the initialization calls of \openshmem and MPI environments; however, the -returned support level provided by the \openshmem or MPI library might be different +at the initialization calls of \openshmem and \ac{MPI} environments; however, the +returned support level provided by the \openshmem or \ac{MPI} library might be different from that returned in a non-hybrid program. For instance, the former initialization call in a hybrid program may initialize a resource with the requested thread level, but the supported level cannot be updated by a subsequent -initialization call if the underlying software runtime of \openshmem and MPI +initialization call if the underlying software runtime of \openshmem and \ac{MPI} share the same internal communication resource. The program should always check the \VAR{provided} thread level returned at the corresponding initialization call or query the level of thread support after initialization to portably ensure thread support in each communication environment. -Both \openshmem and MPI define similar thread levels, namely, \VAR{THREAD\_SINGLE}, +Both \openshmem and \ac{MPI} define similar thread levels, namely, \VAR{THREAD\_SINGLE}, \VAR{THREAD\_FUNNELED}, \VAR{THREAD\_SERIALIZED}, and \VAR{THREAD\_MULTIPLE}. When requesting threading support in a hybrid program, however, the following additional rules are applied if the implementations of \openshmem -and MPI share the same internal communication resource. +and \ac{MPI} share the same internal communication resource. It is strongly recommended to always follow these rules to ensure program portability. \begin{itemize} \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program.
Hence, a hybrid program should not request \VAR{THREAD\_SINGLE} at the initialization - call of either \openshmem or MPI but request a different thread level at the + call of either \openshmem or \ac{MPI} but request a different thread level at the initialization call of the other model. \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} - thread level in both \openshmem and MPI should ensure that the same main thread + thread level in both \openshmem and \ac{MPI} should ensure that the same main thread is used in both communication environments. \item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure that communication calls are not made concurrently by multiple threads. If a hybrid program uses \VAR{THREAD\_SERIALIZED} in one communication environment and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one, it - should also guarantee that the \openshmem and MPI calls are not made concurrently + should also guarantee that the \openshmem and \ac{MPI} calls are not made concurrently from two distinct threads. \end{itemize} \subsection{Mapping Process Identification Numbers} \label{subsec:interoperability:id} -Similar to the \ac{PE} number in \openshmem, MPI defines rank as the +Similar to the \ac{PE} number in \openshmem, \ac{MPI} defines rank as the identification number of a process in a communicator. Both the \openshmem \ac{PE} -and the MPI rank are unique integers assigned from zero to one less than the total +and the \ac{MPI} rank are unique integers assigned from zero to one less than the total number of processes. In a hybrid program, the \openshmem \ac{PE} number in \LibHandleRef{SHMEM\_TEAM\_WORLD} -and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. 
-This feature, however, may be provided by only some of the \openshmem and MPI +and the \ac{MPI} rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. +This feature, however, may be provided by only some of the \openshmem and \ac{MPI} implementations (e.g., if both environments share the same underlying process manager) and is not portably guaranteed. A portable program should always use the standard functions in each model, namely, \FUNC{shmem\_team\_my\_pe} in \openshmem -and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers +and \FUNC{MPI\_Comm\_rank} in \ac{MPI}, to query the process identification numbers in each communication environment and manage the mapping of identifiers in the program when necessary. \subsubsection*{Example} \label{subsubsec:interoperability:id:example} The following example demonstrates how to manage the mapping between \openshmem -\ac{PE} numbers and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem -and MPI program. +\ac{PE} numbers and \ac{MPI} ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem +and \ac{MPI} program. \lstinputlisting[language={C}, tabsize=2, basicstyle=\ttfamily\footnotesize] @@ -137,13 +137,13 @@ \subsubsection*{Example} \subsection{RMA Programming Models} \label{subsec:interoperability:rma} -\openshmem and MPI each define similar one-sided communication models; +\openshmem and \ac{MPI} each define similar one-sided communication models; however, a portable program should not assume interoperability between these models. For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations that operate on symmetric data with the same datatype. Access to the same symmetric -object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may -result in an undefined result. A hybrid program should avoid situations where MPI and +object with \ac{MPI} atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may +result in an undefined result. 
A hybrid program should avoid situations where \ac{MPI} and \openshmem one-sided operations perform concurrent accesses to the same memory location; otherwise, the behavior is undefined. @@ -153,11 +153,11 @@ \subsection{Communication Progress} \openshmem promises the progression of communication both with and without \openshmem calls and requires the software progress mechanism in the implementation (e.g., a progress thread) when the hardware does not provide asynchronous communication -capabilities. In MPI, however, a weak progress semantics is applied. That is, -an MPI communication call is guaranteed only to complete in finite time. For -instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI -call that internally triggers the progress of MPI, if the underlying hardware +capabilities. In \ac{MPI}, however, a weak progress semantics is applied. That is, +an \ac{MPI} communication call is guaranteed only to complete in finite time. For +instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an \ac{MPI} +call that internally triggers the progress of \ac{MPI}, if the underlying hardware does not support asynchronous communication. A hybrid program -should not assume that the \openshmem library also makes progress for MPI. -It can explicitly manage the asynchronous communication of MPI in +should not assume that the \openshmem library also makes progress for \ac{MPI}. +It can explicitly manage the asynchronous communication of \ac{MPI} in order to prevent any deadlock or performance degradation. 
From 9d1a32274820ac5aba69b6e7d1c21851cc09bdc9 Mon Sep 17 00:00:00 2001 From: Min Si Date: Fri, 6 Dec 2019 14:12:27 -0600 Subject: [PATCH 41/43] interop: add reference to section 4.1 (progress definition) --- content/interoperability.tex | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index f109b6b45..64113032d 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -153,7 +153,8 @@ \subsection{Communication Progress} \openshmem promises the progression of communication both with and without \openshmem calls and requires the software progress mechanism in the implementation (e.g., a progress thread) when the hardware does not provide asynchronous communication -capabilities. In \ac{MPI}, however, a weak progress semantics is applied. That is, +capabilities (see Section \ref{subsec:progress}). +In \ac{MPI}, however, a weak progress semantics is applied. That is, an \ac{MPI} communication call is guaranteed only to complete in finite time. 
For instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an \ac{MPI} call that internally triggers the progress of \ac{MPI}, if the underlying hardware From 9f11fd4b42904dd57ce5cce23fab2b6c0badeed3 Mon Sep 17 00:00:00 2001 From: Min Si Date: Fri, 6 Dec 2019 15:44:18 -0600 Subject: [PATCH 42/43] interop: new mapping id example based on comm_split --- content/interoperability.tex | 12 +++++++++- .../hybrid_mpi_mapping_id_shmem_comm.c | 24 +++++++++++++++++++ 2 files changed, 35 insertions(+), 1 deletion(-) create mode 100644 example_code/hybrid_mpi_mapping_id_shmem_comm.c diff --git a/content/interoperability.tex b/content/interoperability.tex index 64113032d..318c34086 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -124,7 +124,7 @@ \subsection{Mapping Process Identification Numbers} in each communication environment and manage the mapping of identifiers in the program when necessary. -\subsubsection*{Example} +\subsubsection*{Examples} \label{subsubsec:interoperability:id:example} The following example demonstrates how to manage the mapping between \openshmem \ac{PE} numbers and \ac{MPI} ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem @@ -134,6 +134,16 @@ \subsubsection*{Example} basicstyle=\ttfamily\footnotesize] {example_code/hybrid_mpi_mapping_id.c} +The following example demonstrates an alternative approach for managing the mapping +of process identification numbers in a hybrid program. The program creates a +new MPI communicator, named \VAR{shmem\_comm}, that contains all +processes in \VAR{MPI\_COMM\_WORLD} and uses the same \ac{MPI} rank and +\openshmem \ac{PE} numbering. 
+ +\lstinputlisting[language={C}, tabsize=2, + basicstyle=\ttfamily\footnotesize] + {example_code/hybrid_mpi_mapping_id_shmem_comm.c} + \subsection{RMA Programming Models} \label{subsec:interoperability:rma} diff --git a/example_code/hybrid_mpi_mapping_id_shmem_comm.c b/example_code/hybrid_mpi_mapping_id_shmem_comm.c new file mode 100644 index 000000000..cf2b86809 --- /dev/null +++ b/example_code/hybrid_mpi_mapping_id_shmem_comm.c @@ -0,0 +1,24 @@ +#include <stdio.h> +#include <shmem.h> +#include <mpi.h> + +int main(int argc, char *argv[]) +{ + MPI_Init(&argc, &argv); + shmem_init(); + + int mype = shmem_my_pe(); + + MPI_Comm shmem_comm; + MPI_Comm_split(MPI_COMM_WORLD, 0, mype, &shmem_comm); + + int myrank; + MPI_Comm_rank(shmem_comm, &myrank); + printf("PE %d's MPI rank is %d\n", mype, myrank); + + MPI_Comm_free(&shmem_comm); + shmem_finalize(); + MPI_Finalize(); + + return 0; +} From 816b271a72ad76eb00f4bc188921bea4e3769f6d Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 9 Dec 2019 10:19:45 -0600 Subject: [PATCH 43/43] interop: adjust text --- content/interoperability.tex | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/interoperability.tex b/content/interoperability.tex index 318c34086..4cec41b76 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -137,8 +137,8 @@ \subsubsection*{Examples} The following example demonstrates an alternative approach for managing the mapping of process identification numbers in a hybrid program. The program creates a new MPI communicator, named \VAR{shmem\_comm}, that contains all -processes in \VAR{MPI\_COMM\_WORLD} and uses the same \ac{MPI} rank and -\openshmem \ac{PE} numbering. +processes in \VAR{MPI\_COMM\_WORLD} and each process has the same \ac{MPI} rank +number as its \openshmem \ac{PE} number. \lstinputlisting[language={C}, tabsize=2, basicstyle=\ttfamily\footnotesize]