From 9f1e8d1cb732f6f21c787484e32df6267d832dad Mon Sep 17 00:00:00 2001 From: John Alsop Date: Mon, 26 Jan 2026 14:41:18 -0600 Subject: [PATCH 1/7] Add formal API-level memory consistency model Add a section formally defining memory ordering guaranteed by the OpenSHMEM API. This formalization section describes how a formal PL-level memory model like C++ can be extended to define the API-level memory semantics provided by OpenSHMEM. This adds clarity and reduces ambiguity about how OpenSHMEM interacts with the memory system, which is increasingly important as the API is implemented on accelerators with SW managed coherence --- content/api_mem_model.tex | 200 +++++++++++++++++ content/api_observable.tex | 436 +++++++++++++++++++++++++++++++++++++ figures/api_relations.png | Bin 0 -> 53238 bytes main_spec.tex | 3 + 4 files changed, 639 insertions(+) create mode 100644 content/api_mem_model.tex create mode 100644 content/api_observable.tex create mode 100644 figures/api_relations.png diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex new file mode 100644 index 000000000..3c79acdae --- /dev/null +++ b/content/api_mem_model.tex @@ -0,0 +1,200 @@ + +To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). + +Although \openshmem may be implemented in multiple programming languages, this section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality, and because it forms the basis for common GPU programming languages (CUDA, hip). We describe how the guarantees described here can be adapted for GPU memory models in Section~\ref{subsubsec:api_scoped}. + +\subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:api_pl_baseline} + +A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial memory value at that address (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). + + + +The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. + +Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}) perform both a read and write to memory, and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: + +\begin{itemize} + \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) + \item \textbf{Modification order (\textit{mo}):} A total ordering between all writes to the same address. + \item \textbf{Reads-from (\textit{rf}):} Relates a write \textit{W} and any read \textit{R} to the same address such that \textit{R} returns the value written by \textit{W}. + \item \textbf{Synchronizes-with (\textit{sw}):} Same as \textit{rf}, but also requires that \textit{W} and \textit{R} are atomic, \textit{W} has release or acquire-release memory order, and \textit{R} has acquire or acquire-release memory order (this relation is also called synchronization order in some contexts). + \item \textbf{Happens-before (\textit{hb}):} Transitive closure of \textit{sw} and \textit{sb}. +\end{itemize} + +Furthermore, a \textbf{data race} is defined as a relation between two accesses $A$ and $B$ where all of the following are true: +\begin{itemize} + \item $A$ and $B$ access the same address. + \item At least one access is a write. + \item At least one access is not atomic. + \item $A$ and $B$ are not related by \textit{hb}. +\end{itemize} + +Although the axiomatic model places no requirements on when each event must physically update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C++ constrains candidate execution with the following axioms\footnote{When using SC memory order, there is an additional total order over all SC atomics that must similarly be consistent with happens-before}: + +\textbf{Axioms} + +\begin{itemize} + \item $acyclic(hb)$: \textit{hb} cannot have any cycles. + \item $irreflexive(rf;hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $irreflexive(rf \cup (mo;mo;rf^-1) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. +\end{itemize} + + +Finally, the C++ specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. + +\subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overview} + +The \openshmem API-level memory model extends PL-level memory models like C++ to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, put, barrier, etc. that can execute asynchronously and involve multiple PEs. + +To describe the desired \openshmem behavior, we make the following additions to the existing PL-level model: + +\begin{enumerate} + \item\label{mcmsteps:one} \textbf{PEs and operation events}: Introduce a new event type that corresponds to an API operation and a new address property that specifies the target PE. + \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, define a set of observable memory accesses (these are not related to the initiating thread by \textit{sb}). + \item\label{mcmsteps:three} \textbf{API-level relations}: Define new API-level ordering relations over operation events and their observable accesses that will constrain \openshmem memory ordering. + \item\label{mcmsteps:four} \textbf{Fused collectives}: Fuse matched collective operation calls into a single event. + \item\label{mcmsteps:five} \textbf{Updated relations, axioms}: Update PL-level axioms to incorporate new events and relations. +\end{enumerate} + +Each of the above steps adds a piece of required support. \ref{mcmsteps:one} provides a basis for operation-level semantics across multiple PE memories. \ref{mcmsteps:two} enables asynchrony since a thread can continue execution while observable accesses execute in the background. \ref{mcmsteps:three} allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. \ref{mcmsteps:four} Supports multi-PE collective operations and establishes ordering across PEs even when there are no observable accesses. \ref{mcmsteps:five} formally updates the PL-level axioms to establish a complete API-level memory model. + +After defining how \openshmem extends the C++ memory model, we describe how the addition of scoped synchronization in GPU memory models impacts the API-level model. + +\subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} + +When defining behavior at the \openshmem API level, we must first consider two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple PEs, each of which has its own local memory space. Therefore, the PL-level definition of memory address is extended to include both the PE-local address and the PE ID. Normal memory accesses can only access addresses on the PE local to the thread that issued it, which is constant for the life of a thread. \openshmem operations may trigger accesses to the local PE’s memory and/or the memory of remote PEs. Dynamic memory accesses in an execution are only considered to access the same address if both the address and PE ID attributes match (impacting the definitions of of \textit{mo} and \textit{rf}). + +Second, since the \openshmem API does not define the underlying PL-level implementation, API operations do not resolve to a function consisting of PL-level events. Thus, these operations must be treated as a new type of primitive in the API-level memory model: the operation event. An operation event specifies its type, and each type defines the set of minimum observable accesses triggered by the operation (described next). Operation events participate in \textit{sb} and derived relations, similar to normal thread access events. Although operation events may be associated with separate memory access events (see Section~\ref{subsubsec:api_observable}), the operation events themselves do not participate directly in access-based relations like \textit{mo}, \textit{rf}, and \textit{sw}. + +\subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable} + +For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be "issued" by the associated. These accesses must occur as part of the operation, however there is no guarantee that they will be executed by the initiating thread, or that they will be the only accesses performed by the API operation implementation. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \textit{sb} relationship between observable accesses and the initiating thread of an API operation. Accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. + +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the address as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which addresses are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: + \begin{itemize} + \item $B$ takes a value derived from $A$ as an operand. + \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is dependent on all prior observable accesses. + \end{itemize} +Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). + +\input{content/api_observable} + +\subsubsection{Fused Collectives}\label{subsubsec:api_integrate} + +\openshmem defines multiple collective operations that synchronize across multiple PEs in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). +There are multiple requirements on the use of these operations. In particular, each dynamic call to a collective operation must be paired with a matching call from every other PE in the team. +The API-level memory model fuses these matching calls into a single event in the candidate execution. +This fused operation and the associated observable accesses (if any) are considered to be executed collectively by all nodes in the synchronizing set of PEs, allowing us to establish ordering relations across these PEs (defined in Section~\ref{subsubsec:api_relations}). +This does not mean the observable accesses must occur atomically, and it does not constrain how the operation is implemented (i.e., any thread or PE in the team may perform each observable access). + +\subsubsection{API-Level Relations}\label{subsubsec:api_relations} + +Section~\ref{subsubsec:api_observable} details how operations trigger memory accesses, but it does not define how the ordering of these accesses are constrained. For this, we define a set of API-level relations between normal memory accesses, API operations, and observable accesses issued by these operations. These new relations will be used to extend the relations and axioms that constrain program behavior in Section~\ref{subsubsec:api_integrate}. + + + +\subsubsubsection{Implicit Local Visibility (ilv)} +Implicit local visibility relates local observable accesses from any \openshmem operation and prior hb-ordered memory accesses from the initiating PE. This ensures, for example, that a put operation will load an up-to-date version of data from the source buffer. Specifically, $(A, B) \in ilv$ only if all of the following are true: +\begin{itemize} + \item $A$ is \textit{hb} ordered before the operation event that issues $B$. + \item $A$ and $B$ target the same PE (the PE of the caller thread). +\end{itemize} + +\subsubsubsection{Local Completion Order (lco)} +Local completion order relates local observable accesses from a blocking \openshmem operation and subsequent \textit{hb} ordered accesses and operations from the initiating PE. This ensures, for example, that a blocking put completes its accesses to the source buffer before a subsequent store from the initiating thread overwrites the data. Specifically, $(A, B) \in lco$ only if all of the following are true: +\begin{itemize} + \item $A$ is an observable access issued from a blocking operation event (i.e., an operation that does not have nbi in its name). + \item $A$ targets the PE of the operation event that issues it. + \item The operation event that issues $A$ is \textit{hb} ordered before $B$, or is \textit{hb} ordered before an operation event that issues $B$. +\end{itemize} + +\subsubsubsection{Remote Delivery Order (rdo)} + +Remote delivery order relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same PE from operations \textit{hb} ordered before and after the fence, and between \textit{hb} ordered accesses before a fence and observable accesses from operations \textit{hb} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: +\begin{itemize} + \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and $A$ and $B$ target the same PE. +\end{itemize} + +\textbf{fence-ordered operations}: blocking or nonblocking put, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. + +\subsubsubsection{Remote Completion Order (rco)} + +Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: +\begin{itemize} + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is an observable access issued by an operation that is \textit{hb} ordered before $QBop$ and $B$ is an access that is \textit{hb} ordered after $QBop$. + \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \textit{hb} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$. +\end{itemize} + +\textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. + +\subsubsubsection{API Synchronizes-With (asw)} + +API synchronizes-with is the API-level analog of the PL-level \textit{sw} relation. It relates observable accesses from legally synchronizing \openshmem operations that form a \textit{rf} relationship. Specifically, $(A, B) \in asw$ only if $(A, B) \in rf$ and any of the following are true: +\begin{itemize} + \item $A$ is an observable read or read-modify-write access issued by an \openshmem operation from any of the following categories: + atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. + \item $B$ is an observable write or read-modify-write access issued by an \openshmem operation from any of the following categories: + atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. +\end{itemize} + +\subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} + +\begin{figure}[h] +\includegraphics[width=0.75\textwidth]{figures/api_relations.png} +\centering +\caption{Example \openshmem code execution depicting observable accesses, API-level relations, and fused collective.} +\label{fig:api_execution} +\end{figure} + +Figure~\ref{fig:api_execution} illustrates how the above concepts are applied in an example \openshmem code execution. \openshmem operations are denoted by rounded boxes, and their observable accesses are shown in dotted square boxes. +Matched barrier calls from each PE are fused into a single operation event. +Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations (in addition to \textit{sb} orderings within the operation): +\begin{itemize} +\item \textit{ilv} orders the store of $a$ in PE0 (and all prior hb-ordered accesses) with the load of $a$ from the put operation. +\item \textit{lco} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) wait\_until operation. +\item \textit{rdo} orders the bulk store of a from the put operation in PE0 with the store of $b$ from the fence-separated atomic\_set operation. +\item \textit{rco} orders all \textit{hb} ordered accesses and the accesses of \textit{hb} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. +\item \textit{asw} orders the store of $b$ from the atomic\_set operation in PE0 with the consuming (\textit{rf} related) load of $b$ from the wait\_until operation in PE1. +\end{itemize} + + +\subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubsec:api_integrate} + +We leverage the newly defined observable accesses and ordering relations to define the API-level memory model, which is based on the C++ model described in Section~\ref{subsubsec:api_pl_baseline}. +We first define a new relation, API happens-before (\textit{api\_hb}), to be the transitive closure of the C++ \textit{hb} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: + +\begin{itemize} + \item $api\_hb := (hb \cup ilv \cup lco \cup rdo \cup rco \cup asw)^+$ +\end{itemize} + +In addition, an \textbf{API data race} is defined to relate any two accesses $A$ and $B$ where all of the following are true: +\begin{itemize} + \item $A$ and $B$ access the same address on the same PE. + \item At least one access is a write. + \item At least one access is not atomic. + \item $A$ and $B$ are not related by \textit{api\_hb}. +\end{itemize} + +The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: +\begin{itemize} + \item $acyclic(api\_hb)$: \textit{hb} cannot have any cycles. + \item $irreflexive(rf;api\_hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $irreflexive(rf;(sb^amo\_op)^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. +\end{itemize} + +Finally, similar to the C++ memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. + +\subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_scoped} + +Memory models for accelerators like GPUs often extend CPU-oriented memory models like C++ with the notion of synchronization scopes. Scopes are necessary to enable efficient synchronization in a system with software-managed cache coherence. A scope is a property assigned to each atomic access that corresponds to a grouping of threads in a program execution. The threads in a grouping may share a cache in the cache hierarchy, such that communication between those threads can occur more efficiently at that cache level. +When synchronizing between two threads, the synchronizing accesses must use a scope that contains both synchronizing threads; if they do not, this represents a heterogeneous race in the scoped memory model, and program behavior is undefined. + +Properly scoped synchronization is critical for GPU applications, but it is not necessary to expose it at the level of the \openshmem API. Scopes are important for communication between threads that share a cache level, but the threads participating in inter-PE communication governed by \openshmem generally do not share a cache. +Of course, implementing \openshmem operations may require synchronization and communication between threads in a PE that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. +This policy has the added benefit of retaining a single simple and portable API-level interface for \openshmem that may be applied on top of memory models that may or may not use scoped synchronization. \ No newline at end of file diff --git a/content/api_observable.tex b/content/api_observable.tex new file mode 100644 index 000000000..77ec917d9 --- /dev/null +++ b/content/api_observable.tex @@ -0,0 +1,436 @@ +\TableIndex{Per-Operation Observable Accesses} + + + +\begin{longtable}{|p{0.260\textwidth}|p{0.645\textwidth}|} +\hline +\textbf{Operation Category} & \textbf{Operation Type, Params, and Observable Accesses} +\tabularnewline\hline +%% +\multirow{6}{*}{Remote Access Routines} + & \begin{verbatim}shmem_put[_nbi](dest, source, nelems, pe): + for i in nelems: + R1 = LD source[i], my_pe + ST R1 -> dest[i], pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_p(dest, value, pe): + ST value -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_iput(dest, source, dst, sst, nelems, pe): + for i in nelems: + R1 = LD source[i*sst], my_pe + ST R1 -> dest[i*dst], pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_get[_nbi](dest, source, nelems, pe): + for i in nelems: + R1 = LD source[i], pe + ST R1 -> dest[i], my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_g(dest, value, pe): + return LD value -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_iget(dest, source, dst, sst, nelems, pe): + for i in nelems: + R1 = LD source[i*sst], pe + ST R1 -> dest[i*dst], my_pe + \end{verbatim} + \tabularnewline\hline +\multirow{21}{*}{Atomic Memory Operations} + & \begin{verbatim}shmem_atomic_fetch(source, pe): + return LD source, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_set(dest, value, pe): + ST value -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_compare_swap(dest, cond, value, pe): + R1 = LD dest, pe + if R1 == cond: + ST value -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_swap(dest, cond, value, pe): + R1 = LD dest, pe + ST value -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_inc(dest, pe): + R1 = 1 + LD dest, pe + R2 = R1 + 1 + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_inc(dest, pe): + R1 = 1 + LD dest, pe + R2 = R1 + 1 + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_add(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 + value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_add(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 + value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_and(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 & value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_and(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 & value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_or(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 | value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_or(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 | value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_xor(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 ^ value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_xor(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 ^ value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_nbi(fetch, source, pe): + R1 = LD source, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_compare_swap_nbi(fetch, dest, cond, +value, pe): + R1 = LD dest, pe + if R1 == cond: + ST value -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_swap_nbi(fetch, dest, cond, value, +pe): + R1 = LD dest, pe + ST value -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_inc_nbi(dest, pe): + R1 = 1 + LD dest, pe + R2 = R1 + 1 + ST R2 -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_add_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 + value + ST R2 -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_and_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 & value + ST R2 -> dest + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_or_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 | value + ST R2 -> dest + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_xor_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 ^ value + ST R2 -> dest + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\hline +\multirow{2}{*}{Signaling Operations} + & \begin{verbatim}shmem_put_signal[_nbi](dest, source, nelems, +sig_addr, signal, sig_op, pe): + for i in nelems: + R1 = LD source[i], my_pe + ST R1 -> dest[i], pe + switch sig_op: + case SHMEM_SIGNAL_SET: + ST signal -> sig_addr, pe + case SHMEM_SIGNAL_ADD: + R2 = R1 + signal + ST R2 -> sig_addr, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_signal_fetch(sig_addr): + return LD sig_addr, my_pe + \end{verbatim} + \tabularnewline\hline +\multirow{8}{*}{Collective Routines} + & \begin{verbatim}shmem_barrier_all(): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_sync(team): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_sync_all(): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_alltoall(team, dest, source, nelems): + chunk_size = nelems/size(team) + for pe_src in team: + for pe_dst in team + for i in chunk_size: + R1 = LD source[chunk_size*pe_dst+i], pe_src + ST R1 -> dest[chunk_size*pe_src+i], pe_dst + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_alltoalls(team, dest, source, dst, sst, +nelems): + chunk_size = nelems/size(team) + for pe_src in team: + for pe_dst in team + for i in chunk_size: + src_idx = sst*(chunk_size*pe_dst+i) + dst_idx = dst*(chunk_size*pe_src+i) + R1 = LD source[src_idx], pe_src + ST R1 -> dest[dst_idx], pe_dst + for i in nelems: + R1 = LD source[i], PE_root + for pe in team: + ST R1 -> dest[i], pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_[f]collect(team, dest, source, nelems): + for pe_src in team: + for pe_dst in team + for i in nelems: + R1 = LD source[i], pe_src + ST R1 -> dest[nelems*pe_dst+i], pe_dst + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem__reduce(team, dest, source, nreduce): + for i in nreduce: + R1 = op.init + for pe in team: + R2 = LD source[i], pe + R1 = op.func(R1, R2) + for pe in team: + ST R1 -> dest[i], p + \end{verbatim} + \tabularnewline\hline +\multirow{15}{*}{Point-to-Point \\ Synchronization Routines} + & \begin{verbatim}shmem_wait_until(ivar, cmp, cmp_value): + R1 = LD ivar, my_pe + while !cmp.test(R1, cmp_value): + R1 = LD ivar, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_all(ivars, nelems, status, +cmp, cmp_value): + for i in nelems: + R1 = LD ivars[i], my_pe + while !cmp.test(R1, cmp_value): + R1 = LD ivar, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_any(ivars, nelems, status, cmp, +cmp_value): + for i in nelems: + R1 = LD ivars[i], my_pe + if cmp.test(R1, cmp_value): + return i + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_some(ivars, nelems, indices, +status, cmp, cmp_value): + num_found = 0 + while num_found == 0: + for i in nelems: + R1 = LD ivars[i], my_pe + if cmp.test(R1, cmp_value): + ST i -> indices[num_found], my_pe + num_found += 1 + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_all_vector(ivars, nelems, status, +cmp, cmp_values): + for i in nelems: + R1 = LD ivars[i], my_pe + R2 = LD cmp_values[i], my_pe + while !cmp.test(R1, R2): + R1 = LD ivar, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_any_vector(ivars, nelems, status, +cmp, cmp_values): + for i in nelems: + R1 = LD ivars[i], my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2): + return i + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_some_vector(ivars, nelems, +indices, status, cmp, cmp_values): + num_found = 0 + while num_found == 0: + for i in nelems: + R1 = LD ivars[i], my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2): + ST i -> indices[num_found], my_pe + num_found += 1 + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test(ivar, cmp, cmp_value): + R1 = LD ivar, my_pe + return cmp.test(R1, cmp_value) ? 1 : 0 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_all(ivars, nelems, status, cmp, +cmp_value): + for i in nelems: + R1 = LD ivar, my_pe + if !cmp.test(R1, cmp_value) + return 0 + return 1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_any(ivars, nelems, status, cmp, +cmp_value): + for i in nelems: + R1 = LD ivar, my_pe + if cmp.test(R1, cmp_value) + return i + return SIZE_MAX + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_some(ivars, nelems, indices, status, cmp, +cmp_value): + num_found = 0 + for i in nelems: + R1 = LD ivar, my_pe + if cmp.test(R1, cmp_value) + return i + ST i -> indices[num_found], my_pe + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_all_vector(ivars, nelems, status, cmp, +cmp_values): + for i in nelems: + R1 = LD ivar, my_pe + R2 = LD cmp_values[i], my_pe + if !cmp.test(R1, R2) + return 0 + return 1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_any_vector(ivars, nelems, status, cmp, +cmp_values): + for i in nelems: + R1 = LD ivar, my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2) + return i + return SIZE_MAX + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_some_vector(ivars, nelems, indices, +status, cmp, cmp_values): + num_found = 0 + for i in nelems: + R1 = LD ivar, my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2) + return i + ST i -> indices[num_found], my_pe + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_signal _wait_until(sig_addr, cmp, cmp_value): + R1 = LD sig_addr, my_pe + while !cmp.test(R1, cmp_value): + R1 = LD sig_addr, my_pe + return R1 + \end{verbatim} + \tabularnewline\hline +\multirow{2}{*}{Memory Ordering Routines} + & \begin{verbatim}shmem_fence(): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_quiet(): + + \end{verbatim} + \tabularnewline\hline +\multirow{3}{*}{Distributed Locking Routines} + & \begin{verbatim}shmem_clear_lock(lock): + ST UNSET -> lock, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_set_lock(lock): + R1 = LD lock, my_pe + while R1 != UNSET: + R1 = LD lock, my_pe + ST SET -> lock, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_lock(lock): + R1 = LD lock, my_pe + if R1 == UNSET: + ST SET -> lock, my_pe + return 1 + return 0 + \end{verbatim} + \tabularnewline\hline + + +%% +%\EnvVarDecl{SHMEM\_INFO} +% & Any +% & Print helpful text about all these environment variables +% \tabularnewline\hline +\caption{Observable accesses defined for each \openshmem operation type.}\label{table:api_observable} +\end{longtable} \ No newline at end of file diff --git a/figures/api_relations.png b/figures/api_relations.png new file mode 100644 index 0000000000000000000000000000000000000000..732cb7da18878ea2fd23d54125af260fab13f04c GIT binary patch literal 53238 zcmce;cRW}B`v!g_yX-x)g(zf?jErnWB75&q*(+OSgk-0TWJ}84BrD0@iZUWuS-<<# z=R1Di$K&_!@A0U2Z@kVq&+{Djecjh}-J-NKl+NSR;v)!hUPW0!2SLun!e3KdZ1~Pk z&pRaW55_$mB{`(DpZ+_1an?pwT^2zq;t7t-u;6PvSLHkR5QMM|{fp7%QfP@F1JNo9 zvbVg9*QNvS-g?j|vpLF^mM<(gsMvJb((z3??vI4&Ag7f83gIdBItNoKky0UX>aL*{fHt%58^O z+S}W6^72BY+Dilk1uNhiPm+`I9#~h=SXo&`#>Xo;JC`nb{k_s;ZcjE(-``*AWPSD| zw!|YnIQR@XC1vFZDV>P3!u9J|hU(~aEsUJuwmlp4>J|5$_fZn!;+v}`r_V}iR$rzX zn-~9`&-!PxXJfBky*ekVp%bIX`fGc8Tz;#A%A{O3_~^(leN~jZ_;wj(|6v=`2~suA zd-Qt5-b^5^*^S=b-abdKI7~RHM`8T$itLj&;nA)~J$lsoW2E?TQW9&?>F;4pc9}Z) z2M^vXEk9WO!py>Q1|g=Rig22&QJIiAJ-XL3@OR;6K9+?jW{F~QrT+Qj(=!k+b5gss z9KqJ@d&Lr^x7N3(bn&@dva#JUrt!vHBf#3x{mSxc`o=?5l|EqB0r( zOdnqpCeqi}mzd3|nZbg=k|tt-GlIO_QQ*Ub=#U!q}SIdiSJDQ=c9Q`hBmf__WiuZQ+f1NUV~%XrAPYr2`q|f z|8rVZu6g|#6XUWsAogCWeqpRVeCE13!Hl@Std1r+j~BBEXveKw{2KmF_5K<75|8H5 ztmZLB?Y@er_%>rr6#5g5oc}uHAC`j^;}(xz5f8WuvQ#mpY5?m0qD&Gm+#54GP^&^qa{H0O;o8S&mvX7B{z0nGP*bG?|pV*E)F2C^>!QOj&;x}dXG8$X@z$1?)| zJDu6sv)tn7j>9_#mz(3rSZhyR_g`6{2zs9n6MvSg376Kv!QpXQTGZ30)IH8TSFT*? z?(4g5VZrK)k6gHTvBx@s@;3gKiJxh=r}lE@<;71|35PplJL#{g2gA#gdijaluC1(G z)6pSaSX|WB*N=pArs50XL)dvLa%II9HqrCEJo!%=k*s!}F-j{d+HzD^XRr40_sGMb z&(dNoxACZTwYTFwG&eVAq2$#M!eR2$vOQgh=HHy0J)z`C!b1`g6N5i}(#dV8s34)@ zHzp7GcDz#-*imWKiy%G?jG8jXLcX}j1wOvG5*Fhh-z}E4+)K$|lqrtvg@tw)h+`CP z%OhWBvP=xgQzr?JjD|>dPmSiE0}mG9ZMv%W%s-5!vv;98^kK72NGCjs*Pt2~Nk~cx zIZr2C9)TQ>cT*XAVL+@zf}TBlCL7WK$D_Tg3lo8;R^hW_ze~Je5l$);V7}2hRW^Vx zBa=@XeHvST)BPMeNnCz4zJIC)q%o1g5&H-XRwOhu)O4{kwspYpt#fiU*+Mt{j=mb5 zh{1lE$)gB#wG^c3goH&zFcBfz^OKgW5Z$7pqJomTdUM~sqj~r49U9Mne|NZL`+;Nr*0@bJg>zR| z7ptHk83hGJqD8%|lT!)t<@^^fnr(%G7e!s^aq@;DGcxFrpmE1e2y9QQ{^rB_kOAyi z5)u-wix)`(_;`60+}#Dp@v#yU6BE+YqaQtbL}Fqc!6aWaWV@L@d!(4C5(6PS{5&13 z-x;Lv)hp5o^uWl~N5O57glEt2^M<%$ZZGNT>YBl9C~IhRuD_j)`~3OV=PzGw#BGsB zcbth;3`sQ!pkZcab`(TRzkN(|q(E;Gd3)=74$1NO`wt&(JXPa(^5n?{cJ>bIO4U1e zXkZ^gF8s0e(Zb4#J&s8x?xkvKr8-=M%Piey!`XT+ALbB3Ap7q6BcZAi`ZB|@69gsGmf1$7&e_Swo~`FXUJ{ht_8C`2 zpBGyDku#)o(z#9@w^{mNiL7S4biB%j;BARs@e6f6NEEH^5(XudU%!5}b#duB*k0DY zb0-RR{)MYoDQ>(>lzMPYd_?~v5luw>;pee~wn=TNV?Nidq(M@$a2vUy7V40+t%NwfX7}g+8~GbSKj}@BUy0y54xM@X+wkP+X(;J3Bie{g0d- z9Tf}=Xa}2&7Z&4X`WaaW+!VKKCY_V2w;=yR^x`@?>Jyn&o|q0J;u>FLqmsen*?6bk ztzyAw+kOd!}xCIv#73~kj2ZZ|mY@k9mr#)xq;lL-I zwd!{iTPcmY3^_G2E)M%oU4e+p4DXaD1Xw_3C}=WxDt4ZeDZ8NB_Ge_8*1DDn<5bFeKV%30maK?u0X>24pnC1k z6z#!_T^Ib+OiWeK!#VqPSeCU6;%uGg&rTE}8E(NHKW0i$Z7&aWWr*Edi{q^B%7k&) zXi-s7sWKY=Cu(OOui4<@;k7LFBms)cFi`rGzBR6*`tt$2R|m#wGDcO>(4xyj?ru&c z6?9Kxbnpi&F3{}eal8#?PQ_`ECLnjeMb2_$l|tXuep;(X_y zW}agr`%I-3N2fkL4QtSi3mMbvs_(eU;!8$-pECIZdQ9Cxu4gf5V)44xd^XnG+q=8B z*P=Lzh|GqQj^8NkK4x?L34G(TEEOq-}6&&Xx?eU z*(Xmj>AfkVvBt;8ySuxux$tA{06Mu{+0agj70btc>C$ySKgkatKIDwpqmgZH&J>Cf zW-&1uEG(?Gi%5`^@9qWo3Nk+=y+K6vxZ;MY>UdjJ{9;woojtA!F&6z#(+%4v zlUoYPvoY(KUz}e{*yXM*;g;+3!c&`5E*Xw-z#4lDDo|5l;o!763g_kLW4pPz$@xRo zM0v^JECSJvYWLkY&Q>JIxU8!zV23By>2P=b@7aM&$imHiqm~g*vR|z;*S~H$tt%bUoY8JcP243Y|S^X-3!0+~}w%45a>WqfQR##)*YU z-WAVe(~m&u1u+H}3ITw+6@8%yp+{%M+grrzlTx*p3~NlM-#>uSC;kE0v^MpQ3<``y zAGt`EyqqC_z{ez~^QtN;Ej(rQ0=6EmQEgl_gI|@R9>R*@H=Be~^@@>3R zutq`KOoMN(P7zE&Wc+`w*M^$$%j-L=N3Vx$FMDp>{g^I7RAnF<8<7K< z%BAsu1EL`N4g{p8qp$qzG#xF$ijmTT8$N!a=Q>Tllb%s_^AyR>{9Tag20Qw3W~S3H zLy1mN`MqV0kJ&ONfYP`4p#&kqiftPl8hWTHeQ-A0f0qVU98JL2K2b>N4=SfAE~wwl zj9Kg(SBdTjhN5pglc8kzJ9?AQh=?k6#-Y!jL(3Zf;G>rk(u)RVA1!f?m78JuQZw_} zIXJYv(k(JoX~Q(Hn@ zuV25OXyFZsksi^=8!A^1^-4}DEuNa$)o^xN_2z5%PG8_4**0HgO_+W4-uCH4=Sz{y z#V<8k%dh%U-@zx>zO%{|47(w0>^wZtg6Vy0xqMEr33n|oEh)!sTSM@z@jmj?=1!ifu;{ugE`8}ztc2HQ&Jz8?dss5}$B!Qimt8?h zb8>>DBy-BkaA?F0%G$7AzI+)O6(tMD6p9c`k9+IWb*Uw1kowb;LnP?ky=6c_UKG9( zi~p0m@gx>S4Z8h$?TWibv&c1_=j-y&{B*eAakD@J1!YeKZ3rhOCfJ;)`0ifQ*QfsU z=@T&}C5Ih>5X@qo$9iiUkxY!|tbeK#2cLdZfK(i#)T8chWqJvZbM#$!tJr164Yca& z>T4H6YvW>Bj*}&_UO{PnqGKi)vA({Zp;!DBaXmOVkdw-`JKOwy$EQaOzJkdv(Ywcy zr1D8IQl>mL^Opb62bg1xQ7^@AjSWfPhsDAs)|#HtKYrNriHkpTu(dT!55EL6iP_xI z*?Hqf@vB#%YD?GdWQ~pK6H2Gpk>C5ewLKdGI8$yK=CiiCT8NOa@=3e*(X#r{A9s4r z`o`W={?yMnq$gEfR08*(*xi>^P&fu}q%KMU#oHTzAIY2M6NYk>kN2u7}7qW9XQDNCxIJ%W&-b)h4g zgo=vOyA&eWSd|TtZ|!TSL~?YBLc1>e|GB^P?VI_Z-`_Q)_)gwMmcb`&L|E$rPXo-( zu}(Vd3}>H^BQS&uMVyw(V_ct1-fb79q^913?VbU^@5Ft`{*CaslExFe*F{D1A9{Pk z9N7>)cJ?qq+E{pxvqI7m?+zpgQmuZ>-H)}Y-Q?GF-+!G(aQcf)RnRZ@o(0rJh2YFW4V;;p zi>5CM3RLlF#P_c27!x|51GbJeZ^Capzs`_ejnGCQZd3JLpgWPY!?#K`hBY%%zrmHl z@)o2wIj5KI-LL(F(RSm(?x=~cru?;g{EJTsm}SBhsS?b%f`9&OZ>AZqsTZR z*wn@VSsFQTnflz*D;C%g`OzFwGtTFml0Wnb)sN%&8>}B z$z2$**;f42lf>ziVsa)(=Ja6TS|A7Mv8sHo%*{TT0HrVy$a-AKH{f8c%{G}7;snVY zf7e8UXaud5f8f#JqpYtqGe18O0VOZ7z!mI}mJMhAQeqpTNmMMS88D?N)Gbp;;b_(l#J7#IYd z#7t-g{2~ht4}V@%gjd#d@_ukfg4!)V<$PoK>gS@H5NP9Er)z42>reiy>c6q47|dqJ z3bM)a#$Fk?)wI75oyfZe1?d*>BIXP2QNn3$C?kQC+mCS1XN;Nz8m)91H~PH;5;r`Q zVQHo8%t{gI!tIaNf+B$@erA`OYFF-y?`|?{~*WJBxiT=dx+^w;+Z(x`r7viph+Sq^2+zJH-a>-Qoa@2>Z75heFT+t zNC^iE9GRY;9$*pN#pzbMpIp4DrK=mZv}BnZH|uTF6~~Zh!EmsW7mpyP2id1R!siVw zE!o`N-Pw#bD=I3+DlBlMPxdW(Dt^{)$8Fx-cXs$KlMPpslar5~^Tj=XzJoK{FBlTT zdfk3;<_s&C$}vX7X&z(FWfJAvMNttmso|A}>i= zujSj3j55#8bRw;q)Q?$vl_Q%rd(~L6dxUjT9uxnx6ru%ybv~q}S(urJhXBATg>~xt zbh+GwqqPE)*!G9s+uvHuD{3#%jR9Ulc>vX!$n}{9Isz&_%VNAOpMXP~-`Buh(!#1JH`bG3B$@1Lp?o8VSpTB z;lu)!yp#m6M;?CpqL-^tFl_d7rh&1SPiJuDl*{&<+V8+g|08KzavQpf@XmWs0Cl`j zNrDVm?y+tV(Wbg2*Q9^MgcbWR>cpikem2f-DXL5OlfI%{{cSHV(cO*d=u8PuA4yYC zbOc13285o=Oo~E92Jb0HPwE+)Clkqj5+(qb3&d`a5krNW`fIb@)CX#anW*5-W(C>2 zL$B+D`|}Y@t)`dfx*xNOx-a3rfB$|A>dgRW*fLMW?r~W6XW(UOM8IZBlnPBSbMqY3 z*U}2Q>hR^ll`AC4$;pm_Tcm(=2O{I_)_(j5hOz_jeDLO`E2MfxCnu*l(KvhxZVWxZ zT;&1BJ^*YNlW!Y8G%hQUxq!#q9v!1UI5sj8hQ{a5h1zJM&?~)*J`sVM1ImXPR8MQUq(F8ktE*G=L-{5PwHDxSX(xqX#gLXDpyG7G9}^4YpwH|{ zWEXXvL*QPMSA~V;ibtN8mlLQcZ73kTG-OO7;eEp|OlNC>K|6qwHd ziWwzn)6&w`=0o>ili<!`%G3{k_EQJN_Jt7HMxl6SkPiuJS)`RI`Jpl z?HH`_GG8nOaw{qxJ8`rx$0$R#XP)Uay{wFnCkZ7EH#p1iq6h_ zMnox0Ne6(k=y-lTdd$8qh0h2dC}}v)gRj-(rY>-E<3nj5G45C+;jtAfGJ_}xzV3je z3lC&lzIE%?92{O?0)X0mo`<((L@tD#Fnhe=c{xoD6tLg7%cxKi5s15C3|5!pKRXB- zDWJ+sVCLfEgYtlHCh+uF`Nj>>%6Qn6TE!!{PK^X&O*SYD~1I`>X2M0Fn1lZ+p zORuA?uJN0=TPpA*;2Q~#{){$ZBd-nKJc2oT0kt|*kBVky%ySo_=apgOz@h^UNgo%; z7Anfm$497`M*@NhoI%L5a&Q1qT^Upmm!@M%ZEfu}J2b?-DlUEuf(-CtFw63AYU}Io z>d&PUH#Nrwk-Rs&GxGVfD-ybFEkyh7$rJKisGD7V8nOh_)YgjVM^B$2bEp!`Aq%`A zP7qYta`Sb6B84I`Y`#fX7PwS{68U6lLSbk=)Hz* zg@gxQ78XW2xVYG8(g>eOgHTwzyX&D*HhRzg%`?nq;4IPdrMRfbOfv~+o@EdxLif7H z?{#F;#45JI5S4#mi>Ic7mRzrRa*lDopE&TFq+4M7_KH8drI+GTV1mp^7sY8MKR^H5 z1FUwcHwJylPo9()5Ujgn#rj>)yLF4Lh>GXp#fo=?hM(=P+6d8(-;7W%8=ah@GORWz znZ#5&nP6vUmn*j|gQHSj;C}a%3CA&^m$JclAgj~uI6Do*-k#-_5!il0_+7m;gYiOM za}pu`vym>7@9uG;s7(dK^yXX7pT@hl_L9N65E%5eV(c9S|IVhI6AJt=>s{x zblB(Kn&Z!PifU6*DF4r}TsH}3;Z>UE#@EOKXRgT(X@w|CbHPX*lY)NU7_}85tnyyi z@bFwPNrG)r6Sep!L{K`tOL}jfw8>`I_z;$l+4ZxnP@=Fj7d_V@H(}(s4EY+(T0?b2 zpvlzngcq4-;I}oBKuTC9g#ds{!as{^Ev>973c$p1jm*pv0`Yq3Luv{NtOjGE-vj*? zsrM6l3$g?75wDfph%t_b0OJ(%T6+1L)?6ox(QKjHhN?N>2Ees{W1ycEOhHN-4CO=B zi1W)K+j5ZgYG^@t(tqm=g&9QS=g-fqWb3zU8V{}Rc;IwlHUkAf@W$Z$?*Zdf%+tm( z!~0JYuGIh-skwoo67vrHnrHjDF)`;}muNlXd#E`9!r{tCK<1;{-Z@ryjQF=A&e3_1m|3fjZ+kVP0jxEvYb{$Y=Zo zwttroo3@zC*%$bHc;L%-@9vDX#?5x*G&C?Q4`eHW)F=!|9JI?j!BdbW&E!=6`4_;? zf@+YUQ@%ECm|!(@-LjxyC@L$%QApBY^j;##T_4PqZ*^LzyDy&kg7}oVA4TzK&eg^a zJm@tUM9wr@}KoglA6JujsmLuRiVt4YptfQyL zQMBwd+vwkIfq(GT9&q6}M-bkvjwJ5(_IQR6%bP}K75br*yJvKN%Xl{SW(@4VGX#SG z%IDhy8o)I&q>WX4I?n*R`i^wZ(98fRk^pQvfp*i@(rOm2>wrTUHi;Qq!2YYJudky$ ziY%;06Uvb`;LdWixXOX&$~jFHlu-Gg?e@EC*wol>>w3CFRByfac4!xCN_sj0cvK+X z!V1s=my3CVu^J7B1rpiC-Z6gSnR^QXO!%S zmrs5d$Xu6|eYm*~1}iNCgGa#OwmJ&8SqiAUubz&|9=fzwt7$$9_#OpA8R8Fa6!P%% zciq&K!a@L92d%6;?uaJ}*qiob-#L0Z1DxWEmoHT&6hFWWUU6GcNbeG-2f*j3+x)9@ zFJm9^c;li;om-cj?8O=Xa98__t&oQ&5y?+U^D^5v24*}mz7m1uhfH9L%I!;YI*a4; zabm`G!dG+K@azQZ%-1e$nian< zS{|XfPG^<+f!q^CGIWZTJbunxD!R>`Os`<5tf#As_wy3Ju&|!{xl^(D*jS1<8vK(0 zz$l@AiEj|DS{EO*&n(j89XVhI5fBh0)i<^`)PI#qB{-crIuP^0x1zP$1a}ySJ&Kj4 z)lf$S1V|Gyh+BG+tOBb2JTj6CBi1J)e!l(V#}Dj`$X{JDC))0T3(iv@!kgXA%X3_; zahlZ9(J@adxdGH0Ak)UN>Z=LPK*ZD@lQ-PFw~79fd-PiKKAfdDK@i z(Qa{h+6qu6$`TuwQB_7jZ2^!9^k?BVJTCxI`@54a+P7|5ZchNx0c>#2>%iry z*528<0OW$#kD>xF!P&g^^%q4CaeGIHN;s8ThA0V8c#4LGbU+%9&0~Roq*Y(uhH-SY ztSJsE$w|rh{Co~D4u*%vvPhYQZs_UB_YixNX0u!FX?k#;NGrB#{Y(!WSX-A^C)oe^ zB+tL@xI=5kP?Ok()^lyF1^BlZDFxTvwL2O`w@__Djv?J{Nl+R%b*jwo;wSem^$Tsx zG#J)40kzVzWj^!t!36LzRG=eBmcpj&xkncVq~82{N0O%7S>=8Q?&)2oWkA$z%(aBH zdIK@bu<4D=d8~~wzvI8+;x{lI<$tMrFt)pRIo{>jTbx(`FmfXKcjNgvIhDTT+`k~- z;Bq3hJ(8onvG>`~Yugih_G>TIOPwBePR@H09&2;`SyH`>d@b0q8?7gamQ2~AbEZ8M z*COlE8IoI!al}HDKqQF6+hgt8!D(ycI9gtTamcw+!+1YO%P1}E?E;;Ym`fUWQuugP zYL9YlIp!S015KPr*k4D#Qk#JIJK1ZpNslP0si6ei?samoJkBv5yW}7L0#d`%K5I4( z4npsmOEir{()$RE=o)0a*49?jHwN0q#&PVbDHk|6I*Z?9C*NR)t&5U0RqE37@0PRL zAQA$F1WvP~?EzFgotnD4t5pClOS7$oj>L`c@KK`ob4{GQCXMuT!j9K$ZMhKB-L>)k zr6f%dZo)y+LUydd zSA>O?)$PaUi$@DoO{PS9Phqb@{TPBnBv*8+)&7k+>~O-a_MRe`sN)7h!d@^7jSaKG zcte0e#+TJ`bmWDzk1aJoA{{jI#v#>mw{FCK=aE~s%t4Rl7&ow~3sIa|$e*syK5dI^ z4=sSTfVEoyvBdG}<2z6c&IBAfA+CohK`1Ro#n zu|n0PDYZkc0lfIxXo%rYB|Q0jYlkWM?&21YJOYZ5UeZeo*BG>raMK^@Lke@~%$z)r?q2SUJDftaVh0n`CCq)h9y=cY%sQUo7|REvE193PIP+}GR<@ph0O5 z=xLZo&XeowYy6tK^fhDY_gT`I4?nEZ2OO0pE;6X=xrR54zg@f9;gI#KDr7ygi&9B8 z|E^?(MxNjRorkgf4D8@A3v2>Dqm~&N!zVXhE}alEN^+%LSSkdr7I#^6d+MuKUz7{o z;|)8Yy?|cMD=3I?VhJUd)zl;kIGmgX!L>r&m{~x86haYN#{xzDqWJQqOHP|AoPySv z=Oe}p?l_H=V}Ro7_-0hh`xhIWy>g2#!VF1ook~IuWGRz+qUM!3P{o8OgIuNr+YpUD zYBe<7*{4S~Ij0%(_md0 zI>zzV;%)Ae!{VW_@er)p>Rk5DLrhVZ7vN~5l8GwWE$oPtZPRfd zE18)jgOIbB)Ahxd#i++?oMYjv3oflLH$}dBdN@ku*y)3m*x#{gjY`+2QkPdQ-NYobKWUW6!$fhJa;dHZZ^7GOflQ}%1PGs$G*w2GY-9EejvC~sdn@2H}zY|1Z^10zg#7~|R34ppCip;C(V7`#!!qpmM&p95w8 z&F}W3JPMpOa25$K*Xdx>^0ofQUc@qbdc`;F9I)z)faT5tjD~Ii@RBZAU%GV3*4DNe zY$g>@@q&wk30{27jMe6s461)r%~o6rC8hzi{qCrX2E0=6s-9p3z50y!{tbq zo2{K4NpR2a-@j!!ySmNKdOTzfMSzTMQ)2RO*CV4sbxu6$`T2Rd%K&~gZr#33(6WR& zb*5l~*Dh{Q9bYz;?EXS3XPFPA_urDcHi)Cu3pFG|=171_Z>0D(Sg;civ+oUYDh~<< zFE270WV=1JWjV*3rIgmm{Hmnny1o6Y%gw_7AEu{QlW%iaLcKUGY(!eS&T4}ykfJRpD}3KMeHL$bg@X}L5tl+ISnll|vR)iyBrz^H+{C5XLB zvGU~C&pc?RaL>tc!-WzR1A#nVSX?asAKxm#N~9Uww_u5B$-qL%wzAPh5Qr)IFLHCA zK~M@Dn0CooUr0lZ({M$r-v1iKtb&7HH_i8mLJDzQZ5Cz07rYTEL^w zNFcAQ9I5s8{&yW{I6#`g342|MOztS^<(_y<>idhs@QQaprckEm%FoZAu*6yY@k8tO z?Fc}~DpH8#>Cv_$1v2-&AR7#g4S38aJ*Mk&``7`vpNRTl!#Fvb+~B&H~|p zmgq*FSEE}a9q5B1xUwBX%3{5@Rvfy2NnVw3HCAAlQ>ZFo`fI z3IFF_zKc(f^Ff1D&d+4ltCNv-U?l`v127&vqf}rN>k}p+b%jUf19p3npO5xfhz6cW z{^iEA1O8m}qXw>9%zE>k6^q_6-F3zj)3 z$xR``IwqUK7Ds0ZNNXGf43w#mqxLZ@Y>2=R{PFKPZT03}%YsNEZ2|KB3|=|79d zfO_)xMVqF4C(J9ZmFQKCkb<&!{``3eNH*!%y8o0Y#ZUkd1$R*(8L=Ye`39Fz%VYcI^h||gP;TVU&x&95Q<1gO{y6rcb ztKIS{z8SOYy~$8A9Jm@9#~5biiTV)yn|M6@XQN7N_~ur7rE2fT%C|qgz}sjc`B^qE zbUVJkKB7S~>$Ha9)Zx3_cCs;Y4H^TUO3hP}tKoM(sd!?J^u875In`31>33Yqw`lE z7=`6;kfR*~-*_4#sma*x?rAclUMZdzg176*QjFBx{wKW>a`u8mQ`)`nE3r{$^(0}4 zWn@ zYncf9J&5Ocl?G!VOoRB0=*g@|Qx*R7&_WOi5+JjK^Hi$EqioY?)^lN1RSLO)M?Dlc z4ZR6uxp_+SFcy~p8;VSoNSE!&91`S9sZp<0kf^j^vqKdy9EXeJ`k4efG zvL?s4k4yqne>WAb5iu*1VujW}+BSkN0W#;uXJgD|&NosJ8)Xv*&W+0EH8`@O%{^J> zqkzUrH*QqD!%dD#z06c=shL2M;8SXL-#R37H+Xbn~A8yr9ylIxTc>qBks#u|rQq z=)sN^MMvlK3)5nHT9FS27Ld3Dm{6$~r41uFX(W7e@ZU#u3i`a;H2gEe6!4Bg&+2sl z`jtk*XFv06mdjEP-@oKMKNzMSS%w#xqy16`6whZsXcEWg?$G$Ru^wAc%DkDtjzHE* zhxff)XAgZ=%tfTH5c$o>Os4t6RkRg3TYzlZRrwo2YimWPd{#?<)vQbblgZd;?OZu4 z%ifZ=Gmqd$^}j!ppcRcEiNJWJo7n;(MTx7(jj+))F&37BGw8F*5Z~anY zI1Zj*3wgX)umBKmUwG-I{hpTW;&t{rL$Aa0E!bmCi?DgXSb##!3<2~bV(y{v z5x@!t+3>_vc%sLH2SdDea*;PV)Sk{&2^NB_{pH_lmnu-1_9zol__~NX?v^2Xm&kO; zRA}qTs|$bZ0h}}av1Sb=7P;cFI|p4G#+{IGUBFv`{8Q|c*ACLt(|Z{yUES~91J}Bi zp&`#_3Ucymxmr2TpUb)PGWUy`HUzJ1RllZYZkGY6cNsMS{FCKX8gz=D9MRkUzoc1> z!ob~e8_>rSQc}X>;u??bRpv-eAAY@`mJ>SvdrwqSg9R=6;Vi=WiLN$p%4o%g=aCz$C$+jBk|1inPxe^3`0spPM zoE)3e!zItALk37VzkQ@wSXjoa;YV_LLpiTst9^Q)-W|koO6Gdx;&=(R0`yX=Tm#v+ zxh(M50ci$P@|bnYpPg0qQ&NFKrc<6l*Mt)GP~y`T??{}Vk6h_fpFZuzC^rPzV!Z7g zKtfd4MD@SL8#iu@$&1A;);@biJ1-iyD4NvMjyh-$z}0yd41>O}LVGM1pnw&PRbO^sd%dL@*Duh#&nqTK%# zw9+wp9zKvNQ3t8rT@8)6zKZ^gF1HL~OG*lg3Uy(~%s~Gdm)WF8;+URo6a+G!z2lDW zBp6KtouFkS28zx0ZOPpdNzx{yJNDzVyipVS%iorl!555bhdnT2W@<~pdkZ<*U={`e zFEN>ypFh6jSNuR6|4uQW_+PxD)x2Jax&Two9YC+cdEjE1ql63fa z+HoY-BDfkxp+d(rWhX?N&dbqtH+um8nHooKzxHlfLxT~n=(xrHXzTa|t)u;-=!tF9 zWkDWJXmzfi^>fqPPvt}%;=ogpfH}j9Q9TsT6m`-g*NR7OSnEZ1baXI-L4=rL;D);T z8wmn2@EX8;mMDT$2ODirW~hBeZ3FViKX4D+1?fum&d0-%vH*7RKjHt)MFOU~@x|Z> zC~*vgNa}ZjmPi7?`1xHb6F+7<))O0OI&e(BxjF@@(G)5qXjOA0b4yyNiPk!u#rrEk zf2~fI8Y$hI;A7bt6KfaBklj4ZyoUZ&YomF=C=mfasf-|F0q;e^lWX#_7DzG3TL}0f z6PyxJLkcpq#p~wUv4ANgZHh@2)R!ixRNDvZEpKGR0HPU!pl%%m3{peRIX%8hkCj3v zXZ^NgT@UxS77piwt8)glON<-!Cv#{=w_XcXDtIz*yPzAB;H@|CwHFMpUMSvC z6gJJ7iw6#k{is)3(Jd-Io~r*~qwZ(uq@RZ%aCn7u=T7&*&Niul+6ODrt?qwOVleVz z2@O_!SCUz1Ud*O9D#Bxf08JV@(NO&YPRCDj9<&xJ=`b`oYNoG ztw~D=vN{c*sRBl;w|&U=oECn?>hNX zcxtiH`Z`2F8u>liQchs=(tiR z453*tj;oA~tBi?F^+|AF^S#^nWz_#XlFk}D^rC@^f3~=N4e#yTZo1Mu{59JRHQfBw za6`!MXUDfl)A{}0(Nfm0>Gy2w6(;3L%@3Pm&YpGK+(X^Q6H6Q$jFg(U%peJw{#xn* zeeBGiE`c-6;Dk+M#dFK-Fqc(S#5P}?2Tsr>!bc7shjsz7@=Vcek;# zvQQ4HI?h%+)5#yOMUWF1tPH~YBrn>^;p$r6X61c%XMZuC0Ap(6jl;S%9rSC5jZc6% z00)sgT&t4v1^i zCw9LtyVVS7rwqZuw+2q>i`#say3sx@tmqyixZq3|lKgy?>SU4S-1CaZ(4&n{i>VI` zNrKZ|_f5(&L4PQ0S@puPh9r%S16GmNbz*oR$D#SEP^uYFm1(?QTAQq_1I8TnrTJR3#oVt;o zmP5>3n|!OO^A-p)U?f$4WVrn;I7%O=FGJ%9@R_a8p8t=a9*SoSt;}!l_9@y* z{+*|yZY=p4nU4yxeDHy~?f~X^yscgjdT~9fVip*TC?n#!EQd(N<*$0L_36`#8}A=9 z)M%*Gm2K|qWq4r-KY9`kV1_tie=S!r&n%s7VsbL9i7m6|2I*eXs)PkRy4AOM_`4`1 z$Le>9*R}kIetZfTM%$c5si>qUZ|dq=4Sy>HnX}S!u^QA8H)u-+@plp0k!n#beisGf;UY*eUM5PqZ7^YLy4!-%Y2MVBwV!Zt1!GaSn!w20J#3kAv;@27CF5b*< zHm{fiPs9FP2+?Iw-*<`QZ*nJVaVL+xaiBsc7x?_Vl9E`dkGX%Mr^|QwE6A>Vvp$kZ zKD4eml^M}ji22@YVBlMpcVaH$8#%g-O7@f}n8bs&TmHZURiyc6;HhuTolWS&z(i1G zt|n^#A&}6(noizVy?y((>LXTchg8+N=IZ~_n3Z>Z0yo|*Z?+QfBM77aZ|z)ZtxI=n ziDAv*z>)-a!MHDoh>RA^ZRKz!_U!fRYljdpqEBKOp{ z{Ld@hyqV;BiXV0cE%?CynXj46M-yIdGr+ieD<^Qg5?)lb6nc~*(8e*TCkgWwH{;#6 z=8OPjqb5Eu3?e9)6(%6N52y_&c-fhw%q!4ogf`yQOfWZ2>VnV=3r=0Wx&uTj>LLK4 zG3-5v$tof&vtbROArGaswas)Xqn}DmN#VGU$(?-TKizh`zJ+t&T^gz0C(MdL499Y$ zVg;p%-E(tJc2{U4b69Wwk7nSu)TAVvU+R@Z>a}F(3+Q=wwERIkDkg@0bLUsh-V{}N znOn}5e!d$RziYj=IuR>5MW|dN%a`ICoEvYO_)`ERt-MdP^%}Hy^ELftLGLWf?+qL7 z`CA~9%0n0WYds#cQY^YzAP0mC>?<$SsBx!ya8OI1m(p)%R_f?OG{^qq#K2-+?$8QofXM4T#o}v9Vzz z_SenLpZEDkmR)Qto1d$T>6pF#PZ&GXH@AM|W~zb~$WC+bLkF%>4e6v&Yh4$F4H2Ck+l=N3}AxB-IGt4bEX zfA6E8B)L@mwjpWwYL;1XW+murd}3mEo(45T-(xvOReV-AZjT#Nzs3(YY~M@1W#dGv z8hB~V978%1yeG7~S*xfJxc9fSqlRPXlZ4D`33fX=zLP`cqFWP^{80CR6fy68kG>0} z&dK_?xJRt+^~>!|-n_%2AW8kLu3>17AoS69sia)G)8KPeG_DQ${`tgGyuLth`}F&# z^%4MkiKS-Vy!7xIklLe1dqhwq4W5GzIL;)XN`YdICn0n|Ce13YPooSb3QB%s-ZiPR zTimEK7VP9|X#&^X9hH8@Hss1XZ%m5#T-fb=9R43=TERlwd%@w+&)!Tx71rDJ$Q*Q_ ziXn60)&Q|+C)Dd%SFj@JnBbwv`9C_QkJIP?#wtbXs7=N{L0VnpH>lBDopde;`PngB z2M2ldY^kLM8`XK>wV2<$Nrqa`VEgQn-@m(_E_NPa_W0sznJMX=2wk8;LPF)bf6wQ; zVIbt4-gjWkp-zB7CL#Ii+kZfX!!_2ydAWytuy1i)pw>qHbTk+w~i33==y|o zm!lhkd;AT83E*r}>3kym&#-{H7%;Rw;zyYqzhMcIKfgXc{q_I$LbWbig0%tp+X{u0 zL*GI@+6+ZA3mK{_n!hm;i~^r3#oHUkPVB{u8vWAJCA|gqAe^~QPftT*G+WREPtSY& z^_yoz02OWgvO0_wkjTr;eK=X?$#1w$`QH^MPDk7oi|B^&_iHJhYg@aW4ewgh?$j;N zOl6gWirZrWb~QTJU4B)`{H>oJokE1apH+r240Knxz6T-)_O2ndKsNlJ*6fq=Ah2_i^Xpmc+DgQNli(jg%!sOT1?Bm@CP zQW{jcQ94HHn*aUa{{HKnb~1POI&oLNrW;JJ#*y z&^*V0I{or=IXB)Qn#F$m3-$bZY~v-&w8nVG%NqywiCa@N5ElgPB&A+gf$ z^9pmChX?hy&YnFRV^&7Oz)L79kIy2#x2gaa1jQ->z^2G)YtwlH@S`)5S%Oy3vZKB& zMeup=+~X0>^rWN^ux*pbgoK2^H}Kg`zZP7G#xwin3NwxAN};=ZLq#rI`yXn2pO1A1 z8Euc07lj;e{UJ!Vn*9bJvx+~3o7knxv><0F2Gt8xX(++ig};2~O7qp7yQ;e#IGU7b2`M-9fxkkJ8oC^hIhpx#vzEBWS+K$uSPoGdv7H z2sq}Z&>{d0;cI>;$jRyKR~9OQ;4o~wA~V8v`LSm-EaRbi`N z`TPX;2Jeo&6?cV5_-FB*d7Ss@_comYtHD5?pIi_yl;1!`c8lm}dtT5=TF{74tRgK)x!NT%Q((=|0{kms~f8yM=^Qh!2zJ^dWD zpAKeg#hWHw85{ds@=7XV*wSP)%lojSq&) zgyqv~!Y>ATI{4i;WNBRE^QIU=DszRMZ1;Wd?>X9z*=`z~@SIOw!8Q=i1pemZb)9#Frx>6%;V|@RJS%nQ9p@mlOJxH!S#yl(*uV5eg6I4MsY z*v_T&?*VT{ohWalINI`s@adnF!uCR6S+?-(ncb$e_1*t?slCCSDDbyzcuuO_<+si_ zSYYtwdH9IuzQl!B-~2-sf1NTt?ZE6t{q@!8!`Dti+rborl#QqZ4q9@eopMgIhWdI8 zX4H36AMk~E`>M;RZJ5(N4&|jb246WV=mJ357DS(oBGmHqd&$VCR_(o6d9n4i=2;FI z7p4%94QMiZCFPCuW>iGWy5}vNPxKv%!oSnkC@lRCQ(*S4L({hddwcmsUu6QSuR@jJ z-cheefA;KtJagV*ohzM!|B2N9Z=3ImROj;EmrsKUyb#W4d{%c=;{6|Tr?2GD5$5Jm`$ibD&R@Aii&27;pFf7Zv1J$~Z_yfJlnY`mlI>6U%W zBhr1cdQpt%V=WC)_HrYBc~sd4a;N_k&QH3hwmyU@`+tpjXOPF9^b_2WaWS? z;gErV7?Dv?NRN#Se@~tqgD3E#GK*IbY%3&JKjC9AC0<(#VU$!?r;pR)fydnRe&4w1 z_+#@%+pGx+T~)Kyx4YlO1SO6)$b4izCqqxZw57{mR!&r5MHt0QD1DJgPTh8vA%2uc%jDf8MA8g$uuawdKTnUfA3u}fz8)GIB2s$ zu?PrH0CGLGJA#3EJTsqa@qf}iC{ivDMrmI9V?01N^sCR3gPl*x^~k*{Z|e)n;E|Eb z31@jQ%Z9vhAX5R+dD43~SJ!a>UJ+nW8!%4f5YqO;a~jHR(7HGdUdXq1^5fo9%1@gY z=^(GOiauCrp-CK$y0iZzcxX9j3==tx0=r4a<9DOF0(Hj>fsA|ED#m4~BCmA+rU92aM1i&n2ZvK-0bMQ*WkA?NKJUl`7@*9x^0^Bg5 zbH~F#mxIZIkAy@B8aqhF@OC6>De+g*@35#{O7P;#tEwg2R=tOH$W-ZzwNP7ya@p?b zcSUOkP0-5Z$NljIxdwqR>CGKxu=U~57rwrA{@u%$!BXF?L7JooN`hPFvD?ZzfXJ-jOb0S3?J$8|c1N zR4=9Lt;MeN-yFX0;LVM&Fn8i96QP+f!LLGb8D=VD;3 zkrp0OREqiB6AWmwZYpZPIZ>_dm1&qQ4VKzIm3?`GaQc>){Kxcet9o6BYR)~|A|1!^ zPk1etmSfV>FOtx2k0*@v?F6hpSu`b+5%*_+#josH1b_NJ4kmF$C7`L~7zg6r>l<%k zo<423gO7m=ml7~|LNO`nm#^skIZeWDpqLO;EAij*b&BNs^&5M$PYJmap?i1K+%RM_Yz7se0Kx zb`gQ7x*3)?wUm<Ty;MGo31c4^OS@;4^%0le*LmY zlkiY~O-@d(te_A$R1#qV6R({Zxsl)=0>i*bwEcyaFFzQ5L0%u4Sq0nugZ(iaGipL|tb1=RKjmx{qu0pm zRn{fiiDDqjEt*^&ieb4{n@RjIw<;Ij4Z&Rq2ni=nI$Sz+c-8(&bg`ihS(*(Q=2Hs!)nHD6! z7a_SIl%Hh5-@i87tyAgxrv0~sv!MINryg(pA1&u#SNZypecZZqs5dl9vhRKiTKl8Z zCsO4xtS@jk7Dh6kc6^@GD>eKt;@!f#$$G@LCC4v;}lqnWZlM{(Ivm$_ZL3rPyN0 zmz1v)Bd@g#fY~B|>4m}k+1PSUUl%_?M7MefFcrhi^wM~@&OEeM|l!Y#nXTd z3g`=q-wwVz7o`7t4!XNp!u4BMLhA~($yM1u+gU}x%iJ9Emw{SSvj`<09W`|zsF*+B zFD<@Kg_`E7s)69EL_8!B@29w}XdEorCU5uefd|UK>nBawpc424U{!_A!N4oyp<;7V zMBaR(TVz6rOm7`58j+SSNcd~`;3NT06*vtzjJAEcJb$$@fL8)MlmVcJEBpEmxnRJr z4vVoe>;Q%qh2af>r`CLDw-h#%a3T|1_veQ$DL541c&l!nl>U8i@ClqaRp#NbfoYn< zEz0vumUqEY983@Of~W&`J(LZvbtX{BuoUc`z&(wc>gPIh16(yAuY)}_wDF7E4G?q{ z{OWjg{Ptrt%%F2sCye+Ag$z6lx<0V56}Bq~q}f}0S}iHwyWiyhB1Am^N+jsL!vJ#l z{YQZz-*#Uryf6#CJ5V)3(*(@iAii5P_REE{go=vFA#@IUps`Z*&qG6rq0|r8zPIdA zYnrNGX5LDhI?xR$54ZcjNi~+S5gD@wTp3q(wc-OByg~&%ZvXA)A68C-;y^F}gTV>e zdZ!wi3Y!1}9^YOH*2mnwh09P^W$6HnGA9?8Go}I>2_PImYag`B>$X7-0G?*bsF@W7 z=&57G!1es{#S2HJ94*)H(%^Wib3=6I3DipOya+l3aGP0@GnYZp7T)5jXUAlpfQ-!8 zuTr`a4_@3bSd9C>*exsQ)q_1d=z;fUXU$_+??hHQjiPNhjt7fn&)N^RmGI@m#L zuD5s9NM~AgUwVUo&6dmd4BLU)&==OXciw);&OW2UjPo}?4L@JEjbAMqSp66pGifl7 zEuX?5v-CzGp0M8Huk2`_raM*Nf35K7CB6Y{jC7mNI%Dom_EQj+uaNEKFDeD@AA!b7 z9G}1U6=<~kwG=M-7zZ}O2c(+zo2<}rvJN8&!y`(hWI$K`yH#87+x^n(M&g=|-AM7U zk2ek*Meo6w%P#)+O9$;&ZG;%@9{H3NMt>j3ip8m+{>{ACWS?VJXS7mI(Qbdp1u(Yv z!e?UavNpf#Z!IJxB#7;`J;%9NYj}I_2CU#M?io@eQ@ejXcIGlQG%C0y?Y(g;p7i6w zLWi4vOk4mppPXgFH~sM9BS)rH8Y53ZyCOKzNoJaeg zUiX`2u>?#_V_)q>y7({$NeMZIr;!0nB%Dm|=rP6yxyYQhFtOG~_oOiY4}+u+w(Lb} z3I`OUNSf=J;HpmdX%*JX@NPU)XsA_+@|*bckYIN(JTo;%En~~8xPaWyVa2au<*i^J zazBn4Vi#Htv`!N#ue_DO_Ia!ghRO7TS+(a^+X9>45;OA`##wAX`$6DWN@Y7hrHG95 z03UJDIB>Fx{^#*ofB>4W9B{F7@3qG3m2L|Fkz|Y!dkBL?mUMvJGrwC$N+YPpmV6J~ z0gqNtN;>QW%c}GM(W)+F$$tRch9ntx3{}vK*W6%lD$V!Sb4%_WM_=Lvg$G^_*@a=>OXaL$@-B z(+XrLQ11v5IS_`a?1W`bQ$qtuyzcaab{3yixrws3;lE3&J_5pq6c?gzytqjFV$^pz)VDoXy!+fkx9f+4xetCEy*_dd`0YVdz*@f}l4(g!g!u?%85WTY$|z>s zl?(6(xlxk26`F%*wxB@wT}*TI1L0vA!&I$RaP}+0%;YNAn-H+G2q|5;@*Mn`V9NG5 zFXl*C+g|``U_Cu1XsCyWQy)p~l-`Z7dwegy>=eb5zyuISUK@N2ZH{BJZd9vP)dMmFT?euCCHsUUAaeZt ztM?AN3Dap$+h=0G4?k3(tD?NbMT}mxr6r1_%z#6UWs>oh9}{^>mh?bTd)8&`cVuPU ziUg;fLwZdBC^{7*Na~nMKjOGU9Zq4hx?`bm3dj<)M8SF_&;Rq7-;#)Bae|MLDWA^d z&@`7pIruxT4Nk9py-+|b=K>0AL{SAC3exwBifYHbnG6$r2<53dcfXBexKU;qKRwsr zhuD=s!IcrChd+^TduX}2?Ljy~xLteHS?ISjmy3z-)-AD!Y~SVJwfT`p8CAOJZ&Fe3 z??_h09>Wfp2QY|y2FB?CS8g+N7Z7^A%_bZLoYB^Pf#qedM19`<%h7N#l0aPVP_A=w z{UwEn{;twgQB6DL$$PzC*1vw2Tkg_QjZsk^xv54U)$x(Y(o#clL&~?7{aT0Hv6L@O zfxGXprRiB(dQC^4X5QN^4)^I&31v5+7W(nOqX`6G1a#Adm3U-8MJpIKC$L(xc`Ew% z{c4On6$||`xFJJiEc$GH8xH~r6M_u-b;fN4RM-&r8de7WOA=C>8{ zmZ!SU+|V(Voc^|)!5?8&vqJamuWy)5q%V0)5Q#?D0nUHVVLIJbuBj#H;y+?41Q0(KKM3ZxoeHW9^~UW&pJGfN$7*M7h`(&mu9?Ck?%;wVN)Fr`~z zf98)IH(QivL-}0`3$DN36|RT^nD`N65H@K*e$>Aw+pj$W<{I0lO%UK<^sgZTRub^i z@ZEhqXeI!t-U0g`Z-9KTAOFwMi_Y{sf#;SFm|FV!M|kzir%LobHTNz!7rl&Z9*`Hca4rl!RF|BWM4)DHO=;kI z&`+i}?aXFfBXkbobvJeDy*%tS%W?2?_U?2#|4@1tmA~#-Ot_MkktYN-WXTXm&Sq{k zSh8-u-*gM<%s4%!Nj52?BxyCx{TA~H5gtPBl1ZmdfAuTf(*bp%HerxP1 zskzvJYG~`0Tt9Mv1#i6y*U953+u<({$yPL+9u^-k+mKEj-EQHT+}#JA5j>)GL>__W zlwSC5=}AwY;svTPVXr`2&-g9DWS>IX%NZHNJsW`=^_jxhTMJ$LO6~z>VDoISQ9=F;qjagOF(B+M zgwx7W#Ns^+$0*pPHEriBpMd2;zRz_lxh8@u(|7M5nToXTG4*EIzn5+la7hOaKXk<^ z@7jK)0X&gki4W>(_MZmetW;{#cVxfh#S9qgSthk=w;K2Y4rb=gh9mDi{=WX!mvkv9 zIFmyin5=@%QCT@;b>~mXeNgfOL+0VZ-M*+tzo8NUelAlGoxzdHD3y7#=^wJhE7RB& zHr2fWhWQ{8ho7^8XAaa#GAR%KmKzSV^iJ?p6Q-R~I_EgY7!v1_F7j{+De1tr0n5Wli)+do9=ggTQ;fUC31}z{pADw4 zr4optBxj`K(K$V=e*OAWv(wP`LUmg>vyr&*I5jSSRwVqzMGg)Ec((C({6BMXa;gaM z!;#?-KhX_r>dVLVq^^8?HxAG%O-;?y1XV!fKt#97p6zibF%UQhjXMAw7?MJsek9tn z2|CEy0rgBuR~gVK+0NqTH2j~S=KL@as1gzrO_qm0z%B-KbaA|P{3YH40R}#KIRUpK z{Qb8Fy5X(Sp;cM}39#@q0e8W#+n=)Q2%8*^iJB18?(8f5HNI7=I{-}YY#czjO6aU!`=|Xj&G<;D*jU-)Bz_;rG{H) z@$$Qb&j>jZPe*gCA;3R$s|pl>{m?*499=*#mG+f9a5aQ?b>}#GHcBClBy!L+S7|J* z>D0me)c7jY2u;SbJ{)?pH)#2C@iEcEKSX)5Cw-FmZ$6XRXxdku_woIptkdaq&tF!h zIjjGB&Cki%clA};Vjx9JYz0w`pM}SP(WbOTxy-5M(cVS%vnG4nQSNn)?-fFG9(1(y z0p5c7yybHA7*~vNSH@6ax{X7v71n!UQRETaBlm)aBxvxYc1CRZ^bcIOdVTbPVMBct z*@q`Ge2{@LVY4zTJd@^AJ7Gk}p}gh7v@c^sRu*Zbs!AHd_EDtcs)Kk_zMWL(ovPcQ zKn^MGjo5OTwsm?hNbQ|=!JL0`#k^wmVE66AjU0oub`V6^{CdS)R;!>{_9pO)Wjp%+ zWj>ZP^Ji)#TiWf#+Tkpj?b2_vEaQLob@bM6j*!l0oJ-G%G>De~5Zs#Rg>yr5RWF)n zpd_9;Pzg0?$UtS zbEO-w8-`gKs2pb)q8n?u;IwG(`!Q;ne1G5*9AcoPJyhKZvSuF;Af7#c-g$f2aDQG~ zq;)SeiFvYfpvt2_bEf$(*5l0faR6^XmkKBz&h>)L&Pco1enaJu(=j?>bK=*&x{6Bd zM=h*)-m_x_G%(e}Hl(GpH$n=^5}?95AsDrM z2QV#&QTH9su3@D7@?tiN%s+fCL(V_{Y3G@R&06p}eS=>fOE^q!Vy;D}I{k-$O%9V? zwD=BJ(W-vuM_3HOXch!Um#d9}zqc5E?D5RM^jL5Ihs#V8WH)r=qcm7*IF~jX74F`t zBK2oohKAFl&VZ_~eI??18B0+c>SG_hN;tfMHrc=;Ul z6V%uZ`#TATmai4%R#%T0TVz(R$WB|y%$p{8S2$&s6A#o|hty8JVm?rRO>Ifokm4w< z5Z3wZLsuYo(4Vc!P{2Po@orvS4wrK179CR^tiRh1e|K$=n4vk;l99xT1Hk#s1_u-5 z{zj=IlYpoBM4Uh72~(+@M+sJWr4}cR83&XE=MVt|z$~V=yS(S#(eyNtoWipyAM$wC z(%G4mm)_#U1m_DladM=Xgt}v#gQ`1ZuSBrsN_ubD8m9=Ev_fqR`Rc04gO0`P)YN78?pYkPX_uPwiH zj7T3rI>l~mMBvhK4mt=N6V+w0E9-2TXvY}eS3Sv5PW~{BG}8$TMN=3?=LnFe_;>KEilj| zeIpCu=6X$^r)@@dtpTx;;E8B!Q&W+WU~3%oXt|lgRo@5IVUV@IDoiDI z%7hM=@zdD;$S=1+FN{=?P`LU+*c`Um+SUerp|Yw|k)ZoDI|*P0^%gZx4}P603aO52 zQ?k26dD_F$8I=1r_lx~kVdg;IXbYFId)|`R!Sap5KBrs#rK<4d@C#;=?Sv(qpXGwa z_qm~~dr^-}l_grIA+@d!_PpGSS0QvH4|?`sF(71MV4#=r`>0D>ICBy5Y#?taz$ zZ=sD48dT6%skbCsNo^+teHCKAXhggI1j_7$-RMUKf86dE7L@M_z_4ilV)F6ZMVnW# znJH4k0Zy+M2_wv=#j2oeTvKm4W)C3=9Di+dEy$fV4z5Btl^@CQ`K?lB-7I-2168~g zBTMvm6xHwL-zg?JLKrb>p4Cu8g1y2*%GjNG1;7z?Ap1Hub(J`Fc5d!7{2BNr!R@J{ zCGB4(!`2zde(=5s`C3x)xY~{Jaul*_fC!3t`M{Lin@rif@SU|l1vpRfmzBZyh_&hx zb6?a!vc=h@r7Hkm2HzY(;Ah#Ae@I9)A+-x~Mm|9YfdV9TqoW4sb@UQHs_`?)O}d7xC& z4?>jdQZ(n6d7E{NZz}&(`Y$@6D!1b~$=r>C&@z}x(e$z~ryiq~@)Lq7-`Fr@>Bpm> zt!I=FB*#`yEg4IgAQ3;OZd>M&Rrl`u)yEd7>16|ahb?I&!traTo?`gfVMaA30$c}> znpkf$PyPXeQ_$8$&>anUJi;o%rHlB%H|Lnlr53qx~ z_%|n61rA1|l%JM2mt}CXIp?T>4#z>*pz^#8ZI{tfoK(EZUX!4QCY>dmp7kYvuypX6q6D`VENca9W1{ETf+Av|Wqr3eKypw#=D7Bl-?wy3txN1n zQb9)bAu3*cYhh7%S0OJ&Va}Jm(5r;o;X7kbLY%Td$ zAu4T}a{ZIVPv8YhQHHl7vkhD8y$PptivDcb#vO#I-Z?^y6hn_y=1lY0w*6| zboF8Y*Y#&+E@YOf(7gfMBqs+)zuuRDxR{tvK>vW*3V?71eTk0eF*^&q8(ha40O_ZTAg-+$v)c zwXm;Fb-3-ge*pyYdX*&7?CsnA3UsMy}KPJaR*Jq!Qkj}{Pf_;>)na|R4x8<^h7x;jUPg| zCy@h;f5#r77RZ}Md6+ytJs>P$>=^gw(U)gt%b<+`>P&qt?8n&68Cx8M4Cdv&7OgmK z4@TzWiz*Z3sCe@k)M$r+=l}B(l&8@D(3$k7_fxB$(eQT4&vn?O0)6GLqBC8dEuY^B z1U3JSfryvLI%Ti<)hX>o%qQoAd11-*_D!*|+P*b?01^CoT?M&zaZ=xuen?&OVQ~{( zW|TUHajYrF-P0~83ZXiK(krOMCQVHcg{1UAHI9ul*Dn;1EqLQ z_Cm^7u{$%lha@|R9@v4jTqcx696{@+va1P8Wy<=o6CUoWgCr9);LCa<@%Q5*liqzD zbX&}yV#w9VJP|CtomHj2zRi*_}l;y-vH!fb^pFe zs^s$iibU{GxI&YuGmc7p(S{MSV#IBX;+sYz8lF!DXO7uaIkDDQ!B2YbulHSY7?r%6 zRObzb1F(4`iNX8Q(xP^jK1c)rO6Yh4onbv}+llsM`hWZY3LWCjL%@?$UdJ1szkx0OjkU9W4m-32DO{ z7LrJz67mIwH{xNyIK*%M_2dRc`5{BUz(C5X@gFeOCuNv!WrX^y9P=64Ej%&SK5Y$R<==_>^;VkF!ACym(tDX*_O*`{% zX^>d?h#xftM*eUx-ITQ3dkoo?laxRG`nj(uffPrCc!DUM=IWz?hloc3 zZluC{`A8}G?K)|x`d_S%*-9X@3$zWPm4@MzX<`f|GpSO|<* zgX2a1RXeZ<0SJN8uDp|#k(~*RI~0T@4uX0Zr9f<~lq!KUA%`UDpuwF)VKQJyJAy%! z)}nSyK;p8IUnLE&W>$e@)t%P_P7TX|%uIbnDKE8MX8QypSq5+Z~p-R4h1c1Kn)_OM_%Vdlqd1ReUc!QaGbS2M{ivrW>4^H+H%Q*%@SJ%`-b zl4n2X#X)Kn0?UnSc99h%vh>hRF5ZS=3C?2>9gqXB3(`I&;ITj(eLM^dXZNaGNqmM; z;GugsC5F(1v-0NtA67>mF*T8Ssg1}2`Ju?BkC%g+I|o=p6$Q|6GyygsG6jK4XqZtk zP0UHYV#(HH%u#A*Hb)lc$n7l-K;huw;c~)pS!7S>TY%pgYI;CxLqD^q?nhyG)5q}3 zXMd4;z!4x&=!6Rm0|N=%+F(+;V+>8`nM9sKuY5jYIT%vC>Q*Z1JQ0hxMR*pZh+GU!Tk{9UE&6n(39oX^hCI| zC^%I}=;SO;&-Zr=H{$Gp#p^J54tfZ6GnxJ;)-1VzEn++&mH97n)9k#m^ABgh(si+> z#UJJ^OvXqaBpwF3Y}k}dZ^(RtDc}Td_V>o6$WfDV4FQPKWb<%mmJpy)U0SW}+G?YcvA2iv zKwM@ugG`GN3=NTy2a}+fY0VHfo0JV$ithSH)rW|cObc2Aq0fJCcccuG{bl3>^pq36DihQi&dEgOc z>LzMP-+T5Tuvx3|mb)J27&-Y2q(Gtc5J(Cvxm{1zliJ~AquU9r`I5iMYDRBEjt9ID za|RRQL^62mK$in{MdjB+)!yZfi}3hz_5qX>nOiIV8enhql!46=US^NJgy;qX$}tTv z?p&A9MGxT=_yJlW4R4U?OCn6T6HFd35?ws?Ei>>9;UP>Dm*(M#fK0*D-rA4+MKUH< z?+QZg_5)v@ew>K+N^A}l14GMU9F%mE(=go+-ie*Fd+ge`Z=;gpqN!0hz+Q&OJ_iy7 z_xWetfF42G7eBux8tjZ?#a3s{R*5)On%4@ml+uAPcwBAc^%vbn9tyy5G1C-|i=fHW zT@qn0bwjaT!OJykYp$|F@5r*%UWmegyaIp=plurWrn5k;0L)C_u*9$39a-V9`?8)1 z=}rMwlQDX5jeq@jm3T?^DXy)NJnIu3EGnlifaeyfDluw3<0~`3MlX^W)$(a))vKQT zt)p!T6=jGPy}PCMyDFJU-1VxnKMKTv!Uo(DTdrp?pPOv|{)xb~27cS-DyN(bbahkt zZ5z_>KF~f_QDY^P1R8w^)eB^V`Ty^uoc+5HU<kh2vN%?grjClR1;&eFB_g?&V>BgqDJIvlb0$ zbGN_V-iv-2E@jqy(9{!ri1M0pWFC|&JLgJU8+kg~(6X2A@fCtcuq&oae|Q4z-fNVW z)l2~|$}W)M!~w0&lH1ZNf#lNkMFaVHnPZm>D)tXH)8on}C=3_IR$dw3IpBq96mnz@ zeb;G*rkxJA@{fmAC!hk?T9Xznde@_;gw&si(Wz@oPM15%W8Iu6WXiMewojKkyOmF4 zBz0ZAuG*7uQGyHgf4A5S&P1-xE0))fguB;W46R*ie0QawlC9)IjnbZ7xzUfU?l^b( zX~i)hF=evB6|< zlv1w&TZ)b}BE@bYa zg}hlR0}!w5yM8}7`15NuB3YxBT8#{UilGD2)}*|scd~+^-fZK|F`wwsgP&6VVjC8D zwJS)=MI_?fQ=OromDF4UGGhEBM{(m@m3g!--JZ^s?tMf^)2?56$ zFVd1O9aRX8j`fzzNn4P491@}mVMAI|N;!)US;v29iTW*Bh-Nk=?5wQrc-&#CbR=BT zE_l8_FU(ZgAK6!1VsdTYquIXZ($U}W+UK_O%%P0R@rKRK#LadDG{*?-rmjdnDx-8| z^>A2fdIoCh@39^&?!FF+nHrrEKip_qYwViv-r7ttvfhr8>)tOnz3(qtE}GpLk0WhH z<*M@L>e9^NDNd4!>&)JQ$9q@m-csMLuvr_=EU!``n)n`ep8nO?&+mBbSZOGJ#7&+^ zu;x%yT1zVThLQH=LFAWT#9oQ?1Ai&3W>q!i*_|`ZAO$^hU{Z4gltjrpA}-eB}Rd8{EO{Iih}-<+n6&D-|vCkH$P zx|Rh99-G8O;azi~;k*5Nu=x$3x=w8DXQ=ByUkTN9K;BxIbV^f6>2bSo+QoRT4~a{y zt*s>=9$#pn0aOrTUZJXvs`zvV&J?^7I)z3LzwCOf`L_+6X~JDXa9m>S{eN{Bb~ZAm zmgRYHH`!_&007iq?!XGQr64GLhkVYdji1p?rjQ@52kx-4`S5lfAY^oD<0&S0j~@-> zpaz67ZIZzLsL|JibGF}O{=V_yDZ!w|ZSI6lK^MYeDgeYaelM?gE+!*`_st&**poCO zNI}_yt4HiN-Z!8~J@;0gGxA3jN=QPQpk4-E(C3_7YKhC%9|~5MX8))Rx3J1+fWP3& zBSKORH;~ss=_gLy0lIL;f+7NtZ3jxZ#BXHE|MfRN97>T3s8fE=}I9t&eEa;D5o$W-hVs6R}^fX5r5)C8V`q^g*Y5RrxJ zFSJOFc6q)Tn$jf#ws9(OiWt~u?d~=Lj@TIDJmSPX`0}R?Hl)%^9V2|1P# zoVv=Ys^Q9}&jL`}EEeFm$=J02Cpv0KYbfg1NIL0nt!4U98Z1r!O;o1O8yH?R% zY{v`^2G&Vv^2mb_(7t;84#d?JT*TPUzt&{9k6rKV{VzyH2pUUtn`1QO>@4SE3p%c4 zsL5^ESA+<)Ukh8+z}$>N>9h^J2tj*^5|1IN?~_wnA1x4EZ@oTVy0tU&CO@AHiPXU= zh4kSd4s&&LyM*h-KXv$}I)56x_{X#HFTIu}@n2ZCoSfW$OY@u76$qu6Gytl00Hk?? zMFRz(MlcE=1ExmqCoSMe$%ksky6h{$2&0F_r(dWy-FgR5AEB~>>LuMdBd21hg~e6MxxdBP}!&FlNwq&MuSi|98fr&oFdoR)Om~2?$HGR5Mc{#_qwvuXnr& z(fVzm5S68_7L`vz4&mAmQlJS*elkK2)~t}#1(+a4sb+u^M>s$PyR@aAB|99FWFPu8 z5=?FCKrM-)`CwSqYk-x58s!&(n``VQz-j@FVg#Niks-n;p}!y1k<#nz=F18IM7Ywu zesIy4kuz=w#J2N3IR9lX4+ep?GQIrAc=%$_W{2e)cKX;=@XJ^w3bFaw)&-GNIR8AR zyL1Ppp^}#K7Jn^Qah0*Oj%aj9w7`0~{L+V*&Fn0KfP+)x>OA17kvT;rUG%WLL^I9O zd3NT8G{i-ogc%lMpwf{?gApbogMkaz;BCz6TdG;#>rDaX)a*6*Jb(g)l5WJ|iTmzt za1-t`gAE*_5dZJM7Eu94KU`s|mCVL7-`YSf!Hdeg&naG%y(yp5lj>sz$AI z{q|7+2s9~7ZpBC4oIQN0`-DYBrw(>knn0X5z?FGNOA8M)n?TPkv+jqC8cjpPXy_o5 zL6h0Gq|eIMwjL5G>+TFe)Ewm6DS=RMXu5AJrDyTTC>RhLK#IxfHu@O8l;%)q_ngty z>??OOD$kQU=QILhS|BS9+|S&E*u#?B_P~h>1$Y_+1sb7XtHXC!l$U>WbH(n~t&f_TGuaA|-_ zGfO?94iB?f6n>Kv2zyOaiQYpO7V`;6iUV8);7eJ$^{H7Ga&$2$gD=+g!|q#H-BMq= zLC>t(`PF-=wTst?8MO916+S!?KNRgn8AJLh1AHCO7kvbpVK8$JNlrq^kkeF*nwpw( z2|p|}GyuTTK$Twc+mt99D1a$D^|&dV+mVN+l-f||w~s2^{CK$e_G9Gt7IXe2JtvLH zD$33_=^~-ot51M(Tgi1<-1NxQnW9OeKmeg8A_SCe{M4B^UwpatJpLRyoC#f0FaLP5EPrzCtPKJJ&m zoMBfClk!98yn~L7G+I3hfs3(d)$TMlPW+ma`>yp2=zS U46Y%A?ZV}VjoJyR|h@! z_&gA>2M+`15{}^vlw~X-C`gU0>O4H}FPDS98XyD;zy&kh?}2=$D-aWDYKa&hi zpaue}_qr*_AXJ79HGDqiG^Cb`Wv~_`gHrYzwxAm{r@`XdSh<9I`ARV0hj$`c)**D}zeK-ljiJ6e1om^i;3N`oNuS(1_j~mKtRjaq2D_|27*mMLL@)F z)uYOH$$7hF&%%jS@tB4~A0|_=J#x>~) zx9ck26Dac(he6O1QvE`ymS~A(U6fpM6{AbIR?8)+Y^ppW_wgo*z~Z#qI7Hb$m6aU5 z{jQWFL}Bk9ZoR<&>3hl*IdqkZePio}OEtpDKyk+fq0fLuJaQ)v41^ z?mzqOGIET8FCG#~fR~|?$gHU}c$KKdvLlVse$0EN8cHfhNqM%}5Ff=D_&J^-r=VqG z%|04<_|?XECnd^`imyTv)upa6nik8qTX|Ea(NbL7tZF9&+t)WW zA%F{L<*}`xLMNlu`K>@)zspQLc^F{MF2AgD9_*K)EOg0orh0m5`~yEBG>$-Vih&j4 z=44OO|6bp7VvKC$`qIc({-v|yXEd0e{}q2P9eq|`QnR}(0qT7J=mzSP&|Tz4{``p0 zw$u>8GX%~|w+S7c_f)#%HxmQfL=}*z2|nEZ)MYD@Q&gm-Gar=xQup4y^E~D8adBsO zWLy9U0i-$;Fpb1voz13!trV9}g%p^0wFUWPv!s7_vM2z<7}kG;R3yieqN!g}eL0Ia zrb+O(-&zhU-hPUrU%rg8#1mApZWjAp<+578@N55-m+D#ks2|*m!TKg!5x=fZHrf2> z>r;J1X<%xaqmf5+c3Fw#Y9DnjhFUzYi4pWdk!Igd_P?{e;xunI0)8Z zC)36ty3x#HRbc-iEQ!}Sp?yOMPCB1qTg%}FD@H%a$6j<89v>~eR#Ns zjrLCs*sm#Gb&8K!2*t_ zzX=_rB{Sde3&Z9H$S1)qTBD(?yhbbJbPj)W*&*}m`CsZsEg3zY>}POd-q|UE;XhHm zIZD0pheJ{Q-?HL-K-tgtG~AU1#x9EdL({bXK_%t>Rzr2gfC9hBo|#@zaHwVjssU~; zsJTs3M>!^Pg7h!)Yd~y`t2u#iA^Wqy5w)sjGr72U3tBb$vI0 z^6+uJ-hqt(^#$P`0sW-^R3@ry_i2;-*o-bIGV z@qE}a+gV(*G&hH=des{<1&1ewh+&XRjha7>iH*z5B*)bCTwZ?fu`)`T%JB!m&oD^k z6{0Hc6%O0>{x)e14UHy1qjI_(9G=gcnyY6wuJV)$f*cKnVCQowAdq=m8_RtL+v3my z&>zaHtDgXIDg0s`BIK&(=08K#;;ilMX)VV1^eYqWHW)##rU;~Zgmqot>k&VY#0Jwi zQX2zR+zAK@$e9Q#zNN?F{KJO-N&}gGg+!R9K5mDaNrXe^bK^!SAj!ggKq>CPjVxLWA8{l>po=>|DCWj%*Dk? z`?`6h@>>Cm4Z^F%!mr{YMN-ZSr#@wHCCz8PZbTjVq3Tge`Y8`?L0LQd6t(=ID4I+7 zsroO!b2TRCxvHx5YrU9Ll0Es^1ve|*B%NC|XFnS#>o0RV_ekfTGzHVIO=H_q4t%wH z0-#R_X5F7vzHztsXoOMMRiPsNYT=0Mb_W6WPD+gLj-QzF9}@*cnY|@zV&K}Yh`I*ii)y_qy_P-pt)Ao_b@EPl;0rNlOU;Qa+AI$ohy|5auxx`^k zQOj5neBDlemX)?pDV5g-*FE{+#_HD0j08>Q=}SHe%gF z8SCOWRS~a*h_~=6UV?2z5&rsljG2UJ4b6cJ* zdUtoZUFTkRq>y2@KKfIvxNoSqg`Kf4&AEO%x7B*7_T@cV!8@SfL|_W&Z$AMKI_eWa zb&E~6PVEKh!>Mj6H_jcek6eAi12`PaFlbZ*=%)$28u%PXBaY3ljVmVbr_PMz^2poN z`V+4?*;VpjPA&=0J9#zu_^e)~#f-~%I~>7ed>eafHMbeW%Y4CNC-B|QGM@Q*^5xy5 zv8j|hm#IICy6Ud)vrb`iSoet?s@G{@igViEaU*`jZ_I~bGdKM}wJ&MlfjjT8a^hUL z>-7yBuCKsQ*~LY0^ufZ#0Vbu&Ksokqj&BUjr!CaNmkL%JP91!B;_vG4toWKq+64K0 zhk?dk+DG!r0@Lq2IK4K-n-bwF32pBBzt4rA4J%5D)PlEyS?z^TnLz zSe@W;dQQt#Y^UgxD~<2_hZ^OVo=^GDE#CYY#BkMM2`y_jOP`%KM7{tJTsao3Ko$b9u`F1z?i zsMS~FF8V8Qzdjn^EWQ#F5*Am2^&fV4@DDPN6pLfPr-Z_D7k#?G$5Cfdf+t%?o}N1$ zn>s-rr@l6NysCUsLVNE*Rzuzd?$~itT-k1AUi$27581Ex-hJm5x#0QH5_789;(R!D z-GKki?>tsLG^wYo)%hsp5Uu|CbecwLCm zJMsZ8Q9SU=S?fQ|N1FDT`&NgR{V=lF{S-_=8eHXL@gU1A35yOko(5h*SFu; zx*JXzuXups1qq;-)8_nPN>l5>58xSfY!eOmX(?Vf?MkF}am*Ym-A0NC!5i4oFD?l(7HZ!*2>#jkIdcYvwdrv%-8_a=9be)fd&i~-kY#R?qd7BNut!tc@WN6z z{qVu+m%E-lj(o*EH(qQsGQ8PZ_#im()t`ScBii)2#o%2I0snVVtq)FOsNrRdCBIFWuvkBK9e&QPay76w02)l6x7sem< z%7ROC>Gpu>`qn^e|K!HF;>vZimoJU}Uuj<+&DHz1yGf-|nUbLpB_w5*S$#-~G80MW zp+X56B2vbphzt=TA^J!vLzzQnlFWP~rI13I&b9mg-nHI$t#j5oXC2EQ#mC;y9-im9 z?(4el`*yY9{vtL&EPu&HL&ZX$^r7j3*r#7_@3UNp?dzk?QYa%4s#b{*g)nu=x=d~! z9%-N+2$zWuVm5kY86p!pLe_Pdj?q}~kfcFY(z?a;vC&h?a{F)N0J5W8D(azYKFt)q zD6v#(5>2>KR!e;uN zku~K|W{I3%fFPyeu%g-#)qg9XfiHjXqb#h^U#D>v!KBwd!T1_Rm@E-_zZf@4T$QZJcX}e8JS>VlJ#Z_}+A56%tR& zk%q{(<|^WP=-`;0`NkqHFU`p1qSGUgSmt8#M8Ef>B33K1#l}Lf zw4`JTnjB%#MBiuZi3a^^=tdLQPG)_*@DjA>kc;dTm=vJ|7M*SXF1B$}PIsxYe8_>y zz2A=(Ot|gn0DPwKBxx(bVKHY&exly=OhNYvG@dOr-wgi`9vtowEwO2vERojws3gd~ z-eq<`e=mz?$=I4Jk6s&@DD3cYU21Yye<~uvEMvA+#$xksPp))p_EfgR4v6vuOl9r& zv**L)_1SG`Y=W<)XpM7>tt6+8k5v4 z8y5%)9Fu5WF(2~xlgUm}=+b9V)H8J9C3^MIXSQ>=zDTeaKH;Y{0`I~4V>wkpBCS9B zsxOxB7Iyt-w6M$k5Vni6#+PB|{`~!W4eFi-et*GlHsC192?CuUiwYYUvgm}fOV-%q zGf3J{^YD3e@lH#0@ZOBqRr(PT+qvUERvtJm8u?m#coE&8*05_46%N&stk`o1=6 zm@_0UZP%VsiU?77YL_a`&fp@ut?j)n{X3=sJX;jY!m~Zhz+D+)caoVYNJh;26kmKt zM`H8*RXgG(N7UF#-?`(T+oEFdXgx=IT5{e4Bf6x{tszmKi7lsebMa;lAUUDv@bIwm@>)4+7NV#m-b#B>;JFxlp72Ra`P!@g^Mh1!QGyhzHWsA}6t|t{{ z-DPGcsf%9~HCC!K?$xt=(hEyB`^< z#0z30%W2I>^D}z&O$xe;~Y_s0y$rQ4Wl0v zeK#Ljwn;R00~Qk$KFiP9M34cDNq#r9+`rx0PZ%23kLIP_ejy#{ z4zJjMEq>0bl5ykOU|Lpw@8OJo-5RA|YP+u_#$5fiX2nIcx(S1MXeSq4y} zZFyI+WKMV9S^p?JP}%doxFEw8fnCaMv;~f%j+%q z;{W;k-fWxXP?=TCdYsrX%=N1Am{q{F2I-pu4T9@q&spD%vUk+RPvWn7&i_MqzQ2zi+>7n8@3T@yi(v>o<2Bz67P zLx25jZI$?(AL8x#HvKtbb{uM;*U+Qwoj9}BnUKk$5W#VuK@OGkHEuSp2VH2wy_idr zZOzT^Q-gyuBGWf;`YQ1pzqUhjzn?{B)6~;=8X+d8YSHlkbw16US*}A<8~Y3~4$o#n z+PYDnjfJUImRu^VQb(tr7C1A0>~MITWTU8q?ebt+E5y5$moH6Bw#Rfj#(cl!hPGkY zr9s0oa^E`qo1QO~k%2SB{+QDE%Mc#Uj&+<%3lcwA*`+*7%|Co|Kl}07x-hoA2g%@I z9a9uTpgJEXi+e`u z+q9hUPV(l}atjqGanz?LF3uY_`bYnR5~dr%s%go3r<|P?>_t5hkly$~3_c$e z0MtQY3A;6S)PMerHG1IiDi~lY6p-uf;xu_k^l4LKEyvX)zoted^M`yyWd`QlhLdn>tA#OZkJ-&?73asGp!+n|;$ zXKgEVeGy;Z@wQzieHtVaM;4AEtFE?>U*n}`q@&T!WI=gS6^qZs&z|YK%0wk%F}*L@ zyTNEV(B+eI@bTq&c@^Hu7Z@6G!pBeG4|hWi_yre)%V(2hHV{*$&6i4gy4MmP23)lq zSGq3R)qi~LAW4EJt1=perB4Rqc6x4r7C$KLsa1FyG!;;JjYU&0XnLKYsfFW6t-yUNS{yoQ)J=f$ov@e5v_tcIiTZykk!m0?U;p zdb4wA37-;cQM2EhEu9Aye&LDH5Y09=!YBWLK!}P;Rb9#V2X>6^WTheBg)LHJ5?5M$ zPZbvA2tCD0+4BDVbp)8g*}=@C^@IBfmDu>yob@|4>5#-o#o7quM_gA96Iv6;x*dNW zn(%RkqsN&c%l)pDn#_A41-vJ4s;Hw?G{T3ek4||)35*f-qnDS@e2Vfs-)G3C6F|Q4 zFR~Pqb#m?Wtjq1pYUL9r+>3Q|bpwDQQQVO1!~Ll~20$8;;Pw*p^UeoDm>Nz{&v^Mf z(6S9xH=!_9&Fwx)N&n$fvsq!)<8MnEQAr3({&*qt2+t_E=Nl@hfC-=>Fd3obK+#1c z&!3wjz?qvPxsw0P156uH$Q0!r(Jr@_f-|4R3 zQP$K}=2gYlQzFU&xBuFh;%BV#-p$YDrDeuzKd&*(`IKoJ51BtVRwJUdb+d+njZ#|t#a%*FP}2^2=Mx&)T(>;aoe~&?Tebppa%F!{f4``&Vtl)8@RdLQ zE7c>2>q&{ly z@4Tn7Z+x$9TKy_=)|qpoG0TULp_ye>lKBM2O7SUUe1 z;VJnds=e+^KPMaYxH)r4(e)UEby2=QZo9Y+A{rV;1u}YK_*H<3$lb_4QKCuOyU8-I zv*E_wXP-xo2_>Vq?OtEykI9ET4lDQ?E1YwD3whXNP`VAX&^z;HZ zY!xNbMe7B`4!8^w2==W_4%1O15xTr^8KOmg4*i`vI$|CYxrd27U%Y~3055%gMb=qJ zm1rSdixEc~P}XHSyrMtz>7dPYdf|-I%KnTC?1D`F6Q9ewWNtj&=fc3w)aN_>^UU{m`>UOf28ap%Ea`99PrGH~yZA-2m?eiHtpAAv?jU68@+Qc+BuVn)f9%u7MB) zNv-Y+1;2qn-8Kh)5>zBX4Du`w@dTV?)=?numa$`#-C}H$(%HB1St0_rVL~F_fK)pq z9$M@%sg4W6eX;&T-@A3&{?MxD2vV?D)I0KplH1Vw9Zd)Ht(4M$^T^x7;N9MVjFL^L zWjOTa)upc}Euy2Ze+{;z7k%e#F4mz1X0#e}K{o?|q7LLUWgrJYj}^3Y_Jsb$0+)C> zK&=*}jRL}<63JI`CxJ>vqJLdo+FP}0CFdw;s)<+96}ky!M@R3WX_BH$jCmj;uwDnz zJI+NP;Lm^fCRuSFkca@)5oCA&vijwQ;NgwpAsXj3*G8bgr;A(?b%X!UOWxk{=pHvT z^EgTjRCq1i zWVaFiFmDx;_oF;|6N!|`A|}N+)Mk|5XTj!uTeFAv>xUOY7{I( z@jrsNQ^)H5fPqO(=3!t>Nr^(v>DT5aB6m(QNG6?3wsMbei^YjVjseNv|6eYg|skCe_R0>d#%XQ&FkWF z|2_<~yNND0f79qMFYp>RbI_zm$i{XASs!G*qZVblg+72+a6=|Dj`HwmtZ>*a_>9Oh z7}Pj>{r{zWTCJ$)Q72oVU>nJa7@y21ljd&%BvLiZJ5<4Xj(;}_7XD; za6q0_XcRz~*9A1r$94C4_t~~95v+*D>FrjuOu8?w^aWug&vQifjoi=o4agYhyB4z{ zrdl#+r&7%RasU`G+Q&^8_BUt=+M#FOLSFXNGy zOpY6|9evLw?lK>Bpx)B|UQK!=t13SMm#w~Zk$}8B7kouvLt_g0HLnHj9e8Va#I9L` z44(Dx?T(|bZ{E|Ep}c$85T+Gs zoPey|K+3b=eTg-@Eo(_C%$w0A}YRJ5)MC)CZR_%`|GDRHgj%V>$<&8p|KxOO4_10q?z z<)qWjZ{G;#u1prndJS^D+Ixzg?q>PG@v7D>oTVWPdBy>`Ckr*>d*U!JkV7e$z_{z! zRBug)T&&oK0B_xBwGVb3x5h6z>19T+3y#@jJaDb^h_05byOz)+d-r9y+gwcZ=HwII zi+%DmDbkltR=KA(4wO1qDftT|r~i2^RQCIBY^+3s#$%R(2o}W#dg3z2kKduDqm7zx zBX9qBnL5FdCA;YI{@TjL2Ok6rLhC0TF06|^$Iv9cCRXj_e5O{?-AJWt0(aBDJcW3- z?-a$+6j$@uMB<>_K#1VJnWoUujV2Olwv3ir7R{ohFLCypNXic@OH2Nyv*WsFlgJtE ztKxE3M^g8Blu1pNkONyqu#XZ?LAd1k3N41@M^$x`cuDT$H|Ll!q;0RP;EexJ71LoR zrg$lMPgd8he9Mo(5J#Kc-mcIXqyH&s69Jk8@3<#nXKpc@+9ax%?)bz+6^7*U=cmKOz-u0|ZyWM-)6U+F z{;t{b0_{tY!DoTDkbyNMmY^cep*6?24Q#vQPEQzCUkj(mvbwY71yF zC@9yca^vCNEiYg-hVOEu>-BOWq`=@{fvmb`Q~S)D6=oQaP7(Q&qtM~%Z-><@xK<3Q%$LlJG^5jln2_VFo{D-r{nMWNi$w6ohyS|mZ3*ci02 zcctjSExpZe8!BMwRP6NLCh?@Q9iO8}9sn$Yu`~4%c3|?C`oLZ|x}Y#(nw|f_T__7@ zsKv=L?(rxu?*=T)Xsx3EzPFbFo&oS14Ii1AwA{!)(;uAnBnv)QAigHK@Ol)n0&^dX zWPOtCao^0;N+F{cjn(@~e=-B|Tv?t;h85-Qm#P^j8MN%aRm-2v&lgXQr8Sy*u1A_a zz{O}z7cO=(nGy&H5G1-27$(2_EO^Oz_|>v<1o+}3$-(dXn~2=)S9$qUj1fWPmCkY& zfcz370cY@kRO`JWr~-EwR73|a$+SXyf(NUu6k0U?2?oeOdM)kly$E9Af=|N-plYK2 zHG0=YvE2qg<#6T*@~eJ!1Mop@#?@M$q;K!-4M$I*al5J?#nnH4GkQMnJC7F1(^V|0 z$0O0E9%Ur4NxhS1ZvBThJk534mBN|{TAjw9P{CESb=5ami?7qp)Vo~V(%HEyu5mPS zw!Xo(K*)Lf^jPS(y=-BG3idNrv^2Q8$; zNAKAUy2wJYM3fzfUyt_#Hi09dxoUU#) zRxmX+jp?LNSi;0Z_1k4vWcfn%T<$jEMj#HW7dWV;715EAk)bZ^A(y0k`O>Ar32q&i z1l%=~Q!b-TpDqB)T4w0{E(laZkwP=kECsU+J$h@+YK9D|mPm45R@Mq_!YY&oCJPN0%wK=;`-LHdb77vd1EKHqhuCPn@`s*f751(AhC zIV0+a8TlKxGVJ#39~@X&8ff?e&`CpPXkS%P7wcWDiBcLsj}l?u?G2ObiGMK(c&2(` zzSMO=h|vC?uJ!a{^=X3Gw&{kxAuuo|GZfjih2f{KTq&apUaPZS&((7CAGD_=h_Gt$ z;nG`*vvEIvVN^Dkf`9x5fV}!5SipAx!%7x-0y?O0L}23dzn$ zoWV@#%Ff~Ai$6hP@L+q@=2e>&czyeNdiG>z=jO&dueqn_Ep7P3ECfLe*u=wk?y+ zro&Z=?6sN& z8_AxLRqE)jdj}SMVk)=^ZqfaE0iQ+=< z=x)Y4Oc3~4oTB&hX8~SFz=X#OYe#fIgA12k_BrJ9W5rS7HAZipq-UMl=H}c8BmL=L zpB6#1m988(SbZ>x&e>>Z)61RzF2Xv-$iGs-Y;^mxfp%sKI&ZYJv@nV@Bq6Z8QAv}~ zD3FkB2aUF%n~c>Eaye+2HogqvC#ew>+c~I|shxS~%$YM)1N#guXMar%Rn|Xdw7lDD zEgqD#N|PcPWly(a5;_7s;P2kl`T{Yv8ivIf*CX4;=m8r6wD-r^+5Ft48Wfhp)LOB3 zR=%a`0I~#L=c2BF801NgQ?L=7+685(>W@T;7BqAV@Bqb$5 z-$@lh`(8zy2vhv3bmI_`~7G-GwQ&uN8XPy<8QYhl^w0ljB#%mov@bap1h|f=cEnnPZ z2;{X`45I%uWP3sud3U^r93otpFnL6N{5JXG`s9I?wfIDfJ1bu+MWoJu-5BkhL!o@RFu+~`TNKPrjHKVFj>m5;J)M88&#Jc? ztt7(=V^S^jmzFtW>G$?as-*X)?c%J5S!LOqIX^{@|E|xJ{7)H%n~$>?I!vg#mz6JW z&!pd#1H)XOH@|+t=gl@?91>hpyY`m#k<-f2N|-wd*~>#?yMJ+fc&W0xbJd&LqoaI# z_O9MwcP#dLmJ{u_F8lWB+0<_8EUc6_wvP3$CFak0yw159Tw~j|CL*fe_?Fk&beWdn z1qmDH6ln0Y#Nzvlr@r{yUW(`(nI^daz_+%=SmZ^WMS~>>i`{0!uqlB-`b+u+La%^v z|0cXXyJTG79{2#|4Yr6?MEzF#m`?cCxZ%@TN65B%vk+5ecUQb0mZ~elCr8>_VS1u^ zwOuIgqh2veYklKbIE=pN^%>$5c|tk@=URg4opbVhziuYB+NBT4_<6q8zol}Rs-CcE zBwRo7I~U#UMRJJvv(#tLSkX}$-Xqd}B)bijW;Ub!1?yA2rzdmCj6r}YB(=lh=HMWd zW9U@n9UL58_TtLXKPM+@4>Oktsx$Can#fkJh={#OoItAn8R-O6TP00B*e$`Z!MmZA zAdRkC!p>3ouQMq3o~c^BkTE1=J^FYUmhTOj+iYQE|3ZpzxVXqF=Nw`R>uULs3z@G* zO7gH$lCn}~Cmi@xtE?PhRR)J^?hcOXzG9EG+}yoqUo&6~DtglP;WmN?$kPLFC_wWcqHMER)=C3&Z>26-L*Okn3V|vgfvE?n%9+vMxp>8{I=`6PwVkjZpp-xP z29e%G(kf|WbOQi0?wA0G@~YBDK}%BAPW7uJEG@bipS61Ar!BxIl4ROX~}d^fL08Q zPyyF}|MDour@j1^ep=J<7gR@%^Gtc{-0kIjz2s_{=YtI}nesuDe!cG5(Hfce8f4Wo zM6nKP+_cUGqykqcfB8rH#DWEn?@uPe0`Q%T6@p0*5geDsVl&(?oEb>9%R1v&CB$a! zy-B;b_w#;q>KM^VQ-%AI6mtM~@vQ>Ql4ng5LWfG6odr^*-|#M0e<+FT7; z10qctl~~TBJ5_k)1Mb>#rCTbM?WPSfL835(6zDABukC07IKcwxW7L$I@6JpGW`c$232R485q_re$mX!A1xTzz_H@0 z^JcR5V2z&{t6}0eoI*^ZJt7XX@2 zm=*S{e?3{|y!J7tAjBYaC(wpQ{o@l=pNm_@6y*MzIL%F2zLOTX?7oO;yU9FuE zlI~L`o#D?!ASEqEjXBpa4Ar~$gjw0{XLmt~O0)n!QmfPb%^i_!C05jNx9fca4Ho-T zK>kT5Yy>v|KPTc7^!|Xa^2}QBQ30SX|Js5Ya%8NlSoa_NwEPM$2Yey;1Fur^FL`Jg z2B9B^56W*3CtqT*_D8q>W5?2P&a=Agw2LqG^Is1a-)xs;jF`QC-NOAV&9UU_*_Lv6S%FMoes?DAPNkIMdaBZutb+if<}!5>by>$7>k z3LBic!;f`Z^;k{GR$2p@N9C zI`Pne0rTJQH6d2kpB=HMhLdA5z>rf1@n86_28)c!3K6T^74MOa7fB0VlF~AGC;9l; zyT1-DxIsugXy7DBGhSP++S@j)X^oe6&@6u VkRkFO9wzX0L`|FW Date: Thu, 5 Feb 2026 12:26:03 -0500 Subject: [PATCH 2/7] formal api model: minor fixes fix some formatting and grammar errors --- content/api_mem_model.tex | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 3c79acdae..dfbb48fa9 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -11,7 +11,7 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:a The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. -Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}) perform both a read and write to memory, and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: +Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) @@ -114,23 +114,23 @@ \subsubsubsection{Remote Delivery Order (rdo)} Remote delivery order relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same PE from operations \textit{hb} ordered before and after the fence, and between \textit{hb} ordered accesses before a fence and observable accesses from operations \textit{hb} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and $A$ and $B$ target the same PE. \end{itemize} -\textbf{fence-ordered operations}: blocking or nonblocking put, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. +\textbf{fence-ordered operations}: blocking or nonblocking put, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. \subsubsubsection{Remote Completion Order (rco)} Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by an operation that is \textit{hb} ordered before $QBop$ and $B$ is an access that is \textit{hb} ordered after $QBop$. \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \textit{hb} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$. \end{itemize} -\textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. +\textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. \subsubsubsection{API Synchronizes-With (asw)} @@ -185,7 +185,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse \item $acyclic(api\_hb)$: \textit{hb} cannot have any cycles. \item $irreflexive(rf;api\_hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). - \item $irreflexive(rf;(sb^amo\_op)^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. + \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} Finally, similar to the C++ memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. From d48a22d9d1dda237de02f118891f86c5bb51c4f4 Mon Sep 17 00:00:00 2001 From: John Alsop Date: Thu, 12 Feb 2026 16:10:14 -0600 Subject: [PATCH 3/7] formal api model: update data race definition define data race to include racy API observable accesses and normal PL-level accesses (including atomics). If these race, behavior is undefined. --- content/api_mem_model.tex | 12 ++++++------ content/api_observable.tex | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index dfbb48fa9..3b9384c8d 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -132,16 +132,16 @@ \subsubsubsection{Remote Completion Order (rco)} \textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. -\subsubsubsection{API Synchronizes-With (asw)} +\subsubsubsection{API Synchronizes-With (asw)}\label{subsubsubsec:api_relations_asw} API synchronizes-with is the API-level analog of the PL-level \textit{sw} relation. It relates observable accesses from legally synchronizing \openshmem operations that form a \textit{rf} relationship. Specifically, $(A, B) \in asw$ only if $(A, B) \in rf$ and any of the following are true: \begin{itemize} - \item $A$ is an observable read or read-modify-write access issued by an \openshmem operation from any of the following categories: - atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. - \item $B$ is an observable write or read-modify-write access issued by an \openshmem operation from any of the following categories: - atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. + \item $A$ is an observable read or read-modify-write access issued by a synchronizing \openshmem operation. + \item $B$ is an observable write or read-modify-write access issued by a synchronizing \openshmem operation. \end{itemize} +In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: atomic memory operations, signaling operations, point-to-point synchronization routings, or distributed locking routines. + \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \begin{figure}[h] @@ -176,7 +176,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse \begin{itemize} \item $A$ and $B$ access the same address on the same PE. \item At least one access is a write. - \item At least one access is not atomic. + \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsubsec:api_relations_asw}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. \item $A$ and $B$ are not related by \textit{api\_hb}. \end{itemize} diff --git a/content/api_observable.tex b/content/api_observable.tex index 77ec917d9..9e4f4ff28 100644 --- a/content/api_observable.tex +++ b/content/api_observable.tex @@ -257,7 +257,7 @@ ST R1 -> dest[i], p \end{verbatim} \tabularnewline\hline -\multirow{15}{*}{Point-to-Point \\ Synchronization Routines} +\multirow{15}{*}{\shortstack[l]{Point-to-Point\\ Synchronization Routines}} & \begin{verbatim}shmem_wait_until(ivar, cmp, cmp_value): R1 = LD ivar, my_pe while !cmp.test(R1, cmp_value): From 27a272fe4326aed74de1ee04dc78c1a1473b9adb Mon Sep 17 00:00:00 2001 From: John Alsop Date: Fri, 13 Feb 2026 12:13:30 -0600 Subject: [PATCH 4/7] formal api model: minor fixes and clarifications fix a few instances of errors identified by reviewers --- content/api_mem_model.tex | 14 +++++++------- content/api_observable.tex | 32 ++++++++++++++++---------------- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 3b9384c8d..2dc609f1b 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -75,7 +75,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. - \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is dependent on all prior observable accesses. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is ordered after all prior observable accesses to source and dest. \end{itemize} Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). @@ -114,7 +114,7 @@ \subsubsubsection{Remote Delivery Order (rdo)} Remote delivery order relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same PE from operations \textit{hb} ordered before and after the fence, and between \textit{hb} ordered accesses before a fence and observable accesses from operations \textit{hb} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and $A$ and $B$ target the same PE. \end{itemize} @@ -124,8 +124,8 @@ \subsubsubsection{Remote Completion Order (rco)} Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by an operation that is \textit{hb} ordered before $QBop$ and $B$ is an access that is \textit{hb} ordered after $QBop$. \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \textit{hb} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$. \end{itemize} @@ -182,9 +182,9 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: \begin{itemize} - \item $acyclic(api\_hb)$: \textit{hb} cannot have any cycles. - \item $irreflexive(rf;api\_hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $acyclic(api\_hb)$: \textit{api_hb} cannot have any cycles. + \item $irreflexive(rf;api\_hb)$: \textit{api_hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} diff --git a/content/api_observable.tex b/content/api_observable.tex index 9e4f4ff28..b9867c95a 100644 --- a/content/api_observable.tex +++ b/content/api_observable.tex @@ -31,7 +31,7 @@ \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_g(dest, value, pe): - return LD value -> dest, pe + return LD value \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_iget(dest, source, dst, sst, nelems, pe): @@ -63,67 +63,67 @@ \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_inc(dest, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + 1 ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_inc(dest, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + 1 ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_add(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_add(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + value ST R2 -> dest, pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_and(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 & value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_and(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 & value ST R2 -> dest, pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_or(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 | value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_or(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 | value ST R2 -> dest, pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_xor(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 ^ value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_xor(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 ^ value ST R2 -> dest, pe \end{verbatim} @@ -149,35 +149,35 @@ \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_inc_nbi(dest, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + 1 ST R2 -> dest, pe ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_add_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + value ST R2 -> dest, pe ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_and_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 & value ST R2 -> dest ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_or_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 | value ST R2 -> dest ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_xor_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 ^ value ST R2 -> dest ST R1 -> fetch, my_pe From b8b423318baeb7a29f86872474d4ce4135822182 Mon Sep 17 00:00:00 2001 From: John Alsop Date: Thu, 19 Feb 2026 10:10:00 -0600 Subject: [PATCH 5/7] formal api model: minor fixes and edits for clarity -text edits for clarity, consistency with the rest of the spec and the c++ spec (e.g., address->memory location, brief prose description of acquire/release semantics...) -fix unescaped underscore bug --- content/api_mem_model.tex | 44 +++++++++++++++++++-------------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 2dc609f1b..e7c0bd7f4 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -1,30 +1,28 @@ To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). -Although \openshmem may be implemented in multiple programming languages, this section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality, and because it forms the basis for common GPU programming languages (CUDA, hip). We describe how the guarantees described here can be adapted for GPU memory models in Section~\ref{subsubsec:api_scoped}. +This section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} describes the baseline C++ memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:api_pl_baseline} -A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial memory value at that address (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). - - +A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial value at that memory location (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. -Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: +Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed\footnote{we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model}. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) - \item \textbf{Modification order (\textit{mo}):} A total ordering between all writes to the same address. - \item \textbf{Reads-from (\textit{rf}):} Relates a write \textit{W} and any read \textit{R} to the same address such that \textit{R} returns the value written by \textit{W}. + \item \textbf{Modification order (\textit{mo}):} A total ordering between all modifying accesses (writes or read-modify-writes) to the same memory location. + \item \textbf{Reads-from (\textit{rf}):} Relates a write \textit{W} and any read \textit{R} to the same memory location such that \textit{R} returns the value written by \textit{W}. \item \textbf{Synchronizes-with (\textit{sw}):} Same as \textit{rf}, but also requires that \textit{W} and \textit{R} are atomic, \textit{W} has release or acquire-release memory order, and \textit{R} has acquire or acquire-release memory order (this relation is also called synchronization order in some contexts). \item \textbf{Happens-before (\textit{hb}):} Transitive closure of \textit{sw} and \textit{sb}. \end{itemize} Furthermore, a \textbf{data race} is defined as a relation between two accesses $A$ and $B$ where all of the following are true: \begin{itemize} - \item $A$ and $B$ access the same address. - \item At least one access is a write. + \item $A$ and $B$ access the same memory location. + \item At least one access is a modifying access (write or read-modify-write). \item At least one access is not atomic. \item $A$ and $B$ are not related by \textit{hb}. \end{itemize} @@ -50,7 +48,7 @@ \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overvi To describe the desired \openshmem behavior, we make the following additions to the existing PL-level model: \begin{enumerate} - \item\label{mcmsteps:one} \textbf{PEs and operation events}: Introduce a new event type that corresponds to an API operation and a new address property that specifies the target PE. + \item\label{mcmsteps:one} \textbf{PEs and operation events}: Introduce a new event type that corresponds to an API operation and a new memory location property that specifies the target PE. \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, define a set of observable memory accesses (these are not related to the initiating thread by \textit{sb}). \item\label{mcmsteps:three} \textbf{API-level relations}: Define new API-level ordering relations over operation events and their observable accesses that will constrain \openshmem memory ordering. \item\label{mcmsteps:four} \textbf{Fused collectives}: Fuse matched collective operation calls into a single event. @@ -63,15 +61,17 @@ \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overvi \subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} -When defining behavior at the \openshmem API level, we must first consider two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple PEs, each of which has its own local memory space. Therefore, the PL-level definition of memory address is extended to include both the PE-local address and the PE ID. Normal memory accesses can only access addresses on the PE local to the thread that issued it, which is constant for the life of a thread. \openshmem operations may trigger accesses to the local PE’s memory and/or the memory of remote PEs. Dynamic memory accesses in an execution are only considered to access the same address if both the address and PE ID attributes match (impacting the definitions of of \textit{mo} and \textit{rf}). +When defining behavior at the \openshmem API level, we must first consider two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple PEs, each of which has its own local memory space. Therefore, the PL-level definition of memory location is extended to include both the PE-local memory address and the PE ID. Normal memory accesses can only access memory locations on the local PE, which is constant for the life of a thread. \openshmem operations may trigger accesses to the local PE’s memory and/or the memory of remote PEs. Dynamic memory accesses in an execution are only considered to access the same memory location if both the local memory address and PE ID attributes match (impacting the definitions of of \textit{mo} and \textit{rf}). Second, since the \openshmem API does not define the underlying PL-level implementation, API operations do not resolve to a function consisting of PL-level events. Thus, these operations must be treated as a new type of primitive in the API-level memory model: the operation event. An operation event specifies its type, and each type defines the set of minimum observable accesses triggered by the operation (described next). Operation events participate in \textit{sb} and derived relations, similar to normal thread access events. Although operation events may be associated with separate memory access events (see Section~\ref{subsubsec:api_observable}), the operation events themselves do not participate directly in access-based relations like \textit{mo}, \textit{rf}, and \textit{sw}. \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable} -For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be "issued" by the associated. These accesses must occur as part of the operation, however there is no guarantee that they will be executed by the initiating thread, or that they will be the only accesses performed by the API operation implementation. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \textit{sb} relationship between observable accesses and the initiating thread of an API operation. Accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. +For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be ``issued'' by the associated operation. These accesses must occur as part of the operation, however the API-level model places no constraints on the thread that executes observable accesses (i.e., they may or may not be executed by the initiating thread), and it does not prevent an implementation from performing other (non-observable) accesses. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \textit{sb} relationship between observable accesses of an API operation event and any other event issued by the same thread, except as defined later in Section~\ref{subsubsec:api_relations}. +Note, however, that the operation event that issues these observable accesses \textit{does} form an \textit{sb} relationship with other events issued by the initiating thread. +In addition, accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. -Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the address as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which addresses are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which memory locations are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. @@ -122,7 +122,7 @@ \subsubsubsection{Remote Delivery Order (rdo)} \subsubsubsection{Remote Completion Order (rco)} -Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: +Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rco$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: \begin{itemize} \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. @@ -145,15 +145,15 @@ \subsubsubsection{API Synchronizes-With (asw)}\label{subsubsubsec:api_relations_ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \begin{figure}[h] -\includegraphics[width=0.75\textwidth]{figures/api_relations.png} +\includegraphics[width=0.75\textwidth]{figures/api_relations_v2.png} \centering -\caption{Example \openshmem code execution depicting observable accesses, API-level relations, and fused collective.} +\caption{Example \openshmem code execution depicting observable accesses, example API-level relations, and a fused barrier collective operation event.} \label{fig:api_execution} \end{figure} Figure~\ref{fig:api_execution} illustrates how the above concepts are applied in an example \openshmem code execution. \openshmem operations are denoted by rounded boxes, and their observable accesses are shown in dotted square boxes. Matched barrier calls from each PE are fused into a single operation event. -Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations (in addition to \textit{sb} orderings within the operation): +Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations, described below (in addition to \textit{sb} orderings within the operation): \begin{itemize} \item \textit{ilv} orders the store of $a$ in PE0 (and all prior hb-ordered accesses) with the load of $a$ from the put operation. \item \textit{lco} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) wait\_until operation. @@ -161,7 +161,7 @@ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \item \textit{rco} orders all \textit{hb} ordered accesses and the accesses of \textit{hb} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. \item \textit{asw} orders the store of $b$ from the atomic\_set operation in PE0 with the consuming (\textit{rf} related) load of $b$ from the wait\_until operation in PE1. \end{itemize} - +Note that~\ref{fig:api_execution} depicts only a representative sample of API relations, since an exhaustive depiction would be cluttered. \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubsec:api_integrate} @@ -174,7 +174,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse In addition, an \textbf{API data race} is defined to relate any two accesses $A$ and $B$ where all of the following are true: \begin{itemize} - \item $A$ and $B$ access the same address on the same PE. + \item $A$ and $B$ access the same memory location on the same PE. \item At least one access is a write. \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsubsec:api_relations_asw}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. \item $A$ and $B$ are not related by \textit{api\_hb}. @@ -182,9 +182,9 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: \begin{itemize} - \item $acyclic(api\_hb)$: \textit{api_hb} cannot have any cycles. - \item $irreflexive(rf;api\_hb)$: \textit{api_hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $acyclic(api\_hb)$: \textit{api\_hb} cannot have any cycles. + \item $irreflexive(rf;api\_hb)$: \textit{api\_hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} From 3aadcbd3578706e157161ececa20e26a7c65ab8a Mon Sep 17 00:00:00 2001 From: John Alsop Date: Thu, 26 Feb 2026 18:00:17 -0600 Subject: [PATCH 6/7] formal api model: C++ -> C11 Use C11 as the PL-level basis for the formal API-level model. This is consistent with the rest of the document, and differs from the C++ model in no relevant ways. --- content/api_mem_model.tex | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index e7c0bd7f4..11b6d8880 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -1,15 +1,15 @@ To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). -This section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} describes the baseline C++ memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. +This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. -\subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:api_pl_baseline} +\subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:api_pl_baseline} A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial value at that memory location (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). -The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. +The C11 memory model is defined axiomatically in the C11 specification \S5.1.2.4\footnote{Formally, the \Cstd[11] specification is ISO/IEC 9899:2011(E).}. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. -Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed\footnote{we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model}. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: +Under the PL-level C11 memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed\footnote{We leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model}. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) @@ -27,7 +27,7 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:a \item $A$ and $B$ are not related by \textit{hb}. \end{itemize} -Although the axiomatic model places no requirements on when each event must physically update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C++ constrains candidate execution with the following axioms\footnote{When using SC memory order, there is an additional total order over all SC atomics that must similarly be consistent with happens-before}: +Although the axiomatic model places no requirements on when each event must update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C11 constrains candidate execution with the following axioms\footnote{Instead of the verbose prose axioms defined in the C11 specification, here we use the more concise (and equivalent) representation of the C11 axioms outlined by Batty et al. in "Overhauling SC atomics in C11 and OpenCL" (Symposium on the Principles of Programming Languages, 2016).}: \textbf{Axioms} @@ -39,11 +39,11 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:a \end{itemize} -Finally, the C++ specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. +Finally, the C11 specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overview} -The \openshmem API-level memory model extends PL-level memory models like C++ to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, put, barrier, etc. that can execute asynchronously and involve multiple PEs. +The \openshmem API-level memory model extends PL-level memory models like C11 to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, put, barrier, etc. that can execute asynchronously and involve multiple PEs. To describe the desired \openshmem behavior, we make the following additions to the existing PL-level model: @@ -57,7 +57,7 @@ \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overvi Each of the above steps adds a piece of required support. \ref{mcmsteps:one} provides a basis for operation-level semantics across multiple PE memories. \ref{mcmsteps:two} enables asynchrony since a thread can continue execution while observable accesses execute in the background. \ref{mcmsteps:three} allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. \ref{mcmsteps:four} Supports multi-PE collective operations and establishes ordering across PEs even when there are no observable accesses. \ref{mcmsteps:five} formally updates the PL-level axioms to establish a complete API-level memory model. -After defining how \openshmem extends the C++ memory model, we describe how the addition of scoped synchronization in GPU memory models impacts the API-level model. +After defining how \openshmem extends the C11 memory model, we describe how the addition of scoped synchronization in GPU memory models impacts the API-level model. \subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} @@ -71,7 +71,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable Note, however, that the operation event that issues these observable accesses \textit{does} form an \textit{sb} relationship with other events issued by the initiating thread. In addition, accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. -Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which memory locations are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which memory locations are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C11 here because it allows us to more clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. @@ -81,7 +81,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable \input{content/api_observable} -\subsubsection{Fused Collectives}\label{subsubsec:api_integrate} +\subsubsection{Fused Collectives}\label{subsubsec:api_fused} \openshmem defines multiple collective operations that synchronize across multiple PEs in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). There are multiple requirements on the use of these operations. In particular, each dynamic call to a collective operation must be paired with a matching call from every other PE in the team. @@ -165,8 +165,8 @@ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubsec:api_integrate} -We leverage the newly defined observable accesses and ordering relations to define the API-level memory model, which is based on the C++ model described in Section~\ref{subsubsec:api_pl_baseline}. -We first define a new relation, API happens-before (\textit{api\_hb}), to be the transitive closure of the C++ \textit{hb} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: +We leverage the newly defined observable accesses and ordering relations to define the API-level memory model, which is based on the C11 model summarized in Section~\ref{subsubsec:api_pl_baseline}. +We first define a new relation, API happens-before (\textit{api\_hb}), to be the transitive closure of the C11 \textit{hb} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item $api\_hb := (hb \cup ilv \cup lco \cup rdo \cup rco \cup asw)^+$ @@ -188,11 +188,11 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} -Finally, similar to the C++ memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. +Finally, similar to the C11 memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. \subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_scoped} -Memory models for accelerators like GPUs often extend CPU-oriented memory models like C++ with the notion of synchronization scopes. Scopes are necessary to enable efficient synchronization in a system with software-managed cache coherence. A scope is a property assigned to each atomic access that corresponds to a grouping of threads in a program execution. The threads in a grouping may share a cache in the cache hierarchy, such that communication between those threads can occur more efficiently at that cache level. +Memory models for accelerators like GPUs often extend CPU-oriented memory models like C11 with the notion of synchronization scopes. Scopes are necessary to enable efficient synchronization in a system with software-managed cache coherence. A scope is a property assigned to each atomic access that corresponds to a grouping of threads in a program execution. The threads in a grouping may share a cache in the cache hierarchy, such that communication between those threads can occur more efficiently at that cache level. When synchronizing between two threads, the synchronizing accesses must use a scope that contains both synchronizing threads; if they do not, this represents a heterogeneous race in the scoped memory model, and program behavior is undefined. Properly scoped synchronization is critical for GPU applications, but it is not necessary to expose it at the level of the \openshmem API. Scopes are important for communication between threads that share a cache level, but the threads participating in inter-PE communication governed by \openshmem generally do not share a cache. From d00c0f02e36535586a570aaf4f05cb6f20ef710f Mon Sep 17 00:00:00 2001 From: John Alsop Date: Fri, 27 Feb 2026 11:20:17 -0600 Subject: [PATCH 7/7] formal api memory model: minor fixes, formatting updates Minor fixes to observable accesses, grammar, and formatting for consistency. --- content/api_mem_model.tex | 28 +++++++++++++--------------- content/api_observable.tex | 8 ++++---- 2 files changed, 17 insertions(+), 19 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 11b6d8880..7f3d89f0a 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -1,7 +1,7 @@ To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). -This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. +This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (for example, CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. \subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:api_pl_baseline} @@ -29,13 +29,11 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:a Although the axiomatic model places no requirements on when each event must update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C11 constrains candidate execution with the following axioms\footnote{Instead of the verbose prose axioms defined in the C11 specification, here we use the more concise (and equivalent) representation of the C11 axioms outlined by Batty et al. in "Overhauling SC atomics in C11 and OpenCL" (Symposium on the Principles of Programming Languages, 2016).}: -\textbf{Axioms} - \begin{itemize} \item $acyclic(hb)$: \textit{hb} cannot have any cycles. - \item $irreflexive(rf;hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). - \item $irreflexive(rf \cup (mo;mo;rf^-1) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. + \item $irreflexive(rf;hb)$: \textit{hb} must be consistent with the \textit{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in \textit{mo} order (often called coherence requirement). + \item $irreflexive(rf \cup (mo;mo;rf^{-1}) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in \textit{mo}) written before the write of the read-modify-write access. \end{itemize} @@ -75,7 +73,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. - \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is ordered after all prior observable accesses to source and dest. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig\_addr$ is ordered after all prior observable accesses to source and dest. \end{itemize} Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). @@ -140,7 +138,7 @@ \subsubsubsection{API Synchronizes-With (asw)}\label{subsubsubsec:api_relations_ \item $B$ is an observable write or read-modify-write access issued by a synchronizing \openshmem operation. \end{itemize} -In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: atomic memory operations, signaling operations, point-to-point synchronization routings, or distributed locking routines. +In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} @@ -155,9 +153,9 @@ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} Matched barrier calls from each PE are fused into a single operation event. Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations, described below (in addition to \textit{sb} orderings within the operation): \begin{itemize} -\item \textit{ilv} orders the store of $a$ in PE0 (and all prior hb-ordered accesses) with the load of $a$ from the put operation. +\item \textit{ilv} orders the store of $a$ in PE0 (and all prior \textit{hb}-ordered accesses) with the load of $a$ from the put operation. \item \textit{lco} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) wait\_until operation. -\item \textit{rdo} orders the bulk store of a from the put operation in PE0 with the store of $b$ from the fence-separated atomic\_set operation. +\item \textit{rdo} orders the bulk store of $a$ from the put operation in PE0 with the store of $b$ from the fence-separated atomic\_set operation. \item \textit{rco} orders all \textit{hb} ordered accesses and the accesses of \textit{hb} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. \item \textit{asw} orders the store of $b$ from the atomic\_set operation in PE0 with the consuming (\textit{rf} related) load of $b$ from the wait\_until operation in PE1. \end{itemize} @@ -175,7 +173,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse In addition, an \textbf{API data race} is defined to relate any two accesses $A$ and $B$ where all of the following are true: \begin{itemize} \item $A$ and $B$ access the same memory location on the same PE. - \item At least one access is a write. + \item At least one access is a modifying access (write or read-modify-write). \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsubsec:api_relations_asw}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. \item $A$ and $B$ are not related by \textit{api\_hb}. \end{itemize} @@ -183,9 +181,9 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: \begin{itemize} \item $acyclic(api\_hb)$: \textit{api\_hb} cannot have any cycles. - \item $irreflexive(rf;api\_hb)$: \textit{api\_hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). - \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. + \item $irreflexive(rf;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in \textit{mo} order (often called coherence requirement). + \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^{-1} \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} Finally, similar to the C11 memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. @@ -196,5 +194,5 @@ \subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_sco When synchronizing between two threads, the synchronizing accesses must use a scope that contains both synchronizing threads; if they do not, this represents a heterogeneous race in the scoped memory model, and program behavior is undefined. Properly scoped synchronization is critical for GPU applications, but it is not necessary to expose it at the level of the \openshmem API. Scopes are important for communication between threads that share a cache level, but the threads participating in inter-PE communication governed by \openshmem generally do not share a cache. -Of course, implementing \openshmem operations may require synchronization and communication between threads in a PE that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. +Of course, implementing \openshmem operations may require synchronization and communication between threads in a PE that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of the \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. This policy has the added benefit of retaining a single simple and portable API-level interface for \openshmem that may be applied on top of memory models that may or may not use scoped synchronization. \ No newline at end of file diff --git a/content/api_observable.tex b/content/api_observable.tex index b9867c95a..91cfd3661 100644 --- a/content/api_observable.tex +++ b/content/api_observable.tex @@ -30,8 +30,8 @@ ST R1 -> dest[i], my_pe \end{verbatim} \tabularnewline\cline{2-2} - & \begin{verbatim}shmem_g(dest, value, pe): - return LD value + & \begin{verbatim}shmem_g(source, pe): + return LD source \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_iget(dest, source, dst, sst, nelems, pe): @@ -56,7 +56,7 @@ return R1 \end{verbatim} \tabularnewline\cline{2-2} - & \begin{verbatim}shmem_atomic_swap(dest, cond, value, pe): + & \begin{verbatim}shmem_atomic_swap(dest, value, pe): R1 = LD dest, pe ST value -> dest, pe return R1 @@ -141,7 +141,7 @@ ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} - & \begin{verbatim}shmem_atomic_swap_nbi(fetch, dest, cond, value, + & \begin{verbatim}shmem_atomic_swap_nbi(fetch, dest, value, pe): R1 = LD dest, pe ST value -> dest, pe