diff --git a/drafts/25-WIP-collective-edits.txt b/drafts/25-WIP-collective-edits.txt
new file mode 100644
index 0000000..ac1a67e
--- /dev/null
+++ b/drafts/25-WIP-collective-edits.txt
@@ -0,0 +1,426 @@
+To: J3                                                     J3/25-####
+From: Brandon Cook & Dan Bonachea
+Subject: Edits for US20 Collective Subroutines for Prefix Reductions
+Date: 2025-October-17
+References: 25-144r1, 25-177r1, 25-166r2, 25-195r1, 25-127r1,
+            25-007r1, WG5/N-2239
+
+1. Background
+=============
+
+The Fortran 202Y work list (WG5/N-2239) includes work item US20:
+"Add Intrinsic and collective subroutines for prefix operations"
+
+Paper 25-144r1 "Requirements for US20: Collective Subroutines for
+Prefix Operations" presents illustrative use cases and requirements
+for collective subroutines for prefix reduction. That paper was
+passed at J3 meeting #236 in June 2025. Specifications and syntax for
+the collective subroutine variants of prefix reduction operations,
+25-177r1, were passed at meeting #237 in October 2025.
+
+2. Syntax Adjustments
+=====================
+
+Since the passage of 25-177r1, subsequent papers 25-166r2 and 25-195r1
+have suggested additional syntax adjustments in order to maintain
+uniformity with closely related features under concurrent development.
+
+Syntax changes in this paper, relative to 25-177r1, are as follows:
+
+1. Additional forms have been introduced to accommodate the
+   presence of the TEAM argument (work item DIN1, 25-127r1) and the
+   COMPLETION argument (work item US04, 25-166r2).
+
+2. The IDENTITY argument to CO_REDUCE_PREFIX_EXCLUSIVE has been renamed to
+   INITIAL (as recommended by 25-195r1).
+
+Note that a combined edits paper for the orthogonal work items DIN1 and
+US04 is still forthcoming; it will provide the edits in section 16.6 that
+are cross-referenced by the edits in this paper.
+
+3. Edits Relative to 25-007r1
+=============================
+
+-------------------------------------------------------------------------
+[xv] Add to "Intrinsic procedures" the sentence:
+
+"The new intrinsic subroutines CO_SUM_PREFIX_INCLUSIVE,
+CO_SUM_PREFIX_EXCLUSIVE, CO_REDUCE_PREFIX_INCLUSIVE, and
+CO_REDUCE_PREFIX_EXCLUSIVE perform collective prefix reduction
+operations across images."
+
+-------------------------------------------------------------------------
+[383] In 16.7 Standard generic intrinsic procedures, Table 16.1,
+after the entry for CO_REDUCE add two new entries (with four forms each):
+
+"
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL [, STAT, ERRMSG]) or \
+          C    Generalized exclusive prefix reduction across images.
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, COMPLETION \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM, COMPLETION \
+                            [, STAT, ERRMSG])
+"
+
+and:
+
+"
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION [, STAT, ERRMSG]) or \
+          C    Generalized inclusive prefix reduction across images.
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, COMPLETION \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM, COMPLETION \
+                            [, STAT, ERRMSG])
+"
+
+-------------------------------------------------------------------------
+[383] In 16.7 Standard generic intrinsic procedures, Table 16.1,
+after the entry for CO_SUM add two new entries (with four forms each):
+
+"
+CO_SUM_PREFIX_EXCLUSIVE (A [, STAT, ERRMSG]) or \
+          C    Compute exclusive prefix sum across images.
+CO_SUM_PREFIX_EXCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
+CO_SUM_PREFIX_EXCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
+CO_SUM_PREFIX_EXCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])
+"
+
+and:
+
+"
+CO_SUM_PREFIX_INCLUSIVE (A [, STAT, ERRMSG]) or \
+          C    Compute inclusive prefix sum across images.
+CO_SUM_PREFIX_INCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
+CO_SUM_PREFIX_INCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
+CO_SUM_PREFIX_INCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])
+"
+
+-------------------------------------------------------------------------
+[411:20+] In 16.9 Specifications of the standard intrinsic procedures,
+after the specification of CO_REDUCE, add:
+
+16.9.?? \
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, COMPLETION \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_EXCLUSIVE (A, OPERATION, INITIAL, TEAM, COMPLETION \
+                            [, STAT, ERRMSG])
+
+Description. Generalized exclusive prefix reduction across images.
+
+Class. Collective subroutine.
+
+Arguments.
+
+A shall not be polymorphic. It shall not be of a type with an ultimate
+  component that is allocatable or a pointer. It shall have the same shape,
+  type, and type parameter values, in corresponding references. It shall
+  not be a coindexed object. It is an INTENT (INOUT) argument.
+
+  If A is scalar, the computed value provided to any given image is the
+  result of the exclusive prefix reduction operation described below.
+  If A is an array, each element of the computed value provided to any
+  given image is equal to the result of the exclusive prefix reduction
+  operation described below, as applied to corresponding elements of A in
+  corresponding references.
+
+  The computed value is assigned to A if no error condition occurs.
+  Otherwise, A becomes undefined.
+
+INITIAL shall be a scalar with the same declared type and type parameter
+  values as A. It is an INTENT (IN) argument. INITIAL shall have the
+  same value in corresponding references.
+
+OPERATION shall be a pure function with exactly two arguments; the result
+  and each argument shall be a scalar, nonallocatable, noncoarray,
+  nonpointer, nonpolymorphic data object with the same type and
+  type parameter values as A. The arguments shall not be optional.
+  If one argument has the ASYNCHRONOUS, TARGET, or VALUE attribute,
+  the other shall have that attribute. OPERATION shall implement a
+  mathematically associative operation. OPERATION shall be the same
+  function on all images in corresponding references.
+
+  The computed value for an exclusive prefix reduction over a list of
+  values is the result of an iterative process. Each scalar input value
+  provided by image i in the specified team is referred to as A_i. The
+  corresponding computed result value provided to image i in the specified
+  team is referred to as R_i. S_i is initially the ordered list [INITIAL,
+  A_1, ..., A_{i-1}]. Each iteration starts with a processor-dependent
+  choice of item x from the list S_i. Adjacent items x and y (where x
+  precedes y) are removed from the list and replaced with the value of
+  OPERATION(x, y). The process terminates when the list has only one item;
+  this is the computed value of R_i.
+
+TEAM shall be a scalar of type TEAM_TYPE from the intrinsic module
+  ISO_FORTRAN_ENV. It is an INTENT (IN) argument.
+
+COMPLETION shall be a scalar of type COMPLETION_TYPE from the intrinsic
+  module ISO_FORTRAN_ENV. It is an INTENT (INOUT) argument.
+
+STAT (optional) shall be a noncoindexed integer scalar with a decimal
+  exponent range of at least four. It is an INTENT (OUT) argument.
+
+ERRMSG (optional) shall be a noncoindexed default character scalar. It
+  is an INTENT (INOUT) argument.
+
+The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in 16.6.
+
+Example. The subroutine below demonstrates how to use
+CO_REDUCE_PREFIX_EXCLUSIVE to perform a collective exclusive prefix
+reduction analogous to the intrinsic function MAXLOC:
+
+SUBROUTINE co_prefix_maxloc(value, image)
+  USE, INTRINSIC :: IEEE_ARITHMETIC, ONLY: IEEE_VALUE, IEEE_NEGATIVE_INF
+  REAL, INTENT(INOUT) :: value
+  INTEGER, INTENT(OUT) :: image
+
+  TYPE :: tuple
+    REAL :: value
+    INTEGER :: image
+  END TYPE
+  TYPE(tuple) :: t
+
+  t = tuple(value, THIS_IMAGE())
+  CALL CO_REDUCE_PREFIX_EXCLUSIVE(t, OPERATION=find_maxloc, &
+         INITIAL=tuple(IEEE_VALUE(1.0,IEEE_NEGATIVE_INF), 0) )
+  value = t%value ! The largest value provided by a prior image,
+  image = t%image ! ... and the index of that image.
+
+CONTAINS
+  PURE FUNCTION find_maxloc(lhs,rhs) RESULT(maxloc)
+    TYPE(tuple), INTENT(IN) :: lhs,rhs
+    TYPE(tuple) :: maxloc
+
+    maxloc = MERGE(lhs, rhs, lhs%value >= rhs%value)
+  END FUNCTION find_maxloc
+END SUBROUTINE co_prefix_maxloc
+
+
+16.9.?? \
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, COMPLETION \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM \
+                            [, STAT, ERRMSG]) or
+CO_REDUCE_PREFIX_INCLUSIVE (A, OPERATION, TEAM, COMPLETION \
+                            [, STAT, ERRMSG])
+
+Description. Generalized inclusive prefix reduction across images.
+
+Class. Collective subroutine.
+
+Arguments.
+
+A shall not be polymorphic. It shall not be of a type with an ultimate
+  component that is allocatable or a pointer. It shall have the same shape,
+  type, and type parameter values, in corresponding references. It shall
+  not be a coindexed object. It is an INTENT (INOUT) argument.
+
+  If A is scalar, the computed value provided to any given image is the
+  result of the inclusive prefix reduction operation described below.
+  If A is an array, each element of the computed value provided to any
+  given image is equal to the result of the inclusive prefix reduction
+  operation described below, as applied to corresponding elements of A in
+  corresponding references.
+
+  The computed value is assigned to A if no error condition occurs.
+  Otherwise, A becomes undefined.
+
+OPERATION shall be a pure function with exactly two arguments; the result
+  and each argument shall be a scalar, nonallocatable, noncoarray,
+  nonpointer, nonpolymorphic data object with the same type and
+  type parameter values as A. The arguments shall not be optional.
+  If one argument has the ASYNCHRONOUS, TARGET, or VALUE attribute,
+  the other shall have that attribute. OPERATION shall implement a
+  mathematically associative operation. OPERATION shall be the same
+  function on all images in corresponding references.
+
+  The computed value for an inclusive prefix reduction over a list of
+  values is the result of an iterative process. Each scalar input value
+  provided by image i in the specified team is referred to as A_i. The
+  corresponding computed result value provided to image i in the specified
+  team is referred to as R_i. S_i is initially the ordered list [A_1, ...,
+  A_i]. Each iteration starts with a processor-dependent choice of item x
+  from the list S_i. Adjacent items x and y (where x precedes y) are
+  removed from the list and replaced with the value of OPERATION(x, y).
+  The process terminates when the list has only one item; this is the
+  computed value of R_i.
+
+TEAM shall be a scalar of type TEAM_TYPE from the intrinsic module
+  ISO_FORTRAN_ENV. It is an INTENT (IN) argument.
+
+COMPLETION shall be a scalar of type COMPLETION_TYPE from the intrinsic
+  module ISO_FORTRAN_ENV. It is an INTENT (INOUT) argument.
+
+STAT (optional) shall be a noncoindexed integer scalar with a decimal
+  exponent range of at least four. It is an INTENT (OUT) argument.
+
+ERRMSG (optional) shall be a noncoindexed default character scalar. It
+  is an INTENT (INOUT) argument.
+
+The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in 16.6.
+
+Example. The subroutine below demonstrates how to use
+CO_REDUCE_PREFIX_INCLUSIVE to compute a collective segmented prefix sum.
+A segmented prefix sum takes as input an ordered list of values and a
+corresponding list of logicals; a change in the logical value marks the
+start of a new segment of the prefix sum. For example:
+
+    values:    1  2  4  5  6  7  8  9
+    logicals:  F  F  T  T  T  F  F  T
+    result:    1  3  4  9 15  7 15  9
+
+Note that the segmented_sum operation used below is noncommutative.
+
+SUBROUTINE co_prefix_segment_sum(value, flag)
+  REAL, INTENT(INOUT) :: value
+  LOGICAL, INTENT(IN) :: flag
+
+  TYPE :: tuple
+    REAL :: value
+    LOGICAL :: flag
+  END TYPE
+  TYPE(tuple) :: t
+
+  t = tuple(value, flag)
+  CALL CO_REDUCE_PREFIX_INCLUSIVE(t, OPERATION=segmented_sum)
+  value = t%value
+
+CONTAINS
+  PURE FUNCTION segmented_sum(lhs,rhs) RESULT(sum)
+    TYPE(tuple), INTENT(IN) :: lhs,rhs
+    TYPE(tuple) :: sum
+
+    IF (lhs%flag .eqv. rhs%flag) THEN
+      sum%value = lhs%value + rhs%value
+    ELSE
+      sum%value = rhs%value
+    END IF
+    sum%flag = rhs%flag
+  END FUNCTION segmented_sum
+END SUBROUTINE co_prefix_segment_sum
+
+-------------------------------------------------------------------------
+[412:4+] In 16.9 Specifications of the standard intrinsic procedures,
+after the specification of CO_SUM, add:
+
+16.9.?? CO_SUM_PREFIX_EXCLUSIVE (A [, STAT, ERRMSG]) or
+        CO_SUM_PREFIX_EXCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
+        CO_SUM_PREFIX_EXCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
+        CO_SUM_PREFIX_EXCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])
+
+Description. Compute exclusive prefix sum across images.
+
+Class. Collective subroutine.
+
+Arguments.
+
+A shall be of numeric type. It shall have the same shape, type, and
+  type parameter values, in corresponding references. It shall not be
+  a coindexed object. It is an INTENT (INOUT) argument.
+
+  The computed value provided to image one in the specified team is equal
+  to the value zero. If A is scalar, the computed value provided to any
+  given image i in the specified team (with i greater than one) is equal to
+  a processor-dependent approximation to the sum of the values of A in
+  corresponding references provided by images 1 to (i-1) in the specified
+  team. If A is an array, each element of the computed value provided to
+  any given image i in the specified team (with i greater than one) is
+  equal to a processor-dependent approximation to the sum of the values in
+  corresponding elements of A in corresponding references provided by
+  images 1 to (i-1) in the specified team.
+
+  The computed value is assigned to A if no error condition occurs.
+  Otherwise, A becomes undefined.
+
+TEAM shall be a scalar of type TEAM_TYPE from the intrinsic module
+  ISO_FORTRAN_ENV. It is an INTENT (IN) argument.
+
+COMPLETION shall be a scalar of type COMPLETION_TYPE from the intrinsic
+  module ISO_FORTRAN_ENV. It is an INTENT (INOUT) argument.
+
+STAT (optional) shall be a noncoindexed integer scalar with a decimal
+  exponent range of at least four. It is an INTENT (OUT) argument.
+
+ERRMSG (optional) shall be a noncoindexed default character scalar. It
+  is an INTENT (INOUT) argument.
+
+The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in 16.6.
+
+Example. If the number of images in the current team is three and
+the value of A is [1, 2] on image one, [3, 4] on image two, and [5, 6]
+on image three, after executing the statement CALL
+CO_SUM_PREFIX_EXCLUSIVE(A), the value of A is [0, 0] on image one, [1, 2]
+on image two, and [4, 6] on image three.
+
+
+16.9.?? CO_SUM_PREFIX_INCLUSIVE (A [, STAT, ERRMSG]) or
+        CO_SUM_PREFIX_INCLUSIVE (A, COMPLETION [, STAT, ERRMSG]) or
+        CO_SUM_PREFIX_INCLUSIVE (A, TEAM [, STAT, ERRMSG]) or
+        CO_SUM_PREFIX_INCLUSIVE (A, TEAM, COMPLETION [, STAT, ERRMSG])
+
+Description. Compute inclusive prefix sum across images.
+
+Class. Collective subroutine.
+
+Arguments.
+
+A shall be of numeric type. It shall have the same shape, type, and
+  type parameter values, in corresponding references. It shall not be
+  a coindexed object. It is an INTENT (INOUT) argument.
+
+  If A is scalar, the computed value provided to any given image i in the
+  specified team is equal to a processor-dependent approximation to the sum
+  of the values of A in corresponding references provided by images 1 to i
+  in the specified team. If A is an array, each element of the computed
+  value provided to any given image i in the specified team is equal to a
+  processor-dependent approximation to the sum of the values in
+  corresponding elements of A in corresponding references provided by
+  images 1 to i in the specified team.
+
+  The computed value is assigned to A if no error condition occurs.
+  Otherwise, A becomes undefined.
+
+TEAM shall be a scalar of type TEAM_TYPE from the intrinsic module
+  ISO_FORTRAN_ENV. It is an INTENT (IN) argument.
+
+COMPLETION shall be a scalar of type COMPLETION_TYPE from the intrinsic
+  module ISO_FORTRAN_ENV. It is an INTENT (INOUT) argument.
+
+STAT (optional) shall be a noncoindexed integer scalar with a decimal
+  exponent range of at least four. It is an INTENT (OUT) argument.
+
+ERRMSG (optional) shall be a noncoindexed default character scalar. It
+  is an INTENT (INOUT) argument.
+
+The semantics of TEAM, COMPLETION, STAT and ERRMSG are described in 16.6.
+
+Example. If the number of images in the current team is three and
+the value of A is [1, 2] on image one, [3, 4] on image two, and [5, 6]
+on image three, after executing the statement CALL
+CO_SUM_PREFIX_INCLUSIVE(A), the value of A is [1, 2] on image one, [4, 6]
+on image two, and [9, 12] on image three.
+
+-------------------------------------------------------------------------
+[596:19-20] In Annex A.2 Processor dependencies, replace the following
+line:
+
+"* the computed value of the intrinsic subroutine CO_REDUCE (16.9.57) and
+   the intrinsic subroutine CO_SUM (16.9.58);"
+
+with the following line:
+
+"* the computed value of the intrinsic subroutines CO_REDUCE (16.9.57),
+   CO_REDUCE_PREFIX_EXCLUSIVE (16.9.??), CO_REDUCE_PREFIX_INCLUSIVE
+   (16.9.??), CO_SUM (16.9.58), CO_SUM_PREFIX_EXCLUSIVE (16.9.??) and
+   CO_SUM_PREFIX_INCLUSIVE (16.9.??);"
+
+-------------------------------------------------------------------------
+
+===END===
diff --git a/drafts/coll-edit-notes.txt b/drafts/coll-edit-notes.txt
new file mode 100644
index 0000000..4f4858b
--- /dev/null
+++ b/drafts/coll-edit-notes.txt
@@ -0,0 +1,46 @@
+Collective subroutines edits TODO:
+
+* Find better ways to test the CO_REDUCE_PREFIX examples
+
+Examples:
+EXCLUSIVE
+  MAXLOC over a derived type of real value and integer image ID
+  computes max value in prefix and the image that provided it
+INCLUSIVE
+  derived type: value and boolean
+  illustrates segmented prefix reduction
+  MPI Example 6.24.
+
+============================================
+
+Pathological example:
+
+pure INTEGER function OPERATION(x,y) result(r)
+  INTEGER, INTENT(IN) :: x, y
+  r = MAX(x,y,THIS_IMAGE())
+end function
+
+This OPERATION is a pure function as defined in F23 15.7 and 16.1.
+It is associative and commutative.
+It also satisfies all the other requirements for the OPERATION argument to
+CO_REDUCE or CO_REDUCE_PREFIX_* with A of integer type.
+
+Passing this OPERATION along with the following arguments to either CO_REDUCE
+or CO_REDUCE_PREFIX_* will reveal some information about which images evaluated
+the OPERATION for any given input element (and for any given image's result).
+
+A = [0, 0]
+INITIAL = 0
+
+Other similar OPERATION functions can be crafted over a derived type to reveal
+arbitrary information about which images executed the operation and even in
+what order.
+
+Potential resolution for all CO_REDUCE intrinsics:
+"OPERATION shall not depend on the value of THIS_IMAGE()."
+
+
+============================================
+
+
+
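One possible way to approach the TODO item on testing the CO_REDUCE_PREFIX examples is a serial reference check that needs no coarray support. The sketch below is an assumption-laden illustration, not part of the proposed edits: the program name check_segmented_prefix and the arrays values, flags, and expected are hypothetical, with element i standing in for the contribution of image i. It reuses the segmented_sum operation from the CO_REDUCE_PREFIX_INCLUSIVE example and compares against the values/logicals/result table from the paper.

```fortran
! Serial reference check (a sketch; not part of the proposed edits).
! Element i of each array stands in for the contribution of image i.
PROGRAM check_segmented_prefix
  IMPLICIT NONE

  TYPE :: tuple
    REAL :: value
    LOGICAL :: flag
  END TYPE

  INTEGER, PARAMETER :: n = 8
  ! Input and expected output copied from the table in the paper's example.
  REAL, PARAMETER :: values(n) = [1., 2., 4., 5., 6., 7., 8., 9.]
  LOGICAL, PARAMETER :: flags(n) = &
    [.FALSE., .FALSE., .TRUE., .TRUE., .TRUE., .FALSE., .FALSE., .TRUE.]
  REAL, PARAMETER :: expected(n) = [1., 3., 4., 9., 15., 7., 15., 9.]

  TYPE(tuple) :: acc
  INTEGER :: i

  ! Strict left-to-right evaluation: R_1 = A_1, R_i = OPERATION(R_{i-1}, A_i).
  ! This is only one of the groupings the iterative process permits.
  acc = tuple(values(1), flags(1))
  DO i = 1, n
    IF (i > 1) acc = segmented_sum(acc, tuple(values(i), flags(i)))
    IF (acc%value /= expected(i)) ERROR STOP 'segmented prefix sum mismatch'
  END DO
  PRINT '(A)', 'segmented prefix sum matches the expected result'

CONTAINS
  ! Same operation as in the CO_REDUCE_PREFIX_INCLUSIVE example.
  PURE FUNCTION segmented_sum(lhs, rhs) RESULT(sum)
    TYPE(tuple), INTENT(IN) :: lhs, rhs
    TYPE(tuple) :: sum

    IF (lhs%flag .eqv. rhs%flag) THEN
      sum%value = lhs%value + rhs%value
    ELSE
      sum%value = rhs%value
    END IF
    sum%flag = rhs%flag
  END FUNCTION segmented_sum
END PROGRAM check_segmented_prefix
```

The check exercises only the left-to-right grouping, so it confirms the table in the example but does not by itself establish that segmented_sum is associative; other groupings of the same inputs would need to be tested separately.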