Prep for 5.0.9amzn1 release #13530

bwbarrett · 2025-11-24T22:20:19Z

Release 5.0.9 + the single Fabric/Domain per process patch series (which will be in v5.0.10).

bot:notacherrypick

`requested` in `MPI_Init_thread` would invoke the error handler, even though it is a useful override in some threaded library use cases. Signed-off-by: Aurelien Bouteiller <abouteil@amd.com> (cherry picked from commit 27332fc)

(single,etc) in addition to numeric 0-3 values Signed-off-by: Aurelien Bouteiller <abouteil@amd.com> (cherry picked from commit 3de2489)
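The accepted spellings described above (numeric 0-3, short names such as `multiple`, and full names such as `MPI_THREAD_MULTIPLE`) can be illustrated with a small standalone parser. This is only a sketch, not the Open MPI implementation: the function name `parse_thread_level_sketch`, the helper names, and the case-insensitive matching are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compare two strings, ignoring ASCII case (helper for this sketch only). */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        ++a;
        ++b;
    }
    return *a == *b; /* true only if both strings ended together */
}

/* Return s advanced past prefix if s starts with it (ignoring case), else s. */
static const char *skip_prefix_ignore_case(const char *s, const char *prefix)
{
    const char *p = s;
    while (*prefix) {
        if (tolower((unsigned char)*p) != tolower((unsigned char)*prefix)) {
            return s;
        }
        ++p;
        ++prefix;
    }
    return p;
}

/* Hypothetical parser: returns the level 0-3, or -1 if value is not recognized. */
static int parse_thread_level_sketch(const char *value)
{
    static const char *names[] = { "single", "funneled", "serialized", "multiple" };
    int i;

    if (NULL == value || '\0' == value[0]) {
        return -1;
    }
    /* Numeric form: exactly one digit in the range 0-3. */
    if (value[0] >= '0' && value[0] <= '3' && '\0' == value[1]) {
        return value[0] - '0';
    }
    /* Named form, with or without the MPI_THREAD_ prefix. */
    value = skip_prefix_ignore_case(value, "MPI_THREAD_");
    for (i = 0; i < 4; ++i) {
        if (equal_ignore_case(value, names[i])) {
            return i;
        }
    }
    return -1;
}
```

A caller would typically read the environment variable with `getenv("OMPI_MPI_THREAD_LEVEL")` and fall back to the level requested by the application when the sketch returns -1.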

…ages

Including, but not limited to:

* Added much more description of and distinction between the MPI world model and the MPI session model. Updated a lot of old, pre-MPI-world-model/pre-MPI-session-model text that was now stale / outdated, especially in the following pages:
  * MPI_Init(3), MPI_Init_thread(3)
  * MPI_Initialized(3)
  * MPI_Finalize(3)
  * MPI_Finalized(3)
  * MPI_Session_init(3)
  * MPI_Session_finalize(3)
* Numerous formatting updates
* Slightly improve the C code examples
* Describe the mathematical relationship between the various MPI_THREAD_* constants in MPI_Init_thread(3)
  * Note that the mathematical relationships render nicely in HTML, but don't render entirely properly in nroff. This commit author is of the opinion that the nroff rendering is currently "good enough", and some Sphinx maintainer will fix it someday.
* Add descriptions about the $OMPI_MPI_THREAD_LEVEL env variable and how it is used in MPI_Init_thread(3)
* Added more seealso links

Signed-off-by: Jeff Squyres <jeff@squyres.com> (cherry picked from commit aff3afd)

…it doc. Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>

Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>

Thanks to Ben Menadue for pointing out that ompi_fortran_string_c2f() missed a case to properly terminate the resulting Fortran string when copying from a longer C source string. Signed-off-by: Jeff Squyres <jeff@squyres.com> (cherry picked from commit 694e78a)

Followup to commit 694e78a: Ben Menadue correctly pointed out that < should have been <=. Signed-off-by: Jeff Squyres <jeff@squyres.com> (cherry picked from commit cc03d5b)
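The kind of C-to-Fortran copy discussed in these two commits can be sketched as follows. This is a minimal illustration, not the code of `ompi_fortran_string_c2f()`: the function name `c2f_string_sketch` and its signature are hypothetical. It assumes the usual convention that a Fortran string is a fixed-length, blank-padded buffer with no NUL terminator, so a longer C source must be truncated and the boundary case where both lengths are equal must leave no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: copy the NUL-terminated C string cstr into the
 * fixed-length Fortran buffer fstr of flen characters.
 *   - shorter source: copy it and pad the remainder with blanks
 *   - equal or longer source: copy exactly flen characters, no padding
 * No NUL byte is ever written into fstr.
 */
static void c2f_string_sketch(const char *cstr, char *fstr, size_t flen)
{
    size_t clen = strlen(cstr);
    size_t ncopy = (clen < flen) ? clen : flen;

    memcpy(fstr, cstr, ncopy);
    memset(fstr + ncopy, ' ', flen - ncopy); /* zero-length when ncopy == flen */
}
```

Computing the copy length once and deriving the padding length from it avoids a separate comparison for the padding step, which is where an off-by-one between `<` and `<=` could otherwise creep in.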

The table added in 061f908 (A variety of docs updates:, 2022-09-12) mentioning the different prefixes for Open MPI, PMIx and PRRTE MCA parameters set via environment variables has one too many "R"'s in 'PRRTE_MCA_': the correct prefix is 'PRTE_MCA_'. Fix that, and make it clear that it is not a typo. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> (cherry picked from commit bd9adb4)

…-c2f-string-copy v5.0.x: fortran: fix ompi string c2f where len(fstr) < len(cstr)

…ar-prefix-5.0 v5.0.x: docs/mca.rst: fix MCA environment variable prefix for PRRTE

…level-ignored@v5 v5.0.x: Thread level set from ENV crashes (cherry open-mpi#13211)

PMIx v5.0.9 PRRTE v3.0.12 Signed-off-by: Ralph Castain <rhc@pmix.org>

Check PMIx/PRRTE release branches prior to release

… or failure

The MCA_PML_OB1_ADD_ACK_TO_PENDING method creates a mca_pml_ob1_pckt_pending_t to hold an ack to be sent later. This method builds the pending packet then puts it on the mca_pml_ob1.pckt_pending list for later transmission. It does not, however, set the required hdr_size field on the struct.

This leads to issues when the packet is later sent because it could contain any value. With some btls this will lead to memory corruption (if the size is not checked against btl_max_send_size) or just allocation failure because the size is too big. In other situations it could lead to a truncated packet being sent (if the size previously in hdr_size is smaller than an ack).

To fix the issue this commit gets rid of the macro entirely and replaces it with a new inline helper method that does the same thing. This helper uses the existing mca_pml_ob1_add_to_pending helper (which sets hdr_size) to reduce duplicated code. Tested and verified this fixes a critical issue triggered on our hardware.

Signed-off-by: Nathan Hjelm <hjelmn@google.com> (cherry picked from commit 48490b9)
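The pattern behind this fix, routing every pending packet through one helper that records the header size, can be sketched in isolation. Everything below is hypothetical: the `*_sketch` types and functions only mimic the shape of the problem and are not the ob1 data structures or the helper added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ack header, standing in for the real ob1 header type. */
typedef struct {
    uint8_t  type;
    uint64_t src_req;
    uint64_t dst_req;
    uint64_t send_offset;
} ack_hdr_sketch_t;

/* Hypothetical pending packet: the header bytes plus their size. */
typedef struct pending_pckt_sketch {
    struct pending_pckt_sketch *next;
    size_t hdr_size; /* must be set before the packet is sent */
    union {
        ack_hdr_sketch_t ack;
        unsigned char raw[64];
    } hdr;
} pending_pckt_sketch_t;

typedef struct {
    pending_pckt_sketch_t *head;
    pending_pckt_sketch_t *tail;
} pending_list_sketch_t;

/* Generic helper: the single place where hdr_size is recorded. */
static int add_to_pending_sketch(pending_list_sketch_t *list,
                                 const void *hdr, size_t hdr_size)
{
    pending_pckt_sketch_t *pckt;

    if (hdr_size > sizeof(pckt->hdr)) {
        return -1;
    }
    pckt = malloc(sizeof(*pckt));
    if (NULL == pckt) {
        return -1;
    }
    memcpy(&pckt->hdr, hdr, hdr_size);
    pckt->hdr_size = hdr_size;
    pckt->next = NULL;
    if (NULL != list->tail) {
        list->tail->next = pckt;
    } else {
        list->head = pckt;
    }
    list->tail = pckt;
    return 0;
}

/* Ack-specific wrapper: builds the header, then reuses the generic helper. */
static inline int add_ack_to_pending_sketch(pending_list_sketch_t *list,
                                            uint64_t src_req, uint64_t dst_req,
                                            uint64_t send_offset)
{
    ack_hdr_sketch_t ack;

    memset(&ack, 0, sizeof(ack));
    ack.type = 1; /* arbitrary tag value for this sketch */
    ack.src_req = src_req;
    ack.dst_req = dst_req;
    ack.send_offset = send_offset;
    return add_to_pending_sketch(list, &ack, sizeof(ack));
}
```

Because the wrapper never touches `hdr_size` itself, a newly added packet type cannot forget to set it, which is the failure mode the commit message describes for the old macro.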

…where_pending_packets_can_have_incorrect_header_sizes Fix bug in MCA_PML_OB1_ADD_ACK_TO_PENDING that causes memory overruns…

- Update VERSION file to v5.0.9rc1 with correct date (23 September 2025)
- Update NEWS with actual changes from v5.0.8 to v5.0.9rc1 including:
  * PMIx v5.0.9 and PRRTE v3.0.12 updates
  * GPFS 5.2.3-0+ support
  * OFI accelerator memory enhancements
  * Critical PML OB1 bug fix for memory overruns
  * Fortran string conversion fixes
  * Threading improvements
  * Various documentation and build system fixes

Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>

Signed-off-by: Mikhail Brinskii <mikhailb@nvidia.com>

Signed-off-by: Sergey Lebedev <sergeyle@nvidia.com> (cherry picked from commit 0caae60)

V5.0.x: v5.0.9rc1

…ocal_id_v5 COLL/UCC: set node local id - v5.0.x

OMPI/MCA/PML/UCX: Set node local id - v5.0.x

In some cases the CUDA install directory contains two libcuda.so libraries, and this breaks OMPI CUDA detection. Picking the first of these libraries seems to be a good solution for all cases. Signed-off-by: George Bosilca <gbosilca@nvidia.com> (cherry picked from commit c7e27b9)

Signed-off-by: xbw <78337767+xbw22109@users.noreply.github.com> (cherry picked from commit 0999325)

v5.0.x: Only pick one CUDA

Signed-off-by: charlesgwaldman <120225331+charlesgwaldman@users.noreply.github.com> (cherry picked from commit a6b8cd3)

Fix `see-also` errors in the document (v5.0.x)

Use unique, NVIDIA-specific workflow names so that it's easier to identify these workflows on the github dashboard backend. Signed-off-by: Jeff Squyres <jeff@squyres.com> (cherry picked from commit dcac103)

…ia-github-actions NVIDIA github workflows: use unique workflow names (v5.0.x)

…ng-fix Update history.rst (spelling) (v5.0.x)

Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>

v5.0.x: prepare v5.0.9rc2 release

Signed-off-by: Kento Hasegawa <hasegawa.kento@fujitsu.com> (cherry picked from commit 4b1b9a9)

…nitialization COLL/UCC: Fix initialization in non-blocking (v5.0.x)

Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>

v5.0.9: consolidate news.

Open MPI v5.0.9

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com> (cherry picked from commit f65f900)

Add FI_COMPLETION flag to ensure completion entries are generated for all data transfer operations. Signed-off-by: Jessie Yang <jiaxiyan@amazon.com> (cherry picked from commit 15fe246)

Share the domain between the MTL and BTL layers to reduce the total number of domains created. This helps avoid hitting system resource limits on platforms with high core counts. Instead of having the common code allocate a single domain with the superset of all required capabilities, we attempt to reuse an existing fabric and domain if the providers can support MTL’s and BTL’s different capability sets. This approach allows providers that support domain sharing to reuse resources efficiently while still preserving flexibility. If the providers cannot reuse the fabric and domain due to incompatible requirements, separate domains will be created as before. Signed-off-by: Jessie Yang <jiaxiyan@amazon.com> (cherry picked from commit 69d2737)
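The reuse-or-fall-back policy described in this commit message can be sketched without any libfabric calls. This is a simplified model under stated assumptions, not the btl/ofi or mtl/ofi code: `domain_sketch_t`, `open_or_share_domain_sketch`, and the `provider_can_share_fn` callback are all hypothetical, and the callback stands in for whatever check the real provider performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an opened fabric/domain pair. */
typedef struct {
    unsigned long caps;  /* capability bits it was first opened with */
    unsigned refcount;   /* number of layers (MTL, BTL) using it */
} domain_sketch_t;

/* Hypothetical provider hook: nonzero if existing can also serve caps. */
typedef int (*provider_can_share_fn)(const domain_sketch_t *existing,
                                     unsigned long caps);

/*
 * Try to reuse an existing domain; if the provider cannot serve the new
 * capability set from it, open a separate domain as before.
 */
static domain_sketch_t *open_or_share_domain_sketch(domain_sketch_t *existing,
                                                    unsigned long caps,
                                                    provider_can_share_fn can_share)
{
    domain_sketch_t *domain;

    if (NULL != existing && NULL != can_share && can_share(existing, caps)) {
        existing->refcount++;
        return existing;
    }
    domain = malloc(sizeof(*domain));
    if (NULL == domain) {
        return NULL;
    }
    domain->caps = caps;
    domain->refcount = 1;
    return domain;
}

/* Drop one reference; free the domain when the last user is gone. */
static void release_domain_sketch(domain_sketch_t *domain)
{
    if (NULL != domain && 0 == --domain->refcount) {
        free(domain);
    }
}

/* Two toy provider policies used to exercise both paths. */
static int always_share_sketch(const domain_sketch_t *existing, unsigned long caps)
{
    (void)existing;
    (void)caps;
    return 1;
}

static int never_share_sketch(const domain_sketch_t *existing, unsigned long caps)
{
    (void)existing;
    (void)caps;
    return 0;
}
```

The sketch deliberately does not merge the two capability sets into a superset; the second layer either fits the existing domain or gets its own, matching the trade-off the commit message describes.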

Signed-off-by: Brian Barrett <bbarrett@amazon.com>

abouteiller and others added 30 commits July 10, 2025 16:38

OMPI_MPI_THREAD_LEVEL can now take 'multiple' 'MPI_THREAD_MULTIPLE'

6ad6cc8

(single,etc) in addition to numeric 0-3 values Signed-off-by: Aurelien Bouteiller <abouteil@amd.com> (cherry picked from commit 3de2489)

Add missing file MPI_Session_c2f.3.rst referenced from MPI_Session_in…

1bea4f7

…it doc. Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>

Update hardcoded version values in documentation for Init_threads

91a8d34

Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>

fortran: fix off-by-one string copy error

cc72cc7

Followup to commit 694e78a: Ben Menadue correctly pointed out that < should have been <=. Signed-off-by: Jeff Squyres <jeff@squyres.com> (cherry picked from commit cc03d5b)

Merge pull request open-mpi#13382 from jsquyres/pr/v5.0.x/fix-fortran…

0f471d4

…-c2f-string-copy v5.0.x: fortran: fix ompi string c2f where len(fstr) < len(cstr)

Merge pull request open-mpi#13389 from phil-blain/doc-mca-prrte-env-v…

3f02701

…ar-prefix-5.0 v5.0.x: docs/mca.rst: fix MCA environment variable prefix for PRRTE

Merge pull request open-mpi#13329 from abouteiller/bugfix/env-thread-…

30821f8

…level-ignored@v5 v5.0.x: Thread level set from ENV crashes (cherry open-mpi#13211)

Update submodules to latest PMIx/PRRTE releases

b357357

PMIx v5.0.9 PRRTE v3.0.12 Signed-off-by: Ralph Castain <rhc@pmix.org>

Merge pull request open-mpi#13395 from rhc54/cmr50/check

9be4387

Check PMIx/PRRTE release branches prior to release

Merge pull request open-mpi#13409 from hjelmn/v5.0.x_fix_pml_ob1_bug_…

359a19f

…where_pending_packets_can_have_incorrect_header_sizes Fix bug in MCA_PML_OB1_ADD_ACK_TO_PENDING that causes memory overruns…

OMPI/MCA/PML/UCX: Set node local id - v5.0.x

fa0a52b

Signed-off-by: Mikhail Brinskii <mikhailb@nvidia.com>

COLL/UCC: set node local id

e3beb10

Signed-off-by: Sergey Lebedev <sergeyle@nvidia.com> (cherry picked from commit 0caae60)

Merge pull request open-mpi#13412 from janjust/v5.0.x

237cdcc

V5.0.x: v5.0.9rc1

Merge pull request open-mpi#13418 from Sergei-Lebedev/topic/ucc_set_l…

43a7a16

…ocal_id_v5 COLL/UCC: set node local id - v5.0.x

Merge pull request open-mpi#13414 from brminich/ucx/add_local_id_5.0.x

6b190d8

OMPI/MCA/PML/UCX: Set node local id - v5.0.x

Fix see-also errors in the document.

b156808

Signed-off-by: xbw <78337767+xbw22109@users.noreply.github.com> (cherry picked from commit 0999325)

Merge pull request open-mpi#13434 from janjust/v5.0.x

0645ff0

v5.0.x: Only pick one CUDA

Update history.rst (spelling)

314d738

Signed-off-by: charlesgwaldman <120225331+charlesgwaldman@users.noreply.github.com> (cherry picked from commit a6b8cd3)

Merge pull request open-mpi#13441 from jsquyres/pr/v5.0.x/docs-update

552f601

Fix `see-also` errors in the document (v5.0.x)

NVIDIA github workflows: use unique workflow names

8050607

Use unique, NVIDIA-specific workflow names so that it's easier to identify these workflows on the github dashboard backend. Signed-off-by: Jeff Squyres <jeff@squyres.com> (cherry picked from commit dcac103)

Merge pull request open-mpi#13446 from jsquyres/pr/v5.0.x/rename-nvid…

f64e9f7

…ia-github-actions NVIDIA github workflows: use unique workflow names (v5.0.x)

Merge pull request open-mpi#13442 from jsquyres/pr/v5.0.x/docs-spelli…

04d592f

…ng-fix Update history.rst (spelling) (v5.0.x)

v5.0.x: prepare v5.0.9rc2 release

37bf448

Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>

janjust and others added 10 commits October 15, 2025 15:06

Merge pull request open-mpi#13448 from janjust/v5.0.x

e982ef6

v5.0.x: prepare v5.0.9rc2 release

COLL/UCC: Fix initialization in non-blocking and persistent

8d88926

Signed-off-by: Kento Hasegawa <hasegawa.kento@fujitsu.com> (cherry picked from commit 4b1b9a9)

Merge pull request open-mpi#13459 from mentOS31/v5.0.x-coll_ucc_fix_i…

c5038b5

…nitialization COLL/UCC: Fix initialization in non-blocking (v5.0.x)

v5.0.9: consolidate news.

6fe1db1

Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>

Merge pull request open-mpi#13489 from janjust/v5.0.x

b79100b

v5.0.9: consolidate news.

Merge tag 'v5.0.9' into v5.0.x-aws

2c46af3

Open MPI v5.0.9

btl/ofi: Set domain threading model based on MPI thread support

655d661

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com> (cherry picked from commit f65f900)

btl/ofi: Add FI_COMPLETION flag to tx and rx attributes

84c0e93

Add FI_COMPLETION flag to ensure completion entries are generated for all data transfer operations. Signed-off-by: Jessie Yang <jiaxiyan@amazon.com> (cherry picked from commit 15fe246)

dist: Prep for 5.0.9amzn1 release

22b7e2e

Signed-off-by: Brian Barrett <bbarrett@amazon.com>

bwbarrett requested a review from jiaxiyan November 24, 2025 22:20

github-actions bot added this to the v5.0.8 milestone Nov 24, 2025

open-mpi deleted a comment from github-actions bot Nov 24, 2025

jiaxiyan approved these changes Nov 24, 2025


bwbarrett merged commit adf9a96 into open-mpi:v5.0.x-aws Nov 25, 2025
17 of 18 checks passed

bwbarrett deleted the dist/5.0.9amzn1-prep branch November 25, 2025 00:16

13 participants
