Add barrier before nvshmem_team_split_strided to ensure proper initialization
#478
I'm getting
Move `nvshmem_barrier_all()` to execute before `nvshmem_team_split_strided` rather than after. This is required because team split is a collective operation that must be called by all PEs in the parent team, and all PEs must reach this call in a synchronized manner.

Without the barrier after `nvshmemx_init_attr()`, ranks may complete initialization at different times, leading to race conditions where some PEs attempt to split teams before others have finished NVSHMEM initialization. This can cause undefined behavior and incorrect team formation.
The barrier after team split was unnecessary per NVSHMEM documentation: teams are immediately usable after creation without intervening synchronization.
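The ordering described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `make_strided_team` and its PE-range parameters are hypothetical, and the caller is assumed to have already initialized NVSHMEM (for example via `nvshmemx_init_attr()`). When `USE_NVSHMEM` is not defined, the block falls back to hypothetical stand-in stubs that only record the call order, so the sequence can be checked without a GPU or the NVSHMEM library.

```cpp
#include <cassert>
#include <string>
#include <vector>

#if defined(USE_NVSHMEM)
#include <nvshmem.h>
#include <nvshmemx.h>
#else
// Hypothetical stand-ins (NOT the real library): they only log the call
// order so this sketch compiles and runs without NVSHMEM installed.
typedef int nvshmem_team_t;
struct nvshmem_team_config_t {};
static const nvshmem_team_t NVSHMEM_TEAM_WORLD = 0;
static const nvshmem_team_t NVSHMEM_TEAM_INVALID = -1;
static std::vector<std::string> call_log;

static void nvshmem_barrier_all() { call_log.push_back("barrier_all"); }

static int nvshmem_team_split_strided(nvshmem_team_t /*parent*/, int /*start*/,
                                      int /*stride*/, int /*size*/,
                                      const nvshmem_team_config_t* /*config*/,
                                      long /*config_mask*/,
                                      nvshmem_team_t* new_team) {
    call_log.push_back("team_split_strided");
    *new_team = 1;  // pretend a valid team handle was created
    return 0;       // 0 signals success, as in the real API
}
#endif

// Hypothetical helper: call only after NVSHMEM initialization has returned
// on this PE. Every PE in NVSHMEM_TEAM_WORLD must call it, since both the
// barrier and the split are collective.
nvshmem_team_t make_strided_team(int pe_start, int pe_stride, int pe_size) {
    nvshmem_team_t team = NVSHMEM_TEAM_INVALID;

    // Barrier first, so every PE has finished initialization before any
    // PE enters the collective team split.
    nvshmem_barrier_all();

    int status = nvshmem_team_split_strided(NVSHMEM_TEAM_WORLD,
                                            pe_start, pe_stride, pe_size,
                                            nullptr, 0, &team);
    if (status != 0) {
        return NVSHMEM_TEAM_INVALID;
    }

    // No barrier after the split: the sketch follows the description's
    // claim that the new team is usable immediately.
    return team;
}
```

A caller would invoke `make_strided_team(0, 1, n_pes)` on every PE right after initialization; PEs outside the requested range should expect `NVSHMEM_TEAM_INVALID` from the real library.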