
Add hybrid Slurm support to rrun with PMIx-based coordination #844

Draft
pentschev wants to merge 1 commit into rapidsai:main from pentschev:rrun-slurm-hybrid

@pentschev
Member

This PR adds hybrid Slurm support to rrun, enabling RapidsMPF to run without MPI. It does so by reusing SlurmBackend, which provides a passthrough mode.

The new hybrid Slurm execution model runs one task per node with multiple GPUs per task, using parent-mediated coordination:

  • The root parent launches rank 0 first to obtain its UCXX address, then broadcasts that address via PMIx to the parents on all nodes
  • The rrun parents coordinate via PMIx, then spawn the non-root ranks with the root rank's address
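
The ordering in the two steps above can be illustrated with a toy simulation. This is only a sketch of the sequencing, not rrun's implementation: a temporary directory stands in for the PMIx key-value space, the function names (`kv_put`, `kv_get`, `launch_rank0`, `root_parent`, `spawn_nonroot_ranks`) are hypothetical, and the address is a placeholder string rather than a real UCXX address.

```shell
#!/usr/bin/env bash
# Toy simulation of the ordering only. A temp directory stands in for PMIx;
# none of these function names come from rrun or RapidsMPF.

KV_DIR="$(mktemp -d)"            # stand-in for the PMIx key-value space
trap 'rm -rf "$KV_DIR"' EXIT     # clean up when the script exits

# Publish a value under a key (write to a temp file, then rename it into place).
kv_put() { printf '%s\n' "$2" > "$KV_DIR/$1.tmp" && mv "$KV_DIR/$1.tmp" "$KV_DIR/$1"; }

# Wait until a key has been published, then print its value.
kv_get() {
    local tries=0
    until [ -f "$KV_DIR/$1" ]; do
        tries=$((tries + 1))
        [ "$tries" -gt 200 ] && return 1   # give up after roughly 10 seconds
        sleep 0.05
    done
    cat "$KV_DIR/$1"
}

# Hypothetical rank 0: a real one would start listening and report its address.
launch_rank0() { echo "ucxx://node0:12345"; }   # placeholder address

# Root parent: start rank 0 first, then publish its address for all parents.
root_parent() {
    local addr
    addr="$(launch_rank0)"
    kv_put root_address "$addr"
}

# Any parent: wait for the root address, then spawn its non-root ranks with it.
spawn_nonroot_ranks() {
    local node="$1" addr rank
    shift
    addr="$(kv_get root_address)" || return 1
    for rank in "$@"; do
        echo "node $node: rank $rank connects to $addr"
    done
}

# Two nodes, two ranks each. Node 1's parent starts waiting before the root
# parent has published anything, and only proceeds once the address exists.
spawn_nonroot_ranks 1 2 3 &
sleep 0.2
root_parent
spawn_nonroot_ranks 0 1
wait
```

The point of the sketch is that no non-root rank is spawned until the root address has been published, regardless of which parent starts first.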

Hybrid mode requires passing the number of tasks/ranks (-n) directly to rrun. This tells rrun both that it should run in hybrid mode and how many tasks/ranks to launch per node.
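
As a rough sketch of that decision, the helper functions below are hypothetical and not taken from rrun. `SLURM_JOB_ID` and `SLURM_NODEID` are environment variables that Slurm sets for job steps. The description does not say whether -n counts ranks per node or in total; this sketch assumes per node, with contiguous global rank numbering, purely for illustration.

```shell
# Hypothetical helper: report "hybrid" when running inside a Slurm job and a
# rank count was passed via -n, and "not-hybrid" otherwise.
select_mode() {
    local n="${1:-}"
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -n "$n" ]; then
        echo "hybrid"
    else
        echo "not-hybrid"
    fi
}

# Assumption: -n is the number of ranks per node, so the parent on node k
# owns global ranks k*n .. k*n + n - 1.
ranks_for_this_node() {
    local n="$1" node="${SLURM_NODEID:-0}" i=0
    while [ "$i" -lt "$n" ]; do
        echo $((node * n + i))
        i=$((i + 1))
    done
}
```

Under these assumptions, a two-node job with `-n 4` would give ranks 0 to 3 to the parent on node 0 and ranks 4 to 7 to the parent on node 1.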

Usage example:

  srun \
      --mpi=pmix \
      --nodes=2 \
      --ntasks-per-node=1 \
      --cpus-per-task=144 \
      --gpus-per-task=4 \
      --gres=gpu:4 \
      rrun -n 4 ./benchmarks/bench_shuffle -C ucxx

Requires #775

pentschev self-assigned this on Feb 5, 2026
pentschev added the "feature request" (New feature or request) and "non-breaking" (Introduces a non-breaking change) labels on Feb 5, 2026
@copy-pr-bot

copy-pr-bot bot commented Feb 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@pentschev
Member Author

/ok to test
