GPU Peer2Peer communication via --num_ranks_in_server argument #60

@siddharth9820

Description

I wanted to get a better understanding of the argument --num_ranks_in_server in the image classification runtime referenced here. I had the following questions:

  1. Should it be equal to the number of GPUs per node?
  2. The runtime readme's instructions for running the framework do not mention this argument. As I understand it, setting this argument correctly is what enables Peer2Peer communication among GPUs using gloo. Otherwise, PipeDream falls back to a suboptimal communication routine that copies the data to the CPU and then sends it via gloo. Please let me know whether I should be using this argument to run the framework in its optimal configuration. If so, I'd suggest updating the readme.
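To make question 2 concrete, here is a minimal sketch of the decision it describes. It assumes that ranks are numbered contiguously within each server, so a rank's server index is `rank // num_ranks_in_server`. The helpers `server_index` and `choose_transport` and the labels `"gpu_p2p"` / `"cpu_gloo"` are hypothetical illustrations, not PipeDream's actual code.

```python
# Hypothetical sketch, NOT PipeDream's implementation.
# Assumption: server k hosts ranks k * num_ranks_in_server
# through (k + 1) * num_ranks_in_server - 1.

def server_index(rank: int, num_ranks_in_server: int) -> int:
    """Return the index of the server (node) assumed to host `rank`."""
    if num_ranks_in_server < 1:
        raise ValueError("num_ranks_in_server must be >= 1")
    return rank // num_ranks_in_server


def choose_transport(src_rank: int, dst_rank: int, num_ranks_in_server: int) -> str:
    """Pick how a tensor would travel from src_rank to dst_rank.

    Ranks mapped to the same server exchange GPU tensors directly
    ("gpu_p2p"); otherwise the tensor is staged through host memory
    and sent over gloo ("cpu_gloo").
    """
    same_server = (server_index(src_rank, num_ranks_in_server)
                   == server_index(dst_rank, num_ranks_in_server))
    return "gpu_p2p" if same_server else "cpu_gloo"


if __name__ == "__main__":
    # 2 nodes x 4 GPUs, flag set to the number of GPUs per node.
    print(choose_transport(0, 1, 4))  # gpu_p2p (both on node 0)
    print(choose_transport(3, 4, 4))  # cpu_gloo (node 0 -> node 1)
    # Same GPU pair with the flag left at 1: every pair looks inter-node.
    print(choose_transport(0, 1, 1))  # cpu_gloo
```

Under this assumption, setting the flag lower than the real GPU count per node makes intra-node pairs look like inter-node pairs and forces the CPU path.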

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests