Open
Description
I would like to better understand the argument --num_ranks_in_servers in the image classification runtime referenced here. I have the following questions:
- Should it be equal to the number of GPUs per node?
- The runtime README does not mention this argument in its instructions for running the framework. From my understanding, setting it correctly is important for enabling peer-to-peer communication among GPUs using gloo. Otherwise, PipeDream falls back to a suboptimal communication routine that copies the data to the CPU and then sends it via gloo. Please let me know whether I should be using this argument to run the framework in its optimal configuration. If so, I'd suggest updating the README.
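
To make the question concrete, here is a minimal sketch of the behavior I am assuming. It is not PipeDream's actual code. It assumes ranks are numbered contiguously within each server, so two ranks share a server when integer division by the per-server rank count gives the same result. The names `server_of`, `can_use_gpu_p2p`, and `ranks_per_server` are hypothetical; `ranks_per_server` stands in for the value passed to the argument.

```python
def server_of(rank: int, ranks_per_server: int) -> int:
    """Index of the server hosting `rank`, assuming contiguous rank numbering."""
    if ranks_per_server <= 0:
        raise ValueError("ranks_per_server must be positive")
    return rank // ranks_per_server


def can_use_gpu_p2p(src_rank: int, dst_rank: int, ranks_per_server: int) -> bool:
    """True when both ranks map to the same server under the assumption above."""
    return server_of(src_rank, ranks_per_server) == server_of(dst_rank, ranks_per_server)


if __name__ == "__main__":
    # Assumed setup: 4 GPUs per node, one rank per GPU.
    print(can_use_gpu_p2p(0, 3, ranks_per_server=4))  # same node
    print(can_use_gpu_p2p(3, 4, ranks_per_server=4))  # different nodes
    # With a value of 1, every rank is treated as its own server.
    print(can_use_gpu_p2p(0, 1, ranks_per_server=1))
```

If this model is right, leaving the value at 1 on a multi-GPU node would make every pair of ranks look remote, which is why I suspect the argument needs to match the number of GPUs per node.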